git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/5] scalar: implement the subcommand "diagnose"
@ 2022-01-26  8:41 Johannes Schindelin via GitGitGadget
  2022-01-26  8:41 ` [PATCH 1/5] Implement `scalar diagnose` Johannes Schindelin via GitGitGadget
                   ` (6 more replies)
  0 siblings, 7 replies; 140+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2022-01-26  8:41 UTC (permalink / raw)
  To: git; +Cc: Johannes Schindelin

Over the course of the years, we developed a sub-command that gathers
diagnostic data into a .zip file that can then be attached to bug reports.
This sub-command turned out to be very useful in helping Scalar developers
identify and fix issues.

Johannes Schindelin (3):
  Implement `scalar diagnose`
  scalar diagnose: include disk space information
  scalar diagnose: show a spinner while staging content

Matthew John Cheetham (2):
  scalar: teach `diagnose` to gather packfile info
  scalar: teach `diagnose` to gather loose objects information

 contrib/scalar/scalar.c          | 336 +++++++++++++++++++++++++++++++
 contrib/scalar/scalar.txt        |  12 ++
 contrib/scalar/t/t9099-scalar.sh |  17 ++
 3 files changed, 365 insertions(+)


base-commit: ddc35d833dd6f9e8946b09cecd3311b8aa18d295
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1128%2Fdscho%2Fscalar-diagnose-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1128/dscho/scalar-diagnose-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1128
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH 1/5] Implement `scalar diagnose`
  2022-01-26  8:41 [PATCH 0/5] scalar: implement the subcommand "diagnose" Johannes Schindelin via GitGitGadget
@ 2022-01-26  8:41 ` Johannes Schindelin via GitGitGadget
  2022-01-26  9:34   ` René Scharfe
  2022-01-27 19:38   ` Elijah Newren
  2022-01-26  8:41 ` [PATCH 2/5] scalar diagnose: include disk space information Johannes Schindelin via GitGitGadget
                   ` (5 subsequent siblings)
  6 siblings, 2 replies; 140+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2022-01-26  8:41 UTC (permalink / raw)
  To: git; +Cc: Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

Over the course of Scalar's development, it became obvious that there is
a need for a command that can gather all kinds of useful information
that can help identify the most typical problems with large
worktrees/repositories.

The `diagnose` command is the culmination of this hard-won knowledge: it
gathers the installed hooks, the config, a couple statistics describing
the data shape, among other pieces of information, and then wraps
everything up in a tidy, neat `.zip` archive.

Note: originally, Scalar was implemented in C# using the .NET API, where
we had the luxury of a comprehensive standard library that includes
basic functionality such as writing a `.zip` file. In the C version, we
lack such a commodity. Rather than introducing a dependency on, say,
libzip, we slightly abuse Git's `archive` command: Instead of writing
the `.zip` file directly, we stage the file contents in a Git index of a
temporary, bare repository, only to let `git archive` have at it, and
finally removing the temporary repository.

Also note: Due to the frequently-spawned `git hash-object` processes,
this command is quite a bit slow on Windows. Should it turn out to be a
big problem, the lack of a batch mode of the `hash-object` command could
potentially be worked around via using `git fast-import` with a crafted
`stdin`.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 contrib/scalar/scalar.c          | 170 +++++++++++++++++++++++++++++++
 contrib/scalar/scalar.txt        |  12 +++
 contrib/scalar/t/t9099-scalar.sh |  13 +++
 3 files changed, 195 insertions(+)

diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
index 1ce9c2b00e8..13f2b0f4d5a 100644
--- a/contrib/scalar/scalar.c
+++ b/contrib/scalar/scalar.c
@@ -259,6 +259,108 @@ static int unregister_dir(void)
 	return res;
 }
 
+static int stage(const char *git_dir, struct strbuf *buf, const char *path)
+{
+	struct strbuf cacheinfo = STRBUF_INIT;
+	struct child_process cp = CHILD_PROCESS_INIT;
+	int res;
+
+	strbuf_addstr(&cacheinfo, "100644,");
+
+	cp.git_cmd = 1;
+	strvec_pushl(&cp.args, "--git-dir", git_dir,
+		     "hash-object", "-w", "--stdin", NULL);
+	res = pipe_command(&cp, buf->buf, buf->len, &cacheinfo, 256, NULL, 0);
+	if (!res) {
+		strbuf_rtrim(&cacheinfo);
+		strbuf_addch(&cacheinfo, ',');
+		/* We cannot stage `.git`, use `_git` instead. */
+		if (starts_with(path, ".git/"))
+			strbuf_addf(&cacheinfo, "_%s", path + 1);
+		else
+			strbuf_addstr(&cacheinfo, path);
+
+		child_process_init(&cp);
+		cp.git_cmd = 1;
+		strvec_pushl(&cp.args, "--git-dir", git_dir,
+			     "update-index", "--add", "--cacheinfo",
+			     cacheinfo.buf, NULL);
+		res = run_command(&cp);
+	}
+
+	strbuf_release(&cacheinfo);
+	return res;
+}
+
+static int stage_file(const char *git_dir, const char *path)
+{
+	struct strbuf buf = STRBUF_INIT;
+	int res;
+
+	if (strbuf_read_file(&buf, path, 0) < 0)
+		return error(_("could not read '%s'"), path);
+
+	res = stage(git_dir, &buf, path);
+
+	strbuf_release(&buf);
+	return res;
+}
+
+static int stage_directory(const char *git_dir, const char *path, int recurse)
+{
+	int at_root = !*path;
+	DIR *dir = opendir(at_root ? "." : path);
+	struct dirent *e;
+	struct strbuf buf = STRBUF_INIT;
+	size_t len;
+	int res = 0;
+
+	if (!dir)
+		return error(_("could not open directory '%s'"), path);
+
+	if (!at_root)
+		strbuf_addf(&buf, "%s/", path);
+	len = buf.len;
+
+	while (!res && (e = readdir(dir))) {
+		if (!strcmp(".", e->d_name) || !strcmp("..", e->d_name))
+			continue;
+
+		strbuf_setlen(&buf, len);
+		strbuf_addstr(&buf, e->d_name);
+
+		if ((e->d_type == DT_REG && stage_file(git_dir, buf.buf)) ||
+		    (e->d_type == DT_DIR && recurse &&
+		     stage_directory(git_dir, buf.buf, recurse)))
+			res = -1;
+	}
+
+	closedir(dir);
+	strbuf_release(&buf);
+	return res;
+}
+
+static int index_to_zip(const char *git_dir)
+{
+	struct child_process cp = CHILD_PROCESS_INIT;
+	struct strbuf oid = STRBUF_INIT;
+
+	cp.git_cmd = 1;
+	strvec_pushl(&cp.args, "--git-dir", git_dir, "write-tree", NULL);
+	if (pipe_command(&cp, NULL, 0, &oid, the_hash_algo->hexsz + 1,
+			 NULL, 0))
+		return error(_("could not write temporary tree object"));
+
+	strbuf_rtrim(&oid);
+	child_process_init(&cp);
+	cp.git_cmd = 1;
+	strvec_pushl(&cp.args, "--git-dir", git_dir, "archive", "-o", NULL);
+	strvec_pushf(&cp.args, "%s.zip", git_dir);
+	strvec_pushl(&cp.args, oid.buf, "--", NULL);
+	strbuf_release(&oid);
+	return run_command(&cp);
+}
+
 /* printf-style interface, expects `<key>=<value>` argument */
 static int set_config(const char *fmt, ...)
 {
@@ -499,6 +601,73 @@ cleanup:
 	return res;
 }
 
+static int cmd_diagnose(int argc, const char **argv)
+{
+	struct option options[] = {
+		OPT_END(),
+	};
+	const char * const usage[] = {
+		N_("scalar diagnose [<enlistment>]"),
+		NULL
+	};
+	struct strbuf tmp_dir = STRBUF_INIT;
+	time_t now = time(NULL);
+	struct tm tm;
+	struct strbuf path = STRBUF_INIT, buf = STRBUF_INIT;
+	int res = 0;
+
+	argc = parse_options(argc, argv, NULL, options,
+			     usage, 0);
+
+	setup_enlistment_directory(argc, argv, usage, options, &buf);
+
+	strbuf_addstr(&buf, "/.scalarDiagnostics/scalar_");
+	strbuf_addftime(&buf, "%Y%m%d_%H%M%S", localtime_r(&now, &tm), 0, 0);
+	if (run_git("init", "-q", "-b", "dummy", "--bare", buf.buf, NULL)) {
+		res = error(_("could not initialize temporary repository: %s"),
+			    buf.buf);
+		goto diagnose_cleanup;
+	}
+	strbuf_realpath(&tmp_dir, buf.buf, 1);
+
+	strbuf_reset(&buf);
+	strbuf_addf(&buf, "Collecting diagnostic info into temp folder %s\n\n",
+		    tmp_dir.buf);
+
+	get_version_info(&buf, 1);
+
+	strbuf_addf(&buf, "Enlistment root: %s\n", the_repository->worktree);
+	fwrite(buf.buf, buf.len, 1, stdout);
+
+	if ((res = stage(tmp_dir.buf, &buf, "diagnostics.log")))
+		goto diagnose_cleanup;
+
+	if ((res = stage_directory(tmp_dir.buf, ".git", 0)) ||
+	    (res = stage_directory(tmp_dir.buf, ".git/hooks", 0)) ||
+	    (res = stage_directory(tmp_dir.buf, ".git/info", 0)) ||
+	    (res = stage_directory(tmp_dir.buf, ".git/logs", 1)) ||
+	    (res = stage_directory(tmp_dir.buf, ".git/objects/info", 0)))
+		goto diagnose_cleanup;
+
+	res = index_to_zip(tmp_dir.buf);
+
+	if (!res)
+		res = remove_dir_recursively(&tmp_dir, 0);
+
+	if (!res)
+		printf("\n"
+		       "Diagnostics complete.\n"
+		       "All of the gathered info is captured in '%s.zip'\n",
+		       tmp_dir.buf);
+
+diagnose_cleanup:
+	strbuf_release(&tmp_dir);
+	strbuf_release(&path);
+	strbuf_release(&buf);
+
+	return res;
+}
+
 static int cmd_list(int argc, const char **argv)
 {
 	if (argc != 1)
@@ -800,6 +969,7 @@ static struct {
 	{ "reconfigure", cmd_reconfigure },
 	{ "delete", cmd_delete },
 	{ "version", cmd_version },
+	{ "diagnose", cmd_diagnose },
 	{ NULL, NULL},
 };
 
diff --git a/contrib/scalar/scalar.txt b/contrib/scalar/scalar.txt
index f416d637289..22583fe046e 100644
--- a/contrib/scalar/scalar.txt
+++ b/contrib/scalar/scalar.txt
@@ -14,6 +14,7 @@ scalar register [<enlistment>]
 scalar unregister [<enlistment>]
 scalar run ( all | config | commit-graph | fetch | loose-objects | pack-files ) [<enlistment>]
 scalar reconfigure [ --all | <enlistment> ]
+scalar diagnose [<enlistment>]
 scalar delete <enlistment>
 
 DESCRIPTION
@@ -129,6 +130,17 @@ reconfigure the enlistment.
 With the `--all` option, all enlistments currently registered with Scalar
 will be reconfigured. Use this option after each Scalar upgrade.
 
+Diagnose
+~~~~~~~~
+
+diagnose [<enlistment>]::
+    When reporting issues with Scalar, it is often helpful to provide the
+    information gathered by this command, including logs and certain
+    statistics describing the data shape of the current enlistment.
++
+The output of this command is a `.zip` file that is written into
+a directory adjacent to the worktree in the `src` directory.
+
 Delete
 ~~~~~~
 
diff --git a/contrib/scalar/t/t9099-scalar.sh b/contrib/scalar/t/t9099-scalar.sh
index 2e1502ad45e..ecd06e207c2 100755
--- a/contrib/scalar/t/t9099-scalar.sh
+++ b/contrib/scalar/t/t9099-scalar.sh
@@ -65,6 +65,19 @@ test_expect_success 'scalar clone' '
 	)
 '
 
+SQ="'"
+test_expect_success UNZIP 'scalar diagnose' '
+	scalar diagnose cloned >out &&
+	sed -n "s/.*$SQ\\(.*\\.zip\\)$SQ.*/\\1/p" <out >zip_path &&
+	zip_path=$(cat zip_path) &&
+	test -n "$zip_path" &&
+	unzip -v "$zip_path" &&
+	folder=${zip_path%.zip} &&
+	test_path_is_missing "$folder" &&
+	unzip -p "$zip_path" diagnostics.log >out &&
+	test_file_not_empty out
+'
+
 test_expect_success 'scalar reconfigure' '
 	git init one/src &&
 	scalar register one &&
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH 2/5] scalar diagnose: include disk space information
  2022-01-26  8:41 [PATCH 0/5] scalar: implement the subcommand "diagnose" Johannes Schindelin via GitGitGadget
  2022-01-26  8:41 ` [PATCH 1/5] Implement `scalar diagnose` Johannes Schindelin via GitGitGadget
@ 2022-01-26  8:41 ` Johannes Schindelin via GitGitGadget
  2022-01-26  8:41 ` [PATCH 3/5] scalar: teach `diagnose` to gather packfile info Matthew John Cheetham via GitGitGadget
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 140+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2022-01-26  8:41 UTC (permalink / raw)
  To: git; +Cc: Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

When analyzing problems with large worktrees/repositories, it is useful
to know how close to a "full disk" situation Scalar/Git operates. Let's
include this information.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 contrib/scalar/scalar.c | 53 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 53 insertions(+)

diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
index 13f2b0f4d5a..e26fb2fc018 100644
--- a/contrib/scalar/scalar.c
+++ b/contrib/scalar/scalar.c
@@ -361,6 +361,58 @@ static int index_to_zip(const char *git_dir)
 	return run_command(&cp);
 }
 
+#ifndef WIN32
+#include <sys/statvfs.h>
+#endif
+
+static int get_disk_info(struct strbuf *out)
+{
+#ifdef WIN32
+	struct strbuf buf = STRBUF_INIT;
+	char volume_name[MAX_PATH], fs_name[MAX_PATH];
+	DWORD serial_number, component_length, flags;
+	ULARGE_INTEGER avail2caller, total, avail;
+
+	strbuf_realpath(&buf, ".", 1);
+	if (!GetDiskFreeSpaceExA(buf.buf, &avail2caller, &total, &avail)) {
+		error(_("could not determine free disk size for '%s'"),
+		      buf.buf);
+		strbuf_release(&buf);
+		return -1;
+	}
+
+	strbuf_setlen(&buf, offset_1st_component(buf.buf));
+	if (!GetVolumeInformationA(buf.buf, volume_name, sizeof(volume_name),
+				   &serial_number, &component_length, &flags,
+				   fs_name, sizeof(fs_name))) {
+		error(_("could not get info for '%s'"), buf.buf);
+		strbuf_release(&buf);
+		return -1;
+	}
+	strbuf_addf(out, "Available space on '%s': ", buf.buf);
+	strbuf_humanise_bytes(out, avail2caller.QuadPart);
+	strbuf_addch(out, '\n');
+	strbuf_release(&buf);
+#else
+	struct strbuf buf = STRBUF_INIT;
+	struct statvfs stat;
+
+	strbuf_realpath(&buf, ".", 1);
+	if (statvfs(buf.buf, &stat) < 0) {
+		error_errno(_("could not determine free disk size for '%s'"),
+			    buf.buf);
+		strbuf_release(&buf);
+		return -1;
+	}
+
+	strbuf_addf(out, "Available space on '%s': ", buf.buf);
+	strbuf_humanise_bytes(out, st_mult(stat.f_bsize, stat.f_bavail));
+	strbuf_addf(out, " (mount flags 0x%lx)\n", stat.f_flag);
+	strbuf_release(&buf);
+#endif
+	return 0;
+}
+
 /* printf-style interface, expects `<key>=<value>` argument */
 static int set_config(const char *fmt, ...)
 {
@@ -637,6 +689,7 @@ static int cmd_diagnose(int argc, const char **argv)
 	get_version_info(&buf, 1);
 
 	strbuf_addf(&buf, "Enlistment root: %s\n", the_repository->worktree);
+	get_disk_info(&buf);
 	fwrite(buf.buf, buf.len, 1, stdout);
 
 	if ((res = stage(tmp_dir.buf, &buf, "diagnostics.log")))
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH 3/5] scalar: teach `diagnose` to gather packfile info
  2022-01-26  8:41 [PATCH 0/5] scalar: implement the subcommand "diagnose" Johannes Schindelin via GitGitGadget
  2022-01-26  8:41 ` [PATCH 1/5] Implement `scalar diagnose` Johannes Schindelin via GitGitGadget
  2022-01-26  8:41 ` [PATCH 2/5] scalar diagnose: include disk space information Johannes Schindelin via GitGitGadget
@ 2022-01-26  8:41 ` Matthew John Cheetham via GitGitGadget
  2022-01-26 22:43   ` Taylor Blau
  2022-01-26  8:41 ` [PATCH 4/5] scalar: teach `diagnose` to gather loose objects information Matthew John Cheetham via GitGitGadget
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 140+ messages in thread
From: Matthew John Cheetham via GitGitGadget @ 2022-01-26  8:41 UTC (permalink / raw)
  To: git; +Cc: Johannes Schindelin, Matthew John Cheetham

From: Matthew John Cheetham <mjcheetham@outlook.com>

Teach the `scalar diagnose` command to gather file size information
about pack files.

Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
---
 contrib/scalar/scalar.c          | 39 ++++++++++++++++++++++++++++++++
 contrib/scalar/t/t9099-scalar.sh |  2 ++
 2 files changed, 41 insertions(+)

diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
index e26fb2fc018..690933ffdf3 100644
--- a/contrib/scalar/scalar.c
+++ b/contrib/scalar/scalar.c
@@ -653,6 +653,39 @@ cleanup:
 	return res;
 }
 
+static void dir_file_stats(struct strbuf *buf, const char *path)
+{
+	DIR *dir = opendir(path);
+	struct dirent *e;
+	struct stat e_stat;
+	struct strbuf file_path = STRBUF_INIT;
+	size_t base_path_len;
+
+	if (!dir)
+		return;
+
+	strbuf_addstr(buf, "Contents of ");
+	strbuf_add_absolute_path(buf, path);
+	strbuf_addstr(buf, ":\n");
+
+	strbuf_add_absolute_path(&file_path, path);
+	strbuf_addch(&file_path, '/');
+	base_path_len = file_path.len;
+
+	while ((e = readdir(dir)) != NULL)
+		if (!is_dot_or_dotdot(e->d_name) && e->d_type == DT_REG) {
+			strbuf_setlen(&file_path, base_path_len);
+			strbuf_addstr(&file_path, e->d_name);
+			if (!stat(file_path.buf, &e_stat))
+				strbuf_addf(buf, "%-70s %16"PRIuMAX"\n",
+					    e->d_name,
+					    (uintmax_t)e_stat.st_size);
+		}
+
+	strbuf_release(&file_path);
+	closedir(dir);
+}
+
 static int cmd_diagnose(int argc, const char **argv)
 {
 	struct option options[] = {
@@ -695,6 +728,12 @@ static int cmd_diagnose(int argc, const char **argv)
 	if ((res = stage(tmp_dir.buf, &buf, "diagnostics.log")))
 		goto diagnose_cleanup;
 
+	strbuf_reset(&buf);
+	dir_file_stats(&buf, ".git/objects/pack");
+
+	if ((res = stage(tmp_dir.buf, &buf, "packs-local.txt")))
+		goto diagnose_cleanup;
+
 	if ((res = stage_directory(tmp_dir.buf, ".git", 0)) ||
 	    (res = stage_directory(tmp_dir.buf, ".git/hooks", 0)) ||
 	    (res = stage_directory(tmp_dir.buf, ".git/info", 0)) ||
diff --git a/contrib/scalar/t/t9099-scalar.sh b/contrib/scalar/t/t9099-scalar.sh
index ecd06e207c2..b1745851e31 100755
--- a/contrib/scalar/t/t9099-scalar.sh
+++ b/contrib/scalar/t/t9099-scalar.sh
@@ -75,6 +75,8 @@ test_expect_success UNZIP 'scalar diagnose' '
 	folder=${zip_path%.zip} &&
 	test_path_is_missing "$folder" &&
 	unzip -p "$zip_path" diagnostics.log >out &&
+	test_file_not_empty out &&
+	unzip -p "$zip_path" packs-local.txt >out &&
 	test_file_not_empty out
 '
 
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH 4/5] scalar: teach `diagnose` to gather loose objects information
  2022-01-26  8:41 [PATCH 0/5] scalar: implement the subcommand "diagnose" Johannes Schindelin via GitGitGadget
                   ` (2 preceding siblings ...)
  2022-01-26  8:41 ` [PATCH 3/5] scalar: teach `diagnose` to gather packfile info Matthew John Cheetham via GitGitGadget
@ 2022-01-26  8:41 ` Matthew John Cheetham via GitGitGadget
  2022-01-26 22:50   ` Taylor Blau
  2022-01-27 18:59   ` Elijah Newren
  2022-01-26  8:41 ` [PATCH 5/5] scalar diagnose: show a spinner while staging content Johannes Schindelin via GitGitGadget
                   ` (2 subsequent siblings)
  6 siblings, 2 replies; 140+ messages in thread
From: Matthew John Cheetham via GitGitGadget @ 2022-01-26  8:41 UTC (permalink / raw)
  To: git; +Cc: Johannes Schindelin, Matthew John Cheetham

From: Matthew John Cheetham <mjcheetham@outlook.com>

When operating at the scale that Scalar wants to support, certain data
shapes are more likely to cause undesirable performance issues, such as
large numbers or large sizes of loose objects.

By including statistics about this, `scalar diagnose` now makes it
easier to identify such scenarios.

Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
---
 contrib/scalar/scalar.c          | 60 ++++++++++++++++++++++++++++++++
 contrib/scalar/t/t9099-scalar.sh |  2 ++
 2 files changed, 62 insertions(+)

diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
index 690933ffdf3..c0ad4948215 100644
--- a/contrib/scalar/scalar.c
+++ b/contrib/scalar/scalar.c
@@ -686,6 +686,60 @@ static void dir_file_stats(struct strbuf *buf, const char *path)
 	closedir(dir);
 }
 
+static int count_files(char *path)
+{
+	DIR *dir = opendir(path);
+	struct dirent *e;
+	int count = 0;
+
+	if (!dir)
+		return 0;
+
+	while ((e = readdir(dir)) != NULL)
+		if (!is_dot_or_dotdot(e->d_name) && e->d_type == DT_REG)
+			count++;
+
+	closedir(dir);
+	return count;
+}
+
+static void loose_objs_stats(struct strbuf *buf, const char *path)
+{
+	DIR *dir = opendir(path);
+	struct dirent *e;
+	int count;
+	int total = 0;
+	unsigned char c;
+	struct strbuf count_path = STRBUF_INIT;
+	size_t base_path_len;
+
+	if (!dir)
+		return;
+
+	strbuf_addstr(buf, "Object directory stats for ");
+	strbuf_add_absolute_path(buf, path);
+	strbuf_addstr(buf, ":\n");
+
+	strbuf_add_absolute_path(&count_path, path);
+	strbuf_addch(&count_path, '/');
+	base_path_len = count_path.len;
+
+	while ((e = readdir(dir)) != NULL)
+		if (!is_dot_or_dotdot(e->d_name) &&
+		    e->d_type == DT_DIR && strlen(e->d_name) == 2 &&
+		    !hex_to_bytes(&c, e->d_name, 1)) {
+			strbuf_setlen(&count_path, base_path_len);
+			strbuf_addstr(&count_path, e->d_name);
+			total += (count = count_files(count_path.buf));
+			strbuf_addf(buf, "%s : %7d files\n", e->d_name, count);
+		}
+
+	strbuf_addf(buf, "Total: %d loose objects", total);
+
+	strbuf_release(&count_path);
+	closedir(dir);
+}
+
 static int cmd_diagnose(int argc, const char **argv)
 {
 	struct option options[] = {
@@ -734,6 +788,12 @@ static int cmd_diagnose(int argc, const char **argv)
 	if ((res = stage(tmp_dir.buf, &buf, "packs-local.txt")))
 		goto diagnose_cleanup;
 
+	strbuf_reset(&buf);
+	loose_objs_stats(&buf, ".git/objects");
+
+	if ((res = stage(tmp_dir.buf, &buf, "objects-local.txt")))
+		goto diagnose_cleanup;
+
 	if ((res = stage_directory(tmp_dir.buf, ".git", 0)) ||
 	    (res = stage_directory(tmp_dir.buf, ".git/hooks", 0)) ||
 	    (res = stage_directory(tmp_dir.buf, ".git/info", 0)) ||
diff --git a/contrib/scalar/t/t9099-scalar.sh b/contrib/scalar/t/t9099-scalar.sh
index b1745851e31..f2ec156d819 100755
--- a/contrib/scalar/t/t9099-scalar.sh
+++ b/contrib/scalar/t/t9099-scalar.sh
@@ -77,6 +77,8 @@ test_expect_success UNZIP 'scalar diagnose' '
 	unzip -p "$zip_path" diagnostics.log >out &&
 	test_file_not_empty out &&
 	unzip -p "$zip_path" packs-local.txt >out &&
+	test_file_not_empty out &&
+	unzip -p "$zip_path" objects-local.txt >out &&
 	test_file_not_empty out
 '
 
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH 5/5] scalar diagnose: show a spinner while staging content
  2022-01-26  8:41 [PATCH 0/5] scalar: implement the subcommand "diagnose" Johannes Schindelin via GitGitGadget
                   ` (3 preceding siblings ...)
  2022-01-26  8:41 ` [PATCH 4/5] scalar: teach `diagnose` to gather loose objects information Matthew John Cheetham via GitGitGadget
@ 2022-01-26  8:41 ` Johannes Schindelin via GitGitGadget
  2022-01-27 15:19 ` [PATCH 0/5] scalar: implement the subcommand "diagnose" Derrick Stolee
  2022-02-06 22:39 ` [PATCH v2 0/6] " Johannes Schindelin via GitGitGadget
  6 siblings, 0 replies; 140+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2022-01-26  8:41 UTC (permalink / raw)
  To: git; +Cc: Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

It can take a while to gather all the information that `scalar diagnose`
wants to accumulate. Typically this happens when the user is in need of
quick solutions and therefore their patience is tested already. By
showing a little spinner that spins around, we hope to help the user
muster just a tiny bit more patience until `scalar diagnose` is done.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 contrib/scalar/scalar.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
index c0ad4948215..224329f38f5 100644
--- a/contrib/scalar/scalar.c
+++ b/contrib/scalar/scalar.c
@@ -259,12 +259,26 @@ static int unregister_dir(void)
 	return res;
 }
 
+static void spinner(void)
+{
+	static const char whee[] = "|\010/\010-\010\\\010", *next = whee;
+
+	if (!next)
+		return;
+	if (write(2, next, 2) < 0)
+		next = NULL;
+	else
+		next = next[2] ? next + 2 : whee;
+}
+
 static int stage(const char *git_dir, struct strbuf *buf, const char *path)
 {
 	struct strbuf cacheinfo = STRBUF_INIT;
 	struct child_process cp = CHILD_PROCESS_INIT;
 	int res;
 
+	spinner();
+
 	strbuf_addstr(&cacheinfo, "100644,");
 
 	cp.git_cmd = 1;
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH 1/5] Implement `scalar diagnose`
  2022-01-26  8:41 ` [PATCH 1/5] Implement `scalar diagnose` Johannes Schindelin via GitGitGadget
@ 2022-01-26  9:34   ` René Scharfe
  2022-01-26 22:20     ` Taylor Blau
  2022-01-27 19:38   ` Elijah Newren
  1 sibling, 1 reply; 140+ messages in thread
From: René Scharfe @ 2022-01-26  9:34 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget, git; +Cc: Johannes Schindelin

Am 26.01.22 um 09:41 schrieb Johannes Schindelin via GitGitGadget:
> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>
> Over the course of Scalar's development, it became obvious that there is
> a need for a command that can gather all kinds of useful information
> that can help identify the most typical problems with large
> worktrees/repositories.
>
> The `diagnose` command is the culmination of this hard-won knowledge: it
> gathers the installed hooks, the config, a couple statistics describing
> the data shape, among other pieces of information, and then wraps
> everything up in a tidy, neat `.zip` archive.
>
> Note: originally, Scalar was implemented in C# using the .NET API, where
> we had the luxury of a comprehensive standard library that includes
> basic functionality such as writing a `.zip` file. In the C version, we
> lack such a commodity. Rather than introducing a dependency on, say,
> libzip, we slightly abuse Git's `archive` command: Instead of writing
> the `.zip` file directly, we stage the file contents in a Git index of a
> temporary, bare repository, only to let `git archive` have at it, and
> finally removing the temporary repository.

git archive allows you to include untracked files in an archive with its
option --add-file.  You can see an example in Git's Makefile; search for
GIT_ARCHIVE_EXTRA_FILES.  It still requires a tree argument, but the
empty tree object should suffice if you don't want to include any
tracked files.  It doesn't currently support streaming, though, i.e.
files are fully read into memory, so it's impractical for huge ones.

> Also note: Due to the frequently-spawned `git hash-object` processes,
> this command is quite a bit slow on Windows. Should it turn out to be a
> big problem, the lack of a batch mode of the `hash-object` command could
> potentially be worked around via using `git fast-import` with a crafted
> `stdin`.

Or we could add streaming support to git archive --add-file..

>
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
>  contrib/scalar/scalar.c          | 170 +++++++++++++++++++++++++++++++
>  contrib/scalar/scalar.txt        |  12 +++
>  contrib/scalar/t/t9099-scalar.sh |  13 +++
>  3 files changed, 195 insertions(+)
>
> diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
> index 1ce9c2b00e8..13f2b0f4d5a 100644
> --- a/contrib/scalar/scalar.c
> +++ b/contrib/scalar/scalar.c
> @@ -259,6 +259,108 @@ static int unregister_dir(void)
>  	return res;
>  }
>
> +static int stage(const char *git_dir, struct strbuf *buf, const char *path)
> +{
> +	struct strbuf cacheinfo = STRBUF_INIT;
> +	struct child_process cp = CHILD_PROCESS_INIT;
> +	int res;
> +
> +	strbuf_addstr(&cacheinfo, "100644,");
> +
> +	cp.git_cmd = 1;
> +	strvec_pushl(&cp.args, "--git-dir", git_dir,
> +		     "hash-object", "-w", "--stdin", NULL);
> +	res = pipe_command(&cp, buf->buf, buf->len, &cacheinfo, 256, NULL, 0);
> +	if (!res) {
> +		strbuf_rtrim(&cacheinfo);
> +		strbuf_addch(&cacheinfo, ',');
> +		/* We cannot stage `.git`, use `_git` instead. */
> +		if (starts_with(path, ".git/"))
> +			strbuf_addf(&cacheinfo, "_%s", path + 1);
> +		else
> +			strbuf_addstr(&cacheinfo, path);
> +
> +		child_process_init(&cp);
> +		cp.git_cmd = 1;
> +		strvec_pushl(&cp.args, "--git-dir", git_dir,
> +			     "update-index", "--add", "--cacheinfo",
> +			     cacheinfo.buf, NULL);
> +		res = run_command(&cp);
> +	}
> +
> +	strbuf_release(&cacheinfo);
> +	return res;
> +}
> +
> +static int stage_file(const char *git_dir, const char *path)
> +{
> +	struct strbuf buf = STRBUF_INIT;
> +	int res;
> +
> +	if (strbuf_read_file(&buf, path, 0) < 0)
> +		return error(_("could not read '%s'"), path);
> +
> +	res = stage(git_dir, &buf, path);
> +
> +	strbuf_release(&buf);
> +	return res;
> +}
> +
> +static int stage_directory(const char *git_dir, const char *path, int recurse)
> +{
> +	int at_root = !*path;
> +	DIR *dir = opendir(at_root ? "." : path);
> +	struct dirent *e;
> +	struct strbuf buf = STRBUF_INIT;
> +	size_t len;
> +	int res = 0;
> +
> +	if (!dir)
> +		return error(_("could not open directory '%s'"), path);
> +
> +	if (!at_root)
> +		strbuf_addf(&buf, "%s/", path);
> +	len = buf.len;
> +
> +	while (!res && (e = readdir(dir))) {
> +		if (!strcmp(".", e->d_name) || !strcmp("..", e->d_name))
> +			continue;
> +
> +		strbuf_setlen(&buf, len);
> +		strbuf_addstr(&buf, e->d_name);
> +
> +		if ((e->d_type == DT_REG && stage_file(git_dir, buf.buf)) ||
> +		    (e->d_type == DT_DIR && recurse &&
> +		     stage_directory(git_dir, buf.buf, recurse)))
> +			res = -1;
> +	}
> +
> +	closedir(dir);
> +	strbuf_release(&buf);
> +	return res;
> +}
> +
> +static int index_to_zip(const char *git_dir)
> +{
> +	struct child_process cp = CHILD_PROCESS_INIT;
> +	struct strbuf oid = STRBUF_INIT;
> +
> +	cp.git_cmd = 1;
> +	strvec_pushl(&cp.args, "--git-dir", git_dir, "write-tree", NULL);
> +	if (pipe_command(&cp, NULL, 0, &oid, the_hash_algo->hexsz + 1,
> +			 NULL, 0))
> +		return error(_("could not write temporary tree object"));
> +
> +	strbuf_rtrim(&oid);
> +	child_process_init(&cp);
> +	cp.git_cmd = 1;
> +	strvec_pushl(&cp.args, "--git-dir", git_dir, "archive", "-o", NULL);
> +	strvec_pushf(&cp.args, "%s.zip", git_dir);
> +	strvec_pushl(&cp.args, oid.buf, "--", NULL);
> +	strbuf_release(&oid);
> +	return run_command(&cp);
> +}
> +
>  /* printf-style interface, expects `<key>=<value>` argument */
>  static int set_config(const char *fmt, ...)
>  {
> @@ -499,6 +601,73 @@ cleanup:
>  	return res;
>  }
>
> +static int cmd_diagnose(int argc, const char **argv)
> +{
> +	struct option options[] = {
> +		OPT_END(),
> +	};
> +	const char * const usage[] = {
> +		N_("scalar diagnose [<enlistment>]"),
> +		NULL
> +	};
> +	struct strbuf tmp_dir = STRBUF_INIT;
> +	time_t now = time(NULL);
> +	struct tm tm;
> +	struct strbuf path = STRBUF_INIT, buf = STRBUF_INIT;
> +	int res = 0;
> +
> +	argc = parse_options(argc, argv, NULL, options,
> +			     usage, 0);
> +
> +	setup_enlistment_directory(argc, argv, usage, options, &buf);
> +
> +	strbuf_addstr(&buf, "/.scalarDiagnostics/scalar_");
> +	strbuf_addftime(&buf, "%Y%m%d_%H%M%S", localtime_r(&now, &tm), 0, 0);
> +	if (run_git("init", "-q", "-b", "dummy", "--bare", buf.buf, NULL)) {
> +		res = error(_("could not initialize temporary repository: %s"),
> +			    buf.buf);
> +		goto diagnose_cleanup;
> +	}
> +	strbuf_realpath(&tmp_dir, buf.buf, 1);
> +
> +	strbuf_reset(&buf);
> +	strbuf_addf(&buf, "Collecting diagnostic info into temp folder %s\n\n",
> +		    tmp_dir.buf);
> +
> +	get_version_info(&buf, 1);
> +
> +	strbuf_addf(&buf, "Enlistment root: %s\n", the_repository->worktree);
> +	fwrite(buf.buf, buf.len, 1, stdout);
> +
> +	if ((res = stage(tmp_dir.buf, &buf, "diagnostics.log")))
> +		goto diagnose_cleanup;
> +
> +	if ((res = stage_directory(tmp_dir.buf, ".git", 0)) ||
> +	    (res = stage_directory(tmp_dir.buf, ".git/hooks", 0)) ||
> +	    (res = stage_directory(tmp_dir.buf, ".git/info", 0)) ||
> +	    (res = stage_directory(tmp_dir.buf, ".git/logs", 1)) ||
> +	    (res = stage_directory(tmp_dir.buf, ".git/objects/info", 0)))
> +		goto diagnose_cleanup;
> +
> +	res = index_to_zip(tmp_dir.buf);
> +
> +	if (!res)
> +		res = remove_dir_recursively(&tmp_dir, 0);
> +
> +	if (!res)
> +		printf("\n"
> +		       "Diagnostics complete.\n"
> +		       "All of the gathered info is captured in '%s.zip'\n",
> +		       tmp_dir.buf);
> +
> +diagnose_cleanup:
> +	strbuf_release(&tmp_dir);
> +	strbuf_release(&path);
> +	strbuf_release(&buf);
> +
> +	return res;
> +}
> +
>  static int cmd_list(int argc, const char **argv)
>  {
>  	if (argc != 1)
> @@ -800,6 +969,7 @@ static struct {
>  	{ "reconfigure", cmd_reconfigure },
>  	{ "delete", cmd_delete },
>  	{ "version", cmd_version },
> +	{ "diagnose", cmd_diagnose },
>  	{ NULL, NULL},
>  };
>
> diff --git a/contrib/scalar/scalar.txt b/contrib/scalar/scalar.txt
> index f416d637289..22583fe046e 100644
> --- a/contrib/scalar/scalar.txt
> +++ b/contrib/scalar/scalar.txt
> @@ -14,6 +14,7 @@ scalar register [<enlistment>]
>  scalar unregister [<enlistment>]
>  scalar run ( all | config | commit-graph | fetch | loose-objects | pack-files ) [<enlistment>]
>  scalar reconfigure [ --all | <enlistment> ]
> +scalar diagnose [<enlistment>]
>  scalar delete <enlistment>
>
>  DESCRIPTION
> @@ -129,6 +130,17 @@ reconfigure the enlistment.
>  With the `--all` option, all enlistments currently registered with Scalar
>  will be reconfigured. Use this option after each Scalar upgrade.
>
> +Diagnose
> +~~~~~~~~
> +
> +diagnose [<enlistment>]::
> +    When reporting issues with Scalar, it is often helpful to provide the
> +    information gathered by this command, including logs and certain
> +    statistics describing the data shape of the current enlistment.
> ++
> +The output of this command is a `.zip` file that is written into
> +a directory adjacent to the worktree in the `src` directory.
> +
>  Delete
>  ~~~~~~
>
> diff --git a/contrib/scalar/t/t9099-scalar.sh b/contrib/scalar/t/t9099-scalar.sh
> index 2e1502ad45e..ecd06e207c2 100755
> --- a/contrib/scalar/t/t9099-scalar.sh
> +++ b/contrib/scalar/t/t9099-scalar.sh
> @@ -65,6 +65,19 @@ test_expect_success 'scalar clone' '
>  	)
>  '
>
> +SQ="'"
> +test_expect_success UNZIP 'scalar diagnose' '
> +	scalar diagnose cloned >out &&
> +	sed -n "s/.*$SQ\\(.*\\.zip\\)$SQ.*/\\1/p" <out >zip_path &&
> +	zip_path=$(cat zip_path) &&
> +	test -n "$zip_path" &&
> +	unzip -v "$zip_path" &&
> +	folder=${zip_path%.zip} &&
> +	test_path_is_missing "$folder" &&
> +	unzip -p "$zip_path" diagnostics.log >out &&
> +	test_file_not_empty out
> +'
> +
>  test_expect_success 'scalar reconfigure' '
>  	git init one/src &&
>  	scalar register one &&


^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH 1/5] Implement `scalar diagnose`
  2022-01-26  9:34   ` René Scharfe
@ 2022-01-26 22:20     ` Taylor Blau
  2022-02-06 21:34       ` Johannes Schindelin
  0 siblings, 1 reply; 140+ messages in thread
From: Taylor Blau @ 2022-01-26 22:20 UTC (permalink / raw)
  To: René Scharfe
  Cc: Johannes Schindelin via GitGitGadget, git, Johannes Schindelin

On Wed, Jan 26, 2022 at 10:34:04AM +0100, René Scharfe wrote:
> Am 26.01.22 um 09:41 schrieb Johannes Schindelin via GitGitGadget:
> > Note: originally, Scalar was implemented in C# using the .NET API, where
> > we had the luxury of a comprehensive standard library that includes
> > basic functionality such as writing a `.zip` file. In the C version, we
> > lack such a commodity. Rather than introducing a dependency on, say,
> > libzip, we slightly abuse Git's `archive` command: Instead of writing
> > the `.zip` file directly, we stage the file contents in a Git index of a
> > temporary, bare repository, only to let `git archive` have at it, and
> > finally removing the temporary repository.
>
> git archive allows you to include untracked files in an archive with its
> option --add-file.  You can see an example in Git's Makefile; search for
> GIT_ARCHIVE_EXTRA_FILES.  It still requires a tree argument, but the
> empty tree object should suffice if you don't want to include any
> tracked files.  It doesn't currently support streaming, though, i.e.
> files are fully read into memory, so it's impractical for huge ones.

Using `--add-file` would likely be preferable to setting up a temporary
repository just to invoke `git archive` in it. Johannes would be the
expert to ask whether or not big files are going to be a problem here
(based on a cursory scan of the new functions in scalar.c, I don't
expect this to be the case).

The new stage_directory() function _could_ add `--add-file` arguments in
a loop around readdir(), but it might also be nice to add a new
`--add-directory` function to `git archive` which would do the "heavy"
lifting for us.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH 3/5] scalar: teach `diagnose` to gather packfile info
  2022-01-26  8:41 ` [PATCH 3/5] scalar: teach `diagnose` to gather packfile info Matthew John Cheetham via GitGitGadget
@ 2022-01-26 22:43   ` Taylor Blau
  2022-01-27 15:14     ` Derrick Stolee
  0 siblings, 1 reply; 140+ messages in thread
From: Taylor Blau @ 2022-01-26 22:43 UTC (permalink / raw)
  To: Matthew John Cheetham via GitGitGadget
  Cc: git, Johannes Schindelin, Matthew John Cheetham

On Wed, Jan 26, 2022 at 08:41:45AM +0000, Matthew John Cheetham via GitGitGadget wrote:
> From: Matthew John Cheetham <mjcheetham@outlook.com>
>
> Teach the `scalar diagnose` command to gather file size information
> about pack files.
>
> Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
> ---
>  contrib/scalar/scalar.c          | 39 ++++++++++++++++++++++++++++++++
>  contrib/scalar/t/t9099-scalar.sh |  2 ++
>  2 files changed, 41 insertions(+)
>
> diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
> index e26fb2fc018..690933ffdf3 100644
> --- a/contrib/scalar/scalar.c
> +++ b/contrib/scalar/scalar.c
> @@ -653,6 +653,39 @@ cleanup:
>  	return res;
>  }
>
> +static void dir_file_stats(struct strbuf *buf, const char *path)
> +{
> +	DIR *dir = opendir(path);
> +	struct dirent *e;
> +	struct stat e_stat;
> +	struct strbuf file_path = STRBUF_INIT;
> +	size_t base_path_len;
> +
> +	if (!dir)
> +		return;
> +
> +	strbuf_addstr(buf, "Contents of ");
> +	strbuf_add_absolute_path(buf, path);
> +	strbuf_addstr(buf, ":\n");
> +
> +	strbuf_add_absolute_path(&file_path, path);
> +	strbuf_addch(&file_path, '/');
> +	base_path_len = file_path.len;
> +
> +	while ((e = readdir(dir)) != NULL)

Hmm. Is there a reason that this couldn't use
for_each_file_in_pack_dir() with a callback that just does the stat()
and buffer manipulation?

I don't think it's critical either way, but it would eliminate some of
the boilerplate that is shared between this implementation and the one
that already exists in for_each_file_in_pack_dir().

> +		if (!is_dot_or_dotdot(e->d_name) && e->d_type == DT_REG) {
> +			strbuf_setlen(&file_path, base_path_len);
> +			strbuf_addstr(&file_path, e->d_name);

For what it's worth, I think the callback would start here:

> +			if (!stat(file_path.buf, &e_stat))
> +				strbuf_addf(buf, "%-70s %16"PRIuMAX"\n",
> +					    e->d_name,
> +					    (uintmax_t)e_stat.st_size);

...and end here.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH 4/5] scalar: teach `diagnose` to gather loose objects information
  2022-01-26  8:41 ` [PATCH 4/5] scalar: teach `diagnose` to gather loose objects information Matthew John Cheetham via GitGitGadget
@ 2022-01-26 22:50   ` Taylor Blau
  2022-01-27 15:17     ` Derrick Stolee
  2022-01-27 18:59   ` Elijah Newren
  1 sibling, 1 reply; 140+ messages in thread
From: Taylor Blau @ 2022-01-26 22:50 UTC (permalink / raw)
  To: Matthew John Cheetham via GitGitGadget
  Cc: git, Johannes Schindelin, Matthew John Cheetham

On Wed, Jan 26, 2022 at 08:41:46AM +0000, Matthew John Cheetham via GitGitGadget wrote:
> +	while ((e = readdir(dir)) != NULL)
> +		if (!is_dot_or_dotdot(e->d_name) &&
> +		    e->d_type == DT_DIR && strlen(e->d_name) == 2 &&
> +		    !hex_to_bytes(&c, e->d_name, 1)) {

What is this call to hex_to_bytes() for? I assume it's checking to make
sure the directory we're looking at is one of the shards of loose
objects.

Similar to my suggestion on the previous patch, I think that we could
get rid of this function entirely and replace it with a call to
for_each_loose_file_in_objdir().

We'll pay a little bit of extra cost to parse out each loose object's
OID, but it should be negligible since we're not actually opening up
each object.

> diff --git a/contrib/scalar/t/t9099-scalar.sh b/contrib/scalar/t/t9099-scalar.sh
> index b1745851e31..f2ec156d819 100755
> --- a/contrib/scalar/t/t9099-scalar.sh
> +++ b/contrib/scalar/t/t9099-scalar.sh
> @@ -77,6 +77,8 @@ test_expect_success UNZIP 'scalar diagnose' '
>  	unzip -p "$zip_path" diagnostics.log >out &&
>  	test_file_not_empty out &&
>  	unzip -p "$zip_path" packs-local.txt >out &&
> +	test_file_not_empty out &&

A more comprehensive test (here, and in the earlier instances, too)
might be useful beyond just "does this file exist in the archive".

Constructing an example repository where the number of loose objects is
known ahead of time, and then finding that number in the output of
objects-local.txt might be worthwhile to give us some extra confidence
that this is working as intended.

> +	unzip -p "$zip_path" objects-local.txt >out &&
>  	test_file_not_empty out
>  '

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH 3/5] scalar: teach `diagnose` to gather packfile info
  2022-01-26 22:43   ` Taylor Blau
@ 2022-01-27 15:14     ` Derrick Stolee
  2022-02-06 21:38       ` Johannes Schindelin
  0 siblings, 1 reply; 140+ messages in thread
From: Derrick Stolee @ 2022-01-27 15:14 UTC (permalink / raw)
  To: Taylor Blau, Matthew John Cheetham via GitGitGadget
  Cc: git, Johannes Schindelin, Matthew John Cheetham

On 1/26/2022 5:43 PM, Taylor Blau wrote:
> On Wed, Jan 26, 2022 at 08:41:45AM +0000, Matthew John Cheetham via GitGitGadget wrote:
>> From: Matthew John Cheetham <mjcheetham@outlook.com>
>>
>> Teach the `scalar diagnose` command to gather file size information
>> about pack files.
>>
>> Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
>> ---
>>  contrib/scalar/scalar.c          | 39 ++++++++++++++++++++++++++++++++
>>  contrib/scalar/t/t9099-scalar.sh |  2 ++
>>  2 files changed, 41 insertions(+)
>>
>> diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
>> index e26fb2fc018..690933ffdf3 100644
>> --- a/contrib/scalar/scalar.c
>> +++ b/contrib/scalar/scalar.c
>> @@ -653,6 +653,39 @@ cleanup:
>>  	return res;
>>  }
>>
>> +static void dir_file_stats(struct strbuf *buf, const char *path)
>> +{
>> +	DIR *dir = opendir(path);
>> +	struct dirent *e;
>> +	struct stat e_stat;
>> +	struct strbuf file_path = STRBUF_INIT;
>> +	size_t base_path_len;
>> +
>> +	if (!dir)
>> +		return;
>> +
>> +	strbuf_addstr(buf, "Contents of ");
>> +	strbuf_add_absolute_path(buf, path);
>> +	strbuf_addstr(buf, ":\n");
>> +
>> +	strbuf_add_absolute_path(&file_path, path);
>> +	strbuf_addch(&file_path, '/');
>> +	base_path_len = file_path.len;
>> +
>> +	while ((e = readdir(dir)) != NULL)
> 
> Hmm. Is there a reason that this couldn't use
> for_each_file_in_pack_dir() with a callback that just does the stat()
> and buffer manipulation?
> 
> I don't think it's critical either way, but it would eliminate some of
> the boilerplate that is shared between this implementation and the one
> that already exists in for_each_file_in_pack_dir().

It's helpful to see if there are other crud files in the pack
directory. This method is also extended in microsoft/git to
scan the alternates directory (which we expect to exist as the
"shared objects cache).

We might want to modify the implementation in this series to
run dir_file_stats() on each odb in the_repository. This would
give us the data for the shared object cache for free while
being more general to other Git repos. (It would require us to
do some reaction work in microsoft/git and be a change of
behavior, but we are the only ones who have looked at these
diagnose files before, so that change will be easy to manage.)

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH 4/5] scalar: teach `diagnose` to gather loose objects information
  2022-01-26 22:50   ` Taylor Blau
@ 2022-01-27 15:17     ` Derrick Stolee
  0 siblings, 0 replies; 140+ messages in thread
From: Derrick Stolee @ 2022-01-27 15:17 UTC (permalink / raw)
  To: Taylor Blau, Matthew John Cheetham via GitGitGadget
  Cc: git, Johannes Schindelin, Matthew John Cheetham

On 1/26/2022 5:50 PM, Taylor Blau wrote:
> On Wed, Jan 26, 2022 at 08:41:46AM +0000, Matthew John Cheetham via GitGitGadget wrote:
>> +	while ((e = readdir(dir)) != NULL)
>> +		if (!is_dot_or_dotdot(e->d_name) &&
>> +		    e->d_type == DT_DIR && strlen(e->d_name) == 2 &&
>> +		    !hex_to_bytes(&c, e->d_name, 1)) {
> 
> What is this call to hex_to_bytes() for? I assume it's checking to make
> sure the directory we're looking at is one of the shards of loose
> objects.
> 
> Similar to my suggestion on the previous patch, I think that we could
> get rid of this function entirely and replace it with a call to
> for_each_loose_file_in_objdir().

There is a possibility that there are files other than loose objects
in these directories, so summarizing those counts might be helpful
information. For example: if somehow .git/objects/00/ was full of a
bunch of non-objects, it would still slow down Git commands that ask
for a short-sha starting with "00".

While this shouldn't be a normal case, the 'diagnose' command is
built to help us find these extremely odd scenarios because they
_have_ happened before (typically because of a VFS for Git bug
taught us how to look for these situations).

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH 0/5] scalar: implement the subcommand "diagnose"
  2022-01-26  8:41 [PATCH 0/5] scalar: implement the subcommand "diagnose" Johannes Schindelin via GitGitGadget
                   ` (4 preceding siblings ...)
  2022-01-26  8:41 ` [PATCH 5/5] scalar diagnose: show a spinner while staging content Johannes Schindelin via GitGitGadget
@ 2022-01-27 15:19 ` Derrick Stolee
  2022-02-06 21:13   ` Johannes Schindelin
  2022-02-06 22:39 ` [PATCH v2 0/6] " Johannes Schindelin via GitGitGadget
  6 siblings, 1 reply; 140+ messages in thread
From: Derrick Stolee @ 2022-01-27 15:19 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget, git
  Cc: Johannes Schindelin, Emily Shaffer

On 1/26/2022 3:41 AM, Johannes Schindelin via GitGitGadget wrote:
> Over the course of the years, we developed a sub-command that gathers
> diagnostic data into a .zip file that can then be attached to bug reports.
> This sub-command turned out to be very useful in helping Scalar developers
> identify and fix issues.

For historical context: The 'diagnose' command was implemented in VFS for
Git and ported to the C# version of Scalar before 'git bugreport' existed,
but they serve very similar purposes.

I wonder if 'scalar diagnose' could include some of the information
captured by 'git bugreport' or whether this implementation of 'diagnose'
could help inform 'git bugreport' in any way.

CC'ing Emily for thoughts.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH 4/5] scalar: teach `diagnose` to gather loose objects information
  2022-01-26  8:41 ` [PATCH 4/5] scalar: teach `diagnose` to gather loose objects information Matthew John Cheetham via GitGitGadget
  2022-01-26 22:50   ` Taylor Blau
@ 2022-01-27 18:59   ` Elijah Newren
  2022-02-06 21:25     ` Johannes Schindelin
  1 sibling, 1 reply; 140+ messages in thread
From: Elijah Newren @ 2022-01-27 18:59 UTC (permalink / raw)
  To: Matthew John Cheetham via GitGitGadget
  Cc: Git Mailing List, Johannes Schindelin, Matthew John Cheetham

On Wed, Jan 26, 2022 at 3:37 PM Matthew John Cheetham via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Matthew John Cheetham <mjcheetham@outlook.com>
>
> When operating at the scale that Scalar wants to support, certain data
> shapes are more likely to cause undesirable performance issues, such as
> large numbers or large sizes of loose objects.

Makes sense.

> By including statistics about this, `scalar diagnose` now makes it
> easier to identify such scenarios.
>
> Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
> ---
>  contrib/scalar/scalar.c          | 60 ++++++++++++++++++++++++++++++++
>  contrib/scalar/t/t9099-scalar.sh |  2 ++
>  2 files changed, 62 insertions(+)
>
> diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
> index 690933ffdf3..c0ad4948215 100644
> --- a/contrib/scalar/scalar.c
> +++ b/contrib/scalar/scalar.c
> @@ -686,6 +686,60 @@ static void dir_file_stats(struct strbuf *buf, const char *path)
>         closedir(dir);
>  }
>
> +static int count_files(char *path)
> +{
> +       DIR *dir = opendir(path);
> +       struct dirent *e;
> +       int count = 0;
> +
> +       if (!dir)
> +               return 0;
> +
> +       while ((e = readdir(dir)) != NULL)
> +               if (!is_dot_or_dotdot(e->d_name) && e->d_type == DT_REG)
> +                       count++;
> +
> +       closedir(dir);
> +       return count;
> +}
> +
> +static void loose_objs_stats(struct strbuf *buf, const char *path)
> +{
> +       DIR *dir = opendir(path);
> +       struct dirent *e;
> +       int count;
> +       int total = 0;
> +       unsigned char c;
> +       struct strbuf count_path = STRBUF_INIT;
> +       size_t base_path_len;
> +
> +       if (!dir)
> +               return;
> +
> +       strbuf_addstr(buf, "Object directory stats for ");
> +       strbuf_add_absolute_path(buf, path);
> +       strbuf_addstr(buf, ":\n");
> +
> +       strbuf_add_absolute_path(&count_path, path);
> +       strbuf_addch(&count_path, '/');
> +       base_path_len = count_path.len;
> +
> +       while ((e = readdir(dir)) != NULL)
> +               if (!is_dot_or_dotdot(e->d_name) &&
> +                   e->d_type == DT_DIR && strlen(e->d_name) == 2 &&
> +                   !hex_to_bytes(&c, e->d_name, 1)) {

You only recurse into directories, ignoring individual files.

> +                       strbuf_setlen(&count_path, base_path_len);
> +                       strbuf_addstr(&count_path, e->d_name);
> +                       total += (count = count_files(count_path.buf));
> +                       strbuf_addf(buf, "%s : %7d files\n", e->d_name, count);

This shows the number of files within a directory.

> +               }
> +
> +       strbuf_addf(buf, "Total: %d loose objects", total);

and this shows the total number of files across all the directories.

But the commit message suggested you also wanted to check for large
sizes of loose objects.  Did that get ripped out at some point with
the commit message not being updated, or is it perhaps going to be
included later?

> +
> +       strbuf_release(&count_path);
> +       closedir(dir);
> +}
> +
>  static int cmd_diagnose(int argc, const char **argv)
>  {
>         struct option options[] = {
> @@ -734,6 +788,12 @@ static int cmd_diagnose(int argc, const char **argv)
>         if ((res = stage(tmp_dir.buf, &buf, "packs-local.txt")))
>                 goto diagnose_cleanup;
>
> +       strbuf_reset(&buf);
> +       loose_objs_stats(&buf, ".git/objects");
> +
> +       if ((res = stage(tmp_dir.buf, &buf, "objects-local.txt")))
> +               goto diagnose_cleanup;
> +
>         if ((res = stage_directory(tmp_dir.buf, ".git", 0)) ||
>             (res = stage_directory(tmp_dir.buf, ".git/hooks", 0)) ||
>             (res = stage_directory(tmp_dir.buf, ".git/info", 0)) ||
> diff --git a/contrib/scalar/t/t9099-scalar.sh b/contrib/scalar/t/t9099-scalar.sh
> index b1745851e31..f2ec156d819 100755
> --- a/contrib/scalar/t/t9099-scalar.sh
> +++ b/contrib/scalar/t/t9099-scalar.sh
> @@ -77,6 +77,8 @@ test_expect_success UNZIP 'scalar diagnose' '
>         unzip -p "$zip_path" diagnostics.log >out &&
>         test_file_not_empty out &&
>         unzip -p "$zip_path" packs-local.txt >out &&
> +       test_file_not_empty out &&
> +       unzip -p "$zip_path" objects-local.txt >out &&
>         test_file_not_empty out
>  '
>
> --
> gitgitgadget

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH 1/5] Implement `scalar diagnose`
  2022-01-26  8:41 ` [PATCH 1/5] Implement `scalar diagnose` Johannes Schindelin via GitGitGadget
  2022-01-26  9:34   ` René Scharfe
@ 2022-01-27 19:38   ` Elijah Newren
  1 sibling, 0 replies; 140+ messages in thread
From: Elijah Newren @ 2022-01-27 19:38 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget
  Cc: Git Mailing List, Johannes Schindelin

On Wed, Jan 26, 2022 at 3:37 PM Johannes Schindelin via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>
> Over the course of Scalar's development, it became obvious that there is
> a need for a command that can gather all kinds of useful information
> that can help identify the most typical problems with large
> worktrees/repositories.
>
> The `diagnose` command is the culmination of this hard-won knowledge: it
> gathers the installed hooks, the config, a couple statistics describing
> the data shape, among other pieces of information, and then wraps
> everything up in a tidy, neat `.zip` archive.
>
> Note: originally, Scalar was implemented in C# using the .NET API, where
> we had the luxury of a comprehensive standard library that includes
> basic functionality such as writing a `.zip` file. In the C version, we
> lack such a commodity. Rather than introducing a dependency on, say,
> libzip, we slightly abuse Git's `archive` command: Instead of writing
> the `.zip` file directly, we stage the file contents in a Git index of a
> temporary, bare repository, only to let `git archive` have at it, and
> finally removing the temporary repository.
>
> Also note: Due to the frequently-spawned `git hash-object` processes,
> this command is quite a bit slow on Windows. Should it turn out to be a
> big problem, the lack of a batch mode of the `hash-object` command could
> potentially be worked around via using `git fast-import` with a crafted
> `stdin`.

hash-object and update-index processes, right?  You spawn one of each
for each object.

I was you investigate the fast-import idea because it gets rid of the
N hash-object processes, the N update-index processes, and the
write-tree process, instead giving you a single fast-import process as
a preliminary to calling out to git archive.  It'd also have the
advantage of providing just one pack instead of many loose objects.

But René's suggestion to use and extend archive's ability to handle
untracked files sounds like a better idea.

>
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
>  contrib/scalar/scalar.c          | 170 +++++++++++++++++++++++++++++++
>  contrib/scalar/scalar.txt        |  12 +++
>  contrib/scalar/t/t9099-scalar.sh |  13 +++
>  3 files changed, 195 insertions(+)
>
> diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
> index 1ce9c2b00e8..13f2b0f4d5a 100644
> --- a/contrib/scalar/scalar.c
> +++ b/contrib/scalar/scalar.c
> @@ -259,6 +259,108 @@ static int unregister_dir(void)
>         return res;
>  }
>
> +static int stage(const char *git_dir, struct strbuf *buf, const char *path)
> +{
> +       struct strbuf cacheinfo = STRBUF_INIT;
> +       struct child_process cp = CHILD_PROCESS_INIT;
> +       int res;
> +
> +       strbuf_addstr(&cacheinfo, "100644,");
> +
> +       cp.git_cmd = 1;
> +       strvec_pushl(&cp.args, "--git-dir", git_dir,
> +                    "hash-object", "-w", "--stdin", NULL);
> +       res = pipe_command(&cp, buf->buf, buf->len, &cacheinfo, 256, NULL, 0);
> +       if (!res) {
> +               strbuf_rtrim(&cacheinfo);
> +               strbuf_addch(&cacheinfo, ',');
> +               /* We cannot stage `.git`, use `_git` instead. */
> +               if (starts_with(path, ".git/"))
> +                       strbuf_addf(&cacheinfo, "_%s", path + 1);
> +               else
> +                       strbuf_addstr(&cacheinfo, path);
> +
> +               child_process_init(&cp);
> +               cp.git_cmd = 1;
> +               strvec_pushl(&cp.args, "--git-dir", git_dir,
> +                            "update-index", "--add", "--cacheinfo",
> +                            cacheinfo.buf, NULL);
> +               res = run_command(&cp);
> +       }
> +
> +       strbuf_release(&cacheinfo);
> +       return res;
> +}
> +
> +static int stage_file(const char *git_dir, const char *path)
> +{
> +       struct strbuf buf = STRBUF_INIT;
> +       int res;
> +
> +       if (strbuf_read_file(&buf, path, 0) < 0)
> +               return error(_("could not read '%s'"), path);
> +
> +       res = stage(git_dir, &buf, path);
> +
> +       strbuf_release(&buf);
> +       return res;
> +}
> +
> +static int stage_directory(const char *git_dir, const char *path, int recurse)
> +{
> +       int at_root = !*path;
> +       DIR *dir = opendir(at_root ? "." : path);
> +       struct dirent *e;
> +       struct strbuf buf = STRBUF_INIT;
> +       size_t len;
> +       int res = 0;
> +
> +       if (!dir)
> +               return error(_("could not open directory '%s'"), path);
> +
> +       if (!at_root)
> +               strbuf_addf(&buf, "%s/", path);
> +       len = buf.len;
> +
> +       while (!res && (e = readdir(dir))) {
> +               if (!strcmp(".", e->d_name) || !strcmp("..", e->d_name))
> +                       continue;
> +
> +               strbuf_setlen(&buf, len);
> +               strbuf_addstr(&buf, e->d_name);
> +
> +               if ((e->d_type == DT_REG && stage_file(git_dir, buf.buf)) ||
> +                   (e->d_type == DT_DIR && recurse &&
> +                    stage_directory(git_dir, buf.buf, recurse)))
> +                       res = -1;
> +       }
> +
> +       closedir(dir);
> +       strbuf_release(&buf);
> +       return res;
> +}
> +
> +static int index_to_zip(const char *git_dir)
> +{
> +       struct child_process cp = CHILD_PROCESS_INIT;
> +       struct strbuf oid = STRBUF_INIT;
> +
> +       cp.git_cmd = 1;
> +       strvec_pushl(&cp.args, "--git-dir", git_dir, "write-tree", NULL);
> +       if (pipe_command(&cp, NULL, 0, &oid, the_hash_algo->hexsz + 1,
> +                        NULL, 0))
> +               return error(_("could not write temporary tree object"));
> +
> +       strbuf_rtrim(&oid);
> +       child_process_init(&cp);
> +       cp.git_cmd = 1;
> +       strvec_pushl(&cp.args, "--git-dir", git_dir, "archive", "-o", NULL);
> +       strvec_pushf(&cp.args, "%s.zip", git_dir);
> +       strvec_pushl(&cp.args, oid.buf, "--", NULL);
> +       strbuf_release(&oid);
> +       return run_command(&cp);
> +}
> +
>  /* printf-style interface, expects `<key>=<value>` argument */
>  static int set_config(const char *fmt, ...)
>  {
> @@ -499,6 +601,73 @@ cleanup:
>         return res;
>  }
>
> +static int cmd_diagnose(int argc, const char **argv)
> +{
> +       struct option options[] = {
> +               OPT_END(),
> +       };
> +       const char * const usage[] = {
> +               N_("scalar diagnose [<enlistment>]"),
> +               NULL
> +       };
> +       struct strbuf tmp_dir = STRBUF_INIT;
> +       time_t now = time(NULL);
> +       struct tm tm;
> +       struct strbuf path = STRBUF_INIT, buf = STRBUF_INIT;
> +       int res = 0;
> +
> +       argc = parse_options(argc, argv, NULL, options,
> +                            usage, 0);
> +
> +       setup_enlistment_directory(argc, argv, usage, options, &buf);
> +
> +       strbuf_addstr(&buf, "/.scalarDiagnostics/scalar_");
> +       strbuf_addftime(&buf, "%Y%m%d_%H%M%S", localtime_r(&now, &tm), 0, 0);
> +       if (run_git("init", "-q", "-b", "dummy", "--bare", buf.buf, NULL)) {
> +               res = error(_("could not initialize temporary repository: %s"),
> +                           buf.buf);
> +               goto diagnose_cleanup;
> +       }
> +       strbuf_realpath(&tmp_dir, buf.buf, 1);
> +
> +       strbuf_reset(&buf);
> +       strbuf_addf(&buf, "Collecting diagnostic info into temp folder %s\n\n",
> +                   tmp_dir.buf);
> +
> +       get_version_info(&buf, 1);
> +
> +       strbuf_addf(&buf, "Enlistment root: %s\n", the_repository->worktree);
> +       fwrite(buf.buf, buf.len, 1, stdout);
> +
> +       if ((res = stage(tmp_dir.buf, &buf, "diagnostics.log")))
> +               goto diagnose_cleanup;
> +
> +       if ((res = stage_directory(tmp_dir.buf, ".git", 0)) ||
> +           (res = stage_directory(tmp_dir.buf, ".git/hooks", 0)) ||
> +           (res = stage_directory(tmp_dir.buf, ".git/info", 0)) ||
> +           (res = stage_directory(tmp_dir.buf, ".git/logs", 1)) ||
> +           (res = stage_directory(tmp_dir.buf, ".git/objects/info", 0)))
> +               goto diagnose_cleanup;
> +
> +       res = index_to_zip(tmp_dir.buf);
> +
> +       if (!res)
> +               res = remove_dir_recursively(&tmp_dir, 0);
> +
> +       if (!res)
> +               printf("\n"
> +                      "Diagnostics complete.\n"
> +                      "All of the gathered info is captured in '%s.zip'\n",
> +                      tmp_dir.buf);
> +
> +diagnose_cleanup:
> +       strbuf_release(&tmp_dir);
> +       strbuf_release(&path);
> +       strbuf_release(&buf);
> +
> +       return res;
> +}
> +
>  static int cmd_list(int argc, const char **argv)
>  {
>         if (argc != 1)
> @@ -800,6 +969,7 @@ static struct {
>         { "reconfigure", cmd_reconfigure },
>         { "delete", cmd_delete },
>         { "version", cmd_version },
> +       { "diagnose", cmd_diagnose },
>         { NULL, NULL},
>  };
>
> diff --git a/contrib/scalar/scalar.txt b/contrib/scalar/scalar.txt
> index f416d637289..22583fe046e 100644
> --- a/contrib/scalar/scalar.txt
> +++ b/contrib/scalar/scalar.txt
> @@ -14,6 +14,7 @@ scalar register [<enlistment>]
>  scalar unregister [<enlistment>]
>  scalar run ( all | config | commit-graph | fetch | loose-objects | pack-files ) [<enlistment>]
>  scalar reconfigure [ --all | <enlistment> ]
> +scalar diagnose [<enlistment>]
>  scalar delete <enlistment>
>
>  DESCRIPTION
> @@ -129,6 +130,17 @@ reconfigure the enlistment.
>  With the `--all` option, all enlistments currently registered with Scalar
>  will be reconfigured. Use this option after each Scalar upgrade.
>
> +Diagnose
> +~~~~~~~~
> +
> +diagnose [<enlistment>]::
> +    When reporting issues with Scalar, it is often helpful to provide the
> +    information gathered by this command, including logs and certain
> +    statistics describing the data shape of the current enlistment.
> ++
> +The output of this command is a `.zip` file that is written into
> +a directory adjacent to the worktree in the `src` directory.
> +
>  Delete
>  ~~~~~~
>
> diff --git a/contrib/scalar/t/t9099-scalar.sh b/contrib/scalar/t/t9099-scalar.sh
> index 2e1502ad45e..ecd06e207c2 100755
> --- a/contrib/scalar/t/t9099-scalar.sh
> +++ b/contrib/scalar/t/t9099-scalar.sh
> @@ -65,6 +65,19 @@ test_expect_success 'scalar clone' '
>         )
>  '
>
> +SQ="'"
> +test_expect_success UNZIP 'scalar diagnose' '
> +       scalar diagnose cloned >out &&
> +       sed -n "s/.*$SQ\\(.*\\.zip\\)$SQ.*/\\1/p" <out >zip_path &&
> +       zip_path=$(cat zip_path) &&
> +       test -n "$zip_path" &&
> +       unzip -v "$zip_path" &&
> +       folder=${zip_path%.zip} &&
> +       test_path_is_missing "$folder" &&
> +       unzip -p "$zip_path" diagnostics.log >out &&
> +       test_file_not_empty out
> +'
> +
>  test_expect_success 'scalar reconfigure' '
>         git init one/src &&
>         scalar register one &&
> --
> gitgitgadget
>

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH 0/5] scalar: implement the subcommand "diagnose"
  2022-01-27 15:19 ` [PATCH 0/5] scalar: implement the subcommand "diagnose" Derrick Stolee
@ 2022-02-06 21:13   ` Johannes Schindelin
  0 siblings, 0 replies; 140+ messages in thread
From: Johannes Schindelin @ 2022-02-06 21:13 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Johannes Schindelin via GitGitGadget, git, Emily Shaffer

Hi Stolee & Emily,

On Thu, 27 Jan 2022, Derrick Stolee wrote:

> On 1/26/2022 3:41 AM, Johannes Schindelin via GitGitGadget wrote:
> > Over the course of the years, we developed a sub-command that gathers
> > diagnostic data into a .zip file that can then be attached to bug reports.
> > This sub-command turned out to be very useful in helping Scalar developers
> > identify and fix issues.
>
> For historical context: The 'diagnose' command was implemented in VFS for
> Git and ported to the C# version of Scalar before 'git bugreport' existed,
> but they serve very similar purposes.
>
> I wonder if 'scalar diagnose' could include some of the information
> captured by 'git bugreport' or whether this implementation of 'diagnose'
> could help inform 'git bugreport' in any way.

Indeed, I think that the `bugreport` command could easily benefit from at
least the number of pack files and loose objects.

Ciao,
Dscho

>
> CC'ing Emily for thoughts.
>
> Thanks,
> -Stolee
>

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH 4/5] scalar: teach `diagnose` to gather loose objects information
  2022-01-27 18:59   ` Elijah Newren
@ 2022-02-06 21:25     ` Johannes Schindelin
  0 siblings, 0 replies; 140+ messages in thread
From: Johannes Schindelin @ 2022-02-06 21:25 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Matthew John Cheetham via GitGitGadget, Git Mailing List,
	Matthew John Cheetham

Hi Elijah,

On Thu, 27 Jan 2022, Elijah Newren wrote:

> On Wed, Jan 26, 2022 at 3:37 PM Matthew John Cheetham via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
> >
> > From: Matthew John Cheetham <mjcheetham@outlook.com>
> >
> > When operating at the scale that Scalar wants to support, certain data
> > shapes are more likely to cause undesirable performance issues, such as
> > large numbers or large sizes of loose objects.
>
> Makes sense.
>
> > By including statistics about this, `scalar diagnose` now makes it
> > easier to identify such scenarios.
> >
> > Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
> > ---
> >  contrib/scalar/scalar.c          | 60 ++++++++++++++++++++++++++++++++
> >  contrib/scalar/t/t9099-scalar.sh |  2 ++
> >  2 files changed, 62 insertions(+)
> >
> > diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
> > index 690933ffdf3..c0ad4948215 100644
> > --- a/contrib/scalar/scalar.c
> > +++ b/contrib/scalar/scalar.c
> > @@ -686,6 +686,60 @@ static void dir_file_stats(struct strbuf *buf, const char *path)
> >         closedir(dir);
> >  }
> >
> > +static int count_files(char *path)
> > +{
> > +       DIR *dir = opendir(path);
> > +       struct dirent *e;
> > +       int count = 0;
> > +
> > +       if (!dir)
> > +               return 0;
> > +
> > +       while ((e = readdir(dir)) != NULL)
> > +               if (!is_dot_or_dotdot(e->d_name) && e->d_type == DT_REG)
> > +                       count++;
> > +
> > +       closedir(dir);
> > +       return count;
> > +}
> > +
> > +static void loose_objs_stats(struct strbuf *buf, const char *path)
> > +{
> > +       DIR *dir = opendir(path);
> > +       struct dirent *e;
> > +       int count;
> > +       int total = 0;
> > +       unsigned char c;
> > +       struct strbuf count_path = STRBUF_INIT;
> > +       size_t base_path_len;
> > +
> > +       if (!dir)
> > +               return;
> > +
> > +       strbuf_addstr(buf, "Object directory stats for ");
> > +       strbuf_add_absolute_path(buf, path);
> > +       strbuf_addstr(buf, ":\n");
> > +
> > +       strbuf_add_absolute_path(&count_path, path);
> > +       strbuf_addch(&count_path, '/');
> > +       base_path_len = count_path.len;
> > +
> > +       while ((e = readdir(dir)) != NULL)
> > +               if (!is_dot_or_dotdot(e->d_name) &&
> > +                   e->d_type == DT_DIR && strlen(e->d_name) == 2 &&
> > +                   !hex_to_bytes(&c, e->d_name, 1)) {
>
> You only recurse into directories, ignoring individual files.
>
> > +                       strbuf_setlen(&count_path, base_path_len);
> > +                       strbuf_addstr(&count_path, e->d_name);
> > +                       total += (count = count_files(count_path.buf));
> > +                       strbuf_addf(buf, "%s : %7d files\n", e->d_name, count);
>
> This shows the number of files within a directory.
>
> > +               }
> > +
> > +       strbuf_addf(buf, "Total: %d loose objects", total);
>
> and this shows the total number of files across all the directories.
>
> But the commit message suggested you also wanted to check for large
> sizes of loose objects.  Did that get ripped out at some point with
> the commit message not being updated, or is it perhaps going to be
> included later?

No, there was no plan to include this information later, as the original
.NET implementation of `scalar diagnose` did not provide that information,
either (which I take as a strong sign that we never needed this type of
information to help users, at least not up until this point).

Besides, it would be kind of a difficult thing to say conclusively what
makes a loose file "big". Is it the zlib-compressed size on disk? Or the
unpacked size? Should there be a configurable threshold to determine when
an object is big? Should `core.bigFileThreshold` be co-opted for this?

Together with the fact that there was no need for this information in
practice, it makes me doubt that we should add this type of information. I
actually suspect that _iff_ information of that type would be helpful, a
more complete tool like git-sizer (https://github.com/github/git-sizer/)
would be needed, and I do not really want to subsume git-sizer's
functionality in `scalar diagnose`.

I rephrased the commit message.

Ciao,
Dscho

>
> > +
> > +       strbuf_release(&count_path);
> > +       closedir(dir);
> > +}
> > +
> >  static int cmd_diagnose(int argc, const char **argv)
> >  {
> >         struct option options[] = {
> > @@ -734,6 +788,12 @@ static int cmd_diagnose(int argc, const char **argv)
> >         if ((res = stage(tmp_dir.buf, &buf, "packs-local.txt")))
> >                 goto diagnose_cleanup;
> >
> > +       strbuf_reset(&buf);
> > +       loose_objs_stats(&buf, ".git/objects");
> > +
> > +       if ((res = stage(tmp_dir.buf, &buf, "objects-local.txt")))
> > +               goto diagnose_cleanup;
> > +
> >         if ((res = stage_directory(tmp_dir.buf, ".git", 0)) ||
> >             (res = stage_directory(tmp_dir.buf, ".git/hooks", 0)) ||
> >             (res = stage_directory(tmp_dir.buf, ".git/info", 0)) ||
> > diff --git a/contrib/scalar/t/t9099-scalar.sh b/contrib/scalar/t/t9099-scalar.sh
> > index b1745851e31..f2ec156d819 100755
> > --- a/contrib/scalar/t/t9099-scalar.sh
> > +++ b/contrib/scalar/t/t9099-scalar.sh
> > @@ -77,6 +77,8 @@ test_expect_success UNZIP 'scalar diagnose' '
> >         unzip -p "$zip_path" diagnostics.log >out &&
> >         test_file_not_empty out &&
> >         unzip -p "$zip_path" packs-local.txt >out &&
> > +       test_file_not_empty out &&
> > +       unzip -p "$zip_path" objects-local.txt >out &&
> >         test_file_not_empty out
> >  '
> >
> > --
> > gitgitgadget
>

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH 1/5] Implement `scalar diagnose`
  2022-01-26 22:20     ` Taylor Blau
@ 2022-02-06 21:34       ` Johannes Schindelin
  0 siblings, 0 replies; 140+ messages in thread
From: Johannes Schindelin @ 2022-02-06 21:34 UTC (permalink / raw)
  To: Taylor Blau; +Cc: René Scharfe, Johannes Schindelin via GitGitGadget, git

[-- Attachment #1: Type: text/plain, Size: 2461 bytes --]

Hi René & Taylor,

On Wed, 26 Jan 2022, Taylor Blau wrote:

> On Wed, Jan 26, 2022 at 10:34:04AM +0100, René Scharfe wrote:
> > Am 26.01.22 um 09:41 schrieb Johannes Schindelin via GitGitGadget:
> > > Note: originally, Scalar was implemented in C# using the .NET API, where
> > > we had the luxury of a comprehensive standard library that includes
> > > basic functionality such as writing a `.zip` file. In the C version, we
> > > lack such a commodity. Rather than introducing a dependency on, say,
> > > libzip, we slightly abuse Git's `archive` command: Instead of writing
> > > the `.zip` file directly, we stage the file contents in a Git index of a
> > > temporary, bare repository, only to let `git archive` have at it, and
> > > finally removing the temporary repository.
> >
> > git archive allows you to include untracked files in an archive with its
> > option --add-file.  You can see an example in Git's Makefile; search for
> > GIT_ARCHIVE_EXTRA_FILES.  It still requires a tree argument, but the
> > empty tree object should suffice if you don't want to include any
> > tracked files.  It doesn't currently support streaming, though, i.e.
> > files are fully read into memory, so it's impractical for huge ones.

That's a good point.

I did not want to invent any `fast-import`-like streaming protocol just
for the sake of supporting the "funny" use case of `scalar diagnose`, so I
invented a new option `--add-file-with-content=<path>:<content>` (with the
obvious limitation that the `<path>` cannot contain any colon, if that is
desired, users will still need to write out untracked files).

> Using `--add-file` would likely be preferable to setting up a temporary
> repository just to invoke `git archive` in it. Johannes would be the
> expert to ask whether or not big files are going to be a problem here
> (based on a cursory scan of the new functions in scalar.c, I don't
> expect this to be the case).

Indeed, it is unlikely that any large files are included.

> The new stage_directory() function _could_ add `--add-file` arguments in
> a loop around readdir(), but it might also be nice to add a new
> `--add-directory` function to `git archive` which would do the "heavy"
> lifting for us.

I went one step further and used `write_archive()` to do the
heavy-lifting. That way, we truly avoid spawning any separate process let
alone creating any throw-away repository.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH 3/5] scalar: teach `diagnose` to gather packfile info
  2022-01-27 15:14     ` Derrick Stolee
@ 2022-02-06 21:38       ` Johannes Schindelin
  0 siblings, 0 replies; 140+ messages in thread
From: Johannes Schindelin @ 2022-02-06 21:38 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Taylor Blau, Matthew John Cheetham via GitGitGadget, git,
	Matthew John Cheetham

Hi Stolee & Taylor,

On Thu, 27 Jan 2022, Derrick Stolee wrote:

> On 1/26/2022 5:43 PM, Taylor Blau wrote:
> > On Wed, Jan 26, 2022 at 08:41:45AM +0000, Matthew John Cheetham via GitGitGadget wrote:
> >> From: Matthew John Cheetham <mjcheetham@outlook.com>
> >>
> >> Teach the `scalar diagnose` command to gather file size information
> >> about pack files.
> >>
> >> Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
> >> ---
> >>  contrib/scalar/scalar.c          | 39 ++++++++++++++++++++++++++++++++
> >>  contrib/scalar/t/t9099-scalar.sh |  2 ++
> >>  2 files changed, 41 insertions(+)
> >>
> >> diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
> >> index e26fb2fc018..690933ffdf3 100644
> >> --- a/contrib/scalar/scalar.c
> >> +++ b/contrib/scalar/scalar.c
> >> @@ -653,6 +653,39 @@ cleanup:
> >>  	return res;
> >>  }
> >>
> >> +static void dir_file_stats(struct strbuf *buf, const char *path)
> >> +{
> >> +	DIR *dir = opendir(path);
> >> +	struct dirent *e;
> >> +	struct stat e_stat;
> >> +	struct strbuf file_path = STRBUF_INIT;
> >> +	size_t base_path_len;
> >> +
> >> +	if (!dir)
> >> +		return;
> >> +
> >> +	strbuf_addstr(buf, "Contents of ");
> >> +	strbuf_add_absolute_path(buf, path);
> >> +	strbuf_addstr(buf, ":\n");
> >> +
> >> +	strbuf_add_absolute_path(&file_path, path);
> >> +	strbuf_addch(&file_path, '/');
> >> +	base_path_len = file_path.len;
> >> +
> >> +	while ((e = readdir(dir)) != NULL)
> >
> > Hmm. Is there a reason that this couldn't use
> > for_each_file_in_pack_dir() with a callback that just does the stat()
> > and buffer manipulation?
> >
> > I don't think it's critical either way, but it would eliminate some of
> > the boilerplate that is shared between this implementation and the one
> > that already exists in for_each_file_in_pack_dir().
>
> It's helpful to see if there are other crud files in the pack
> directory. This method is also extended in microsoft/git to
> scan the alternates directory (which we expect to exist as the
> "shared objects cache).
>
> We might want to modify the implementation in this series to
> run dir_file_stats() on each odb in the_repository. This would
> give us the data for the shared object cache for free while
> being more general to other Git repos. (It would require us to
> do some reaction work in microsoft/git and be a change of
> behavior, but we are the only ones who have looked at these
> diagnose files before, so that change will be easy to manage.)

Good points all around. I went with the `for_each_file_in_pack_dir()`
approach, and threw in the now very simple change to also enumerate the
alternates, if there are any.

And yes, that will require some reaction work in microsoft/git, but for an
obvious improvement like this one, I don't grumble about the extra burden.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v2 0/6] scalar: implement the subcommand "diagnose"
  2022-01-26  8:41 [PATCH 0/5] scalar: implement the subcommand "diagnose" Johannes Schindelin via GitGitGadget
                   ` (5 preceding siblings ...)
  2022-01-27 15:19 ` [PATCH 0/5] scalar: implement the subcommand "diagnose" Derrick Stolee
@ 2022-02-06 22:39 ` Johannes Schindelin via GitGitGadget
  2022-02-06 22:39   ` [PATCH v2 1/6] archive: optionally add "virtual" files Johannes Schindelin via GitGitGadget
                     ` (6 more replies)
  6 siblings, 7 replies; 140+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2022-02-06 22:39 UTC (permalink / raw)
  To: git
  Cc: René Scharfe, Taylor Blau, Derrick Stolee, Elijah Newren,
	Johannes Schindelin

Over the course of the years, we developed a sub-command that gathers
diagnostic data into a .zip file that can then be attached to bug reports.
This sub-command turned out to be very useful in helping Scalar developers
identify and fix issues.

Changes since v1:

 * Instead of creating a throw-away repository, staging the contents of the
   .zip file and then using git write-tree and git archive to write the .zip
   file, the patch series now introduces a new option to git archive and
   uses write_archive() directly (avoiding any separate process).
 * Since the command avoids separate processes, it is now blazing fast on
   Windows, and I dropped the spinner() function because it's no longer
   needed.
 * While reworking the test case, I noticed that scalar [...] <enlistment>
   failed to verify that the specified directory exists, and would happily
   "traverse to its parent directory" on its quest to find a Scalar
   enlistment. That is of course incorrect, and has been fixed as a "while
   at it" sort of preparatory commit.
 * I had forgotten to sign off on all the commits, which has been fixed.
 * Instead of some "home-grown" readdir()-based function, the code now uses
   for_each_file_in_pack_dir() to look through the pack directories.
 * If any alternates are configured, their pack directories are now included
   in the output.
 * The commit message that might be interpreted to promise information about
   large loose files has been corrected to no longer promise that.
 * The test cases have been adjusted to test a little bit more (e.g.
   verifying that specific paths are mentioned in the output, instead of
   merely verifying that the output is non-empty).

Johannes Schindelin (4):
  archive: optionally add "virtual" files
  scalar: validate the optional enlistment argument
  Implement `scalar diagnose`
  scalar diagnose: include disk space information

Matthew John Cheetham (2):
  scalar: teach `diagnose` to gather packfile info
  scalar: teach `diagnose` to gather loose objects information

 Documentation/git-archive.txt    |  11 ++
 archive.c                        |  51 +++++-
 contrib/scalar/scalar.c          | 291 ++++++++++++++++++++++++++++++-
 contrib/scalar/scalar.txt        |  12 ++
 contrib/scalar/t/t9099-scalar.sh |  27 +++
 t/t5003-archive-zip.sh           |  12 ++
 6 files changed, 394 insertions(+), 10 deletions(-)


base-commit: ddc35d833dd6f9e8946b09cecd3311b8aa18d295
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1128%2Fdscho%2Fscalar-diagnose-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1128/dscho/scalar-diagnose-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/1128

Range-diff vs v1:

 -:  ----------- > 1:  49ff3c1f2b3 archive: optionally add "virtual" files
 -:  ----------- > 2:  600da8d465e scalar: validate the optional enlistment argument
 1:  ce85506e7a4 ! 3:  0d570137bb6 Implement `scalar diagnose`
     @@ Commit message
          Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
      
       ## contrib/scalar/scalar.c ##
     +@@
     + #include "dir.h"
     + #include "packfile.h"
     + #include "help.h"
     ++#include "archive.h"
     + 
     + /*
     +  * Remove the deepest subdirectory in the provided path string. Path must not
      @@ contrib/scalar/scalar.c: static int unregister_dir(void)
       	return res;
       }
       
     -+static int stage(const char *git_dir, struct strbuf *buf, const char *path)
     -+{
     -+	struct strbuf cacheinfo = STRBUF_INIT;
     -+	struct child_process cp = CHILD_PROCESS_INIT;
     -+	int res;
     -+
     -+	strbuf_addstr(&cacheinfo, "100644,");
     -+
     -+	cp.git_cmd = 1;
     -+	strvec_pushl(&cp.args, "--git-dir", git_dir,
     -+		     "hash-object", "-w", "--stdin", NULL);
     -+	res = pipe_command(&cp, buf->buf, buf->len, &cacheinfo, 256, NULL, 0);
     -+	if (!res) {
     -+		strbuf_rtrim(&cacheinfo);
     -+		strbuf_addch(&cacheinfo, ',');
     -+		/* We cannot stage `.git`, use `_git` instead. */
     -+		if (starts_with(path, ".git/"))
     -+			strbuf_addf(&cacheinfo, "_%s", path + 1);
     -+		else
     -+			strbuf_addstr(&cacheinfo, path);
     -+
     -+		child_process_init(&cp);
     -+		cp.git_cmd = 1;
     -+		strvec_pushl(&cp.args, "--git-dir", git_dir,
     -+			     "update-index", "--add", "--cacheinfo",
     -+			     cacheinfo.buf, NULL);
     -+		res = run_command(&cp);
     -+	}
     -+
     -+	strbuf_release(&cacheinfo);
     -+	return res;
     -+}
     -+
     -+static int stage_file(const char *git_dir, const char *path)
     -+{
     -+	struct strbuf buf = STRBUF_INIT;
     -+	int res;
     -+
     -+	if (strbuf_read_file(&buf, path, 0) < 0)
     -+		return error(_("could not read '%s'"), path);
     -+
     -+	res = stage(git_dir, &buf, path);
     -+
     -+	strbuf_release(&buf);
     -+	return res;
     -+}
     -+
     -+static int stage_directory(const char *git_dir, const char *path, int recurse)
     ++static int add_directory_to_archiver(struct strvec *archiver_args,
     ++					  const char *path, int recurse)
      +{
      +	int at_root = !*path;
      +	DIR *dir = opendir(at_root ? "." : path);
     @@ contrib/scalar/scalar.c: static int unregister_dir(void)
      +	if (!at_root)
      +		strbuf_addf(&buf, "%s/", path);
      +	len = buf.len;
     ++	strvec_pushf(archiver_args, "--prefix=%s", buf.buf);
      +
      +	while (!res && (e = readdir(dir))) {
      +		if (!strcmp(".", e->d_name) || !strcmp("..", e->d_name))
     @@ contrib/scalar/scalar.c: static int unregister_dir(void)
      +		strbuf_setlen(&buf, len);
      +		strbuf_addstr(&buf, e->d_name);
      +
     -+		if ((e->d_type == DT_REG && stage_file(git_dir, buf.buf)) ||
     -+		    (e->d_type == DT_DIR && recurse &&
     -+		     stage_directory(git_dir, buf.buf, recurse)))
     ++		if (e->d_type == DT_REG)
     ++			strvec_pushf(archiver_args, "--add-file=%s", buf.buf);
     ++		else if (e->d_type != DT_DIR)
      +			res = -1;
     ++		else if (recurse)
     ++		     add_directory_to_archiver(archiver_args, buf.buf, recurse);
      +	}
      +
      +	closedir(dir);
      +	strbuf_release(&buf);
      +	return res;
      +}
     -+
     -+static int index_to_zip(const char *git_dir)
     -+{
     -+	struct child_process cp = CHILD_PROCESS_INIT;
     -+	struct strbuf oid = STRBUF_INIT;
     -+
     -+	cp.git_cmd = 1;
     -+	strvec_pushl(&cp.args, "--git-dir", git_dir, "write-tree", NULL);
     -+	if (pipe_command(&cp, NULL, 0, &oid, the_hash_algo->hexsz + 1,
     -+			 NULL, 0))
     -+		return error(_("could not write temporary tree object"));
     -+
     -+	strbuf_rtrim(&oid);
     -+	child_process_init(&cp);
     -+	cp.git_cmd = 1;
     -+	strvec_pushl(&cp.args, "--git-dir", git_dir, "archive", "-o", NULL);
     -+	strvec_pushf(&cp.args, "%s.zip", git_dir);
     -+	strvec_pushl(&cp.args, oid.buf, "--", NULL);
     -+	strbuf_release(&oid);
     -+	return run_command(&cp);
     -+}
      +
       /* printf-style interface, expects `<key>=<value>` argument */
       static int set_config(const char *fmt, ...)
     @@ contrib/scalar/scalar.c: cleanup:
      +		N_("scalar diagnose [<enlistment>]"),
      +		NULL
      +	};
     -+	struct strbuf tmp_dir = STRBUF_INIT;
     ++	struct strbuf zip_path = STRBUF_INIT;
     ++	struct strvec archiver_args = STRVEC_INIT;
     ++	char **argv_copy = NULL;
     ++	int stdout_fd = -1, archiver_fd = -1;
      +	time_t now = time(NULL);
      +	struct tm tm;
      +	struct strbuf path = STRBUF_INIT, buf = STRBUF_INIT;
     ++	size_t off;
      +	int res = 0;
      +
      +	argc = parse_options(argc, argv, NULL, options,
      +			     usage, 0);
      +
     -+	setup_enlistment_directory(argc, argv, usage, options, &buf);
     ++	setup_enlistment_directory(argc, argv, usage, options, &zip_path);
     ++
     ++	strbuf_addstr(&zip_path, "/.scalarDiagnostics/scalar_");
     ++	strbuf_addftime(&zip_path,
     ++			"%Y%m%d_%H%M%S", localtime_r(&now, &tm), 0, 0);
     ++	strbuf_addstr(&zip_path, ".zip");
     ++	switch (safe_create_leading_directories(zip_path.buf)) {
     ++	case SCLD_EXISTS:
     ++	case SCLD_OK:
     ++		break;
     ++	default:
     ++		error_errno(_("could not create directory for '%s'"),
     ++			    zip_path.buf);
     ++		goto diagnose_cleanup;
     ++	}
     ++	stdout_fd = dup(1);
     ++	if (stdout_fd < 0) {
     ++		res = error_errno(_("could not duplicate stdout"));
     ++		goto diagnose_cleanup;
     ++	}
      +
     -+	strbuf_addstr(&buf, "/.scalarDiagnostics/scalar_");
     -+	strbuf_addftime(&buf, "%Y%m%d_%H%M%S", localtime_r(&now, &tm), 0, 0);
     -+	if (run_git("init", "-q", "-b", "dummy", "--bare", buf.buf, NULL)) {
     -+		res = error(_("could not initialize temporary repository: %s"),
     -+			    buf.buf);
     ++	archiver_fd = xopen(zip_path.buf, O_CREAT | O_WRONLY | O_TRUNC, 0666);
     ++	if (archiver_fd < 0 || dup2(archiver_fd, 1) < 0) {
     ++		res = error_errno(_("could not redirect output"));
      +		goto diagnose_cleanup;
      +	}
     -+	strbuf_realpath(&tmp_dir, buf.buf, 1);
      +
     -+	strbuf_reset(&buf);
     -+	strbuf_addf(&buf, "Collecting diagnostic info into temp folder %s\n\n",
     -+		    tmp_dir.buf);
     ++	init_zip_archiver();
     ++	strvec_pushl(&archiver_args, "scalar-diagnose", "--format=zip", NULL);
      +
     ++	strbuf_reset(&buf);
     ++	strbuf_addstr(&buf,
     ++		      "--add-file-with-content=diagnostics.log:"
     ++		      "Collecting diagnostic info\n\n");
      +	get_version_info(&buf, 1);
      +
      +	strbuf_addf(&buf, "Enlistment root: %s\n", the_repository->worktree);
     -+	fwrite(buf.buf, buf.len, 1, stdout);
     -+
     -+	if ((res = stage(tmp_dir.buf, &buf, "diagnostics.log")))
     -+		goto diagnose_cleanup;
     -+
     -+	if ((res = stage_directory(tmp_dir.buf, ".git", 0)) ||
     -+	    (res = stage_directory(tmp_dir.buf, ".git/hooks", 0)) ||
     -+	    (res = stage_directory(tmp_dir.buf, ".git/info", 0)) ||
     -+	    (res = stage_directory(tmp_dir.buf, ".git/logs", 1)) ||
     -+	    (res = stage_directory(tmp_dir.buf, ".git/objects/info", 0)))
     ++	off = strchr(buf.buf, ':') + 1 - buf.buf;
     ++	write_or_die(stdout_fd, buf.buf + off, buf.len - off);
     ++	strvec_push(&archiver_args, buf.buf);
     ++
     ++	if ((res = add_directory_to_archiver(&archiver_args, ".git", 0)) ||
     ++	    (res = add_directory_to_archiver(&archiver_args, ".git/hooks", 0)) ||
     ++	    (res = add_directory_to_archiver(&archiver_args, ".git/info", 0)) ||
     ++	    (res = add_directory_to_archiver(&archiver_args, ".git/logs", 1)) ||
     ++	    (res = add_directory_to_archiver(&archiver_args, ".git/objects/info", 0)))
      +		goto diagnose_cleanup;
      +
     -+	res = index_to_zip(tmp_dir.buf);
     ++	strvec_pushl(&archiver_args, "--prefix=",
     ++		     oid_to_hex(the_hash_algo->empty_tree), "--", NULL);
      +
     -+	if (!res)
     -+		res = remove_dir_recursively(&tmp_dir, 0);
     ++	/* `write_archive()` modifies the `argv` passed to it. Let it. */
     ++	argv_copy = xmemdupz(archiver_args.v,
     ++			     sizeof(char *) * archiver_args.nr);
     ++	res = write_archive(archiver_args.nr, (const char **)argv_copy, NULL,
     ++			    the_repository, NULL, 0);
     ++	if (res) {
     ++		error(_("failed to write archive"));
     ++		goto diagnose_cleanup;
     ++	}
      +
      +	if (!res)
      +		printf("\n"
      +		       "Diagnostics complete.\n"
     -+		       "All of the gathered info is captured in '%s.zip'\n",
     -+		       tmp_dir.buf);
     ++		       "All of the gathered info is captured in '%s'\n",
     ++		       zip_path.buf);
      +
      +diagnose_cleanup:
     -+	strbuf_release(&tmp_dir);
     ++	if (archiver_fd >= 0) {
     ++		close(1);
     ++		dup2(stdout_fd, 1);
     ++	}
     ++	free(argv_copy);
     ++	strvec_clear(&archiver_args);
     ++	strbuf_release(&zip_path);
      +	strbuf_release(&path);
      +	strbuf_release(&buf);
      +
     @@ contrib/scalar/scalar.txt: reconfigure the enlistment.
       
      
       ## contrib/scalar/t/t9099-scalar.sh ##
     -@@ contrib/scalar/t/t9099-scalar.sh: test_expect_success 'scalar clone' '
     - 	)
     +@@ contrib/scalar/t/t9099-scalar.sh: test_expect_success '`scalar [...] <dir>` errors out when dir is missing' '
     + 	grep "cloned. does not exist" err
       '
       
      +SQ="'"
      +test_expect_success UNZIP 'scalar diagnose' '
     ++	scalar clone "file://$(pwd)" cloned --single-branch &&
      +	scalar diagnose cloned >out &&
      +	sed -n "s/.*$SQ\\(.*\\.zip\\)$SQ.*/\\1/p" <out >zip_path &&
      +	zip_path=$(cat zip_path) &&
     @@ contrib/scalar/t/t9099-scalar.sh: test_expect_success 'scalar clone' '
      +	test_file_not_empty out
      +'
      +
     - test_expect_success 'scalar reconfigure' '
     - 	git init one/src &&
     - 	scalar register one &&
     + test_done
 2:  f8885b27502 ! 4:  938e38b5a09 scalar diagnose: include disk space information
     @@ Commit message
          Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
      
       ## contrib/scalar/scalar.c ##
     -@@ contrib/scalar/scalar.c: static int index_to_zip(const char *git_dir)
     - 	return run_command(&cp);
     +@@ contrib/scalar/scalar.c: static int add_directory_to_archiver(struct strvec *archiver_args,
     + 	return res;
       }
       
      +#ifndef WIN32
     @@ contrib/scalar/scalar.c: static int cmd_diagnose(int argc, const char **argv)
       
       	strbuf_addf(&buf, "Enlistment root: %s\n", the_repository->worktree);
      +	get_disk_info(&buf);
     - 	fwrite(buf.buf, buf.len, 1, stdout);
     - 
     - 	if ((res = stage(tmp_dir.buf, &buf, "diagnostics.log")))
     + 	off = strchr(buf.buf, ':') + 1 - buf.buf;
     + 	write_or_die(stdout_fd, buf.buf + off, buf.len - off);
     + 	strvec_push(&archiver_args, buf.buf);
     +
     + ## contrib/scalar/t/t9099-scalar.sh ##
     +@@ contrib/scalar/t/t9099-scalar.sh: SQ="'"
     + test_expect_success UNZIP 'scalar diagnose' '
     + 	scalar clone "file://$(pwd)" cloned --single-branch &&
     + 	scalar diagnose cloned >out &&
     ++	grep "Available space" out &&
     + 	sed -n "s/.*$SQ\\(.*\\.zip\\)$SQ.*/\\1/p" <out >zip_path &&
     + 	zip_path=$(cat zip_path) &&
     + 	test -n "$zip_path" &&
 3:  330b36de799 ! 5:  bd9428919fa scalar: teach `diagnose` to gather packfile info
     @@ Metadata
       ## Commit message ##
          scalar: teach `diagnose` to gather packfile info
      
     -    Teach the `scalar diagnose` command to gather file size information
     -    about pack files.
     +    It's helpful to see if there are other crud files in the pack
     +    directory. Let's teach the `scalar diagnose` command to gather
     +    file size information about pack files.
     +
     +    While at it, also enumerate the pack files in the alternate
     +    object directories, if any are registered.
      
          Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
     +    Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
      
       ## contrib/scalar/scalar.c ##
     +@@
     + #include "packfile.h"
     + #include "help.h"
     + #include "archive.h"
     ++#include "object-store.h"
     + 
     + /*
     +  * Remove the deepest subdirectory in the provided path string. Path must not
      @@ contrib/scalar/scalar.c: cleanup:
       	return res;
       }
       
     -+static void dir_file_stats(struct strbuf *buf, const char *path)
     ++static void dir_file_stats_objects(const char *full_path, size_t full_path_len,
     ++				   const char *file_name, void *data)
      +{
     -+	DIR *dir = opendir(path);
     -+	struct dirent *e;
     -+	struct stat e_stat;
     -+	struct strbuf file_path = STRBUF_INIT;
     -+	size_t base_path_len;
     ++	struct strbuf *buf = data;
     ++	struct stat st;
      +
     -+	if (!dir)
     -+		return;
     ++	if (!stat(full_path, &st))
     ++		strbuf_addf(buf, "%-70s %16" PRIuMAX "\n", file_name,
     ++			    (uintmax_t)st.st_size);
     ++}
      +
     -+	strbuf_addstr(buf, "Contents of ");
     -+	strbuf_add_absolute_path(buf, path);
     -+	strbuf_addstr(buf, ":\n");
     ++static int dir_file_stats(struct object_directory *object_dir, void *data)
     ++{
     ++	struct strbuf *buf = data;
      +
     -+	strbuf_add_absolute_path(&file_path, path);
     -+	strbuf_addch(&file_path, '/');
     -+	base_path_len = file_path.len;
     ++	strbuf_addf(buf, "Contents of %s:\n", object_dir->path);
      +
     -+	while ((e = readdir(dir)) != NULL)
     -+		if (!is_dot_or_dotdot(e->d_name) && e->d_type == DT_REG) {
     -+			strbuf_setlen(&file_path, base_path_len);
     -+			strbuf_addstr(&file_path, e->d_name);
     -+			if (!stat(file_path.buf, &e_stat))
     -+				strbuf_addf(buf, "%-70s %16"PRIuMAX"\n",
     -+					    e->d_name,
     -+					    (uintmax_t)e_stat.st_size);
     -+		}
     ++	for_each_file_in_pack_dir(object_dir->path, dir_file_stats_objects,
     ++				  data);
      +
     -+	strbuf_release(&file_path);
     -+	closedir(dir);
     ++	return 0;
      +}
      +
       static int cmd_diagnose(int argc, const char **argv)
       {
       	struct option options[] = {
      @@ contrib/scalar/scalar.c: static int cmd_diagnose(int argc, const char **argv)
     - 	if ((res = stage(tmp_dir.buf, &buf, "diagnostics.log")))
     - 		goto diagnose_cleanup;
     + 	write_or_die(stdout_fd, buf.buf + off, buf.len - off);
     + 	strvec_push(&archiver_args, buf.buf);
       
      +	strbuf_reset(&buf);
     -+	dir_file_stats(&buf, ".git/objects/pack");
     -+
     -+	if ((res = stage(tmp_dir.buf, &buf, "packs-local.txt")))
     -+		goto diagnose_cleanup;
     ++	strbuf_addstr(&buf, "--add-file-with-content=packs-local.txt:");
     ++	dir_file_stats(the_repository->objects->odb, &buf);
     ++	foreach_alt_odb(dir_file_stats, &buf);
     ++	strvec_push(&archiver_args, buf.buf);
      +
     - 	if ((res = stage_directory(tmp_dir.buf, ".git", 0)) ||
     - 	    (res = stage_directory(tmp_dir.buf, ".git/hooks", 0)) ||
     - 	    (res = stage_directory(tmp_dir.buf, ".git/info", 0)) ||
     + 	if ((res = add_directory_to_archiver(&archiver_args, ".git", 0)) ||
     + 	    (res = add_directory_to_archiver(&archiver_args, ".git/hooks", 0)) ||
     + 	    (res = add_directory_to_archiver(&archiver_args, ".git/info", 0)) ||
      
       ## contrib/scalar/t/t9099-scalar.sh ##
     +@@ contrib/scalar/t/t9099-scalar.sh: test_expect_success '`scalar [...] <dir>` errors out when dir is missing' '
     + SQ="'"
     + test_expect_success UNZIP 'scalar diagnose' '
     + 	scalar clone "file://$(pwd)" cloned --single-branch &&
     ++	git repack &&
     ++	echo "$(pwd)/.git/objects/" >>cloned/src/.git/objects/info/alternates &&
     + 	scalar diagnose cloned >out &&
     + 	grep "Available space" out &&
     + 	sed -n "s/.*$SQ\\(.*\\.zip\\)$SQ.*/\\1/p" <out >zip_path &&
      @@ contrib/scalar/t/t9099-scalar.sh: test_expect_success UNZIP 'scalar diagnose' '
       	folder=${zip_path%.zip} &&
       	test_path_is_missing "$folder" &&
       	unzip -p "$zip_path" diagnostics.log >out &&
     +-	test_file_not_empty out
      +	test_file_not_empty out &&
      +	unzip -p "$zip_path" packs-local.txt >out &&
     - 	test_file_not_empty out
     ++	grep "$(pwd)/.git/objects" out
       '
       
     + test_done
 4:  213f2c94b73 ! 6:  7a8875be425 scalar: teach `diagnose` to gather loose objects information
     @@ Commit message
      
          When operating at the scale that Scalar wants to support, certain data
          shapes are more likely to cause undesirable performance issues, such as
     -    large numbers or large sizes of loose objects.
     +    large numbers of loose objects.
      
          By including statistics about this, `scalar diagnose` now makes it
          easier to identify such scenarios.
      
          Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
     +    Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
      
       ## contrib/scalar/scalar.c ##
     -@@ contrib/scalar/scalar.c: static void dir_file_stats(struct strbuf *buf, const char *path)
     - 	closedir(dir);
     +@@ contrib/scalar/scalar.c: static int dir_file_stats(struct object_directory *object_dir, void *data)
     + 	return 0;
       }
       
      +static int count_files(char *path)
     @@ contrib/scalar/scalar.c: static void dir_file_stats(struct strbuf *buf, const ch
       {
       	struct option options[] = {
      @@ contrib/scalar/scalar.c: static int cmd_diagnose(int argc, const char **argv)
     - 	if ((res = stage(tmp_dir.buf, &buf, "packs-local.txt")))
     - 		goto diagnose_cleanup;
     + 	foreach_alt_odb(dir_file_stats, &buf);
     + 	strvec_push(&archiver_args, buf.buf);
       
      +	strbuf_reset(&buf);
     ++	strbuf_addstr(&buf, "--add-file-with-content=objects-local.txt:");
      +	loose_objs_stats(&buf, ".git/objects");
     ++	strvec_push(&archiver_args, buf.buf);
      +
     -+	if ((res = stage(tmp_dir.buf, &buf, "objects-local.txt")))
     -+		goto diagnose_cleanup;
     -+
     - 	if ((res = stage_directory(tmp_dir.buf, ".git", 0)) ||
     - 	    (res = stage_directory(tmp_dir.buf, ".git/hooks", 0)) ||
     - 	    (res = stage_directory(tmp_dir.buf, ".git/info", 0)) ||
     + 	if ((res = add_directory_to_archiver(&archiver_args, ".git", 0)) ||
     + 	    (res = add_directory_to_archiver(&archiver_args, ".git/hooks", 0)) ||
     + 	    (res = add_directory_to_archiver(&archiver_args, ".git/info", 0)) ||
      
       ## contrib/scalar/t/t9099-scalar.sh ##
     +@@ contrib/scalar/t/t9099-scalar.sh: test_expect_success UNZIP 'scalar diagnose' '
     + 	scalar clone "file://$(pwd)" cloned --single-branch &&
     + 	git repack &&
     + 	echo "$(pwd)/.git/objects/" >>cloned/src/.git/objects/info/alternates &&
     ++	test_commit -C cloned/src loose &&
     + 	scalar diagnose cloned >out &&
     + 	grep "Available space" out &&
     + 	sed -n "s/.*$SQ\\(.*\\.zip\\)$SQ.*/\\1/p" <out >zip_path &&
      @@ contrib/scalar/t/t9099-scalar.sh: test_expect_success UNZIP 'scalar diagnose' '
       	unzip -p "$zip_path" diagnostics.log >out &&
       	test_file_not_empty out &&
       	unzip -p "$zip_path" packs-local.txt >out &&
     -+	test_file_not_empty out &&
     +-	grep "$(pwd)/.git/objects" out
     ++	grep "$(pwd)/.git/objects" out &&
      +	unzip -p "$zip_path" objects-local.txt >out &&
     - 	test_file_not_empty out
     ++	grep "^Total: [1-9]" out
       '
       
     + test_done
 5:  3a2cdce554a < -:  ----------- scalar diagnose: show a spinner while staging content

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v2 1/6] archive: optionally add "virtual" files
  2022-02-06 22:39 ` [PATCH v2 0/6] " Johannes Schindelin via GitGitGadget
@ 2022-02-06 22:39   ` Johannes Schindelin via GitGitGadget
  2022-02-07 19:55     ` René Scharfe
  2022-02-06 22:39   ` [PATCH v2 2/6] scalar: validate the optional enlistment argument Johannes Schindelin via GitGitGadget
                     ` (5 subsequent siblings)
  6 siblings, 1 reply; 140+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2022-02-06 22:39 UTC (permalink / raw)
  To: git
  Cc: René Scharfe, Taylor Blau, Derrick Stolee, Elijah Newren,
	Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

With the `--add-file-with-content=<path>:<content>` option, `git
archive` now supports use cases where relatively trivial files need to
be added that do not exist on disk.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 Documentation/git-archive.txt | 11 ++++++++
 archive.c                     | 51 +++++++++++++++++++++++++++++------
 t/t5003-archive-zip.sh        | 12 +++++++++
 3 files changed, 66 insertions(+), 8 deletions(-)

diff --git a/Documentation/git-archive.txt b/Documentation/git-archive.txt
index bc4e76a7834..1b52a0a65a1 100644
--- a/Documentation/git-archive.txt
+++ b/Documentation/git-archive.txt
@@ -61,6 +61,17 @@ OPTIONS
 	by concatenating the value for `--prefix` (if any) and the
 	basename of <file>.
 
+--add-file-with-content=<path>:<content>::
+	Add the specified contents to the archive.  Can be repeated to add
+	multiple files.  The path of the file in the archive is built
+	by concatenating the value for `--prefix` (if any) and the
+	basename of <file>.
++
+The `<path>` cannot contain any colon, the file mode is limited to
+a regular file, and the option may be subject platform-dependent
+command-line limits. For non-trivial cases, write an untracked file
+and use `--add-file` instead.
+
 --worktree-attributes::
 	Look for attributes in .gitattributes files in the working tree
 	as well (see <<ATTRIBUTES>>).
diff --git a/archive.c b/archive.c
index a3bbb091256..172efd690c3 100644
--- a/archive.c
+++ b/archive.c
@@ -263,6 +263,7 @@ static int queue_or_write_archive_entry(const struct object_id *oid,
 struct extra_file_info {
 	char *base;
 	struct stat stat;
+	void *content;
 };
 
 int write_archive_entries(struct archiver_args *args,
@@ -337,7 +338,13 @@ int write_archive_entries(struct archiver_args *args,
 		strbuf_addstr(&path_in_archive, basename(path));
 
 		strbuf_reset(&content);
-		if (strbuf_read_file(&content, path, info->stat.st_size) < 0)
+		if (info->content)
+			err = write_entry(args, &fake_oid, path_in_archive.buf,
+					  path_in_archive.len,
+					  info->stat.st_mode,
+					  info->content, info->stat.st_size);
+		else if (strbuf_read_file(&content, path,
+					  info->stat.st_size) < 0)
 			err = error_errno(_("could not read '%s'"), path);
 		else
 			err = write_entry(args, &fake_oid, path_in_archive.buf,
@@ -493,6 +500,7 @@ static void extra_file_info_clear(void *util, const char *str)
 {
 	struct extra_file_info *info = util;
 	free(info->base);
+	free(info->content);
 	free(info);
 }
 
@@ -514,14 +522,38 @@ static int add_file_cb(const struct option *opt, const char *arg, int unset)
 	if (!arg)
 		return -1;
 
-	path = prefix_filename(args->prefix, arg);
-	item = string_list_append_nodup(&args->extra_files, path);
-	item->util = info = xmalloc(sizeof(*info));
+	info = xmalloc(sizeof(*info));
 	info->base = xstrdup_or_null(base);
-	if (stat(path, &info->stat))
-		die(_("File not found: %s"), path);
-	if (!S_ISREG(info->stat.st_mode))
-		die(_("Not a regular file: %s"), path);
+
+	if (strcmp(opt->long_name, "add-file-with-content")) {
+		path = prefix_filename(args->prefix, arg);
+		if (stat(path, &info->stat))
+			die(_("File not found: %s"), path);
+		if (!S_ISREG(info->stat.st_mode))
+			die(_("Not a regular file: %s"), path);
+		info->content = NULL; /* read the file later */
+	} else {
+		const char *colon = strchr(arg, ':');
+		char *p;
+
+		if (!colon)
+			die(_("missing colon: '%s'"), arg);
+
+		p = xstrndup(arg, colon - arg);
+		if (!args->prefix)
+			path = p;
+		else {
+			path = prefix_filename(args->prefix, p);
+			free(p);
+		}
+		memset(&info->stat, 0, sizeof(info->stat));
+		info->stat.st_mode = S_IFREG | 0644;
+		info->content = xstrdup(colon + 1);
+		info->stat.st_size = strlen(info->content);
+	}
+	item = string_list_append_nodup(&args->extra_files, path);
+	item->util = info;
+
 	return 0;
 }
 
@@ -554,6 +586,9 @@ static int parse_archive_args(int argc, const char **argv,
 		{ OPTION_CALLBACK, 0, "add-file", args, N_("file"),
 		  N_("add untracked file to archive"), 0, add_file_cb,
 		  (intptr_t)&base },
+		{ OPTION_CALLBACK, 0, "add-file-with-content", args,
+		  N_("file"), N_("add untracked file to archive"), 0,
+		  add_file_cb, (intptr_t)&base },
 		OPT_STRING('o', "output", &output, N_("file"),
 			N_("write the archive to this file")),
 		OPT_BOOL(0, "worktree-attributes", &worktree_attributes,
diff --git a/t/t5003-archive-zip.sh b/t/t5003-archive-zip.sh
index 1e6d18b140e..8ff1257f1a0 100755
--- a/t/t5003-archive-zip.sh
+++ b/t/t5003-archive-zip.sh
@@ -206,6 +206,18 @@ test_expect_success 'git archive --format=zip --add-file' '
 check_zip with_untracked
 check_added with_untracked untracked untracked
 
+test_expect_success UNZIP 'git archive --format=zip --add-file-with-content' '
+	git archive --format=zip >with_file_with_content.zip \
+		--add-file-with-content=hello:world $EMPTY_TREE &&
+	test_when_finished "rm -rf tmp-unpack" &&
+	mkdir tmp-unpack && (
+		cd tmp-unpack &&
+		"$GIT_UNZIP" ../with_file_with_content.zip &&
+		test_path_is_file hello &&
+		test world = $(cat hello)
+	)
+'
+
 test_expect_success 'git archive --format=zip --add-file twice' '
 	echo untracked >untracked &&
 	git archive --format=zip --prefix=one/ --add-file=untracked \
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v2 2/6] scalar: validate the optional enlistment argument
  2022-02-06 22:39 ` [PATCH v2 0/6] " Johannes Schindelin via GitGitGadget
  2022-02-06 22:39   ` [PATCH v2 1/6] archive: optionally add "virtual" files Johannes Schindelin via GitGitGadget
@ 2022-02-06 22:39   ` Johannes Schindelin via GitGitGadget
  2022-02-06 22:39   ` [PATCH v2 3/6] Implement `scalar diagnose` Johannes Schindelin via GitGitGadget
                     ` (4 subsequent siblings)
  6 siblings, 0 replies; 140+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2022-02-06 22:39 UTC (permalink / raw)
  To: git
  Cc: René Scharfe, Taylor Blau, Derrick Stolee, Elijah Newren,
	Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

The `scalar` command needs a Scalar enlistment for many subcommands, and
looks in the current directory for such an enlistment (traversing the
parent directories until it finds one).

These is subcommands can also be called with an optional argument
specifying the enlistment. Here, too, we traverse parent directories as
needed, until we find an enlistment.

However, if the specified directory does not even exist, or is not a
directory, we should stop right there, with an error message.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 contrib/scalar/scalar.c          | 6 ++++--
 contrib/scalar/t/t9099-scalar.sh | 5 +++++
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
index 1ce9c2b00e8..00dcd4b50ef 100644
--- a/contrib/scalar/scalar.c
+++ b/contrib/scalar/scalar.c
@@ -43,9 +43,11 @@ static void setup_enlistment_directory(int argc, const char **argv,
 		usage_with_options(usagestr, options);
 
 	/* find the worktree, determine its corresponding root */
-	if (argc == 1)
+	if (argc == 1) {
 		strbuf_add_absolute_path(&path, argv[0]);
-	else if (strbuf_getcwd(&path) < 0)
+		if (!is_directory(path.buf))
+			die(_("'%s' does not exist"), path.buf);
+	} else if (strbuf_getcwd(&path) < 0)
 		die(_("need a working directory"));
 
 	strbuf_trim_trailing_dir_sep(&path);
diff --git a/contrib/scalar/t/t9099-scalar.sh b/contrib/scalar/t/t9099-scalar.sh
index 2e1502ad45e..9d83fdf25e8 100755
--- a/contrib/scalar/t/t9099-scalar.sh
+++ b/contrib/scalar/t/t9099-scalar.sh
@@ -85,4 +85,9 @@ test_expect_success 'scalar delete with enlistment' '
 	test_path_is_missing cloned
 '
 
+test_expect_success '`scalar [...] <dir>` errors out when dir is missing' '
+	! scalar run config cloned 2>err &&
+	grep "cloned. does not exist" err
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v2 3/6] Implement `scalar diagnose`
  2022-02-06 22:39 ` [PATCH v2 0/6] " Johannes Schindelin via GitGitGadget
  2022-02-06 22:39   ` [PATCH v2 1/6] archive: optionally add "virtual" files Johannes Schindelin via GitGitGadget
  2022-02-06 22:39   ` [PATCH v2 2/6] scalar: validate the optional enlistment argument Johannes Schindelin via GitGitGadget
@ 2022-02-06 22:39   ` Johannes Schindelin via GitGitGadget
  2022-02-07 19:55     ` René Scharfe
  2022-02-06 22:39   ` [PATCH v2 4/6] scalar diagnose: include disk space information Johannes Schindelin via GitGitGadget
                     ` (3 subsequent siblings)
  6 siblings, 1 reply; 140+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2022-02-06 22:39 UTC (permalink / raw)
  To: git
  Cc: René Scharfe, Taylor Blau, Derrick Stolee, Elijah Newren,
	Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

Over the course of Scalar's development, it became obvious that there is
a need for a command that can gather all kinds of useful information
that can help identify the most typical problems with large
worktrees/repositories.

The `diagnose` command is the culmination of this hard-won knowledge: it
gathers the installed hooks, the config, a couple statistics describing
the data shape, among other pieces of information, and then wraps
everything up in a tidy, neat `.zip` archive.

Note: originally, Scalar was implemented in C# using the .NET API, where
we had the luxury of a comprehensive standard library that includes
basic functionality such as writing a `.zip` file. In the C version, we
lack such a commodity. Rather than introducing a dependency on, say,
libzip, we slightly abuse Git's `archive` command: Instead of writing
the `.zip` file directly, we stage the file contents in a Git index of a
temporary, bare repository, only to let `git archive` have at it, and
finally removing the temporary repository.

Also note: Due to the frequently-spawned `git hash-object` processes,
this command is quite a bit slow on Windows. Should it turn out to be a
big problem, the lack of a batch mode of the `hash-object` command could
potentially be worked around via using `git fast-import` with a crafted
`stdin`.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 contrib/scalar/scalar.c          | 143 +++++++++++++++++++++++++++++++
 contrib/scalar/scalar.txt        |  12 +++
 contrib/scalar/t/t9099-scalar.sh |  14 +++
 3 files changed, 169 insertions(+)

diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
index 00dcd4b50ef..30ce0799c7a 100644
--- a/contrib/scalar/scalar.c
+++ b/contrib/scalar/scalar.c
@@ -11,6 +11,7 @@
 #include "dir.h"
 #include "packfile.h"
 #include "help.h"
+#include "archive.h"
 
 /*
  * Remove the deepest subdirectory in the provided path string. Path must not
@@ -261,6 +262,44 @@ static int unregister_dir(void)
 	return res;
 }
 
+static int add_directory_to_archiver(struct strvec *archiver_args,
+					  const char *path, int recurse)
+{
+	int at_root = !*path;
+	DIR *dir = opendir(at_root ? "." : path);
+	struct dirent *e;
+	struct strbuf buf = STRBUF_INIT;
+	size_t len;
+	int res = 0;
+
+	if (!dir)
+		return error(_("could not open directory '%s'"), path);
+
+	if (!at_root)
+		strbuf_addf(&buf, "%s/", path);
+	len = buf.len;
+	strvec_pushf(archiver_args, "--prefix=%s", buf.buf);
+
+	while (!res && (e = readdir(dir))) {
+		if (!strcmp(".", e->d_name) || !strcmp("..", e->d_name))
+			continue;
+
+		strbuf_setlen(&buf, len);
+		strbuf_addstr(&buf, e->d_name);
+
+		if (e->d_type == DT_REG)
+			strvec_pushf(archiver_args, "--add-file=%s", buf.buf);
+		else if (e->d_type != DT_DIR)
+			res = -1;
+		else if (recurse)
+		     add_directory_to_archiver(archiver_args, buf.buf, recurse);
+	}
+
+	closedir(dir);
+	strbuf_release(&buf);
+	return res;
+}
+
 /* printf-style interface, expects `<key>=<value>` argument */
 static int set_config(const char *fmt, ...)
 {
@@ -501,6 +540,109 @@ cleanup:
 	return res;
 }
 
+static int cmd_diagnose(int argc, const char **argv)
+{
+	struct option options[] = {
+		OPT_END(),
+	};
+	const char * const usage[] = {
+		N_("scalar diagnose [<enlistment>]"),
+		NULL
+	};
+	struct strbuf zip_path = STRBUF_INIT;
+	struct strvec archiver_args = STRVEC_INIT;
+	char **argv_copy = NULL;
+	int stdout_fd = -1, archiver_fd = -1;
+	time_t now = time(NULL);
+	struct tm tm;
+	struct strbuf path = STRBUF_INIT, buf = STRBUF_INIT;
+	size_t off;
+	int res = 0;
+
+	argc = parse_options(argc, argv, NULL, options,
+			     usage, 0);
+
+	setup_enlistment_directory(argc, argv, usage, options, &zip_path);
+
+	strbuf_addstr(&zip_path, "/.scalarDiagnostics/scalar_");
+	strbuf_addftime(&zip_path,
+			"%Y%m%d_%H%M%S", localtime_r(&now, &tm), 0, 0);
+	strbuf_addstr(&zip_path, ".zip");
+	switch (safe_create_leading_directories(zip_path.buf)) {
+	case SCLD_EXISTS:
+	case SCLD_OK:
+		break;
+	default:
+		error_errno(_("could not create directory for '%s'"),
+			    zip_path.buf);
+		goto diagnose_cleanup;
+	}
+	stdout_fd = dup(1);
+	if (stdout_fd < 0) {
+		res = error_errno(_("could not duplicate stdout"));
+		goto diagnose_cleanup;
+	}
+
+	archiver_fd = xopen(zip_path.buf, O_CREAT | O_WRONLY | O_TRUNC, 0666);
+	if (archiver_fd < 0 || dup2(archiver_fd, 1) < 0) {
+		res = error_errno(_("could not redirect output"));
+		goto diagnose_cleanup;
+	}
+
+	init_zip_archiver();
+	strvec_pushl(&archiver_args, "scalar-diagnose", "--format=zip", NULL);
+
+	strbuf_reset(&buf);
+	strbuf_addstr(&buf,
+		      "--add-file-with-content=diagnostics.log:"
+		      "Collecting diagnostic info\n\n");
+	get_version_info(&buf, 1);
+
+	strbuf_addf(&buf, "Enlistment root: %s\n", the_repository->worktree);
+	off = strchr(buf.buf, ':') + 1 - buf.buf;
+	write_or_die(stdout_fd, buf.buf + off, buf.len - off);
+	strvec_push(&archiver_args, buf.buf);
+
+	if ((res = add_directory_to_archiver(&archiver_args, ".git", 0)) ||
+	    (res = add_directory_to_archiver(&archiver_args, ".git/hooks", 0)) ||
+	    (res = add_directory_to_archiver(&archiver_args, ".git/info", 0)) ||
+	    (res = add_directory_to_archiver(&archiver_args, ".git/logs", 1)) ||
+	    (res = add_directory_to_archiver(&archiver_args, ".git/objects/info", 0)))
+		goto diagnose_cleanup;
+
+	strvec_pushl(&archiver_args, "--prefix=",
+		     oid_to_hex(the_hash_algo->empty_tree), "--", NULL);
+
+	/* `write_archive()` modifies the `argv` passed to it. Let it. */
+	argv_copy = xmemdupz(archiver_args.v,
+			     sizeof(char *) * archiver_args.nr);
+	res = write_archive(archiver_args.nr, (const char **)argv_copy, NULL,
+			    the_repository, NULL, 0);
+	if (res) {
+		error(_("failed to write archive"));
+		goto diagnose_cleanup;
+	}
+
+	if (!res)
+		printf("\n"
+		       "Diagnostics complete.\n"
+		       "All of the gathered info is captured in '%s'\n",
+		       zip_path.buf);
+
+diagnose_cleanup:
+	if (archiver_fd >= 0) {
+		close(1);
+		dup2(stdout_fd, 1);
+	}
+	free(argv_copy);
+	strvec_clear(&archiver_args);
+	strbuf_release(&zip_path);
+	strbuf_release(&path);
+	strbuf_release(&buf);
+
+	return res;
+}
+
 static int cmd_list(int argc, const char **argv)
 {
 	if (argc != 1)
@@ -802,6 +944,7 @@ static struct {
 	{ "reconfigure", cmd_reconfigure },
 	{ "delete", cmd_delete },
 	{ "version", cmd_version },
+	{ "diagnose", cmd_diagnose },
 	{ NULL, NULL},
 };
 
diff --git a/contrib/scalar/scalar.txt b/contrib/scalar/scalar.txt
index f416d637289..22583fe046e 100644
--- a/contrib/scalar/scalar.txt
+++ b/contrib/scalar/scalar.txt
@@ -14,6 +14,7 @@ scalar register [<enlistment>]
 scalar unregister [<enlistment>]
 scalar run ( all | config | commit-graph | fetch | loose-objects | pack-files ) [<enlistment>]
 scalar reconfigure [ --all | <enlistment> ]
+scalar diagnose [<enlistment>]
 scalar delete <enlistment>
 
 DESCRIPTION
@@ -129,6 +130,17 @@ reconfigure the enlistment.
 With the `--all` option, all enlistments currently registered with Scalar
 will be reconfigured. Use this option after each Scalar upgrade.
 
+Diagnose
+~~~~~~~~
+
+diagnose [<enlistment>]::
+    When reporting issues with Scalar, it is often helpful to provide the
+    information gathered by this command, including logs and certain
+    statistics describing the data shape of the current enlistment.
++
+The output of this command is a `.zip` file that is written into
+a directory adjacent to the worktree in the `src` directory.
+
 Delete
 ~~~~~~
 
diff --git a/contrib/scalar/t/t9099-scalar.sh b/contrib/scalar/t/t9099-scalar.sh
index 9d83fdf25e8..bbd07a44426 100755
--- a/contrib/scalar/t/t9099-scalar.sh
+++ b/contrib/scalar/t/t9099-scalar.sh
@@ -90,4 +90,18 @@ test_expect_success '`scalar [...] <dir>` errors out when dir is missing' '
 	grep "cloned. does not exist" err
 '
 
+SQ="'"
+test_expect_success UNZIP 'scalar diagnose' '
+	scalar clone "file://$(pwd)" cloned --single-branch &&
+	scalar diagnose cloned >out &&
+	sed -n "s/.*$SQ\\(.*\\.zip\\)$SQ.*/\\1/p" <out >zip_path &&
+	zip_path=$(cat zip_path) &&
+	test -n "$zip_path" &&
+	unzip -v "$zip_path" &&
+	folder=${zip_path%.zip} &&
+	test_path_is_missing "$folder" &&
+	unzip -p "$zip_path" diagnostics.log >out &&
+	test_file_not_empty out
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v2 4/6] scalar diagnose: include disk space information
  2022-02-06 22:39 ` [PATCH v2 0/6] " Johannes Schindelin via GitGitGadget
                     ` (2 preceding siblings ...)
  2022-02-06 22:39   ` [PATCH v2 3/6] Implement `scalar diagnose` Johannes Schindelin via GitGitGadget
@ 2022-02-06 22:39   ` Johannes Schindelin via GitGitGadget
  2022-02-06 22:39   ` [PATCH v2 5/6] scalar: teach `diagnose` to gather packfile info Matthew John Cheetham via GitGitGadget
                     ` (2 subsequent siblings)
  6 siblings, 0 replies; 140+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2022-02-06 22:39 UTC (permalink / raw)
  To: git
  Cc: René Scharfe, Taylor Blau, Derrick Stolee, Elijah Newren,
	Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

When analyzing problems with large worktrees/repositories, it is useful
to know how close to a "full disk" situation Scalar/Git operates. Let's
include this information.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 contrib/scalar/scalar.c          | 53 ++++++++++++++++++++++++++++++++
 contrib/scalar/t/t9099-scalar.sh |  1 +
 2 files changed, 54 insertions(+)

diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
index 30ce0799c7a..fd666376109 100644
--- a/contrib/scalar/scalar.c
+++ b/contrib/scalar/scalar.c
@@ -300,6 +300,58 @@ static int add_directory_to_archiver(struct strvec *archiver_args,
 	return res;
 }
 
+#ifndef WIN32
+#include <sys/statvfs.h>
+#endif
+
+static int get_disk_info(struct strbuf *out)
+{
+#ifdef WIN32
+	struct strbuf buf = STRBUF_INIT;
+	char volume_name[MAX_PATH], fs_name[MAX_PATH];
+	DWORD serial_number, component_length, flags;
+	ULARGE_INTEGER avail2caller, total, avail;
+
+	strbuf_realpath(&buf, ".", 1);
+	if (!GetDiskFreeSpaceExA(buf.buf, &avail2caller, &total, &avail)) {
+		error(_("could not determine free disk size for '%s'"),
+		      buf.buf);
+		strbuf_release(&buf);
+		return -1;
+	}
+
+	strbuf_setlen(&buf, offset_1st_component(buf.buf));
+	if (!GetVolumeInformationA(buf.buf, volume_name, sizeof(volume_name),
+				   &serial_number, &component_length, &flags,
+				   fs_name, sizeof(fs_name))) {
+		error(_("could not get info for '%s'"), buf.buf);
+		strbuf_release(&buf);
+		return -1;
+	}
+	strbuf_addf(out, "Available space on '%s': ", buf.buf);
+	strbuf_humanise_bytes(out, avail2caller.QuadPart);
+	strbuf_addch(out, '\n');
+	strbuf_release(&buf);
+#else
+	struct strbuf buf = STRBUF_INIT;
+	struct statvfs stat;
+
+	strbuf_realpath(&buf, ".", 1);
+	if (statvfs(buf.buf, &stat) < 0) {
+		error_errno(_("could not determine free disk size for '%s'"),
+			    buf.buf);
+		strbuf_release(&buf);
+		return -1;
+	}
+
+	strbuf_addf(out, "Available space on '%s': ", buf.buf);
+	strbuf_humanise_bytes(out, st_mult(stat.f_bsize, stat.f_bavail));
+	strbuf_addf(out, " (mount flags 0x%lx)\n", stat.f_flag);
+	strbuf_release(&buf);
+#endif
+	return 0;
+}
+
 /* printf-style interface, expects `<key>=<value>` argument */
 static int set_config(const char *fmt, ...)
 {
@@ -599,6 +651,7 @@ static int cmd_diagnose(int argc, const char **argv)
 	get_version_info(&buf, 1);
 
 	strbuf_addf(&buf, "Enlistment root: %s\n", the_repository->worktree);
+	get_disk_info(&buf);
 	off = strchr(buf.buf, ':') + 1 - buf.buf;
 	write_or_die(stdout_fd, buf.buf + off, buf.len - off);
 	strvec_push(&archiver_args, buf.buf);
diff --git a/contrib/scalar/t/t9099-scalar.sh b/contrib/scalar/t/t9099-scalar.sh
index bbd07a44426..f3d037823c8 100755
--- a/contrib/scalar/t/t9099-scalar.sh
+++ b/contrib/scalar/t/t9099-scalar.sh
@@ -94,6 +94,7 @@ SQ="'"
 test_expect_success UNZIP 'scalar diagnose' '
 	scalar clone "file://$(pwd)" cloned --single-branch &&
 	scalar diagnose cloned >out &&
+	grep "Available space" out &&
 	sed -n "s/.*$SQ\\(.*\\.zip\\)$SQ.*/\\1/p" <out >zip_path &&
 	zip_path=$(cat zip_path) &&
 	test -n "$zip_path" &&
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v2 5/6] scalar: teach `diagnose` to gather packfile info
  2022-02-06 22:39 ` [PATCH v2 0/6] " Johannes Schindelin via GitGitGadget
                     ` (3 preceding siblings ...)
  2022-02-06 22:39   ` [PATCH v2 4/6] scalar diagnose: include disk space information Johannes Schindelin via GitGitGadget
@ 2022-02-06 22:39   ` Matthew John Cheetham via GitGitGadget
  2022-02-06 22:39   ` [PATCH v2 6/6] scalar: teach `diagnose` to gather loose objects information Matthew John Cheetham via GitGitGadget
  2022-05-04 15:25   ` [PATCH v3 0/7] scalar: implement the subcommand "diagnose" Johannes Schindelin via GitGitGadget
  6 siblings, 0 replies; 140+ messages in thread
From: Matthew John Cheetham via GitGitGadget @ 2022-02-06 22:39 UTC (permalink / raw)
  To: git
  Cc: René Scharfe, Taylor Blau, Derrick Stolee, Elijah Newren,
	Johannes Schindelin, Matthew John Cheetham

From: Matthew John Cheetham <mjcheetham@outlook.com>

It's helpful to see if there are other crud files in the pack
directory. Let's teach the `scalar diagnose` command to gather
file size information about pack files.

While at it, also enumerate the pack files in the alternate
object directories, if any are registered.

Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 contrib/scalar/scalar.c          | 30 ++++++++++++++++++++++++++++++
 contrib/scalar/t/t9099-scalar.sh |  6 +++++-
 2 files changed, 35 insertions(+), 1 deletion(-)

diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
index fd666376109..331d48b2a80 100644
--- a/contrib/scalar/scalar.c
+++ b/contrib/scalar/scalar.c
@@ -12,6 +12,7 @@
 #include "packfile.h"
 #include "help.h"
 #include "archive.h"
+#include "object-store.h"
 
 /*
  * Remove the deepest subdirectory in the provided path string. Path must not
@@ -592,6 +593,29 @@ cleanup:
 	return res;
 }
 
+static void dir_file_stats_objects(const char *full_path, size_t full_path_len,
+				   const char *file_name, void *data)
+{
+	struct strbuf *buf = data;
+	struct stat st;
+
+	if (!stat(full_path, &st))
+		strbuf_addf(buf, "%-70s %16" PRIuMAX "\n", file_name,
+			    (uintmax_t)st.st_size);
+}
+
+static int dir_file_stats(struct object_directory *object_dir, void *data)
+{
+	struct strbuf *buf = data;
+
+	strbuf_addf(buf, "Contents of %s:\n", object_dir->path);
+
+	for_each_file_in_pack_dir(object_dir->path, dir_file_stats_objects,
+				  data);
+
+	return 0;
+}
+
 static int cmd_diagnose(int argc, const char **argv)
 {
 	struct option options[] = {
@@ -656,6 +680,12 @@ static int cmd_diagnose(int argc, const char **argv)
 	write_or_die(stdout_fd, buf.buf + off, buf.len - off);
 	strvec_push(&archiver_args, buf.buf);
 
+	strbuf_reset(&buf);
+	strbuf_addstr(&buf, "--add-file-with-content=packs-local.txt:");
+	dir_file_stats(the_repository->objects->odb, &buf);
+	foreach_alt_odb(dir_file_stats, &buf);
+	strvec_push(&archiver_args, buf.buf);
+
 	if ((res = add_directory_to_archiver(&archiver_args, ".git", 0)) ||
 	    (res = add_directory_to_archiver(&archiver_args, ".git/hooks", 0)) ||
 	    (res = add_directory_to_archiver(&archiver_args, ".git/info", 0)) ||
diff --git a/contrib/scalar/t/t9099-scalar.sh b/contrib/scalar/t/t9099-scalar.sh
index f3d037823c8..e049221609d 100755
--- a/contrib/scalar/t/t9099-scalar.sh
+++ b/contrib/scalar/t/t9099-scalar.sh
@@ -93,6 +93,8 @@ test_expect_success '`scalar [...] <dir>` errors out when dir is missing' '
 SQ="'"
 test_expect_success UNZIP 'scalar diagnose' '
 	scalar clone "file://$(pwd)" cloned --single-branch &&
+	git repack &&
+	echo "$(pwd)/.git/objects/" >>cloned/src/.git/objects/info/alternates &&
 	scalar diagnose cloned >out &&
 	grep "Available space" out &&
 	sed -n "s/.*$SQ\\(.*\\.zip\\)$SQ.*/\\1/p" <out >zip_path &&
@@ -102,7 +104,9 @@ test_expect_success UNZIP 'scalar diagnose' '
 	folder=${zip_path%.zip} &&
 	test_path_is_missing "$folder" &&
 	unzip -p "$zip_path" diagnostics.log >out &&
-	test_file_not_empty out
+	test_file_not_empty out &&
+	unzip -p "$zip_path" packs-local.txt >out &&
+	grep "$(pwd)/.git/objects" out
 '
 
 test_done
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v2 6/6] scalar: teach `diagnose` to gather loose objects information
  2022-02-06 22:39 ` [PATCH v2 0/6] " Johannes Schindelin via GitGitGadget
                     ` (4 preceding siblings ...)
  2022-02-06 22:39   ` [PATCH v2 5/6] scalar: teach `diagnose` to gather packfile info Matthew John Cheetham via GitGitGadget
@ 2022-02-06 22:39   ` Matthew John Cheetham via GitGitGadget
  2022-05-04 15:25   ` [PATCH v3 0/7] scalar: implement the subcommand "diagnose" Johannes Schindelin via GitGitGadget
  6 siblings, 0 replies; 140+ messages in thread
From: Matthew John Cheetham via GitGitGadget @ 2022-02-06 22:39 UTC (permalink / raw)
  To: git
  Cc: René Scharfe, Taylor Blau, Derrick Stolee, Elijah Newren,
	Johannes Schindelin, Matthew John Cheetham

From: Matthew John Cheetham <mjcheetham@outlook.com>

When operating at the scale that Scalar wants to support, certain data
shapes are more likely to cause undesirable performance issues, such as
large numbers of loose objects.

By including statistics about this, `scalar diagnose` now makes it
easier to identify such scenarios.

Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 contrib/scalar/scalar.c          | 59 ++++++++++++++++++++++++++++++++
 contrib/scalar/t/t9099-scalar.sh |  5 ++-
 2 files changed, 63 insertions(+), 1 deletion(-)

diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
index 331d48b2a80..537b97ae734 100644
--- a/contrib/scalar/scalar.c
+++ b/contrib/scalar/scalar.c
@@ -616,6 +616,60 @@ static int dir_file_stats(struct object_directory *object_dir, void *data)
 	return 0;
 }
 
+static int count_files(char *path)
+{
+	DIR *dir = opendir(path);
+	struct dirent *e;
+	int count = 0;
+
+	if (!dir)
+		return 0;
+
+	while ((e = readdir(dir)) != NULL)
+		if (!is_dot_or_dotdot(e->d_name) && e->d_type == DT_REG)
+			count++;
+
+	closedir(dir);
+	return count;
+}
+
+static void loose_objs_stats(struct strbuf *buf, const char *path)
+{
+	DIR *dir = opendir(path);
+	struct dirent *e;
+	int count;
+	int total = 0;
+	unsigned char c;
+	struct strbuf count_path = STRBUF_INIT;
+	size_t base_path_len;
+
+	if (!dir)
+		return;
+
+	strbuf_addstr(buf, "Object directory stats for ");
+	strbuf_add_absolute_path(buf, path);
+	strbuf_addstr(buf, ":\n");
+
+	strbuf_add_absolute_path(&count_path, path);
+	strbuf_addch(&count_path, '/');
+	base_path_len = count_path.len;
+
+	while ((e = readdir(dir)) != NULL)
+		if (!is_dot_or_dotdot(e->d_name) &&
+		    e->d_type == DT_DIR && strlen(e->d_name) == 2 &&
+		    !hex_to_bytes(&c, e->d_name, 1)) {
+			strbuf_setlen(&count_path, base_path_len);
+			strbuf_addstr(&count_path, e->d_name);
+			total += (count = count_files(count_path.buf));
+			strbuf_addf(buf, "%s : %7d files\n", e->d_name, count);
+		}
+
+	strbuf_addf(buf, "Total: %d loose objects", total);
+
+	strbuf_release(&count_path);
+	closedir(dir);
+}
+
 static int cmd_diagnose(int argc, const char **argv)
 {
 	struct option options[] = {
@@ -686,6 +740,11 @@ static int cmd_diagnose(int argc, const char **argv)
 	foreach_alt_odb(dir_file_stats, &buf);
 	strvec_push(&archiver_args, buf.buf);
 
+	strbuf_reset(&buf);
+	strbuf_addstr(&buf, "--add-file-with-content=objects-local.txt:");
+	loose_objs_stats(&buf, ".git/objects");
+	strvec_push(&archiver_args, buf.buf);
+
 	if ((res = add_directory_to_archiver(&archiver_args, ".git", 0)) ||
 	    (res = add_directory_to_archiver(&archiver_args, ".git/hooks", 0)) ||
 	    (res = add_directory_to_archiver(&archiver_args, ".git/info", 0)) ||
diff --git a/contrib/scalar/t/t9099-scalar.sh b/contrib/scalar/t/t9099-scalar.sh
index e049221609d..9b4eedbb0aa 100755
--- a/contrib/scalar/t/t9099-scalar.sh
+++ b/contrib/scalar/t/t9099-scalar.sh
@@ -95,6 +95,7 @@ test_expect_success UNZIP 'scalar diagnose' '
 	scalar clone "file://$(pwd)" cloned --single-branch &&
 	git repack &&
 	echo "$(pwd)/.git/objects/" >>cloned/src/.git/objects/info/alternates &&
+	test_commit -C cloned/src loose &&
 	scalar diagnose cloned >out &&
 	grep "Available space" out &&
 	sed -n "s/.*$SQ\\(.*\\.zip\\)$SQ.*/\\1/p" <out >zip_path &&
@@ -106,7 +107,9 @@ test_expect_success UNZIP 'scalar diagnose' '
 	unzip -p "$zip_path" diagnostics.log >out &&
 	test_file_not_empty out &&
 	unzip -p "$zip_path" packs-local.txt >out &&
-	grep "$(pwd)/.git/objects" out
+	grep "$(pwd)/.git/objects" out &&
+	unzip -p "$zip_path" objects-local.txt >out &&
+	grep "^Total: [1-9]" out
 '
 
 test_done
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v2 1/6] archive: optionally add "virtual" files
  2022-02-06 22:39   ` [PATCH v2 1/6] archive: optionally add "virtual" files Johannes Schindelin via GitGitGadget
@ 2022-02-07 19:55     ` René Scharfe
  2022-02-07 23:30       ` Junio C Hamano
  2022-02-08 12:54       ` Johannes Schindelin
  0 siblings, 2 replies; 140+ messages in thread
From: René Scharfe @ 2022-02-07 19:55 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget, git
  Cc: Taylor Blau, Derrick Stolee, Elijah Newren, Johannes Schindelin

Am 06.02.22 um 23:39 schrieb Johannes Schindelin via GitGitGadget:
> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>
> With the `--add-file-with-content=<path>:<content>` option, `git
> archive` now supports use cases where relatively trivial files need to
> be added that do not exist on disk.
>
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
>  Documentation/git-archive.txt | 11 ++++++++
>  archive.c                     | 51 +++++++++++++++++++++++++++++------
>  t/t5003-archive-zip.sh        | 12 +++++++++
>  3 files changed, 66 insertions(+), 8 deletions(-)
>
> diff --git a/Documentation/git-archive.txt b/Documentation/git-archive.txt
> index bc4e76a7834..1b52a0a65a1 100644
> --- a/Documentation/git-archive.txt
> +++ b/Documentation/git-archive.txt
> @@ -61,6 +61,17 @@ OPTIONS
>  	by concatenating the value for `--prefix` (if any) and the
>  	basename of <file>.
>
> +--add-file-with-content=<path>:<content>::
> +	Add the specified contents to the archive.  Can be repeated to add
> +	multiple files.  The path of the file in the archive is built
> +	by concatenating the value for `--prefix` (if any) and the
> +	basename of <file>.
> ++
> +The `<path>` cannot contain any colon, the file mode is limited to
> +a regular file, and the option may be subject platform-dependent

s/subject/& to/

> +command-line limits. For non-trivial cases, write an untracked file
> +and use `--add-file` instead.
> +

We could use that option in Git's own Makefile to add the file named
"version", which contains $GIT_VERSION.  Hmm, but it also contains a
terminating newline, which would be a bit tricky (but not impossible) to
add.  Would it make sense to add one automatically if it's missing (e.g.
with strbuf_complete_line)?  Not sure.

>  --worktree-attributes::
>  	Look for attributes in .gitattributes files in the working tree
>  	as well (see <<ATTRIBUTES>>).
> diff --git a/archive.c b/archive.c
> index a3bbb091256..172efd690c3 100644
> --- a/archive.c
> +++ b/archive.c
> @@ -263,6 +263,7 @@ static int queue_or_write_archive_entry(const struct object_id *oid,
>  struct extra_file_info {
>  	char *base;
>  	struct stat stat;
> +	void *content;
>  };
>
>  int write_archive_entries(struct archiver_args *args,
> @@ -337,7 +338,13 @@ int write_archive_entries(struct archiver_args *args,
>  		strbuf_addstr(&path_in_archive, basename(path));
>
>  		strbuf_reset(&content);
> -		if (strbuf_read_file(&content, path, info->stat.st_size) < 0)
> +		if (info->content)
> +			err = write_entry(args, &fake_oid, path_in_archive.buf,
> +					  path_in_archive.len,
> +					  info->stat.st_mode,
> +					  info->content, info->stat.st_size);
> +		else if (strbuf_read_file(&content, path,
> +					  info->stat.st_size) < 0)
>  			err = error_errno(_("could not read '%s'"), path);
>  		else
>  			err = write_entry(args, &fake_oid, path_in_archive.buf,
> @@ -493,6 +500,7 @@ static void extra_file_info_clear(void *util, const char *str)
>  {
>  	struct extra_file_info *info = util;
>  	free(info->base);
> +	free(info->content);
>  	free(info);
>  }
>
> @@ -514,14 +522,38 @@ static int add_file_cb(const struct option *opt, const char *arg, int unset)
>  	if (!arg)
>  		return -1;
>
> -	path = prefix_filename(args->prefix, arg);
> -	item = string_list_append_nodup(&args->extra_files, path);
> -	item->util = info = xmalloc(sizeof(*info));
> +	info = xmalloc(sizeof(*info));
>  	info->base = xstrdup_or_null(base);
> -	if (stat(path, &info->stat))
> -		die(_("File not found: %s"), path);
> -	if (!S_ISREG(info->stat.st_mode))
> -		die(_("Not a regular file: %s"), path);
> +
> +	if (strcmp(opt->long_name, "add-file-with-content")) {

Equivalent to:

	if (!strcmp(opt->long_name, "add-file")) {

I mention that because the inequality check confused me a bit at first.

> +		path = prefix_filename(args->prefix, arg);
> +		if (stat(path, &info->stat))
> +			die(_("File not found: %s"), path);
> +		if (!S_ISREG(info->stat.st_mode))
> +			die(_("Not a regular file: %s"), path);
> +		info->content = NULL; /* read the file later */
> +	} else {
> +		const char *colon = strchr(arg, ':');
> +		char *p;
> +
> +		if (!colon)
> +			die(_("missing colon: '%s'"), arg);
> +
> +		p = xstrndup(arg, colon - arg);
> +		if (!args->prefix)
> +			path = p;
> +		else {
> +			path = prefix_filename(args->prefix, p);
> +			free(p);
> +		}
> +		memset(&info->stat, 0, sizeof(info->stat));
> +		info->stat.st_mode = S_IFREG | 0644;
> +		info->content = xstrdup(colon + 1);
> +		info->stat.st_size = strlen(info->content);
> +	}
> +	item = string_list_append_nodup(&args->extra_files, path);
> +	item->util = info;
> +
>  	return 0;
>  }
>
> @@ -554,6 +586,9 @@ static int parse_archive_args(int argc, const char **argv,
>  		{ OPTION_CALLBACK, 0, "add-file", args, N_("file"),
>  		  N_("add untracked file to archive"), 0, add_file_cb,
>  		  (intptr_t)&base },
> +		{ OPTION_CALLBACK, 0, "add-file-with-content", args,
> +		  N_("file"), N_("add untracked file to archive"), 0,
                      ^^^^
"<file>" seems wrong, because there is no actual file.  It should rather
be "<name>:<content>" for the virtual one, right?

> +		  add_file_cb, (intptr_t)&base },
>  		OPT_STRING('o', "output", &output, N_("file"),
>  			N_("write the archive to this file")),
>  		OPT_BOOL(0, "worktree-attributes", &worktree_attributes,
> diff --git a/t/t5003-archive-zip.sh b/t/t5003-archive-zip.sh
> index 1e6d18b140e..8ff1257f1a0 100755
> --- a/t/t5003-archive-zip.sh
> +++ b/t/t5003-archive-zip.sh
> @@ -206,6 +206,18 @@ test_expect_success 'git archive --format=zip --add-file' '
>  check_zip with_untracked
>  check_added with_untracked untracked untracked
>
> +test_expect_success UNZIP 'git archive --format=zip --add-file-with-content' '
> +	git archive --format=zip >with_file_with_content.zip \
> +		--add-file-with-content=hello:world $EMPTY_TREE &&
> +	test_when_finished "rm -rf tmp-unpack" &&
> +	mkdir tmp-unpack && (
> +		cd tmp-unpack &&
> +		"$GIT_UNZIP" ../with_file_with_content.zip &&
> +		test_path_is_file hello &&
> +		test world = $(cat hello)
> +	)
> +'
> +
>  test_expect_success 'git archive --format=zip --add-file twice' '
>  	echo untracked >untracked &&
>  	git archive --format=zip --prefix=one/ --add-file=untracked \

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v2 3/6] Implement `scalar diagnose`
  2022-02-06 22:39   ` [PATCH v2 3/6] Implement `scalar diagnose` Johannes Schindelin via GitGitGadget
@ 2022-02-07 19:55     ` René Scharfe
  2022-02-08 12:08       ` Johannes Schindelin
  0 siblings, 1 reply; 140+ messages in thread
From: René Scharfe @ 2022-02-07 19:55 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget, git
  Cc: Taylor Blau, Derrick Stolee, Elijah Newren, Johannes Schindelin



Am 06.02.22 um 23:39 schrieb Johannes Schindelin via GitGitGadget:
> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>
> Over the course of Scalar's development, it became obvious that there is
> a need for a command that can gather all kinds of useful information
> that can help identify the most typical problems with large
> worktrees/repositories.
>
> The `diagnose` command is the culmination of this hard-won knowledge: it
> gathers the installed hooks, the config, a couple statistics describing
> the data shape, among other pieces of information, and then wraps
> everything up in a tidy, neat `.zip` archive.
>
> Note: originally, Scalar was implemented in C# using the .NET API, where
> we had the luxury of a comprehensive standard library that includes
> basic functionality such as writing a `.zip` file. In the C version, we
> lack such a commodity. Rather than introducing a dependency on, say,
> libzip, we slightly abuse Git's `archive` command: Instead of writing
> the `.zip` file directly, we stage the file contents in a Git index of a
> temporary, bare repository, only to let `git archive` have at it, and
> finally removing the temporary repository.
>
> Also note: Due to the frequently-spawned `git hash-object` processes,
> this command is quite a bit slow on Windows. Should it turn out to be a
> big problem, the lack of a batch mode of the `hash-object` command could
> potentially be worked around via using `git fast-import` with a crafted
> `stdin`.

The two paragraphs above are not in sync with the patch.

>
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
>  contrib/scalar/scalar.c          | 143 +++++++++++++++++++++++++++++++
>  contrib/scalar/scalar.txt        |  12 +++
>  contrib/scalar/t/t9099-scalar.sh |  14 +++
>  3 files changed, 169 insertions(+)
>
> diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
> index 00dcd4b50ef..30ce0799c7a 100644
> --- a/contrib/scalar/scalar.c
> +++ b/contrib/scalar/scalar.c
> @@ -11,6 +11,7 @@
>  #include "dir.h"
>  #include "packfile.h"
>  #include "help.h"
> +#include "archive.h"
>
>  /*
>   * Remove the deepest subdirectory in the provided path string. Path must not
> @@ -261,6 +262,44 @@ static int unregister_dir(void)
>  	return res;
>  }
>
> +static int add_directory_to_archiver(struct strvec *archiver_args,
> +					  const char *path, int recurse)
> +{
> +	int at_root = !*path;
> +	DIR *dir = opendir(at_root ? "." : path);
> +	struct dirent *e;
> +	struct strbuf buf = STRBUF_INIT;
> +	size_t len;
> +	int res = 0;
> +
> +	if (!dir)
> +		return error(_("could not open directory '%s'"), path);
> +
> +	if (!at_root)
> +		strbuf_addf(&buf, "%s/", path);
> +	len = buf.len;
> +	strvec_pushf(archiver_args, "--prefix=%s", buf.buf);
> +
> +	while (!res && (e = readdir(dir))) {
> +		if (!strcmp(".", e->d_name) || !strcmp("..", e->d_name))
> +			continue;
> +
> +		strbuf_setlen(&buf, len);
> +		strbuf_addstr(&buf, e->d_name);
> +
> +		if (e->d_type == DT_REG)
> +			strvec_pushf(archiver_args, "--add-file=%s", buf.buf);
> +		else if (e->d_type != DT_DIR)
> +			res = -1;
> +		else if (recurse)
> +		     add_directory_to_archiver(archiver_args, buf.buf, recurse);
> +	}
> +
> +	closedir(dir);
> +	strbuf_release(&buf);
> +	return res;
> +}
> +
>  /* printf-style interface, expects `<key>=<value>` argument */
>  static int set_config(const char *fmt, ...)
>  {
> @@ -501,6 +540,109 @@ cleanup:
>  	return res;
>  }
>
> +static int cmd_diagnose(int argc, const char **argv)
> +{
> +	struct option options[] = {
> +		OPT_END(),
> +	};
> +	const char * const usage[] = {
> +		N_("scalar diagnose [<enlistment>]"),
> +		NULL
> +	};
> +	struct strbuf zip_path = STRBUF_INIT;
> +	struct strvec archiver_args = STRVEC_INIT;
> +	char **argv_copy = NULL;
> +	int stdout_fd = -1, archiver_fd = -1;
> +	time_t now = time(NULL);
> +	struct tm tm;
> +	struct strbuf path = STRBUF_INIT, buf = STRBUF_INIT;
> +	size_t off;
> +	int res = 0;
> +
> +	argc = parse_options(argc, argv, NULL, options,
> +			     usage, 0);
> +
> +	setup_enlistment_directory(argc, argv, usage, options, &zip_path);
> +
> +	strbuf_addstr(&zip_path, "/.scalarDiagnostics/scalar_");
> +	strbuf_addftime(&zip_path,
> +			"%Y%m%d_%H%M%S", localtime_r(&now, &tm), 0, 0);
> +	strbuf_addstr(&zip_path, ".zip");
> +	switch (safe_create_leading_directories(zip_path.buf)) {
> +	case SCLD_EXISTS:
> +	case SCLD_OK:
> +		break;
> +	default:
> +		error_errno(_("could not create directory for '%s'"),
> +			    zip_path.buf);
> +		goto diagnose_cleanup;
> +	}
> +	stdout_fd = dup(1);
> +	if (stdout_fd < 0) {
> +		res = error_errno(_("could not duplicate stdout"));
> +		goto diagnose_cleanup;
> +	}
> +
> +	archiver_fd = xopen(zip_path.buf, O_CREAT | O_WRONLY | O_TRUNC, 0666);
> +	if (archiver_fd < 0 || dup2(archiver_fd, 1) < 0) {
> +		res = error_errno(_("could not redirect output"));
> +		goto diagnose_cleanup;
> +	}
> +
> +	init_zip_archiver();
> +	strvec_pushl(&archiver_args, "scalar-diagnose", "--format=zip", NULL);
> +
> +	strbuf_reset(&buf);
> +	strbuf_addstr(&buf,
> +		      "--add-file-with-content=diagnostics.log:"
> +		      "Collecting diagnostic info\n\n");
> +	get_version_info(&buf, 1);
> +
> +	strbuf_addf(&buf, "Enlistment root: %s\n", the_repository->worktree);
> +	off = strchr(buf.buf, ':') + 1 - buf.buf;
> +	write_or_die(stdout_fd, buf.buf + off, buf.len - off);
> +	strvec_push(&archiver_args, buf.buf);

Fun trick to reuse the buffer for both the ZIP entry and stdout. :)  I'd
have omitted the option from buf and added it like this, for simplicity:

	strvec_pushf(&archiver_args,
		     "--add-file-with-content=diagnostics.log:%s", buf.buf);

Just a thought.

> +
> +	if ((res = add_directory_to_archiver(&archiver_args, ".git", 0)) ||
> +	    (res = add_directory_to_archiver(&archiver_args, ".git/hooks", 0)) ||
> +	    (res = add_directory_to_archiver(&archiver_args, ".git/info", 0)) ||
> +	    (res = add_directory_to_archiver(&archiver_args, ".git/logs", 1)) ||
> +	    (res = add_directory_to_archiver(&archiver_args, ".git/objects/info", 0)))
> +		goto diagnose_cleanup;
> +
> +	strvec_pushl(&archiver_args, "--prefix=",
> +		     oid_to_hex(the_hash_algo->empty_tree), "--", NULL);
> +
> +	/* `write_archive()` modifies the `argv` passed to it. Let it. */
> +	argv_copy = xmemdupz(archiver_args.v,
> +			     sizeof(char *) * archiver_args.nr);

Leaking the whole thing would be fine as well for this command, but
cleaning up is tidier, of course.

> +	res = write_archive(archiver_args.nr, (const char **)argv_copy, NULL,
> +			    the_repository, NULL, 0);

Ah -- no shell means no command line length limits. :)

> +	if (res) {
> +		error(_("failed to write archive"));
> +		goto diagnose_cleanup;
> +	}
> +
> +	if (!res)
> +		printf("\n"
> +		       "Diagnostics complete.\n"
> +		       "All of the gathered info is captured in '%s'\n",
> +		       zip_path.buf);

Is this message appended to the ZIP file or does it go to stdout?

In any case: mixing write(2) and stdio(3) is not a good idea.  Using
fwrite(3) instead of write_or_die above and doing the stdout dup(2)
dance only tightly around the write_archive call would help, I think.

> +
> +diagnose_cleanup:
> +	if (archiver_fd >= 0) {
> +		close(1);
> +		dup2(stdout_fd, 1);
> +	}
> +	free(argv_copy);
> +	strvec_clear(&archiver_args);
> +	strbuf_release(&zip_path);
> +	strbuf_release(&path);
> +	strbuf_release(&buf);
> +
> +	return res;
> +}
> +
>  static int cmd_list(int argc, const char **argv)
>  {
>  	if (argc != 1)
> @@ -802,6 +944,7 @@ static struct {
>  	{ "reconfigure", cmd_reconfigure },
>  	{ "delete", cmd_delete },
>  	{ "version", cmd_version },
> +	{ "diagnose", cmd_diagnose },
>  	{ NULL, NULL},
>  };
>
> diff --git a/contrib/scalar/scalar.txt b/contrib/scalar/scalar.txt
> index f416d637289..22583fe046e 100644
> --- a/contrib/scalar/scalar.txt
> +++ b/contrib/scalar/scalar.txt
> @@ -14,6 +14,7 @@ scalar register [<enlistment>]
>  scalar unregister [<enlistment>]
>  scalar run ( all | config | commit-graph | fetch | loose-objects | pack-files ) [<enlistment>]
>  scalar reconfigure [ --all | <enlistment> ]
> +scalar diagnose [<enlistment>]
>  scalar delete <enlistment>
>
>  DESCRIPTION
> @@ -129,6 +130,17 @@ reconfigure the enlistment.
>  With the `--all` option, all enlistments currently registered with Scalar
>  will be reconfigured. Use this option after each Scalar upgrade.
>
> +Diagnose
> +~~~~~~~~
> +
> +diagnose [<enlistment>]::
> +    When reporting issues with Scalar, it is often helpful to provide the
> +    information gathered by this command, including logs and certain
> +    statistics describing the data shape of the current enlistment.
> ++
> +The output of this command is a `.zip` file that is written into
> +a directory adjacent to the worktree in the `src` directory.
> +
>  Delete
>  ~~~~~~
>
> diff --git a/contrib/scalar/t/t9099-scalar.sh b/contrib/scalar/t/t9099-scalar.sh
> index 9d83fdf25e8..bbd07a44426 100755
> --- a/contrib/scalar/t/t9099-scalar.sh
> +++ b/contrib/scalar/t/t9099-scalar.sh
> @@ -90,4 +90,18 @@ test_expect_success '`scalar [...] <dir>` errors out when dir is missing' '
>  	grep "cloned. does not exist" err
>  '
>
> +SQ="'"
> +test_expect_success UNZIP 'scalar diagnose' '
> +	scalar clone "file://$(pwd)" cloned --single-branch &&
> +	scalar diagnose cloned >out &&
> +	sed -n "s/.*$SQ\\(.*\\.zip\\)$SQ.*/\\1/p" <out >zip_path &&
> +	zip_path=$(cat zip_path) &&
> +	test -n "$zip_path" &&
> +	unzip -v "$zip_path" &&
> +	folder=${zip_path%.zip} &&
> +	test_path_is_missing "$folder" &&
> +	unzip -p "$zip_path" diagnostics.log >out &&
> +	test_file_not_empty out
> +'
> +
>  test_done

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v2 1/6] archive: optionally add "virtual" files
  2022-02-07 19:55     ` René Scharfe
@ 2022-02-07 23:30       ` Junio C Hamano
  2022-02-08 13:12         ` Johannes Schindelin
  2022-02-08 12:54       ` Johannes Schindelin
  1 sibling, 1 reply; 140+ messages in thread
From: Junio C Hamano @ 2022-02-07 23:30 UTC (permalink / raw)
  To: René Scharfe
  Cc: Johannes Schindelin via GitGitGadget, git, Taylor Blau,
	Derrick Stolee, Elijah Newren, Johannes Schindelin

René Scharfe <l.s.r@web.de> writes:

> We could use that option in Git's own Makefile to add the file named
> "version", which contains $GIT_VERSION.  Hmm, but it also contains a
> terminating newline, which would be a bit tricky (but not impossible) to
> add.  Would it make sense to add one automatically if it's missing (e.g.
> with strbuf_complete_line)?  Not sure.

I do not think it is a good UI to give raw file content from the
command line, which will be usable only for trivial, even single
liner files, and forces people to learn two parallel option, one
for trivial ones and the other for contents with meaningful size.

"--add-blob=<path>:<blob-object-name>" may be another option, useful
when you have done "hash-object -w" already, and can be used to add
single-liner, or an entire novel.

In any case, "--add-file=<file>", which we already have, would be
more appropriate feature to use to record our "version" file, so
there is no need to change our Makefile for it.


^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v2 3/6] Implement `scalar diagnose`
  2022-02-07 19:55     ` René Scharfe
@ 2022-02-08 12:08       ` Johannes Schindelin
  0 siblings, 0 replies; 140+ messages in thread
From: Johannes Schindelin @ 2022-02-08 12:08 UTC (permalink / raw)
  To: René Scharfe
  Cc: Johannes Schindelin via GitGitGadget, git, Taylor Blau,
	Derrick Stolee, Elijah Newren

[-- Attachment #1: Type: text/plain, Size: 4447 bytes --]

Hi René,

On Mon, 7 Feb 2022, René Scharfe wrote:

> > Note: originally, Scalar was implemented in C# using the .NET API, where
> > we had the luxury of a comprehensive standard library that includes
> > basic functionality such as writing a `.zip` file. In the C version, we
> > lack such a commodity. Rather than introducing a dependency on, say,
> > libzip, we slightly abuse Git's `archive` command: Instead of writing
> > the `.zip` file directly, we stage the file contents in a Git index of a
> > temporary, bare repository, only to let `git archive` have at it, and
> > finally removing the temporary repository.
> >
> > Also note: Due to the frequently-spawned `git hash-object` processes,
> > this command is quite a bit slow on Windows. Should it turn out to be a
> > big problem, the lack of a batch mode of the `hash-object` command could
> > potentially be worked around via using `git fast-import` with a crafted
> > `stdin`.
>
> The two paragraphs above are not in sync with the patch.

Whoopsie!

> > +	archiver_fd = xopen(zip_path.buf, O_CREAT | O_WRONLY | O_TRUNC, 0666);
> > +	if (archiver_fd < 0 || dup2(archiver_fd, 1) < 0) {
> > +		res = error_errno(_("could not redirect output"));
> > +		goto diagnose_cleanup;
> > +	}
> > +
> > +	init_zip_archiver();
> > +	strvec_pushl(&archiver_args, "scalar-diagnose", "--format=zip", NULL);
> > +
> > +	strbuf_reset(&buf);
> > +	strbuf_addstr(&buf,
> > +		      "--add-file-with-content=diagnostics.log:"
> > +		      "Collecting diagnostic info\n\n");
> > +	get_version_info(&buf, 1);
> > +
> > +	strbuf_addf(&buf, "Enlistment root: %s\n", the_repository->worktree);
> > +	off = strchr(buf.buf, ':') + 1 - buf.buf;
> > +	write_or_die(stdout_fd, buf.buf + off, buf.len - off);
> > +	strvec_push(&archiver_args, buf.buf);
>
> Fun trick to reuse the buffer for both the ZIP entry and stdout. :)  I'd
> have omitted the option from buf and added it like this, for simplicity:
>
> 	strvec_pushf(&archiver_args,
> 		     "--add-file-with-content=diagnostics.log:%s", buf.buf);
>
> Just a thought.

Oh, that's even better. I did not like that `off` pattern at all but
forgot to think of `pushf()`. Thanks!

> > +
> > +	if ((res = add_directory_to_archiver(&archiver_args, ".git", 0)) ||
> > +	    (res = add_directory_to_archiver(&archiver_args, ".git/hooks", 0)) ||
> > +	    (res = add_directory_to_archiver(&archiver_args, ".git/info", 0)) ||
> > +	    (res = add_directory_to_archiver(&archiver_args, ".git/logs", 1)) ||
> > +	    (res = add_directory_to_archiver(&archiver_args, ".git/objects/info", 0)))
> > +		goto diagnose_cleanup;
> > +
> > +	strvec_pushl(&archiver_args, "--prefix=",
> > +		     oid_to_hex(the_hash_algo->empty_tree), "--", NULL);
> > +
> > +	/* `write_archive()` modifies the `argv` passed to it. Let it. */
> > +	argv_copy = xmemdupz(archiver_args.v,
> > +			     sizeof(char *) * archiver_args.nr);
>
> Leaking the whole thing would be fine as well for this command, but
> cleaning up is tidier, of course.
>
> > +	res = write_archive(archiver_args.nr, (const char **)argv_copy, NULL,
> > +			    the_repository, NULL, 0);
>
> Ah -- no shell means no command line length limits. :)

Yes!!!

It also makes the command a ridiculous amount faster on Windows.

> > +	if (res) {
> > +		error(_("failed to write archive"));
> > +		goto diagnose_cleanup;
> > +	}
> > +
> > +	if (!res)
> > +		printf("\n"
> > +		       "Diagnostics complete.\n"
> > +		       "All of the gathered info is captured in '%s'\n",
> > +		       zip_path.buf);
>
> Is this message appended to the ZIP file or does it go to stdout?

It goes to `stdout`, this is for the user who runs `scalar diagnose`.

Hmm.

Now that you pointed it out, I think I want it to go to `stderr` instead.

> In any case: mixing write(2) and stdio(3) is not a good idea.  Using
> fwrite(3) instead of write_or_die above and doing the stdout dup(2)
> dance only tightly around the write_archive call would help, I think.

Sure, but let's print this message to `stderr` instead, that'll be much
cleaner, right?

Alternatively, I think I'd rather move the `printf()` below...

>
> > +
> > +diagnose_cleanup:
> > +	if (archiver_fd >= 0) {
> > +		close(1);
> > +		dup2(stdout_fd, 1);
> > +	}

... this re-redirection.

What do you think? `stdout` or `stderr`?

Thank you for your review!
Dscho

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v2 1/6] archive: optionally add "virtual" files
  2022-02-07 19:55     ` René Scharfe
  2022-02-07 23:30       ` Junio C Hamano
@ 2022-02-08 12:54       ` Johannes Schindelin
  1 sibling, 0 replies; 140+ messages in thread
From: Johannes Schindelin @ 2022-02-08 12:54 UTC (permalink / raw)
  To: René Scharfe
  Cc: Johannes Schindelin via GitGitGadget, git, Taylor Blau,
	Derrick Stolee, Elijah Newren

[-- Attachment #1: Type: text/plain, Size: 6224 bytes --]

Hi René,

On Mon, 7 Feb 2022, René Scharfe wrote:

> Am 06.02.22 um 23:39 schrieb Johannes Schindelin via GitGitGadget:
> > From: Johannes Schindelin <johannes.schindelin@gmx.de>
> >
> > With the `--add-file-with-content=<path>:<content>` option, `git
> > archive` now supports use cases where relatively trivial files need to
> > be added that do not exist on disk.
> >
> > Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> > ---
> >  Documentation/git-archive.txt | 11 ++++++++
> >  archive.c                     | 51 +++++++++++++++++++++++++++++------
> >  t/t5003-archive-zip.sh        | 12 +++++++++
> >  3 files changed, 66 insertions(+), 8 deletions(-)
> >
> > diff --git a/Documentation/git-archive.txt b/Documentation/git-archive.txt
> > index bc4e76a7834..1b52a0a65a1 100644
> > --- a/Documentation/git-archive.txt
> > +++ b/Documentation/git-archive.txt
> > @@ -61,6 +61,17 @@ OPTIONS
> >  	by concatenating the value for `--prefix` (if any) and the
> >  	basename of <file>.
> >
> > +--add-file-with-content=<path>:<content>::
> > +	Add the specified contents to the archive.  Can be repeated to add
> > +	multiple files.  The path of the file in the archive is built
> > +	by concatenating the value for `--prefix` (if any) and the
> > +	basename of <file>.
> > ++
> > +The `<path>` cannot contain any colon, the file mode is limited to
> > +a regular file, and the option may be subject platform-dependent
>
> s/subject/& to/

Thanks.

> > +command-line limits. For non-trivial cases, write an untracked file
> > +and use `--add-file` instead.
> > +
>
> We could use that option in Git's own Makefile to add the file named
> "version", which contains $GIT_VERSION.

We could do that, that opportunity is a side effect of this patch series.

> Hmm, but it also contains a terminating newline, which would be a bit
> tricky (but not impossible) to add.  Would it make sense to add one
> automatically if it's missing (e.g. with strbuf_complete_line)?  Not
> sure.

It is really easy:

	LF='
	'

	git archive --add-file-with-content=version:"$GIT_VERSION$LF" ...

(That's shell script, in the Makefile it would need those `\`
continuations.)

> >  --worktree-attributes::
> >  	Look for attributes in .gitattributes files in the working tree
> >  	as well (see <<ATTRIBUTES>>).
> > diff --git a/archive.c b/archive.c
> > index a3bbb091256..172efd690c3 100644
> > --- a/archive.c
> > +++ b/archive.c
> > @@ -263,6 +263,7 @@ static int queue_or_write_archive_entry(const struct object_id *oid,
> >  struct extra_file_info {
> >  	char *base;
> >  	struct stat stat;
> > +	void *content;
> >  };
> >
> >  int write_archive_entries(struct archiver_args *args,
> > @@ -337,7 +338,13 @@ int write_archive_entries(struct archiver_args *args,
> >  		strbuf_addstr(&path_in_archive, basename(path));
> >
> >  		strbuf_reset(&content);
> > -		if (strbuf_read_file(&content, path, info->stat.st_size) < 0)
> > +		if (info->content)
> > +			err = write_entry(args, &fake_oid, path_in_archive.buf,
> > +					  path_in_archive.len,
> > +					  info->stat.st_mode,
> > +					  info->content, info->stat.st_size);
> > +		else if (strbuf_read_file(&content, path,
> > +					  info->stat.st_size) < 0)
> >  			err = error_errno(_("could not read '%s'"), path);
> >  		else
> >  			err = write_entry(args, &fake_oid, path_in_archive.buf,
> > @@ -493,6 +500,7 @@ static void extra_file_info_clear(void *util, const char *str)
> >  {
> >  	struct extra_file_info *info = util;
> >  	free(info->base);
> > +	free(info->content);
> >  	free(info);
> >  }
> >
> > @@ -514,14 +522,38 @@ static int add_file_cb(const struct option *opt, const char *arg, int unset)
> >  	if (!arg)
> >  		return -1;
> >
> > -	path = prefix_filename(args->prefix, arg);
> > -	item = string_list_append_nodup(&args->extra_files, path);
> > -	item->util = info = xmalloc(sizeof(*info));
> > +	info = xmalloc(sizeof(*info));
> >  	info->base = xstrdup_or_null(base);
> > -	if (stat(path, &info->stat))
> > -		die(_("File not found: %s"), path);
> > -	if (!S_ISREG(info->stat.st_mode))
> > -		die(_("Not a regular file: %s"), path);
> > +
> > +	if (strcmp(opt->long_name, "add-file-with-content")) {
>
> Equivalent to:
>
> 	if (!strcmp(opt->long_name, "add-file")) {
>
> I mention that because the inequality check confused me a bit at first.

Good point. For some reason I thought it would be clearer to handle
everything but `--add-file-with-content` here, but that "everything but"
is only `--add-file`, so I sowed more confusion. Sorry about that.

>
> > +		path = prefix_filename(args->prefix, arg);
> > +		if (stat(path, &info->stat))
> > +			die(_("File not found: %s"), path);
> > +		if (!S_ISREG(info->stat.st_mode))
> > +			die(_("Not a regular file: %s"), path);
> > +		info->content = NULL; /* read the file later */
> > +	} else {
> > +		const char *colon = strchr(arg, ':');
> > +		char *p;
> > +
> > +		if (!colon)
> > +			die(_("missing colon: '%s'"), arg);
> > +
> > +		p = xstrndup(arg, colon - arg);
> > +		if (!args->prefix)
> > +			path = p;
> > +		else {
> > +			path = prefix_filename(args->prefix, p);
> > +			free(p);
> > +		}
> > +		memset(&info->stat, 0, sizeof(info->stat));
> > +		info->stat.st_mode = S_IFREG | 0644;
> > +		info->content = xstrdup(colon + 1);
> > +		info->stat.st_size = strlen(info->content);
> > +	}
> > +	item = string_list_append_nodup(&args->extra_files, path);
> > +	item->util = info;
> > +
> >  	return 0;
> >  }
> >
> > @@ -554,6 +586,9 @@ static int parse_archive_args(int argc, const char **argv,
> >  		{ OPTION_CALLBACK, 0, "add-file", args, N_("file"),
> >  		  N_("add untracked file to archive"), 0, add_file_cb,
> >  		  (intptr_t)&base },
> > +		{ OPTION_CALLBACK, 0, "add-file-with-content", args,
> > +		  N_("file"), N_("add untracked file to archive"), 0,
>                       ^^^^
> "<file>" seems wrong, because there is no actual file.  It should rather
> be "<name>:<content>" for the virtual one, right?

Or `<path>:<content>`. Yes.

Again, thank you for your clear and helpful review,
Dscho

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v2 1/6] archive: optionally add "virtual" files
  2022-02-07 23:30       ` Junio C Hamano
@ 2022-02-08 13:12         ` Johannes Schindelin
  2022-02-08 17:44           ` Junio C Hamano
  0 siblings, 1 reply; 140+ messages in thread
From: Johannes Schindelin @ 2022-02-08 13:12 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: René Scharfe, Johannes Schindelin via GitGitGadget, git,
	Taylor Blau, Derrick Stolee, Elijah Newren

[-- Attachment #1: Type: text/plain, Size: 2099 bytes --]

Hi Junio,

On Mon, 7 Feb 2022, Junio C Hamano wrote:

> René Scharfe <l.s.r@web.de> writes:
>
> > We could use that option in Git's own Makefile to add the file named
> > "version", which contains $GIT_VERSION.  Hmm, but it also contains a
> > terminating newline, which would be a bit tricky (but not impossible) to
> > add.  Would it make sense to add one automatically if it's missing (e.g.
> > with strbuf_complete_line)?  Not sure.
>
> I do not think it is a good UI to give raw file content from the
> command line, which will be usable only for trivial, even single
> liner files, and forces people to learn two parallel option, one
> for trivial ones and the other for contents with meaningful size.

Nevertheless, it is still the most elegant way that I can think of to
generate a diagnostic `.zip` file without messing up the very things that
are to be diagnosed: the repository and the worktree.

> "--add-blob=<path>:<blob-object-name>" may be another option, useful
> when you have done "hash-object -w" already, and can be used to add
> single-liner, or an entire novel.

This would mess with the repository. Granted, it is unlikely that adding a
tiny blob will all of a sudden work around a bug that the user wanted to
report, but less big mutations have been known to subtly change a bug's
manifested symptoms.

So I really do not want to do that, not in `scalar diagnose.

> In any case, "--add-file=<file>", which we already have, would be
> more appropriate feature to use to record our "version" file, so
> there is no need to change our Makefile for it.

Same here. It is bad enough that `scalar diagnose` has to create a
directory in the current enlistment. Let's not make the situation even
worse.

The most elegant solution would have been that streaming `--add-file` mode
suggested by René, I think, but that's too involved to implement just to
benefit `scalar diagnose`. It's not like we can simply stream the contents
via `stdin`, as there are more than one "virtual" file we need to add to
that `.zip` file.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v2 1/6] archive: optionally add "virtual" files
  2022-02-08 13:12         ` Johannes Schindelin
@ 2022-02-08 17:44           ` Junio C Hamano
  2022-02-08 20:58             ` René Scharfe
  0 siblings, 1 reply; 140+ messages in thread
From: Junio C Hamano @ 2022-02-08 17:44 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: René Scharfe, Johannes Schindelin via GitGitGadget, git,
	Taylor Blau, Derrick Stolee, Elijah Newren

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

>> > We could use that option in Git's own Makefile to add the file named
>> > "version", which contains $GIT_VERSION.  Hmm, but it also contains a
>> > terminating newline, which would be a bit tricky (but not impossible) to
>> > add.  Would it make sense to add one automatically if it's missing (e.g.
>> > with strbuf_complete_line)?  Not sure.
>>
>> I do not think it is a good UI to give raw file content from the
>> command line, which will be usable only for trivial, even single
>> liner files, and forces people to learn two parallel option, one
>> for trivial ones and the other for contents with meaningful size.
>
> Nevertheless, it is still the most elegant way that I can think of to
> generate a diagnostic `.zip` file without messing up the very things that
> are to be diagnosed: the repository and the worktree.

Puzzled.  Are you feeding contents of a .zip file from the command
line?

I was mostly worried about busting command line argument limit by
trying to feed too many bytes, as the ceiling is fairly low on some
platforms.  Another worry was that when <contents> can have
arbitrary bytes, with --opt=<path>:<contents> syntax, the input
becomes ambiguous (i.e. "which colon is the <path> separator?"),
without some way to escape a colon in the payload.

For a single-liner, --add-file-with-contents=<path>:<contents> would
be an OK way, and my comment was not a strong objection against this
new option existing.  It was primarily an objection against changing
the way to add the 'version' file in our "make dist" procedure to
use it anyway.

But now I think about it more, I am becoming less happy about it
existing in the first place.

This will throw another monkey wrench to Konstantin's plan [*] to
make "git archive" output verifiable with the signature on original
Git objects, but it is not a new problem ;-)


[Reference]

* https://lore.kernel.org/git/20220207213449.ljqjhdx4f45a3lx5@meerkat.local/

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v2 1/6] archive: optionally add "virtual" files
  2022-02-08 17:44           ` Junio C Hamano
@ 2022-02-08 20:58             ` René Scharfe
  2022-02-09 22:48               ` Junio C Hamano
  0 siblings, 1 reply; 140+ messages in thread
From: René Scharfe @ 2022-02-08 20:58 UTC (permalink / raw)
  To: Junio C Hamano, Johannes Schindelin
  Cc: Johannes Schindelin via GitGitGadget, git, Taylor Blau,
	Derrick Stolee, Elijah Newren

Am 08.02.22 um 18:44 schrieb Junio C Hamano:
> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
>>>> We could use that option in Git's own Makefile to add the file named
>>>> "version", which contains $GIT_VERSION.  Hmm, but it also contains a
>>>> terminating newline, which would be a bit tricky (but not impossible) to
>>>> add.  Would it make sense to add one automatically if it's missing (e.g.
>>>> with strbuf_complete_line)?  Not sure.
>>>
>>> I do not think it is a good UI to give raw file content from the
>>> command line, which will be usable only for trivial, even single
>>> liner files, and forces people to learn two parallel option, one
>>> for trivial ones and the other for contents with meaningful size.
>>
>> Nevertheless, it is still the most elegant way that I can think of to
>> generate a diagnostic `.zip` file without messing up the very things that
>> are to be diagnosed: the repository and the worktree.
>
> Puzzled.  Are you feeding contents of a .zip file from the command
> line?

Kind of.  Command line arguments are built and handed to write_archive()
in-process.  It's done by patch 3 and extended by 5 and 6.

The number of files is relatively low and they aren't huge, right?
Staging their content in the object database would be messy, but $TMPDIR
might be able to take them with a low impact.  Unless the problem to
diagnose is that this directory is full -- but you don't need a fancy
report for that. :)

Currently there is no easy way to write a temporary file with a chosen
name.  diff.c would benefit from such a thing when running an external
diff program; currently it adds a random prefix.  git archive --add-file
also uses the filename (and discards the directory part).  The patch
below adds a function to create temporary files with a chosen name.
Perhaps it would be useful here as well, instead of the new option?

> I was mostly worried about busting command line argument limit by
> trying to feed too many bytes, as the ceiling is fairly low on some
> platforms.

Command line length limits don't apply to the way scalar uses the new
option.

> Another worry was that when <contents> can have
> arbitrary bytes, with --opt=<path>:<contents> syntax, the input
> becomes ambiguous (i.e. "which colon is the <path> separator?"),
> without some way to escape a colon in the payload.

The first colon is the separator here.

> For a single-liner, --add-file-with-contents=<path>:<contents> would
> be an OK way, and my comment was not a strong objection against this
> new option existing.  It was primarily an objection against changing
> the way to add the 'version' file in our "make dist" procedure to
> use it anyway.
>
> But now I think about it more, I am becoming less happy about it
> existing in the first place.
>
> This will throw another monkey wrench to Konstantin's plan [*] to
> make "git archive" output verifiable with the signature on original
> Git objects, but it is not a new problem ;-)
>
>
> [Reference]
>
> * https://lore.kernel.org/git/20220207213449.ljqjhdx4f45a3lx5@meerkat.local/

I don't see the conflict: If an untracked file is added to an archive
using --add-file, --add-file-with-content, or ZIP or tar then we'd
*want* the verification against a signed commit or tag to fail, no?  A
different signature would be required for the non-tracked parts.

René


--- >8 ---
Subject: [PATCH] tempfile: add mks_tempfile_dt()

Add a function to create a temporary file with a certain name in a
temporary directory created using mkdtemp(3).  Its result is more
sightly than the paths created by mks_tempfile_ts(), which include
a random prefix.  That's useful for files passed to a program that
displays their name, e.g. an external diff tool.

Signed-off-by: René Scharfe <l.s.r@web.de>
---
 tempfile.c | 63 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 tempfile.h | 13 +++++++++++
 2 files changed, 76 insertions(+)

diff --git a/tempfile.c b/tempfile.c
index 94aa18f3f7..2024c82691 100644
--- a/tempfile.c
+++ b/tempfile.c
@@ -56,6 +56,20 @@

 static VOLATILE_LIST_HEAD(tempfile_list);

+static void remove_template_directory(struct tempfile *tempfile,
+				      int in_signal_handler)
+{
+	if (tempfile->directorylen > 0 &&
+	    tempfile->directorylen < tempfile->filename.len &&
+	    tempfile->filename.buf[tempfile->directorylen] == '/') {
+		strbuf_setlen(&tempfile->filename, tempfile->directorylen);
+		if (in_signal_handler)
+			rmdir(tempfile->filename.buf);
+		else
+			rmdir_or_warn(tempfile->filename.buf);
+	}
+}
+
 static void remove_tempfiles(int in_signal_handler)
 {
 	pid_t me = getpid();
@@ -74,6 +88,7 @@ static void remove_tempfiles(int in_signal_handler)
 			unlink(p->filename.buf);
 		else
 			unlink_or_warn(p->filename.buf);
+		remove_template_directory(p, in_signal_handler);

 		p->active = 0;
 	}
@@ -100,6 +115,7 @@ static struct tempfile *new_tempfile(void)
 	tempfile->owner = 0;
 	INIT_LIST_HEAD(&tempfile->list);
 	strbuf_init(&tempfile->filename, 0);
+	tempfile->directorylen = 0;
 	return tempfile;
 }

@@ -198,6 +214,52 @@ struct tempfile *mks_tempfile_tsm(const char *filename_template, int suffixlen,
 	return tempfile;
 }

+struct tempfile *mks_tempfile_dt(const char *directory_template,
+				 const char *filename)
+{
+	struct tempfile *tempfile;
+	const char *tmpdir;
+	struct strbuf sb = STRBUF_INIT;
+	int fd;
+	size_t directorylen;
+
+	if (!ends_with(directory_template, "XXXXXX")) {
+		errno = EINVAL;
+		return NULL;
+	}
+
+	tmpdir = getenv("TMPDIR");
+	if (!tmpdir)
+		tmpdir = "/tmp";
+
+	strbuf_addf(&sb, "%s/%s", tmpdir, directory_template);
+	directorylen = sb.len;
+	if (!mkdtemp(sb.buf)) {
+		int orig_errno = errno;
+		strbuf_release(&sb);
+		errno = orig_errno;
+		return NULL;
+	}
+
+	strbuf_addf(&sb, "/%s", filename);
+	fd = open(sb.buf, O_CREAT | O_EXCL | O_RDWR, 0600);
+	if (fd < 0) {
+		int orig_errno = errno;
+		strbuf_setlen(&sb, directorylen);
+		rmdir(sb.buf);
+		strbuf_release(&sb);
+		errno = orig_errno;
+		return NULL;
+	}
+
+	tempfile = new_tempfile();
+	strbuf_swap(&tempfile->filename, &sb);
+	tempfile->directorylen = directorylen;
+	tempfile->fd = fd;
+	activate_tempfile(tempfile);
+	return tempfile;
+}
+
 struct tempfile *xmks_tempfile_m(const char *filename_template, int mode)
 {
 	struct tempfile *tempfile;
@@ -316,6 +378,7 @@ void delete_tempfile(struct tempfile **tempfile_p)

 	close_tempfile_gently(tempfile);
 	unlink_or_warn(tempfile->filename.buf);
+	remove_template_directory(tempfile, 0);
 	deactivate_tempfile(tempfile);
 	*tempfile_p = NULL;
 }
diff --git a/tempfile.h b/tempfile.h
index 4de3bc77d2..d7804a214a 100644
--- a/tempfile.h
+++ b/tempfile.h
@@ -82,6 +82,7 @@ struct tempfile {
 	FILE *volatile fp;
 	volatile pid_t owner;
 	struct strbuf filename;
+	size_t directorylen;
 };

 /*
@@ -198,6 +199,18 @@ static inline struct tempfile *xmks_tempfile(const char *filename_template)
 	return xmks_tempfile_m(filename_template, 0600);
 }

+/*
+ * Attempt to create a temporary directory in $TMPDIR and to create and
+ * open a file in that new directory. Derive the directory name from the
+ * template in the manner of mkdtemp(). Arrange for directory and file
+ * to be deleted if the program exits before they are deleted
+ * explicitly. On success return a tempfile whose "filename" member
+ * contains the full path of the file and its "fd" member is open for
+ * writing the file. On error return NULL and set errno appropriately.
+ */
+struct tempfile *mks_tempfile_dt(const char *directory_template,
+				 const char *filename);
+
 /*
  * Associate a stdio stream with the temporary file (which must still
  * be open). Return `NULL` (*without* deleting the file) on error. The
--
2.35.1

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v2 1/6] archive: optionally add "virtual" files
  2022-02-08 20:58             ` René Scharfe
@ 2022-02-09 22:48               ` Junio C Hamano
  2022-02-10 19:10                 ` René Scharfe
  0 siblings, 1 reply; 140+ messages in thread
From: Junio C Hamano @ 2022-02-09 22:48 UTC (permalink / raw)
  To: René Scharfe
  Cc: Johannes Schindelin, Johannes Schindelin via GitGitGadget, git,
	Taylor Blau, Derrick Stolee, Elijah Newren

René Scharfe <l.s.r@web.de> writes:

>>> Nevertheless, it is still the most elegant way that I can think of to
>>> generate a diagnostic `.zip` file without messing up the very things that
>>> are to be diagnosed: the repository and the worktree.
>>
>> Puzzled.  Are you feeding contents of a .zip file from the command
>> line?
>
> Kind of.  Command line arguments are built and handed to write_archive()
> in-process.  It's done by patch 3 and extended by 5 and 6.

I meant to ask if this is doing

    git archive --store-contents-at-path="report.zip:$(cat diag.zip)"

as I misunderstood what 'the diagnostic .zip file' referred to.
That was a reference to the output of the "git archive" command.

> The number of files is relatively low and they aren't huge, right?

As long as it is expected to fit on the command line, that's fine.
But if the question is "it is OK to add a new option with known
limitation", then it should be stated a bit differently.

"We add this option for use cases where we handle only small number
of one-liner files", and it is OK.  We may however want to do
something imilar to what we do to the "-m '<message>'" option used
by "git commit" and "git merge", i.e. add the final LF when it is
missing to make it a complete line, to hint the fact that this is
meant to add a small number of single liner files.

>> Another worry was that when <contents> can have
>> arbitrary bytes, with --opt=<path>:<contents> syntax, the input
>> becomes ambiguous (i.e. "which colon is the <path> separator?"),
>> without some way to escape a colon in the payload.
>
> The first colon is the separator here.

Meaning you cannot have a colon in the path, which is not exactly
pleasing limitation.  I know you may not be able to do so on Windows
or CIFS mounted on non-Windows, but we do not limit ourselves to
portable filename character set (POSIX.1 3.282), either.

>> This will throw another monkey wrench to Konstantin's plan [*] to
>> make "git archive" output verifiable with the signature on original
>> Git objects, but it is not a new problem ;-)
>>
>>
>> [Reference]
>>
>> * https://lore.kernel.org/git/20220207213449.ljqjhdx4f45a3lx5@meerkat.local/
>
> I don't see the conflict: If an untracked file is added to an archive
> using --add-file, --add-file-with-content, or ZIP or tar then we'd
> *want* the verification against a signed commit or tag to fail, no?  A
> different signature would be required for the non-tracked parts.

Yes, which is exactly how this (and existing --add-file) makes
Konstantin's plan much less useful.

Thanks.

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v2 1/6] archive: optionally add "virtual" files
  2022-02-09 22:48               ` Junio C Hamano
@ 2022-02-10 19:10                 ` René Scharfe
  2022-02-10 19:23                   ` Junio C Hamano
  0 siblings, 1 reply; 140+ messages in thread
From: René Scharfe @ 2022-02-10 19:10 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Johannes Schindelin, Johannes Schindelin via GitGitGadget, git,
	Taylor Blau, Derrick Stolee, Elijah Newren

Am 09.02.22 um 23:48 schrieb Junio C Hamano:
> René Scharfe <l.s.r@web.de> writes:
>
>> The number of files is relatively low and they aren't huge, right?
>
> As long as it is expected to fit on the command line, that's fine.
> But if the question is "it is OK to add a new option with known
> limitation", then it should be stated a bit differently.

I asked this question to find out if writing the files to $TMPDIR and
adding them with --add-file instead of with --add-file-with-content
would be feasible in patches 3 to 6.  git archive would not have to be
changed in that case.

>>> This will throw another monkey wrench to Konstantin's plan [*] to
>>> make "git archive" output verifiable with the signature on original
>>> Git objects, but it is not a new problem ;-)
>>>
>>>
>>> [Reference]
>>>
>>> * https://lore.kernel.org/git/20220207213449.ljqjhdx4f45a3lx5@meerkat.local/
>>
>> I don't see the conflict: If an untracked file is added to an archive
>> using --add-file, --add-file-with-content, or ZIP or tar then we'd
>> *want* the verification against a signed commit or tag to fail, no?  A
>> different signature would be required for the non-tracked parts.
>
> Yes, which is exactly how this (and existing --add-file) makes
> Konstantin's plan much less useful.
People added untracked files to archives before --add-file existed.

--add-file-with-content could be used to add the .GIT_ARCHIVE_SIG file.

Additional untracked files would need a manifest to specify which files
are (not) covered by the signed commit/tag.  Or the .GIT_ARCHIVE_SIG
files could be added just after the signed files as a rule, before any
other untracked files, as some kind of a separator.

Just listing untracked files and verifying the others might still be
useful.  Warning about untracked files shadowing tracked ones would be
very useful.

Some equivalent to the .GIT_ARCHIVE_SIG file containing a signature of
the untracked files could optionally be added at the end to allow full
verification -- but would require signing at archive creation time.

René

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v2 1/6] archive: optionally add "virtual" files
  2022-02-10 19:10                 ` René Scharfe
@ 2022-02-10 19:23                   ` Junio C Hamano
  2022-02-11 19:16                     ` René Scharfe
  0 siblings, 1 reply; 140+ messages in thread
From: Junio C Hamano @ 2022-02-10 19:23 UTC (permalink / raw)
  To: René Scharfe
  Cc: Johannes Schindelin, Johannes Schindelin via GitGitGadget, git,
	Taylor Blau, Derrick Stolee, Elijah Newren

René Scharfe <l.s.r@web.de> writes:

>> Yes, which is exactly how this (and existing --add-file) makes
>> Konstantin's plan much less useful.
> People added untracked files to archives before --add-file existed.
>
> --add-file-with-content could be used to add the .GIT_ARCHIVE_SIG file.
>
> Additional untracked files would need a manifest to specify which files
> are (not) covered by the signed commit/tag.  Or the .GIT_ARCHIVE_SIG
> files could be added just after the signed files as a rule, before any
> other untracked files, as some kind of a separator.

Or if people do not _exclude_ tracked files from the archive, then
the verifier who has a tarball and a Git tree object can consult the
tree object to see which ones are added untracked cruft.

> Just listing untracked files and verifying the others might still be
> useful.  Warning about untracked files shadowing tracked ones would be
> very useful.

Yup.

> Some equivalent to the .GIT_ARCHIVE_SIG file containing a signature of
> the untracked files could optionally be added at the end to allow full
> verification -- but would require signing at archive creation time.

Yeah, and at that point, it is not much more convenient than just
signing the whole archive (sans the SIG part, obviously), which is
what people have always done ;-)

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v2 1/6] archive: optionally add "virtual" files
  2022-02-10 19:23                   ` Junio C Hamano
@ 2022-02-11 19:16                     ` René Scharfe
  2022-02-11 21:27                       ` Junio C Hamano
  0 siblings, 1 reply; 140+ messages in thread
From: René Scharfe @ 2022-02-11 19:16 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Johannes Schindelin, Johannes Schindelin via GitGitGadget, git,
	Taylor Blau, Derrick Stolee, Elijah Newren

Am 10.02.22 um 20:23 schrieb Junio C Hamano:
> René Scharfe <l.s.r@web.de> writes:
>
>>> Yes, which is exactly how this (and existing --add-file) makes
>>> Konstantin's plan much less useful.

A harder obstacle to verification would be end-of-line conversion.
Retrying a failed signature check after applying convert_to_git() might
work, but not for files that have mixed line endings in the repository
and end up being homogenized during checkout (and thus archiving).

>> People added untracked files to archives before --add-file existed.
>>
>> --add-file-with-content could be used to add the .GIT_ARCHIVE_SIG file.
>>
>> Additional untracked files would need a manifest to specify which files
>> are (not) covered by the signed commit/tag.  Or the .GIT_ARCHIVE_SIG
>> files could be added just after the signed files as a rule, before any
>> other untracked files, as some kind of a separator.
>
> Or if people do not _exclude_ tracked files from the archive, then
> the verifier who has a tarball and a Git tree object can consult the
> tree object to see which ones are added untracked cruft.

True, but if you have the tree objects then you probably also have the
blobs and don't need the archive?  Or is this some kind of sparse
checkout scenario?

>> Some equivalent to the .GIT_ARCHIVE_SIG file containing a signature of
>> the untracked files could optionally be added at the end to allow full
>> verification -- but would require signing at archive creation time.
>
> Yeah, and at that point, it is not much more convenient than just
> signing the whole archive (sans the SIG part, obviously), which is
> what people have always done ;-)

Indeed.

René

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v2 1/6] archive: optionally add "virtual" files
  2022-02-11 19:16                     ` René Scharfe
@ 2022-02-11 21:27                       ` Junio C Hamano
  2022-02-12  9:12                         ` René Scharfe
  0 siblings, 1 reply; 140+ messages in thread
From: Junio C Hamano @ 2022-02-11 21:27 UTC (permalink / raw)
  To: René Scharfe
  Cc: Johannes Schindelin, Johannes Schindelin via GitGitGadget, git,
	Taylor Blau, Derrick Stolee, Elijah Newren

René Scharfe <l.s.r@web.de> writes:

>> Or if people do not _exclude_ tracked files from the archive, then
>> the verifier who has a tarball and a Git tree object can consult the
>> tree object to see which ones are added untracked cruft.
>
> True, but if you have the tree objects then you probably also have the
> blobs and don't need the archive?  Or is this some kind of sparse
> checkout scenario?

My phrasing was too loose.  This is a "how to verify a distro
tarball" (without having a copy of the project repository, but with
some common tools like "git") scenario.

The verifier has a tarball.  In addition, the verifier knows the
object name of the Git tree object the tarball was taken from, and
somehow trusts that the object name is genuine.  We can do either
"untar + git-add . && git write-tree" or its equivalent to see how
the contents hashes to the expected tree (or not).

How the verifier trusts the object name is out of scope (it may come
from a copy of a signed tag object and a copy of the commit object
that the tag points at and the contents of signed tag object, with
its known format, would allow you to write a stand alone tool to
verify the PGP signature).

Line-end normalization and smudge filter rules may get in the way,
if we truly did "untar" to the filesystem, but I thought "git
archive" didn't do smudge conversion and core.crlf handling when
creating the archive?



^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v2 1/6] archive: optionally add "virtual" files
  2022-02-11 21:27                       ` Junio C Hamano
@ 2022-02-12  9:12                         ` René Scharfe
  2022-02-13  6:25                           ` Junio C Hamano
  0 siblings, 1 reply; 140+ messages in thread
From: René Scharfe @ 2022-02-12  9:12 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Johannes Schindelin, Johannes Schindelin via GitGitGadget, git,
	Taylor Blau, Derrick Stolee, Elijah Newren

Am 11.02.22 um 22:27 schrieb Junio C Hamano:
> René Scharfe <l.s.r@web.de> writes:
>
>>> Or if people do not _exclude_ tracked files from the archive, then
>>> the verifier who has a tarball and a Git tree object can consult the
>>> tree object to see which ones are added untracked cruft.
>>
>> True, but if you have the tree objects then you probably also have the
>> blobs and don't need the archive?  Or is this some kind of sparse
>> checkout scenario?
>
> My phrasing was too loose.  This is a "how to verify a distro
> tarball" (without having a copy of the project repository, but with
> some common tools like "git") scenario.
>
> The verifier has a tarball.  In addition, the verifier knows the
> object name of the Git tree object the tarball was taken from, and
> somehow trusts that the object name is genuine.  We can do either
> "untar + git-add . && git write-tree" or its equivalent to see how
> the contents hashes to the expected tree (or not).
>
> How the verifier trusts the object name is out of scope (it may come
> from a copy of a signed tag object and a copy of the commit object
> that the tag points at and the contents of signed tag object, with
> its known format, would allow you to write a stand alone tool to
> verify the PGP signature).

Right, but the tree hash does not directly allow to see which objects
are tracked or not.  This information is necessary to reconstruct the
signed tree.  (Having tracked files first, then the signature file and
then untracked files in the archive would be an easy way to transmit
it.)

> Line-end normalization and smudge filter rules may get in the way,
> if we truly did "untar" to the filesystem, but I thought "git
> archive" didn't do smudge conversion and core.crlf handling when
> creating the archive?

git archive uses convert_to_working_tree() to archive the same file
contents as tar or zip would.

René

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v2 1/6] archive: optionally add "virtual" files
  2022-02-12  9:12                         ` René Scharfe
@ 2022-02-13  6:25                           ` Junio C Hamano
  2022-02-13  9:02                             ` René Scharfe
  0 siblings, 1 reply; 140+ messages in thread
From: Junio C Hamano @ 2022-02-13  6:25 UTC (permalink / raw)
  To: René Scharfe
  Cc: Johannes Schindelin, Johannes Schindelin via GitGitGadget, git,
	Taylor Blau, Derrick Stolee, Elijah Newren

René Scharfe <l.s.r@web.de> writes:

>> The verifier has a tarball.  In addition, the verifier knows the
>> object name of the Git tree object the tarball was taken from, and
>> somehow trusts that the object name is genuine.  We can do either
>> "untar + git-add . && git write-tree" or its equivalent to see how
>> the contents hashes to the expected tree (or not).
> ...
> Right, but the tree hash does not directly allow to see which objects
> are tracked or not.

Ah, of course---it was silly of me to overlook this obvious fact X-<.
So we do need some extra "manifest" to declare what's untracked etc.,
if we allow --add-file etc. to munge the tree when creating a tarball
out of it.


^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v2 1/6] archive: optionally add "virtual" files
  2022-02-13  6:25                           ` Junio C Hamano
@ 2022-02-13  9:02                             ` René Scharfe
  2022-02-14 17:22                               ` Junio C Hamano
  0 siblings, 1 reply; 140+ messages in thread
From: René Scharfe @ 2022-02-13  9:02 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Johannes Schindelin, Johannes Schindelin via GitGitGadget, git,
	Taylor Blau, Derrick Stolee, Elijah Newren

Am 13.02.22 um 07:25 schrieb Junio C Hamano:
>
> So we do need some extra "manifest" to declare what's untracked etc.,
> if we allow --add-file etc. to munge the tree when creating a tarball
> out of it.

Right, or get that information from the order of files in the archive,
by having tracked files come first, then the signature file with a
certain name and then untracked files.

René

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v2 1/6] archive: optionally add "virtual" files
  2022-02-13  9:02                             ` René Scharfe
@ 2022-02-14 17:22                               ` Junio C Hamano
  0 siblings, 0 replies; 140+ messages in thread
From: Junio C Hamano @ 2022-02-14 17:22 UTC (permalink / raw)
  To: René Scharfe
  Cc: Johannes Schindelin, Johannes Schindelin via GitGitGadget, git,
	Taylor Blau, Derrick Stolee, Elijah Newren

René Scharfe <l.s.r@web.de> writes:

> Am 13.02.22 um 07:25 schrieb Junio C Hamano:
>>
>> So we do need some extra "manifest" to declare what's untracked etc.,
>> if we allow --add-file etc. to munge the tree when creating a tarball
>> out of it.
>
> Right, or get that information from the order of files in the archive,
> by having tracked files come first, then the signature file with a
> certain name and then untracked files.

That sounds like a workable approach, modulo that the details of the
"signature file with a certain name" part needs to be worked out.

We should make sure that we clearly document that "--add-file=" and
friends add their material after the contents that come from the
tree-ish, and make sure that the program does so and will stay doing
so.  Otherwise users cannot easily create an archive that follows
the above rule.

Thanks.

^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v3 0/7] scalar: implement the subcommand "diagnose"
  2022-02-06 22:39 ` [PATCH v2 0/6] " Johannes Schindelin via GitGitGadget
                     ` (5 preceding siblings ...)
  2022-02-06 22:39   ` [PATCH v2 6/6] scalar: teach `diagnose` to gather loose objects information Matthew John Cheetham via GitGitGadget
@ 2022-05-04 15:25   ` Johannes Schindelin via GitGitGadget
  2022-05-04 15:25     ` [PATCH v3 1/7] archive: optionally add "virtual" files Johannes Schindelin via GitGitGadget
                       ` (8 more replies)
  6 siblings, 9 replies; 140+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2022-05-04 15:25 UTC (permalink / raw)
  To: git
  Cc: René Scharfe, Taylor Blau, Derrick Stolee, Elijah Newren,
	Johannes Schindelin

Over the course of the years, we developed a sub-command that gathers
diagnostic data into a .zip file that can then be attached to bug reports.
This sub-command turned out to be very useful in helping Scalar developers
identify and fix issues.

Changes since v2:

 * Clarified in the commit message what the biggest benefit of
   --add-file-with-content is.
 * The <path> part of the -add-file-with-content argument can now contain
   colons. To do this, the path needs to start and end in double-quote
   characters (which are stripped), and the backslash serves as escape
   character in that case (to allow the path to contain both colons and
   double-quotes).
 * Fixed incorrect grammar.
 * Instead of strcmp(<what-we-don't-want>), we now say
   !strcmp(<what-we-want>).
 * The help text for --add-file-with-content was improved a tiny bit.
 * Adjusted the commit message that still talked about spawning plenty of
   processes and about a throw-away repository for the sake of generating a
   .zip file.
 * Simplified the code that shows the diagnostics and adds them to the .zip
   file.
 * The final message that reports that the archive is complete is now
   printed to stderr instead of stdout.

Changes since v1:

 * Instead of creating a throw-away repository, staging the contents of the
   .zip file and then using git write-tree and git archive to write the .zip
   file, the patch series now introduces a new option to git archive and
   uses write_archive() directly (avoiding any separate process).
 * Since the command avoids separate processes, it is now blazing fast on
   Windows, and I dropped the spinner() function because it's no longer
   needed.
 * While reworking the test case, I noticed that scalar [...] <enlistment>
   failed to verify that the specified directory exists, and would happily
   "traverse to its parent directory" on its quest to find a Scalar
   enlistment. That is of course incorrect, and has been fixed as a "while
   at it" sort of preparatory commit.
 * I had forgotten to sign off on all the commits, which has been fixed.
 * Instead of some "home-grown" readdir()-based function, the code now uses
   for_each_file_in_pack_dir() to look through the pack directories.
 * If any alternates are configured, their pack directories are now included
   in the output.
 * The commit message that might be interpreted to promise information about
   large loose files has been corrected to no longer promise that.
 * The test cases have been adjusted to test a little bit more (e.g.
   verifying that specific paths are mentioned in the output, instead of
   merely verifying that the output is non-empty).

Johannes Schindelin (5):
  archive: optionally add "virtual" files
  archive --add-file-with-contents: allow paths containing colons
  scalar: validate the optional enlistment argument
  Implement `scalar diagnose`
  scalar diagnose: include disk space information

Matthew John Cheetham (2):
  scalar: teach `diagnose` to gather packfile info
  scalar: teach `diagnose` to gather loose objects information

 Documentation/git-archive.txt    |  16 ++
 archive.c                        |  75 +++++++-
 contrib/scalar/scalar.c          | 289 ++++++++++++++++++++++++++++++-
 contrib/scalar/scalar.txt        |  12 ++
 contrib/scalar/t/t9099-scalar.sh |  27 +++
 t/t5003-archive-zip.sh           |  20 +++
 6 files changed, 429 insertions(+), 10 deletions(-)


base-commit: ddc35d833dd6f9e8946b09cecd3311b8aa18d295
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1128%2Fdscho%2Fscalar-diagnose-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1128/dscho/scalar-diagnose-v3
Pull-Request: https://github.com/gitgitgadget/git/pull/1128

Range-diff vs v2:

 1:  49ff3c1f2b3 ! 1:  45662cf582a archive: optionally add "virtual" files
     @@ Commit message
          archive` now supports use cases where relatively trivial files need to
          be added that do not exist on disk.
      
     +    This will allow us to generate `.zip` files with generated content,
     +    without having to add said content to the object database and without
     +    having to write it out to disk.
     +
          Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
      
       ## Documentation/git-archive.txt ##
     @@ Documentation/git-archive.txt: OPTIONS
      +	basename of <file>.
      ++
      +The `<path>` cannot contain any colon, the file mode is limited to
     -+a regular file, and the option may be subject platform-dependent
     ++a regular file, and the option may be subject to platform-dependent
      +command-line limits. For non-trivial cases, write an untracked file
      +and use `--add-file` instead.
      +
     @@ archive.c: static int add_file_cb(const struct option *opt, const char *arg, int
      -	if (!S_ISREG(info->stat.st_mode))
      -		die(_("Not a regular file: %s"), path);
      +
     -+	if (strcmp(opt->long_name, "add-file-with-content")) {
     ++	if (!strcmp(opt->long_name, "add-file")) {
      +		path = prefix_filename(args->prefix, arg);
      +		if (stat(path, &info->stat))
      +			die(_("File not found: %s"), path);
     @@ archive.c: static int parse_archive_args(int argc, const char **argv,
       		  N_("add untracked file to archive"), 0, add_file_cb,
       		  (intptr_t)&base },
      +		{ OPTION_CALLBACK, 0, "add-file-with-content", args,
     -+		  N_("file"), N_("add untracked file to archive"), 0,
     ++		  N_("path:content"), N_("add untracked file to archive"), 0,
      +		  add_file_cb, (intptr_t)&base },
       		OPT_STRING('o', "output", &output, N_("file"),
       			N_("write the archive to this file")),
 -:  ----------- > 2:  ce4b1b680c9 archive --add-file-with-contents: allow paths containing colons
 2:  600da8d465e = 3:  5a3eeb55409 scalar: validate the optional enlistment argument
 3:  0d570137bb6 ! 4:  dfe821d10fe Implement `scalar diagnose`
     @@ Commit message
          we had the luxury of a comprehensive standard library that includes
          basic functionality such as writing a `.zip` file. In the C version, we
          lack such a commodity. Rather than introducing a dependency on, say,
     -    libzip, we slightly abuse Git's `archive` command: Instead of writing
     -    the `.zip` file directly, we stage the file contents in a Git index of a
     -    temporary, bare repository, only to let `git archive` have at it, and
     -    finally removing the temporary repository.
     -
     -    Also note: Due to the frequently-spawned `git hash-object` processes,
     -    this command is quite a bit slow on Windows. Should it turn out to be a
     -    big problem, the lack of a batch mode of the `hash-object` command could
     -    potentially be worked around via using `git fast-import` with a crafted
     -    `stdin`.
     +    libzip, we slightly abuse Git's `archive` machinery: we write out a
     +    `.zip` of the empty try, augmented by a couple files that are added via
     +    the `--add-file*` options. We are careful trying not to modify the
     +    current repository in any way lest the very circumstances that required
     +    `scalar diagnose` to be run are changed by the `diagnose` run itself.
      
          Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
      
     @@ contrib/scalar/scalar.c: cleanup:
      +	time_t now = time(NULL);
      +	struct tm tm;
      +	struct strbuf path = STRBUF_INIT, buf = STRBUF_INIT;
     -+	size_t off;
      +	int res = 0;
      +
      +	argc = parse_options(argc, argv, NULL, options,
     @@ contrib/scalar/scalar.c: cleanup:
      +	strvec_pushl(&archiver_args, "scalar-diagnose", "--format=zip", NULL);
      +
      +	strbuf_reset(&buf);
     -+	strbuf_addstr(&buf,
     -+		      "--add-file-with-content=diagnostics.log:"
     -+		      "Collecting diagnostic info\n\n");
     ++	strbuf_addstr(&buf, "Collecting diagnostic info\n\n");
      +	get_version_info(&buf, 1);
      +
      +	strbuf_addf(&buf, "Enlistment root: %s\n", the_repository->worktree);
     -+	off = strchr(buf.buf, ':') + 1 - buf.buf;
     -+	write_or_die(stdout_fd, buf.buf + off, buf.len - off);
     -+	strvec_push(&archiver_args, buf.buf);
     ++	write_or_die(stdout_fd, buf.buf, buf.len);
     ++	strvec_pushf(&archiver_args,
     ++		     "--add-file-with-content=diagnostics.log:%.*s",
     ++		     (int)buf.len, buf.buf);
      +
      +	if ((res = add_directory_to_archiver(&archiver_args, ".git", 0)) ||
      +	    (res = add_directory_to_archiver(&archiver_args, ".git/hooks", 0)) ||
     @@ contrib/scalar/scalar.c: cleanup:
      +	}
      +
      +	if (!res)
     -+		printf("\n"
     ++		fprintf(stderr, "\n"
      +		       "Diagnostics complete.\n"
      +		       "All of the gathered info is captured in '%s'\n",
      +		       zip_path.buf);
 4:  938e38b5a09 ! 5:  bb162abd383 scalar diagnose: include disk space information
     @@ contrib/scalar/scalar.c: static int cmd_diagnose(int argc, const char **argv)
       
       	strbuf_addf(&buf, "Enlistment root: %s\n", the_repository->worktree);
      +	get_disk_info(&buf);
     - 	off = strchr(buf.buf, ':') + 1 - buf.buf;
     - 	write_or_die(stdout_fd, buf.buf + off, buf.len - off);
     - 	strvec_push(&archiver_args, buf.buf);
     + 	write_or_die(stdout_fd, buf.buf, buf.len);
     + 	strvec_pushf(&archiver_args,
     + 		     "--add-file-with-content=diagnostics.log:%.*s",
      
       ## contrib/scalar/t/t9099-scalar.sh ##
      @@ contrib/scalar/t/t9099-scalar.sh: SQ="'"
 5:  bd9428919fa ! 6:  32aaad7cce1 scalar: teach `diagnose` to gather packfile info
     @@ contrib/scalar/scalar.c: cleanup:
       {
       	struct option options[] = {
      @@ contrib/scalar/scalar.c: static int cmd_diagnose(int argc, const char **argv)
     - 	write_or_die(stdout_fd, buf.buf + off, buf.len - off);
     - 	strvec_push(&archiver_args, buf.buf);
     + 		     "--add-file-with-content=diagnostics.log:%.*s",
     + 		     (int)buf.len, buf.buf);
       
      +	strbuf_reset(&buf);
      +	strbuf_addstr(&buf, "--add-file-with-content=packs-local.txt:");
 6:  7a8875be425 = 7:  322932f0bb8 scalar: teach `diagnose` to gather loose objects information

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v3 1/7] archive: optionally add "virtual" files
  2022-05-04 15:25   ` [PATCH v3 0/7] scalar: implement the subcommand "diagnose" Johannes Schindelin via GitGitGadget
@ 2022-05-04 15:25     ` Johannes Schindelin via GitGitGadget
  2022-05-04 15:25     ` [PATCH v3 2/7] archive --add-file-with-contents: allow paths containing colons Johannes Schindelin via GitGitGadget
                       ` (7 subsequent siblings)
  8 siblings, 0 replies; 140+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2022-05-04 15:25 UTC (permalink / raw)
  To: git
  Cc: René Scharfe, Taylor Blau, Derrick Stolee, Elijah Newren,
	Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

With the `--add-file-with-content=<path>:<content>` option, `git
archive` now supports use cases where relatively trivial files need to
be added that do not exist on disk.

This will allow us to generate `.zip` files with generated content,
without having to add said content to the object database and without
having to write it out to disk.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 Documentation/git-archive.txt | 11 ++++++++
 archive.c                     | 51 +++++++++++++++++++++++++++++------
 t/t5003-archive-zip.sh        | 12 +++++++++
 3 files changed, 66 insertions(+), 8 deletions(-)

diff --git a/Documentation/git-archive.txt b/Documentation/git-archive.txt
index bc4e76a7834..a0edc9167b2 100644
--- a/Documentation/git-archive.txt
+++ b/Documentation/git-archive.txt
@@ -61,6 +61,17 @@ OPTIONS
 	by concatenating the value for `--prefix` (if any) and the
 	basename of <file>.
 
+--add-file-with-content=<path>:<content>::
+	Add the specified contents to the archive.  Can be repeated to add
+	multiple files.  The path of the file in the archive is built
+	by concatenating the value for `--prefix` (if any) and the
+	basename of <file>.
++
+The `<path>` cannot contain any colon, the file mode is limited to
+a regular file, and the option may be subject to platform-dependent
+command-line limits. For non-trivial cases, write an untracked file
+and use `--add-file` instead.
+
 --worktree-attributes::
 	Look for attributes in .gitattributes files in the working tree
 	as well (see <<ATTRIBUTES>>).
diff --git a/archive.c b/archive.c
index a3bbb091256..d798624cd5f 100644
--- a/archive.c
+++ b/archive.c
@@ -263,6 +263,7 @@ static int queue_or_write_archive_entry(const struct object_id *oid,
 struct extra_file_info {
 	char *base;
 	struct stat stat;
+	void *content;
 };
 
 int write_archive_entries(struct archiver_args *args,
@@ -337,7 +338,13 @@ int write_archive_entries(struct archiver_args *args,
 		strbuf_addstr(&path_in_archive, basename(path));
 
 		strbuf_reset(&content);
-		if (strbuf_read_file(&content, path, info->stat.st_size) < 0)
+		if (info->content)
+			err = write_entry(args, &fake_oid, path_in_archive.buf,
+					  path_in_archive.len,
+					  info->stat.st_mode,
+					  info->content, info->stat.st_size);
+		else if (strbuf_read_file(&content, path,
+					  info->stat.st_size) < 0)
 			err = error_errno(_("could not read '%s'"), path);
 		else
 			err = write_entry(args, &fake_oid, path_in_archive.buf,
@@ -493,6 +500,7 @@ static void extra_file_info_clear(void *util, const char *str)
 {
 	struct extra_file_info *info = util;
 	free(info->base);
+	free(info->content);
 	free(info);
 }
 
@@ -514,14 +522,38 @@ static int add_file_cb(const struct option *opt, const char *arg, int unset)
 	if (!arg)
 		return -1;
 
-	path = prefix_filename(args->prefix, arg);
-	item = string_list_append_nodup(&args->extra_files, path);
-	item->util = info = xmalloc(sizeof(*info));
+	info = xmalloc(sizeof(*info));
 	info->base = xstrdup_or_null(base);
-	if (stat(path, &info->stat))
-		die(_("File not found: %s"), path);
-	if (!S_ISREG(info->stat.st_mode))
-		die(_("Not a regular file: %s"), path);
+
+	if (!strcmp(opt->long_name, "add-file")) {
+		path = prefix_filename(args->prefix, arg);
+		if (stat(path, &info->stat))
+			die(_("File not found: %s"), path);
+		if (!S_ISREG(info->stat.st_mode))
+			die(_("Not a regular file: %s"), path);
+		info->content = NULL; /* read the file later */
+	} else {
+		const char *colon = strchr(arg, ':');
+		char *p;
+
+		if (!colon)
+			die(_("missing colon: '%s'"), arg);
+
+		p = xstrndup(arg, colon - arg);
+		if (!args->prefix)
+			path = p;
+		else {
+			path = prefix_filename(args->prefix, p);
+			free(p);
+		}
+		memset(&info->stat, 0, sizeof(info->stat));
+		info->stat.st_mode = S_IFREG | 0644;
+		info->content = xstrdup(colon + 1);
+		info->stat.st_size = strlen(info->content);
+	}
+	item = string_list_append_nodup(&args->extra_files, path);
+	item->util = info;
+
 	return 0;
 }
 
@@ -554,6 +586,9 @@ static int parse_archive_args(int argc, const char **argv,
 		{ OPTION_CALLBACK, 0, "add-file", args, N_("file"),
 		  N_("add untracked file to archive"), 0, add_file_cb,
 		  (intptr_t)&base },
+		{ OPTION_CALLBACK, 0, "add-file-with-content", args,
+		  N_("path:content"), N_("add untracked file to archive"), 0,
+		  add_file_cb, (intptr_t)&base },
 		OPT_STRING('o', "output", &output, N_("file"),
 			N_("write the archive to this file")),
 		OPT_BOOL(0, "worktree-attributes", &worktree_attributes,
diff --git a/t/t5003-archive-zip.sh b/t/t5003-archive-zip.sh
index 1e6d18b140e..8ff1257f1a0 100755
--- a/t/t5003-archive-zip.sh
+++ b/t/t5003-archive-zip.sh
@@ -206,6 +206,18 @@ test_expect_success 'git archive --format=zip --add-file' '
 check_zip with_untracked
 check_added with_untracked untracked untracked
 
+test_expect_success UNZIP 'git archive --format=zip --add-file-with-content' '
+	git archive --format=zip >with_file_with_content.zip \
+		--add-file-with-content=hello:world $EMPTY_TREE &&
+	test_when_finished "rm -rf tmp-unpack" &&
+	mkdir tmp-unpack && (
+		cd tmp-unpack &&
+		"$GIT_UNZIP" ../with_file_with_content.zip &&
+		test_path_is_file hello &&
+		test world = $(cat hello)
+	)
+'
+
 test_expect_success 'git archive --format=zip --add-file twice' '
 	echo untracked >untracked &&
 	git archive --format=zip --prefix=one/ --add-file=untracked \
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v3 2/7] archive --add-file-with-contents: allow paths containing colons
  2022-05-04 15:25   ` [PATCH v3 0/7] scalar: implement the subcommand "diagnose" Johannes Schindelin via GitGitGadget
  2022-05-04 15:25     ` [PATCH v3 1/7] archive: optionally add "virtual" files Johannes Schindelin via GitGitGadget
@ 2022-05-04 15:25     ` Johannes Schindelin via GitGitGadget
  2022-05-07  2:06       ` Elijah Newren
  2022-05-04 15:25     ` [PATCH v3 3/7] scalar: validate the optional enlistment argument Johannes Schindelin via GitGitGadget
                       ` (6 subsequent siblings)
  8 siblings, 1 reply; 140+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2022-05-04 15:25 UTC (permalink / raw)
  To: git
  Cc: René Scharfe, Taylor Blau, Derrick Stolee, Elijah Newren,
	Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

By allowing the path to be enclosed in double-quotes, we can avoid
the limitation that paths cannot contain colons.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 Documentation/git-archive.txt | 13 +++++++++----
 archive.c                     | 34 +++++++++++++++++++++++++++++-----
 t/t5003-archive-zip.sh        |  8 ++++++++
 3 files changed, 46 insertions(+), 9 deletions(-)

diff --git a/Documentation/git-archive.txt b/Documentation/git-archive.txt
index a0edc9167b2..1789ce4c232 100644
--- a/Documentation/git-archive.txt
+++ b/Documentation/git-archive.txt
@@ -67,10 +67,15 @@ OPTIONS
 	by concatenating the value for `--prefix` (if any) and the
 	basename of <file>.
 +
-The `<path>` cannot contain any colon, the file mode is limited to
-a regular file, and the option may be subject to platform-dependent
-command-line limits. For non-trivial cases, write an untracked file
-and use `--add-file` instead.
+The `<path>` argument can start and end with a literal double-quote
+character. In this case, the backslash is interpreted as escape
+character. The path must be quoted if it contains a colon, to avoid
+the colon from being misinterpreted as the separator between the
+path and the contents.
++
+The file mode is limited to a regular file, and the option may be
+subject to platform-dependent command-line limits. For non-trivial
+cases, write an untracked file and use `--add-file` instead.
 
 --worktree-attributes::
 	Look for attributes in .gitattributes files in the working tree
diff --git a/archive.c b/archive.c
index d798624cd5f..3b751027143 100644
--- a/archive.c
+++ b/archive.c
@@ -533,13 +533,37 @@ static int add_file_cb(const struct option *opt, const char *arg, int unset)
 			die(_("Not a regular file: %s"), path);
 		info->content = NULL; /* read the file later */
 	} else {
-		const char *colon = strchr(arg, ':');
 		char *p;
 
-		if (!colon)
-			die(_("missing colon: '%s'"), arg);
+		if (*arg != '"') {
+			const char *colon = strchr(arg, ':');
+
+			if (!colon)
+				die(_("missing colon: '%s'"), arg);
+			p = xstrndup(arg, colon - arg);
+			arg = colon + 1;
+		} else {
+			struct strbuf buf = STRBUF_INIT;
+			const char *orig = arg;
+
+			for (;;) {
+				if (!*(++arg))
+					die(_("unclosed quote: '%s'"), orig);
+				if (*arg == '"')
+					break;
+				if (*arg == '\\' && *(++arg) == '\0')
+					die(_("trailing backslash: '%s"), orig);
+				else
+					strbuf_addch(&buf, *arg);
+			}
+
+			if (*(++arg) != ':')
+				die(_("missing colon: '%s'"), orig);
+
+			p = strbuf_detach(&buf, NULL);
+			arg++;
+		}
 
-		p = xstrndup(arg, colon - arg);
 		if (!args->prefix)
 			path = p;
 		else {
@@ -548,7 +572,7 @@ static int add_file_cb(const struct option *opt, const char *arg, int unset)
 		}
 		memset(&info->stat, 0, sizeof(info->stat));
 		info->stat.st_mode = S_IFREG | 0644;
-		info->content = xstrdup(colon + 1);
+		info->content = xstrdup(arg);
 		info->stat.st_size = strlen(info->content);
 	}
 	item = string_list_append_nodup(&args->extra_files, path);
diff --git a/t/t5003-archive-zip.sh b/t/t5003-archive-zip.sh
index 8ff1257f1a0..5b8bbfc2692 100755
--- a/t/t5003-archive-zip.sh
+++ b/t/t5003-archive-zip.sh
@@ -207,13 +207,21 @@ check_zip with_untracked
 check_added with_untracked untracked untracked
 
 test_expect_success UNZIP 'git archive --format=zip --add-file-with-content' '
+	if test_have_prereq FUNNYNAMES
+	then
+		QUOTED=quoted:colon
+	else
+		QUOTED=quoted
+	fi &&
 	git archive --format=zip >with_file_with_content.zip \
+		--add-file-with-content=\"$QUOTED\": \
 		--add-file-with-content=hello:world $EMPTY_TREE &&
 	test_when_finished "rm -rf tmp-unpack" &&
 	mkdir tmp-unpack && (
 		cd tmp-unpack &&
 		"$GIT_UNZIP" ../with_file_with_content.zip &&
 		test_path_is_file hello &&
+		test_path_is_file $QUOTED &&
 		test world = $(cat hello)
 	)
 '
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v3 3/7] scalar: validate the optional enlistment argument
  2022-05-04 15:25   ` [PATCH v3 0/7] scalar: implement the subcommand "diagnose" Johannes Schindelin via GitGitGadget
  2022-05-04 15:25     ` [PATCH v3 1/7] archive: optionally add "virtual" files Johannes Schindelin via GitGitGadget
  2022-05-04 15:25     ` [PATCH v3 2/7] archive --add-file-with-contents: allow paths containing colons Johannes Schindelin via GitGitGadget
@ 2022-05-04 15:25     ` Johannes Schindelin via GitGitGadget
  2022-05-04 15:25     ` [PATCH v3 4/7] Implement `scalar diagnose` Johannes Schindelin via GitGitGadget
                       ` (5 subsequent siblings)
  8 siblings, 0 replies; 140+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2022-05-04 15:25 UTC (permalink / raw)
  To: git
  Cc: René Scharfe, Taylor Blau, Derrick Stolee, Elijah Newren,
	Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

The `scalar` command needs a Scalar enlistment for many subcommands, and
looks in the current directory for such an enlistment (traversing the
parent directories until it finds one).

These is subcommands can also be called with an optional argument
specifying the enlistment. Here, too, we traverse parent directories as
needed, until we find an enlistment.

However, if the specified directory does not even exist, or is not a
directory, we should stop right there, with an error message.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 contrib/scalar/scalar.c          | 6 ++++--
 contrib/scalar/t/t9099-scalar.sh | 5 +++++
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
index 1ce9c2b00e8..00dcd4b50ef 100644
--- a/contrib/scalar/scalar.c
+++ b/contrib/scalar/scalar.c
@@ -43,9 +43,11 @@ static void setup_enlistment_directory(int argc, const char **argv,
 		usage_with_options(usagestr, options);
 
 	/* find the worktree, determine its corresponding root */
-	if (argc == 1)
+	if (argc == 1) {
 		strbuf_add_absolute_path(&path, argv[0]);
-	else if (strbuf_getcwd(&path) < 0)
+		if (!is_directory(path.buf))
+			die(_("'%s' does not exist"), path.buf);
+	} else if (strbuf_getcwd(&path) < 0)
 		die(_("need a working directory"));
 
 	strbuf_trim_trailing_dir_sep(&path);
diff --git a/contrib/scalar/t/t9099-scalar.sh b/contrib/scalar/t/t9099-scalar.sh
index 2e1502ad45e..9d83fdf25e8 100755
--- a/contrib/scalar/t/t9099-scalar.sh
+++ b/contrib/scalar/t/t9099-scalar.sh
@@ -85,4 +85,9 @@ test_expect_success 'scalar delete with enlistment' '
 	test_path_is_missing cloned
 '
 
+test_expect_success '`scalar [...] <dir>` errors out when dir is missing' '
+	! scalar run config cloned 2>err &&
+	grep "cloned. does not exist" err
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v3 4/7] Implement `scalar diagnose`
  2022-05-04 15:25   ` [PATCH v3 0/7] scalar: implement the subcommand "diagnose" Johannes Schindelin via GitGitGadget
                       ` (2 preceding siblings ...)
  2022-05-04 15:25     ` [PATCH v3 3/7] scalar: validate the optional enlistment argument Johannes Schindelin via GitGitGadget
@ 2022-05-04 15:25     ` Johannes Schindelin via GitGitGadget
  2022-05-04 15:25     ` [PATCH v3 5/7] scalar diagnose: include disk space information Johannes Schindelin via GitGitGadget
                       ` (4 subsequent siblings)
  8 siblings, 0 replies; 140+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2022-05-04 15:25 UTC (permalink / raw)
  To: git
  Cc: René Scharfe, Taylor Blau, Derrick Stolee, Elijah Newren,
	Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

Over the course of Scalar's development, it became obvious that there is
a need for a command that can gather all kinds of useful information
that can help identify the most typical problems with large
worktrees/repositories.

The `diagnose` command is the culmination of this hard-won knowledge: it
gathers the installed hooks, the config, a couple statistics describing
the data shape, among other pieces of information, and then wraps
everything up in a tidy, neat `.zip` archive.

Note: originally, Scalar was implemented in C# using the .NET API, where
we had the luxury of a comprehensive standard library that includes
basic functionality such as writing a `.zip` file. In the C version, we
lack such a commodity. Rather than introducing a dependency on, say,
libzip, we slightly abuse Git's `archive` machinery: we write out a
`.zip` of the empty try, augmented by a couple files that are added via
the `--add-file*` options. We are careful trying not to modify the
current repository in any way lest the very circumstances that required
`scalar diagnose` to be run are changed by the `diagnose` run itself.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 contrib/scalar/scalar.c          | 141 +++++++++++++++++++++++++++++++
 contrib/scalar/scalar.txt        |  12 +++
 contrib/scalar/t/t9099-scalar.sh |  14 +++
 3 files changed, 167 insertions(+)

diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
index 00dcd4b50ef..a290e52e1d2 100644
--- a/contrib/scalar/scalar.c
+++ b/contrib/scalar/scalar.c
@@ -11,6 +11,7 @@
 #include "dir.h"
 #include "packfile.h"
 #include "help.h"
+#include "archive.h"
 
 /*
  * Remove the deepest subdirectory in the provided path string. Path must not
@@ -261,6 +262,44 @@ static int unregister_dir(void)
 	return res;
 }
 
+static int add_directory_to_archiver(struct strvec *archiver_args,
+					  const char *path, int recurse)
+{
+	int at_root = !*path;
+	DIR *dir = opendir(at_root ? "." : path);
+	struct dirent *e;
+	struct strbuf buf = STRBUF_INIT;
+	size_t len;
+	int res = 0;
+
+	if (!dir)
+		return error(_("could not open directory '%s'"), path);
+
+	if (!at_root)
+		strbuf_addf(&buf, "%s/", path);
+	len = buf.len;
+	strvec_pushf(archiver_args, "--prefix=%s", buf.buf);
+
+	while (!res && (e = readdir(dir))) {
+		if (!strcmp(".", e->d_name) || !strcmp("..", e->d_name))
+			continue;
+
+		strbuf_setlen(&buf, len);
+		strbuf_addstr(&buf, e->d_name);
+
+		if (e->d_type == DT_REG)
+			strvec_pushf(archiver_args, "--add-file=%s", buf.buf);
+		else if (e->d_type != DT_DIR)
+			res = -1;
+		else if (recurse)
+		     add_directory_to_archiver(archiver_args, buf.buf, recurse);
+	}
+
+	closedir(dir);
+	strbuf_release(&buf);
+	return res;
+}
+
 /* printf-style interface, expects `<key>=<value>` argument */
 static int set_config(const char *fmt, ...)
 {
@@ -501,6 +540,107 @@ cleanup:
 	return res;
 }
 
+static int cmd_diagnose(int argc, const char **argv)
+{
+	struct option options[] = {
+		OPT_END(),
+	};
+	const char * const usage[] = {
+		N_("scalar diagnose [<enlistment>]"),
+		NULL
+	};
+	struct strbuf zip_path = STRBUF_INIT;
+	struct strvec archiver_args = STRVEC_INIT;
+	char **argv_copy = NULL;
+	int stdout_fd = -1, archiver_fd = -1;
+	time_t now = time(NULL);
+	struct tm tm;
+	struct strbuf path = STRBUF_INIT, buf = STRBUF_INIT;
+	int res = 0;
+
+	argc = parse_options(argc, argv, NULL, options,
+			     usage, 0);
+
+	setup_enlistment_directory(argc, argv, usage, options, &zip_path);
+
+	strbuf_addstr(&zip_path, "/.scalarDiagnostics/scalar_");
+	strbuf_addftime(&zip_path,
+			"%Y%m%d_%H%M%S", localtime_r(&now, &tm), 0, 0);
+	strbuf_addstr(&zip_path, ".zip");
+	switch (safe_create_leading_directories(zip_path.buf)) {
+	case SCLD_EXISTS:
+	case SCLD_OK:
+		break;
+	default:
+		error_errno(_("could not create directory for '%s'"),
+			    zip_path.buf);
+		goto diagnose_cleanup;
+	}
+	stdout_fd = dup(1);
+	if (stdout_fd < 0) {
+		res = error_errno(_("could not duplicate stdout"));
+		goto diagnose_cleanup;
+	}
+
+	archiver_fd = xopen(zip_path.buf, O_CREAT | O_WRONLY | O_TRUNC, 0666);
+	if (archiver_fd < 0 || dup2(archiver_fd, 1) < 0) {
+		res = error_errno(_("could not redirect output"));
+		goto diagnose_cleanup;
+	}
+
+	init_zip_archiver();
+	strvec_pushl(&archiver_args, "scalar-diagnose", "--format=zip", NULL);
+
+	strbuf_reset(&buf);
+	strbuf_addstr(&buf, "Collecting diagnostic info\n\n");
+	get_version_info(&buf, 1);
+
+	strbuf_addf(&buf, "Enlistment root: %s\n", the_repository->worktree);
+	write_or_die(stdout_fd, buf.buf, buf.len);
+	strvec_pushf(&archiver_args,
+		     "--add-file-with-content=diagnostics.log:%.*s",
+		     (int)buf.len, buf.buf);
+
+	if ((res = add_directory_to_archiver(&archiver_args, ".git", 0)) ||
+	    (res = add_directory_to_archiver(&archiver_args, ".git/hooks", 0)) ||
+	    (res = add_directory_to_archiver(&archiver_args, ".git/info", 0)) ||
+	    (res = add_directory_to_archiver(&archiver_args, ".git/logs", 1)) ||
+	    (res = add_directory_to_archiver(&archiver_args, ".git/objects/info", 0)))
+		goto diagnose_cleanup;
+
+	strvec_pushl(&archiver_args, "--prefix=",
+		     oid_to_hex(the_hash_algo->empty_tree), "--", NULL);
+
+	/* `write_archive()` modifies the `argv` passed to it. Let it. */
+	argv_copy = xmemdupz(archiver_args.v,
+			     sizeof(char *) * archiver_args.nr);
+	res = write_archive(archiver_args.nr, (const char **)argv_copy, NULL,
+			    the_repository, NULL, 0);
+	if (res) {
+		error(_("failed to write archive"));
+		goto diagnose_cleanup;
+	}
+
+	if (!res)
+		fprintf(stderr, "\n"
+		       "Diagnostics complete.\n"
+		       "All of the gathered info is captured in '%s'\n",
+		       zip_path.buf);
+
+diagnose_cleanup:
+	if (archiver_fd >= 0) {
+		close(1);
+		dup2(stdout_fd, 1);
+	}
+	free(argv_copy);
+	strvec_clear(&archiver_args);
+	strbuf_release(&zip_path);
+	strbuf_release(&path);
+	strbuf_release(&buf);
+
+	return res;
+}
+
 static int cmd_list(int argc, const char **argv)
 {
 	if (argc != 1)
@@ -802,6 +942,7 @@ static struct {
 	{ "reconfigure", cmd_reconfigure },
 	{ "delete", cmd_delete },
 	{ "version", cmd_version },
+	{ "diagnose", cmd_diagnose },
 	{ NULL, NULL},
 };
 
diff --git a/contrib/scalar/scalar.txt b/contrib/scalar/scalar.txt
index f416d637289..22583fe046e 100644
--- a/contrib/scalar/scalar.txt
+++ b/contrib/scalar/scalar.txt
@@ -14,6 +14,7 @@ scalar register [<enlistment>]
 scalar unregister [<enlistment>]
 scalar run ( all | config | commit-graph | fetch | loose-objects | pack-files ) [<enlistment>]
 scalar reconfigure [ --all | <enlistment> ]
+scalar diagnose [<enlistment>]
 scalar delete <enlistment>
 
 DESCRIPTION
@@ -129,6 +130,17 @@ reconfigure the enlistment.
 With the `--all` option, all enlistments currently registered with Scalar
 will be reconfigured. Use this option after each Scalar upgrade.
 
+Diagnose
+~~~~~~~~
+
+diagnose [<enlistment>]::
+    When reporting issues with Scalar, it is often helpful to provide the
+    information gathered by this command, including logs and certain
+    statistics describing the data shape of the current enlistment.
++
+The output of this command is a `.zip` file that is written into
+a directory adjacent to the worktree in the `src` directory.
+
 Delete
 ~~~~~~
 
diff --git a/contrib/scalar/t/t9099-scalar.sh b/contrib/scalar/t/t9099-scalar.sh
index 9d83fdf25e8..bbd07a44426 100755
--- a/contrib/scalar/t/t9099-scalar.sh
+++ b/contrib/scalar/t/t9099-scalar.sh
@@ -90,4 +90,18 @@ test_expect_success '`scalar [...] <dir>` errors out when dir is missing' '
 	grep "cloned. does not exist" err
 '
 
+SQ="'"
+test_expect_success UNZIP 'scalar diagnose' '
+	scalar clone "file://$(pwd)" cloned --single-branch &&
+	scalar diagnose cloned >out &&
+	sed -n "s/.*$SQ\\(.*\\.zip\\)$SQ.*/\\1/p" <out >zip_path &&
+	zip_path=$(cat zip_path) &&
+	test -n "$zip_path" &&
+	unzip -v "$zip_path" &&
+	folder=${zip_path%.zip} &&
+	test_path_is_missing "$folder" &&
+	unzip -p "$zip_path" diagnostics.log >out &&
+	test_file_not_empty out
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v3 5/7] scalar diagnose: include disk space information
  2022-05-04 15:25   ` [PATCH v3 0/7] scalar: implement the subcommand "diagnose" Johannes Schindelin via GitGitGadget
                       ` (3 preceding siblings ...)
  2022-05-04 15:25     ` [PATCH v3 4/7] Implement `scalar diagnose` Johannes Schindelin via GitGitGadget
@ 2022-05-04 15:25     ` Johannes Schindelin via GitGitGadget
  2022-05-04 15:25     ` [PATCH v3 6/7] scalar: teach `diagnose` to gather packfile info Matthew John Cheetham via GitGitGadget
                       ` (3 subsequent siblings)
  8 siblings, 0 replies; 140+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2022-05-04 15:25 UTC (permalink / raw)
  To: git
  Cc: René Scharfe, Taylor Blau, Derrick Stolee, Elijah Newren,
	Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

When analyzing problems with large worktrees/repositories, it is useful
to know how close to a "full disk" situation Scalar/Git operates. Let's
include this information.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 contrib/scalar/scalar.c          | 53 ++++++++++++++++++++++++++++++++
 contrib/scalar/t/t9099-scalar.sh |  1 +
 2 files changed, 54 insertions(+)

diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
index a290e52e1d2..df44902c909 100644
--- a/contrib/scalar/scalar.c
+++ b/contrib/scalar/scalar.c
@@ -300,6 +300,58 @@ static int add_directory_to_archiver(struct strvec *archiver_args,
 	return res;
 }
 
+#ifndef WIN32
+#include <sys/statvfs.h>
+#endif
+
+static int get_disk_info(struct strbuf *out)
+{
+#ifdef WIN32
+	struct strbuf buf = STRBUF_INIT;
+	char volume_name[MAX_PATH], fs_name[MAX_PATH];
+	DWORD serial_number, component_length, flags;
+	ULARGE_INTEGER avail2caller, total, avail;
+
+	strbuf_realpath(&buf, ".", 1);
+	if (!GetDiskFreeSpaceExA(buf.buf, &avail2caller, &total, &avail)) {
+		error(_("could not determine free disk size for '%s'"),
+		      buf.buf);
+		strbuf_release(&buf);
+		return -1;
+	}
+
+	strbuf_setlen(&buf, offset_1st_component(buf.buf));
+	if (!GetVolumeInformationA(buf.buf, volume_name, sizeof(volume_name),
+				   &serial_number, &component_length, &flags,
+				   fs_name, sizeof(fs_name))) {
+		error(_("could not get info for '%s'"), buf.buf);
+		strbuf_release(&buf);
+		return -1;
+	}
+	strbuf_addf(out, "Available space on '%s': ", buf.buf);
+	strbuf_humanise_bytes(out, avail2caller.QuadPart);
+	strbuf_addch(out, '\n');
+	strbuf_release(&buf);
+#else
+	struct strbuf buf = STRBUF_INIT;
+	struct statvfs stat;
+
+	strbuf_realpath(&buf, ".", 1);
+	if (statvfs(buf.buf, &stat) < 0) {
+		error_errno(_("could not determine free disk size for '%s'"),
+			    buf.buf);
+		strbuf_release(&buf);
+		return -1;
+	}
+
+	strbuf_addf(out, "Available space on '%s': ", buf.buf);
+	strbuf_humanise_bytes(out, st_mult(stat.f_bsize, stat.f_bavail));
+	strbuf_addf(out, " (mount flags 0x%lx)\n", stat.f_flag);
+	strbuf_release(&buf);
+#endif
+	return 0;
+}
+
 /* printf-style interface, expects `<key>=<value>` argument */
 static int set_config(const char *fmt, ...)
 {
@@ -596,6 +648,7 @@ static int cmd_diagnose(int argc, const char **argv)
 	get_version_info(&buf, 1);
 
 	strbuf_addf(&buf, "Enlistment root: %s\n", the_repository->worktree);
+	get_disk_info(&buf);
 	write_or_die(stdout_fd, buf.buf, buf.len);
 	strvec_pushf(&archiver_args,
 		     "--add-file-with-content=diagnostics.log:%.*s",
diff --git a/contrib/scalar/t/t9099-scalar.sh b/contrib/scalar/t/t9099-scalar.sh
index bbd07a44426..f3d037823c8 100755
--- a/contrib/scalar/t/t9099-scalar.sh
+++ b/contrib/scalar/t/t9099-scalar.sh
@@ -94,6 +94,7 @@ SQ="'"
 test_expect_success UNZIP 'scalar diagnose' '
 	scalar clone "file://$(pwd)" cloned --single-branch &&
 	scalar diagnose cloned >out &&
+	grep "Available space" out &&
 	sed -n "s/.*$SQ\\(.*\\.zip\\)$SQ.*/\\1/p" <out >zip_path &&
 	zip_path=$(cat zip_path) &&
 	test -n "$zip_path" &&
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v3 6/7] scalar: teach `diagnose` to gather packfile info
  2022-05-04 15:25   ` [PATCH v3 0/7] scalar: implement the subcommand "diagnose" Johannes Schindelin via GitGitGadget
                       ` (4 preceding siblings ...)
  2022-05-04 15:25     ` [PATCH v3 5/7] scalar diagnose: include disk space information Johannes Schindelin via GitGitGadget
@ 2022-05-04 15:25     ` Matthew John Cheetham via GitGitGadget
  2022-05-04 15:25     ` [PATCH v3 7/7] scalar: teach `diagnose` to gather loose objects information Matthew John Cheetham via GitGitGadget
                       ` (2 subsequent siblings)
  8 siblings, 0 replies; 140+ messages in thread
From: Matthew John Cheetham via GitGitGadget @ 2022-05-04 15:25 UTC (permalink / raw)
  To: git
  Cc: René Scharfe, Taylor Blau, Derrick Stolee, Elijah Newren,
	Johannes Schindelin, Matthew John Cheetham

From: Matthew John Cheetham <mjcheetham@outlook.com>

It's helpful to see if there are other crud files in the pack
directory. Let's teach the `scalar diagnose` command to gather
file size information about pack files.

While at it, also enumerate the pack files in the alternate
object directories, if any are registered.

Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 contrib/scalar/scalar.c          | 30 ++++++++++++++++++++++++++++++
 contrib/scalar/t/t9099-scalar.sh |  6 +++++-
 2 files changed, 35 insertions(+), 1 deletion(-)

diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
index df44902c909..9adde8cf4b9 100644
--- a/contrib/scalar/scalar.c
+++ b/contrib/scalar/scalar.c
@@ -12,6 +12,7 @@
 #include "packfile.h"
 #include "help.h"
 #include "archive.h"
+#include "object-store.h"
 
 /*
  * Remove the deepest subdirectory in the provided path string. Path must not
@@ -592,6 +593,29 @@ cleanup:
 	return res;
 }
 
+static void dir_file_stats_objects(const char *full_path, size_t full_path_len,
+				   const char *file_name, void *data)
+{
+	struct strbuf *buf = data;
+	struct stat st;
+
+	if (!stat(full_path, &st))
+		strbuf_addf(buf, "%-70s %16" PRIuMAX "\n", file_name,
+			    (uintmax_t)st.st_size);
+}
+
+static int dir_file_stats(struct object_directory *object_dir, void *data)
+{
+	struct strbuf *buf = data;
+
+	strbuf_addf(buf, "Contents of %s:\n", object_dir->path);
+
+	for_each_file_in_pack_dir(object_dir->path, dir_file_stats_objects,
+				  data);
+
+	return 0;
+}
+
 static int cmd_diagnose(int argc, const char **argv)
 {
 	struct option options[] = {
@@ -654,6 +678,12 @@ static int cmd_diagnose(int argc, const char **argv)
 		     "--add-file-with-content=diagnostics.log:%.*s",
 		     (int)buf.len, buf.buf);
 
+	strbuf_reset(&buf);
+	strbuf_addstr(&buf, "--add-file-with-content=packs-local.txt:");
+	dir_file_stats(the_repository->objects->odb, &buf);
+	foreach_alt_odb(dir_file_stats, &buf);
+	strvec_push(&archiver_args, buf.buf);
+
 	if ((res = add_directory_to_archiver(&archiver_args, ".git", 0)) ||
 	    (res = add_directory_to_archiver(&archiver_args, ".git/hooks", 0)) ||
 	    (res = add_directory_to_archiver(&archiver_args, ".git/info", 0)) ||
diff --git a/contrib/scalar/t/t9099-scalar.sh b/contrib/scalar/t/t9099-scalar.sh
index f3d037823c8..e049221609d 100755
--- a/contrib/scalar/t/t9099-scalar.sh
+++ b/contrib/scalar/t/t9099-scalar.sh
@@ -93,6 +93,8 @@ test_expect_success '`scalar [...] <dir>` errors out when dir is missing' '
 SQ="'"
 test_expect_success UNZIP 'scalar diagnose' '
 	scalar clone "file://$(pwd)" cloned --single-branch &&
+	git repack &&
+	echo "$(pwd)/.git/objects/" >>cloned/src/.git/objects/info/alternates &&
 	scalar diagnose cloned >out &&
 	grep "Available space" out &&
 	sed -n "s/.*$SQ\\(.*\\.zip\\)$SQ.*/\\1/p" <out >zip_path &&
@@ -102,7 +104,9 @@ test_expect_success UNZIP 'scalar diagnose' '
 	folder=${zip_path%.zip} &&
 	test_path_is_missing "$folder" &&
 	unzip -p "$zip_path" diagnostics.log >out &&
-	test_file_not_empty out
+	test_file_not_empty out &&
+	unzip -p "$zip_path" packs-local.txt >out &&
+	grep "$(pwd)/.git/objects" out
 '
 
 test_done
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v3 7/7] scalar: teach `diagnose` to gather loose objects information
  2022-05-04 15:25   ` [PATCH v3 0/7] scalar: implement the subcommand "diagnose" Johannes Schindelin via GitGitGadget
                       ` (5 preceding siblings ...)
  2022-05-04 15:25     ` [PATCH v3 6/7] scalar: teach `diagnose` to gather packfile info Matthew John Cheetham via GitGitGadget
@ 2022-05-04 15:25     ` Matthew John Cheetham via GitGitGadget
  2022-05-07  2:23     ` [PATCH v3 0/7] scalar: implement the subcommand "diagnose" Elijah Newren
  2022-05-10 19:26     ` [PATCH v4 " Johannes Schindelin via GitGitGadget
  8 siblings, 0 replies; 140+ messages in thread
From: Matthew John Cheetham via GitGitGadget @ 2022-05-04 15:25 UTC (permalink / raw)
  To: git
  Cc: René Scharfe, Taylor Blau, Derrick Stolee, Elijah Newren,
	Johannes Schindelin, Matthew John Cheetham

From: Matthew John Cheetham <mjcheetham@outlook.com>

When operating at the scale that Scalar wants to support, certain data
shapes are more likely to cause undesirable performance issues, such as
large numbers of loose objects.

By including statistics about this, `scalar diagnose` now makes it
easier to identify such scenarios.

Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 contrib/scalar/scalar.c          | 59 ++++++++++++++++++++++++++++++++
 contrib/scalar/t/t9099-scalar.sh |  5 ++-
 2 files changed, 63 insertions(+), 1 deletion(-)

diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
index 9adde8cf4b9..f2fe3858eca 100644
--- a/contrib/scalar/scalar.c
+++ b/contrib/scalar/scalar.c
@@ -616,6 +616,60 @@ static int dir_file_stats(struct object_directory *object_dir, void *data)
 	return 0;
 }
 
+static int count_files(char *path)
+{
+	DIR *dir = opendir(path);
+	struct dirent *e;
+	int count = 0;
+
+	if (!dir)
+		return 0;
+
+	while ((e = readdir(dir)) != NULL)
+		if (!is_dot_or_dotdot(e->d_name) && e->d_type == DT_REG)
+			count++;
+
+	closedir(dir);
+	return count;
+}
+
+static void loose_objs_stats(struct strbuf *buf, const char *path)
+{
+	DIR *dir = opendir(path);
+	struct dirent *e;
+	int count;
+	int total = 0;
+	unsigned char c;
+	struct strbuf count_path = STRBUF_INIT;
+	size_t base_path_len;
+
+	if (!dir)
+		return;
+
+	strbuf_addstr(buf, "Object directory stats for ");
+	strbuf_add_absolute_path(buf, path);
+	strbuf_addstr(buf, ":\n");
+
+	strbuf_add_absolute_path(&count_path, path);
+	strbuf_addch(&count_path, '/');
+	base_path_len = count_path.len;
+
+	while ((e = readdir(dir)) != NULL)
+		if (!is_dot_or_dotdot(e->d_name) &&
+		    e->d_type == DT_DIR && strlen(e->d_name) == 2 &&
+		    !hex_to_bytes(&c, e->d_name, 1)) {
+			strbuf_setlen(&count_path, base_path_len);
+			strbuf_addstr(&count_path, e->d_name);
+			total += (count = count_files(count_path.buf));
+			strbuf_addf(buf, "%s : %7d files\n", e->d_name, count);
+		}
+
+	strbuf_addf(buf, "Total: %d loose objects", total);
+
+	strbuf_release(&count_path);
+	closedir(dir);
+}
+
 static int cmd_diagnose(int argc, const char **argv)
 {
 	struct option options[] = {
@@ -684,6 +738,11 @@ static int cmd_diagnose(int argc, const char **argv)
 	foreach_alt_odb(dir_file_stats, &buf);
 	strvec_push(&archiver_args, buf.buf);
 
+	strbuf_reset(&buf);
+	strbuf_addstr(&buf, "--add-file-with-content=objects-local.txt:");
+	loose_objs_stats(&buf, ".git/objects");
+	strvec_push(&archiver_args, buf.buf);
+
 	if ((res = add_directory_to_archiver(&archiver_args, ".git", 0)) ||
 	    (res = add_directory_to_archiver(&archiver_args, ".git/hooks", 0)) ||
 	    (res = add_directory_to_archiver(&archiver_args, ".git/info", 0)) ||
diff --git a/contrib/scalar/t/t9099-scalar.sh b/contrib/scalar/t/t9099-scalar.sh
index e049221609d..9b4eedbb0aa 100755
--- a/contrib/scalar/t/t9099-scalar.sh
+++ b/contrib/scalar/t/t9099-scalar.sh
@@ -95,6 +95,7 @@ test_expect_success UNZIP 'scalar diagnose' '
 	scalar clone "file://$(pwd)" cloned --single-branch &&
 	git repack &&
 	echo "$(pwd)/.git/objects/" >>cloned/src/.git/objects/info/alternates &&
+	test_commit -C cloned/src loose &&
 	scalar diagnose cloned >out &&
 	grep "Available space" out &&
 	sed -n "s/.*$SQ\\(.*\\.zip\\)$SQ.*/\\1/p" <out >zip_path &&
@@ -106,7 +107,9 @@ test_expect_success UNZIP 'scalar diagnose' '
 	unzip -p "$zip_path" diagnostics.log >out &&
 	test_file_not_empty out &&
 	unzip -p "$zip_path" packs-local.txt >out &&
-	grep "$(pwd)/.git/objects" out
+	grep "$(pwd)/.git/objects" out &&
+	unzip -p "$zip_path" objects-local.txt >out &&
+	grep "^Total: [1-9]" out
 '
 
 test_done
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v3 2/7] archive --add-file-with-contents: allow paths containing colons
  2022-05-04 15:25     ` [PATCH v3 2/7] archive --add-file-with-contents: allow paths containing colons Johannes Schindelin via GitGitGadget
@ 2022-05-07  2:06       ` Elijah Newren
  2022-05-09 21:04         ` Johannes Schindelin
  0 siblings, 1 reply; 140+ messages in thread
From: Elijah Newren @ 2022-05-07  2:06 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget
  Cc: Git Mailing List, René Scharfe, Taylor Blau, Derrick Stolee,
	Johannes Schindelin

On Wed, May 4, 2022 at 8:25 AM Johannes Schindelin via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>
> By allowing the path to be enclosed in double-quotes, we can avoid
> the limitation that paths cannot contain colons.
>
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
>  Documentation/git-archive.txt | 13 +++++++++----
>  archive.c                     | 34 +++++++++++++++++++++++++++++-----
>  t/t5003-archive-zip.sh        |  8 ++++++++
>  3 files changed, 46 insertions(+), 9 deletions(-)
>
> diff --git a/Documentation/git-archive.txt b/Documentation/git-archive.txt
> index a0edc9167b2..1789ce4c232 100644
> --- a/Documentation/git-archive.txt
> +++ b/Documentation/git-archive.txt
> @@ -67,10 +67,15 @@ OPTIONS
>         by concatenating the value for `--prefix` (if any) and the
>         basename of <file>.
>  +
> -The `<path>` cannot contain any colon, the file mode is limited to
> -a regular file, and the option may be subject to platform-dependent
> -command-line limits. For non-trivial cases, write an untracked file
> -and use `--add-file` instead.
> +The `<path>` argument can start and end with a literal double-quote
> +character. In this case, the backslash is interpreted as escape
> +character. The path must be quoted if it contains a colon, to avoid
> +the colon from being misinterpreted as the separator between the
> +path and the contents.

The path must also be quoted if it begins or ends with a double-quote, right?

Also, would people want to be able to pass a pathname from the output
of e.g. `git ls-files -o`, which may quote additional characters?

> ++
> +The file mode is limited to a regular file, and the option may be
> +subject to platform-dependent command-line limits. For non-trivial
> +cases, write an untracked file and use `--add-file` instead.
>
>  --worktree-attributes::
>         Look for attributes in .gitattributes files in the working tree
> diff --git a/archive.c b/archive.c
> index d798624cd5f..3b751027143 100644
> --- a/archive.c
> +++ b/archive.c
> @@ -533,13 +533,37 @@ static int add_file_cb(const struct option *opt, const char *arg, int unset)
>                         die(_("Not a regular file: %s"), path);
>                 info->content = NULL; /* read the file later */
>         } else {
> -               const char *colon = strchr(arg, ':');
>                 char *p;
>
> -               if (!colon)
> -                       die(_("missing colon: '%s'"), arg);
> +               if (*arg != '"') {
> +                       const char *colon = strchr(arg, ':');
> +
> +                       if (!colon)
> +                               die(_("missing colon: '%s'"), arg);
> +                       p = xstrndup(arg, colon - arg);
> +                       arg = colon + 1;
> +               } else {
> +                       struct strbuf buf = STRBUF_INIT;
> +                       const char *orig = arg;
> +
> +                       for (;;) {
> +                               if (!*(++arg))
> +                                       die(_("unclosed quote: '%s'"), orig);
> +                               if (*arg == '"')
> +                                       break;
> +                               if (*arg == '\\' && *(++arg) == '\0')
> +                                       die(_("trailing backslash: '%s"), orig);
> +                               else
> +                                       strbuf_addch(&buf, *arg);
> +                       }
> +
> +                       if (*(++arg) != ':')
> +                               die(_("missing colon: '%s'"), orig);
> +
> +                       p = strbuf_detach(&buf, NULL);
> +                       arg++;
> +               }

Should we use unquote_c_style() here instead of rolling another parser
to do unquoting?  That would have the added benefit of allowing people
to use filenames from the output of various git commands that do
special quoting -- such as octal sequences for non-ascii characters.

>
> -               p = xstrndup(arg, colon - arg);
>                 if (!args->prefix)
>                         path = p;
>                 else {
> @@ -548,7 +572,7 @@ static int add_file_cb(const struct option *opt, const char *arg, int unset)
>                 }
>                 memset(&info->stat, 0, sizeof(info->stat));
>                 info->stat.st_mode = S_IFREG | 0644;
> -               info->content = xstrdup(colon + 1);
> +               info->content = xstrdup(arg);
>                 info->stat.st_size = strlen(info->content);
>         }
>         item = string_list_append_nodup(&args->extra_files, path);
> diff --git a/t/t5003-archive-zip.sh b/t/t5003-archive-zip.sh
> index 8ff1257f1a0..5b8bbfc2692 100755
> --- a/t/t5003-archive-zip.sh
> +++ b/t/t5003-archive-zip.sh
> @@ -207,13 +207,21 @@ check_zip with_untracked
>  check_added with_untracked untracked untracked
>
>  test_expect_success UNZIP 'git archive --format=zip --add-file-with-content' '
> +       if test_have_prereq FUNNYNAMES
> +       then
> +               QUOTED=quoted:colon
> +       else
> +               QUOTED=quoted
> +       fi &&
>         git archive --format=zip >with_file_with_content.zip \
> +               --add-file-with-content=\"$QUOTED\": \
>                 --add-file-with-content=hello:world $EMPTY_TREE &&
>         test_when_finished "rm -rf tmp-unpack" &&
>         mkdir tmp-unpack && (
>                 cd tmp-unpack &&
>                 "$GIT_UNZIP" ../with_file_with_content.zip &&
>                 test_path_is_file hello &&
> +               test_path_is_file $QUOTED &&
>                 test world = $(cat hello)
>         )
>  '
> --
> gitgitgadget

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v3 0/7] scalar: implement the subcommand "diagnose"
  2022-05-04 15:25   ` [PATCH v3 0/7] scalar: implement the subcommand "diagnose" Johannes Schindelin via GitGitGadget
                       ` (6 preceding siblings ...)
  2022-05-04 15:25     ` [PATCH v3 7/7] scalar: teach `diagnose` to gather loose objects information Matthew John Cheetham via GitGitGadget
@ 2022-05-07  2:23     ` Elijah Newren
  2022-05-10 19:26     ` [PATCH v4 " Johannes Schindelin via GitGitGadget
  8 siblings, 0 replies; 140+ messages in thread
From: Elijah Newren @ 2022-05-07  2:23 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget
  Cc: Git Mailing List, René Scharfe, Taylor Blau, Derrick Stolee,
	Johannes Schindelin

On Wed, May 4, 2022 at 8:25 AM Johannes Schindelin via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> Over the course of the years, we developed a sub-command that gathers
> diagnostic data into a .zip file that can then be attached to bug reports.
> This sub-command turned out to be very useful in helping Scalar developers
> identify and fix issues.
>
> Changes since v2:
>
>  * Clarified in the commit message what the biggest benefit of
>    --add-file-with-content is.
>  * The <path> part of the -add-file-with-content argument can now contain
>    colons. To do this, the path needs to start and end in double-quote
>    characters (which are stripped), and the backslash serves as escape
>    character in that case (to allow the path to contain both colons and
>    double-quotes).

You addressed all my previous feedback from an earlier round.  The
only thing I noticed in this round is I wonder if we should use
unquote_c_style() for this, as commented on the patch in question.

>  * Fixed incorrect grammar.
>  * Instead of strcmp(<what-we-don't-want>), we now say
>    !strcmp(<what-we-want>).
>  * The help text for --add-file-with-content was improved a tiny bit.
>  * Adjusted the commit message that still talked about spawning plenty of
>    processes and about a throw-away repository for the sake of generating a
>    .zip file.
>  * Simplified the code that shows the diagnostics and adds them to the .zip
>    file.
>  * The final message that reports that the archive is complete is now
>    printed to stderr instead of stdout.
>
> Changes since v1:
>
>  * Instead of creating a throw-away repository, staging the contents of the
>    .zip file and then using git write-tree and git archive to write the .zip
>    file, the patch series now introduces a new option to git archive and
>    uses write_archive() directly (avoiding any separate process).
>  * Since the command avoids separate processes, it is now blazing fast on
>    Windows, and I dropped the spinner() function because it's no longer
>    needed.
>  * While reworking the test case, I noticed that scalar [...] <enlistment>
>    failed to verify that the specified directory exists, and would happily
>    "traverse to its parent directory" on its quest to find a Scalar
>    enlistment. That is of course incorrect, and has been fixed as a "while
>    at it" sort of preparatory commit.
>  * I had forgotten to sign off on all the commits, which has been fixed.
>  * Instead of some "home-grown" readdir()-based function, the code now uses
>    for_each_file_in_pack_dir() to look through the pack directories.
>  * If any alternates are configured, their pack directories are now included
>    in the output.
>  * The commit message that might be interpreted to promise information about
>    large loose files has been corrected to no longer promise that.
>  * The test cases have been adjusted to test a little bit more (e.g.
>    verifying that specific paths are mentioned in the output, instead of
>    merely verifying that the output is non-empty).
>
> Johannes Schindelin (5):
>   archive: optionally add "virtual" files
>   archive --add-file-with-contents: allow paths containing colons
>   scalar: validate the optional enlistment argument
>   Implement `scalar diagnose`
>   scalar diagnose: include disk space information
>
> Matthew John Cheetham (2):
>   scalar: teach `diagnose` to gather packfile info
>   scalar: teach `diagnose` to gather loose objects information
>
>  Documentation/git-archive.txt    |  16 ++
>  archive.c                        |  75 +++++++-
>  contrib/scalar/scalar.c          | 289 ++++++++++++++++++++++++++++++-
>  contrib/scalar/scalar.txt        |  12 ++
>  contrib/scalar/t/t9099-scalar.sh |  27 +++
>  t/t5003-archive-zip.sh           |  20 +++
>  6 files changed, 429 insertions(+), 10 deletions(-)
>
>
> base-commit: ddc35d833dd6f9e8946b09cecd3311b8aa18d295
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1128%2Fdscho%2Fscalar-diagnose-v3
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1128/dscho/scalar-diagnose-v3
> Pull-Request: https://github.com/gitgitgadget/git/pull/1128
>
> Range-diff vs v2:
>
>  1:  49ff3c1f2b3 ! 1:  45662cf582a archive: optionally add "virtual" files
>      @@ Commit message
>           archive` now supports use cases where relatively trivial files need to
>           be added that do not exist on disk.
>
>      +    This will allow us to generate `.zip` files with generated content,
>      +    without having to add said content to the object database and without
>      +    having to write it out to disk.
>      +
>           Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
>
>        ## Documentation/git-archive.txt ##
>      @@ Documentation/git-archive.txt: OPTIONS
>       + basename of <file>.
>       ++
>       +The `<path>` cannot contain any colon, the file mode is limited to
>      -+a regular file, and the option may be subject platform-dependent
>      ++a regular file, and the option may be subject to platform-dependent
>       +command-line limits. For non-trivial cases, write an untracked file
>       +and use `--add-file` instead.
>       +
>      @@ archive.c: static int add_file_cb(const struct option *opt, const char *arg, int
>       - if (!S_ISREG(info->stat.st_mode))
>       -         die(_("Not a regular file: %s"), path);
>       +
>      -+ if (strcmp(opt->long_name, "add-file-with-content")) {
>      ++ if (!strcmp(opt->long_name, "add-file")) {
>       +         path = prefix_filename(args->prefix, arg);
>       +         if (stat(path, &info->stat))
>       +                 die(_("File not found: %s"), path);
>      @@ archive.c: static int parse_archive_args(int argc, const char **argv,
>                   N_("add untracked file to archive"), 0, add_file_cb,
>                   (intptr_t)&base },
>       +         { OPTION_CALLBACK, 0, "add-file-with-content", args,
>      -+           N_("file"), N_("add untracked file to archive"), 0,
>      ++           N_("path:content"), N_("add untracked file to archive"), 0,
>       +           add_file_cb, (intptr_t)&base },
>                 OPT_STRING('o', "output", &output, N_("file"),
>                         N_("write the archive to this file")),
>  -:  ----------- > 2:  ce4b1b680c9 archive --add-file-with-contents: allow paths containing colons
>  2:  600da8d465e = 3:  5a3eeb55409 scalar: validate the optional enlistment argument
>  3:  0d570137bb6 ! 4:  dfe821d10fe Implement `scalar diagnose`
>      @@ Commit message
>           we had the luxury of a comprehensive standard library that includes
>           basic functionality such as writing a `.zip` file. In the C version, we
>           lack such a commodity. Rather than introducing a dependency on, say,
>      -    libzip, we slightly abuse Git's `archive` command: Instead of writing
>      -    the `.zip` file directly, we stage the file contents in a Git index of a
>      -    temporary, bare repository, only to let `git archive` have at it, and
>      -    finally removing the temporary repository.
>      -
>      -    Also note: Due to the frequently-spawned `git hash-object` processes,
>      -    this command is quite a bit slow on Windows. Should it turn out to be a
>      -    big problem, the lack of a batch mode of the `hash-object` command could
>      -    potentially be worked around via using `git fast-import` with a crafted
>      -    `stdin`.
>      +    libzip, we slightly abuse Git's `archive` machinery: we write out a
>      +    `.zip` of the empty try, augmented by a couple files that are added via
>      +    the `--add-file*` options. We are careful trying not to modify the
>      +    current repository in any way lest the very circumstances that required
>      +    `scalar diagnose` to be run are changed by the `diagnose` run itself.
>
>           Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
>
>      @@ contrib/scalar/scalar.c: cleanup:
>       + time_t now = time(NULL);
>       + struct tm tm;
>       + struct strbuf path = STRBUF_INIT, buf = STRBUF_INIT;
>      -+ size_t off;
>       + int res = 0;
>       +
>       + argc = parse_options(argc, argv, NULL, options,
>      @@ contrib/scalar/scalar.c: cleanup:
>       + strvec_pushl(&archiver_args, "scalar-diagnose", "--format=zip", NULL);
>       +
>       + strbuf_reset(&buf);
>      -+ strbuf_addstr(&buf,
>      -+               "--add-file-with-content=diagnostics.log:"
>      -+               "Collecting diagnostic info\n\n");
>      ++ strbuf_addstr(&buf, "Collecting diagnostic info\n\n");
>       + get_version_info(&buf, 1);
>       +
>       + strbuf_addf(&buf, "Enlistment root: %s\n", the_repository->worktree);
>      -+ off = strchr(buf.buf, ':') + 1 - buf.buf;
>      -+ write_or_die(stdout_fd, buf.buf + off, buf.len - off);
>      -+ strvec_push(&archiver_args, buf.buf);
>      ++ write_or_die(stdout_fd, buf.buf, buf.len);
>      ++ strvec_pushf(&archiver_args,
>      ++              "--add-file-with-content=diagnostics.log:%.*s",
>      ++              (int)buf.len, buf.buf);
>       +
>       + if ((res = add_directory_to_archiver(&archiver_args, ".git", 0)) ||
>       +     (res = add_directory_to_archiver(&archiver_args, ".git/hooks", 0)) ||
>      @@ contrib/scalar/scalar.c: cleanup:
>       + }
>       +
>       + if (!res)
>      -+         printf("\n"
>      ++         fprintf(stderr, "\n"
>       +                "Diagnostics complete.\n"
>       +                "All of the gathered info is captured in '%s'\n",
>       +                zip_path.buf);
>  4:  938e38b5a09 ! 5:  bb162abd383 scalar diagnose: include disk space information
>      @@ contrib/scalar/scalar.c: static int cmd_diagnose(int argc, const char **argv)
>
>         strbuf_addf(&buf, "Enlistment root: %s\n", the_repository->worktree);
>       + get_disk_info(&buf);
>      -  off = strchr(buf.buf, ':') + 1 - buf.buf;
>      -  write_or_die(stdout_fd, buf.buf + off, buf.len - off);
>      -  strvec_push(&archiver_args, buf.buf);
>      +  write_or_die(stdout_fd, buf.buf, buf.len);
>      +  strvec_pushf(&archiver_args,
>      +               "--add-file-with-content=diagnostics.log:%.*s",
>
>        ## contrib/scalar/t/t9099-scalar.sh ##
>       @@ contrib/scalar/t/t9099-scalar.sh: SQ="'"
>  5:  bd9428919fa ! 6:  32aaad7cce1 scalar: teach `diagnose` to gather packfile info
>      @@ contrib/scalar/scalar.c: cleanup:
>        {
>         struct option options[] = {
>       @@ contrib/scalar/scalar.c: static int cmd_diagnose(int argc, const char **argv)
>      -  write_or_die(stdout_fd, buf.buf + off, buf.len - off);
>      -  strvec_push(&archiver_args, buf.buf);
>      +               "--add-file-with-content=diagnostics.log:%.*s",
>      +               (int)buf.len, buf.buf);
>
>       + strbuf_reset(&buf);
>       + strbuf_addstr(&buf, "--add-file-with-content=packs-local.txt:");
>  6:  7a8875be425 = 7:  322932f0bb8 scalar: teach `diagnose` to gather loose objects information
>
> --
> gitgitgadget

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v3 2/7] archive --add-file-with-contents: allow paths containing colons
  2022-05-07  2:06       ` Elijah Newren
@ 2022-05-09 21:04         ` Johannes Schindelin
  0 siblings, 0 replies; 140+ messages in thread
From: Johannes Schindelin @ 2022-05-09 21:04 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Johannes Schindelin via GitGitGadget, Git Mailing List,
	René Scharfe, Taylor Blau, Derrick Stolee

Hi Elijah,

On Fri, 6 May 2022, Elijah Newren wrote:

> On Wed, May 4, 2022 at 8:25 AM Johannes Schindelin via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
> >
> > From: Johannes Schindelin <johannes.schindelin@gmx.de>
> >
> > By allowing the path to be enclosed in double-quotes, we can avoid
> > the limitation that paths cannot contain colons.
> >
> > Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> > ---
> >  Documentation/git-archive.txt | 13 +++++++++----
> >  archive.c                     | 34 +++++++++++++++++++++++++++++-----
> >  t/t5003-archive-zip.sh        |  8 ++++++++
> >  3 files changed, 46 insertions(+), 9 deletions(-)
> >
> > diff --git a/Documentation/git-archive.txt b/Documentation/git-archive.txt
> > index a0edc9167b2..1789ce4c232 100644
> > --- a/Documentation/git-archive.txt
> > +++ b/Documentation/git-archive.txt
> > @@ -67,10 +67,15 @@ OPTIONS
> >         by concatenating the value for `--prefix` (if any) and the
> >         basename of <file>.
> >  +
> > -The `<path>` cannot contain any colon, the file mode is limited to
> > -a regular file, and the option may be subject to platform-dependent
> > -command-line limits. For non-trivial cases, write an untracked file
> > -and use `--add-file` instead.
> > +The `<path>` argument can start and end with a literal double-quote
> > +character. In this case, the backslash is interpreted as escape
> > +character. The path must be quoted if it contains a colon, to avoid
> > +the colon from being misinterpreted as the separator between the
> > +path and the contents.
>
> The path must also be quoted if it begins or ends with a double-quote, right?

True.

> Also, would people want to be able to pass a pathname from the output
> of e.g. `git ls-files -o`, which may quote additional characters?

Also true.

> > ++
> > +The file mode is limited to a regular file, and the option may be
> > +subject to platform-dependent command-line limits. For non-trivial
> > +cases, write an untracked file and use `--add-file` instead.
> >
> >  --worktree-attributes::
> >         Look for attributes in .gitattributes files in the working tree
> > diff --git a/archive.c b/archive.c
> > index d798624cd5f..3b751027143 100644
> > --- a/archive.c
> > +++ b/archive.c
> > @@ -533,13 +533,37 @@ static int add_file_cb(const struct option *opt, const char *arg, int unset)
> >                         die(_("Not a regular file: %s"), path);
> >                 info->content = NULL; /* read the file later */
> >         } else {
> > -               const char *colon = strchr(arg, ':');
> >                 char *p;
> >
> > -               if (!colon)
> > -                       die(_("missing colon: '%s'"), arg);
> > +               if (*arg != '"') {
> > +                       const char *colon = strchr(arg, ':');
> > +
> > +                       if (!colon)
> > +                               die(_("missing colon: '%s'"), arg);
> > +                       p = xstrndup(arg, colon - arg);
> > +                       arg = colon + 1;
> > +               } else {
> > +                       struct strbuf buf = STRBUF_INIT;
> > +                       const char *orig = arg;
> > +
> > +                       for (;;) {
> > +                               if (!*(++arg))
> > +                                       die(_("unclosed quote: '%s'"), orig);
> > +                               if (*arg == '"')
> > +                                       break;
> > +                               if (*arg == '\\' && *(++arg) == '\0')
> > +                                       die(_("trailing backslash: '%s"), orig);
> > +                               else
> > +                                       strbuf_addch(&buf, *arg);
> > +                       }
> > +
> > +                       if (*(++arg) != ':')
> > +                               die(_("missing colon: '%s'"), orig);
> > +
> > +                       p = strbuf_detach(&buf, NULL);
> > +                       arg++;
> > +               }
>
> Should we use unquote_c_style() here instead of rolling another parser
> to do unquoting?  That would have the added benefit of allowing people
> to use filenames from the output of various git commands that do
> special quoting -- such as octal sequences for non-ascii characters.

Yep, let's do that. I somehow missed that function while glimpsing at
`quote.h`.

Thank you for your review!
Dscho

> >
> > -               p = xstrndup(arg, colon - arg);
> >                 if (!args->prefix)
> >                         path = p;
> >                 else {
> > @@ -548,7 +572,7 @@ static int add_file_cb(const struct option *opt, const char *arg, int unset)
> >                 }
> >                 memset(&info->stat, 0, sizeof(info->stat));
> >                 info->stat.st_mode = S_IFREG | 0644;
> > -               info->content = xstrdup(colon + 1);
> > +               info->content = xstrdup(arg);
> >                 info->stat.st_size = strlen(info->content);
> >         }
> >         item = string_list_append_nodup(&args->extra_files, path);
> > diff --git a/t/t5003-archive-zip.sh b/t/t5003-archive-zip.sh
> > index 8ff1257f1a0..5b8bbfc2692 100755
> > --- a/t/t5003-archive-zip.sh
> > +++ b/t/t5003-archive-zip.sh
> > @@ -207,13 +207,21 @@ check_zip with_untracked
> >  check_added with_untracked untracked untracked
> >
> >  test_expect_success UNZIP 'git archive --format=zip --add-file-with-content' '
> > +       if test_have_prereq FUNNYNAMES
> > +       then
> > +               QUOTED=quoted:colon
> > +       else
> > +               QUOTED=quoted
> > +       fi &&
> >         git archive --format=zip >with_file_with_content.zip \
> > +               --add-file-with-content=\"$QUOTED\": \
> >                 --add-file-with-content=hello:world $EMPTY_TREE &&
> >         test_when_finished "rm -rf tmp-unpack" &&
> >         mkdir tmp-unpack && (
> >                 cd tmp-unpack &&
> >                 "$GIT_UNZIP" ../with_file_with_content.zip &&
> >                 test_path_is_file hello &&
> > +               test_path_is_file $QUOTED &&
> >                 test world = $(cat hello)
> >         )
> >  '
> > --
> > gitgitgadget
>
>

^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v4 0/7] scalar: implement the subcommand "diagnose"
  2022-05-04 15:25   ` [PATCH v3 0/7] scalar: implement the subcommand "diagnose" Johannes Schindelin via GitGitGadget
                       ` (7 preceding siblings ...)
  2022-05-07  2:23     ` [PATCH v3 0/7] scalar: implement the subcommand "diagnose" Elijah Newren
@ 2022-05-10 19:26     ` Johannes Schindelin via GitGitGadget
  2022-05-10 19:26       ` [PATCH v4 1/7] archive: optionally add "virtual" files Johannes Schindelin via GitGitGadget
                         ` (8 more replies)
  8 siblings, 9 replies; 140+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2022-05-10 19:26 UTC (permalink / raw)
  To: git
  Cc: René Scharfe, Taylor Blau, Derrick Stolee, Elijah Newren,
	Johannes Schindelin

Over the course of the years, we developed a sub-command that gathers
diagnostic data into a .zip file that can then be attached to bug reports.
This sub-command turned out to be very useful in helping Scalar developers
identify and fix issues.

Changes since v3:

 * We're now using unquote_c_style() instead of rolling our own unquoter.
 * Fixed the added regression test.
 * As pointed out by Scalar's Functional Tests, the
   add_directory_to_archiver() function should not fail when scalar diagnose
   encounters FSMonitor's Unix socket, but only warn instead.
 * Related: add_directory_to_archiver() needs to propagate errors from
   processing subdirectories so that the top-level call returns an error,
   too.

Changes since v2:

 * Clarified in the commit message what the biggest benefit of
   --add-file-with-content is.
 * The <path> part of the -add-file-with-content argument can now contain
   colons. To do this, the path needs to start and end in double-quote
   characters (which are stripped), and the backslash serves as escape
   character in that case (to allow the path to contain both colons and
   double-quotes).
 * Fixed incorrect grammar.
 * Instead of strcmp(<what-we-don't-want>), we now say
   !strcmp(<what-we-want>).
 * The help text for --add-file-with-content was improved a tiny bit.
 * Adjusted the commit message that still talked about spawning plenty of
   processes and about a throw-away repository for the sake of generating a
   .zip file.
 * Simplified the code that shows the diagnostics and adds them to the .zip
   file.
 * The final message that reports that the archive is complete is now
   printed to stderr instead of stdout.

Changes since v1:

 * Instead of creating a throw-away repository, staging the contents of the
   .zip file and then using git write-tree and git archive to write the .zip
   file, the patch series now introduces a new option to git archive and
   uses write_archive() directly (avoiding any separate process).
 * Since the command avoids separate processes, it is now blazing fast on
   Windows, and I dropped the spinner() function because it's no longer
   needed.
 * While reworking the test case, I noticed that scalar [...] <enlistment>
   failed to verify that the specified directory exists, and would happily
   "traverse to its parent directory" on its quest to find a Scalar
   enlistment. That is of course incorrect, and has been fixed as a "while
   at it" sort of preparatory commit.
 * I had forgotten to sign off on all the commits, which has been fixed.
 * Instead of some "home-grown" readdir()-based function, the code now uses
   for_each_file_in_pack_dir() to look through the pack directories.
 * If any alternates are configured, their pack directories are now included
   in the output.
 * The commit message that might be interpreted to promise information about
   large loose files has been corrected to no longer promise that.
 * The test cases have been adjusted to test a little bit more (e.g.
   verifying that specific paths are mentioned in the output, instead of
   merely verifying that the output is non-empty).

Johannes Schindelin (5):
  archive: optionally add "virtual" files
  archive --add-file-with-contents: allow paths containing colons
  scalar: validate the optional enlistment argument
  Implement `scalar diagnose`
  scalar diagnose: include disk space information

Matthew John Cheetham (2):
  scalar: teach `diagnose` to gather packfile info
  scalar: teach `diagnose` to gather loose objects information

 Documentation/git-archive.txt    |  17 ++
 archive.c                        |  61 ++++++-
 contrib/scalar/scalar.c          | 292 ++++++++++++++++++++++++++++++-
 contrib/scalar/scalar.txt        |  12 ++
 contrib/scalar/t/t9099-scalar.sh |  27 +++
 t/t5003-archive-zip.sh           |  20 +++
 6 files changed, 419 insertions(+), 10 deletions(-)


base-commit: ddc35d833dd6f9e8946b09cecd3311b8aa18d295
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1128%2Fdscho%2Fscalar-diagnose-v4
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1128/dscho/scalar-diagnose-v4
Pull-Request: https://github.com/gitgitgadget/git/pull/1128

Range-diff vs v3:

 1:  45662cf582a = 1:  45662cf582a archive: optionally add "virtual" files
 2:  ce4b1b680c9 ! 2:  fdba4ed6f4d archive --add-file-with-contents: allow paths containing colons
     @@ Documentation/git-archive.txt: OPTIONS
      -command-line limits. For non-trivial cases, write an untracked file
      -and use `--add-file` instead.
      +The `<path>` argument can start and end with a literal double-quote
     -+character. In this case, the backslash is interpreted as escape
     -+character. The path must be quoted if it contains a colon, to avoid
     -+the colon from being misinterpreted as the separator between the
     -+path and the contents.
     ++character; The contained file name is interpreted as a C-style string,
     ++i.e. the backslash is interpreted as escape character. The path must
     ++be quoted if it contains a colon, to avoid the colon from being
     ++misinterpreted as the separator between the path and the contents, or
     ++if the path begins or ends with a double-quote character.
      ++
      +The file mode is limited to a regular file, and the option may be
      +subject to platform-dependent command-line limits. For non-trivial
     @@ Documentation/git-archive.txt: OPTIONS
       	Look for attributes in .gitattributes files in the working tree
      
       ## archive.c ##
     +@@
     + #include "parse-options.h"
     + #include "unpack-trees.h"
     + #include "dir.h"
     ++#include "quote.h"
     + 
     + static char const * const archive_usage[] = {
     + 	N_("git archive [<options>] <tree-ish> [<path>...]"),
      @@ archive.c: static int add_file_cb(const struct option *opt, const char *arg, int unset)
       			die(_("Not a regular file: %s"), path);
       		info->content = NULL; /* read the file later */
       	} else {
      -		const char *colon = strchr(arg, ':');
     - 		char *p;
     +-		char *p;
     ++		struct strbuf buf = STRBUF_INIT;
     ++		const char *p = arg;
     ++
     ++		if (*p != '"')
     ++			p = strchr(p, ':');
     ++		else if (unquote_c_style(&buf, p, &p) < 0)
     ++			die(_("unclosed quote: '%s'"), arg);
       
      -		if (!colon)
     --			die(_("missing colon: '%s'"), arg);
     -+		if (*arg != '"') {
     -+			const char *colon = strchr(arg, ':');
     -+
     -+			if (!colon)
     -+				die(_("missing colon: '%s'"), arg);
     -+			p = xstrndup(arg, colon - arg);
     -+			arg = colon + 1;
     -+		} else {
     -+			struct strbuf buf = STRBUF_INIT;
     -+			const char *orig = arg;
     -+
     -+			for (;;) {
     -+				if (!*(++arg))
     -+					die(_("unclosed quote: '%s'"), orig);
     -+				if (*arg == '"')
     -+					break;
     -+				if (*arg == '\\' && *(++arg) == '\0')
     -+					die(_("trailing backslash: '%s"), orig);
     -+				else
     -+					strbuf_addch(&buf, *arg);
     -+			}
     -+
     -+			if (*(++arg) != ':')
     -+				die(_("missing colon: '%s'"), orig);
     -+
     -+			p = strbuf_detach(&buf, NULL);
     -+			arg++;
     -+		}
     ++		if (!p || *p != ':')
     + 			die(_("missing colon: '%s'"), arg);
       
      -		p = xstrndup(arg, colon - arg);
     - 		if (!args->prefix)
     - 			path = p;
     - 		else {
     -@@ archive.c: static int add_file_cb(const struct option *opt, const char *arg, int unset)
     +-		if (!args->prefix)
     +-			path = p;
     +-		else {
     +-			path = prefix_filename(args->prefix, p);
     +-			free(p);
     ++		if (p == arg)
     ++			die(_("empty file name: '%s'"), arg);
     ++
     ++		path = buf.len ?
     ++			strbuf_detach(&buf, NULL) : xstrndup(arg, p - arg);
     ++
     ++		if (args->prefix) {
     ++			char *save = path;
     ++			path = prefix_filename(args->prefix, path);
     ++			free(save);
       		}
       		memset(&info->stat, 0, sizeof(info->stat));
       		info->stat.st_mode = S_IFREG | 0644;
      -		info->content = xstrdup(colon + 1);
     -+		info->content = xstrdup(arg);
     ++		info->content = xstrdup(p + 1);
       		info->stat.st_size = strlen(info->content);
       	}
       	item = string_list_append_nodup(&args->extra_files, path);
 3:  5a3eeb55409 = 3:  da9f52a8240 scalar: validate the optional enlistment argument
 4:  dfe821d10fe ! 4:  87bdc22322b Implement `scalar diagnose`
     @@ contrib/scalar/scalar.c: static int unregister_dir(void)
      +		if (e->d_type == DT_REG)
      +			strvec_pushf(archiver_args, "--add-file=%s", buf.buf);
      +		else if (e->d_type != DT_DIR)
     ++			warning(_("skipping '%s', which is neither file nor "
     ++				  "directory"), buf.buf);
     ++		else if (recurse &&
     ++			 add_directory_to_archiver(archiver_args,
     ++						   buf.buf, recurse) < 0)
      +			res = -1;
     -+		else if (recurse)
     -+		     add_directory_to_archiver(archiver_args, buf.buf, recurse);
      +	}
      +
      +	closedir(dir);
     @@ contrib/scalar/t/t9099-scalar.sh: test_expect_success '`scalar [...] <dir>` erro
      +SQ="'"
      +test_expect_success UNZIP 'scalar diagnose' '
      +	scalar clone "file://$(pwd)" cloned --single-branch &&
     -+	scalar diagnose cloned >out &&
     -+	sed -n "s/.*$SQ\\(.*\\.zip\\)$SQ.*/\\1/p" <out >zip_path &&
     ++	scalar diagnose cloned >out 2>err &&
     ++	sed -n "s/.*$SQ\\(.*\\.zip\\)$SQ.*/\\1/p" <err >zip_path &&
      +	zip_path=$(cat zip_path) &&
      +	test -n "$zip_path" &&
      +	unzip -v "$zip_path" &&
 5:  bb162abd383 ! 5:  3f63b197d42 scalar diagnose: include disk space information
     @@ contrib/scalar/t/t9099-scalar.sh
      @@ contrib/scalar/t/t9099-scalar.sh: SQ="'"
       test_expect_success UNZIP 'scalar diagnose' '
       	scalar clone "file://$(pwd)" cloned --single-branch &&
     - 	scalar diagnose cloned >out &&
     + 	scalar diagnose cloned >out 2>err &&
      +	grep "Available space" out &&
     - 	sed -n "s/.*$SQ\\(.*\\.zip\\)$SQ.*/\\1/p" <out >zip_path &&
     + 	sed -n "s/.*$SQ\\(.*\\.zip\\)$SQ.*/\\1/p" <err >zip_path &&
       	zip_path=$(cat zip_path) &&
       	test -n "$zip_path" &&
 6:  32aaad7cce1 ! 6:  fc1319338fc scalar: teach `diagnose` to gather packfile info
     @@ contrib/scalar/t/t9099-scalar.sh: test_expect_success '`scalar [...] <dir>` erro
       	scalar clone "file://$(pwd)" cloned --single-branch &&
      +	git repack &&
      +	echo "$(pwd)/.git/objects/" >>cloned/src/.git/objects/info/alternates &&
     - 	scalar diagnose cloned >out &&
     + 	scalar diagnose cloned >out 2>err &&
       	grep "Available space" out &&
     - 	sed -n "s/.*$SQ\\(.*\\.zip\\)$SQ.*/\\1/p" <out >zip_path &&
     + 	sed -n "s/.*$SQ\\(.*\\.zip\\)$SQ.*/\\1/p" <err >zip_path &&
      @@ contrib/scalar/t/t9099-scalar.sh: test_expect_success UNZIP 'scalar diagnose' '
       	folder=${zip_path%.zip} &&
       	test_path_is_missing "$folder" &&
 7:  322932f0bb8 ! 7:  e8f5b42f7b7 scalar: teach `diagnose` to gather loose objects information
     @@ contrib/scalar/t/t9099-scalar.sh: test_expect_success UNZIP 'scalar diagnose' '
       	git repack &&
       	echo "$(pwd)/.git/objects/" >>cloned/src/.git/objects/info/alternates &&
      +	test_commit -C cloned/src loose &&
     - 	scalar diagnose cloned >out &&
     + 	scalar diagnose cloned >out 2>err &&
       	grep "Available space" out &&
     - 	sed -n "s/.*$SQ\\(.*\\.zip\\)$SQ.*/\\1/p" <out >zip_path &&
     + 	sed -n "s/.*$SQ\\(.*\\.zip\\)$SQ.*/\\1/p" <err >zip_path &&
      @@ contrib/scalar/t/t9099-scalar.sh: test_expect_success UNZIP 'scalar diagnose' '
       	unzip -p "$zip_path" diagnostics.log >out &&
       	test_file_not_empty out &&

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v4 1/7] archive: optionally add "virtual" files
  2022-05-10 19:26     ` [PATCH v4 " Johannes Schindelin via GitGitGadget
@ 2022-05-10 19:26       ` Johannes Schindelin via GitGitGadget
  2022-05-10 21:48         ` Junio C Hamano
  2022-05-10 19:26       ` [PATCH v4 2/7] archive --add-file-with-contents: allow paths containing colons Johannes Schindelin via GitGitGadget
                         ` (7 subsequent siblings)
  8 siblings, 1 reply; 140+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2022-05-10 19:26 UTC (permalink / raw)
  To: git
  Cc: René Scharfe, Taylor Blau, Derrick Stolee, Elijah Newren,
	Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

With the `--add-file-with-content=<path>:<content>` option, `git
archive` now supports use cases where relatively trivial files need to
be added that do not exist on disk.

This will allow us to generate `.zip` files with generated content,
without having to add said content to the object database and without
having to write it out to disk.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 Documentation/git-archive.txt | 11 ++++++++
 archive.c                     | 51 +++++++++++++++++++++++++++++------
 t/t5003-archive-zip.sh        | 12 +++++++++
 3 files changed, 66 insertions(+), 8 deletions(-)

diff --git a/Documentation/git-archive.txt b/Documentation/git-archive.txt
index bc4e76a7834..a0edc9167b2 100644
--- a/Documentation/git-archive.txt
+++ b/Documentation/git-archive.txt
@@ -61,6 +61,17 @@ OPTIONS
 	by concatenating the value for `--prefix` (if any) and the
 	basename of <file>.
 
+--add-file-with-content=<path>:<content>::
+	Add the specified contents to the archive.  Can be repeated to add
+	multiple files.  The path of the file in the archive is built
+	by concatenating the value for `--prefix` (if any) and the
+	basename of <file>.
++
+The `<path>` cannot contain any colon, the file mode is limited to
+a regular file, and the option may be subject to platform-dependent
+command-line limits. For non-trivial cases, write an untracked file
+and use `--add-file` instead.
+
 --worktree-attributes::
 	Look for attributes in .gitattributes files in the working tree
 	as well (see <<ATTRIBUTES>>).
diff --git a/archive.c b/archive.c
index a3bbb091256..d798624cd5f 100644
--- a/archive.c
+++ b/archive.c
@@ -263,6 +263,7 @@ static int queue_or_write_archive_entry(const struct object_id *oid,
 struct extra_file_info {
 	char *base;
 	struct stat stat;
+	void *content;
 };
 
 int write_archive_entries(struct archiver_args *args,
@@ -337,7 +338,13 @@ int write_archive_entries(struct archiver_args *args,
 		strbuf_addstr(&path_in_archive, basename(path));
 
 		strbuf_reset(&content);
-		if (strbuf_read_file(&content, path, info->stat.st_size) < 0)
+		if (info->content)
+			err = write_entry(args, &fake_oid, path_in_archive.buf,
+					  path_in_archive.len,
+					  info->stat.st_mode,
+					  info->content, info->stat.st_size);
+		else if (strbuf_read_file(&content, path,
+					  info->stat.st_size) < 0)
 			err = error_errno(_("could not read '%s'"), path);
 		else
 			err = write_entry(args, &fake_oid, path_in_archive.buf,
@@ -493,6 +500,7 @@ static void extra_file_info_clear(void *util, const char *str)
 {
 	struct extra_file_info *info = util;
 	free(info->base);
+	free(info->content);
 	free(info);
 }
 
@@ -514,14 +522,38 @@ static int add_file_cb(const struct option *opt, const char *arg, int unset)
 	if (!arg)
 		return -1;
 
-	path = prefix_filename(args->prefix, arg);
-	item = string_list_append_nodup(&args->extra_files, path);
-	item->util = info = xmalloc(sizeof(*info));
+	info = xmalloc(sizeof(*info));
 	info->base = xstrdup_or_null(base);
-	if (stat(path, &info->stat))
-		die(_("File not found: %s"), path);
-	if (!S_ISREG(info->stat.st_mode))
-		die(_("Not a regular file: %s"), path);
+
+	if (!strcmp(opt->long_name, "add-file")) {
+		path = prefix_filename(args->prefix, arg);
+		if (stat(path, &info->stat))
+			die(_("File not found: %s"), path);
+		if (!S_ISREG(info->stat.st_mode))
+			die(_("Not a regular file: %s"), path);
+		info->content = NULL; /* read the file later */
+	} else {
+		const char *colon = strchr(arg, ':');
+		char *p;
+
+		if (!colon)
+			die(_("missing colon: '%s'"), arg);
+
+		p = xstrndup(arg, colon - arg);
+		if (!args->prefix)
+			path = p;
+		else {
+			path = prefix_filename(args->prefix, p);
+			free(p);
+		}
+		memset(&info->stat, 0, sizeof(info->stat));
+		info->stat.st_mode = S_IFREG | 0644;
+		info->content = xstrdup(colon + 1);
+		info->stat.st_size = strlen(info->content);
+	}
+	item = string_list_append_nodup(&args->extra_files, path);
+	item->util = info;
+
 	return 0;
 }
 
@@ -554,6 +586,9 @@ static int parse_archive_args(int argc, const char **argv,
 		{ OPTION_CALLBACK, 0, "add-file", args, N_("file"),
 		  N_("add untracked file to archive"), 0, add_file_cb,
 		  (intptr_t)&base },
+		{ OPTION_CALLBACK, 0, "add-file-with-content", args,
+		  N_("path:content"), N_("add untracked file to archive"), 0,
+		  add_file_cb, (intptr_t)&base },
 		OPT_STRING('o', "output", &output, N_("file"),
 			N_("write the archive to this file")),
 		OPT_BOOL(0, "worktree-attributes", &worktree_attributes,
diff --git a/t/t5003-archive-zip.sh b/t/t5003-archive-zip.sh
index 1e6d18b140e..8ff1257f1a0 100755
--- a/t/t5003-archive-zip.sh
+++ b/t/t5003-archive-zip.sh
@@ -206,6 +206,18 @@ test_expect_success 'git archive --format=zip --add-file' '
 check_zip with_untracked
 check_added with_untracked untracked untracked
 
+test_expect_success UNZIP 'git archive --format=zip --add-file-with-content' '
+	git archive --format=zip >with_file_with_content.zip \
+		--add-file-with-content=hello:world $EMPTY_TREE &&
+	test_when_finished "rm -rf tmp-unpack" &&
+	mkdir tmp-unpack && (
+		cd tmp-unpack &&
+		"$GIT_UNZIP" ../with_file_with_content.zip &&
+		test_path_is_file hello &&
+		test world = $(cat hello)
+	)
+'
+
 test_expect_success 'git archive --format=zip --add-file twice' '
 	echo untracked >untracked &&
 	git archive --format=zip --prefix=one/ --add-file=untracked \
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v4 2/7] archive --add-file-with-contents: allow paths containing colons
  2022-05-10 19:26     ` [PATCH v4 " Johannes Schindelin via GitGitGadget
  2022-05-10 19:26       ` [PATCH v4 1/7] archive: optionally add "virtual" files Johannes Schindelin via GitGitGadget
@ 2022-05-10 19:26       ` Johannes Schindelin via GitGitGadget
  2022-05-10 21:56         ` Junio C Hamano
  2022-05-10 19:27       ` [PATCH v4 3/7] scalar: validate the optional enlistment argument Johannes Schindelin via GitGitGadget
                         ` (6 subsequent siblings)
  8 siblings, 1 reply; 140+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2022-05-10 19:26 UTC (permalink / raw)
  To: git
  Cc: René Scharfe, Taylor Blau, Derrick Stolee, Elijah Newren,
	Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

By allowing the path to be enclosed in double-quotes, we can avoid
the limitation that paths cannot contain colons.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 Documentation/git-archive.txt | 14 ++++++++++----
 archive.c                     | 30 ++++++++++++++++++++----------
 t/t5003-archive-zip.sh        |  8 ++++++++
 3 files changed, 38 insertions(+), 14 deletions(-)

diff --git a/Documentation/git-archive.txt b/Documentation/git-archive.txt
index a0edc9167b2..21eab5690ad 100644
--- a/Documentation/git-archive.txt
+++ b/Documentation/git-archive.txt
@@ -67,10 +67,16 @@ OPTIONS
 	by concatenating the value for `--prefix` (if any) and the
 	basename of <file>.
 +
-The `<path>` cannot contain any colon, the file mode is limited to
-a regular file, and the option may be subject to platform-dependent
-command-line limits. For non-trivial cases, write an untracked file
-and use `--add-file` instead.
+The `<path>` argument can start and end with a literal double-quote
+character; The contained file name is interpreted as a C-style string,
+i.e. the backslash is interpreted as escape character. The path must
+be quoted if it contains a colon, to avoid the colon from being
+misinterpreted as the separator between the path and the contents, or
+if the path begins or ends with a double-quote character.
++
+The file mode is limited to a regular file, and the option may be
+subject to platform-dependent command-line limits. For non-trivial
+cases, write an untracked file and use `--add-file` instead.
 
 --worktree-attributes::
 	Look for attributes in .gitattributes files in the working tree
diff --git a/archive.c b/archive.c
index d798624cd5f..477eba60ac3 100644
--- a/archive.c
+++ b/archive.c
@@ -9,6 +9,7 @@
 #include "parse-options.h"
 #include "unpack-trees.h"
 #include "dir.h"
+#include "quote.h"
 
 static char const * const archive_usage[] = {
 	N_("git archive [<options>] <tree-ish> [<path>...]"),
@@ -533,22 +534,31 @@ static int add_file_cb(const struct option *opt, const char *arg, int unset)
 			die(_("Not a regular file: %s"), path);
 		info->content = NULL; /* read the file later */
 	} else {
-		const char *colon = strchr(arg, ':');
-		char *p;
+		struct strbuf buf = STRBUF_INIT;
+		const char *p = arg;
+
+		if (*p != '"')
+			p = strchr(p, ':');
+		else if (unquote_c_style(&buf, p, &p) < 0)
+			die(_("unclosed quote: '%s'"), arg);
 
-		if (!colon)
+		if (!p || *p != ':')
 			die(_("missing colon: '%s'"), arg);
 
-		p = xstrndup(arg, colon - arg);
-		if (!args->prefix)
-			path = p;
-		else {
-			path = prefix_filename(args->prefix, p);
-			free(p);
+		if (p == arg)
+			die(_("empty file name: '%s'"), arg);
+
+		path = buf.len ?
+			strbuf_detach(&buf, NULL) : xstrndup(arg, p - arg);
+
+		if (args->prefix) {
+			char *save = path;
+			path = prefix_filename(args->prefix, path);
+			free(save);
 		}
 		memset(&info->stat, 0, sizeof(info->stat));
 		info->stat.st_mode = S_IFREG | 0644;
-		info->content = xstrdup(colon + 1);
+		info->content = xstrdup(p + 1);
 		info->stat.st_size = strlen(info->content);
 	}
 	item = string_list_append_nodup(&args->extra_files, path);
diff --git a/t/t5003-archive-zip.sh b/t/t5003-archive-zip.sh
index 8ff1257f1a0..5b8bbfc2692 100755
--- a/t/t5003-archive-zip.sh
+++ b/t/t5003-archive-zip.sh
@@ -207,13 +207,21 @@ check_zip with_untracked
 check_added with_untracked untracked untracked
 
 test_expect_success UNZIP 'git archive --format=zip --add-file-with-content' '
+	if test_have_prereq FUNNYNAMES
+	then
+		QUOTED=quoted:colon
+	else
+		QUOTED=quoted
+	fi &&
 	git archive --format=zip >with_file_with_content.zip \
+		--add-file-with-content=\"$QUOTED\": \
 		--add-file-with-content=hello:world $EMPTY_TREE &&
 	test_when_finished "rm -rf tmp-unpack" &&
 	mkdir tmp-unpack && (
 		cd tmp-unpack &&
 		"$GIT_UNZIP" ../with_file_with_content.zip &&
 		test_path_is_file hello &&
+		test_path_is_file $QUOTED &&
 		test world = $(cat hello)
 	)
 '
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v4 3/7] scalar: validate the optional enlistment argument
  2022-05-10 19:26     ` [PATCH v4 " Johannes Schindelin via GitGitGadget
  2022-05-10 19:26       ` [PATCH v4 1/7] archive: optionally add "virtual" files Johannes Schindelin via GitGitGadget
  2022-05-10 19:26       ` [PATCH v4 2/7] archive --add-file-with-contents: allow paths containing colons Johannes Schindelin via GitGitGadget
@ 2022-05-10 19:27       ` Johannes Schindelin via GitGitGadget
  2022-05-17 14:51         ` Ævar Arnfjörð Bjarmason
  2022-05-10 19:27       ` [PATCH v4 4/7] Implement `scalar diagnose` Johannes Schindelin via GitGitGadget
                         ` (5 subsequent siblings)
  8 siblings, 1 reply; 140+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2022-05-10 19:27 UTC (permalink / raw)
  To: git
  Cc: René Scharfe, Taylor Blau, Derrick Stolee, Elijah Newren,
	Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

The `scalar` command needs a Scalar enlistment for many subcommands, and
looks in the current directory for such an enlistment (traversing the
parent directories until it finds one).

These is subcommands can also be called with an optional argument
specifying the enlistment. Here, too, we traverse parent directories as
needed, until we find an enlistment.

However, if the specified directory does not even exist, or is not a
directory, we should stop right there, with an error message.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 contrib/scalar/scalar.c          | 6 ++++--
 contrib/scalar/t/t9099-scalar.sh | 5 +++++
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
index 1ce9c2b00e8..00dcd4b50ef 100644
--- a/contrib/scalar/scalar.c
+++ b/contrib/scalar/scalar.c
@@ -43,9 +43,11 @@ static void setup_enlistment_directory(int argc, const char **argv,
 		usage_with_options(usagestr, options);
 
 	/* find the worktree, determine its corresponding root */
-	if (argc == 1)
+	if (argc == 1) {
 		strbuf_add_absolute_path(&path, argv[0]);
-	else if (strbuf_getcwd(&path) < 0)
+		if (!is_directory(path.buf))
+			die(_("'%s' does not exist"), path.buf);
+	} else if (strbuf_getcwd(&path) < 0)
 		die(_("need a working directory"));
 
 	strbuf_trim_trailing_dir_sep(&path);
diff --git a/contrib/scalar/t/t9099-scalar.sh b/contrib/scalar/t/t9099-scalar.sh
index 2e1502ad45e..9d83fdf25e8 100755
--- a/contrib/scalar/t/t9099-scalar.sh
+++ b/contrib/scalar/t/t9099-scalar.sh
@@ -85,4 +85,9 @@ test_expect_success 'scalar delete with enlistment' '
 	test_path_is_missing cloned
 '
 
+test_expect_success '`scalar [...] <dir>` errors out when dir is missing' '
+	! scalar run config cloned 2>err &&
+	grep "cloned. does not exist" err
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v4 4/7] Implement `scalar diagnose`
  2022-05-10 19:26     ` [PATCH v4 " Johannes Schindelin via GitGitGadget
                         ` (2 preceding siblings ...)
  2022-05-10 19:27       ` [PATCH v4 3/7] scalar: validate the optional enlistment argument Johannes Schindelin via GitGitGadget
@ 2022-05-10 19:27       ` Johannes Schindelin via GitGitGadget
  2022-05-17 14:53         ` Ævar Arnfjörð Bjarmason
  2022-05-10 19:27       ` [PATCH v4 5/7] scalar diagnose: include disk space information Johannes Schindelin via GitGitGadget
                         ` (4 subsequent siblings)
  8 siblings, 1 reply; 140+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2022-05-10 19:27 UTC (permalink / raw)
  To: git
  Cc: René Scharfe, Taylor Blau, Derrick Stolee, Elijah Newren,
	Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

Over the course of Scalar's development, it became obvious that there is
a need for a command that can gather all kinds of useful information
that can help identify the most typical problems with large
worktrees/repositories.

The `diagnose` command is the culmination of this hard-won knowledge: it
gathers the installed hooks, the config, a couple statistics describing
the data shape, among other pieces of information, and then wraps
everything up in a tidy, neat `.zip` archive.

Note: originally, Scalar was implemented in C# using the .NET API, where
we had the luxury of a comprehensive standard library that includes
basic functionality such as writing a `.zip` file. In the C version, we
lack such a commodity. Rather than introducing a dependency on, say,
libzip, we slightly abuse Git's `archive` machinery: we write out a
`.zip` of the empty try, augmented by a couple files that are added via
the `--add-file*` options. We are careful trying not to modify the
current repository in any way lest the very circumstances that required
`scalar diagnose` to be run are changed by the `diagnose` run itself.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 contrib/scalar/scalar.c          | 144 +++++++++++++++++++++++++++++++
 contrib/scalar/scalar.txt        |  12 +++
 contrib/scalar/t/t9099-scalar.sh |  14 +++
 3 files changed, 170 insertions(+)

diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
index 00dcd4b50ef..367a2c50e25 100644
--- a/contrib/scalar/scalar.c
+++ b/contrib/scalar/scalar.c
@@ -11,6 +11,7 @@
 #include "dir.h"
 #include "packfile.h"
 #include "help.h"
+#include "archive.h"
 
 /*
  * Remove the deepest subdirectory in the provided path string. Path must not
@@ -261,6 +262,47 @@ static int unregister_dir(void)
 	return res;
 }
 
+static int add_directory_to_archiver(struct strvec *archiver_args,
+					  const char *path, int recurse)
+{
+	int at_root = !*path;
+	DIR *dir = opendir(at_root ? "." : path);
+	struct dirent *e;
+	struct strbuf buf = STRBUF_INIT;
+	size_t len;
+	int res = 0;
+
+	if (!dir)
+		return error(_("could not open directory '%s'"), path);
+
+	if (!at_root)
+		strbuf_addf(&buf, "%s/", path);
+	len = buf.len;
+	strvec_pushf(archiver_args, "--prefix=%s", buf.buf);
+
+	while (!res && (e = readdir(dir))) {
+		if (!strcmp(".", e->d_name) || !strcmp("..", e->d_name))
+			continue;
+
+		strbuf_setlen(&buf, len);
+		strbuf_addstr(&buf, e->d_name);
+
+		if (e->d_type == DT_REG)
+			strvec_pushf(archiver_args, "--add-file=%s", buf.buf);
+		else if (e->d_type != DT_DIR)
+			warning(_("skipping '%s', which is neither file nor "
+				  "directory"), buf.buf);
+		else if (recurse &&
+			 add_directory_to_archiver(archiver_args,
+						   buf.buf, recurse) < 0)
+			res = -1;
+	}
+
+	closedir(dir);
+	strbuf_release(&buf);
+	return res;
+}
+
 /* printf-style interface, expects `<key>=<value>` argument */
 static int set_config(const char *fmt, ...)
 {
@@ -501,6 +543,107 @@ cleanup:
 	return res;
 }
 
+static int cmd_diagnose(int argc, const char **argv)
+{
+	struct option options[] = {
+		OPT_END(),
+	};
+	const char * const usage[] = {
+		N_("scalar diagnose [<enlistment>]"),
+		NULL
+	};
+	struct strbuf zip_path = STRBUF_INIT;
+	struct strvec archiver_args = STRVEC_INIT;
+	char **argv_copy = NULL;
+	int stdout_fd = -1, archiver_fd = -1;
+	time_t now = time(NULL);
+	struct tm tm;
+	struct strbuf path = STRBUF_INIT, buf = STRBUF_INIT;
+	int res = 0;
+
+	argc = parse_options(argc, argv, NULL, options,
+			     usage, 0);
+
+	setup_enlistment_directory(argc, argv, usage, options, &zip_path);
+
+	strbuf_addstr(&zip_path, "/.scalarDiagnostics/scalar_");
+	strbuf_addftime(&zip_path,
+			"%Y%m%d_%H%M%S", localtime_r(&now, &tm), 0, 0);
+	strbuf_addstr(&zip_path, ".zip");
+	switch (safe_create_leading_directories(zip_path.buf)) {
+	case SCLD_EXISTS:
+	case SCLD_OK:
+		break;
+	default:
+		error_errno(_("could not create directory for '%s'"),
+			    zip_path.buf);
+		goto diagnose_cleanup;
+	}
+	stdout_fd = dup(1);
+	if (stdout_fd < 0) {
+		res = error_errno(_("could not duplicate stdout"));
+		goto diagnose_cleanup;
+	}
+
+	archiver_fd = xopen(zip_path.buf, O_CREAT | O_WRONLY | O_TRUNC, 0666);
+	if (archiver_fd < 0 || dup2(archiver_fd, 1) < 0) {
+		res = error_errno(_("could not redirect output"));
+		goto diagnose_cleanup;
+	}
+
+	init_zip_archiver();
+	strvec_pushl(&archiver_args, "scalar-diagnose", "--format=zip", NULL);
+
+	strbuf_reset(&buf);
+	strbuf_addstr(&buf, "Collecting diagnostic info\n\n");
+	get_version_info(&buf, 1);
+
+	strbuf_addf(&buf, "Enlistment root: %s\n", the_repository->worktree);
+	write_or_die(stdout_fd, buf.buf, buf.len);
+	strvec_pushf(&archiver_args,
+		     "--add-file-with-content=diagnostics.log:%.*s",
+		     (int)buf.len, buf.buf);
+
+	if ((res = add_directory_to_archiver(&archiver_args, ".git", 0)) ||
+	    (res = add_directory_to_archiver(&archiver_args, ".git/hooks", 0)) ||
+	    (res = add_directory_to_archiver(&archiver_args, ".git/info", 0)) ||
+	    (res = add_directory_to_archiver(&archiver_args, ".git/logs", 1)) ||
+	    (res = add_directory_to_archiver(&archiver_args, ".git/objects/info", 0)))
+		goto diagnose_cleanup;
+
+	strvec_pushl(&archiver_args, "--prefix=",
+		     oid_to_hex(the_hash_algo->empty_tree), "--", NULL);
+
+	/* `write_archive()` modifies the `argv` passed to it. Let it. */
+	argv_copy = xmemdupz(archiver_args.v,
+			     sizeof(char *) * archiver_args.nr);
+	res = write_archive(archiver_args.nr, (const char **)argv_copy, NULL,
+			    the_repository, NULL, 0);
+	if (res) {
+		error(_("failed to write archive"));
+		goto diagnose_cleanup;
+	}
+
+	if (!res)
+		fprintf(stderr, "\n"
+		       "Diagnostics complete.\n"
+		       "All of the gathered info is captured in '%s'\n",
+		       zip_path.buf);
+
+diagnose_cleanup:
+	if (archiver_fd >= 0) {
+		close(1);
+		dup2(stdout_fd, 1);
+	}
+	free(argv_copy);
+	strvec_clear(&archiver_args);
+	strbuf_release(&zip_path);
+	strbuf_release(&path);
+	strbuf_release(&buf);
+
+	return res;
+}
+
 static int cmd_list(int argc, const char **argv)
 {
 	if (argc != 1)
@@ -802,6 +945,7 @@ static struct {
 	{ "reconfigure", cmd_reconfigure },
 	{ "delete", cmd_delete },
 	{ "version", cmd_version },
+	{ "diagnose", cmd_diagnose },
 	{ NULL, NULL},
 };
 
diff --git a/contrib/scalar/scalar.txt b/contrib/scalar/scalar.txt
index f416d637289..22583fe046e 100644
--- a/contrib/scalar/scalar.txt
+++ b/contrib/scalar/scalar.txt
@@ -14,6 +14,7 @@ scalar register [<enlistment>]
 scalar unregister [<enlistment>]
 scalar run ( all | config | commit-graph | fetch | loose-objects | pack-files ) [<enlistment>]
 scalar reconfigure [ --all | <enlistment> ]
+scalar diagnose [<enlistment>]
 scalar delete <enlistment>
 
 DESCRIPTION
@@ -129,6 +130,17 @@ reconfigure the enlistment.
 With the `--all` option, all enlistments currently registered with Scalar
 will be reconfigured. Use this option after each Scalar upgrade.
 
+Diagnose
+~~~~~~~~
+
+diagnose [<enlistment>]::
+    When reporting issues with Scalar, it is often helpful to provide the
+    information gathered by this command, including logs and certain
+    statistics describing the data shape of the current enlistment.
++
+The output of this command is a `.zip` file that is written into
+a directory adjacent to the worktree in the `src` directory.
+
 Delete
 ~~~~~~
 
diff --git a/contrib/scalar/t/t9099-scalar.sh b/contrib/scalar/t/t9099-scalar.sh
index 9d83fdf25e8..6802d317258 100755
--- a/contrib/scalar/t/t9099-scalar.sh
+++ b/contrib/scalar/t/t9099-scalar.sh
@@ -90,4 +90,18 @@ test_expect_success '`scalar [...] <dir>` errors out when dir is missing' '
 	grep "cloned. does not exist" err
 '
 
+SQ="'"
+test_expect_success UNZIP 'scalar diagnose' '
+	scalar clone "file://$(pwd)" cloned --single-branch &&
+	scalar diagnose cloned >out 2>err &&
+	sed -n "s/.*$SQ\\(.*\\.zip\\)$SQ.*/\\1/p" <err >zip_path &&
+	zip_path=$(cat zip_path) &&
+	test -n "$zip_path" &&
+	unzip -v "$zip_path" &&
+	folder=${zip_path%.zip} &&
+	test_path_is_missing "$folder" &&
+	unzip -p "$zip_path" diagnostics.log >out &&
+	test_file_not_empty out
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v4 5/7] scalar diagnose: include disk space information
  2022-05-10 19:26     ` [PATCH v4 " Johannes Schindelin via GitGitGadget
                         ` (3 preceding siblings ...)
  2022-05-10 19:27       ` [PATCH v4 4/7] Implement `scalar diagnose` Johannes Schindelin via GitGitGadget
@ 2022-05-10 19:27       ` Johannes Schindelin via GitGitGadget
  2022-05-10 19:27       ` [PATCH v4 6/7] scalar: teach `diagnose` to gather packfile info Matthew John Cheetham via GitGitGadget
                         ` (3 subsequent siblings)
  8 siblings, 0 replies; 140+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2022-05-10 19:27 UTC (permalink / raw)
  To: git
  Cc: René Scharfe, Taylor Blau, Derrick Stolee, Elijah Newren,
	Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

When analyzing problems with large worktrees/repositories, it is useful
to know how close to a "full disk" situation Scalar/Git operates. Let's
include this information.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 contrib/scalar/scalar.c          | 53 ++++++++++++++++++++++++++++++++
 contrib/scalar/t/t9099-scalar.sh |  1 +
 2 files changed, 54 insertions(+)

diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
index 367a2c50e25..34cbec59b45 100644
--- a/contrib/scalar/scalar.c
+++ b/contrib/scalar/scalar.c
@@ -303,6 +303,58 @@ static int add_directory_to_archiver(struct strvec *archiver_args,
 	return res;
 }
 
+#ifndef WIN32
+#include <sys/statvfs.h>
+#endif
+
+static int get_disk_info(struct strbuf *out)
+{
+#ifdef WIN32
+	struct strbuf buf = STRBUF_INIT;
+	char volume_name[MAX_PATH], fs_name[MAX_PATH];
+	DWORD serial_number, component_length, flags;
+	ULARGE_INTEGER avail2caller, total, avail;
+
+	strbuf_realpath(&buf, ".", 1);
+	if (!GetDiskFreeSpaceExA(buf.buf, &avail2caller, &total, &avail)) {
+		error(_("could not determine free disk size for '%s'"),
+		      buf.buf);
+		strbuf_release(&buf);
+		return -1;
+	}
+
+	strbuf_setlen(&buf, offset_1st_component(buf.buf));
+	if (!GetVolumeInformationA(buf.buf, volume_name, sizeof(volume_name),
+				   &serial_number, &component_length, &flags,
+				   fs_name, sizeof(fs_name))) {
+		error(_("could not get info for '%s'"), buf.buf);
+		strbuf_release(&buf);
+		return -1;
+	}
+	strbuf_addf(out, "Available space on '%s': ", buf.buf);
+	strbuf_humanise_bytes(out, avail2caller.QuadPart);
+	strbuf_addch(out, '\n');
+	strbuf_release(&buf);
+#else
+	struct strbuf buf = STRBUF_INIT;
+	struct statvfs stat;
+
+	strbuf_realpath(&buf, ".", 1);
+	if (statvfs(buf.buf, &stat) < 0) {
+		error_errno(_("could not determine free disk size for '%s'"),
+			    buf.buf);
+		strbuf_release(&buf);
+		return -1;
+	}
+
+	strbuf_addf(out, "Available space on '%s': ", buf.buf);
+	strbuf_humanise_bytes(out, st_mult(stat.f_bsize, stat.f_bavail));
+	strbuf_addf(out, " (mount flags 0x%lx)\n", stat.f_flag);
+	strbuf_release(&buf);
+#endif
+	return 0;
+}
+
 /* printf-style interface, expects `<key>=<value>` argument */
 static int set_config(const char *fmt, ...)
 {
@@ -599,6 +651,7 @@ static int cmd_diagnose(int argc, const char **argv)
 	get_version_info(&buf, 1);
 
 	strbuf_addf(&buf, "Enlistment root: %s\n", the_repository->worktree);
+	get_disk_info(&buf);
 	write_or_die(stdout_fd, buf.buf, buf.len);
 	strvec_pushf(&archiver_args,
 		     "--add-file-with-content=diagnostics.log:%.*s",
diff --git a/contrib/scalar/t/t9099-scalar.sh b/contrib/scalar/t/t9099-scalar.sh
index 6802d317258..934b2485d91 100755
--- a/contrib/scalar/t/t9099-scalar.sh
+++ b/contrib/scalar/t/t9099-scalar.sh
@@ -94,6 +94,7 @@ SQ="'"
 test_expect_success UNZIP 'scalar diagnose' '
 	scalar clone "file://$(pwd)" cloned --single-branch &&
 	scalar diagnose cloned >out 2>err &&
+	grep "Available space" out &&
 	sed -n "s/.*$SQ\\(.*\\.zip\\)$SQ.*/\\1/p" <err >zip_path &&
 	zip_path=$(cat zip_path) &&
 	test -n "$zip_path" &&
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v4 6/7] scalar: teach `diagnose` to gather packfile info
  2022-05-10 19:26     ` [PATCH v4 " Johannes Schindelin via GitGitGadget
                         ` (4 preceding siblings ...)
  2022-05-10 19:27       ` [PATCH v4 5/7] scalar diagnose: include disk space information Johannes Schindelin via GitGitGadget
@ 2022-05-10 19:27       ` Matthew John Cheetham via GitGitGadget
  2022-05-10 19:27       ` [PATCH v4 7/7] scalar: teach `diagnose` to gather loose objects information Matthew John Cheetham via GitGitGadget
                         ` (2 subsequent siblings)
  8 siblings, 0 replies; 140+ messages in thread
From: Matthew John Cheetham via GitGitGadget @ 2022-05-10 19:27 UTC (permalink / raw)
  To: git
  Cc: René Scharfe, Taylor Blau, Derrick Stolee, Elijah Newren,
	Johannes Schindelin, Matthew John Cheetham

From: Matthew John Cheetham <mjcheetham@outlook.com>

It's helpful to see if there are other crud files in the pack
directory. Let's teach the `scalar diagnose` command to gather
file size information about pack files.

While at it, also enumerate the pack files in the alternate
object directories, if any are registered.

Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 contrib/scalar/scalar.c          | 30 ++++++++++++++++++++++++++++++
 contrib/scalar/t/t9099-scalar.sh |  6 +++++-
 2 files changed, 35 insertions(+), 1 deletion(-)

diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
index 34cbec59b45..e8e0a5ec473 100644
--- a/contrib/scalar/scalar.c
+++ b/contrib/scalar/scalar.c
@@ -12,6 +12,7 @@
 #include "packfile.h"
 #include "help.h"
 #include "archive.h"
+#include "object-store.h"
 
 /*
  * Remove the deepest subdirectory in the provided path string. Path must not
@@ -595,6 +596,29 @@ cleanup:
 	return res;
 }
 
+static void dir_file_stats_objects(const char *full_path, size_t full_path_len,
+				   const char *file_name, void *data)
+{
+	struct strbuf *buf = data;
+	struct stat st;
+
+	if (!stat(full_path, &st))
+		strbuf_addf(buf, "%-70s %16" PRIuMAX "\n", file_name,
+			    (uintmax_t)st.st_size);
+}
+
+static int dir_file_stats(struct object_directory *object_dir, void *data)
+{
+	struct strbuf *buf = data;
+
+	strbuf_addf(buf, "Contents of %s:\n", object_dir->path);
+
+	for_each_file_in_pack_dir(object_dir->path, dir_file_stats_objects,
+				  data);
+
+	return 0;
+}
+
 static int cmd_diagnose(int argc, const char **argv)
 {
 	struct option options[] = {
@@ -657,6 +681,12 @@ static int cmd_diagnose(int argc, const char **argv)
 		     "--add-file-with-content=diagnostics.log:%.*s",
 		     (int)buf.len, buf.buf);
 
+	strbuf_reset(&buf);
+	strbuf_addstr(&buf, "--add-file-with-content=packs-local.txt:");
+	dir_file_stats(the_repository->objects->odb, &buf);
+	foreach_alt_odb(dir_file_stats, &buf);
+	strvec_push(&archiver_args, buf.buf);
+
 	if ((res = add_directory_to_archiver(&archiver_args, ".git", 0)) ||
 	    (res = add_directory_to_archiver(&archiver_args, ".git/hooks", 0)) ||
 	    (res = add_directory_to_archiver(&archiver_args, ".git/info", 0)) ||
diff --git a/contrib/scalar/t/t9099-scalar.sh b/contrib/scalar/t/t9099-scalar.sh
index 934b2485d91..3dd5650cceb 100755
--- a/contrib/scalar/t/t9099-scalar.sh
+++ b/contrib/scalar/t/t9099-scalar.sh
@@ -93,6 +93,8 @@ test_expect_success '`scalar [...] <dir>` errors out when dir is missing' '
 SQ="'"
 test_expect_success UNZIP 'scalar diagnose' '
 	scalar clone "file://$(pwd)" cloned --single-branch &&
+	git repack &&
+	echo "$(pwd)/.git/objects/" >>cloned/src/.git/objects/info/alternates &&
 	scalar diagnose cloned >out 2>err &&
 	grep "Available space" out &&
 	sed -n "s/.*$SQ\\(.*\\.zip\\)$SQ.*/\\1/p" <err >zip_path &&
@@ -102,7 +104,9 @@ test_expect_success UNZIP 'scalar diagnose' '
 	folder=${zip_path%.zip} &&
 	test_path_is_missing "$folder" &&
 	unzip -p "$zip_path" diagnostics.log >out &&
-	test_file_not_empty out
+	test_file_not_empty out &&
+	unzip -p "$zip_path" packs-local.txt >out &&
+	grep "$(pwd)/.git/objects" out
 '
 
 test_done
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v4 7/7] scalar: teach `diagnose` to gather loose objects information
  2022-05-10 19:26     ` [PATCH v4 " Johannes Schindelin via GitGitGadget
                         ` (5 preceding siblings ...)
  2022-05-10 19:27       ` [PATCH v4 6/7] scalar: teach `diagnose` to gather packfile info Matthew John Cheetham via GitGitGadget
@ 2022-05-10 19:27       ` Matthew John Cheetham via GitGitGadget
  2022-05-17 15:03       ` [PATCH v4 0/7] scalar: implement the subcommand "diagnose" Ævar Arnfjörð Bjarmason
  2022-05-19 18:17       ` [PATCH v5 " Johannes Schindelin via GitGitGadget
  8 siblings, 0 replies; 140+ messages in thread
From: Matthew John Cheetham via GitGitGadget @ 2022-05-10 19:27 UTC (permalink / raw)
  To: git
  Cc: René Scharfe, Taylor Blau, Derrick Stolee, Elijah Newren,
	Johannes Schindelin, Matthew John Cheetham

From: Matthew John Cheetham <mjcheetham@outlook.com>

When operating at the scale that Scalar wants to support, certain data
shapes are more likely to cause undesirable performance issues, such as
large numbers of loose objects.

By including statistics about this, `scalar diagnose` now makes it
easier to identify such scenarios.

Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 contrib/scalar/scalar.c          | 59 ++++++++++++++++++++++++++++++++
 contrib/scalar/t/t9099-scalar.sh |  5 ++-
 2 files changed, 63 insertions(+), 1 deletion(-)

diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
index e8e0a5ec473..03da7452d83 100644
--- a/contrib/scalar/scalar.c
+++ b/contrib/scalar/scalar.c
@@ -619,6 +619,60 @@ static int dir_file_stats(struct object_directory *object_dir, void *data)
 	return 0;
 }
 
+static int count_files(char *path)
+{
+	DIR *dir = opendir(path);
+	struct dirent *e;
+	int count = 0;
+
+	if (!dir)
+		return 0;
+
+	while ((e = readdir(dir)) != NULL)
+		if (!is_dot_or_dotdot(e->d_name) && e->d_type == DT_REG)
+			count++;
+
+	closedir(dir);
+	return count;
+}
+
+static void loose_objs_stats(struct strbuf *buf, const char *path)
+{
+	DIR *dir = opendir(path);
+	struct dirent *e;
+	int count;
+	int total = 0;
+	unsigned char c;
+	struct strbuf count_path = STRBUF_INIT;
+	size_t base_path_len;
+
+	if (!dir)
+		return;
+
+	strbuf_addstr(buf, "Object directory stats for ");
+	strbuf_add_absolute_path(buf, path);
+	strbuf_addstr(buf, ":\n");
+
+	strbuf_add_absolute_path(&count_path, path);
+	strbuf_addch(&count_path, '/');
+	base_path_len = count_path.len;
+
+	while ((e = readdir(dir)) != NULL)
+		if (!is_dot_or_dotdot(e->d_name) &&
+		    e->d_type == DT_DIR && strlen(e->d_name) == 2 &&
+		    !hex_to_bytes(&c, e->d_name, 1)) {
+			strbuf_setlen(&count_path, base_path_len);
+			strbuf_addstr(&count_path, e->d_name);
+			total += (count = count_files(count_path.buf));
+			strbuf_addf(buf, "%s : %7d files\n", e->d_name, count);
+		}
+
+	strbuf_addf(buf, "Total: %d loose objects", total);
+
+	strbuf_release(&count_path);
+	closedir(dir);
+}
+
 static int cmd_diagnose(int argc, const char **argv)
 {
 	struct option options[] = {
@@ -687,6 +741,11 @@ static int cmd_diagnose(int argc, const char **argv)
 	foreach_alt_odb(dir_file_stats, &buf);
 	strvec_push(&archiver_args, buf.buf);
 
+	strbuf_reset(&buf);
+	strbuf_addstr(&buf, "--add-file-with-content=objects-local.txt:");
+	loose_objs_stats(&buf, ".git/objects");
+	strvec_push(&archiver_args, buf.buf);
+
 	if ((res = add_directory_to_archiver(&archiver_args, ".git", 0)) ||
 	    (res = add_directory_to_archiver(&archiver_args, ".git/hooks", 0)) ||
 	    (res = add_directory_to_archiver(&archiver_args, ".git/info", 0)) ||
diff --git a/contrib/scalar/t/t9099-scalar.sh b/contrib/scalar/t/t9099-scalar.sh
index 3dd5650cceb..72023a1ca1d 100755
--- a/contrib/scalar/t/t9099-scalar.sh
+++ b/contrib/scalar/t/t9099-scalar.sh
@@ -95,6 +95,7 @@ test_expect_success UNZIP 'scalar diagnose' '
 	scalar clone "file://$(pwd)" cloned --single-branch &&
 	git repack &&
 	echo "$(pwd)/.git/objects/" >>cloned/src/.git/objects/info/alternates &&
+	test_commit -C cloned/src loose &&
 	scalar diagnose cloned >out 2>err &&
 	grep "Available space" out &&
 	sed -n "s/.*$SQ\\(.*\\.zip\\)$SQ.*/\\1/p" <err >zip_path &&
@@ -106,7 +107,9 @@ test_expect_success UNZIP 'scalar diagnose' '
 	unzip -p "$zip_path" diagnostics.log >out &&
 	test_file_not_empty out &&
 	unzip -p "$zip_path" packs-local.txt >out &&
-	grep "$(pwd)/.git/objects" out
+	grep "$(pwd)/.git/objects" out &&
+	unzip -p "$zip_path" objects-local.txt >out &&
+	grep "^Total: [1-9]" out
 '
 
 test_done
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v4 1/7] archive: optionally add "virtual" files
  2022-05-10 19:26       ` [PATCH v4 1/7] archive: optionally add "virtual" files Johannes Schindelin via GitGitGadget
@ 2022-05-10 21:48         ` Junio C Hamano
  2022-05-10 22:06           ` rsbecker
  2022-05-12 22:31           ` [PATCH] fixup! " Junio C Hamano
  0 siblings, 2 replies; 140+ messages in thread
From: Junio C Hamano @ 2022-05-10 21:48 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget
  Cc: git, René Scharfe, Taylor Blau, Derrick Stolee,
	Elijah Newren, Johannes Schindelin

"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> @@ -514,14 +522,38 @@ static int add_file_cb(const struct option *opt, const char *arg, int unset)
>  	if (!arg)
>  		return -1;
>  
> -	path = prefix_filename(args->prefix, arg);
> -	item = string_list_append_nodup(&args->extra_files, path);
> -	item->util = info = xmalloc(sizeof(*info));
> +	info = xmalloc(sizeof(*info));
>  	info->base = xstrdup_or_null(base);
> -	if (stat(path, &info->stat))
> -		die(_("File not found: %s"), path);
> -	if (!S_ISREG(info->stat.st_mode))
> -		die(_("Not a regular file: %s"), path);
> +
> +	if (!strcmp(opt->long_name, "add-file")) {
> +		path = prefix_filename(args->prefix, arg);
> +		if (stat(path, &info->stat))
> +			die(_("File not found: %s"), path);
> +		if (!S_ISREG(info->stat.st_mode))
> +			die(_("Not a regular file: %s"), path);
> +		info->content = NULL; /* read the file later */
> +	} else {

This pretends that this new one will stay to be the only other
option that uses the same callback in the future.  To be more
defensive, it should do

	} else if (!strcmp(opt->long_name, "...")) {

and end the if/else if/else cascade with

	} else {
		BUG("add_file_cb called for unknown option");
	}

> +		const char *colon = strchr(arg, ':');
> +		char *p;
> +
> +		if (!colon)
> +			die(_("missing colon: '%s'"), arg);
> +
> +		p = xstrndup(arg, colon - arg);
> +		if (!args->prefix)
> +			path = p;
> +		else {
> +			path = prefix_filename(args->prefix, p);
> +			free(p);
> +		}
> +		memset(&info->stat, 0, sizeof(info->stat));
> +		info->stat.st_mode = S_IFREG | 0644;

I can sympathize with the desire to omit the mode bits because it
may not be useful for the immediate purpose of "scalar diagnose"
where the extracting end won't care what the file's permission bits
are, but by letting this "mode is hardcoded" thing squat here would
later make it more work when other people want to add an option that
truely lets the caller add a "vitual" file, in response to end-user
complaints that they cannot use the existing one to add an
exectuable file, for example.  I do not care too much about the
pathname limitation that does not allow a colon in it, simply
because it is unusual enough, but I am not sure about hardcoded
permission bits.

If we did "--add-virtual-file=<path>:0644:<contents>" instead from
day one, it certainly adds a few more lines of logic to this patch,
and the calling "scalar diagnose" may have to pass a few more bytes,
but I suspect that such a change would help the project in the
longer run.

Thanks.

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v4 2/7] archive --add-file-with-contents: allow paths containing colons
  2022-05-10 19:26       ` [PATCH v4 2/7] archive --add-file-with-contents: allow paths containing colons Johannes Schindelin via GitGitGadget
@ 2022-05-10 21:56         ` Junio C Hamano
  2022-05-10 22:23           ` rsbecker
  2022-05-19 18:09           ` Johannes Schindelin
  0 siblings, 2 replies; 140+ messages in thread
From: Junio C Hamano @ 2022-05-10 21:56 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget
  Cc: git, René Scharfe, Taylor Blau, Derrick Stolee,
	Elijah Newren, Johannes Schindelin

"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>
> By allowing the path to be enclosed in double-quotes, we can avoid
> the limitation that paths cannot contain colons.
> ...
> +		struct strbuf buf = STRBUF_INIT;
> +		const char *p = arg;
> +
> +		if (*p != '"')
> +			p = strchr(p, ':');
> +		else if (unquote_c_style(&buf, p, &p) < 0)
> +			die(_("unclosed quote: '%s'"), arg);

Even though I do not think people necessarily would want to use
colons in their pathnames (it has problems interoperating with other
systems), lifting the limitation is a good thing to do.  I totally
forgot that we designed unquote_c_style() to self terminate and
return the end pointer to the caller so the caller does not have to
worry, which is very nice. 

Even if this step weren't here in the series, I would have thought
the mode bits issue was more serious than "no colons in path"
limitation, but given that we address this unusual corner case
limitation, I would think we should address the hardcoded mode bits
at the same time.

> diff --git a/t/t5003-archive-zip.sh b/t/t5003-archive-zip.sh
> index 8ff1257f1a0..5b8bbfc2692 100755
> --- a/t/t5003-archive-zip.sh
> +++ b/t/t5003-archive-zip.sh
> @@ -207,13 +207,21 @@ check_zip with_untracked
>  check_added with_untracked untracked untracked
>  
>  test_expect_success UNZIP 'git archive --format=zip --add-file-with-content' '
> +	if test_have_prereq FUNNYNAMES
> +	then
> +		QUOTED=quoted:colon
> +	else
> +		QUOTED=quoted
> +	fi &&

;-)

>  	git archive --format=zip >with_file_with_content.zip \
> +		--add-file-with-content=\"$QUOTED\": \
>  		--add-file-with-content=hello:world $EMPTY_TREE &&
>  	test_when_finished "rm -rf tmp-unpack" &&
>  	mkdir tmp-unpack && (
>  		cd tmp-unpack &&
>  		"$GIT_UNZIP" ../with_file_with_content.zip &&
>  		test_path_is_file hello &&
> +		test_path_is_file $QUOTED &&

Looks OK, even though it probably is a good idea to have dq around
$QUOTED, so that future developers can easily insert SP into its
value to use a bit more common but still a bit more problematic
pathnames in the test.

Thanks.

^ permalink raw reply	[flat|nested] 140+ messages in thread

* RE: [PATCH v4 1/7] archive: optionally add "virtual" files
  2022-05-10 21:48         ` Junio C Hamano
@ 2022-05-10 22:06           ` rsbecker
  2022-05-10 23:21             ` Junio C Hamano
  2022-05-12 22:31           ` [PATCH] fixup! " Junio C Hamano
  1 sibling, 1 reply; 140+ messages in thread
From: rsbecker @ 2022-05-10 22:06 UTC (permalink / raw)
  To: 'Junio C Hamano', 'Johannes Schindelin via GitGitGadget'
  Cc: git, 'René Scharfe', 'Taylor Blau',
	'Derrick Stolee', 'Elijah Newren',
	'Johannes Schindelin'

On May 10, 2022 5:48 PM, Junio C Hamano wrote:
>"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
>writes:
>
>> @@ -514,14 +522,38 @@ static int add_file_cb(const struct option *opt, const
>char *arg, int unset)
>>  	if (!arg)
>>  		return -1;
>>
>> -	path = prefix_filename(args->prefix, arg);
>> -	item = string_list_append_nodup(&args->extra_files, path);
>> -	item->util = info = xmalloc(sizeof(*info));
>> +	info = xmalloc(sizeof(*info));
>>  	info->base = xstrdup_or_null(base);
>> -	if (stat(path, &info->stat))
>> -		die(_("File not found: %s"), path);
>> -	if (!S_ISREG(info->stat.st_mode))
>> -		die(_("Not a regular file: %s"), path);
>> +
>> +	if (!strcmp(opt->long_name, "add-file")) {
>> +		path = prefix_filename(args->prefix, arg);
>> +		if (stat(path, &info->stat))
>> +			die(_("File not found: %s"), path);
>> +		if (!S_ISREG(info->stat.st_mode))
>> +			die(_("Not a regular file: %s"), path);
>> +		info->content = NULL; /* read the file later */
>> +	} else {
>
>This pretends that this new one will stay to be the only other option that uses the
>same callback in the future.  To be more defensive, it should do
>
>	} else if (!strcmp(opt->long_name, "...")) {
>
>and end the if/else if/else cascade with
>
>	} else {
>		BUG("add_file_cb called for unknown option");
>	}
>
>> +		const char *colon = strchr(arg, ':');
>> +		char *p;
>> +
>> +		if (!colon)
>> +			die(_("missing colon: '%s'"), arg);
>> +
>> +		p = xstrndup(arg, colon - arg);
>> +		if (!args->prefix)
>> +			path = p;
>> +		else {
>> +			path = prefix_filename(args->prefix, p);
>> +			free(p);
>> +		}
>> +		memset(&info->stat, 0, sizeof(info->stat));
>> +		info->stat.st_mode = S_IFREG | 0644;
>
>I can sympathize with the desire to omit the mode bits because it may not be
>useful for the immediate purpose of "scalar diagnose"
>where the extracting end won't care what the file's permission bits are, but by
>letting this "mode is hardcoded" thing squat here would later make it more work
>when other people want to add an option that truely lets the caller add a "vitual"
>file, in response to end-user complaints that they cannot use the existing one to
>add an exectuable file, for example.  I do not care too much about the pathname
>limitation that does not allow a colon in it, simply because it is unusual enough, but
>I am not sure about hardcoded permission bits.
>
>If we did "--add-virtual-file=<path>:0644:<contents>" instead from day one, it
>certainly adds a few more lines of logic to this patch, and the calling "scalar
>diagnose" may have to pass a few more bytes, but I suspect that such a change
>would help the project in the longer run.

Would not core.filemode=false somewhat simulate this? The consumer-client would not care/do anything with the mode anyway. Or am I missing something?
--Randall


^ permalink raw reply	[flat|nested] 140+ messages in thread

* RE: [PATCH v4 2/7] archive --add-file-with-contents: allow paths containing colons
  2022-05-10 21:56         ` Junio C Hamano
@ 2022-05-10 22:23           ` rsbecker
  2022-05-19 18:12             ` Johannes Schindelin
  2022-05-19 18:09           ` Johannes Schindelin
  1 sibling, 1 reply; 140+ messages in thread
From: rsbecker @ 2022-05-10 22:23 UTC (permalink / raw)
  To: 'Junio C Hamano', 'Johannes Schindelin via GitGitGadget'
  Cc: git, 'René Scharfe', 'Taylor Blau',
	'Derrick Stolee', 'Elijah Newren',
	'Johannes Schindelin'

On May 10, 2022 5:57 PM, Junio C Hamano wrote:
>"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
>writes:
>
>> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>>
>> By allowing the path to be enclosed in double-quotes, we can avoid the
>> limitation that paths cannot contain colons.
>> ...
>> +		struct strbuf buf = STRBUF_INIT;
>> +		const char *p = arg;
>> +
>> +		if (*p != '"')
>> +			p = strchr(p, ':');
>> +		else if (unquote_c_style(&buf, p, &p) < 0)
>> +			die(_("unclosed quote: '%s'"), arg);
>
>Even though I do not think people necessarily would want to use colons in their
>pathnames (it has problems interoperating with other systems), lifting the
>limitation is a good thing to do.  I totally forgot that we designed
>unquote_c_style() to self terminate and return the end pointer to the caller so the
>caller does not have to worry, which is very nice.
>
>Even if this step weren't here in the series, I would have thought the mode bits
>issue was more serious than "no colons in path"
>limitation, but given that we address this unusual corner case limitation, I would
>think we should address the hardcoded mode bits at the same time.
>
>> diff --git a/t/t5003-archive-zip.sh b/t/t5003-archive-zip.sh index
>> 8ff1257f1a0..5b8bbfc2692 100755
>> --- a/t/t5003-archive-zip.sh
>> +++ b/t/t5003-archive-zip.sh
>> @@ -207,13 +207,21 @@ check_zip with_untracked  check_added
>> with_untracked untracked untracked
>>
>>  test_expect_success UNZIP 'git archive --format=zip --add-file-with-content' '
>> +	if test_have_prereq FUNNYNAMES
>> +	then
>> +		QUOTED=quoted:colon
>> +	else
>> +		QUOTED=quoted
>> +	fi &&
>
>;-)
>
>>  	git archive --format=zip >with_file_with_content.zip \
>> +		--add-file-with-content=\"$QUOTED\": \
>>  		--add-file-with-content=hello:world $EMPTY_TREE &&
>>  	test_when_finished "rm -rf tmp-unpack" &&
>>  	mkdir tmp-unpack && (
>>  		cd tmp-unpack &&
>>  		"$GIT_UNZIP" ../with_file_with_content.zip &&
>>  		test_path_is_file hello &&
>> +		test_path_is_file $QUOTED &&
>
>Looks OK, even though it probably is a good idea to have dq around $QUOTED, so
>that future developers can easily insert SP into its value to use a bit more common
>but still a bit more problematic pathnames in the test.

A test case for .gitignore in this would be good too. People on our exotic platform do this stuff as a matter of course. As an example, a name of $Z3P4:12399334 being used as a named pipe (associated with the unique name of a process) actually has been seen in the wild recently. My solution was to wild card this and/or contain it in an ignored directory.
Regards,
Randall


^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v4 1/7] archive: optionally add "virtual" files
  2022-05-10 22:06           ` rsbecker
@ 2022-05-10 23:21             ` Junio C Hamano
  2022-05-11 16:14               ` René Scharfe
  0 siblings, 1 reply; 140+ messages in thread
From: Junio C Hamano @ 2022-05-10 23:21 UTC (permalink / raw)
  To: rsbecker
  Cc: 'Johannes Schindelin via GitGitGadget',
	git, 'René Scharfe', 'Taylor Blau',
	'Derrick Stolee', 'Elijah Newren',
	'Johannes Schindelin'

<rsbecker@nexbridge.com> writes:

>>If we did "--add-virtual-file=<path>:0644:<contents>" instead from day one, it
>>certainly adds a few more lines of logic to this patch, and the calling "scalar
>>diagnose" may have to pass a few more bytes, but I suspect that such a change
>>would help the project in the longer run.
>
> Would not core.filemode=false somewhat simulate this? The
> consumer-client would not care/do anything with the mode
> anyway. Or am I missing something?

Or I must be missing something.  This is part of "git archive" where
its output is a tarball (or a zipfile) in which each entry knows its
permission bits (or at least, if it is executable).  Running "tar xf" 
or "unzip" on the receiving end of the output of this command should
set the executable bit (and other permission bits) correctly I would
certainly hope, so it does matter, no?

I did say "scalar diagnose" may not care.  But a patch to "git
archive" will affect other people, and among them there would be
people who say "gee, now I can add a handful of files from the
command line with their contents, without actually having them in
throw-away untracked files, when running 'git archive'.  That's
handy!", try it out and get disappointed by their inability to
create executable files that way.  And obviously I care more about
"git archive" than "scalar diagnose".  I very welcome to enhance the
former to support the need for the latter.  I do not see a good
reason to stop at a half-feature added to the former, even that
added half is enough to satisfy the latter, when the other half is
not all that hard to add, and it is reasonably expected that users
other than "scalar diagnose" would naturally want the other half,
too.



^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v4 1/7] archive: optionally add "virtual" files
  2022-05-10 23:21             ` Junio C Hamano
@ 2022-05-11 16:14               ` René Scharfe
  2022-05-11 19:27                 ` Junio C Hamano
  0 siblings, 1 reply; 140+ messages in thread
From: René Scharfe @ 2022-05-11 16:14 UTC (permalink / raw)
  To: Junio C Hamano, rsbecker
  Cc: 'Johannes Schindelin via GitGitGadget',
	git, 'Taylor Blau', 'Derrick Stolee',
	'Elijah Newren', 'Johannes Schindelin'

Am 11.05.22 um 01:21 schrieb Junio C Hamano:
> <rsbecker@nexbridge.com> writes:
>
>>> If we did "--add-virtual-file=<path>:0644:<contents>" instead from day one, it
>>> certainly adds a few more lines of logic to this patch, and the calling "scalar
>>> diagnose" may have to pass a few more bytes, but I suspect that such a change
>>> would help the project in the longer run.

> I did say "scalar diagnose" may not care.  But a patch to "git
> archive" will affect other people, and among them there would be
> people who say "gee, now I can add a handful of files from the
> command line with their contents, without actually having them in
> throw-away untracked files, when running 'git archive'.  That's
> handy!", try it out and get disappointed by their inability to
> create executable files that way.

Which might motivate them to contribute a patch to add that feature.
Give them a chance! :)

> And obviously I care more about
> "git archive" than "scalar diagnose".  I very welcome to enhance the
> former to support the need for the latter.  I do not see a good
> reason to stop at a half-feature added to the former, even that
> added half is enough to satisfy the latter, when the other half is
> not all that hard to add, and it is reasonably expected that users
> other than "scalar diagnose" would naturally want the other half,
> too.

FWIW, I'd already be satisfied by a convincing outline of a way towards
a complete solution to accept the partial feature, just to be sure we
don't paint ourselves into a corner.  But I'm bad at both strategy and
saying no, so that's that.

Regarding file modes: We only effectively support the executable bit,
so an additional option --add-virtual-executable-file=<path>:<contents>
would suffice.  It would also prevent the false impression that
arbitrary file modes can be used ("I said 0123 and got 0644, bug!").
And it would not even be the longest Git option..

René

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v4 1/7] archive: optionally add "virtual" files
  2022-05-11 16:14               ` René Scharfe
@ 2022-05-11 19:27                 ` Junio C Hamano
  2022-05-12 16:16                   ` René Scharfe
  0 siblings, 1 reply; 140+ messages in thread
From: Junio C Hamano @ 2022-05-11 19:27 UTC (permalink / raw)
  To: René Scharfe
  Cc: rsbecker, 'Johannes Schindelin via GitGitGadget',
	git, 'Taylor Blau', 'Derrick Stolee',
	'Elijah Newren', 'Johannes Schindelin'

René Scharfe <l.s.r@web.de> writes:

> Am 11.05.22 um 01:21 schrieb Junio C Hamano:
>> <rsbecker@nexbridge.com> writes:
>>
>>>> If we did "--add-virtual-file=<path>:0644:<contents>" instead from day one, it
>>>> certainly adds a few more lines of logic to this patch, and the calling "scalar
>>>> diagnose" may have to pass a few more bytes, but I suspect that such a change
>>>> would help the project in the longer run.
>
>> I did say "scalar diagnose" may not care.  But a patch to "git
>> archive" will affect other people, and among them there would be
>> people who say "gee, now I can add a handful of files from the
>> command line with their contents, without actually having them in
>> throw-away untracked files, when running 'git archive'.  That's
>> handy!", try it out and get disappointed by their inability to
>> create executable files that way.
>
> Which might motivate them to contribute a patch to add that feature.
> Give them a chance! :)

Yes, but there is no way to reuse the same option in a backward
compatible way to later add the mode information, and that is why we
want to be careful before a half-feature squats on an option.

> FWIW, I'd already be satisfied by a convincing outline of a way towards
> a complete solution to accept the partial feature, just to be sure we
> don't paint ourselves into a corner.

Exactly.  As you say, an extra and separate option can be used.  I
do not know if that is a workaround because we didn't design the
first option to take an additional option, or a welcome feature.

> Regarding file modes: We only effectively support the executable bit,
> so an additional option --add-virtual-executable-file=<path>:<contents>
> would suffice.

While I do not think we want to support more than one "is it
executable or not?" bit, I am not so sure about what the current
code does, though, for these "not from a tree, but added as extra
files" entries.

If you add an extra file from an on-disk untracked file, the
add_file_cb() callback picks up the full st.st_mode for the file,
and write_archive_entries() in its loop over args->extra_files pass
the full info->stat.st_mode down to write_entry(), which is used by
archive-tar.c::write_tar_entry() to obtain mode bits pretty much
as-is.  For tracked paths, we probably are normalizing the blobs
between 0644 and 0755 way before the values are passed as "mode"
parameter to the write_entry() functions, but for these extra files,
there is no such massaging.

So, I am OK with --add-virtual-executable=<path>:<contents> (but the
point still stands that the way the code in the patch squats in the
codepath makes it necessary to first refator it before it can
happen) as a separate option.  We may want to massage the mode bit
we grab from these extra files, if we were to go that route, though.

Thanks.


^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v4 1/7] archive: optionally add "virtual" files
  2022-05-11 19:27                 ` Junio C Hamano
@ 2022-05-12 16:16                   ` René Scharfe
  2022-05-12 18:15                     ` Junio C Hamano
  0 siblings, 1 reply; 140+ messages in thread
From: René Scharfe @ 2022-05-12 16:16 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: rsbecker, 'Johannes Schindelin via GitGitGadget',
	git, 'Taylor Blau', 'Derrick Stolee',
	'Elijah Newren', 'Johannes Schindelin'

Am 11.05.22 um 21:27 schrieb Junio C Hamano:
> René Scharfe <l.s.r@web.de> writes:
>
>> Regarding file modes: We only effectively support the executable bit,
>> so an additional option --add-virtual-executable-file=<path>:<contents>
>> would suffice.
>
> While I do not think we want to support more than one "is it
> executable or not?" bit, I am not so sure about what the current
> code does, though, for these "not from a tree, but added as extra
> files" entries.
>
> If you add an extra file from an on-disk untracked file, the
> add_file_cb() callback picks up the full st.st_mode for the file,
> and write_archive_entries() in its loop over args->extra_files pass
> the full info->stat.st_mode down to write_entry(), which is used by
> archive-tar.c::write_tar_entry() to obtain mode bits pretty much
> as-is.

Good point.  write_tar_entry() actually normalizes the permission bits
and applies tar.umask (0002 by default):

	if (S_ISDIR(mode) || S_ISGITLINK(mode)) {
		*header.typeflag = TYPEFLAG_DIR;
		mode = (mode | 0777) & ~tar_umask;
	} else if (S_ISLNK(mode)) {
		*header.typeflag = TYPEFLAG_LNK;
		mode |= 0777;
	} else if (S_ISREG(mode)) {
		*header.typeflag = TYPEFLAG_REG;
		mode = (mode | ((mode & 0100) ? 0777 : 0666)) & ~tar_umask;

But write_zip_entry() only normalizes (drops) the permission bits of
non-executable files:

                attr2 = S_ISLNK(mode) ? ((mode | 0777) << 16) :
                        (mode & 0111) ? ((mode) << 16) : 0;
                if (S_ISLNK(mode) || (mode & 0111))
                        creator_version = 0x0317;

attr2 corresponds to the field "external file attributes" mentioned in
the ZIP format specification, APPNOTE.TXT.  It's interpreted based on
the "version made by" (creator_version here); that 0x03 part above
means "UNIX".  The default is MS-DOS (FAT filesystem), with effectivly
no support for file permissions.

So we currently leak permission bits of executable files into ZIP
archives, but not tar files. :-|  Normalizing those to 0755 would be
more consistent.

> For tracked paths, we probably are normalizing the blobs
> between 0644 and 0755 way before the values are passed as "mode"
> parameter to the write_entry() functions, but for these extra files,
> there is no such massaging.

Right, mode values from read_tree() pass through canon_mode(), so only
untracked files (those appended with --add-file) are affected by the
leakage mentioned above.

René

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v4 1/7] archive: optionally add "virtual" files
  2022-05-12 16:16                   ` René Scharfe
@ 2022-05-12 18:15                     ` Junio C Hamano
  2022-05-12 21:31                       ` Junio C Hamano
  0 siblings, 1 reply; 140+ messages in thread
From: Junio C Hamano @ 2022-05-12 18:15 UTC (permalink / raw)
  To: René Scharfe
  Cc: rsbecker, 'Johannes Schindelin via GitGitGadget',
	git, 'Taylor Blau', 'Derrick Stolee',
	'Elijah Newren', 'Johannes Schindelin'

René Scharfe <l.s.r@web.de> writes:

> Good point.  write_tar_entry() actually normalizes the permission bits
> and applies tar.umask (0002 by default):
>
> 	if (S_ISDIR(mode) || S_ISGITLINK(mode)) {
> 		*header.typeflag = TYPEFLAG_DIR;
> 		mode = (mode | 0777) & ~tar_umask;
> 	} else if (S_ISLNK(mode)) {
> 		*header.typeflag = TYPEFLAG_LNK;
> 		mode |= 0777;
> 	} else if (S_ISREG(mode)) {
> 		*header.typeflag = TYPEFLAG_REG;
> 		mode = (mode | ((mode & 0100) ? 0777 : 0666)) & ~tar_umask;

Yeah, this side seems to care only about u+x bit, so
"add-executable" as a separate option would fly we..

> But write_zip_entry() only normalizes (drops) the permission bits of
> non-executable files:
>
>                 attr2 = S_ISLNK(mode) ? ((mode | 0777) << 16) :
>                         (mode & 0111) ? ((mode) << 16) : 0;
>                 if (S_ISLNK(mode) || (mode & 0111))
>                         creator_version = 0x0317;
>
> attr2 corresponds to the field "external file attributes" mentioned in
> the ZIP format specification, APPNOTE.TXT.  It's interpreted based on
> the "version made by" (creator_version here); that 0x03 part above
> means "UNIX".  The default is MS-DOS (FAT filesystem), with effectivly
> no support for file permissions.
>
> So we currently leak permission bits of executable files into ZIP
> archives, but not tar files. :-|  Normalizing those to 0755 would be
> more consistent.

Yup.

>> For tracked paths, we probably are normalizing the blobs
>> between 0644 and 0755 way before the values are passed as "mode"
>> parameter to the write_entry() functions, but for these extra files,
>> there is no such massaging.
>
> Right, mode values from read_tree() pass through canon_mode(), so only
> untracked files (those appended with --add-file) are affected by the
> leakage mentioned above.

Thanks for sanity-checking.


^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v4 1/7] archive: optionally add "virtual" files
  2022-05-12 18:15                     ` Junio C Hamano
@ 2022-05-12 21:31                       ` Junio C Hamano
  2022-05-14  7:06                         ` René Scharfe
  0 siblings, 1 reply; 140+ messages in thread
From: Junio C Hamano @ 2022-05-12 21:31 UTC (permalink / raw)
  To: René Scharfe
  Cc: rsbecker, 'Johannes Schindelin via GitGitGadget',
	git, 'Taylor Blau', 'Derrick Stolee',
	'Elijah Newren', 'Johannes Schindelin'

Junio C Hamano <gitster@pobox.com> writes:

>> So we currently leak permission bits of executable files into ZIP
>> archives, but not tar files. :-|  Normalizing those to 0755 would be
>> more consistent.

Today, I was scanning the "What's cooking" draft and saw too many
topics that are marked with "Expecting a reroll".  It turns out that
this "mode bits" thing will not be a blocker to make us wait for a
reroll of the topic, so let's handle it separately, before we
forget, as an independent fix outside the series under discussion.

Thanks.

--- >8 ---
Subject: [PATCH] archive: do not let on-disk mode leak to zip archives

When the "--add-file" option is used to add the contents from an
untracked file to the archive, the permission mode bits for these
files are sent to the archive-backend specific "write_entry()"
method as-is.  We normalize the mode bits for tracked files way
before we pass them to the write_entry() method; we should do the
same here.

This is not strictly needed for "tar" archive-backend, as it has its
own code to further clean them up, but "zip" archive-backend is not
so well prepared.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 archive.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/archive.c b/archive.c
index e29d0e00f6..12a08af531 100644
--- a/archive.c
+++ b/archive.c
@@ -342,7 +342,7 @@ int write_archive_entries(struct archiver_args *args,
 		else
 			err = write_entry(args, &fake_oid, path_in_archive.buf,
 					  path_in_archive.len,
-					  info->stat.st_mode,
+					  canon_mode(info->stat.st_mode),
 					  content.buf, content.len);
 		if (err)
 			break;
-- 
2.36.1-338-g1c7f76a54c


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH] fixup! archive: optionally add "virtual" files
  2022-05-10 21:48         ` Junio C Hamano
  2022-05-10 22:06           ` rsbecker
@ 2022-05-12 22:31           ` Junio C Hamano
  1 sibling, 0 replies; 140+ messages in thread
From: Junio C Hamano @ 2022-05-12 22:31 UTC (permalink / raw)
  To: git
  Cc: Johannes Schindelin via GitGitGadget, René Scharfe,
	Taylor Blau, Derrick Stolee, Elijah Newren, Johannes Schindelin

Do not let add_file_cb() assume that two existing callers are the
only ones, and checking that the caller is not one of them is
sufficient to determine it is the other one.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---

 * To be squashed to the commit with the title in the series.

   The "What's cooking" report is getting crowded with too many
   topics marked as "Expecting a reroll", and I'm trying to do
   easier ones myself to see how much reduction we can make.

 archive.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/archive.c b/archive.c
index 477eba60ac..98c7449ea1 100644
--- a/archive.c
+++ b/archive.c
@@ -533,7 +533,7 @@ static int add_file_cb(const struct option *opt, const char *arg, int unset)
 		if (!S_ISREG(info->stat.st_mode))
 			die(_("Not a regular file: %s"), path);
 		info->content = NULL; /* read the file later */
-	} else {
+	} else if (!strcmp(opt->long_name, "add-file-with-content")) {
 		struct strbuf buf = STRBUF_INIT;
 		const char *p = arg;
 
@@ -560,6 +560,8 @@ static int add_file_cb(const struct option *opt, const char *arg, int unset)
 		info->stat.st_mode = S_IFREG | 0644;
 		info->content = xstrdup(p + 1);
 		info->stat.st_size = strlen(info->content);
+	} else {
+		BUG("add_file_cb() called for %s", opt->long_name);
 	}
 	item = string_list_append_nodup(&args->extra_files, path);
 	item->util = info;
-- 
2.36.1-338-g1c7f76a54c


^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v4 1/7] archive: optionally add "virtual" files
  2022-05-12 21:31                       ` Junio C Hamano
@ 2022-05-14  7:06                         ` René Scharfe
  0 siblings, 0 replies; 140+ messages in thread
From: René Scharfe @ 2022-05-14  7:06 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: rsbecker, 'Johannes Schindelin via GitGitGadget',
	git, 'Taylor Blau', 'Derrick Stolee',
	'Elijah Newren', 'Johannes Schindelin'

Am 12.05.22 um 23:31 schrieb Junio C Hamano:
> Junio C Hamano <gitster@pobox.com> writes:
>
>>> So we currently leak permission bits of executable files into ZIP
>>> archives, but not tar files. :-|  Normalizing those to 0755 would be
>>> more consistent.
>
> Today, I was scanning the "What's cooking" draft and saw too many
> topics that are marked with "Expecting a reroll".  It turns out that
> this "mode bits" thing will not be a blocker to make us wait for a
> reroll of the topic, so let's handle it separately, before we
> forget, as an independent fix outside the series under discussion.
>
> Thanks.
>
> --- >8 ---
> Subject: [PATCH] archive: do not let on-disk mode leak to zip archives
>
> When the "--add-file" option is used to add the contents from an
> untracked file to the archive, the permission mode bits for these
> files are sent to the archive-backend specific "write_entry()"
> method as-is.  We normalize the mode bits for tracked files way
> before we pass them to the write_entry() method; we should do the
> same here.
>
> This is not strictly needed for "tar" archive-backend, as it has its
> own code to further clean them up, but "zip" archive-backend is not
> so well prepared.
>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
>  archive.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/archive.c b/archive.c
> index e29d0e00f6..12a08af531 100644
> --- a/archive.c
> +++ b/archive.c
> @@ -342,7 +342,7 @@ int write_archive_entries(struct archiver_args *args,
>  		else
>  			err = write_entry(args, &fake_oid, path_in_archive.buf,
>  					  path_in_archive.len,
> -					  info->stat.st_mode,
> +					  canon_mode(info->stat.st_mode),
>  					  content.buf, content.len);
>  		if (err)
>  			break;

Looks good to me, thank you!

René

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v4 3/7] scalar: validate the optional enlistment argument
  2022-05-10 19:27       ` [PATCH v4 3/7] scalar: validate the optional enlistment argument Johannes Schindelin via GitGitGadget
@ 2022-05-17 14:51         ` Ævar Arnfjörð Bjarmason
  2022-05-18 17:35           ` Junio C Hamano
  0 siblings, 1 reply; 140+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-05-17 14:51 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget
  Cc: git, René Scharfe, Taylor Blau, Derrick Stolee,
	Elijah Newren, Johannes Schindelin


On Tue, May 10 2022, Johannes Schindelin via GitGitGadget wrote:

> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>
> The `scalar` command needs a Scalar enlistment for many subcommands, and
> looks in the current directory for such an enlistment (traversing the
> parent directories until it finds one).
>
> These is subcommands can also be called with an optional argument
> specifying the enlistment. Here, too, we traverse parent directories as
> needed, until we find an enlistment.
>
> However, if the specified directory does not even exist, or is not a
> directory, we should stop right there, with an error message.
>
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
>  contrib/scalar/scalar.c          | 6 ++++--
>  contrib/scalar/t/t9099-scalar.sh | 5 +++++
>  2 files changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
> index 1ce9c2b00e8..00dcd4b50ef 100644
> --- a/contrib/scalar/scalar.c
> +++ b/contrib/scalar/scalar.c
> @@ -43,9 +43,11 @@ static void setup_enlistment_directory(int argc, const char **argv,
>  		usage_with_options(usagestr, options);
>  
>  	/* find the worktree, determine its corresponding root */
> -	if (argc == 1)
> +	if (argc == 1) {
>  		strbuf_add_absolute_path(&path, argv[0]);
> -	else if (strbuf_getcwd(&path) < 0)
> +		if (!is_directory(path.buf))
> +			die(_("'%s' does not exist"), path.buf);
> +	} else if (strbuf_getcwd(&path) < 0)
>  		die(_("need a working directory"));
>  
>  	strbuf_trim_trailing_dir_sep(&path);
> diff --git a/contrib/scalar/t/t9099-scalar.sh b/contrib/scalar/t/t9099-scalar.sh
> index 2e1502ad45e..9d83fdf25e8 100755
> --- a/contrib/scalar/t/t9099-scalar.sh
> +++ b/contrib/scalar/t/t9099-scalar.sh
> @@ -85,4 +85,9 @@ test_expect_success 'scalar delete with enlistment' '
>  	test_path_is_missing cloned
>  '
>  
> +test_expect_success '`scalar [...] <dir>` errors out when dir is missing' '
> +	! scalar run config cloned 2>err &&

Needs to use test_must_fail, not !

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v4 4/7] Implement `scalar diagnose`
  2022-05-10 19:27       ` [PATCH v4 4/7] Implement `scalar diagnose` Johannes Schindelin via GitGitGadget
@ 2022-05-17 14:53         ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 140+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-05-17 14:53 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget
  Cc: git, René Scharfe, Taylor Blau, Derrick Stolee,
	Elijah Newren, Johannes Schindelin


On Tue, May 10 2022, Johannes Schindelin via GitGitGadget wrote:

> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>
> Over the course of Scalar's development, it became obvious that there is
> a need for a command that can gather all kinds of useful information
> that can help identify the most typical problems with large
> worktrees/repositories.
>
> The `diagnose` command is the culmination of this hard-won knowledge: it
> gathers the installed hooks, the config, a couple statistics describing
> the data shape, among other pieces of information, and then wraps
> everything up in a tidy, neat `.zip` archive.
>
> Note: originally, Scalar was implemented in C# using the .NET API, where
> we had the luxury of a comprehensive standard library that includes
> basic functionality such as writing a `.zip` file. In the C version, we
> lack such a commodity. Rather than introducing a dependency on, say,
> libzip, we slightly abuse Git's `archive` machinery: we write out a
> `.zip` of the empty try, augmented by a couple files that are added via
> the `--add-file*` options. We are careful trying not to modify the
> current repository in any way lest the very circumstances that required
> `scalar diagnose` to be run are changed by the `diagnose` run itself.
>
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
>  contrib/scalar/scalar.c          | 144 +++++++++++++++++++++++++++++++
>  contrib/scalar/scalar.txt        |  12 +++
>  contrib/scalar/t/t9099-scalar.sh |  14 +++
>  3 files changed, 170 insertions(+)
>
> diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
> index 00dcd4b50ef..367a2c50e25 100644
> --- a/contrib/scalar/scalar.c
> +++ b/contrib/scalar/scalar.c
> @@ -11,6 +11,7 @@
>  #include "dir.h"
>  #include "packfile.h"
>  #include "help.h"
> +#include "archive.h"
>  
>  /*
>   * Remove the deepest subdirectory in the provided path string. Path must not
> @@ -261,6 +262,47 @@ static int unregister_dir(void)
>  	return res;
>  }
>  
> +static int add_directory_to_archiver(struct strvec *archiver_args,
> +					  const char *path, int recurse)
> +{
> +	int at_root = !*path;
> +	DIR *dir = opendir(at_root ? "." : path);
> +	struct dirent *e;
> +	struct strbuf buf = STRBUF_INIT;
> +	size_t len;
> +	int res = 0;
> +
> +	if (!dir)
> +		return error(_("could not open directory '%s'"), path);


s/error/error_errno/, surely?

> +	strbuf_addstr(&zip_path, "/.scalarDiagnostics/scalar_");
> +	strbuf_addftime(&zip_path,
> +			"%Y%m%d_%H%M%S", localtime_r(&now, &tm), 0, 0);

Would we be worse off if we stole this timestamp from some known file
(or HEAD), and thus made a second run of this reproducable?

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v4 0/7] scalar: implement the subcommand "diagnose"
  2022-05-10 19:26     ` [PATCH v4 " Johannes Schindelin via GitGitGadget
                         ` (6 preceding siblings ...)
  2022-05-10 19:27       ` [PATCH v4 7/7] scalar: teach `diagnose` to gather loose objects information Matthew John Cheetham via GitGitGadget
@ 2022-05-17 15:03       ` Ævar Arnfjörð Bjarmason
  2022-05-17 15:28         ` rsbecker
  2022-05-19 18:17       ` [PATCH v5 " Johannes Schindelin via GitGitGadget
  8 siblings, 1 reply; 140+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-05-17 15:03 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget
  Cc: git, René Scharfe, Taylor Blau, Derrick Stolee,
	Elijah Newren, Johannes Schindelin


On Tue, May 10 2022, Johannes Schindelin via GitGitGadget wrote:

> Over the course of the years, we developed a sub-command that gathers
> diagnostic data into a .zip file that can then be attached to bug reports.
> This sub-command turned out to be very useful in helping Scalar developers
> identify and fix issues.

I don't mind this as some intermediate step, but re the context of the
plan for scalar "eventually going away" (discussed in previous threads)
I wonder why (especially re the earlier thread upthread at [1]) this
isn't being added to "git bugreport".

Is the plan to integrate this into "git bugreport" eventually?

1. https://lore.kernel.org/git/nycvar.QRO.7.76.6.2202062213030.347@tvgsbejvaqbjf.bet/

^ permalink raw reply	[flat|nested] 140+ messages in thread

* RE: [PATCH v4 0/7] scalar: implement the subcommand "diagnose"
  2022-05-17 15:03       ` [PATCH v4 0/7] scalar: implement the subcommand "diagnose" Ævar Arnfjörð Bjarmason
@ 2022-05-17 15:28         ` rsbecker
  2022-05-19 18:17           ` Johannes Schindelin
  0 siblings, 1 reply; 140+ messages in thread
From: rsbecker @ 2022-05-17 15:28 UTC (permalink / raw)
  To: 'Ævar Arnfjörð Bjarmason',
	'Johannes Schindelin via GitGitGadget'
  Cc: git, 'René Scharfe', 'Taylor Blau',
	'Derrick Stolee', 'Elijah Newren',
	'Johannes Schindelin'

On May 17, 2022 11:03 AM, Ævar Arnfjörð Bjarmason wrote:
>On Tue, May 10 2022, Johannes Schindelin via GitGitGadget wrote:
>
>> Over the course of the years, we developed a sub-command that gathers
>> diagnostic data into a .zip file that can then be attached to bug reports.
>> This sub-command turned out to be very useful in helping Scalar
>> developers identify and fix issues.
>
>I don't mind this as some intermediate step, but re the context of the plan for
>scalar "eventually going away" (discussed in previous threads) I wonder why
>(especially re the earlier thread upthread at [1]) this isn't being added to "git
>bugreport".
>
>Is the plan to integrate this into "git bugreport" eventually?
>
>1.
>https://lore.kernel.org/git/nycvar.QRO.7.76.6.2202062213030.347@tvgsbejvaqbjf.
>bet/

Could this also not be useful in fsck, as --diagnose? That's the go-to command when there are issues for many users.
--Randall


^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v4 3/7] scalar: validate the optional enlistment argument
  2022-05-17 14:51         ` Ævar Arnfjörð Bjarmason
@ 2022-05-18 17:35           ` Junio C Hamano
  2022-05-20  7:30             ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 140+ messages in thread
From: Junio C Hamano @ 2022-05-18 17:35 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Johannes Schindelin via GitGitGadget, git, René Scharfe,
	Taylor Blau, Derrick Stolee, Elijah Newren, Johannes Schindelin

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

>> +test_expect_success '`scalar [...] <dir>` errors out when dir is missing' '
>> +	! scalar run config cloned 2>err &&
>
> Needs to use test_must_fail, not !

Good eyes and careful reading are very much appreciated, but in this
case, doesn't such an improvement depend on an update to teach
test_must_fail_acceptable about scalar being whitelisted?



^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v4 2/7] archive --add-file-with-contents: allow paths containing colons
  2022-05-10 21:56         ` Junio C Hamano
  2022-05-10 22:23           ` rsbecker
@ 2022-05-19 18:09           ` Johannes Schindelin
  2022-05-19 18:44             ` Junio C Hamano
  1 sibling, 1 reply; 140+ messages in thread
From: Johannes Schindelin @ 2022-05-19 18:09 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Johannes Schindelin via GitGitGadget, git, René Scharfe,
	Taylor Blau, Derrick Stolee, Elijah Newren

Hi Junio,

On Tue, 10 May 2022, Junio C Hamano wrote:

> "Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
> writes:
>
> > From: Johannes Schindelin <johannes.schindelin@gmx.de>
> >
> > By allowing the path to be enclosed in double-quotes, we can avoid
> > the limitation that paths cannot contain colons.
> > ...
> > +		struct strbuf buf = STRBUF_INIT;
> > +		const char *p = arg;
> > +
> > +		if (*p != '"')
> > +			p = strchr(p, ':');
> > +		else if (unquote_c_style(&buf, p, &p) < 0)
> > +			die(_("unclosed quote: '%s'"), arg);
>
> Even though I do not think people necessarily would want to use
> colons in their pathnames (it has problems interoperating with other
> systems), lifting the limitation is a good thing to do.  I totally
> forgot that we designed unquote_c_style() to self terminate and
> return the end pointer to the caller so the caller does not have to
> worry, which is very nice.
>
> Even if this step weren't here in the series, I would have thought
> the mode bits issue was more serious than "no colons in path"
> limitation, but given that we address this unusual corner case
> limitation, I would think we should address the hardcoded mode bits
> at the same time.
>
> > diff --git a/t/t5003-archive-zip.sh b/t/t5003-archive-zip.sh
> > index 8ff1257f1a0..5b8bbfc2692 100755
> > --- a/t/t5003-archive-zip.sh
> > +++ b/t/t5003-archive-zip.sh
> > @@ -207,13 +207,21 @@ check_zip with_untracked
> >  check_added with_untracked untracked untracked
> >
> >  test_expect_success UNZIP 'git archive --format=zip --add-file-with-content' '
> > +	if test_have_prereq FUNNYNAMES
> > +	then
> > +		QUOTED=quoted:colon
> > +	else
> > +		QUOTED=quoted
> > +	fi &&
>
> ;-)
>
> >  	git archive --format=zip >with_file_with_content.zip \
> > +		--add-file-with-content=\"$QUOTED\": \
> >  		--add-file-with-content=hello:world $EMPTY_TREE &&
> >  	test_when_finished "rm -rf tmp-unpack" &&
> >  	mkdir tmp-unpack && (
> >  		cd tmp-unpack &&
> >  		"$GIT_UNZIP" ../with_file_with_content.zip &&
> >  		test_path_is_file hello &&
> > +		test_path_is_file $QUOTED &&
>
> Looks OK, even though it probably is a good idea to have dq around
> $QUOTED, so that future developers can easily insert SP into its
> value to use a bit more common but still a bit more problematic
> pathnames in the test.

I actually decided against this because reading

	"$QUOTED"

would mislead future me to think that the double quotes that enclose
$QUOTED are the quotes that the variable's name talks about. But the
quotes are actually the escaped ones that are passed to `git archive`
above.

So, to help future Dscho should they read this code six months from now or
even later, I wanted to specifically only add quotes to the `git archive`
call to make the intention abundantly clear.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 140+ messages in thread

* RE: [PATCH v4 2/7] archive --add-file-with-contents: allow paths containing colons
  2022-05-10 22:23           ` rsbecker
@ 2022-05-19 18:12             ` Johannes Schindelin
  0 siblings, 0 replies; 140+ messages in thread
From: Johannes Schindelin @ 2022-05-19 18:12 UTC (permalink / raw)
  To: rsbecker
  Cc: 'Junio C Hamano',
	'Johannes Schindelin via GitGitGadget',
	git, 'René Scharfe', 'Taylor Blau',
	'Derrick Stolee', 'Elijah Newren'

Hi Randall,

On Tue, 10 May 2022, rsbecker@nexbridge.com wrote:

> On May 10, 2022 5:57 PM, Junio C Hamano wrote:
> >"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
> >writes:
> >
> >>  	git archive --format=zip >with_file_with_content.zip \
> >> +		--add-file-with-content=\"$QUOTED\": \
> >>  		--add-file-with-content=hello:world $EMPTY_TREE &&
> >>  	test_when_finished "rm -rf tmp-unpack" &&
> >>  	mkdir tmp-unpack && (
> >>  		cd tmp-unpack &&
> >>  		"$GIT_UNZIP" ../with_file_with_content.zip &&
> >>  		test_path_is_file hello &&
> >> +		test_path_is_file $QUOTED &&
> >
> >Looks OK, even though it probably is a good idea to have dq around $QUOTED, so
> >that future developers can easily insert SP into its value to use a bit more common
> >but still a bit more problematic pathnames in the test.
>
> A test case for .gitignore in this would be good too. People on our
> exotic platform do this stuff as a matter of course. As an example, a
> name of $Z3P4:12399334 being used as a named pipe (associated with the
> unique name of a process) actually has been seen in the wild recently.
> My solution was to wild card this and/or contain it in an ignored
> directory.

The `--add-file-with-content` option, which this test case is all about,
specifically does not heed `.gitignore`. Is this what you want to test? If
so, I don't think that's necessary. Unless you expect some future version
to introduce a patch by mistake that makes `--add-file-with-content`
subject to the `.gitignore` rules.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 140+ messages in thread

* RE: [PATCH v4 0/7] scalar: implement the subcommand "diagnose"
  2022-05-17 15:28         ` rsbecker
@ 2022-05-19 18:17           ` Johannes Schindelin
  0 siblings, 0 replies; 140+ messages in thread
From: Johannes Schindelin @ 2022-05-19 18:17 UTC (permalink / raw)
  To: rsbecker
  Cc: 'Ævar Arnfjörð Bjarmason',
	'Johannes Schindelin via GitGitGadget',
	git, 'René Scharfe', 'Taylor Blau',
	'Derrick Stolee', 'Elijah Newren'

[-- Attachment #1: Type: text/plain, Size: 1433 bytes --]

Hi Randall and Ævar,

On Tue, 17 May 2022, rsbecker@nexbridge.com wrote:

> On May 17, 2022 11:03 AM, Ævar Arnfjörð Bjarmason wrote:
> >On Tue, May 10 2022, Johannes Schindelin via GitGitGadget wrote:
> >
> >> Over the course of the years, we developed a sub-command that gathers
> >> diagnostic data into a .zip file that can then be attached to bug
> >> reports. This sub-command turned out to be very useful in helping
> >> Scalar developers identify and fix issues.
> >
> >I don't mind this as some intermediate step, but re the context of the
> >plan for scalar "eventually going away" (discussed in previous threads)
> >I wonder why (especially re the earlier thread upthread at [1]) this
> >isn't being added to "git bugreport".
> >
> >Is the plan to integrate this into "git bugreport" eventually?

Potentially a variation of the `scalar diagnose` code could be useful in
`git bugreport`, opt-in via a new option.

But that's not the purpose of this patch series.

> Could this also not be useful in fsck, as --diagnose? That's the go-to
> command when there are issues for many users.

I can see where you're coming from, but `fsck`'s mission is to verify the
integrity of the local Git database. That is very different from the
mission of `scalar diagnose`, which is to help diagnose issues (whether
they are truly bugs or usage patterns causing unfortunate performance).

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v5 0/7] scalar: implement the subcommand "diagnose"
  2022-05-10 19:26     ` [PATCH v4 " Johannes Schindelin via GitGitGadget
                         ` (7 preceding siblings ...)
  2022-05-17 15:03       ` [PATCH v4 0/7] scalar: implement the subcommand "diagnose" Ævar Arnfjörð Bjarmason
@ 2022-05-19 18:17       ` Johannes Schindelin via GitGitGadget
  2022-05-19 18:17         ` [PATCH v5 1/7] archive: optionally add "virtual" files Johannes Schindelin via GitGitGadget
                           ` (8 more replies)
  8 siblings, 9 replies; 140+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2022-05-19 18:17 UTC (permalink / raw)
  To: git
  Cc: René Scharfe, Taylor Blau, Derrick Stolee, Elijah Newren,
	rsbecker, Ævar Arnfjörð Bjarmason,
	Johannes Schindelin

Over the course of the years, we developed a sub-command that gathers
diagnostic data into a .zip file that can then be attached to bug reports.
This sub-command turned out to be very useful in helping Scalar developers
identify and fix issues.

Changes since v4:

 * Squashed in Junio's suggested fixups
 * Renamed the option from --add-file-with-content=<name>:<content> to
   --add-virtual-file=<name>:<content>
 * Fixed one instance where I had used error() instead of error_errno().

Changes since v3:

 * We're now using unquote_c_style() instead of rolling our own unquoter.
 * Fixed the added regression test.
 * As pointed out by Scalar's Functional Tests, the
   add_directory_to_archiver() function should not fail when scalar diagnose
   encounters FSMonitor's Unix socket, but only warn instead.
 * Related: add_directory_to_archiver() needs to propagate errors from
   processing subdirectories so that the top-level call returns an error,
   too.

Changes since v2:

 * Clarified in the commit message what the biggest benefit of
   --add-file-with-content is.
 * The <path> part of the -add-file-with-content argument can now contain
   colons. To do this, the path needs to start and end in double-quote
   characters (which are stripped), and the backslash serves as escape
   character in that case (to allow the path to contain both colons and
   double-quotes).
 * Fixed incorrect grammar.
 * Instead of strcmp(<what-we-don't-want>), we now say
   !strcmp(<what-we-want>).
 * The help text for --add-file-with-content was improved a tiny bit.
 * Adjusted the commit message that still talked about spawning plenty of
   processes and about a throw-away repository for the sake of generating a
   .zip file.
 * Simplified the code that shows the diagnostics and adds them to the .zip
   file.
 * The final message that reports that the archive is complete is now
   printed to stderr instead of stdout.

Changes since v1:

 * Instead of creating a throw-away repository, staging the contents of the
   .zip file and then using git write-tree and git archive to write the .zip
   file, the patch series now introduces a new option to git archive and
   uses write_archive() directly (avoiding any separate process).
 * Since the command avoids separate processes, it is now blazing fast on
   Windows, and I dropped the spinner() function because it's no longer
   needed.
 * While reworking the test case, I noticed that scalar [...] <enlistment>
   failed to verify that the specified directory exists, and would happily
   "traverse to its parent directory" on its quest to find a Scalar
   enlistment. That is of course incorrect, and has been fixed as a "while
   at it" sort of preparatory commit.
 * I had forgotten to sign off on all the commits, which has been fixed.
 * Instead of some "home-grown" readdir()-based function, the code now uses
   for_each_file_in_pack_dir() to look through the pack directories.
 * If any alternates are configured, their pack directories are now included
   in the output.
 * The commit message that might be interpreted to promise information about
   large loose files has been corrected to no longer promise that.
 * The test cases have been adjusted to test a little bit more (e.g.
   verifying that specific paths are mentioned in the output, instead of
   merely verifying that the output is non-empty).

Johannes Schindelin (5):
  archive: optionally add "virtual" files
  archive --add-file-with-contents: allow paths containing colons
  scalar: validate the optional enlistment argument
  Implement `scalar diagnose`
  scalar diagnose: include disk space information

Matthew John Cheetham (2):
  scalar: teach `diagnose` to gather packfile info
  scalar: teach `diagnose` to gather loose objects information

 Documentation/git-archive.txt    |  17 ++
 archive.c                        |  63 ++++++-
 contrib/scalar/scalar.c          | 292 ++++++++++++++++++++++++++++++-
 contrib/scalar/scalar.txt        |  12 ++
 contrib/scalar/t/t9099-scalar.sh |  27 +++
 t/t5003-archive-zip.sh           |  20 +++
 6 files changed, 421 insertions(+), 10 deletions(-)


base-commit: ddc35d833dd6f9e8946b09cecd3311b8aa18d295
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1128%2Fdscho%2Fscalar-diagnose-v5
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1128/dscho/scalar-diagnose-v5
Pull-Request: https://github.com/gitgitgadget/git/pull/1128

Range-diff vs v4:

 1:  45662cf582a ! 1:  42e73fb0aac archive: optionally add "virtual" files
     @@ Documentation/git-archive.txt: OPTIONS
       	by concatenating the value for `--prefix` (if any) and the
       	basename of <file>.
       
     -+--add-file-with-content=<path>:<content>::
     ++--add-virtual-file=<path>:<content>::
      +	Add the specified contents to the archive.  Can be repeated to add
      +	multiple files.  The path of the file in the archive is built
      +	by concatenating the value for `--prefix` (if any) and the
     @@ archive.c: static int add_file_cb(const struct option *opt, const char *arg, int
      +		if (!S_ISREG(info->stat.st_mode))
      +			die(_("Not a regular file: %s"), path);
      +		info->content = NULL; /* read the file later */
     -+	} else {
     ++	} else if (!strcmp(opt->long_name, "add-virtual-file")) {
      +		const char *colon = strchr(arg, ':');
      +		char *p;
      +
     @@ archive.c: static int add_file_cb(const struct option *opt, const char *arg, int
      +		info->stat.st_mode = S_IFREG | 0644;
      +		info->content = xstrdup(colon + 1);
      +		info->stat.st_size = strlen(info->content);
     ++	} else {
     ++		BUG("add_file_cb() called for %s", opt->long_name);
      +	}
      +	item = string_list_append_nodup(&args->extra_files, path);
      +	item->util = info;
     @@ archive.c: static int parse_archive_args(int argc, const char **argv,
       		{ OPTION_CALLBACK, 0, "add-file", args, N_("file"),
       		  N_("add untracked file to archive"), 0, add_file_cb,
       		  (intptr_t)&base },
     -+		{ OPTION_CALLBACK, 0, "add-file-with-content", args,
     ++		{ OPTION_CALLBACK, 0, "add-virtual-file", args,
      +		  N_("path:content"), N_("add untracked file to archive"), 0,
      +		  add_file_cb, (intptr_t)&base },
       		OPT_STRING('o', "output", &output, N_("file"),
     @@ t/t5003-archive-zip.sh: test_expect_success 'git archive --format=zip --add-file
       check_zip with_untracked
       check_added with_untracked untracked untracked
       
     -+test_expect_success UNZIP 'git archive --format=zip --add-file-with-content' '
     ++test_expect_success UNZIP 'git archive --format=zip --add-virtual-file' '
      +	git archive --format=zip >with_file_with_content.zip \
     -+		--add-file-with-content=hello:world $EMPTY_TREE &&
     ++		--add-virtual-file=hello:world $EMPTY_TREE &&
      +	test_when_finished "rm -rf tmp-unpack" &&
      +	mkdir tmp-unpack && (
      +		cd tmp-unpack &&
 2:  fdba4ed6f4d ! 2:  b5ebd61066a archive --add-file-with-contents: allow paths containing colons
     @@ archive.c
      @@ archive.c: static int add_file_cb(const struct option *opt, const char *arg, int unset)
       			die(_("Not a regular file: %s"), path);
       		info->content = NULL; /* read the file later */
     - 	} else {
     + 	} else if (!strcmp(opt->long_name, "add-virtual-file")) {
      -		const char *colon = strchr(arg, ':');
      -		char *p;
      +		struct strbuf buf = STRBUF_INIT;
     @@ archive.c: static int add_file_cb(const struct option *opt, const char *arg, int
      -		info->content = xstrdup(colon + 1);
      +		info->content = xstrdup(p + 1);
       		info->stat.st_size = strlen(info->content);
     - 	}
     - 	item = string_list_append_nodup(&args->extra_files, path);
     + 	} else {
     + 		BUG("add_file_cb() called for %s", opt->long_name);
      
       ## t/t5003-archive-zip.sh ##
      @@ t/t5003-archive-zip.sh: check_zip with_untracked
       check_added with_untracked untracked untracked
       
     - test_expect_success UNZIP 'git archive --format=zip --add-file-with-content' '
     + test_expect_success UNZIP 'git archive --format=zip --add-virtual-file' '
      +	if test_have_prereq FUNNYNAMES
      +	then
      +		QUOTED=quoted:colon
     @@ t/t5003-archive-zip.sh: check_zip with_untracked
      +		QUOTED=quoted
      +	fi &&
       	git archive --format=zip >with_file_with_content.zip \
     -+		--add-file-with-content=\"$QUOTED\": \
     - 		--add-file-with-content=hello:world $EMPTY_TREE &&
     ++		--add-virtual-file=\"$QUOTED\": \
     + 		--add-virtual-file=hello:world $EMPTY_TREE &&
       	test_when_finished "rm -rf tmp-unpack" &&
       	mkdir tmp-unpack && (
       		cd tmp-unpack &&
 3:  da9f52a8240 = 3:  f1ba69c02d7 scalar: validate the optional enlistment argument
 4:  87bdc22322b ! 4:  3fb90194744 Implement `scalar diagnose`
     @@ contrib/scalar/scalar.c: static int unregister_dir(void)
      +	int res = 0;
      +
      +	if (!dir)
     -+		return error(_("could not open directory '%s'"), path);
     ++		return error_errno(_("could not open directory '%s'"), path);
      +
      +	if (!at_root)
      +		strbuf_addf(&buf, "%s/", path);
     @@ contrib/scalar/scalar.c: cleanup:
      +	strbuf_addf(&buf, "Enlistment root: %s\n", the_repository->worktree);
      +	write_or_die(stdout_fd, buf.buf, buf.len);
      +	strvec_pushf(&archiver_args,
     -+		     "--add-file-with-content=diagnostics.log:%.*s",
     ++		     "--add-virtual-file=diagnostics.log:%.*s",
      +		     (int)buf.len, buf.buf);
      +
      +	if ((res = add_directory_to_archiver(&archiver_args, ".git", 0)) ||
 5:  3f63b197d42 ! 5:  2e645b08a9e scalar diagnose: include disk space information
     @@ contrib/scalar/scalar.c: static int cmd_diagnose(int argc, const char **argv)
      +	get_disk_info(&buf);
       	write_or_die(stdout_fd, buf.buf, buf.len);
       	strvec_pushf(&archiver_args,
     - 		     "--add-file-with-content=diagnostics.log:%.*s",
     + 		     "--add-virtual-file=diagnostics.log:%.*s",
      
       ## contrib/scalar/t/t9099-scalar.sh ##
      @@ contrib/scalar/t/t9099-scalar.sh: SQ="'"
 6:  fc1319338fc ! 6:  0fa20d73750 scalar: teach `diagnose` to gather packfile info
     @@ contrib/scalar/scalar.c: cleanup:
       {
       	struct option options[] = {
      @@ contrib/scalar/scalar.c: static int cmd_diagnose(int argc, const char **argv)
     - 		     "--add-file-with-content=diagnostics.log:%.*s",
     + 		     "--add-virtual-file=diagnostics.log:%.*s",
       		     (int)buf.len, buf.buf);
       
      +	strbuf_reset(&buf);
     -+	strbuf_addstr(&buf, "--add-file-with-content=packs-local.txt:");
     ++	strbuf_addstr(&buf, "--add-virtual-file=packs-local.txt:");
      +	dir_file_stats(the_repository->objects->odb, &buf);
      +	foreach_alt_odb(dir_file_stats, &buf);
      +	strvec_push(&archiver_args, buf.buf);
 7:  e8f5b42f7b7 ! 7:  62e173b47cf scalar: teach `diagnose` to gather loose objects information
     @@ contrib/scalar/scalar.c: static int cmd_diagnose(int argc, const char **argv)
       	strvec_push(&archiver_args, buf.buf);
       
      +	strbuf_reset(&buf);
     -+	strbuf_addstr(&buf, "--add-file-with-content=objects-local.txt:");
     ++	strbuf_addstr(&buf, "--add-virtual-file=objects-local.txt:");
      +	loose_objs_stats(&buf, ".git/objects");
      +	strvec_push(&archiver_args, buf.buf);
      +

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v5 1/7] archive: optionally add "virtual" files
  2022-05-19 18:17       ` [PATCH v5 " Johannes Schindelin via GitGitGadget
@ 2022-05-19 18:17         ` Johannes Schindelin via GitGitGadget
  2022-05-20 14:41           ` René Scharfe
  2022-05-19 18:17         ` [PATCH v5 2/7] archive --add-file-with-contents: allow paths containing colons Johannes Schindelin via GitGitGadget
                           ` (7 subsequent siblings)
  8 siblings, 1 reply; 140+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2022-05-19 18:17 UTC (permalink / raw)
  To: git
  Cc: René Scharfe, Taylor Blau, Derrick Stolee, Elijah Newren,
	rsbecker, Ævar Arnfjörð Bjarmason,
	Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

With the `--add-file-with-content=<path>:<content>` option, `git
archive` now supports use cases where relatively trivial files need to
be added that do not exist on disk.

This will allow us to generate `.zip` files with generated content,
without having to add said content to the object database and without
having to write it out to disk.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 Documentation/git-archive.txt | 11 ++++++++
 archive.c                     | 53 +++++++++++++++++++++++++++++------
 t/t5003-archive-zip.sh        | 12 ++++++++
 3 files changed, 68 insertions(+), 8 deletions(-)

diff --git a/Documentation/git-archive.txt b/Documentation/git-archive.txt
index bc4e76a7834..893cb1075bf 100644
--- a/Documentation/git-archive.txt
+++ b/Documentation/git-archive.txt
@@ -61,6 +61,17 @@ OPTIONS
 	by concatenating the value for `--prefix` (if any) and the
 	basename of <file>.
 
+--add-virtual-file=<path>:<content>::
+	Add the specified contents to the archive.  Can be repeated to add
+	multiple files.  The path of the file in the archive is built
+	by concatenating the value for `--prefix` (if any) and the
+	basename of <file>.
++
+The `<path>` cannot contain any colon, the file mode is limited to
+a regular file, and the option may be subject to platform-dependent
+command-line limits. For non-trivial cases, write an untracked file
+and use `--add-file` instead.
+
 --worktree-attributes::
 	Look for attributes in .gitattributes files in the working tree
 	as well (see <<ATTRIBUTES>>).
diff --git a/archive.c b/archive.c
index a3bbb091256..d20e16fa819 100644
--- a/archive.c
+++ b/archive.c
@@ -263,6 +263,7 @@ static int queue_or_write_archive_entry(const struct object_id *oid,
 struct extra_file_info {
 	char *base;
 	struct stat stat;
+	void *content;
 };
 
 int write_archive_entries(struct archiver_args *args,
@@ -337,7 +338,13 @@ int write_archive_entries(struct archiver_args *args,
 		strbuf_addstr(&path_in_archive, basename(path));
 
 		strbuf_reset(&content);
-		if (strbuf_read_file(&content, path, info->stat.st_size) < 0)
+		if (info->content)
+			err = write_entry(args, &fake_oid, path_in_archive.buf,
+					  path_in_archive.len,
+					  info->stat.st_mode,
+					  info->content, info->stat.st_size);
+		else if (strbuf_read_file(&content, path,
+					  info->stat.st_size) < 0)
 			err = error_errno(_("could not read '%s'"), path);
 		else
 			err = write_entry(args, &fake_oid, path_in_archive.buf,
@@ -493,6 +500,7 @@ static void extra_file_info_clear(void *util, const char *str)
 {
 	struct extra_file_info *info = util;
 	free(info->base);
+	free(info->content);
 	free(info);
 }
 
@@ -514,14 +522,40 @@ static int add_file_cb(const struct option *opt, const char *arg, int unset)
 	if (!arg)
 		return -1;
 
-	path = prefix_filename(args->prefix, arg);
-	item = string_list_append_nodup(&args->extra_files, path);
-	item->util = info = xmalloc(sizeof(*info));
+	info = xmalloc(sizeof(*info));
 	info->base = xstrdup_or_null(base);
-	if (stat(path, &info->stat))
-		die(_("File not found: %s"), path);
-	if (!S_ISREG(info->stat.st_mode))
-		die(_("Not a regular file: %s"), path);
+
+	if (!strcmp(opt->long_name, "add-file")) {
+		path = prefix_filename(args->prefix, arg);
+		if (stat(path, &info->stat))
+			die(_("File not found: %s"), path);
+		if (!S_ISREG(info->stat.st_mode))
+			die(_("Not a regular file: %s"), path);
+		info->content = NULL; /* read the file later */
+	} else if (!strcmp(opt->long_name, "add-virtual-file")) {
+		const char *colon = strchr(arg, ':');
+		char *p;
+
+		if (!colon)
+			die(_("missing colon: '%s'"), arg);
+
+		p = xstrndup(arg, colon - arg);
+		if (!args->prefix)
+			path = p;
+		else {
+			path = prefix_filename(args->prefix, p);
+			free(p);
+		}
+		memset(&info->stat, 0, sizeof(info->stat));
+		info->stat.st_mode = S_IFREG | 0644;
+		info->content = xstrdup(colon + 1);
+		info->stat.st_size = strlen(info->content);
+	} else {
+		BUG("add_file_cb() called for %s", opt->long_name);
+	}
+	item = string_list_append_nodup(&args->extra_files, path);
+	item->util = info;
+
 	return 0;
 }
 
@@ -554,6 +588,9 @@ static int parse_archive_args(int argc, const char **argv,
 		{ OPTION_CALLBACK, 0, "add-file", args, N_("file"),
 		  N_("add untracked file to archive"), 0, add_file_cb,
 		  (intptr_t)&base },
+		{ OPTION_CALLBACK, 0, "add-virtual-file", args,
+		  N_("path:content"), N_("add untracked file to archive"), 0,
+		  add_file_cb, (intptr_t)&base },
 		OPT_STRING('o', "output", &output, N_("file"),
 			N_("write the archive to this file")),
 		OPT_BOOL(0, "worktree-attributes", &worktree_attributes,
diff --git a/t/t5003-archive-zip.sh b/t/t5003-archive-zip.sh
index 1e6d18b140e..ebc26e89a9b 100755
--- a/t/t5003-archive-zip.sh
+++ b/t/t5003-archive-zip.sh
@@ -206,6 +206,18 @@ test_expect_success 'git archive --format=zip --add-file' '
 check_zip with_untracked
 check_added with_untracked untracked untracked
 
+test_expect_success UNZIP 'git archive --format=zip --add-virtual-file' '
+	git archive --format=zip >with_file_with_content.zip \
+		--add-virtual-file=hello:world $EMPTY_TREE &&
+	test_when_finished "rm -rf tmp-unpack" &&
+	mkdir tmp-unpack && (
+		cd tmp-unpack &&
+		"$GIT_UNZIP" ../with_file_with_content.zip &&
+		test_path_is_file hello &&
+		test world = $(cat hello)
+	)
+'
+
 test_expect_success 'git archive --format=zip --add-file twice' '
 	echo untracked >untracked &&
 	git archive --format=zip --prefix=one/ --add-file=untracked \
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v5 2/7] archive --add-file-with-contents: allow paths containing colons
  2022-05-19 18:17       ` [PATCH v5 " Johannes Schindelin via GitGitGadget
  2022-05-19 18:17         ` [PATCH v5 1/7] archive: optionally add "virtual" files Johannes Schindelin via GitGitGadget
@ 2022-05-19 18:17         ` Johannes Schindelin via GitGitGadget
  2022-05-19 18:17         ` [PATCH v5 3/7] scalar: validate the optional enlistment argument Johannes Schindelin via GitGitGadget
                           ` (6 subsequent siblings)
  8 siblings, 0 replies; 140+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2022-05-19 18:17 UTC (permalink / raw)
  To: git
  Cc: René Scharfe, Taylor Blau, Derrick Stolee, Elijah Newren,
	rsbecker, Ævar Arnfjörð Bjarmason,
	Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

By allowing the path to be enclosed in double-quotes, we can avoid
the limitation that paths cannot contain colons.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 Documentation/git-archive.txt | 14 ++++++++++----
 archive.c                     | 30 ++++++++++++++++++++----------
 t/t5003-archive-zip.sh        |  8 ++++++++
 3 files changed, 38 insertions(+), 14 deletions(-)

diff --git a/Documentation/git-archive.txt b/Documentation/git-archive.txt
index 893cb1075bf..54de945a84e 100644
--- a/Documentation/git-archive.txt
+++ b/Documentation/git-archive.txt
@@ -67,10 +67,16 @@ OPTIONS
 	by concatenating the value for `--prefix` (if any) and the
 	basename of <file>.
 +
-The `<path>` cannot contain any colon, the file mode is limited to
-a regular file, and the option may be subject to platform-dependent
-command-line limits. For non-trivial cases, write an untracked file
-and use `--add-file` instead.
+The `<path>` argument can start and end with a literal double-quote
+character; The contained file name is interpreted as a C-style string,
+i.e. the backslash is interpreted as escape character. The path must
+be quoted if it contains a colon, to avoid the colon from being
+misinterpreted as the separator between the path and the contents, or
+if the path begins or ends with a double-quote character.
++
+The file mode is limited to a regular file, and the option may be
+subject to platform-dependent command-line limits. For non-trivial
+cases, write an untracked file and use `--add-file` instead.
 
 --worktree-attributes::
 	Look for attributes in .gitattributes files in the working tree
diff --git a/archive.c b/archive.c
index d20e16fa819..b7756b91200 100644
--- a/archive.c
+++ b/archive.c
@@ -9,6 +9,7 @@
 #include "parse-options.h"
 #include "unpack-trees.h"
 #include "dir.h"
+#include "quote.h"
 
 static char const * const archive_usage[] = {
 	N_("git archive [<options>] <tree-ish> [<path>...]"),
@@ -533,22 +534,31 @@ static int add_file_cb(const struct option *opt, const char *arg, int unset)
 			die(_("Not a regular file: %s"), path);
 		info->content = NULL; /* read the file later */
 	} else if (!strcmp(opt->long_name, "add-virtual-file")) {
-		const char *colon = strchr(arg, ':');
-		char *p;
+		struct strbuf buf = STRBUF_INIT;
+		const char *p = arg;
+
+		if (*p != '"')
+			p = strchr(p, ':');
+		else if (unquote_c_style(&buf, p, &p) < 0)
+			die(_("unclosed quote: '%s'"), arg);
 
-		if (!colon)
+		if (!p || *p != ':')
 			die(_("missing colon: '%s'"), arg);
 
-		p = xstrndup(arg, colon - arg);
-		if (!args->prefix)
-			path = p;
-		else {
-			path = prefix_filename(args->prefix, p);
-			free(p);
+		if (p == arg)
+			die(_("empty file name: '%s'"), arg);
+
+		path = buf.len ?
+			strbuf_detach(&buf, NULL) : xstrndup(arg, p - arg);
+
+		if (args->prefix) {
+			char *save = path;
+			path = prefix_filename(args->prefix, path);
+			free(save);
 		}
 		memset(&info->stat, 0, sizeof(info->stat));
 		info->stat.st_mode = S_IFREG | 0644;
-		info->content = xstrdup(colon + 1);
+		info->content = xstrdup(p + 1);
 		info->stat.st_size = strlen(info->content);
 	} else {
 		BUG("add_file_cb() called for %s", opt->long_name);
diff --git a/t/t5003-archive-zip.sh b/t/t5003-archive-zip.sh
index ebc26e89a9b..50932a866c9 100755
--- a/t/t5003-archive-zip.sh
+++ b/t/t5003-archive-zip.sh
@@ -207,13 +207,21 @@ check_zip with_untracked
 check_added with_untracked untracked untracked
 
 test_expect_success UNZIP 'git archive --format=zip --add-virtual-file' '
+	if test_have_prereq FUNNYNAMES
+	then
+		QUOTED=quoted:colon
+	else
+		QUOTED=quoted
+	fi &&
 	git archive --format=zip >with_file_with_content.zip \
+		--add-virtual-file=\"$QUOTED\": \
 		--add-virtual-file=hello:world $EMPTY_TREE &&
 	test_when_finished "rm -rf tmp-unpack" &&
 	mkdir tmp-unpack && (
 		cd tmp-unpack &&
 		"$GIT_UNZIP" ../with_file_with_content.zip &&
 		test_path_is_file hello &&
+		test_path_is_file $QUOTED &&
 		test world = $(cat hello)
 	)
 '
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v5 3/7] scalar: validate the optional enlistment argument
  2022-05-19 18:17       ` [PATCH v5 " Johannes Schindelin via GitGitGadget
  2022-05-19 18:17         ` [PATCH v5 1/7] archive: optionally add "virtual" files Johannes Schindelin via GitGitGadget
  2022-05-19 18:17         ` [PATCH v5 2/7] archive --add-file-with-contents: allow paths containing colons Johannes Schindelin via GitGitGadget
@ 2022-05-19 18:17         ` Johannes Schindelin via GitGitGadget
  2022-05-19 18:18         ` [PATCH v5 4/7] Implement `scalar diagnose` Johannes Schindelin via GitGitGadget
                           ` (5 subsequent siblings)
  8 siblings, 0 replies; 140+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2022-05-19 18:17 UTC (permalink / raw)
  To: git
  Cc: René Scharfe, Taylor Blau, Derrick Stolee, Elijah Newren,
	rsbecker, Ævar Arnfjörð Bjarmason,
	Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

The `scalar` command needs a Scalar enlistment for many subcommands, and
looks in the current directory for such an enlistment (traversing the
parent directories until it finds one).

These is subcommands can also be called with an optional argument
specifying the enlistment. Here, too, we traverse parent directories as
needed, until we find an enlistment.

However, if the specified directory does not even exist, or is not a
directory, we should stop right there, with an error message.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 contrib/scalar/scalar.c          | 6 ++++--
 contrib/scalar/t/t9099-scalar.sh | 5 +++++
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
index 1ce9c2b00e8..00dcd4b50ef 100644
--- a/contrib/scalar/scalar.c
+++ b/contrib/scalar/scalar.c
@@ -43,9 +43,11 @@ static void setup_enlistment_directory(int argc, const char **argv,
 		usage_with_options(usagestr, options);
 
 	/* find the worktree, determine its corresponding root */
-	if (argc == 1)
+	if (argc == 1) {
 		strbuf_add_absolute_path(&path, argv[0]);
-	else if (strbuf_getcwd(&path) < 0)
+		if (!is_directory(path.buf))
+			die(_("'%s' does not exist"), path.buf);
+	} else if (strbuf_getcwd(&path) < 0)
 		die(_("need a working directory"));
 
 	strbuf_trim_trailing_dir_sep(&path);
diff --git a/contrib/scalar/t/t9099-scalar.sh b/contrib/scalar/t/t9099-scalar.sh
index 2e1502ad45e..9d83fdf25e8 100755
--- a/contrib/scalar/t/t9099-scalar.sh
+++ b/contrib/scalar/t/t9099-scalar.sh
@@ -85,4 +85,9 @@ test_expect_success 'scalar delete with enlistment' '
 	test_path_is_missing cloned
 '
 
+test_expect_success '`scalar [...] <dir>` errors out when dir is missing' '
+	! scalar run config cloned 2>err &&
+	grep "cloned. does not exist" err
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v5 4/7] Implement `scalar diagnose`
  2022-05-19 18:17       ` [PATCH v5 " Johannes Schindelin via GitGitGadget
                           ` (2 preceding siblings ...)
  2022-05-19 18:17         ` [PATCH v5 3/7] scalar: validate the optional enlistment argument Johannes Schindelin via GitGitGadget
@ 2022-05-19 18:18         ` Johannes Schindelin via GitGitGadget
  2022-05-19 18:18         ` [PATCH v5 5/7] scalar diagnose: include disk space information Johannes Schindelin via GitGitGadget
                           ` (4 subsequent siblings)
  8 siblings, 0 replies; 140+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2022-05-19 18:18 UTC (permalink / raw)
  To: git
  Cc: René Scharfe, Taylor Blau, Derrick Stolee, Elijah Newren,
	rsbecker, Ævar Arnfjörð Bjarmason,
	Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

Over the course of Scalar's development, it became obvious that there is
a need for a command that can gather all kinds of useful information
that can help identify the most typical problems with large
worktrees/repositories.

The `diagnose` command is the culmination of this hard-won knowledge: it
gathers the installed hooks, the config, a couple statistics describing
the data shape, among other pieces of information, and then wraps
everything up in a tidy, neat `.zip` archive.

Note: originally, Scalar was implemented in C# using the .NET API, where
we had the luxury of a comprehensive standard library that includes
basic functionality such as writing a `.zip` file. In the C version, we
lack such a commodity. Rather than introducing a dependency on, say,
libzip, we slightly abuse Git's `archive` machinery: we write out a
`.zip` of the empty try, augmented by a couple files that are added via
the `--add-file*` options. We are careful trying not to modify the
current repository in any way lest the very circumstances that required
`scalar diagnose` to be run are changed by the `diagnose` run itself.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 contrib/scalar/scalar.c          | 144 +++++++++++++++++++++++++++++++
 contrib/scalar/scalar.txt        |  12 +++
 contrib/scalar/t/t9099-scalar.sh |  14 +++
 3 files changed, 170 insertions(+)

diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
index 00dcd4b50ef..53213f9a3b9 100644
--- a/contrib/scalar/scalar.c
+++ b/contrib/scalar/scalar.c
@@ -11,6 +11,7 @@
 #include "dir.h"
 #include "packfile.h"
 #include "help.h"
+#include "archive.h"
 
 /*
  * Remove the deepest subdirectory in the provided path string. Path must not
@@ -261,6 +262,47 @@ static int unregister_dir(void)
 	return res;
 }
 
+static int add_directory_to_archiver(struct strvec *archiver_args,
+					  const char *path, int recurse)
+{
+	int at_root = !*path;
+	DIR *dir = opendir(at_root ? "." : path);
+	struct dirent *e;
+	struct strbuf buf = STRBUF_INIT;
+	size_t len;
+	int res = 0;
+
+	if (!dir)
+		return error_errno(_("could not open directory '%s'"), path);
+
+	if (!at_root)
+		strbuf_addf(&buf, "%s/", path);
+	len = buf.len;
+	strvec_pushf(archiver_args, "--prefix=%s", buf.buf);
+
+	while (!res && (e = readdir(dir))) {
+		if (!strcmp(".", e->d_name) || !strcmp("..", e->d_name))
+			continue;
+
+		strbuf_setlen(&buf, len);
+		strbuf_addstr(&buf, e->d_name);
+
+		if (e->d_type == DT_REG)
+			strvec_pushf(archiver_args, "--add-file=%s", buf.buf);
+		else if (e->d_type != DT_DIR)
+			warning(_("skipping '%s', which is neither file nor "
+				  "directory"), buf.buf);
+		else if (recurse &&
+			 add_directory_to_archiver(archiver_args,
+						   buf.buf, recurse) < 0)
+			res = -1;
+	}
+
+	closedir(dir);
+	strbuf_release(&buf);
+	return res;
+}
+
 /* printf-style interface, expects `<key>=<value>` argument */
 static int set_config(const char *fmt, ...)
 {
@@ -501,6 +543,107 @@ cleanup:
 	return res;
 }
 
+static int cmd_diagnose(int argc, const char **argv)
+{
+	struct option options[] = {
+		OPT_END(),
+	};
+	const char * const usage[] = {
+		N_("scalar diagnose [<enlistment>]"),
+		NULL
+	};
+	struct strbuf zip_path = STRBUF_INIT;
+	struct strvec archiver_args = STRVEC_INIT;
+	char **argv_copy = NULL;
+	int stdout_fd = -1, archiver_fd = -1;
+	time_t now = time(NULL);
+	struct tm tm;
+	struct strbuf path = STRBUF_INIT, buf = STRBUF_INIT;
+	int res = 0;
+
+	argc = parse_options(argc, argv, NULL, options,
+			     usage, 0);
+
+	setup_enlistment_directory(argc, argv, usage, options, &zip_path);
+
+	strbuf_addstr(&zip_path, "/.scalarDiagnostics/scalar_");
+	strbuf_addftime(&zip_path,
+			"%Y%m%d_%H%M%S", localtime_r(&now, &tm), 0, 0);
+	strbuf_addstr(&zip_path, ".zip");
+	switch (safe_create_leading_directories(zip_path.buf)) {
+	case SCLD_EXISTS:
+	case SCLD_OK:
+		break;
+	default:
+		error_errno(_("could not create directory for '%s'"),
+			    zip_path.buf);
+		goto diagnose_cleanup;
+	}
+	stdout_fd = dup(1);
+	if (stdout_fd < 0) {
+		res = error_errno(_("could not duplicate stdout"));
+		goto diagnose_cleanup;
+	}
+
+	archiver_fd = xopen(zip_path.buf, O_CREAT | O_WRONLY | O_TRUNC, 0666);
+	if (archiver_fd < 0 || dup2(archiver_fd, 1) < 0) {
+		res = error_errno(_("could not redirect output"));
+		goto diagnose_cleanup;
+	}
+
+	init_zip_archiver();
+	strvec_pushl(&archiver_args, "scalar-diagnose", "--format=zip", NULL);
+
+	strbuf_reset(&buf);
+	strbuf_addstr(&buf, "Collecting diagnostic info\n\n");
+	get_version_info(&buf, 1);
+
+	strbuf_addf(&buf, "Enlistment root: %s\n", the_repository->worktree);
+	write_or_die(stdout_fd, buf.buf, buf.len);
+	strvec_pushf(&archiver_args,
+		     "--add-virtual-file=diagnostics.log:%.*s",
+		     (int)buf.len, buf.buf);
+
+	if ((res = add_directory_to_archiver(&archiver_args, ".git", 0)) ||
+	    (res = add_directory_to_archiver(&archiver_args, ".git/hooks", 0)) ||
+	    (res = add_directory_to_archiver(&archiver_args, ".git/info", 0)) ||
+	    (res = add_directory_to_archiver(&archiver_args, ".git/logs", 1)) ||
+	    (res = add_directory_to_archiver(&archiver_args, ".git/objects/info", 0)))
+		goto diagnose_cleanup;
+
+	strvec_pushl(&archiver_args, "--prefix=",
+		     oid_to_hex(the_hash_algo->empty_tree), "--", NULL);
+
+	/* `write_archive()` modifies the `argv` passed to it. Let it. */
+	argv_copy = xmemdupz(archiver_args.v,
+			     sizeof(char *) * archiver_args.nr);
+	res = write_archive(archiver_args.nr, (const char **)argv_copy, NULL,
+			    the_repository, NULL, 0);
+	if (res) {
+		error(_("failed to write archive"));
+		goto diagnose_cleanup;
+	}
+
+	if (!res)
+		fprintf(stderr, "\n"
+		       "Diagnostics complete.\n"
+		       "All of the gathered info is captured in '%s'\n",
+		       zip_path.buf);
+
+diagnose_cleanup:
+	if (archiver_fd >= 0) {
+		close(1);
+		dup2(stdout_fd, 1);
+	}
+	free(argv_copy);
+	strvec_clear(&archiver_args);
+	strbuf_release(&zip_path);
+	strbuf_release(&path);
+	strbuf_release(&buf);
+
+	return res;
+}
+
 static int cmd_list(int argc, const char **argv)
 {
 	if (argc != 1)
@@ -802,6 +945,7 @@ static struct {
 	{ "reconfigure", cmd_reconfigure },
 	{ "delete", cmd_delete },
 	{ "version", cmd_version },
+	{ "diagnose", cmd_diagnose },
 	{ NULL, NULL},
 };
 
diff --git a/contrib/scalar/scalar.txt b/contrib/scalar/scalar.txt
index f416d637289..22583fe046e 100644
--- a/contrib/scalar/scalar.txt
+++ b/contrib/scalar/scalar.txt
@@ -14,6 +14,7 @@ scalar register [<enlistment>]
 scalar unregister [<enlistment>]
 scalar run ( all | config | commit-graph | fetch | loose-objects | pack-files ) [<enlistment>]
 scalar reconfigure [ --all | <enlistment> ]
+scalar diagnose [<enlistment>]
 scalar delete <enlistment>
 
 DESCRIPTION
@@ -129,6 +130,17 @@ reconfigure the enlistment.
 With the `--all` option, all enlistments currently registered with Scalar
 will be reconfigured. Use this option after each Scalar upgrade.
 
+Diagnose
+~~~~~~~~
+
+diagnose [<enlistment>]::
+    When reporting issues with Scalar, it is often helpful to provide the
+    information gathered by this command, including logs and certain
+    statistics describing the data shape of the current enlistment.
++
+The output of this command is a `.zip` file that is written into
+a directory adjacent to the worktree in the `src` directory.
+
 Delete
 ~~~~~~
 
diff --git a/contrib/scalar/t/t9099-scalar.sh b/contrib/scalar/t/t9099-scalar.sh
index 9d83fdf25e8..6802d317258 100755
--- a/contrib/scalar/t/t9099-scalar.sh
+++ b/contrib/scalar/t/t9099-scalar.sh
@@ -90,4 +90,18 @@ test_expect_success '`scalar [...] <dir>` errors out when dir is missing' '
 	grep "cloned. does not exist" err
 '
 
+SQ="'"
+test_expect_success UNZIP 'scalar diagnose' '
+	scalar clone "file://$(pwd)" cloned --single-branch &&
+	scalar diagnose cloned >out 2>err &&
+	sed -n "s/.*$SQ\\(.*\\.zip\\)$SQ.*/\\1/p" <err >zip_path &&
+	zip_path=$(cat zip_path) &&
+	test -n "$zip_path" &&
+	unzip -v "$zip_path" &&
+	folder=${zip_path%.zip} &&
+	test_path_is_missing "$folder" &&
+	unzip -p "$zip_path" diagnostics.log >out &&
+	test_file_not_empty out
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v5 5/7] scalar diagnose: include disk space information
  2022-05-19 18:17       ` [PATCH v5 " Johannes Schindelin via GitGitGadget
                           ` (3 preceding siblings ...)
  2022-05-19 18:18         ` [PATCH v5 4/7] Implement `scalar diagnose` Johannes Schindelin via GitGitGadget
@ 2022-05-19 18:18         ` Johannes Schindelin via GitGitGadget
  2022-05-19 18:18         ` [PATCH v5 6/7] scalar: teach `diagnose` to gather packfile info Matthew John Cheetham via GitGitGadget
                           ` (3 subsequent siblings)
  8 siblings, 0 replies; 140+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2022-05-19 18:18 UTC (permalink / raw)
  To: git
  Cc: René Scharfe, Taylor Blau, Derrick Stolee, Elijah Newren,
	rsbecker, Ævar Arnfjörð Bjarmason,
	Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

When analyzing problems with large worktrees/repositories, it is useful
to know how close to a "full disk" situation Scalar/Git operates. Let's
include this information.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 contrib/scalar/scalar.c          | 53 ++++++++++++++++++++++++++++++++
 contrib/scalar/t/t9099-scalar.sh |  1 +
 2 files changed, 54 insertions(+)

diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
index 53213f9a3b9..0a9e25a57f8 100644
--- a/contrib/scalar/scalar.c
+++ b/contrib/scalar/scalar.c
@@ -303,6 +303,58 @@ static int add_directory_to_archiver(struct strvec *archiver_args,
 	return res;
 }
 
+#ifndef WIN32
+#include <sys/statvfs.h>
+#endif
+
+static int get_disk_info(struct strbuf *out)
+{
+#ifdef WIN32
+	struct strbuf buf = STRBUF_INIT;
+	char volume_name[MAX_PATH], fs_name[MAX_PATH];
+	DWORD serial_number, component_length, flags;
+	ULARGE_INTEGER avail2caller, total, avail;
+
+	strbuf_realpath(&buf, ".", 1);
+	if (!GetDiskFreeSpaceExA(buf.buf, &avail2caller, &total, &avail)) {
+		error(_("could not determine free disk size for '%s'"),
+		      buf.buf);
+		strbuf_release(&buf);
+		return -1;
+	}
+
+	strbuf_setlen(&buf, offset_1st_component(buf.buf));
+	if (!GetVolumeInformationA(buf.buf, volume_name, sizeof(volume_name),
+				   &serial_number, &component_length, &flags,
+				   fs_name, sizeof(fs_name))) {
+		error(_("could not get info for '%s'"), buf.buf);
+		strbuf_release(&buf);
+		return -1;
+	}
+	strbuf_addf(out, "Available space on '%s': ", buf.buf);
+	strbuf_humanise_bytes(out, avail2caller.QuadPart);
+	strbuf_addch(out, '\n');
+	strbuf_release(&buf);
+#else
+	struct strbuf buf = STRBUF_INIT;
+	struct statvfs stat;
+
+	strbuf_realpath(&buf, ".", 1);
+	if (statvfs(buf.buf, &stat) < 0) {
+		error_errno(_("could not determine free disk size for '%s'"),
+			    buf.buf);
+		strbuf_release(&buf);
+		return -1;
+	}
+
+	strbuf_addf(out, "Available space on '%s': ", buf.buf);
+	strbuf_humanise_bytes(out, st_mult(stat.f_bsize, stat.f_bavail));
+	strbuf_addf(out, " (mount flags 0x%lx)\n", stat.f_flag);
+	strbuf_release(&buf);
+#endif
+	return 0;
+}
+
 /* printf-style interface, expects `<key>=<value>` argument */
 static int set_config(const char *fmt, ...)
 {
@@ -599,6 +651,7 @@ static int cmd_diagnose(int argc, const char **argv)
 	get_version_info(&buf, 1);
 
 	strbuf_addf(&buf, "Enlistment root: %s\n", the_repository->worktree);
+	get_disk_info(&buf);
 	write_or_die(stdout_fd, buf.buf, buf.len);
 	strvec_pushf(&archiver_args,
 		     "--add-virtual-file=diagnostics.log:%.*s",
diff --git a/contrib/scalar/t/t9099-scalar.sh b/contrib/scalar/t/t9099-scalar.sh
index 6802d317258..934b2485d91 100755
--- a/contrib/scalar/t/t9099-scalar.sh
+++ b/contrib/scalar/t/t9099-scalar.sh
@@ -94,6 +94,7 @@ SQ="'"
 test_expect_success UNZIP 'scalar diagnose' '
 	scalar clone "file://$(pwd)" cloned --single-branch &&
 	scalar diagnose cloned >out 2>err &&
+	grep "Available space" out &&
 	sed -n "s/.*$SQ\\(.*\\.zip\\)$SQ.*/\\1/p" <err >zip_path &&
 	zip_path=$(cat zip_path) &&
 	test -n "$zip_path" &&
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v5 6/7] scalar: teach `diagnose` to gather packfile info
  2022-05-19 18:17       ` [PATCH v5 " Johannes Schindelin via GitGitGadget
                           ` (4 preceding siblings ...)
  2022-05-19 18:18         ` [PATCH v5 5/7] scalar diagnose: include disk space information Johannes Schindelin via GitGitGadget
@ 2022-05-19 18:18         ` Matthew John Cheetham via GitGitGadget
  2022-05-19 18:18         ` [PATCH v5 7/7] scalar: teach `diagnose` to gather loose objects information Matthew John Cheetham via GitGitGadget
                           ` (2 subsequent siblings)
  8 siblings, 0 replies; 140+ messages in thread
From: Matthew John Cheetham via GitGitGadget @ 2022-05-19 18:18 UTC (permalink / raw)
  To: git
  Cc: René Scharfe, Taylor Blau, Derrick Stolee, Elijah Newren,
	rsbecker, Ævar Arnfjörð Bjarmason,
	Johannes Schindelin, Matthew John Cheetham

From: Matthew John Cheetham <mjcheetham@outlook.com>

It's helpful to see if there are other crud files in the pack
directory. Let's teach the `scalar diagnose` command to gather
file size information about pack files.

While at it, also enumerate the pack files in the alternate
object directories, if any are registered.

Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 contrib/scalar/scalar.c          | 30 ++++++++++++++++++++++++++++++
 contrib/scalar/t/t9099-scalar.sh |  6 +++++-
 2 files changed, 35 insertions(+), 1 deletion(-)

diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
index 0a9e25a57f8..d302c27e114 100644
--- a/contrib/scalar/scalar.c
+++ b/contrib/scalar/scalar.c
@@ -12,6 +12,7 @@
 #include "packfile.h"
 #include "help.h"
 #include "archive.h"
+#include "object-store.h"
 
 /*
  * Remove the deepest subdirectory in the provided path string. Path must not
@@ -595,6 +596,29 @@ cleanup:
 	return res;
 }
 
+static void dir_file_stats_objects(const char *full_path, size_t full_path_len,
+				   const char *file_name, void *data)
+{
+	struct strbuf *buf = data;
+	struct stat st;
+
+	if (!stat(full_path, &st))
+		strbuf_addf(buf, "%-70s %16" PRIuMAX "\n", file_name,
+			    (uintmax_t)st.st_size);
+}
+
+static int dir_file_stats(struct object_directory *object_dir, void *data)
+{
+	struct strbuf *buf = data;
+
+	strbuf_addf(buf, "Contents of %s:\n", object_dir->path);
+
+	for_each_file_in_pack_dir(object_dir->path, dir_file_stats_objects,
+				  data);
+
+	return 0;
+}
+
 static int cmd_diagnose(int argc, const char **argv)
 {
 	struct option options[] = {
@@ -657,6 +681,12 @@ static int cmd_diagnose(int argc, const char **argv)
 		     "--add-virtual-file=diagnostics.log:%.*s",
 		     (int)buf.len, buf.buf);
 
+	strbuf_reset(&buf);
+	strbuf_addstr(&buf, "--add-virtual-file=packs-local.txt:");
+	dir_file_stats(the_repository->objects->odb, &buf);
+	foreach_alt_odb(dir_file_stats, &buf);
+	strvec_push(&archiver_args, buf.buf);
+
 	if ((res = add_directory_to_archiver(&archiver_args, ".git", 0)) ||
 	    (res = add_directory_to_archiver(&archiver_args, ".git/hooks", 0)) ||
 	    (res = add_directory_to_archiver(&archiver_args, ".git/info", 0)) ||
diff --git a/contrib/scalar/t/t9099-scalar.sh b/contrib/scalar/t/t9099-scalar.sh
index 934b2485d91..3dd5650cceb 100755
--- a/contrib/scalar/t/t9099-scalar.sh
+++ b/contrib/scalar/t/t9099-scalar.sh
@@ -93,6 +93,8 @@ test_expect_success '`scalar [...] <dir>` errors out when dir is missing' '
 SQ="'"
 test_expect_success UNZIP 'scalar diagnose' '
 	scalar clone "file://$(pwd)" cloned --single-branch &&
+	git repack &&
+	echo "$(pwd)/.git/objects/" >>cloned/src/.git/objects/info/alternates &&
 	scalar diagnose cloned >out 2>err &&
 	grep "Available space" out &&
 	sed -n "s/.*$SQ\\(.*\\.zip\\)$SQ.*/\\1/p" <err >zip_path &&
@@ -102,7 +104,9 @@ test_expect_success UNZIP 'scalar diagnose' '
 	folder=${zip_path%.zip} &&
 	test_path_is_missing "$folder" &&
 	unzip -p "$zip_path" diagnostics.log >out &&
-	test_file_not_empty out
+	test_file_not_empty out &&
+	unzip -p "$zip_path" packs-local.txt >out &&
+	grep "$(pwd)/.git/objects" out
 '
 
 test_done
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v5 7/7] scalar: teach `diagnose` to gather loose objects information
  2022-05-19 18:17       ` [PATCH v5 " Johannes Schindelin via GitGitGadget
                           ` (5 preceding siblings ...)
  2022-05-19 18:18         ` [PATCH v5 6/7] scalar: teach `diagnose` to gather packfile info Matthew John Cheetham via GitGitGadget
@ 2022-05-19 18:18         ` Matthew John Cheetham via GitGitGadget
  2022-05-19 19:23         ` [PATCH v5 0/7] scalar: implement the subcommand "diagnose" Junio C Hamano
  2022-05-21 15:08         ` [PATCH v6 " Johannes Schindelin via GitGitGadget
  8 siblings, 0 replies; 140+ messages in thread
From: Matthew John Cheetham via GitGitGadget @ 2022-05-19 18:18 UTC (permalink / raw)
  To: git
  Cc: René Scharfe, Taylor Blau, Derrick Stolee, Elijah Newren,
	rsbecker, Ævar Arnfjörð Bjarmason,
	Johannes Schindelin, Matthew John Cheetham

From: Matthew John Cheetham <mjcheetham@outlook.com>

When operating at the scale that Scalar wants to support, certain data
shapes are more likely to cause undesirable performance issues, such as
large numbers of loose objects.

By including statistics about this, `scalar diagnose` now makes it
easier to identify such scenarios.

Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 contrib/scalar/scalar.c          | 59 ++++++++++++++++++++++++++++++++
 contrib/scalar/t/t9099-scalar.sh |  5 ++-
 2 files changed, 63 insertions(+), 1 deletion(-)

diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
index d302c27e114..0c278681758 100644
--- a/contrib/scalar/scalar.c
+++ b/contrib/scalar/scalar.c
@@ -619,6 +619,60 @@ static int dir_file_stats(struct object_directory *object_dir, void *data)
 	return 0;
 }
 
+static int count_files(char *path)
+{
+	DIR *dir = opendir(path);
+	struct dirent *e;
+	int count = 0;
+
+	if (!dir)
+		return 0;
+
+	while ((e = readdir(dir)) != NULL)
+		if (!is_dot_or_dotdot(e->d_name) && e->d_type == DT_REG)
+			count++;
+
+	closedir(dir);
+	return count;
+}
+
+static void loose_objs_stats(struct strbuf *buf, const char *path)
+{
+	DIR *dir = opendir(path);
+	struct dirent *e;
+	int count;
+	int total = 0;
+	unsigned char c;
+	struct strbuf count_path = STRBUF_INIT;
+	size_t base_path_len;
+
+	if (!dir)
+		return;
+
+	strbuf_addstr(buf, "Object directory stats for ");
+	strbuf_add_absolute_path(buf, path);
+	strbuf_addstr(buf, ":\n");
+
+	strbuf_add_absolute_path(&count_path, path);
+	strbuf_addch(&count_path, '/');
+	base_path_len = count_path.len;
+
+	while ((e = readdir(dir)) != NULL)
+		if (!is_dot_or_dotdot(e->d_name) &&
+		    e->d_type == DT_DIR && strlen(e->d_name) == 2 &&
+		    !hex_to_bytes(&c, e->d_name, 1)) {
+			strbuf_setlen(&count_path, base_path_len);
+			strbuf_addstr(&count_path, e->d_name);
+			total += (count = count_files(count_path.buf));
+			strbuf_addf(buf, "%s : %7d files\n", e->d_name, count);
+		}
+
+	strbuf_addf(buf, "Total: %d loose objects", total);
+
+	strbuf_release(&count_path);
+	closedir(dir);
+}
+
 static int cmd_diagnose(int argc, const char **argv)
 {
 	struct option options[] = {
@@ -687,6 +741,11 @@ static int cmd_diagnose(int argc, const char **argv)
 	foreach_alt_odb(dir_file_stats, &buf);
 	strvec_push(&archiver_args, buf.buf);
 
+	strbuf_reset(&buf);
+	strbuf_addstr(&buf, "--add-virtual-file=objects-local.txt:");
+	loose_objs_stats(&buf, ".git/objects");
+	strvec_push(&archiver_args, buf.buf);
+
 	if ((res = add_directory_to_archiver(&archiver_args, ".git", 0)) ||
 	    (res = add_directory_to_archiver(&archiver_args, ".git/hooks", 0)) ||
 	    (res = add_directory_to_archiver(&archiver_args, ".git/info", 0)) ||
diff --git a/contrib/scalar/t/t9099-scalar.sh b/contrib/scalar/t/t9099-scalar.sh
index 3dd5650cceb..72023a1ca1d 100755
--- a/contrib/scalar/t/t9099-scalar.sh
+++ b/contrib/scalar/t/t9099-scalar.sh
@@ -95,6 +95,7 @@ test_expect_success UNZIP 'scalar diagnose' '
 	scalar clone "file://$(pwd)" cloned --single-branch &&
 	git repack &&
 	echo "$(pwd)/.git/objects/" >>cloned/src/.git/objects/info/alternates &&
+	test_commit -C cloned/src loose &&
 	scalar diagnose cloned >out 2>err &&
 	grep "Available space" out &&
 	sed -n "s/.*$SQ\\(.*\\.zip\\)$SQ.*/\\1/p" <err >zip_path &&
@@ -106,7 +107,9 @@ test_expect_success UNZIP 'scalar diagnose' '
 	unzip -p "$zip_path" diagnostics.log >out &&
 	test_file_not_empty out &&
 	unzip -p "$zip_path" packs-local.txt >out &&
-	grep "$(pwd)/.git/objects" out
+	grep "$(pwd)/.git/objects" out &&
+	unzip -p "$zip_path" objects-local.txt >out &&
+	grep "^Total: [1-9]" out
 '
 
 test_done
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v4 2/7] archive --add-file-with-contents: allow paths containing colons
  2022-05-19 18:09           ` Johannes Schindelin
@ 2022-05-19 18:44             ` Junio C Hamano
  0 siblings, 0 replies; 140+ messages in thread
From: Junio C Hamano @ 2022-05-19 18:44 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Johannes Schindelin via GitGitGadget, git, René Scharfe,
	Taylor Blau, Derrick Stolee, Elijah Newren

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

>> >  	git archive --format=zip >with_file_with_content.zip \
>> > +		--add-file-with-content=\"$QUOTED\": \
>> >  		--add-file-with-content=hello:world $EMPTY_TREE &&
>> >  	test_when_finished "rm -rf tmp-unpack" &&
>> >  	mkdir tmp-unpack && (
>> >  		cd tmp-unpack &&
>> >  		"$GIT_UNZIP" ../with_file_with_content.zip &&
>> >  		test_path_is_file hello &&
>> > +		test_path_is_file $QUOTED &&
>>
>> Looks OK, even though it probably is a good idea to have dq around
>> $QUOTED, so that future developers can easily insert SP into its
>> value to use a bit more common but still a bit more problematic
>> pathnames in the test.
>
> I actually decided against this because reading
>
> 	"$QUOTED"
>
> would mislead future me to think that the double quotes that enclose
> $QUOTED are the quotes that the variable's name talks about. But the
> quotes are actually the escaped ones that are passed to `git archive`
> above.

>
> So, to help future Dscho should they read this code six months from now or
> even later, I wanted to specifically only add quotes to the `git archive`
> call to make the intention abundantly clear.

If you find "$QUOTED" misleads any reader to think QUOTED may have
some quote characters in there, you could rename it, of course, to
signal what the value is (e.g. $PATHNAME) better.

But I think you misunderstood my comment completely.

What I meant was to write these lines like:

	--add-file-with-content=\""$QUOTED"\":
	test_path_is_file "$QUOTED"

Because the value in QUOTED can have $IFS whitespaces in it (after
all, allowing random letters like colon, quotes and whitespaces is
why we are adding this unquote_c_style() call), and without the
extra double quotes to protect the parameter expansion of $QUOTED,
the command line is broken.

So, don't decide against it; the reasoning behind that decision is
simply wrong.

Thanks.


^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/7] scalar: implement the subcommand "diagnose"
  2022-05-19 18:17       ` [PATCH v5 " Johannes Schindelin via GitGitGadget
                           ` (6 preceding siblings ...)
  2022-05-19 18:18         ` [PATCH v5 7/7] scalar: teach `diagnose` to gather loose objects information Matthew John Cheetham via GitGitGadget
@ 2022-05-19 19:23         ` Junio C Hamano
  2022-05-21 15:08         ` [PATCH v6 " Johannes Schindelin via GitGitGadget
  8 siblings, 0 replies; 140+ messages in thread
From: Junio C Hamano @ 2022-05-19 19:23 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget
  Cc: git, René Scharfe, Taylor Blau, Derrick Stolee,
	Elijah Newren, rsbecker, Ævar Arnfjörð Bjarmason,
	Johannes Schindelin

"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> Changes since v4:
>
>  * Squashed in Junio's suggested fixups
>  * Renamed the option from --add-file-with-content=<name>:<content> to
>    --add-virtual-file=<name>:<content>

;-)  5 letters shorter and is a good name.

>  * Fixed one instance where I had used error() instead of error_errno().

Looks good.

Thanks.  Will replace and queue.

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v4 3/7] scalar: validate the optional enlistment argument
  2022-05-18 17:35           ` Junio C Hamano
@ 2022-05-20  7:30             ` Ævar Arnfjörð Bjarmason
  2022-05-20 15:55               ` Johannes Schindelin
  0 siblings, 1 reply; 140+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-05-20  7:30 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Johannes Schindelin via GitGitGadget, git, René Scharfe,
	Taylor Blau, Derrick Stolee, Elijah Newren, Johannes Schindelin


On Wed, May 18 2022, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>
>>> +test_expect_success '`scalar [...] <dir>` errors out when dir is missing' '
>>> +	! scalar run config cloned 2>err &&
>>
>> Needs to use test_must_fail, not !
>
> Good eyes and careful reading are very much appreciated, but in this
> case, doesn't such an improvement depend on an update to teach
> test_must_fail_acceptable about scalar being whitelisted?

Yes, I think so (but haven't tested it just now), but it's a relatively
small change to t/test-lib-functions.sh.

I was just noting the potential hidden segfault etc., the issue remains
in v5.

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 1/7] archive: optionally add "virtual" files
  2022-05-19 18:17         ` [PATCH v5 1/7] archive: optionally add "virtual" files Johannes Schindelin via GitGitGadget
@ 2022-05-20 14:41           ` René Scharfe
  2022-05-20 16:21             ` Junio C Hamano
  0 siblings, 1 reply; 140+ messages in thread
From: René Scharfe @ 2022-05-20 14:41 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget, git
  Cc: Taylor Blau, Derrick Stolee, Elijah Newren, rsbecker,
	Ævar Arnfjörð Bjarmason, Johannes Schindelin

Am 19.05.22 um 20:17 schrieb Johannes Schindelin via GitGitGadget:
> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>
> With the `--add-file-with-content=<path>:<content>` option, `git
            ^^^^^^^^^^^^^^^^^^^^^^^
That's still the old option name.  Same in the subject of patch 2.

> archive` now supports use cases where relatively trivial files need to
> be added that do not exist on disk.
>
> This will allow us to generate `.zip` files with generated content,
> without having to add said content to the object database and without
> having to write it out to disk.
>
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
>  Documentation/git-archive.txt | 11 ++++++++
>  archive.c                     | 53 +++++++++++++++++++++++++++++------
>  t/t5003-archive-zip.sh        | 12 ++++++++
>  3 files changed, 68 insertions(+), 8 deletions(-)
>
> diff --git a/Documentation/git-archive.txt b/Documentation/git-archive.txt
> index bc4e76a7834..893cb1075bf 100644
> --- a/Documentation/git-archive.txt
> +++ b/Documentation/git-archive.txt
> @@ -61,6 +61,17 @@ OPTIONS
>  	by concatenating the value for `--prefix` (if any) and the
>  	basename of <file>.
>
> +--add-virtual-file=<path>:<content>::
> +	Add the specified contents to the archive.  Can be repeated to add
> +	multiple files.  The path of the file in the archive is built
> +	by concatenating the value for `--prefix` (if any) and the
> +	basename of <file>.
> ++
> +The `<path>` cannot contain any colon, the file mode is limited to
> +a regular file, and the option may be subject to platform-dependent
> +command-line limits. For non-trivial cases, write an untracked file
> +and use `--add-file` instead.
> +
>  --worktree-attributes::
>  	Look for attributes in .gitattributes files in the working tree
>  	as well (see <<ATTRIBUTES>>).
> diff --git a/archive.c b/archive.c
> index a3bbb091256..d20e16fa819 100644
> --- a/archive.c
> +++ b/archive.c
> @@ -263,6 +263,7 @@ static int queue_or_write_archive_entry(const struct object_id *oid,
>  struct extra_file_info {
>  	char *base;
>  	struct stat stat;
> +	void *content;
>  };
>
>  int write_archive_entries(struct archiver_args *args,
> @@ -337,7 +338,13 @@ int write_archive_entries(struct archiver_args *args,
>  		strbuf_addstr(&path_in_archive, basename(path));
>
>  		strbuf_reset(&content);
> -		if (strbuf_read_file(&content, path, info->stat.st_size) < 0)
> +		if (info->content)
> +			err = write_entry(args, &fake_oid, path_in_archive.buf,
> +					  path_in_archive.len,
> +					  info->stat.st_mode,
> +					  info->content, info->stat.st_size);
> +		else if (strbuf_read_file(&content, path,
> +					  info->stat.st_size) < 0)
>  			err = error_errno(_("could not read '%s'"), path);
>  		else
>  			err = write_entry(args, &fake_oid, path_in_archive.buf,
> @@ -493,6 +500,7 @@ static void extra_file_info_clear(void *util, const char *str)
>  {
>  	struct extra_file_info *info = util;
>  	free(info->base);
> +	free(info->content);
>  	free(info);
>  }
>
> @@ -514,14 +522,40 @@ static int add_file_cb(const struct option *opt, const char *arg, int unset)
>  	if (!arg)
>  		return -1;
>
> -	path = prefix_filename(args->prefix, arg);
> -	item = string_list_append_nodup(&args->extra_files, path);
> -	item->util = info = xmalloc(sizeof(*info));
> +	info = xmalloc(sizeof(*info));
>  	info->base = xstrdup_or_null(base);
> -	if (stat(path, &info->stat))
> -		die(_("File not found: %s"), path);
> -	if (!S_ISREG(info->stat.st_mode))
> -		die(_("Not a regular file: %s"), path);
> +
> +	if (!strcmp(opt->long_name, "add-file")) {
> +		path = prefix_filename(args->prefix, arg);
> +		if (stat(path, &info->stat))
> +			die(_("File not found: %s"), path);
> +		if (!S_ISREG(info->stat.st_mode))
> +			die(_("Not a regular file: %s"), path);
> +		info->content = NULL; /* read the file later */
> +	} else if (!strcmp(opt->long_name, "add-virtual-file")) {
> +		const char *colon = strchr(arg, ':');
> +		char *p;
> +
> +		if (!colon)
> +			die(_("missing colon: '%s'"), arg);
> +
> +		p = xstrndup(arg, colon - arg);
> +		if (!args->prefix)
> +			path = p;
> +		else {
> +			path = prefix_filename(args->prefix, p);
> +			free(p);
> +		}
> +		memset(&info->stat, 0, sizeof(info->stat));
> +		info->stat.st_mode = S_IFREG | 0644;
> +		info->content = xstrdup(colon + 1);
> +		info->stat.st_size = strlen(info->content);
> +	} else {
> +		BUG("add_file_cb() called for %s", opt->long_name);
> +	}
> +	item = string_list_append_nodup(&args->extra_files, path);
> +	item->util = info;
> +
>  	return 0;
>  }
>
> @@ -554,6 +588,9 @@ static int parse_archive_args(int argc, const char **argv,
>  		{ OPTION_CALLBACK, 0, "add-file", args, N_("file"),
>  		  N_("add untracked file to archive"), 0, add_file_cb,
>  		  (intptr_t)&base },
> +		{ OPTION_CALLBACK, 0, "add-virtual-file", args,
> +		  N_("path:content"), N_("add untracked file to archive"), 0,
> +		  add_file_cb, (intptr_t)&base },
>  		OPT_STRING('o', "output", &output, N_("file"),
>  			N_("write the archive to this file")),
>  		OPT_BOOL(0, "worktree-attributes", &worktree_attributes,
> diff --git a/t/t5003-archive-zip.sh b/t/t5003-archive-zip.sh
> index 1e6d18b140e..ebc26e89a9b 100755
> --- a/t/t5003-archive-zip.sh
> +++ b/t/t5003-archive-zip.sh
> @@ -206,6 +206,18 @@ test_expect_success 'git archive --format=zip --add-file' '
>  check_zip with_untracked
>  check_added with_untracked untracked untracked
>
> +test_expect_success UNZIP 'git archive --format=zip --add-virtual-file' '
> +	git archive --format=zip >with_file_with_content.zip \
> +		--add-virtual-file=hello:world $EMPTY_TREE &&
> +	test_when_finished "rm -rf tmp-unpack" &&
> +	mkdir tmp-unpack && (
> +		cd tmp-unpack &&
> +		"$GIT_UNZIP" ../with_file_with_content.zip &&
> +		test_path_is_file hello &&
> +		test world = $(cat hello)
> +	)
> +'
> +
>  test_expect_success 'git archive --format=zip --add-file twice' '
>  	echo untracked >untracked &&
>  	git archive --format=zip --prefix=one/ --add-file=untracked \

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v4 3/7] scalar: validate the optional enlistment argument
  2022-05-20  7:30             ` Ævar Arnfjörð Bjarmason
@ 2022-05-20 15:55               ` Johannes Schindelin
  2022-05-21  9:54                 ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 140+ messages in thread
From: Johannes Schindelin @ 2022-05-20 15:55 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Junio C Hamano, Johannes Schindelin via GitGitGadget, git,
	René Scharfe, Taylor Blau, Derrick Stolee, Elijah Newren

[-- Attachment #1: Type: text/plain, Size: 974 bytes --]

Hi Ævar,

On Fri, 20 May 2022, Ævar Arnfjörð Bjarmason wrote:

>
> On Wed, May 18 2022, Junio C Hamano wrote:
>
> > Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
> >
> >>> +test_expect_success '`scalar [...] <dir>` errors out when dir is missing' '
> >>> +	! scalar run config cloned 2>err &&
> >>
> >> Needs to use test_must_fail, not !
> >
> > Good eyes and careful reading are very much appreciated, but in this
> > case, doesn't such an improvement depend on an update to teach
> > test_must_fail_acceptable about scalar being whitelisted?
>
> Yes, I think so (but haven't tested it just now), but it's a relatively
> small change to t/test-lib-functions.sh.

Let it be noted that I fully agree with Junio that good eyes and careful
reading are very much appreciated. And that in this case, that would have
implied noticing that `test_must_fail` is reserved for Git commands.

Scalar is not (yet?) a Git command.

Ciao,
Johannes

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 1/7] archive: optionally add "virtual" files
  2022-05-20 14:41           ` René Scharfe
@ 2022-05-20 16:21             ` Junio C Hamano
  0 siblings, 0 replies; 140+ messages in thread
From: Junio C Hamano @ 2022-05-20 16:21 UTC (permalink / raw)
  To: René Scharfe
  Cc: Johannes Schindelin via GitGitGadget, git, Taylor Blau,
	Derrick Stolee, Elijah Newren, rsbecker,
	Ævar Arnfjörð Bjarmason, Johannes Schindelin

René Scharfe <l.s.r@web.de> writes:

> Am 19.05.22 um 20:17 schrieb Johannes Schindelin via GitGitGadget:
>> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>>
>> With the `--add-file-with-content=<path>:<content>` option, `git
>             ^^^^^^^^^^^^^^^^^^^^^^^
> That's still the old option name.  Same in the subject of patch 2.

Good eyes, and thanks for catching what I missed---the risk of
relying too much on the range-diff X-<.

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v4 3/7] scalar: validate the optional enlistment argument
  2022-05-20 15:55               ` Johannes Schindelin
@ 2022-05-21  9:54                 ` Ævar Arnfjörð Bjarmason
  2022-05-22  5:50                   ` Junio C Hamano
  0 siblings, 1 reply; 140+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-05-21  9:54 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Junio C Hamano, Johannes Schindelin via GitGitGadget, git,
	René Scharfe, Taylor Blau, Derrick Stolee, Elijah Newren


On Fri, May 20 2022, Johannes Schindelin wrote:

> Hi Ævar,
>
> On Fri, 20 May 2022, Ævar Arnfjörð Bjarmason wrote:
>
>>
>> On Wed, May 18 2022, Junio C Hamano wrote:
>>
>> > Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>> >
>> >>> +test_expect_success '`scalar [...] <dir>` errors out when dir is missing' '
>> >>> +	! scalar run config cloned 2>err &&
>> >>
>> >> Needs to use test_must_fail, not !
>> >
>> > Good eyes and careful reading are very much appreciated, but in this
>> > case, doesn't such an improvement depend on an update to teach
>> > test_must_fail_acceptable about scalar being whitelisted?
>>
>> Yes, I think so (but haven't tested it just now), but it's a relatively
>> small change to t/test-lib-functions.sh.
>
> Let it be noted that I fully agree with Junio that good eyes and careful
> reading are very much appreciated. And that in this case, that would have
> implied noticing that `test_must_fail` is reserved for Git commands.
>
> Scalar is not (yet?) a Git command.

"test-tool" isn't "git" either, so I think this argument is a
non-starter.

As the documentation for "test_must_fail" notes the distinction is
whether something is "system-supplied". I.e. we're not going to test
whether "grep" segfaults, but we should test our own code to see if it
segfaults.

The scalar code is code we ship and test, so we should use the helper
that doesn't hide a segfault.

I don't understand why you wouldn't think that's the obvious fix here,
adding "scalar" to that whitelist is a one-line fix, and clearly yields
a more useful end result than a test silently hiding segfaults.

^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v6 0/7] scalar: implement the subcommand "diagnose"
  2022-05-19 18:17       ` [PATCH v5 " Johannes Schindelin via GitGitGadget
                           ` (7 preceding siblings ...)
  2022-05-19 19:23         ` [PATCH v5 0/7] scalar: implement the subcommand "diagnose" Junio C Hamano
@ 2022-05-21 15:08         ` Johannes Schindelin via GitGitGadget
  2022-05-21 15:08           ` [PATCH v6 1/7] archive: optionally add "virtual" files Johannes Schindelin via GitGitGadget
                             ` (7 more replies)
  8 siblings, 8 replies; 140+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2022-05-21 15:08 UTC (permalink / raw)
  To: git
  Cc: René Scharfe, Taylor Blau, Derrick Stolee, Elijah Newren,
	rsbecker, Ævar Arnfjörð Bjarmason,
	Johannes Schindelin

Over the course of the years, we developed a sub-command that gathers
diagnostic data into a .zip file that can then be attached to bug reports.
This sub-command turned out to be very useful in helping Scalar developers
identify and fix issues.

Changes since v5:

 * Reworded the missed mentions of the old name of the --add-virtual-file
   option (thanks René!).
 * Renamed misleading variable name from $QUOTED to $PATHNAME (thanks
   Junio!).

Changes since v4:

 * Squashed in Junio's suggested fixups
 * Renamed the option from --add-file-with-content=<name>:<content> to
   --add-virtual-file=<name>:<content>
 * Fixed one instance where I had used error() instead of error_errno().

Changes since v3:

 * We're now using unquote_c_style() instead of rolling our own unquoter.
 * Fixed the added regression test.
 * As pointed out by Scalar's Functional Tests, the
   add_directory_to_archiver() function should not fail when scalar diagnose
   encounters FSMonitor's Unix socket, but only warn instead.
 * Related: add_directory_to_archiver() needs to propagate errors from
   processing subdirectories so that the top-level call returns an error,
   too.

Changes since v2:

 * Clarified in the commit message what the biggest benefit of
   --add-file-with-content is.
 * The <path> part of the -add-file-with-content argument can now contain
   colons. To do this, the path needs to start and end in double-quote
   characters (which are stripped), and the backslash serves as escape
   character in that case (to allow the path to contain both colons and
   double-quotes).
 * Fixed incorrect grammar.
 * Instead of strcmp(<what-we-don't-want>), we now say
   !strcmp(<what-we-want>).
 * The help text for --add-file-with-content was improved a tiny bit.
 * Adjusted the commit message that still talked about spawning plenty of
   processes and about a throw-away repository for the sake of generating a
   .zip file.
 * Simplified the code that shows the diagnostics and adds them to the .zip
   file.
 * The final message that reports that the archive is complete is now
   printed to stderr instead of stdout.

Changes since v1:

 * Instead of creating a throw-away repository, staging the contents of the
   .zip file and then using git write-tree and git archive to write the .zip
   file, the patch series now introduces a new option to git archive and
   uses write_archive() directly (avoiding any separate process).
 * Since the command avoids separate processes, it is now blazing fast on
   Windows, and I dropped the spinner() function because it's no longer
   needed.
 * While reworking the test case, I noticed that scalar [...] <enlistment>
   failed to verify that the specified directory exists, and would happily
   "traverse to its parent directory" on its quest to find a Scalar
   enlistment. That is of course incorrect, and has been fixed as a "while
   at it" sort of preparatory commit.
 * I had forgotten to sign off on all the commits, which has been fixed.
 * Instead of some "home-grown" readdir()-based function, the code now uses
   for_each_file_in_pack_dir() to look through the pack directories.
 * If any alternates are configured, their pack directories are now included
   in the output.
 * The commit message that might be interpreted to promise information about
   large loose files has been corrected to no longer promise that.
 * The test cases have been adjusted to test a little bit more (e.g.
   verifying that specific paths are mentioned in the output, instead of
   merely verifying that the output is non-empty).

Johannes Schindelin (5):
  archive: optionally add "virtual" files
  archive --add-virtual-file: allow paths containing colons
  scalar: validate the optional enlistment argument
  Implement `scalar diagnose`
  scalar diagnose: include disk space information

Matthew John Cheetham (2):
  scalar: teach `diagnose` to gather packfile info
  scalar: teach `diagnose` to gather loose objects information

 Documentation/git-archive.txt    |  17 ++
 archive.c                        |  63 ++++++-
 contrib/scalar/scalar.c          | 292 ++++++++++++++++++++++++++++++-
 contrib/scalar/scalar.txt        |  12 ++
 contrib/scalar/t/t9099-scalar.sh |  27 +++
 t/t5003-archive-zip.sh           |  20 +++
 6 files changed, 421 insertions(+), 10 deletions(-)


base-commit: ddc35d833dd6f9e8946b09cecd3311b8aa18d295
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1128%2Fdscho%2Fscalar-diagnose-v6
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1128/dscho/scalar-diagnose-v6
Pull-Request: https://github.com/gitgitgadget/git/pull/1128

Range-diff vs v5:

 1:  42e73fb0aac ! 1:  0005cfae31d archive: optionally add "virtual" files
     @@ Metadata
       ## Commit message ##
          archive: optionally add "virtual" files
      
     -    With the `--add-file-with-content=<path>:<content>` option, `git
     -    archive` now supports use cases where relatively trivial files need to
     -    be added that do not exist on disk.
     +    With the `--add-virtual-file=<path>:<content>` option, `git archive` now
     +    supports use cases where relatively trivial files need to be added that
     +    do not exist on disk.
      
          This will allow us to generate `.zip` files with generated content,
          without having to add said content to the object database and without
 2:  b5ebd61066a ! 2:  7eebcf27b45 archive --add-file-with-contents: allow paths containing colons
     @@ Metadata
      Author: Johannes Schindelin <Johannes.Schindelin@gmx.de>
      
       ## Commit message ##
     -    archive --add-file-with-contents: allow paths containing colons
     +    archive --add-virtual-file: allow paths containing colons
      
          By allowing the path to be enclosed in double-quotes, we can avoid
          the limitation that paths cannot contain colons.
     @@ t/t5003-archive-zip.sh: check_zip with_untracked
       test_expect_success UNZIP 'git archive --format=zip --add-virtual-file' '
      +	if test_have_prereq FUNNYNAMES
      +	then
     -+		QUOTED=quoted:colon
     ++		PATHNAME=quoted:colon
      +	else
     -+		QUOTED=quoted
     ++		PATHNAME=quoted
      +	fi &&
       	git archive --format=zip >with_file_with_content.zip \
     -+		--add-virtual-file=\"$QUOTED\": \
     ++		--add-virtual-file=\"$PATHNAME\": \
       		--add-virtual-file=hello:world $EMPTY_TREE &&
       	test_when_finished "rm -rf tmp-unpack" &&
       	mkdir tmp-unpack && (
       		cd tmp-unpack &&
       		"$GIT_UNZIP" ../with_file_with_content.zip &&
       		test_path_is_file hello &&
     -+		test_path_is_file $QUOTED &&
     ++		test_path_is_file $PATHNAME &&
       		test world = $(cat hello)
       	)
       '
 3:  f1ba69c02d7 = 3:  ca83ddd5eed scalar: validate the optional enlistment argument
 4:  3fb90194744 = 4:  89c13a45e00 Implement `scalar diagnose`
 5:  2e645b08a9e = 5:  8ffbaad3086 scalar diagnose: include disk space information
 6:  0fa20d73750 = 6:  15cd7f17896 scalar: teach `diagnose` to gather packfile info
 7:  62e173b47cf = 7:  a4a74d5ef58 scalar: teach `diagnose` to gather loose objects information

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v6 1/7] archive: optionally add "virtual" files
  2022-05-21 15:08         ` [PATCH v6 " Johannes Schindelin via GitGitGadget
@ 2022-05-21 15:08           ` Johannes Schindelin via GitGitGadget
  2022-05-25 21:11             ` Junio C Hamano
  2022-05-21 15:08           ` [PATCH v6 2/7] archive --add-virtual-file: allow paths containing colons Johannes Schindelin via GitGitGadget
                             ` (6 subsequent siblings)
  7 siblings, 1 reply; 140+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2022-05-21 15:08 UTC (permalink / raw)
  To: git
  Cc: René Scharfe, Taylor Blau, Derrick Stolee, Elijah Newren,
	rsbecker, Ævar Arnfjörð Bjarmason,
	Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

With the `--add-virtual-file=<path>:<content>` option, `git archive` now
supports use cases where relatively trivial files need to be added that
do not exist on disk.

This will allow us to generate `.zip` files with generated content,
without having to add said content to the object database and without
having to write it out to disk.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 Documentation/git-archive.txt | 11 ++++++++
 archive.c                     | 53 +++++++++++++++++++++++++++++------
 t/t5003-archive-zip.sh        | 12 ++++++++
 3 files changed, 68 insertions(+), 8 deletions(-)

diff --git a/Documentation/git-archive.txt b/Documentation/git-archive.txt
index bc4e76a7834..893cb1075bf 100644
--- a/Documentation/git-archive.txt
+++ b/Documentation/git-archive.txt
@@ -61,6 +61,17 @@ OPTIONS
 	by concatenating the value for `--prefix` (if any) and the
 	basename of <file>.
 
+--add-virtual-file=<path>:<content>::
+	Add the specified contents to the archive.  Can be repeated to add
+	multiple files.  The path of the file in the archive is built
+	by concatenating the value for `--prefix` (if any) and the
+	basename of <file>.
++
+The `<path>` cannot contain any colon, the file mode is limited to
+a regular file, and the option may be subject to platform-dependent
+command-line limits. For non-trivial cases, write an untracked file
+and use `--add-file` instead.
+
 --worktree-attributes::
 	Look for attributes in .gitattributes files in the working tree
 	as well (see <<ATTRIBUTES>>).
diff --git a/archive.c b/archive.c
index a3bbb091256..d20e16fa819 100644
--- a/archive.c
+++ b/archive.c
@@ -263,6 +263,7 @@ static int queue_or_write_archive_entry(const struct object_id *oid,
 struct extra_file_info {
 	char *base;
 	struct stat stat;
+	void *content;
 };
 
 int write_archive_entries(struct archiver_args *args,
@@ -337,7 +338,13 @@ int write_archive_entries(struct archiver_args *args,
 		strbuf_addstr(&path_in_archive, basename(path));
 
 		strbuf_reset(&content);
-		if (strbuf_read_file(&content, path, info->stat.st_size) < 0)
+		if (info->content)
+			err = write_entry(args, &fake_oid, path_in_archive.buf,
+					  path_in_archive.len,
+					  info->stat.st_mode,
+					  info->content, info->stat.st_size);
+		else if (strbuf_read_file(&content, path,
+					  info->stat.st_size) < 0)
 			err = error_errno(_("could not read '%s'"), path);
 		else
 			err = write_entry(args, &fake_oid, path_in_archive.buf,
@@ -493,6 +500,7 @@ static void extra_file_info_clear(void *util, const char *str)
 {
 	struct extra_file_info *info = util;
 	free(info->base);
+	free(info->content);
 	free(info);
 }
 
@@ -514,14 +522,40 @@ static int add_file_cb(const struct option *opt, const char *arg, int unset)
 	if (!arg)
 		return -1;
 
-	path = prefix_filename(args->prefix, arg);
-	item = string_list_append_nodup(&args->extra_files, path);
-	item->util = info = xmalloc(sizeof(*info));
+	info = xmalloc(sizeof(*info));
 	info->base = xstrdup_or_null(base);
-	if (stat(path, &info->stat))
-		die(_("File not found: %s"), path);
-	if (!S_ISREG(info->stat.st_mode))
-		die(_("Not a regular file: %s"), path);
+
+	if (!strcmp(opt->long_name, "add-file")) {
+		path = prefix_filename(args->prefix, arg);
+		if (stat(path, &info->stat))
+			die(_("File not found: %s"), path);
+		if (!S_ISREG(info->stat.st_mode))
+			die(_("Not a regular file: %s"), path);
+		info->content = NULL; /* read the file later */
+	} else if (!strcmp(opt->long_name, "add-virtual-file")) {
+		const char *colon = strchr(arg, ':');
+		char *p;
+
+		if (!colon)
+			die(_("missing colon: '%s'"), arg);
+
+		p = xstrndup(arg, colon - arg);
+		if (!args->prefix)
+			path = p;
+		else {
+			path = prefix_filename(args->prefix, p);
+			free(p);
+		}
+		memset(&info->stat, 0, sizeof(info->stat));
+		info->stat.st_mode = S_IFREG | 0644;
+		info->content = xstrdup(colon + 1);
+		info->stat.st_size = strlen(info->content);
+	} else {
+		BUG("add_file_cb() called for %s", opt->long_name);
+	}
+	item = string_list_append_nodup(&args->extra_files, path);
+	item->util = info;
+
 	return 0;
 }
 
@@ -554,6 +588,9 @@ static int parse_archive_args(int argc, const char **argv,
 		{ OPTION_CALLBACK, 0, "add-file", args, N_("file"),
 		  N_("add untracked file to archive"), 0, add_file_cb,
 		  (intptr_t)&base },
+		{ OPTION_CALLBACK, 0, "add-virtual-file", args,
+		  N_("path:content"), N_("add untracked file to archive"), 0,
+		  add_file_cb, (intptr_t)&base },
 		OPT_STRING('o', "output", &output, N_("file"),
 			N_("write the archive to this file")),
 		OPT_BOOL(0, "worktree-attributes", &worktree_attributes,
diff --git a/t/t5003-archive-zip.sh b/t/t5003-archive-zip.sh
index 1e6d18b140e..ebc26e89a9b 100755
--- a/t/t5003-archive-zip.sh
+++ b/t/t5003-archive-zip.sh
@@ -206,6 +206,18 @@ test_expect_success 'git archive --format=zip --add-file' '
 check_zip with_untracked
 check_added with_untracked untracked untracked
 
+test_expect_success UNZIP 'git archive --format=zip --add-virtual-file' '
+	git archive --format=zip >with_file_with_content.zip \
+		--add-virtual-file=hello:world $EMPTY_TREE &&
+	test_when_finished "rm -rf tmp-unpack" &&
+	mkdir tmp-unpack && (
+		cd tmp-unpack &&
+		"$GIT_UNZIP" ../with_file_with_content.zip &&
+		test_path_is_file hello &&
+		test world = $(cat hello)
+	)
+'
+
 test_expect_success 'git archive --format=zip --add-file twice' '
 	echo untracked >untracked &&
 	git archive --format=zip --prefix=one/ --add-file=untracked \
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v6 2/7] archive --add-virtual-file: allow paths containing colons
  2022-05-21 15:08         ` [PATCH v6 " Johannes Schindelin via GitGitGadget
  2022-05-21 15:08           ` [PATCH v6 1/7] archive: optionally add "virtual" files Johannes Schindelin via GitGitGadget
@ 2022-05-21 15:08           ` Johannes Schindelin via GitGitGadget
  2022-05-25 20:22             ` Junio C Hamano
  2022-05-21 15:08           ` [PATCH v6 3/7] scalar: validate the optional enlistment argument Johannes Schindelin via GitGitGadget
                             ` (5 subsequent siblings)
  7 siblings, 1 reply; 140+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2022-05-21 15:08 UTC (permalink / raw)
  To: git
  Cc: René Scharfe, Taylor Blau, Derrick Stolee, Elijah Newren,
	rsbecker, Ævar Arnfjörð Bjarmason,
	Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

By allowing the path to be enclosed in double-quotes, we can avoid
the limitation that paths cannot contain colons.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 Documentation/git-archive.txt | 14 ++++++++++----
 archive.c                     | 30 ++++++++++++++++++++----------
 t/t5003-archive-zip.sh        |  8 ++++++++
 3 files changed, 38 insertions(+), 14 deletions(-)

diff --git a/Documentation/git-archive.txt b/Documentation/git-archive.txt
index 893cb1075bf..54de945a84e 100644
--- a/Documentation/git-archive.txt
+++ b/Documentation/git-archive.txt
@@ -67,10 +67,16 @@ OPTIONS
 	by concatenating the value for `--prefix` (if any) and the
 	basename of <file>.
 +
-The `<path>` cannot contain any colon, the file mode is limited to
-a regular file, and the option may be subject to platform-dependent
-command-line limits. For non-trivial cases, write an untracked file
-and use `--add-file` instead.
+The `<path>` argument can start and end with a literal double-quote
+character; The contained file name is interpreted as a C-style string,
+i.e. the backslash is interpreted as escape character. The path must
+be quoted if it contains a colon, to avoid the colon from being
+misinterpreted as the separator between the path and the contents, or
+if the path begins or ends with a double-quote character.
++
+The file mode is limited to a regular file, and the option may be
+subject to platform-dependent command-line limits. For non-trivial
+cases, write an untracked file and use `--add-file` instead.
 
 --worktree-attributes::
 	Look for attributes in .gitattributes files in the working tree
diff --git a/archive.c b/archive.c
index d20e16fa819..b7756b91200 100644
--- a/archive.c
+++ b/archive.c
@@ -9,6 +9,7 @@
 #include "parse-options.h"
 #include "unpack-trees.h"
 #include "dir.h"
+#include "quote.h"
 
 static char const * const archive_usage[] = {
 	N_("git archive [<options>] <tree-ish> [<path>...]"),
@@ -533,22 +534,31 @@ static int add_file_cb(const struct option *opt, const char *arg, int unset)
 			die(_("Not a regular file: %s"), path);
 		info->content = NULL; /* read the file later */
 	} else if (!strcmp(opt->long_name, "add-virtual-file")) {
-		const char *colon = strchr(arg, ':');
-		char *p;
+		struct strbuf buf = STRBUF_INIT;
+		const char *p = arg;
+
+		if (*p != '"')
+			p = strchr(p, ':');
+		else if (unquote_c_style(&buf, p, &p) < 0)
+			die(_("unclosed quote: '%s'"), arg);
 
-		if (!colon)
+		if (!p || *p != ':')
 			die(_("missing colon: '%s'"), arg);
 
-		p = xstrndup(arg, colon - arg);
-		if (!args->prefix)
-			path = p;
-		else {
-			path = prefix_filename(args->prefix, p);
-			free(p);
+		if (p == arg)
+			die(_("empty file name: '%s'"), arg);
+
+		path = buf.len ?
+			strbuf_detach(&buf, NULL) : xstrndup(arg, p - arg);
+
+		if (args->prefix) {
+			char *save = path;
+			path = prefix_filename(args->prefix, path);
+			free(save);
 		}
 		memset(&info->stat, 0, sizeof(info->stat));
 		info->stat.st_mode = S_IFREG | 0644;
-		info->content = xstrdup(colon + 1);
+		info->content = xstrdup(p + 1);
 		info->stat.st_size = strlen(info->content);
 	} else {
 		BUG("add_file_cb() called for %s", opt->long_name);
diff --git a/t/t5003-archive-zip.sh b/t/t5003-archive-zip.sh
index ebc26e89a9b..3a5a052e8ce 100755
--- a/t/t5003-archive-zip.sh
+++ b/t/t5003-archive-zip.sh
@@ -207,13 +207,21 @@ check_zip with_untracked
 check_added with_untracked untracked untracked
 
 test_expect_success UNZIP 'git archive --format=zip --add-virtual-file' '
+	if test_have_prereq FUNNYNAMES
+	then
+		PATHNAME=quoted:colon
+	else
+		PATHNAME=quoted
+	fi &&
 	git archive --format=zip >with_file_with_content.zip \
+		--add-virtual-file=\"$PATHNAME\": \
 		--add-virtual-file=hello:world $EMPTY_TREE &&
 	test_when_finished "rm -rf tmp-unpack" &&
 	mkdir tmp-unpack && (
 		cd tmp-unpack &&
 		"$GIT_UNZIP" ../with_file_with_content.zip &&
 		test_path_is_file hello &&
+		test_path_is_file $PATHNAME &&
 		test world = $(cat hello)
 	)
 '
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v6 3/7] scalar: validate the optional enlistment argument
  2022-05-21 15:08         ` [PATCH v6 " Johannes Schindelin via GitGitGadget
  2022-05-21 15:08           ` [PATCH v6 1/7] archive: optionally add "virtual" files Johannes Schindelin via GitGitGadget
  2022-05-21 15:08           ` [PATCH v6 2/7] archive --add-virtual-file: allow paths containing colons Johannes Schindelin via GitGitGadget
@ 2022-05-21 15:08           ` Johannes Schindelin via GitGitGadget
  2022-05-21 15:08           ` [PATCH v6 4/7] Implement `scalar diagnose` Johannes Schindelin via GitGitGadget
                             ` (4 subsequent siblings)
  7 siblings, 0 replies; 140+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2022-05-21 15:08 UTC (permalink / raw)
  To: git
  Cc: René Scharfe, Taylor Blau, Derrick Stolee, Elijah Newren,
	rsbecker, Ævar Arnfjörð Bjarmason,
	Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

The `scalar` command needs a Scalar enlistment for many subcommands, and
looks in the current directory for such an enlistment (traversing the
parent directories until it finds one).

These is subcommands can also be called with an optional argument
specifying the enlistment. Here, too, we traverse parent directories as
needed, until we find an enlistment.

However, if the specified directory does not even exist, or is not a
directory, we should stop right there, with an error message.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 contrib/scalar/scalar.c          | 6 ++++--
 contrib/scalar/t/t9099-scalar.sh | 5 +++++
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
index 1ce9c2b00e8..00dcd4b50ef 100644
--- a/contrib/scalar/scalar.c
+++ b/contrib/scalar/scalar.c
@@ -43,9 +43,11 @@ static void setup_enlistment_directory(int argc, const char **argv,
 		usage_with_options(usagestr, options);
 
 	/* find the worktree, determine its corresponding root */
-	if (argc == 1)
+	if (argc == 1) {
 		strbuf_add_absolute_path(&path, argv[0]);
-	else if (strbuf_getcwd(&path) < 0)
+		if (!is_directory(path.buf))
+			die(_("'%s' does not exist"), path.buf);
+	} else if (strbuf_getcwd(&path) < 0)
 		die(_("need a working directory"));
 
 	strbuf_trim_trailing_dir_sep(&path);
diff --git a/contrib/scalar/t/t9099-scalar.sh b/contrib/scalar/t/t9099-scalar.sh
index 2e1502ad45e..9d83fdf25e8 100755
--- a/contrib/scalar/t/t9099-scalar.sh
+++ b/contrib/scalar/t/t9099-scalar.sh
@@ -85,4 +85,9 @@ test_expect_success 'scalar delete with enlistment' '
 	test_path_is_missing cloned
 '
 
+test_expect_success '`scalar [...] <dir>` errors out when dir is missing' '
+	! scalar run config cloned 2>err &&
+	grep "cloned. does not exist" err
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v6 4/7] Implement `scalar diagnose`
  2022-05-21 15:08         ` [PATCH v6 " Johannes Schindelin via GitGitGadget
                             ` (2 preceding siblings ...)
  2022-05-21 15:08           ` [PATCH v6 3/7] scalar: validate the optional enlistment argument Johannes Schindelin via GitGitGadget
@ 2022-05-21 15:08           ` Johannes Schindelin via GitGitGadget
  2022-05-21 15:08           ` [PATCH v6 5/7] scalar diagnose: include disk space information Johannes Schindelin via GitGitGadget
                             ` (3 subsequent siblings)
  7 siblings, 0 replies; 140+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2022-05-21 15:08 UTC (permalink / raw)
  To: git
  Cc: René Scharfe, Taylor Blau, Derrick Stolee, Elijah Newren,
	rsbecker, Ævar Arnfjörð Bjarmason,
	Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

Over the course of Scalar's development, it became obvious that there is
a need for a command that can gather all kinds of useful information
that can help identify the most typical problems with large
worktrees/repositories.

The `diagnose` command is the culmination of this hard-won knowledge: it
gathers the installed hooks, the config, a couple statistics describing
the data shape, among other pieces of information, and then wraps
everything up in a tidy, neat `.zip` archive.

Note: originally, Scalar was implemented in C# using the .NET API, where
we had the luxury of a comprehensive standard library that includes
basic functionality such as writing a `.zip` file. In the C version, we
lack such a commodity. Rather than introducing a dependency on, say,
libzip, we slightly abuse Git's `archive` machinery: we write out a
`.zip` of the empty try, augmented by a couple files that are added via
the `--add-file*` options. We are careful trying not to modify the
current repository in any way lest the very circumstances that required
`scalar diagnose` to be run are changed by the `diagnose` run itself.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 contrib/scalar/scalar.c          | 144 +++++++++++++++++++++++++++++++
 contrib/scalar/scalar.txt        |  12 +++
 contrib/scalar/t/t9099-scalar.sh |  14 +++
 3 files changed, 170 insertions(+)

diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
index 00dcd4b50ef..53213f9a3b9 100644
--- a/contrib/scalar/scalar.c
+++ b/contrib/scalar/scalar.c
@@ -11,6 +11,7 @@
 #include "dir.h"
 #include "packfile.h"
 #include "help.h"
+#include "archive.h"
 
 /*
  * Remove the deepest subdirectory in the provided path string. Path must not
@@ -261,6 +262,47 @@ static int unregister_dir(void)
 	return res;
 }
 
+static int add_directory_to_archiver(struct strvec *archiver_args,
+					  const char *path, int recurse)
+{
+	int at_root = !*path;
+	DIR *dir = opendir(at_root ? "." : path);
+	struct dirent *e;
+	struct strbuf buf = STRBUF_INIT;
+	size_t len;
+	int res = 0;
+
+	if (!dir)
+		return error_errno(_("could not open directory '%s'"), path);
+
+	if (!at_root)
+		strbuf_addf(&buf, "%s/", path);
+	len = buf.len;
+	strvec_pushf(archiver_args, "--prefix=%s", buf.buf);
+
+	while (!res && (e = readdir(dir))) {
+		if (!strcmp(".", e->d_name) || !strcmp("..", e->d_name))
+			continue;
+
+		strbuf_setlen(&buf, len);
+		strbuf_addstr(&buf, e->d_name);
+
+		if (e->d_type == DT_REG)
+			strvec_pushf(archiver_args, "--add-file=%s", buf.buf);
+		else if (e->d_type != DT_DIR)
+			warning(_("skipping '%s', which is neither file nor "
+				  "directory"), buf.buf);
+		else if (recurse &&
+			 add_directory_to_archiver(archiver_args,
+						   buf.buf, recurse) < 0)
+			res = -1;
+	}
+
+	closedir(dir);
+	strbuf_release(&buf);
+	return res;
+}
+
 /* printf-style interface, expects `<key>=<value>` argument */
 static int set_config(const char *fmt, ...)
 {
@@ -501,6 +543,107 @@ cleanup:
 	return res;
 }
 
+static int cmd_diagnose(int argc, const char **argv)
+{
+	struct option options[] = {
+		OPT_END(),
+	};
+	const char * const usage[] = {
+		N_("scalar diagnose [<enlistment>]"),
+		NULL
+	};
+	struct strbuf zip_path = STRBUF_INIT;
+	struct strvec archiver_args = STRVEC_INIT;
+	char **argv_copy = NULL;
+	int stdout_fd = -1, archiver_fd = -1;
+	time_t now = time(NULL);
+	struct tm tm;
+	struct strbuf path = STRBUF_INIT, buf = STRBUF_INIT;
+	int res = 0;
+
+	argc = parse_options(argc, argv, NULL, options,
+			     usage, 0);
+
+	setup_enlistment_directory(argc, argv, usage, options, &zip_path);
+
+	strbuf_addstr(&zip_path, "/.scalarDiagnostics/scalar_");
+	strbuf_addftime(&zip_path,
+			"%Y%m%d_%H%M%S", localtime_r(&now, &tm), 0, 0);
+	strbuf_addstr(&zip_path, ".zip");
+	switch (safe_create_leading_directories(zip_path.buf)) {
+	case SCLD_EXISTS:
+	case SCLD_OK:
+		break;
+	default:
+		error_errno(_("could not create directory for '%s'"),
+			    zip_path.buf);
+		goto diagnose_cleanup;
+	}
+	stdout_fd = dup(1);
+	if (stdout_fd < 0) {
+		res = error_errno(_("could not duplicate stdout"));
+		goto diagnose_cleanup;
+	}
+
+	archiver_fd = xopen(zip_path.buf, O_CREAT | O_WRONLY | O_TRUNC, 0666);
+	if (archiver_fd < 0 || dup2(archiver_fd, 1) < 0) {
+		res = error_errno(_("could not redirect output"));
+		goto diagnose_cleanup;
+	}
+
+	init_zip_archiver();
+	strvec_pushl(&archiver_args, "scalar-diagnose", "--format=zip", NULL);
+
+	strbuf_reset(&buf);
+	strbuf_addstr(&buf, "Collecting diagnostic info\n\n");
+	get_version_info(&buf, 1);
+
+	strbuf_addf(&buf, "Enlistment root: %s\n", the_repository->worktree);
+	write_or_die(stdout_fd, buf.buf, buf.len);
+	strvec_pushf(&archiver_args,
+		     "--add-virtual-file=diagnostics.log:%.*s",
+		     (int)buf.len, buf.buf);
+
+	if ((res = add_directory_to_archiver(&archiver_args, ".git", 0)) ||
+	    (res = add_directory_to_archiver(&archiver_args, ".git/hooks", 0)) ||
+	    (res = add_directory_to_archiver(&archiver_args, ".git/info", 0)) ||
+	    (res = add_directory_to_archiver(&archiver_args, ".git/logs", 1)) ||
+	    (res = add_directory_to_archiver(&archiver_args, ".git/objects/info", 0)))
+		goto diagnose_cleanup;
+
+	strvec_pushl(&archiver_args, "--prefix=",
+		     oid_to_hex(the_hash_algo->empty_tree), "--", NULL);
+
+	/* `write_archive()` modifies the `argv` passed to it. Let it. */
+	argv_copy = xmemdupz(archiver_args.v,
+			     sizeof(char *) * archiver_args.nr);
+	res = write_archive(archiver_args.nr, (const char **)argv_copy, NULL,
+			    the_repository, NULL, 0);
+	if (res) {
+		error(_("failed to write archive"));
+		goto diagnose_cleanup;
+	}
+
+	if (!res)
+		fprintf(stderr, "\n"
+		       "Diagnostics complete.\n"
+		       "All of the gathered info is captured in '%s'\n",
+		       zip_path.buf);
+
+diagnose_cleanup:
+	if (archiver_fd >= 0) {
+		close(1);
+		dup2(stdout_fd, 1);
+	}
+	free(argv_copy);
+	strvec_clear(&archiver_args);
+	strbuf_release(&zip_path);
+	strbuf_release(&path);
+	strbuf_release(&buf);
+
+	return res;
+}
+
 static int cmd_list(int argc, const char **argv)
 {
 	if (argc != 1)
@@ -802,6 +945,7 @@ static struct {
 	{ "reconfigure", cmd_reconfigure },
 	{ "delete", cmd_delete },
 	{ "version", cmd_version },
+	{ "diagnose", cmd_diagnose },
 	{ NULL, NULL},
 };
 
diff --git a/contrib/scalar/scalar.txt b/contrib/scalar/scalar.txt
index f416d637289..22583fe046e 100644
--- a/contrib/scalar/scalar.txt
+++ b/contrib/scalar/scalar.txt
@@ -14,6 +14,7 @@ scalar register [<enlistment>]
 scalar unregister [<enlistment>]
 scalar run ( all | config | commit-graph | fetch | loose-objects | pack-files ) [<enlistment>]
 scalar reconfigure [ --all | <enlistment> ]
+scalar diagnose [<enlistment>]
 scalar delete <enlistment>
 
 DESCRIPTION
@@ -129,6 +130,17 @@ reconfigure the enlistment.
 With the `--all` option, all enlistments currently registered with Scalar
 will be reconfigured. Use this option after each Scalar upgrade.
 
+Diagnose
+~~~~~~~~
+
+diagnose [<enlistment>]::
+    When reporting issues with Scalar, it is often helpful to provide the
+    information gathered by this command, including logs and certain
+    statistics describing the data shape of the current enlistment.
++
+The output of this command is a `.zip` file that is written into
+a directory adjacent to the worktree in the `src` directory.
+
 Delete
 ~~~~~~
 
diff --git a/contrib/scalar/t/t9099-scalar.sh b/contrib/scalar/t/t9099-scalar.sh
index 9d83fdf25e8..6802d317258 100755
--- a/contrib/scalar/t/t9099-scalar.sh
+++ b/contrib/scalar/t/t9099-scalar.sh
@@ -90,4 +90,18 @@ test_expect_success '`scalar [...] <dir>` errors out when dir is missing' '
 	grep "cloned. does not exist" err
 '
 
+SQ="'"
+test_expect_success UNZIP 'scalar diagnose' '
+	scalar clone "file://$(pwd)" cloned --single-branch &&
+	scalar diagnose cloned >out 2>err &&
+	sed -n "s/.*$SQ\\(.*\\.zip\\)$SQ.*/\\1/p" <err >zip_path &&
+	zip_path=$(cat zip_path) &&
+	test -n "$zip_path" &&
+	unzip -v "$zip_path" &&
+	folder=${zip_path%.zip} &&
+	test_path_is_missing "$folder" &&
+	unzip -p "$zip_path" diagnostics.log >out &&
+	test_file_not_empty out
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v6 5/7] scalar diagnose: include disk space information
  2022-05-21 15:08         ` [PATCH v6 " Johannes Schindelin via GitGitGadget
                             ` (3 preceding siblings ...)
  2022-05-21 15:08           ` [PATCH v6 4/7] Implement `scalar diagnose` Johannes Schindelin via GitGitGadget
@ 2022-05-21 15:08           ` Johannes Schindelin via GitGitGadget
  2022-05-21 15:08           ` [PATCH v6 6/7] scalar: teach `diagnose` to gather packfile info Matthew John Cheetham via GitGitGadget
                             ` (2 subsequent siblings)
  7 siblings, 0 replies; 140+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2022-05-21 15:08 UTC (permalink / raw)
  To: git
  Cc: René Scharfe, Taylor Blau, Derrick Stolee, Elijah Newren,
	rsbecker, Ævar Arnfjörð Bjarmason,
	Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

When analyzing problems with large worktrees/repositories, it is useful
to know how close to a "full disk" situation Scalar/Git operates. Let's
include this information.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 contrib/scalar/scalar.c          | 53 ++++++++++++++++++++++++++++++++
 contrib/scalar/t/t9099-scalar.sh |  1 +
 2 files changed, 54 insertions(+)

diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
index 53213f9a3b9..0a9e25a57f8 100644
--- a/contrib/scalar/scalar.c
+++ b/contrib/scalar/scalar.c
@@ -303,6 +303,58 @@ static int add_directory_to_archiver(struct strvec *archiver_args,
 	return res;
 }
 
+#ifndef WIN32
+#include <sys/statvfs.h>
+#endif
+
+static int get_disk_info(struct strbuf *out)
+{
+#ifdef WIN32
+	struct strbuf buf = STRBUF_INIT;
+	char volume_name[MAX_PATH], fs_name[MAX_PATH];
+	DWORD serial_number, component_length, flags;
+	ULARGE_INTEGER avail2caller, total, avail;
+
+	strbuf_realpath(&buf, ".", 1);
+	if (!GetDiskFreeSpaceExA(buf.buf, &avail2caller, &total, &avail)) {
+		error(_("could not determine free disk size for '%s'"),
+		      buf.buf);
+		strbuf_release(&buf);
+		return -1;
+	}
+
+	strbuf_setlen(&buf, offset_1st_component(buf.buf));
+	if (!GetVolumeInformationA(buf.buf, volume_name, sizeof(volume_name),
+				   &serial_number, &component_length, &flags,
+				   fs_name, sizeof(fs_name))) {
+		error(_("could not get info for '%s'"), buf.buf);
+		strbuf_release(&buf);
+		return -1;
+	}
+	strbuf_addf(out, "Available space on '%s': ", buf.buf);
+	strbuf_humanise_bytes(out, avail2caller.QuadPart);
+	strbuf_addch(out, '\n');
+	strbuf_release(&buf);
+#else
+	struct strbuf buf = STRBUF_INIT;
+	struct statvfs stat;
+
+	strbuf_realpath(&buf, ".", 1);
+	if (statvfs(buf.buf, &stat) < 0) {
+		error_errno(_("could not determine free disk size for '%s'"),
+			    buf.buf);
+		strbuf_release(&buf);
+		return -1;
+	}
+
+	strbuf_addf(out, "Available space on '%s': ", buf.buf);
+	strbuf_humanise_bytes(out, st_mult(stat.f_bsize, stat.f_bavail));
+	strbuf_addf(out, " (mount flags 0x%lx)\n", stat.f_flag);
+	strbuf_release(&buf);
+#endif
+	return 0;
+}
+
 /* printf-style interface, expects `<key>=<value>` argument */
 static int set_config(const char *fmt, ...)
 {
@@ -599,6 +651,7 @@ static int cmd_diagnose(int argc, const char **argv)
 	get_version_info(&buf, 1);
 
 	strbuf_addf(&buf, "Enlistment root: %s\n", the_repository->worktree);
+	get_disk_info(&buf);
 	write_or_die(stdout_fd, buf.buf, buf.len);
 	strvec_pushf(&archiver_args,
 		     "--add-virtual-file=diagnostics.log:%.*s",
diff --git a/contrib/scalar/t/t9099-scalar.sh b/contrib/scalar/t/t9099-scalar.sh
index 6802d317258..934b2485d91 100755
--- a/contrib/scalar/t/t9099-scalar.sh
+++ b/contrib/scalar/t/t9099-scalar.sh
@@ -94,6 +94,7 @@ SQ="'"
 test_expect_success UNZIP 'scalar diagnose' '
 	scalar clone "file://$(pwd)" cloned --single-branch &&
 	scalar diagnose cloned >out 2>err &&
+	grep "Available space" out &&
 	sed -n "s/.*$SQ\\(.*\\.zip\\)$SQ.*/\\1/p" <err >zip_path &&
 	zip_path=$(cat zip_path) &&
 	test -n "$zip_path" &&
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v6 6/7] scalar: teach `diagnose` to gather packfile info
  2022-05-21 15:08         ` [PATCH v6 " Johannes Schindelin via GitGitGadget
                             ` (4 preceding siblings ...)
  2022-05-21 15:08           ` [PATCH v6 5/7] scalar diagnose: include disk space information Johannes Schindelin via GitGitGadget
@ 2022-05-21 15:08           ` Matthew John Cheetham via GitGitGadget
  2022-05-21 15:08           ` [PATCH v6 7/7] scalar: teach `diagnose` to gather loose objects information Matthew John Cheetham via GitGitGadget
  2022-05-28 23:11           ` [PATCH v6+ 0/7] js/scalar-diagnose rebased Junio C Hamano
  7 siblings, 0 replies; 140+ messages in thread
From: Matthew John Cheetham via GitGitGadget @ 2022-05-21 15:08 UTC (permalink / raw)
  To: git
  Cc: René Scharfe, Taylor Blau, Derrick Stolee, Elijah Newren,
	rsbecker, Ævar Arnfjörð Bjarmason,
	Johannes Schindelin, Matthew John Cheetham

From: Matthew John Cheetham <mjcheetham@outlook.com>

It's helpful to see if there are other crud files in the pack
directory. Let's teach the `scalar diagnose` command to gather
file size information about pack files.

While at it, also enumerate the pack files in the alternate
object directories, if any are registered.

Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 contrib/scalar/scalar.c          | 30 ++++++++++++++++++++++++++++++
 contrib/scalar/t/t9099-scalar.sh |  6 +++++-
 2 files changed, 35 insertions(+), 1 deletion(-)

diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
index 0a9e25a57f8..d302c27e114 100644
--- a/contrib/scalar/scalar.c
+++ b/contrib/scalar/scalar.c
@@ -12,6 +12,7 @@
 #include "packfile.h"
 #include "help.h"
 #include "archive.h"
+#include "object-store.h"
 
 /*
  * Remove the deepest subdirectory in the provided path string. Path must not
@@ -595,6 +596,29 @@ cleanup:
 	return res;
 }
 
+static void dir_file_stats_objects(const char *full_path, size_t full_path_len,
+				   const char *file_name, void *data)
+{
+	struct strbuf *buf = data;
+	struct stat st;
+
+	if (!stat(full_path, &st))
+		strbuf_addf(buf, "%-70s %16" PRIuMAX "\n", file_name,
+			    (uintmax_t)st.st_size);
+}
+
+static int dir_file_stats(struct object_directory *object_dir, void *data)
+{
+	struct strbuf *buf = data;
+
+	strbuf_addf(buf, "Contents of %s:\n", object_dir->path);
+
+	for_each_file_in_pack_dir(object_dir->path, dir_file_stats_objects,
+				  data);
+
+	return 0;
+}
+
 static int cmd_diagnose(int argc, const char **argv)
 {
 	struct option options[] = {
@@ -657,6 +681,12 @@ static int cmd_diagnose(int argc, const char **argv)
 		     "--add-virtual-file=diagnostics.log:%.*s",
 		     (int)buf.len, buf.buf);
 
+	strbuf_reset(&buf);
+	strbuf_addstr(&buf, "--add-virtual-file=packs-local.txt:");
+	dir_file_stats(the_repository->objects->odb, &buf);
+	foreach_alt_odb(dir_file_stats, &buf);
+	strvec_push(&archiver_args, buf.buf);
+
 	if ((res = add_directory_to_archiver(&archiver_args, ".git", 0)) ||
 	    (res = add_directory_to_archiver(&archiver_args, ".git/hooks", 0)) ||
 	    (res = add_directory_to_archiver(&archiver_args, ".git/info", 0)) ||
diff --git a/contrib/scalar/t/t9099-scalar.sh b/contrib/scalar/t/t9099-scalar.sh
index 934b2485d91..3dd5650cceb 100755
--- a/contrib/scalar/t/t9099-scalar.sh
+++ b/contrib/scalar/t/t9099-scalar.sh
@@ -93,6 +93,8 @@ test_expect_success '`scalar [...] <dir>` errors out when dir is missing' '
 SQ="'"
 test_expect_success UNZIP 'scalar diagnose' '
 	scalar clone "file://$(pwd)" cloned --single-branch &&
+	git repack &&
+	echo "$(pwd)/.git/objects/" >>cloned/src/.git/objects/info/alternates &&
 	scalar diagnose cloned >out 2>err &&
 	grep "Available space" out &&
 	sed -n "s/.*$SQ\\(.*\\.zip\\)$SQ.*/\\1/p" <err >zip_path &&
@@ -102,7 +104,9 @@ test_expect_success UNZIP 'scalar diagnose' '
 	folder=${zip_path%.zip} &&
 	test_path_is_missing "$folder" &&
 	unzip -p "$zip_path" diagnostics.log >out &&
-	test_file_not_empty out
+	test_file_not_empty out &&
+	unzip -p "$zip_path" packs-local.txt >out &&
+	grep "$(pwd)/.git/objects" out
 '
 
 test_done
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v6 7/7] scalar: teach `diagnose` to gather loose objects information
  2022-05-21 15:08         ` [PATCH v6 " Johannes Schindelin via GitGitGadget
                             ` (5 preceding siblings ...)
  2022-05-21 15:08           ` [PATCH v6 6/7] scalar: teach `diagnose` to gather packfile info Matthew John Cheetham via GitGitGadget
@ 2022-05-21 15:08           ` Matthew John Cheetham via GitGitGadget
  2022-05-28 23:11           ` [PATCH v6+ 0/7] js/scalar-diagnose rebased Junio C Hamano
  7 siblings, 0 replies; 140+ messages in thread
From: Matthew John Cheetham via GitGitGadget @ 2022-05-21 15:08 UTC (permalink / raw)
  To: git
  Cc: René Scharfe, Taylor Blau, Derrick Stolee, Elijah Newren,
	rsbecker, Ævar Arnfjörð Bjarmason,
	Johannes Schindelin, Matthew John Cheetham

From: Matthew John Cheetham <mjcheetham@outlook.com>

When operating at the scale that Scalar wants to support, certain data
shapes are more likely to cause undesirable performance issues, such as
large numbers of loose objects.

By including statistics about this, `scalar diagnose` now makes it
easier to identify such scenarios.

Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 contrib/scalar/scalar.c          | 59 ++++++++++++++++++++++++++++++++
 contrib/scalar/t/t9099-scalar.sh |  5 ++-
 2 files changed, 63 insertions(+), 1 deletion(-)

diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
index d302c27e114..0c278681758 100644
--- a/contrib/scalar/scalar.c
+++ b/contrib/scalar/scalar.c
@@ -619,6 +619,60 @@ static int dir_file_stats(struct object_directory *object_dir, void *data)
 	return 0;
 }
 
+static int count_files(char *path)
+{
+	DIR *dir = opendir(path);
+	struct dirent *e;
+	int count = 0;
+
+	if (!dir)
+		return 0;
+
+	while ((e = readdir(dir)) != NULL)
+		if (!is_dot_or_dotdot(e->d_name) && e->d_type == DT_REG)
+			count++;
+
+	closedir(dir);
+	return count;
+}
+
+static void loose_objs_stats(struct strbuf *buf, const char *path)
+{
+	DIR *dir = opendir(path);
+	struct dirent *e;
+	int count;
+	int total = 0;
+	unsigned char c;
+	struct strbuf count_path = STRBUF_INIT;
+	size_t base_path_len;
+
+	if (!dir)
+		return;
+
+	strbuf_addstr(buf, "Object directory stats for ");
+	strbuf_add_absolute_path(buf, path);
+	strbuf_addstr(buf, ":\n");
+
+	strbuf_add_absolute_path(&count_path, path);
+	strbuf_addch(&count_path, '/');
+	base_path_len = count_path.len;
+
+	while ((e = readdir(dir)) != NULL)
+		if (!is_dot_or_dotdot(e->d_name) &&
+		    e->d_type == DT_DIR && strlen(e->d_name) == 2 &&
+		    !hex_to_bytes(&c, e->d_name, 1)) {
+			strbuf_setlen(&count_path, base_path_len);
+			strbuf_addstr(&count_path, e->d_name);
+			total += (count = count_files(count_path.buf));
+			strbuf_addf(buf, "%s : %7d files\n", e->d_name, count);
+		}
+
+	strbuf_addf(buf, "Total: %d loose objects", total);
+
+	strbuf_release(&count_path);
+	closedir(dir);
+}
+
 static int cmd_diagnose(int argc, const char **argv)
 {
 	struct option options[] = {
@@ -687,6 +741,11 @@ static int cmd_diagnose(int argc, const char **argv)
 	foreach_alt_odb(dir_file_stats, &buf);
 	strvec_push(&archiver_args, buf.buf);
 
+	strbuf_reset(&buf);
+	strbuf_addstr(&buf, "--add-virtual-file=objects-local.txt:");
+	loose_objs_stats(&buf, ".git/objects");
+	strvec_push(&archiver_args, buf.buf);
+
 	if ((res = add_directory_to_archiver(&archiver_args, ".git", 0)) ||
 	    (res = add_directory_to_archiver(&archiver_args, ".git/hooks", 0)) ||
 	    (res = add_directory_to_archiver(&archiver_args, ".git/info", 0)) ||
diff --git a/contrib/scalar/t/t9099-scalar.sh b/contrib/scalar/t/t9099-scalar.sh
index 3dd5650cceb..72023a1ca1d 100755
--- a/contrib/scalar/t/t9099-scalar.sh
+++ b/contrib/scalar/t/t9099-scalar.sh
@@ -95,6 +95,7 @@ test_expect_success UNZIP 'scalar diagnose' '
 	scalar clone "file://$(pwd)" cloned --single-branch &&
 	git repack &&
 	echo "$(pwd)/.git/objects/" >>cloned/src/.git/objects/info/alternates &&
+	test_commit -C cloned/src loose &&
 	scalar diagnose cloned >out 2>err &&
 	grep "Available space" out &&
 	sed -n "s/.*$SQ\\(.*\\.zip\\)$SQ.*/\\1/p" <err >zip_path &&
@@ -106,7 +107,9 @@ test_expect_success UNZIP 'scalar diagnose' '
 	unzip -p "$zip_path" diagnostics.log >out &&
 	test_file_not_empty out &&
 	unzip -p "$zip_path" packs-local.txt >out &&
-	grep "$(pwd)/.git/objects" out
+	grep "$(pwd)/.git/objects" out &&
+	unzip -p "$zip_path" objects-local.txt >out &&
+	grep "^Total: [1-9]" out
 '
 
 test_done
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v4 3/7] scalar: validate the optional enlistment argument
  2022-05-21  9:54                 ` Ævar Arnfjörð Bjarmason
@ 2022-05-22  5:50                   ` Junio C Hamano
  2022-05-24 12:25                     ` Johannes Schindelin
  0 siblings, 1 reply; 140+ messages in thread
From: Junio C Hamano @ 2022-05-22  5:50 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Johannes Schindelin, Johannes Schindelin via GitGitGadget, git,
	René Scharfe, Taylor Blau, Derrick Stolee, Elijah Newren

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

>> Scalar is not (yet?) a Git command.
>
> "test-tool" isn't "git" either, so I think this argument is a
> non-starter.
>
> As the documentation for "test_must_fail" notes the distinction is
> whether something is "system-supplied". I.e. we're not going to test
> whether "grep" segfaults, but we should test our own code to see if it
> segfaults.
>
> The scalar code is code we ship and test, so we should use the helper
> that doesn't hide a segfault.
>
> I don't understand why you wouldn't think that's the obvious fix here,
> adding "scalar" to that whitelist is a one-line fix, and clearly yields
> a more useful end result than a test silently hiding segfaults.

FWIW, I don't, either.


^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v4 3/7] scalar: validate the optional enlistment argument
  2022-05-22  5:50                   ` Junio C Hamano
@ 2022-05-24 12:25                     ` Johannes Schindelin
  2022-05-24 18:11                       ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 140+ messages in thread
From: Johannes Schindelin @ 2022-05-24 12:25 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Ævar Arnfjörð Bjarmason,
	Johannes Schindelin via GitGitGadget, git, René Scharfe,
	Taylor Blau, Derrick Stolee, Elijah Newren

[-- Attachment #1: Type: text/plain, Size: 1252 bytes --]

Hi Junio,

On Sat, 21 May 2022, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>
> >> Scalar is not (yet?) a Git command.
> >
> > "test-tool" isn't "git" either, so I think this argument is a
> > non-starter.
> >
> > As the documentation for "test_must_fail" notes the distinction is
> > whether something is "system-supplied". I.e. we're not going to test
> > whether "grep" segfaults, but we should test our own code to see if it
> > segfaults.
> >
> > The scalar code is code we ship and test, so we should use the helper
> > that doesn't hide a segfault.
> >
> > I don't understand why you wouldn't think that's the obvious fix here,
> > adding "scalar" to that whitelist is a one-line fix, and clearly yields
> > a more useful end result than a test silently hiding segfaults.
>
> FWIW, I don't, either.

Because we are still talking about code that lives as much encapsulated
inside `contrib/scalar/` as possible.

The `! scalar` call is in `contrib/scalar/t/t9099-scalar.sh`.

To make it work with Git's test suite, you would have to bleed an
implementation detail of something inside `contrib/` into
`t/test-lib-functions.sh`.

Not what we want, at this stage.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v4 3/7] scalar: validate the optional enlistment argument
  2022-05-24 12:25                     ` Johannes Schindelin
@ 2022-05-24 18:11                       ` Ævar Arnfjörð Bjarmason
  2022-05-24 19:29                         ` Junio C Hamano
  0 siblings, 1 reply; 140+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-05-24 18:11 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Junio C Hamano, Johannes Schindelin via GitGitGadget, git,
	René Scharfe, Taylor Blau, Derrick Stolee, Elijah Newren


On Tue, May 24 2022, Johannes Schindelin wrote:

> Hi Junio,
>
> On Sat, 21 May 2022, Junio C Hamano wrote:
>
>> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>>
>> >> Scalar is not (yet?) a Git command.
>> >
>> > "test-tool" isn't "git" either, so I think this argument is a
>> > non-starter.
>> >
>> > As the documentation for "test_must_fail" notes the distinction is
>> > whether something is "system-supplied". I.e. we're not going to test
>> > whether "grep" segfaults, but we should test our own code to see if it
>> > segfaults.
>> >
>> > The scalar code is code we ship and test, so we should use the helper
>> > that doesn't hide a segfault.
>> >
>> > I don't understand why you wouldn't think that's the obvious fix here,
>> > adding "scalar" to that whitelist is a one-line fix, and clearly yields
>> > a more useful end result than a test silently hiding segfaults.
>>
>> FWIW, I don't, either.
>
> Because we are still talking about code that lives as much encapsulated
> inside `contrib/scalar/` as possible.
>
> The `! scalar` call is in `contrib/scalar/t/t9099-scalar.sh`.
>
> To make it work with Git's test suite, you would have to bleed an
> implementation detail of something inside `contrib/` into
> `t/test-lib-functions.sh`.

The "scalar" command is already built by the top-level Makefile, so I
don't think the distinction you're trying to maintain here even exists
in practice.

I.e. if we ran with this strict reasoning then surely "scalar" belongs
on there just as much as "test-tool" does.

Both are built by our main build process, and thus should have
corresponding adjustments in our main test code, just as is already the
case for both "git" and "test-tool".

But even if that wasn't the case I'd still be of the view that we should
add "scalar" to that list.

It's just a matter of potential time sinks in the future. If we
introduce a hidden segfault in the scalar code and don't notice for some
time because we're using that test pattern that's going to suck, and
likely to waste a lot of time. We might even ship a broken command to
users.

Whereas having "scalar" on that list is going to be a relatively easy
matter of grepping and doing some boilerplate changes if and when we
ever "git rm" it entirely, or "promote it" from contrib or whatever.

I also think that just getting rid of that whitelist entirely is an
acceptable solution. Perhaps it's just being overzealous in forbidding
everything except "git", we should still not use it for the likes of
"grep", but we could just leave that to the documentation.

But I suspect Junio would disagree with that, so in lieu of that ...

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v4 3/7] scalar: validate the optional enlistment argument
  2022-05-24 18:11                       ` Ævar Arnfjörð Bjarmason
@ 2022-05-24 19:29                         ` Junio C Hamano
  2022-05-25 10:31                           ` Johannes Schindelin
  0 siblings, 1 reply; 140+ messages in thread
From: Junio C Hamano @ 2022-05-24 19:29 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Johannes Schindelin, Johannes Schindelin via GitGitGadget, git,
	René Scharfe, Taylor Blau, Derrick Stolee, Elijah Newren

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> Both are built by our main build process, and thus should have
> corresponding adjustments in our main test code, just as is already the
> case for both "git" and "test-tool".
>
> But even if that wasn't the case I'd still be of the view that we should
> add "scalar" to that list.
>
> It's just a matter of potential time sinks in the future. If we
> introduce a hidden segfault in the scalar code and don't notice for some
> time because we're using that test pattern that's going to suck, and
> likely to waste a lot of time. We might even ship a broken command to
> users.
>
> Whereas having "scalar" on that list is going to be a relatively easy
> matter of grepping and doing some boilerplate changes if and when we
> ever "git rm" it entirely, or "promote it" from contrib or whatever.

In addition, it already is an actual time sink that causes us send a
lot more bytes back and forth than the number of bytes necessary to
send a reroll that adds one liner to the same step.

> I also think that just getting rid of that whitelist entirely is an
> acceptable solution. Perhaps it's just being overzealous in forbidding
> everything except "git", we should still not use it for the likes of
> "grep", but we could just leave that to the documentation.

It indeed is tempting entry into a slippery slope, and I'd see it as
a change bigger than we could comfortably make as a "while at it"
change.

We can stop arguing and instead send in a reroll that squashes in
something like this, which shouldn't be controversial, I would say.

 t/test-lib-functions.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git i/t/test-lib-functions.sh w/t/test-lib-functions.sh
index 93c03380d4..8899eaabed 100644
--- i/t/test-lib-functions.sh
+++ w/t/test-lib-functions.sh
@@ -1106,7 +1106,7 @@ test_must_fail_acceptable () {
 	fi
 
 	case "$1" in
-	git|__git*|test-tool|test_terminal)
+	git|__git*|scalar|test-tool|test_terminal)
 		return 0
 		;;
 	*)




^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v4 3/7] scalar: validate the optional enlistment argument
  2022-05-24 19:29                         ` Junio C Hamano
@ 2022-05-25 10:31                           ` Johannes Schindelin
  0 siblings, 0 replies; 140+ messages in thread
From: Johannes Schindelin @ 2022-05-25 10:31 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Ævar Arnfjörð Bjarmason,
	Johannes Schindelin via GitGitGadget, git, René Scharfe,
	Taylor Blau, Derrick Stolee, Elijah Newren

Hi Junio,

On Tue, 24 May 2022, Junio C Hamano wrote:

> We can stop arguing and instead send in a reroll that squashes in
> something like this, which shouldn't be controversial, I would say.
>
>  t/test-lib-functions.sh | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git i/t/test-lib-functions.sh w/t/test-lib-functions.sh
> index 93c03380d4..8899eaabed 100644
> --- i/t/test-lib-functions.sh
> +++ w/t/test-lib-functions.sh
> @@ -1106,7 +1106,7 @@ test_must_fail_acceptable () {
>  	fi
>
>  	case "$1" in
> -	git|__git*|test-tool|test_terminal)
> +	git|__git*|scalar|test-tool|test_terminal)
>  		return 0
>  		;;
>  	*)
>
>
>
>

It is still wrong to adjust Git's test suite for a user that is not part
of Git proper. But if your pragmatism says that this is the only way we
can venture on to more productive venues, I won't argue against that :-)

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v6 2/7] archive --add-virtual-file: allow paths containing colons
  2022-05-21 15:08           ` [PATCH v6 2/7] archive --add-virtual-file: allow paths containing colons Johannes Schindelin via GitGitGadget
@ 2022-05-25 20:22             ` Junio C Hamano
  2022-05-25 21:42               ` Junio C Hamano
  0 siblings, 1 reply; 140+ messages in thread
From: Junio C Hamano @ 2022-05-25 20:22 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget
  Cc: git, René Scharfe, Taylor Blau, Derrick Stolee,
	Elijah Newren, rsbecker, Ævar Arnfjörð Bjarmason,
	Johannes Schindelin

"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:

>  test_expect_success UNZIP 'git archive --format=zip --add-virtual-file' '
> +	if test_have_prereq FUNNYNAMES
> +	then
> +		PATHNAME=quoted:colon
> +	else
> +		PATHNAME=quoted
> +	fi &&
>  	git archive --format=zip >with_file_with_content.zip \
> +		--add-virtual-file=\"$PATHNAME\": \

The name is better, but this still limits what can be in PATHNAME.

Write either one of these:

		--add-virtual-file="\"$PATHNAME\":" \
		--add-virtual-file=\""$PATHNAME"\": \

to signal the intention better to future readers.  We are showing an
explicit dq-pair we want to pass to the c-unquote machinery, and we
are showing that we are not being unnecessarily loose by protecting
the string from getting word split.

Either is fine, but leaving it unquoted is not.

> +		test_path_is_file $PATHNAME &&

Ditto.  There is no reason to forbid future developers from futzing
the test to include space in the PATHNAME variable.  

IOW, I want us to be better than saying

    I know there is no $IFS whitespace now because I just wrote it.
    Because I do not think there is any need to test with a string
    with whitespace in it, I will leave the variable unquoted.
    Anybody who changes the variable and breaks this assumption have
    only themselves to blame for breaking the tests.  It is not my
    fault and it is not my problem.

which is the signal our readers would get from this patch (I would,
if I were reading this commit as a third-party), especially once
they become aware of the fact that this exact issue was already
pointed out during the review discussion.

Using double-quote appropriately sends a strong signal to reviewers
and future developers that we care about details.

A valid alternative is to write the assumption out where we
currently assign to PATHNAME.

	# The PATHNAME variable is used without quote in the code
	# below for such and such reasons, so you cannot use a $IFS
	# whitespace in it.
	if test_have_prereq FUNNYNAMES
	then
		...

If the "defensive" measure that is necessary to avoid a limitation
is too onerous, such an approach may be very much more preferrable
than preparing for future changes.  "for such and such reasons" is
a good place to justify why we avoid unnecessarily complex defensive
measure and restrict future changes in the documented way.

But in _this_ particular case, the "defensive" measure necessary is
merely just to quote the shell variables properly, which nobody
sensible would say too onerous.  I couldn't come up with anything
remotely plausible to fill "for such and such reasons" myself when I
tried to justify leaving the variables unquoted.

Regardless of the quoting issue, we probably want to comment on what
value exactly is in PATHNAME before the assignment, by the way.

E.g.

	# The PATHNAME variable holds a filename encoded like a
	# string constant in C language (e.g. "\060" is digit "0")
	if test_have_prereq FUNNYNAMES
	then
		PATHNAME=quoted:colon:\\060zero
	else
		PATHNAME=quoted\\060zero
	fi

That would not just protect only one aspect (i.e. we can pass a
colon into the resulting filename) this change but the path goes
through the c-unquoting rules.

Thanks.

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v6 1/7] archive: optionally add "virtual" files
  2022-05-21 15:08           ` [PATCH v6 1/7] archive: optionally add "virtual" files Johannes Schindelin via GitGitGadget
@ 2022-05-25 21:11             ` Junio C Hamano
  2022-05-26  9:09               ` René Scharfe
  0 siblings, 1 reply; 140+ messages in thread
From: Junio C Hamano @ 2022-05-25 21:11 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget
  Cc: git, René Scharfe, Taylor Blau, Derrick Stolee,
	Elijah Newren, rsbecker, Ævar Arnfjörð Bjarmason,
	Johannes Schindelin

"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> @@ -61,6 +61,17 @@ OPTIONS
>  	by concatenating the value for `--prefix` (if any) and the
>  	basename of <file>.
>  
> +--add-virtual-file=<path>:<content>::
> +	Add the specified contents to the archive.  Can be repeated to add
> +	multiple files.  The path of the file in the archive is built
> +	by concatenating the value for `--prefix` (if any) and the
> +	basename of <file>.

This sentence was copy-pasted from --add-file without adjusting.
There is no <file>; this new feature gives <path>.

Also, I suspect that the feature is losing end-user supplied
information without a good reason.  --add-file=<file> may have
prepared an input in a randomly named temporary directory and it
would make quite a lot of sense to strip the leading directory
components from <file> and use only the basename part.  But the
<path> given to "--add-virtual-file" does not refer to anything on
the filesystem.  Its ONLY use is to be used as the path in the
archive to store the content.  There is no justification why we
would discard the leading path components from it.  I am not
decided, but I am inclined to say that we should not honor
"--prefix".

   $ git archive --prefix=2.36.0 v2.36.0

would be a way to create a single directory and put everything in
the tree-ish in there, but there probably are cases where the user
of an "extra file" feature wants to add untracked cruft _in_ that
directory, and there are other cases where an extra file wants to go
to the top-level next to the 2.36.0 directory.  A user can use the
same string as --prefix=<base> in front of <path> if the extra file
should go next to the top-level of the tree-ish, or without such
prefixing to place the extra file at the top-level.

Hence

	Add the specified contents to the archive.  Can be repeated
	to add multiple files.  `<path>` is used as the path of the
	file in the archive.
	
would be what I would expect in a version of this feature that is
reasonably designed.

> ++
> +The `<path>` cannot contain any colon, the file mode is limited to
> +a regular file, and the option may be subject to platform-dependent
> +command-line limits. For non-trivial cases, write an untracked file
> +and use `--add-file` instead.

OK.

> diff --git a/archive.c b/archive.c
> index a3bbb091256..d20e16fa819 100644
> --- a/archive.c
> +++ b/archive.c
> @@ -263,6 +263,7 @@ static int queue_or_write_archive_entry(const struct object_id *oid,
>  struct extra_file_info {
>  	char *base;
>  	struct stat stat;
> +	void *content;
>  };
>  
>  int write_archive_entries(struct archiver_args *args,
> @@ -337,7 +338,13 @@ int write_archive_entries(struct archiver_args *args,
>  		strbuf_addstr(&path_in_archive, basename(path));
>  
>  		strbuf_reset(&content);
> -		if (strbuf_read_file(&content, path, info->stat.st_size) < 0)
> +		if (info->content)

We ended up with the problematic "leading <path> components are
discarded" design only because the implementation reuses the logic
path_in_archive computation (the last line is seen in precontext),
which is a bit unfortunate.  I think we could rewrite the inside of
that "for each extra file" loop like so, instead:

	for (i = 0; i < args->extra_files.nr; i++) {
		struct string_list_item *item = args->extra_files.items + i;
		char *path = item->string;
		struct extra_file_info *info = item->util;

		put_be64(fake_oid.hash, i + 1);

		if (!info->content) {
			strbuf_reset(&path_in_archive);
			if (info->base)
				strbuf_addstr(&path_in_archive, info->base);
			strbuf_addstr(&path_in_archive, basename(path));

			strbuf_reset(&content);
			if (strbuf_read_file(&content, path, info->stat.st_size) < 0)
				err = error_errno(_("could not read '%s'"), path);
			else
				err = write_entry(args, &fake_oid, path_in_archive.buf,
						  path_in_archive.len,
						  info->stat.st_mode,
						  content.buf, content.len);
		} else {
			err = write_entry(args, &fake_oid,
					  path, strlen(path),
					  info->stat.st_mode,
					  info->content, info->stat.st_size);
		}

		if (err)
			break;
	}

The first half is the original code for "--add-file", which clears
info->content to NULL.  We mangle the filename to come up with the
name in the archive (i.e. take basename and prefix with info->base).

The "else" side is the new code.  "--add-virtual-file" has the
"<path>" thing in item->string, and info has the contents, so we
just write it out.

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v6 2/7] archive --add-virtual-file: allow paths containing colons
  2022-05-25 20:22             ` Junio C Hamano
@ 2022-05-25 21:42               ` Junio C Hamano
  2022-05-25 22:34                 ` Junio C Hamano
  0 siblings, 1 reply; 140+ messages in thread
From: Junio C Hamano @ 2022-05-25 21:42 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget
  Cc: git, René Scharfe, Taylor Blau, Derrick Stolee,
	Elijah Newren, rsbecker, Ævar Arnfjörð Bjarmason,
	Johannes Schindelin

Junio C Hamano <gitster@pobox.com> writes:

> But in _this_ particular case, the "defensive" measure necessary is
> merely just to quote the shell variables properly, which nobody
> sensible would say too onerous.  I couldn't come up with anything
> remotely plausible to fill "for such and such reasons" myself when I
> tried to justify leaving the variables unquoted.
>
> Regardless of the quoting issue, we probably want to comment on what
> value exactly is in PATHNAME before the assignment, by the way.
>
> E.g.
>
> 	# The PATHNAME variable holds a filename encoded like a
> 	# string constant in C language (e.g. "\060" is digit "0")
> 	if test_have_prereq FUNNYNAMES
> 	then
> 		PATHNAME=quoted:colon:\\060zero
> 	else
> 		PATHNAME=quoted\\060zero
> 	fi
>
> That would not just protect only one aspect (i.e. we can pass a
> colon into the resulting filename) this change but the path goes
> through the c-unquoting rules.

Actually, I _think_ that pushes us beyond the "reasonably defensive
for the current need".  We'd need to prepare how the pathname is
expected to be unquoted for the later test

	test_path_is_file "$PATHNAME"

to work.  So here is what I queued as a fixup for this step on top
of the series.

 t/t5003-archive-zip.sh | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git c/t/t5003-archive-zip.sh w/t/t5003-archive-zip.sh
index 3a5a052e8c..6addb6c684 100755
--- c/t/t5003-archive-zip.sh
+++ w/t/t5003-archive-zip.sh
@@ -209,19 +209,19 @@ check_added with_untracked untracked untracked
 test_expect_success UNZIP 'git archive --format=zip --add-virtual-file' '
 	if test_have_prereq FUNNYNAMES
 	then
-		PATHNAME=quoted:colon
+		PATHNAME="pathname with : colon"
 	else
-		PATHNAME=quoted
+		PATHNAME="pathname without colon"
 	fi &&
 	git archive --format=zip >with_file_with_content.zip \
-		--add-virtual-file=\"$PATHNAME\": \
+		--add-virtual-file=\""$PATHNAME"\": \
 		--add-virtual-file=hello:world $EMPTY_TREE &&
 	test_when_finished "rm -rf tmp-unpack" &&
 	mkdir tmp-unpack && (
 		cd tmp-unpack &&
 		"$GIT_UNZIP" ../with_file_with_content.zip &&
 		test_path_is_file hello &&
-		test_path_is_file $PATHNAME &&
+		test_path_is_file "$PATHNAME" &&
 		test world = $(cat hello)
 	)
 '

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v6 2/7] archive --add-virtual-file: allow paths containing colons
  2022-05-25 21:42               ` Junio C Hamano
@ 2022-05-25 22:34                 ` Junio C Hamano
  0 siblings, 0 replies; 140+ messages in thread
From: Junio C Hamano @ 2022-05-25 22:34 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget
  Cc: git, René Scharfe, Taylor Blau, Derrick Stolee,
	Elijah Newren, rsbecker, Ævar Arnfjörð Bjarmason,
	Johannes Schindelin

Junio C Hamano <gitster@pobox.com> writes:

>> 	# The PATHNAME variable holds a filename encoded like a
>> 	# string constant in C language (e.g. "\060" is digit "0")
>> 	if test_have_prereq FUNNYNAMES
>> 	then
>> 		PATHNAME=quoted:colon:\\060zero
>> 	...
> Actually, I _think_ that pushes us beyond the "reasonably defensive
> for the current need".  We'd need to prepare how the pathname is
> expected to be unquoted for the later test
>
> 	test_path_is_file "$PATHNAME"
>
> to work.

IOW, I would need to add a new test-tool (attached) and then start
this test like so:

	if ...
	then
		PATHNAME=quoted:colon:\\060zero
	else
		PATHNAME=quoted\\060zero
	fi
	UQPATHNAME=$(test-tool unquote-c-style \""$PATHNAME"\")

and change the last test to

	test_path_is_file "$UQPATHNAME"

if we really wanted to test that the the PATHNAME is treated as a
c-style quoted string.

I am on the fence.  We do not have an immediate need, in the sense
that nobody needs to encode "0" as "\060" and trigger the unquote
codepath in real life.  But it does feel prudent to make sure we can
grok C-quoted pathname as we claim in the documentation.

And the resulting change to the test does not look _too_ bad (and
the new test-tool certainly does not hurt, either).

So...


 Makefile               |  1 +
 t/helper/test-quoted.c | 34 ++++++++++++++++++++++++++++++++++
 t/helper/test-tool.c   |  2 ++
 t/helper/test-tool.h   |  2 ++
 4 files changed, 39 insertions(+)

diff --git c/Makefile w/Makefile
index 298becd5a5..1d544ad46a 100644
--- c/Makefile
+++ w/Makefile
@@ -749,6 +749,7 @@ TEST_BUILTINS_OBJS += test-pkt-line.o
 TEST_BUILTINS_OBJS += test-prio-queue.o
 TEST_BUILTINS_OBJS += test-proc-receive.o
 TEST_BUILTINS_OBJS += test-progress.o
+TEST_BUILTINS_OBJS += test-quoted.o
 TEST_BUILTINS_OBJS += test-reach.o
 TEST_BUILTINS_OBJS += test-read-cache.o
 TEST_BUILTINS_OBJS += test-read-graph.o
diff --git c/t/helper/test-quoted.c w/t/helper/test-quoted.c
new file mode 100644
index 0000000000..15baa55e43
--- /dev/null
+++ w/t/helper/test-quoted.c
@@ -0,0 +1,34 @@
+#include "test-tool.h"
+#include "cache.h"
+#include "quote.h"
+
+int cmd__unquote_c_style(int argc, const char **argv)
+{
+	struct strbuf buf = STRBUF_INIT;
+
+	while (*++argv) {
+		const char *p = *argv;
+
+		if (unquote_c_style(&buf, p, &p) < 0)
+			error("cannot unquote '%s'", *argv);
+		else
+			printf("%s\n", buf.buf);
+		strbuf_reset(&buf);
+	}
+	return 0;
+}
+
+int cmd__quote_c_style(int argc, const char **argv)
+{
+	struct strbuf buf = STRBUF_INIT;
+
+	while (*++argv) {
+		const char *p = *argv;
+
+		quote_c_style(p, &buf, NULL, 0);
+		printf("%s\n", buf.buf);
+		strbuf_reset(&buf);
+	}
+	return 0;
+}
+
diff --git c/t/helper/test-tool.c w/t/helper/test-tool.c
index d2eacd302d..5633c98569 100644
--- c/t/helper/test-tool.c
+++ w/t/helper/test-tool.c
@@ -58,6 +58,7 @@ static struct test_cmd cmds[] = {
 	{ "prio-queue", cmd__prio_queue },
 	{ "proc-receive", cmd__proc_receive },
 	{ "progress", cmd__progress },
+	{ "quote-c-style", cmd__quote_c_style },
 	{ "reach", cmd__reach },
 	{ "read-cache", cmd__read_cache },
 	{ "read-graph", cmd__read_graph },
@@ -81,6 +82,7 @@ static struct test_cmd cmds[] = {
 	{ "submodule-nested-repo-config", cmd__submodule_nested_repo_config },
 	{ "subprocess", cmd__subprocess },
 	{ "trace2", cmd__trace2 },
+	{ "unquote-c-style", cmd__unquote_c_style },
 	{ "userdiff", cmd__userdiff },
 	{ "urlmatch-normalization", cmd__urlmatch_normalization },
 	{ "xml-encode", cmd__xml_encode },
diff --git c/t/helper/test-tool.h w/t/helper/test-tool.h
index 960cc27ef7..f5e8929009 100644
--- c/t/helper/test-tool.h
+++ w/t/helper/test-tool.h
@@ -48,6 +48,7 @@ int cmd__pkt_line(int argc, const char **argv);
 int cmd__prio_queue(int argc, const char **argv);
 int cmd__proc_receive(int argc, const char **argv);
 int cmd__progress(int argc, const char **argv);
+int cmd__quote_c_style(int argc, const char **argv);
 int cmd__reach(int argc, const char **argv);
 int cmd__read_cache(int argc, const char **argv);
 int cmd__read_graph(int argc, const char **argv);
@@ -71,6 +72,7 @@ int cmd__submodule_config(int argc, const char **argv);
 int cmd__submodule_nested_repo_config(int argc, const char **argv);
 int cmd__subprocess(int argc, const char **argv);
 int cmd__trace2(int argc, const char **argv);
+int cmd__unquote_c_style(int argc, const char **argv);
 int cmd__userdiff(int argc, const char **argv);
 int cmd__urlmatch_normalization(int argc, const char **argv);
 int cmd__xml_encode(int argc, const char **argv);

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v6 1/7] archive: optionally add "virtual" files
  2022-05-25 21:11             ` Junio C Hamano
@ 2022-05-26  9:09               ` René Scharfe
  2022-05-26 17:10                 ` Junio C Hamano
  0 siblings, 1 reply; 140+ messages in thread
From: René Scharfe @ 2022-05-26  9:09 UTC (permalink / raw)
  To: Junio C Hamano, Johannes Schindelin via GitGitGadget
  Cc: git, Taylor Blau, Derrick Stolee, Elijah Newren, rsbecker,
	Ævar Arnfjörð Bjarmason, Johannes Schindelin

Am 25.05.22 um 23:11 schrieb Junio C Hamano:
> "Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
> writes:
>
>> @@ -61,6 +61,17 @@ OPTIONS
>>  	by concatenating the value for `--prefix` (if any) and the
>>  	basename of <file>.
>>
>> +--add-virtual-file=<path>:<content>::
>> +	Add the specified contents to the archive.  Can be repeated to add
>> +	multiple files.  The path of the file in the archive is built
>> +	by concatenating the value for `--prefix` (if any) and the
>> +	basename of <file>.
>
> This sentence was copy-pasted from --add-file without adjusting.
> There is no <file>; this new feature gives <path>.
>
> Also, I suspect that the feature is losing end-user supplied
> information without a good reason.  --add-file=<file> may have
> prepared an input in a randomly named temporary directory and it
> would make quite a lot of sense to strip the leading directory
> components from <file> and use only the basename part.  But the
> <path> given to "--add-virtual-file" does not refer to anything on
> the filesystem.  Its ONLY use is to be used as the path in the
> archive to store the content.  There is no justification why we
> would discard the leading path components from it.

Good point.

>  I am not
> decided, but I am inclined to say that we should not honor
> "--prefix".
>
>    $ git archive --prefix=2.36.0 v2.36.0
>
> would be a way to create a single directory and put everything in
> the tree-ish in there, but there probably are cases where the user
> of an "extra file" feature wants to add untracked cruft _in_ that
> directory, and there are other cases where an extra file wants to go
> to the top-level next to the 2.36.0 directory.  A user can use the
> same string as --prefix=<base> in front of <path> if the extra file
> should go next to the top-level of the tree-ish, or without such
> prefixing to place the extra file at the top-level.

If the prefix is applied then a prefix-less extra file can by had by
using --prefix= or --no-prefix for it and --prefix=... for the tree,
e.g.:

   $ git archive --add-file=extra --prefix=dir/ v2.36.0

puts "extra" at the root and the rest under "dir".  The order of
arguments matters here, and the default prefix is the empty string.

So extra files can be put anywhere even if --prefix is honored.

Keeping the whole path from --add-virtual-file makes sense to me; I
slightly prefer applying --prefix on top of that for consistency.

René

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v6 1/7] archive: optionally add "virtual" files
  2022-05-26  9:09               ` René Scharfe
@ 2022-05-26 17:10                 ` Junio C Hamano
  2022-05-26 18:57                   ` René Scharfe
  0 siblings, 1 reply; 140+ messages in thread
From: Junio C Hamano @ 2022-05-26 17:10 UTC (permalink / raw)
  To: René Scharfe
  Cc: Johannes Schindelin via GitGitGadget, git, Taylor Blau,
	Derrick Stolee, Elijah Newren, rsbecker,
	Ævar Arnfjörð Bjarmason, Johannes Schindelin

René Scharfe <l.s.r@web.de> writes:

> If the prefix is applied then a prefix-less extra file can by had by
> using --prefix= or --no-prefix for it and --prefix=... for the tree,
> e.g.:
>
>    $ git archive --add-file=extra --prefix=dir/ v2.36.0
>
> puts "extra" at the root and the rest under "dir".  The order of
> arguments matters here, and the default prefix is the empty string.

This was the part of the design for the original "--add-file" that I
was moderately unhappy with.  If "--add-file" were the only feature
that used "--prefix", I wouldn't have been unhappy, but this rule:

        The value of "--prefix" most recently seen at the point of
        "--add-file" is prepended.  (By the way, it is not clearly
        documented what happens when you give multiple prefix and
        when you give prefix before or after add-file)

makes the original use of "--prefix":

	The value given to "--prefix" is prepended to each filename
	in the archive.  (IOW "git archive --prefix=git-2.36.0/
	v2.36.0" is a way to prefix each and every path in the
	tree-ish with the given prefix)

confusing.  Does

	git archive --prefix=bonus-files/ --add-file=extra v2.36.0

place the main part of the archive also in bonus-files/ or at the
top level?  One reasonable interpretation is "yes", if we imagine
that each invocation of --add-file will consume and reset the prefix.
Another reasonable interpretation is "no", if we imagine that the
prefix last specified will stay around and equally affect both extra
ones and main part of the archive.

Unfortunately what the implmentation does is the latter, and those
who want to put the main part of the archive at the top-level must
add "--prefix=''" at the end (before the tree-ish).

Because of this potential for confusion ...

> So extra files can be put anywhere even if --prefix is honored.
>
> Keeping the whole path from --add-virtual-file makes sense to me; I
> slightly prefer applying --prefix on top of that for consistency.

... I was hoping that we can releave users from having to worry
about the interaction between "prefix" and contents coming from
outside the tree-ish by ignoring the "prefix".

But either is fine by me.

Thanks.



^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v6 1/7] archive: optionally add "virtual" files
  2022-05-26 17:10                 ` Junio C Hamano
@ 2022-05-26 18:57                   ` René Scharfe
  2022-05-26 20:16                     ` Junio C Hamano
  0 siblings, 1 reply; 140+ messages in thread
From: René Scharfe @ 2022-05-26 18:57 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Johannes Schindelin via GitGitGadget, git, Taylor Blau,
	Derrick Stolee, Elijah Newren, rsbecker,
	Ævar Arnfjörð Bjarmason, Johannes Schindelin

Am 26.05.22 um 19:10 schrieb Junio C Hamano:
> René Scharfe <l.s.r@web.de> writes:
>
>> If the prefix is applied then a prefix-less extra file can by had by
>> using --prefix= or --no-prefix for it and --prefix=... for the tree,
>> e.g.:
>>
>>    $ git archive --add-file=extra --prefix=dir/ v2.36.0
>>
>> puts "extra" at the root and the rest under "dir".  The order of
>> arguments matters here, and the default prefix is the empty string.
>
> This was the part of the design for the original "--add-file" that I
> was moderately unhappy with.  If "--add-file" were the only feature
> that used "--prefix", I wouldn't have been unhappy, but this rule:
>
>         The value of "--prefix" most recently seen at the point of
>         "--add-file" is prepended.  (By the way, it is not clearly
>         documented what happens when you give multiple prefix and
>         when you give prefix before or after add-file)

Regarding documentation: I wonder what's missing; a guess is below.

>
> makes the original use of "--prefix":
>
> 	The value given to "--prefix" is prepended to each filename
> 	in the archive.  (IOW "git archive --prefix=git-2.36.0/
> 	v2.36.0" is a way to prefix each and every path in the
> 	tree-ish with the given prefix)
>
> confusing.  Does
>
> 	git archive --prefix=bonus-files/ --add-file=extra v2.36.0
>
> place the main part of the archive also in bonus-files/ or at the
> top level?  One reasonable interpretation is "yes", if we imagine
> that each invocation of --add-file will consume and reset the prefix.
> Another reasonable interpretation is "no", if we imagine that the
> prefix last specified will stay around and equally affect both extra
> ones and main part of the archive.
>
> Unfortunately what the implmentation does is the latter, and those
> who want to put the main part of the archive at the top-level must
> add "--prefix=''" at the end (before the tree-ish).

A one-shot --prefix would be surprising -- usually options keep their
value until they are specified again with a different value or negated
(--no-...).  That surprise could be documented away by using a
different name like --next-prefix or --single-use-prefix.  But a
sub-option to a single option like that would probably be better baked
into that option, e.g. allow --add-file=<path_in_archive>:<path_in_fs>.

>
> Because of this potential for confusion ...
>
>> So extra files can be put anywhere even if --prefix is honored.
>>
>> Keeping the whole path from --add-virtual-file makes sense to me; I
>> slightly prefer applying --prefix on top of that for consistency.
>
> ... I was hoping that we can releave users from having to worry
> about the interaction between "prefix" and contents coming from
> outside the tree-ish by ignoring the "prefix".
>
> But either is fine by me.

The unusual thing about the current --prefix implementation is that its
current value is captured along the way instead of just using its
right-most value.  Not sure ignoring it for one of the three archive
content sources helps.  (Really, it's hard for me to put me in the shoes
of someone who doesn't know how these options are supposed to be used.)


--- >8 ---
Subject: [PATCH] archive: improve documentation of --prefix

Document the interaction between --add-file and --prefix by giving an
example.

Signed-off-by: René Scharfe <l.s.r@web.de>
---
 Documentation/git-archive.txt | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/Documentation/git-archive.txt b/Documentation/git-archive.txt
index bc4e76a783..10a48ab5f8 100644
--- a/Documentation/git-archive.txt
+++ b/Documentation/git-archive.txt
@@ -49,7 +49,9 @@ OPTIONS
 	Report progress to stderr.

 --prefix=<prefix>/::
-	Prepend <prefix>/ to each filename in the archive.
+	Prepend <prefix>/ to each filename in the archive.  Can be
+	specified multiple times; the last one seen when reading from
+	left to right is applied.

 -o <file>::
 --output=<file>::
@@ -58,8 +60,8 @@ OPTIONS
 --add-file=<file>::
 	Add a non-tracked file to the archive.  Can be repeated to add
 	multiple files.  The path of the file in the archive is built
-	by concatenating the value for `--prefix` (if any) and the
-	basename of <file>.
+	by concatenating the current value for `--prefix` (if any) and
+	the basename of <file>.

 --worktree-attributes::
 	Look for attributes in .gitattributes files in the working tree
@@ -194,6 +196,12 @@ EXAMPLES
 	commit on the current branch. Note that the output format is
 	inferred by the extension of the output file.

+`git archive -o latest.tar --prefix=build/ --add-file=configure --prefix= HEAD`::
+
+	Creates a tar archive that contains the contents of the latest
+	commit on the current branch with no prefix and the untracked
+	file 'configure' with the prefix 'build/'.
+
 `git config tar.tar.xz.command "xz -c"`::

 	Configure a "tar.xz" format for making LZMA-compressed tarfiles.
--
2.35.3

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v6 1/7] archive: optionally add "virtual" files
  2022-05-26 18:57                   ` René Scharfe
@ 2022-05-26 20:16                     ` Junio C Hamano
  2022-05-27 17:02                       ` René Scharfe
  0 siblings, 1 reply; 140+ messages in thread
From: Junio C Hamano @ 2022-05-26 20:16 UTC (permalink / raw)
  To: René Scharfe
  Cc: Johannes Schindelin via GitGitGadget, git, Taylor Blau,
	Derrick Stolee, Elijah Newren, rsbecker,
	Ævar Arnfjörð Bjarmason, Johannes Schindelin

René Scharfe <l.s.r@web.de> writes:

> diff --git a/Documentation/git-archive.txt b/Documentation/git-archive.txt
> index bc4e76a783..10a48ab5f8 100644
> --- a/Documentation/git-archive.txt
> +++ b/Documentation/git-archive.txt
> @@ -49,7 +49,9 @@ OPTIONS
>  	Report progress to stderr.
>
>  --prefix=<prefix>/::
> -	Prepend <prefix>/ to each filename in the archive.
> +	Prepend <prefix>/ to each filename in the archive.  Can be
> +	specified multiple times; the last one seen when reading from
> +	left to right is applied.

That can be read to mean that we will use C consistently,

$ cmd --prefix=A other-args --prefix=B other-args --prefix=C other-args

which was what I am worried to be a source of confusion.

>  -o <file>::
>  --output=<file>::
> @@ -58,8 +60,8 @@ OPTIONS
>  --add-file=<file>::
>  	Add a non-tracked file to the archive.  Can be repeated to add
>  	multiple files.  The path of the file in the archive is built
> -	by concatenating the value for `--prefix` (if any) and the
> -	basename of <file>.
> +	by concatenating the current value for `--prefix` (if any) and
> +	the basename of <file>.

"the current value for `--prefix` (if any)" would work well once we
somehow make the reader form a mental model that there is "the
current" for the "prefix", which starts with an empty string, and
gets updated every time the "--prefix=<prefix>/" option is given.

So, perhaps with

	--prefix=<prefix>/::
		The paths of the files in the tree being archived,
		and untracked contents added via the `--add-file`
		and `--add-virtual-file` options, can be modified by
		prepending the "prefix" value that is in effect when
		these options or the tree object is seen on the
		command line.  The "prefix" value initially starts
		as an empty string, and it gets updated every time
		this option is given on the command line.

or something like that, with something like

> +	by concatenating the current value for "prefix" (see `--prefix`
> +	above) and the basename of <file>.

here, it might make it less misunderstanding-prone, hopefully?

> +`git archive -o latest.tar --prefix=build/ --add-file=configure --prefix= HEAD`::
> +
> +	Creates a tar archive that contains the contents of the latest
> +	commit on the current branch with no prefix and the untracked
> +	file 'configure' with the prefix 'build/'.

Great to have this example.

Thanks.

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v6 1/7] archive: optionally add "virtual" files
  2022-05-26 20:16                     ` Junio C Hamano
@ 2022-05-27 17:02                       ` René Scharfe
  2022-05-27 19:01                         ` Junio C Hamano
  0 siblings, 1 reply; 140+ messages in thread
From: René Scharfe @ 2022-05-27 17:02 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Johannes Schindelin via GitGitGadget, git, Taylor Blau,
	Derrick Stolee, Elijah Newren, rsbecker,
	Ævar Arnfjörð Bjarmason, Johannes Schindelin

Am 26.05.22 um 22:16 schrieb Junio C Hamano:
> René Scharfe <l.s.r@web.de> writes:
>
>> diff --git a/Documentation/git-archive.txt b/Documentation/git-archive.txt
>> index bc4e76a783..10a48ab5f8 100644
>> --- a/Documentation/git-archive.txt
>> +++ b/Documentation/git-archive.txt
>> @@ -49,7 +49,9 @@ OPTIONS
>>  	Report progress to stderr.
>>
>>  --prefix=<prefix>/::
>> -	Prepend <prefix>/ to each filename in the archive.
>> +	Prepend <prefix>/ to each filename in the archive.  Can be
>> +	specified multiple times; the last one seen when reading from
>> +	left to right is applied.
>
> That can be read to mean that we will use C consistently,
>
> $ cmd --prefix=A other-args --prefix=B other-args --prefix=C other-args
>
> which was what I am worried to be a source of confusion.
>
>>  -o <file>::
>>  --output=<file>::
>> @@ -58,8 +60,8 @@ OPTIONS
>>  --add-file=<file>::
>>  	Add a non-tracked file to the archive.  Can be repeated to add
>>  	multiple files.  The path of the file in the archive is built
>> -	by concatenating the value for `--prefix` (if any) and the
>> -	basename of <file>.
>> +	by concatenating the current value for `--prefix` (if any) and
>> +	the basename of <file>.
>
> "the current value for `--prefix` (if any)" would work well once we
> somehow make the reader form a mental model that there is "the
> current" for the "prefix", which starts with an empty string, and
> gets updated every time the "--prefix=<prefix>/" option is given.

Right, "current" has a well-known meaning, but its not enough to convey
that the non-standard concept of capturing option values in the middle of
the argument list is used here.

>
> So, perhaps with
>
> 	--prefix=<prefix>/::
> 		The paths of the files in the tree being archived,
> 		and untracked contents added via the `--add-file`
> 		and `--add-virtual-file` options, can be modified by
> 		prepending the "prefix" value that is in effect when
> 		these options or the tree object is seen on the
> 		command line.  The "prefix" value initially starts
> 		as an empty string, and it gets updated every time
> 		this option is given on the command line.
>
> or something like that, with something like
>
>> +	by concatenating the current value for "prefix" (see `--prefix`
>> +	above) and the basename of <file>.
>
> here, it might make it less misunderstanding-prone, hopefully?

So how about this, which avoids mentioning the idea of a "current"
option, or of updating its value (which implies an order that might not
be obvious)?

--- >8 ---
Subject: [PATCH v2] archive: improve documentation of --prefix

Document the interaction between --add-file and --prefix by giving an
example.

Signed-off-by: René Scharfe <l.s.r@web.de>
---
 Documentation/git-archive.txt | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/Documentation/git-archive.txt b/Documentation/git-archive.txt
index bc4e76a783..9c0e306c03 100644
--- a/Documentation/git-archive.txt
+++ b/Documentation/git-archive.txt
@@ -49,7 +49,9 @@ OPTIONS
 	Report progress to stderr.

 --prefix=<prefix>/::
-	Prepend <prefix>/ to each filename in the archive.
+	Prepend <prefix>/ to paths in the archive.  Can be repeated; its
+	leftmost value is used for all tracked files.  See below which
+	value gets used by `--add-file`.

 -o <file>::
 --output=<file>::
@@ -58,8 +60,9 @@ OPTIONS
 --add-file=<file>::
 	Add a non-tracked file to the archive.  Can be repeated to add
 	multiple files.  The path of the file in the archive is built
-	by concatenating the value for `--prefix` (if any) and the
-	basename of <file>.
+	by concatenating the value of the leftmost `--prefix` option to
+	the right of this `--add-file` (if any) and the basename of
+	<file>.

 --worktree-attributes::
 	Look for attributes in .gitattributes files in the working tree
@@ -194,6 +197,12 @@ EXAMPLES
 	commit on the current branch. Note that the output format is
 	inferred by the extension of the output file.

+`git archive -o latest.tar --prefix=build/ --add-file=configure --prefix= HEAD`::
+
+	Creates a tar archive that contains the contents of the latest
+	commit on the current branch with no prefix and the untracked
+	file 'configure' with the prefix 'build/'.
+
 `git config tar.tar.xz.command "xz -c"`::

 	Configure a "tar.xz" format for making LZMA-compressed tarfiles.
--
2.35.3


^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v6 1/7] archive: optionally add "virtual" files
  2022-05-27 17:02                       ` René Scharfe
@ 2022-05-27 19:01                         ` Junio C Hamano
  2022-05-28  6:57                           ` René Scharfe
  0 siblings, 1 reply; 140+ messages in thread
From: Junio C Hamano @ 2022-05-27 19:01 UTC (permalink / raw)
  To: René Scharfe
  Cc: Johannes Schindelin via GitGitGadget, git, Taylor Blau,
	Derrick Stolee, Elijah Newren, rsbecker,
	Ævar Arnfjörð Bjarmason, Johannes Schindelin

René Scharfe <l.s.r@web.de> writes:

>  --prefix=<prefix>/::
> -	Prepend <prefix>/ to each filename in the archive.
> +	Prepend <prefix>/ to paths in the archive.  Can be repeated; its
> +	leftmost value is used for all tracked files.  See below which
> +	value gets used by `--add-file`.

Doesn't "the last one wins" take the rightmost one?

> @@ -58,8 +60,9 @@ OPTIONS
>  --add-file=<file>::
>  	Add a non-tracked file to the archive.  Can be repeated to add
>  	multiple files.  The path of the file in the archive is built
> -	by concatenating the value for `--prefix` (if any) and the
> -	basename of <file>.
> +	by concatenating the value of the leftmost `--prefix` option to
> +	the right of this `--add-file` (if any) and the basename of
> +	<file>.

It is not what archive.c::add_file_cb() seems to be doing, though

It is passed the pointer to "base" that is on-stack of
parse_archive_args(), which is the same variable that is used to
remember the latest value that was given to "--prefix".  Then it
concatenates the argument it received after that base value, so

    by concatenating the value of the last "--prefix" seen on the
    command line (if any) before this `--add-file` and the basename
    of <file>.

probably.  I always get my left and right mixed up X-<.

> @@ -194,6 +197,12 @@ EXAMPLES
>  	commit on the current branch. Note that the output format is
>  	inferred by the extension of the output file.
>
> +`git archive -o latest.tar --prefix=build/ --add-file=configure --prefix= HEAD`::
> +
> +	Creates a tar archive that contains the contents of the latest
> +	commit on the current branch with no prefix and the untracked
> +	file 'configure' with the prefix 'build/'.
> +
>  `git config tar.tar.xz.command "xz -c"`::
>
>  	Configure a "tar.xz" format for making LZMA-compressed tarfiles.

Thanks.

This patch probably needs to come before the "scalar diagnose"
series, which we haven't heard much about recently (no, I am not
complaining---we all heard that Dscho is busy).



^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v6 1/7] archive: optionally add "virtual" files
  2022-05-27 19:01                         ` Junio C Hamano
@ 2022-05-28  6:57                           ` René Scharfe
  0 siblings, 0 replies; 140+ messages in thread
From: René Scharfe @ 2022-05-28  6:57 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Johannes Schindelin via GitGitGadget, git, Taylor Blau,
	Derrick Stolee, Elijah Newren, rsbecker,
	Ævar Arnfjörð Bjarmason, Johannes Schindelin

Am 27.05.22 um 21:01 schrieb Junio C Hamano:
> René Scharfe <l.s.r@web.de> writes:
>
>>  --prefix=<prefix>/::
>> -	Prepend <prefix>/ to each filename in the archive.
>> +	Prepend <prefix>/ to paths in the archive.  Can be repeated; its
>> +	leftmost value is used for all tracked files.  See below which
>> +	value gets used by `--add-file`.
>
> Doesn't "the last one wins" take the rightmost one?

Ha ha!  Classic mistake, I do that all the time, especially when in a
hurry. >_<

>
>> @@ -58,8 +60,9 @@ OPTIONS
>>  --add-file=<file>::
>>  	Add a non-tracked file to the archive.  Can be repeated to add
>>  	multiple files.  The path of the file in the archive is built
>> -	by concatenating the value for `--prefix` (if any) and the
>> -	basename of <file>.
>> +	by concatenating the value of the leftmost `--prefix` option to
>> +	the right of this `--add-file` (if any) and the basename of
>> +	<file>.
>
> It is not what archive.c::add_file_cb() seems to be doing, though
>
> It is passed the pointer to "base" that is on-stack of
> parse_archive_args(), which is the same variable that is used to
> remember the latest value that was given to "--prefix".  Then it
> concatenates the argument it received after that base value, so
>
>     by concatenating the value of the last "--prefix" seen on the
>     command line (if any) before this `--add-file` and the basename
>     of <file>.
>
> probably.  I always get my left and right mixed up X-<.

You too?  So yeah, avoiding the terms is appealing.

>
>> @@ -194,6 +197,12 @@ EXAMPLES
>>  	commit on the current branch. Note that the output format is
>>  	inferred by the extension of the output file.
>>
>> +`git archive -o latest.tar --prefix=build/ --add-file=configure --prefix= HEAD`::
>> +
>> +	Creates a tar archive that contains the contents of the latest
>> +	commit on the current branch with no prefix and the untracked
>> +	file 'configure' with the prefix 'build/'.
>> +
>>  `git config tar.tar.xz.command "xz -c"`::
>>
>>  	Configure a "tar.xz" format for making LZMA-compressed tarfiles.
>
> Thanks.
>
> This patch probably needs to come before the "scalar diagnose"
> series, which we haven't heard much about recently (no, I am not
> complaining---we all heard that Dscho is busy).
>
>

--- >8 ---
Subject: [PATCH v3] archive: improve documentation of --prefix

Document the interaction between --add-file and --prefix by giving an
example.

Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: René Scharfe <l.s.r@web.de>
---
 Documentation/git-archive.txt | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/Documentation/git-archive.txt b/Documentation/git-archive.txt
index bc4e76a783..94519aae23 100644
--- a/Documentation/git-archive.txt
+++ b/Documentation/git-archive.txt
@@ -49,7 +49,9 @@ OPTIONS
 	Report progress to stderr.

 --prefix=<prefix>/::
-	Prepend <prefix>/ to each filename in the archive.
+	Prepend <prefix>/ to paths in the archive.  Can be repeated; its
+	rightmost value is used for all tracked files.  See below which
+	value gets used by `--add-file`.

 -o <file>::
 --output=<file>::
@@ -57,9 +59,9 @@ OPTIONS

 --add-file=<file>::
 	Add a non-tracked file to the archive.  Can be repeated to add
-	multiple files.  The path of the file in the archive is built
-	by concatenating the value for `--prefix` (if any) and the
-	basename of <file>.
+	multiple files.  The path of the file in the archive is built by
+	concatenating the value of the last `--prefix` option (if any)
+	before this `--add-file` and the basename of <file>.

 --worktree-attributes::
 	Look for attributes in .gitattributes files in the working tree
@@ -194,6 +196,12 @@ EXAMPLES
 	commit on the current branch. Note that the output format is
 	inferred by the extension of the output file.

+`git archive -o latest.tar --prefix=build/ --add-file=configure --prefix= HEAD`::
+
+	Creates a tar archive that contains the contents of the latest
+	commit on the current branch with no prefix and the untracked
+	file 'configure' with the prefix 'build/'.
+
 `git config tar.tar.xz.command "xz -c"`::

 	Configure a "tar.xz" format for making LZMA-compressed tarfiles.
--
2.35.3

^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v6+ 0/7] js/scalar-diagnose rebased
  2022-05-21 15:08         ` [PATCH v6 " Johannes Schindelin via GitGitGadget
                             ` (6 preceding siblings ...)
  2022-05-21 15:08           ` [PATCH v6 7/7] scalar: teach `diagnose` to gather loose objects information Matthew John Cheetham via GitGitGadget
@ 2022-05-28 23:11           ` Junio C Hamano
  2022-05-28 23:11             ` [PATCH v6+ 1/7] archive: optionally add "virtual" files Junio C Hamano
                               ` (7 more replies)
  7 siblings, 8 replies; 140+ messages in thread
From: Junio C Hamano @ 2022-05-28 23:11 UTC (permalink / raw)
  To: git

Recent document clarification on the "--prefix" option of the "git
archive" command from René serves as a good basis for the
documentation of the "--add-virtual-file" option added by this
series, so here is my attempt to rebase js/scalar-diagnose topic
on it to hopefully help reduce Dscho's workload ;-)

Aside from obvious adjustments needed while rebasing onto the
updated documentation, there are only a couple of changes:

 - The way the <path> in --add-virtual-file=<path>:<contents> is
   used has been corrected.  Earlier, leading directory components
   of the <path> were all discarded and used nowhere, which made no
   sense.  The <path> is used as a whole, but for consistency with
   --add-file=<path>, <prefix> is still applied.

 - Overly loose quoting of variables in test scripts has been
   corrected.

Both changes have been in 'seen' from before the rebase.

1:  510f6b226b ! 1:  61522a0866 archive: optionally add "virtual" files
    @@ Commit message
     
      ## Documentation/git-archive.txt ##
     @@ Documentation/git-archive.txt: OPTIONS
    - 	by concatenating the value for `--prefix` (if any) and the
    - 	basename of <file>.
    + --prefix=<prefix>/::
    + 	Prepend <prefix>/ to paths in the archive.  Can be repeated; its
    + 	rightmost value is used for all tracked files.  See below which
    +-	value gets used by `--add-file`.
    ++	value gets used by `--add-file` and `--add-virtual-file`.
    + 
    + -o <file>::
    + --output=<file>::
    +@@ Documentation/git-archive.txt: OPTIONS
    + 	concatenating the value of the last `--prefix` option (if any)
    + 	before this `--add-file` and the basename of <file>.
      
     +--add-virtual-file=<path>:<content>::
     +	Add the specified contents to the archive.  Can be repeated to add
     +	multiple files.  The path of the file in the archive is built
    -+	by concatenating the value for `--prefix` (if any) and the
    -+	basename of <file>.
    ++	by concatenating the value of the last `--prefix` option (if any)
    ++	before this `--add-virtual-file` and `<path>`.
     ++
     +The `<path>` cannot contain any colon, the file mode is limited to
     +a regular file, and the option may be subject to platform-dependent
    @@ archive.c: static int queue_or_write_archive_entry(const struct object_id *oid,
      
      int write_archive_entries(struct archiver_args *args,
     @@ archive.c: int write_archive_entries(struct archiver_args *args,
    - 		strbuf_addstr(&path_in_archive, basename(path));
      
    - 		strbuf_reset(&content);
    + 		put_be64(fake_oid.hash, i + 1);
    + 
    +-		strbuf_reset(&path_in_archive);
    +-		if (info->base)
    +-			strbuf_addstr(&path_in_archive, info->base);
    +-		strbuf_addstr(&path_in_archive, basename(path));
    +-
    +-		strbuf_reset(&content);
     -		if (strbuf_read_file(&content, path, info->stat.st_size) < 0)
    -+		if (info->content)
    -+			err = write_entry(args, &fake_oid, path_in_archive.buf,
    -+					  path_in_archive.len,
    -+					  canon_mode(info->stat.st_mode),
    +-			err = error_errno(_("cannot read '%s'"), path);
    +-		else
    +-			err = write_entry(args, &fake_oid, path_in_archive.buf,
    +-					  path_in_archive.len,
    ++		if (!info->content) {
    ++			strbuf_reset(&path_in_archive);
    ++			if (info->base)
    ++				strbuf_addstr(&path_in_archive, info->base);
    ++			strbuf_addstr(&path_in_archive, basename(path));
    ++
    ++			strbuf_reset(&content);
    ++			if (strbuf_read_file(&content, path, info->stat.st_size) < 0)
    ++				err = error_errno(_("could not read '%s'"), path);
    ++			else
    ++				err = write_entry(args, &fake_oid, path_in_archive.buf,
    ++						  path_in_archive.len,
    ++						  canon_mode(info->stat.st_mode),
    ++						  content.buf, content.len);
    ++		} else {
    ++			err = write_entry(args, &fake_oid,
    ++					  path, strlen(path),
    + 					  canon_mode(info->stat.st_mode),
    +-					  content.buf, content.len);
     +					  info->content, info->stat.st_size);
    -+		else if (strbuf_read_file(&content, path,
    -+					  info->stat.st_size) < 0)
    - 			err = error_errno(_("cannot read '%s'"), path);
    - 		else
    - 			err = write_entry(args, &fake_oid, path_in_archive.buf,
    ++		}
    ++
    + 		if (err)
    + 			break;
    + 	}
     @@ archive.c: static void extra_file_info_clear(void *util, const char *str)
      {
      	struct extra_file_info *info = util;
2:  208f4aad5f ! 2:  5e9d19a70f archive --add-virtual-file: allow paths containing colons
    @@ Commit message
     
      ## Documentation/git-archive.txt ##
     @@ Documentation/git-archive.txt: OPTIONS
    - 	by concatenating the value for `--prefix` (if any) and the
    - 	basename of <file>.
    + 	by concatenating the value of the last `--prefix` option (if any)
    + 	before this `--add-virtual-file` and `<path>`.
      +
     -The `<path>` cannot contain any colon, the file mode is limited to
     -a regular file, and the option may be subject to platform-dependent
     -command-line limits. For non-trivial cases, write an untracked file
     -and use `--add-file` instead.
     +The `<path>` argument can start and end with a literal double-quote
    -+character; The contained file name is interpreted as a C-style string,
    ++character; the contained file name is interpreted as a C-style string,
     +i.e. the backslash is interpreted as escape character. The path must
     +be quoted if it contains a colon, to avoid the colon from being
     +misinterpreted as the separator between the path and the contents, or
    @@ t/t5003-archive-zip.sh: check_zip with_untracked
      test_expect_success UNZIP 'git archive --format=zip --add-virtual-file' '
     +	if test_have_prereq FUNNYNAMES
     +	then
    -+		PATHNAME=quoted:colon
    ++		PATHNAME="pathname with : colon"
     +	else
    -+		PATHNAME=quoted
    ++		PATHNAME="pathname without colon"
     +	fi &&
      	git archive --format=zip >with_file_with_content.zip \
    -+		--add-virtual-file=\"$PATHNAME\": \
    ++		--add-virtual-file=\""$PATHNAME"\": \
      		--add-virtual-file=hello:world $EMPTY_TREE &&
      	test_when_finished "rm -rf tmp-unpack" &&
      	mkdir tmp-unpack && (
      		cd tmp-unpack &&
      		"$GIT_UNZIP" ../with_file_with_content.zip &&
      		test_path_is_file hello &&
    -+		test_path_is_file $PATHNAME &&
    ++		test_path_is_file "$PATHNAME" &&
      		test world = $(cat hello)
      	)
      '
3:  bc1164404f = 3:  4f5b3aa775 scalar: validate the optional enlistment argument
4:  69daeb7d9d ! 4:  f4f070df8e Implement `scalar diagnose`
    @@ Metadata
     Author: Johannes Schindelin <Johannes.Schindelin@gmx.de>
     
      ## Commit message ##
    -    Implement `scalar diagnose`
    +    scalar: implement `scalar diagnose`
     
         Over the course of Scalar's development, it became obvious that there is
         a need for a command that can gather all kinds of useful information
5:  5c1ef19524 = 5:  0417d8abe4 scalar diagnose: include disk space information
6:  0325b9c3ab = 6:  5531b65ddb scalar: teach `diagnose` to gather packfile info
7:  8fee365b07 = 7:  ce9eba5e32 scalar: teach `diagnose` to gather loose objects information



^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v6+ 1/7] archive: optionally add "virtual" files
  2022-05-28 23:11           ` [PATCH v6+ 0/7] js/scalar-diagnose rebased Junio C Hamano
@ 2022-05-28 23:11             ` Junio C Hamano
  2022-05-28 23:11             ` [PATCH v6+ 2/7] archive --add-virtual-file: allow paths containing colons Junio C Hamano
                               ` (6 subsequent siblings)
  7 siblings, 0 replies; 140+ messages in thread
From: Junio C Hamano @ 2022-05-28 23:11 UTC (permalink / raw)
  To: git; +Cc: Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

With the `--add-virtual-file=<path>:<content>` option, `git archive` now
supports use cases where relatively trivial files need to be added that
do not exist on disk.

This will allow us to generate `.zip` files with generated content,
without having to add said content to the object database and without
having to write it out to disk.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
[jc: tweaked <path> handling]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---

 * The changes to the way how leading components of the <path> are
   not discarded and used made the "extra entries" handling into two
   separate code to independently come up with the path stored in
   the archive, as well as the contents stored in the archive.

   The explanation of how --prefix and --add-file interacts also
   applies to the new option.

 Documentation/git-archive.txt | 13 +++++-
 archive.c                     | 77 ++++++++++++++++++++++++++---------
 t/t5003-archive-zip.sh        | 12 ++++++
 3 files changed, 82 insertions(+), 20 deletions(-)

diff --git a/Documentation/git-archive.txt b/Documentation/git-archive.txt
index 94519aae23..b41cc5bc2e 100644
--- a/Documentation/git-archive.txt
+++ b/Documentation/git-archive.txt
@@ -51,7 +51,7 @@ OPTIONS
 --prefix=<prefix>/::
 	Prepend <prefix>/ to paths in the archive.  Can be repeated; its
 	rightmost value is used for all tracked files.  See below which
-	value gets used by `--add-file`.
+	value gets used by `--add-file` and `--add-virtual-file`.
 
 -o <file>::
 --output=<file>::
@@ -63,6 +63,17 @@ OPTIONS
 	concatenating the value of the last `--prefix` option (if any)
 	before this `--add-file` and the basename of <file>.
 
+--add-virtual-file=<path>:<content>::
+	Add the specified contents to the archive.  Can be repeated to add
+	multiple files.  The path of the file in the archive is built
+	by concatenating the value of the last `--prefix` option (if any)
+	before this `--add-virtual-file` and `<path>`.
++
+The `<path>` cannot contain any colon, the file mode is limited to
+a regular file, and the option may be subject to platform-dependent
+command-line limits. For non-trivial cases, write an untracked file
+and use `--add-file` instead.
+
 --worktree-attributes::
 	Look for attributes in .gitattributes files in the working tree
 	as well (see <<ATTRIBUTES>>).
diff --git a/archive.c b/archive.c
index e2121ebefb..d26f4ef945 100644
--- a/archive.c
+++ b/archive.c
@@ -263,6 +263,7 @@ static int queue_or_write_archive_entry(const struct object_id *oid,
 struct extra_file_info {
 	char *base;
 	struct stat stat;
+	void *content;
 };
 
 int write_archive_entries(struct archiver_args *args,
@@ -331,19 +332,27 @@ int write_archive_entries(struct archiver_args *args,
 
 		put_be64(fake_oid.hash, i + 1);
 
-		strbuf_reset(&path_in_archive);
-		if (info->base)
-			strbuf_addstr(&path_in_archive, info->base);
-		strbuf_addstr(&path_in_archive, basename(path));
-
-		strbuf_reset(&content);
-		if (strbuf_read_file(&content, path, info->stat.st_size) < 0)
-			err = error_errno(_("cannot read '%s'"), path);
-		else
-			err = write_entry(args, &fake_oid, path_in_archive.buf,
-					  path_in_archive.len,
+		if (!info->content) {
+			strbuf_reset(&path_in_archive);
+			if (info->base)
+				strbuf_addstr(&path_in_archive, info->base);
+			strbuf_addstr(&path_in_archive, basename(path));
+
+			strbuf_reset(&content);
+			if (strbuf_read_file(&content, path, info->stat.st_size) < 0)
+				err = error_errno(_("could not read '%s'"), path);
+			else
+				err = write_entry(args, &fake_oid, path_in_archive.buf,
+						  path_in_archive.len,
+						  canon_mode(info->stat.st_mode),
+						  content.buf, content.len);
+		} else {
+			err = write_entry(args, &fake_oid,
+					  path, strlen(path),
 					  canon_mode(info->stat.st_mode),
-					  content.buf, content.len);
+					  info->content, info->stat.st_size);
+		}
+
 		if (err)
 			break;
 	}
@@ -493,6 +502,7 @@ static void extra_file_info_clear(void *util, const char *str)
 {
 	struct extra_file_info *info = util;
 	free(info->base);
+	free(info->content);
 	free(info);
 }
 
@@ -514,14 +524,40 @@ static int add_file_cb(const struct option *opt, const char *arg, int unset)
 	if (!arg)
 		return -1;
 
-	path = prefix_filename(args->prefix, arg);
-	item = string_list_append_nodup(&args->extra_files, path);
-	item->util = info = xmalloc(sizeof(*info));
+	info = xmalloc(sizeof(*info));
 	info->base = xstrdup_or_null(base);
-	if (stat(path, &info->stat))
-		die(_("File not found: %s"), path);
-	if (!S_ISREG(info->stat.st_mode))
-		die(_("Not a regular file: %s"), path);
+
+	if (!strcmp(opt->long_name, "add-file")) {
+		path = prefix_filename(args->prefix, arg);
+		if (stat(path, &info->stat))
+			die(_("File not found: %s"), path);
+		if (!S_ISREG(info->stat.st_mode))
+			die(_("Not a regular file: %s"), path);
+		info->content = NULL; /* read the file later */
+	} else if (!strcmp(opt->long_name, "add-virtual-file")) {
+		const char *colon = strchr(arg, ':');
+		char *p;
+
+		if (!colon)
+			die(_("missing colon: '%s'"), arg);
+
+		p = xstrndup(arg, colon - arg);
+		if (!args->prefix)
+			path = p;
+		else {
+			path = prefix_filename(args->prefix, p);
+			free(p);
+		}
+		memset(&info->stat, 0, sizeof(info->stat));
+		info->stat.st_mode = S_IFREG | 0644;
+		info->content = xstrdup(colon + 1);
+		info->stat.st_size = strlen(info->content);
+	} else {
+		BUG("add_file_cb() called for %s", opt->long_name);
+	}
+	item = string_list_append_nodup(&args->extra_files, path);
+	item->util = info;
+
 	return 0;
 }
 
@@ -554,6 +590,9 @@ static int parse_archive_args(int argc, const char **argv,
 		{ OPTION_CALLBACK, 0, "add-file", args, N_("file"),
 		  N_("add untracked file to archive"), 0, add_file_cb,
 		  (intptr_t)&base },
+		{ OPTION_CALLBACK, 0, "add-virtual-file", args,
+		  N_("path:content"), N_("add untracked file to archive"), 0,
+		  add_file_cb, (intptr_t)&base },
 		OPT_STRING('o', "output", &output, N_("file"),
 			N_("write the archive to this file")),
 		OPT_BOOL(0, "worktree-attributes", &worktree_attributes,
diff --git a/t/t5003-archive-zip.sh b/t/t5003-archive-zip.sh
index d726964307..d6027189e2 100755
--- a/t/t5003-archive-zip.sh
+++ b/t/t5003-archive-zip.sh
@@ -206,6 +206,18 @@ test_expect_success 'git archive --format=zip --add-file' '
 check_zip with_untracked
 check_added with_untracked untracked untracked
 
+test_expect_success UNZIP 'git archive --format=zip --add-virtual-file' '
+	git archive --format=zip >with_file_with_content.zip \
+		--add-virtual-file=hello:world $EMPTY_TREE &&
+	test_when_finished "rm -rf tmp-unpack" &&
+	mkdir tmp-unpack && (
+		cd tmp-unpack &&
+		"$GIT_UNZIP" ../with_file_with_content.zip &&
+		test_path_is_file hello &&
+		test world = $(cat hello)
+	)
+'
+
 test_expect_success 'git archive --format=zip --add-file twice' '
 	echo untracked >untracked &&
 	git archive --format=zip --prefix=one/ --add-file=untracked \
-- 
2.36.1-385-g60203f3fdb


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v6+ 2/7] archive --add-virtual-file: allow paths containing colons
  2022-05-28 23:11           ` [PATCH v6+ 0/7] js/scalar-diagnose rebased Junio C Hamano
  2022-05-28 23:11             ` [PATCH v6+ 1/7] archive: optionally add "virtual" files Junio C Hamano
@ 2022-05-28 23:11             ` Junio C Hamano
  2022-06-15 18:16               ` Adam Dinwoodie
  2022-05-28 23:11             ` [PATCH v6+ 3/7] scalar: validate the optional enlistment argument Junio C Hamano
                               ` (5 subsequent siblings)
  7 siblings, 1 reply; 140+ messages in thread
From: Junio C Hamano @ 2022-05-28 23:11 UTC (permalink / raw)
  To: git; +Cc: Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

By allowing the path to be enclosed in double-quotes, we can avoid
the limitation that paths cannot contain colons.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 * Tightened shell variable quoting

 Documentation/git-archive.txt | 14 ++++++++++----
 archive.c                     | 30 ++++++++++++++++++++----------
 t/t5003-archive-zip.sh        |  8 ++++++++
 3 files changed, 38 insertions(+), 14 deletions(-)

diff --git a/Documentation/git-archive.txt b/Documentation/git-archive.txt
index b41cc5bc2e..56989a2f34 100644
--- a/Documentation/git-archive.txt
+++ b/Documentation/git-archive.txt
@@ -69,10 +69,16 @@ OPTIONS
 	by concatenating the value of the last `--prefix` option (if any)
 	before this `--add-virtual-file` and `<path>`.
 +
-The `<path>` cannot contain any colon, the file mode is limited to
-a regular file, and the option may be subject to platform-dependent
-command-line limits. For non-trivial cases, write an untracked file
-and use `--add-file` instead.
+The `<path>` argument can start and end with a literal double-quote
+character; the contained file name is interpreted as a C-style string,
+i.e. the backslash is interpreted as escape character. The path must
+be quoted if it contains a colon, to avoid the colon from being
+misinterpreted as the separator between the path and the contents, or
+if the path begins or ends with a double-quote character.
++
+The file mode is limited to a regular file, and the option may be
+subject to platform-dependent command-line limits. For non-trivial
+cases, write an untracked file and use `--add-file` instead.
 
 --worktree-attributes::
 	Look for attributes in .gitattributes files in the working tree
diff --git a/archive.c b/archive.c
index d26f4ef945..48aba4ac46 100644
--- a/archive.c
+++ b/archive.c
@@ -9,6 +9,7 @@
 #include "parse-options.h"
 #include "unpack-trees.h"
 #include "dir.h"
+#include "quote.h"
 
 static char const * const archive_usage[] = {
 	N_("git archive [<options>] <tree-ish> [<path>...]"),
@@ -535,22 +536,31 @@ static int add_file_cb(const struct option *opt, const char *arg, int unset)
 			die(_("Not a regular file: %s"), path);
 		info->content = NULL; /* read the file later */
 	} else if (!strcmp(opt->long_name, "add-virtual-file")) {
-		const char *colon = strchr(arg, ':');
-		char *p;
+		struct strbuf buf = STRBUF_INIT;
+		const char *p = arg;
+
+		if (*p != '"')
+			p = strchr(p, ':');
+		else if (unquote_c_style(&buf, p, &p) < 0)
+			die(_("unclosed quote: '%s'"), arg);
 
-		if (!colon)
+		if (!p || *p != ':')
 			die(_("missing colon: '%s'"), arg);
 
-		p = xstrndup(arg, colon - arg);
-		if (!args->prefix)
-			path = p;
-		else {
-			path = prefix_filename(args->prefix, p);
-			free(p);
+		if (p == arg)
+			die(_("empty file name: '%s'"), arg);
+
+		path = buf.len ?
+			strbuf_detach(&buf, NULL) : xstrndup(arg, p - arg);
+
+		if (args->prefix) {
+			char *save = path;
+			path = prefix_filename(args->prefix, path);
+			free(save);
 		}
 		memset(&info->stat, 0, sizeof(info->stat));
 		info->stat.st_mode = S_IFREG | 0644;
-		info->content = xstrdup(colon + 1);
+		info->content = xstrdup(p + 1);
 		info->stat.st_size = strlen(info->content);
 	} else {
 		BUG("add_file_cb() called for %s", opt->long_name);
diff --git a/t/t5003-archive-zip.sh b/t/t5003-archive-zip.sh
index d6027189e2..3992d08158 100755
--- a/t/t5003-archive-zip.sh
+++ b/t/t5003-archive-zip.sh
@@ -207,13 +207,21 @@ check_zip with_untracked
 check_added with_untracked untracked untracked
 
 test_expect_success UNZIP 'git archive --format=zip --add-virtual-file' '
+	if test_have_prereq FUNNYNAMES
+	then
+		PATHNAME="pathname with : colon"
+	else
+		PATHNAME="pathname without colon"
+	fi &&
 	git archive --format=zip >with_file_with_content.zip \
+		--add-virtual-file=\""$PATHNAME"\": \
 		--add-virtual-file=hello:world $EMPTY_TREE &&
 	test_when_finished "rm -rf tmp-unpack" &&
 	mkdir tmp-unpack && (
 		cd tmp-unpack &&
 		"$GIT_UNZIP" ../with_file_with_content.zip &&
 		test_path_is_file hello &&
+		test_path_is_file "$PATHNAME" &&
 		test world = $(cat hello)
 	)
 '
-- 
2.36.1-385-g60203f3fdb


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v6+ 3/7] scalar: validate the optional enlistment argument
  2022-05-28 23:11           ` [PATCH v6+ 0/7] js/scalar-diagnose rebased Junio C Hamano
  2022-05-28 23:11             ` [PATCH v6+ 1/7] archive: optionally add "virtual" files Junio C Hamano
  2022-05-28 23:11             ` [PATCH v6+ 2/7] archive --add-virtual-file: allow paths containing colons Junio C Hamano
@ 2022-05-28 23:11             ` Junio C Hamano
  2022-05-28 23:11             ` [PATCH v6+ 4/7] scalar: implement `scalar diagnose` Junio C Hamano
                               ` (4 subsequent siblings)
  7 siblings, 0 replies; 140+ messages in thread
From: Junio C Hamano @ 2022-05-28 23:11 UTC (permalink / raw)
  To: git; +Cc: Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

The `scalar` command needs a Scalar enlistment for many subcommands, and
looks in the current directory for such an enlistment (traversing the
parent directories until it finds one).

These is subcommands can also be called with an optional argument
specifying the enlistment. Here, too, we traverse parent directories as
needed, until we find an enlistment.

However, if the specified directory does not even exist, or is not a
directory, we should stop right there, with an error message.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 contrib/scalar/scalar.c          | 6 ++++--
 contrib/scalar/t/t9099-scalar.sh | 5 +++++
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
index 58ca0e56f1..6d58c7a698 100644
--- a/contrib/scalar/scalar.c
+++ b/contrib/scalar/scalar.c
@@ -43,9 +43,11 @@ static void setup_enlistment_directory(int argc, const char **argv,
 		usage_with_options(usagestr, options);
 
 	/* find the worktree, determine its corresponding root */
-	if (argc == 1)
+	if (argc == 1) {
 		strbuf_add_absolute_path(&path, argv[0]);
-	else if (strbuf_getcwd(&path) < 0)
+		if (!is_directory(path.buf))
+			die(_("'%s' does not exist"), path.buf);
+	} else if (strbuf_getcwd(&path) < 0)
 		die(_("need a working directory"));
 
 	strbuf_trim_trailing_dir_sep(&path);
diff --git a/contrib/scalar/t/t9099-scalar.sh b/contrib/scalar/t/t9099-scalar.sh
index 89781568f4..bb42354a8b 100755
--- a/contrib/scalar/t/t9099-scalar.sh
+++ b/contrib/scalar/t/t9099-scalar.sh
@@ -93,4 +93,9 @@ test_expect_success 'scalar supports -c/-C' '
 	test true = "$(git -C sub config core.preloadIndex)"
 '
 
+test_expect_success '`scalar [...] <dir>` errors out when dir is missing' '
+	! scalar run config cloned 2>err &&
+	grep "cloned. does not exist" err
+'
+
 test_done
-- 
2.36.1-385-g60203f3fdb


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v6+ 4/7] scalar: implement `scalar diagnose`
  2022-05-28 23:11           ` [PATCH v6+ 0/7] js/scalar-diagnose rebased Junio C Hamano
                               ` (2 preceding siblings ...)
  2022-05-28 23:11             ` [PATCH v6+ 3/7] scalar: validate the optional enlistment argument Junio C Hamano
@ 2022-05-28 23:11             ` Junio C Hamano
  2022-06-10  2:08               ` Ævar Arnfjörð Bjarmason
  2022-05-28 23:11             ` [PATCH v6+ 5/7] scalar diagnose: include disk space information Junio C Hamano
                               ` (3 subsequent siblings)
  7 siblings, 1 reply; 140+ messages in thread
From: Junio C Hamano @ 2022-05-28 23:11 UTC (permalink / raw)
  To: git; +Cc: Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

Over the course of Scalar's development, it became obvious that there is
a need for a command that can gather all kinds of useful information
that can help identify the most typical problems with large
worktrees/repositories.

The `diagnose` command is the culmination of this hard-won knowledge: it
gathers the installed hooks, the config, a couple statistics describing
the data shape, among other pieces of information, and then wraps
everything up in a tidy, neat `.zip` archive.

Note: originally, Scalar was implemented in C# using the .NET API, where
we had the luxury of a comprehensive standard library that includes
basic functionality such as writing a `.zip` file. In the C version, we
lack such a commodity. Rather than introducing a dependency on, say,
libzip, we slightly abuse Git's `archive` machinery: we write out a
`.zip` of the empty try, augmented by a couple files that are added via
the `--add-file*` options. We are careful trying not to modify the
current repository in any way lest the very circumstances that required
`scalar diagnose` to be run are changed by the `diagnose` run itself.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 contrib/scalar/scalar.c          | 144 +++++++++++++++++++++++++++++++
 contrib/scalar/scalar.txt        |  12 +++
 contrib/scalar/t/t9099-scalar.sh |  14 +++
 3 files changed, 170 insertions(+)

diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
index 6d58c7a698..a1e05a2146 100644
--- a/contrib/scalar/scalar.c
+++ b/contrib/scalar/scalar.c
@@ -11,6 +11,7 @@
 #include "dir.h"
 #include "packfile.h"
 #include "help.h"
+#include "archive.h"
 
 /*
  * Remove the deepest subdirectory in the provided path string. Path must not
@@ -260,6 +261,47 @@ static int unregister_dir(void)
 	return res;
 }
 
+static int add_directory_to_archiver(struct strvec *archiver_args,
+					  const char *path, int recurse)
+{
+	int at_root = !*path;
+	DIR *dir = opendir(at_root ? "." : path);
+	struct dirent *e;
+	struct strbuf buf = STRBUF_INIT;
+	size_t len;
+	int res = 0;
+
+	if (!dir)
+		return error_errno(_("could not open directory '%s'"), path);
+
+	if (!at_root)
+		strbuf_addf(&buf, "%s/", path);
+	len = buf.len;
+	strvec_pushf(archiver_args, "--prefix=%s", buf.buf);
+
+	while (!res && (e = readdir(dir))) {
+		if (!strcmp(".", e->d_name) || !strcmp("..", e->d_name))
+			continue;
+
+		strbuf_setlen(&buf, len);
+		strbuf_addstr(&buf, e->d_name);
+
+		if (e->d_type == DT_REG)
+			strvec_pushf(archiver_args, "--add-file=%s", buf.buf);
+		else if (e->d_type != DT_DIR)
+			warning(_("skipping '%s', which is neither file nor "
+				  "directory"), buf.buf);
+		else if (recurse &&
+			 add_directory_to_archiver(archiver_args,
+						   buf.buf, recurse) < 0)
+			res = -1;
+	}
+
+	closedir(dir);
+	strbuf_release(&buf);
+	return res;
+}
+
 /* printf-style interface, expects `<key>=<value>` argument */
 static int set_config(const char *fmt, ...)
 {
@@ -500,6 +542,107 @@ static int cmd_clone(int argc, const char **argv)
 	return res;
 }
 
+static int cmd_diagnose(int argc, const char **argv)
+{
+	struct option options[] = {
+		OPT_END(),
+	};
+	const char * const usage[] = {
+		N_("scalar diagnose [<enlistment>]"),
+		NULL
+	};
+	struct strbuf zip_path = STRBUF_INIT;
+	struct strvec archiver_args = STRVEC_INIT;
+	char **argv_copy = NULL;
+	int stdout_fd = -1, archiver_fd = -1;
+	time_t now = time(NULL);
+	struct tm tm;
+	struct strbuf path = STRBUF_INIT, buf = STRBUF_INIT;
+	int res = 0;
+
+	argc = parse_options(argc, argv, NULL, options,
+			     usage, 0);
+
+	setup_enlistment_directory(argc, argv, usage, options, &zip_path);
+
+	strbuf_addstr(&zip_path, "/.scalarDiagnostics/scalar_");
+	strbuf_addftime(&zip_path,
+			"%Y%m%d_%H%M%S", localtime_r(&now, &tm), 0, 0);
+	strbuf_addstr(&zip_path, ".zip");
+	switch (safe_create_leading_directories(zip_path.buf)) {
+	case SCLD_EXISTS:
+	case SCLD_OK:
+		break;
+	default:
+		error_errno(_("could not create directory for '%s'"),
+			    zip_path.buf);
+		goto diagnose_cleanup;
+	}
+	stdout_fd = dup(1);
+	if (stdout_fd < 0) {
+		res = error_errno(_("could not duplicate stdout"));
+		goto diagnose_cleanup;
+	}
+
+	archiver_fd = xopen(zip_path.buf, O_CREAT | O_WRONLY | O_TRUNC, 0666);
+	if (archiver_fd < 0 || dup2(archiver_fd, 1) < 0) {
+		res = error_errno(_("could not redirect output"));
+		goto diagnose_cleanup;
+	}
+
+	init_zip_archiver();
+	strvec_pushl(&archiver_args, "scalar-diagnose", "--format=zip", NULL);
+
+	strbuf_reset(&buf);
+	strbuf_addstr(&buf, "Collecting diagnostic info\n\n");
+	get_version_info(&buf, 1);
+
+	strbuf_addf(&buf, "Enlistment root: %s\n", the_repository->worktree);
+	write_or_die(stdout_fd, buf.buf, buf.len);
+	strvec_pushf(&archiver_args,
+		     "--add-virtual-file=diagnostics.log:%.*s",
+		     (int)buf.len, buf.buf);
+
+	if ((res = add_directory_to_archiver(&archiver_args, ".git", 0)) ||
+	    (res = add_directory_to_archiver(&archiver_args, ".git/hooks", 0)) ||
+	    (res = add_directory_to_archiver(&archiver_args, ".git/info", 0)) ||
+	    (res = add_directory_to_archiver(&archiver_args, ".git/logs", 1)) ||
+	    (res = add_directory_to_archiver(&archiver_args, ".git/objects/info", 0)))
+		goto diagnose_cleanup;
+
+	strvec_pushl(&archiver_args, "--prefix=",
+		     oid_to_hex(the_hash_algo->empty_tree), "--", NULL);
+
+	/* `write_archive()` modifies the `argv` passed to it. Let it. */
+	argv_copy = xmemdupz(archiver_args.v,
+			     sizeof(char *) * archiver_args.nr);
+	res = write_archive(archiver_args.nr, (const char **)argv_copy, NULL,
+			    the_repository, NULL, 0);
+	if (res) {
+		error(_("failed to write archive"));
+		goto diagnose_cleanup;
+	}
+
+	if (!res)
+		fprintf(stderr, "\n"
+		       "Diagnostics complete.\n"
+		       "All of the gathered info is captured in '%s'\n",
+		       zip_path.buf);
+
+diagnose_cleanup:
+	if (archiver_fd >= 0) {
+		close(1);
+		dup2(stdout_fd, 1);
+	}
+	free(argv_copy);
+	strvec_clear(&archiver_args);
+	strbuf_release(&zip_path);
+	strbuf_release(&path);
+	strbuf_release(&buf);
+
+	return res;
+}
+
 static int cmd_list(int argc, const char **argv)
 {
 	if (argc != 1)
@@ -801,6 +944,7 @@ static struct {
 	{ "reconfigure", cmd_reconfigure },
 	{ "delete", cmd_delete },
 	{ "version", cmd_version },
+	{ "diagnose", cmd_diagnose },
 	{ NULL, NULL},
 };
 
diff --git a/contrib/scalar/scalar.txt b/contrib/scalar/scalar.txt
index cf4e5b889c..c0425e0653 100644
--- a/contrib/scalar/scalar.txt
+++ b/contrib/scalar/scalar.txt
@@ -14,6 +14,7 @@ scalar register [<enlistment>]
 scalar unregister [<enlistment>]
 scalar run ( all | config | commit-graph | fetch | loose-objects | pack-files ) [<enlistment>]
 scalar reconfigure [ --all | <enlistment> ]
+scalar diagnose [<enlistment>]
 scalar delete <enlistment>
 
 DESCRIPTION
@@ -139,6 +140,17 @@ reconfigure the enlistment.
 With the `--all` option, all enlistments currently registered with Scalar
 will be reconfigured. Use this option after each Scalar upgrade.
 
+Diagnose
+~~~~~~~~
+
+diagnose [<enlistment>]::
+    When reporting issues with Scalar, it is often helpful to provide the
+    information gathered by this command, including logs and certain
+    statistics describing the data shape of the current enlistment.
++
+The output of this command is a `.zip` file that is written into
+a directory adjacent to the worktree in the `src` directory.
+
 Delete
 ~~~~~~
 
diff --git a/contrib/scalar/t/t9099-scalar.sh b/contrib/scalar/t/t9099-scalar.sh
index bb42354a8b..fbb1df2049 100755
--- a/contrib/scalar/t/t9099-scalar.sh
+++ b/contrib/scalar/t/t9099-scalar.sh
@@ -98,4 +98,18 @@ test_expect_success '`scalar [...] <dir>` errors out when dir is missing' '
 	grep "cloned. does not exist" err
 '
 
+SQ="'"
+test_expect_success UNZIP 'scalar diagnose' '
+	scalar clone "file://$(pwd)" cloned --single-branch &&
+	scalar diagnose cloned >out 2>err &&
+	sed -n "s/.*$SQ\\(.*\\.zip\\)$SQ.*/\\1/p" <err >zip_path &&
+	zip_path=$(cat zip_path) &&
+	test -n "$zip_path" &&
+	unzip -v "$zip_path" &&
+	folder=${zip_path%.zip} &&
+	test_path_is_missing "$folder" &&
+	unzip -p "$zip_path" diagnostics.log >out &&
+	test_file_not_empty out
+'
+
 test_done
-- 
2.36.1-385-g60203f3fdb


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v6+ 5/7] scalar diagnose: include disk space information
  2022-05-28 23:11           ` [PATCH v6+ 0/7] js/scalar-diagnose rebased Junio C Hamano
                               ` (3 preceding siblings ...)
  2022-05-28 23:11             ` [PATCH v6+ 4/7] scalar: implement `scalar diagnose` Junio C Hamano
@ 2022-05-28 23:11             ` Junio C Hamano
  2022-05-28 23:11             ` [PATCH v6+ 6/7] scalar: teach `diagnose` to gather packfile info Junio C Hamano
                               ` (2 subsequent siblings)
  7 siblings, 0 replies; 140+ messages in thread
From: Junio C Hamano @ 2022-05-28 23:11 UTC (permalink / raw)
  To: git; +Cc: Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

When analyzing problems with large worktrees/repositories, it is useful
to know how close to a "full disk" situation Scalar/Git operates. Let's
include this information.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 contrib/scalar/scalar.c          | 53 ++++++++++++++++++++++++++++++++
 contrib/scalar/t/t9099-scalar.sh |  1 +
 2 files changed, 54 insertions(+)

diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
index a1e05a2146..f06a2f3576 100644
--- a/contrib/scalar/scalar.c
+++ b/contrib/scalar/scalar.c
@@ -302,6 +302,58 @@ static int add_directory_to_archiver(struct strvec *archiver_args,
 	return res;
 }
 
+#ifndef WIN32
+#include <sys/statvfs.h>
+#endif
+
+static int get_disk_info(struct strbuf *out)
+{
+#ifdef WIN32
+	struct strbuf buf = STRBUF_INIT;
+	char volume_name[MAX_PATH], fs_name[MAX_PATH];
+	DWORD serial_number, component_length, flags;
+	ULARGE_INTEGER avail2caller, total, avail;
+
+	strbuf_realpath(&buf, ".", 1);
+	if (!GetDiskFreeSpaceExA(buf.buf, &avail2caller, &total, &avail)) {
+		error(_("could not determine free disk size for '%s'"),
+		      buf.buf);
+		strbuf_release(&buf);
+		return -1;
+	}
+
+	strbuf_setlen(&buf, offset_1st_component(buf.buf));
+	if (!GetVolumeInformationA(buf.buf, volume_name, sizeof(volume_name),
+				   &serial_number, &component_length, &flags,
+				   fs_name, sizeof(fs_name))) {
+		error(_("could not get info for '%s'"), buf.buf);
+		strbuf_release(&buf);
+		return -1;
+	}
+	strbuf_addf(out, "Available space on '%s': ", buf.buf);
+	strbuf_humanise_bytes(out, avail2caller.QuadPart);
+	strbuf_addch(out, '\n');
+	strbuf_release(&buf);
+#else
+	struct strbuf buf = STRBUF_INIT;
+	struct statvfs stat;
+
+	strbuf_realpath(&buf, ".", 1);
+	if (statvfs(buf.buf, &stat) < 0) {
+		error_errno(_("could not determine free disk size for '%s'"),
+			    buf.buf);
+		strbuf_release(&buf);
+		return -1;
+	}
+
+	strbuf_addf(out, "Available space on '%s': ", buf.buf);
+	strbuf_humanise_bytes(out, st_mult(stat.f_bsize, stat.f_bavail));
+	strbuf_addf(out, " (mount flags 0x%lx)\n", stat.f_flag);
+	strbuf_release(&buf);
+#endif
+	return 0;
+}
+
 /* printf-style interface, expects `<key>=<value>` argument */
 static int set_config(const char *fmt, ...)
 {
@@ -598,6 +650,7 @@ static int cmd_diagnose(int argc, const char **argv)
 	get_version_info(&buf, 1);
 
 	strbuf_addf(&buf, "Enlistment root: %s\n", the_repository->worktree);
+	get_disk_info(&buf);
 	write_or_die(stdout_fd, buf.buf, buf.len);
 	strvec_pushf(&archiver_args,
 		     "--add-virtual-file=diagnostics.log:%.*s",
diff --git a/contrib/scalar/t/t9099-scalar.sh b/contrib/scalar/t/t9099-scalar.sh
index fbb1df2049..6e52088919 100755
--- a/contrib/scalar/t/t9099-scalar.sh
+++ b/contrib/scalar/t/t9099-scalar.sh
@@ -102,6 +102,7 @@ SQ="'"
 test_expect_success UNZIP 'scalar diagnose' '
 	scalar clone "file://$(pwd)" cloned --single-branch &&
 	scalar diagnose cloned >out 2>err &&
+	grep "Available space" out &&
 	sed -n "s/.*$SQ\\(.*\\.zip\\)$SQ.*/\\1/p" <err >zip_path &&
 	zip_path=$(cat zip_path) &&
 	test -n "$zip_path" &&
-- 
2.36.1-385-g60203f3fdb


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v6+ 6/7] scalar: teach `diagnose` to gather packfile info
  2022-05-28 23:11           ` [PATCH v6+ 0/7] js/scalar-diagnose rebased Junio C Hamano
                               ` (4 preceding siblings ...)
  2022-05-28 23:11             ` [PATCH v6+ 5/7] scalar diagnose: include disk space information Junio C Hamano
@ 2022-05-28 23:11             ` Junio C Hamano
  2022-05-28 23:11             ` [PATCH v6+ 7/7] scalar: teach `diagnose` to gather loose objects information Junio C Hamano
  2022-05-30 10:12             ` [PATCH v6+ 0/7] js/scalar-diagnose rebased Johannes Schindelin
  7 siblings, 0 replies; 140+ messages in thread
From: Junio C Hamano @ 2022-05-28 23:11 UTC (permalink / raw)
  To: git; +Cc: Matthew John Cheetham

From: Matthew John Cheetham <mjcheetham@outlook.com>

It's helpful to see if there are other crud files in the pack
directory. Let's teach the `scalar diagnose` command to gather
file size information about pack files.

While at it, also enumerate the pack files in the alternate
object directories, if any are registered.

Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 contrib/scalar/scalar.c          | 30 ++++++++++++++++++++++++++++++
 contrib/scalar/t/t9099-scalar.sh |  6 +++++-
 2 files changed, 35 insertions(+), 1 deletion(-)

diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
index f06a2f3576..f745519038 100644
--- a/contrib/scalar/scalar.c
+++ b/contrib/scalar/scalar.c
@@ -12,6 +12,7 @@
 #include "packfile.h"
 #include "help.h"
 #include "archive.h"
+#include "object-store.h"
 
 /*
  * Remove the deepest subdirectory in the provided path string. Path must not
@@ -594,6 +595,29 @@ static int cmd_clone(int argc, const char **argv)
 	return res;
 }
 
+static void dir_file_stats_objects(const char *full_path, size_t full_path_len,
+				   const char *file_name, void *data)
+{
+	struct strbuf *buf = data;
+	struct stat st;
+
+	if (!stat(full_path, &st))
+		strbuf_addf(buf, "%-70s %16" PRIuMAX "\n", file_name,
+			    (uintmax_t)st.st_size);
+}
+
+static int dir_file_stats(struct object_directory *object_dir, void *data)
+{
+	struct strbuf *buf = data;
+
+	strbuf_addf(buf, "Contents of %s:\n", object_dir->path);
+
+	for_each_file_in_pack_dir(object_dir->path, dir_file_stats_objects,
+				  data);
+
+	return 0;
+}
+
 static int cmd_diagnose(int argc, const char **argv)
 {
 	struct option options[] = {
@@ -656,6 +680,12 @@ static int cmd_diagnose(int argc, const char **argv)
 		     "--add-virtual-file=diagnostics.log:%.*s",
 		     (int)buf.len, buf.buf);
 
+	strbuf_reset(&buf);
+	strbuf_addstr(&buf, "--add-virtual-file=packs-local.txt:");
+	dir_file_stats(the_repository->objects->odb, &buf);
+	foreach_alt_odb(dir_file_stats, &buf);
+	strvec_push(&archiver_args, buf.buf);
+
 	if ((res = add_directory_to_archiver(&archiver_args, ".git", 0)) ||
 	    (res = add_directory_to_archiver(&archiver_args, ".git/hooks", 0)) ||
 	    (res = add_directory_to_archiver(&archiver_args, ".git/info", 0)) ||
diff --git a/contrib/scalar/t/t9099-scalar.sh b/contrib/scalar/t/t9099-scalar.sh
index 6e52088919..2603e2278f 100755
--- a/contrib/scalar/t/t9099-scalar.sh
+++ b/contrib/scalar/t/t9099-scalar.sh
@@ -101,6 +101,8 @@ test_expect_success '`scalar [...] <dir>` errors out when dir is missing' '
 SQ="'"
 test_expect_success UNZIP 'scalar diagnose' '
 	scalar clone "file://$(pwd)" cloned --single-branch &&
+	git repack &&
+	echo "$(pwd)/.git/objects/" >>cloned/src/.git/objects/info/alternates &&
 	scalar diagnose cloned >out 2>err &&
 	grep "Available space" out &&
 	sed -n "s/.*$SQ\\(.*\\.zip\\)$SQ.*/\\1/p" <err >zip_path &&
@@ -110,7 +112,9 @@ test_expect_success UNZIP 'scalar diagnose' '
 	folder=${zip_path%.zip} &&
 	test_path_is_missing "$folder" &&
 	unzip -p "$zip_path" diagnostics.log >out &&
-	test_file_not_empty out
+	test_file_not_empty out &&
+	unzip -p "$zip_path" packs-local.txt >out &&
+	grep "$(pwd)/.git/objects" out
 '
 
 test_done
-- 
2.36.1-385-g60203f3fdb


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v6+ 7/7] scalar: teach `diagnose` to gather loose objects information
  2022-05-28 23:11           ` [PATCH v6+ 0/7] js/scalar-diagnose rebased Junio C Hamano
                               ` (5 preceding siblings ...)
  2022-05-28 23:11             ` [PATCH v6+ 6/7] scalar: teach `diagnose` to gather packfile info Junio C Hamano
@ 2022-05-28 23:11             ` Junio C Hamano
  2022-05-30 10:12             ` [PATCH v6+ 0/7] js/scalar-diagnose rebased Johannes Schindelin
  7 siblings, 0 replies; 140+ messages in thread
From: Junio C Hamano @ 2022-05-28 23:11 UTC (permalink / raw)
  To: git; +Cc: Matthew John Cheetham

From: Matthew John Cheetham <mjcheetham@outlook.com>

When operating at the scale that Scalar wants to support, certain data
shapes are more likely to cause undesirable performance issues, such as
large numbers of loose objects.

By including statistics about this, `scalar diagnose` now makes it
easier to identify such scenarios.

Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 contrib/scalar/scalar.c          | 59 ++++++++++++++++++++++++++++++++
 contrib/scalar/t/t9099-scalar.sh |  5 ++-
 2 files changed, 63 insertions(+), 1 deletion(-)

diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
index f745519038..28176914e5 100644
--- a/contrib/scalar/scalar.c
+++ b/contrib/scalar/scalar.c
@@ -618,6 +618,60 @@ static int dir_file_stats(struct object_directory *object_dir, void *data)
 	return 0;
 }
 
+static int count_files(char *path)
+{
+	DIR *dir = opendir(path);
+	struct dirent *e;
+	int count = 0;
+
+	if (!dir)
+		return 0;
+
+	while ((e = readdir(dir)) != NULL)
+		if (!is_dot_or_dotdot(e->d_name) && e->d_type == DT_REG)
+			count++;
+
+	closedir(dir);
+	return count;
+}
+
+static void loose_objs_stats(struct strbuf *buf, const char *path)
+{
+	DIR *dir = opendir(path);
+	struct dirent *e;
+	int count;
+	int total = 0;
+	unsigned char c;
+	struct strbuf count_path = STRBUF_INIT;
+	size_t base_path_len;
+
+	if (!dir)
+		return;
+
+	strbuf_addstr(buf, "Object directory stats for ");
+	strbuf_add_absolute_path(buf, path);
+	strbuf_addstr(buf, ":\n");
+
+	strbuf_add_absolute_path(&count_path, path);
+	strbuf_addch(&count_path, '/');
+	base_path_len = count_path.len;
+
+	while ((e = readdir(dir)) != NULL)
+		if (!is_dot_or_dotdot(e->d_name) &&
+		    e->d_type == DT_DIR && strlen(e->d_name) == 2 &&
+		    !hex_to_bytes(&c, e->d_name, 1)) {
+			strbuf_setlen(&count_path, base_path_len);
+			strbuf_addstr(&count_path, e->d_name);
+			total += (count = count_files(count_path.buf));
+			strbuf_addf(buf, "%s : %7d files\n", e->d_name, count);
+		}
+
+	strbuf_addf(buf, "Total: %d loose objects", total);
+
+	strbuf_release(&count_path);
+	closedir(dir);
+}
+
 static int cmd_diagnose(int argc, const char **argv)
 {
 	struct option options[] = {
@@ -686,6 +740,11 @@ static int cmd_diagnose(int argc, const char **argv)
 	foreach_alt_odb(dir_file_stats, &buf);
 	strvec_push(&archiver_args, buf.buf);
 
+	strbuf_reset(&buf);
+	strbuf_addstr(&buf, "--add-virtual-file=objects-local.txt:");
+	loose_objs_stats(&buf, ".git/objects");
+	strvec_push(&archiver_args, buf.buf);
+
 	if ((res = add_directory_to_archiver(&archiver_args, ".git", 0)) ||
 	    (res = add_directory_to_archiver(&archiver_args, ".git/hooks", 0)) ||
 	    (res = add_directory_to_archiver(&archiver_args, ".git/info", 0)) ||
diff --git a/contrib/scalar/t/t9099-scalar.sh b/contrib/scalar/t/t9099-scalar.sh
index 2603e2278f..10b1172a8a 100755
--- a/contrib/scalar/t/t9099-scalar.sh
+++ b/contrib/scalar/t/t9099-scalar.sh
@@ -103,6 +103,7 @@ test_expect_success UNZIP 'scalar diagnose' '
 	scalar clone "file://$(pwd)" cloned --single-branch &&
 	git repack &&
 	echo "$(pwd)/.git/objects/" >>cloned/src/.git/objects/info/alternates &&
+	test_commit -C cloned/src loose &&
 	scalar diagnose cloned >out 2>err &&
 	grep "Available space" out &&
 	sed -n "s/.*$SQ\\(.*\\.zip\\)$SQ.*/\\1/p" <err >zip_path &&
@@ -114,7 +115,9 @@ test_expect_success UNZIP 'scalar diagnose' '
 	unzip -p "$zip_path" diagnostics.log >out &&
 	test_file_not_empty out &&
 	unzip -p "$zip_path" packs-local.txt >out &&
-	grep "$(pwd)/.git/objects" out
+	grep "$(pwd)/.git/objects" out &&
+	unzip -p "$zip_path" objects-local.txt >out &&
+	grep "^Total: [1-9]" out
 '
 
 test_done
-- 
2.36.1-385-g60203f3fdb


^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v6+ 0/7] js/scalar-diagnose rebased
  2022-05-28 23:11           ` [PATCH v6+ 0/7] js/scalar-diagnose rebased Junio C Hamano
                               ` (6 preceding siblings ...)
  2022-05-28 23:11             ` [PATCH v6+ 7/7] scalar: teach `diagnose` to gather loose objects information Junio C Hamano
@ 2022-05-30 10:12             ` Johannes Schindelin
  2022-05-30 17:37               ` Junio C Hamano
  7 siblings, 1 reply; 140+ messages in thread
From: Johannes Schindelin @ 2022-05-30 10:12 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 885 bytes --]

Hi Junio,

On Sat, 28 May 2022, Junio C Hamano wrote:

> Recent document clarification on the "--prefix" option of the "git
> archive" command from René serves as a good basis for the
> documentation of the "--add-virtual-file" option added by this
> series, so here is my attempt to rebase js/scalar-diagnose topic
> on it to hopefully help reduce Dscho's workload ;-)

I usually frown upon sending patches on other people's behalf without
obtaining their consent first [*1*], but in this case I have to admit that
I appreciate your help very much.

The range-diff looks good.

Thank you,
Dscho

Footnote *1*: In case it was unclear, I consider submitting PRs at
https://github.com/git-for-windows/git as an implicit request to shepherd
the patches onto the Git mailing list, i.e. as consent to have me send
those patches on the original contributors' behalf.

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v6+ 0/7] js/scalar-diagnose rebased
  2022-05-30 10:12             ` [PATCH v6+ 0/7] js/scalar-diagnose rebased Johannes Schindelin
@ 2022-05-30 17:37               ` Junio C Hamano
  0 siblings, 0 replies; 140+ messages in thread
From: Junio C Hamano @ 2022-05-30 17:37 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

>> Recent document clarification on the "--prefix" option of the "git
>> archive" command from René serves as a good basis for the
>> documentation of the "--add-virtual-file" option added by this
>> series, so here is my attempt to rebase js/scalar-diagnose topic
>> on it to hopefully help reduce Dscho's workload ;-)
>
> I usually frown upon sending patches on other people's behalf without
> obtaining their consent first [*1*], but in this case I have to admit that
> I appreciate your help very much.

I understand what you mean.

Consider this as an extended form of the usual notes I send to a
thread to say "ok, based on the discussion I saw on the list, I'll
tweak OP's patch <this way> while queuing; thank you all for
contributing."  The way I try to convey <this way> can range from
words (e.g. when a reviewer points out a typo) to a fixup patch
(e.g. when the necessary update is a bit more involved), and this
time it took a full series with interdiff form.  Of course I do not
have to do any of the above and just leave it up to the OP to pick
up ideas from the discussion while sending updates, but sometimes
it is quicker to skip round-trips.

I do not say "Please holler if I misunderstood the discussion and
correct me, and the OP can always update/override with a rerolled
series." when I send out such a "here is how the version queued
would be different from the original" notice, but I always mean
that, this time included ;-).

Your "frowning upon" is understandable in that it can become a
hostile behaviour towards others, including the maintainer who is
forced to ignore or pick.  It is never fun to be in the position to
always exclude half of the patches posted to the list by
contributors who are competing instead of cooperating, and resending
a tweaked patch to show "here is how I would imagine is a better
version of your series" needs to be done with care.

Thanks.

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v6+ 4/7] scalar: implement `scalar diagnose`
  2022-05-28 23:11             ` [PATCH v6+ 4/7] scalar: implement `scalar diagnose` Junio C Hamano
@ 2022-06-10  2:08               ` Ævar Arnfjörð Bjarmason
  2022-06-10 16:44                 ` Junio C Hamano
  0 siblings, 1 reply; 140+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-06-10  2:08 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Johannes Schindelin


On Sat, May 28 2022, Junio C Hamano wrote:

> From: Johannes Schindelin <johannes.schindelin@gmx.de>
> [...]
> The `diagnose` command is the culmination of this hard-won knowledge: it
> gathers the installed hooks, the config, a couple statistics describing
> the data shape, among other pieces of information, and then wraps
> everything up in a tidy, neat `.zip` archive.
> [...]
> +	if ((res = add_directory_to_archiver(&archiver_args, ".git", 0)) ||
> +	    (res = add_directory_to_archiver(&archiver_args, ".git/hooks", 0)) ||
> +	    (res = add_directory_to_archiver(&archiver_args, ".git/info", 0)) ||
> +	    (res = add_directory_to_archiver(&archiver_args, ".git/logs", 1)) ||
> +	    (res = add_directory_to_archiver(&archiver_args, ".git/objects/info", 0)))
> +		goto diagnose_cleanup;

Noticed on top of some local changes I have to not add a .git/hooks (the
--no-template topic), but this fails to diagnose any repo that doesn't
have these paths, which are optional, either because a user could have manually removed them, or used --template=.

although I don't think there's a way to create that sort of repo with
the scalar tooling, it doesn't seem to forward that option, but I didn't
look deeply.

So, no big deal, but it would be nice to have that fixed. Is there a
reason for why this mere addition of various stuff for diagnosis goes
straight to an opendir() and error on failure, as opposed to doing an
lstat() etc. first?

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v6+ 4/7] scalar: implement `scalar diagnose`
  2022-06-10  2:08               ` Ævar Arnfjörð Bjarmason
@ 2022-06-10 16:44                 ` Junio C Hamano
  2022-06-10 17:35                   ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 140+ messages in thread
From: Junio C Hamano @ 2022-06-10 16:44 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git, Johannes Schindelin

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> On Sat, May 28 2022, Junio C Hamano wrote:
>
>> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>> [...]
>> The `diagnose` command is the culmination of this hard-won knowledge: it
>> gathers the installed hooks, the config, a couple statistics describing
>> the data shape, among other pieces of information, and then wraps
>> everything up in a tidy, neat `.zip` archive.
>> [...]
>> +	if ((res = add_directory_to_archiver(&archiver_args, ".git", 0)) ||
>> +	    (res = add_directory_to_archiver(&archiver_args, ".git/hooks", 0)) ||
>> +	    (res = add_directory_to_archiver(&archiver_args, ".git/info", 0)) ||
>> +	    (res = add_directory_to_archiver(&archiver_args, ".git/logs", 1)) ||
>> +	    (res = add_directory_to_archiver(&archiver_args, ".git/objects/info", 0)))
>> +		goto diagnose_cleanup;
>
> Noticed on top of some local changes I have to not add a
> .git/hooks (the --no-template topic), but this fails to diagnose
> any repo that doesn't have these paths, which are optional, either
> because a user could have manually removed them, or used
> --template=.

Quite honestly, if it lacks any directory that we traditionally
created upon "git init", with our standard templates, we can and
should call such a repository "broken" and move on.



^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v6+ 4/7] scalar: implement `scalar diagnose`
  2022-06-10 16:44                 ` Junio C Hamano
@ 2022-06-10 17:35                   ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 140+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-06-10 17:35 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Johannes Schindelin


On Fri, Jun 10 2022, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>
>> On Sat, May 28 2022, Junio C Hamano wrote:
>>
>>> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>>> [...]
>>> The `diagnose` command is the culmination of this hard-won knowledge: it
>>> gathers the installed hooks, the config, a couple statistics describing
>>> the data shape, among other pieces of information, and then wraps
>>> everything up in a tidy, neat `.zip` archive.
>>> [...]
>>> +	if ((res = add_directory_to_archiver(&archiver_args, ".git", 0)) ||
>>> +	    (res = add_directory_to_archiver(&archiver_args, ".git/hooks", 0)) ||
>>> +	    (res = add_directory_to_archiver(&archiver_args, ".git/info", 0)) ||
>>> +	    (res = add_directory_to_archiver(&archiver_args, ".git/logs", 1)) ||
>>> +	    (res = add_directory_to_archiver(&archiver_args, ".git/objects/info", 0)))
>>> +		goto diagnose_cleanup;
>>
>> Noticed on top of some local changes I have to not add a
>> .git/hooks (the --no-template topic), but this fails to diagnose
>> any repo that doesn't have these paths, which are optional, either
>> because a user could have manually removed them, or used
>> --template=.
>
> Quite honestly, if it lacks any directory that we traditionally
> created upon "git init", with our standard templates, we can and
> should call such a repository "broken" and move on.

In our own test suite we do e.g. (and did more of that until some recent
changes of mine):

    git mv .git/hooks .git/hooks.disabled

We've never documented in "git init" or the like that these very
optional directories in .git/ were some sort of hard requirenment, and
e.g. core.hooksPath and gitrepository-layout(5) explicitly seem to
suggest otherwise.

In any case, there's the golden rule about being strict in what you emit
and loose in what you accept, which we've taken with repository
compatibility. Having a tool that's designed to aid bugreporting be
picky about what sort of repository it supports seems to go against the
point of such a tool.

Particularly in this case, where it seems easy to just guard it with a
stat() check, or not error out if we fail to add this to the *.zip file,
no?

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v6+ 2/7] archive --add-virtual-file: allow paths containing colons
  2022-05-28 23:11             ` [PATCH v6+ 2/7] archive --add-virtual-file: allow paths containing colons Junio C Hamano
@ 2022-06-15 18:16               ` Adam Dinwoodie
  2022-06-15 20:00                 ` Junio C Hamano
  0 siblings, 1 reply; 140+ messages in thread
From: Adam Dinwoodie @ 2022-06-15 18:16 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Johannes Schindelin

On Sat, May 28, 2022 at 04:11:13PM -0700, Junio C Hamano wrote:
> From: Johannes Schindelin <johannes.schindelin@gmx.de>
> 
> By allowing the path to be enclosed in double-quotes, we can avoid
> the limitation that paths cannot contain colons.
> 
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
>  * Tightened shell variable quoting
> 
> <snip>
>
> diff --git a/t/t5003-archive-zip.sh b/t/t5003-archive-zip.sh
> index d6027189e2..3992d08158 100755
> --- a/t/t5003-archive-zip.sh
> +++ b/t/t5003-archive-zip.sh
> @@ -207,13 +207,21 @@ check_zip with_untracked
>  check_added with_untracked untracked untracked
>  
>  test_expect_success UNZIP 'git archive --format=zip --add-virtual-file' '
> +	if test_have_prereq FUNNYNAMES
> +	then
> +		PATHNAME="pathname with : colon"
> +	else
> +		PATHNAME="pathname without colon"
> +	fi &&
>  	git archive --format=zip >with_file_with_content.zip \
> +		--add-virtual-file=\""$PATHNAME"\": \
>  		--add-virtual-file=hello:world $EMPTY_TREE &&
>  	test_when_finished "rm -rf tmp-unpack" &&
>  	mkdir tmp-unpack && (
>  		cd tmp-unpack &&
>  		"$GIT_UNZIP" ../with_file_with_content.zip &&
>  		test_path_is_file hello &&
> +		test_path_is_file "$PATHNAME" &&
>  		test world = $(cat hello)
>  	)
>  '

This test is currently failing on Cygwin: it looks like it's exposing a
bug in Cygwin that means files with colons in their name aren't
correctly extracted from zip archives.  I'm going to report that to the
Cygwin mailing list, but I wanted to note it for the record here, too.

Adam

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v6+ 2/7] archive --add-virtual-file: allow paths containing colons
  2022-06-15 18:16               ` Adam Dinwoodie
@ 2022-06-15 20:00                 ` Junio C Hamano
  2022-06-15 21:36                   ` Adam Dinwoodie
  0 siblings, 1 reply; 140+ messages in thread
From: Junio C Hamano @ 2022-06-15 20:00 UTC (permalink / raw)
  To: Adam Dinwoodie; +Cc: git, Johannes Schindelin

Adam Dinwoodie <adam@dinwoodie.org> writes:

>> diff --git a/t/t5003-archive-zip.sh b/t/t5003-archive-zip.sh
>> index d6027189e2..3992d08158 100755
>> --- a/t/t5003-archive-zip.sh
>> +++ b/t/t5003-archive-zip.sh
>> @@ -207,13 +207,21 @@ check_zip with_untracked
>>  check_added with_untracked untracked untracked
>>  
>>  test_expect_success UNZIP 'git archive --format=zip --add-virtual-file' '
>> +	if test_have_prereq FUNNYNAMES
>> +	then
>> +		PATHNAME="pathname with : colon"
>> +	else
>> +		PATHNAME="pathname without colon"
>> +	fi &&
>>  	git archive --format=zip >with_file_with_content.zip \
>> +		--add-virtual-file=\""$PATHNAME"\": \
>>  		--add-virtual-file=hello:world $EMPTY_TREE &&
>>  	test_when_finished "rm -rf tmp-unpack" &&
>>  	mkdir tmp-unpack && (
>>  		cd tmp-unpack &&
>>  		"$GIT_UNZIP" ../with_file_with_content.zip &&
>>  		test_path_is_file hello &&
>> +		test_path_is_file "$PATHNAME" &&
>>  		test world = $(cat hello)
>>  	)
>>  '
>
> This test is currently failing on Cygwin: it looks like it's exposing a
> bug in Cygwin that means files with colons in their name aren't
> correctly extracted from zip archives.  I'm going to report that to the
> Cygwin mailing list, but I wanted to note it for the record here, too.

Does this mean that our code to set FUNNYNAMES prerequiste is
slightly broken?  IOW, should we check with a path with a colon in
it, as well as whatever we use currently for FUNNYNAMES?

Something like the attached patch?  

Or does Cygwin otherwise work perfectly well with a path with a
colon in it, but only $GIT_UNZIP command has problem with it?  If
that is the case, then please disregard the attached.

Thanks.

 t/test-lib.sh | 1 +
 1 file changed, 1 insertion(+)

diff --git i/t/test-lib.sh w/t/test-lib.sh
index 55857af601..5dce7d95c9 100644
--- i/t/test-lib.sh
+++ w/t/test-lib.sh
@@ -1620,6 +1620,7 @@ test_lazy_prereq FUNNYNAMES '
 	touch -- \
 		"FUNNYNAMES tab	embedded" \
 		"FUNNYNAMES \"quote embedded\"" \
+		"FUNNYNAMES colon : embedded" \
 		"FUNNYNAMES newline
 embedded" 2>/dev/null &&
 	rm -- \

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v6+ 2/7] archive --add-virtual-file: allow paths containing colons
  2022-06-15 20:00                 ` Junio C Hamano
@ 2022-06-15 21:36                   ` Adam Dinwoodie
  2022-06-18 20:19                     ` Johannes Schindelin
  0 siblings, 1 reply; 140+ messages in thread
From: Adam Dinwoodie @ 2022-06-15 21:36 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Johannes Schindelin

On Wed, Jun 15, 2022 at 01:00:07PM -0700, Junio C Hamano wrote:
> Adam Dinwoodie <adam@dinwoodie.org> writes:
> 
> >> diff --git a/t/t5003-archive-zip.sh b/t/t5003-archive-zip.sh
> >> index d6027189e2..3992d08158 100755
> >> --- a/t/t5003-archive-zip.sh
> >> +++ b/t/t5003-archive-zip.sh
> >> @@ -207,13 +207,21 @@ check_zip with_untracked
> >>  check_added with_untracked untracked untracked
> >>  
> >>  test_expect_success UNZIP 'git archive --format=zip --add-virtual-file' '
> >> +	if test_have_prereq FUNNYNAMES
> >> +	then
> >> +		PATHNAME="pathname with : colon"
> >> +	else
> >> +		PATHNAME="pathname without colon"
> >> +	fi &&
> >>  	git archive --format=zip >with_file_with_content.zip \
> >> +		--add-virtual-file=\""$PATHNAME"\": \
> >>  		--add-virtual-file=hello:world $EMPTY_TREE &&
> >>  	test_when_finished "rm -rf tmp-unpack" &&
> >>  	mkdir tmp-unpack && (
> >>  		cd tmp-unpack &&
> >>  		"$GIT_UNZIP" ../with_file_with_content.zip &&
> >>  		test_path_is_file hello &&
> >> +		test_path_is_file "$PATHNAME" &&
> >>  		test world = $(cat hello)
> >>  	)
> >>  '
> >
> > This test is currently failing on Cygwin: it looks like it's exposing a
> > bug in Cygwin that means files with colons in their name aren't
> > correctly extracted from zip archives.  I'm going to report that to the
> > Cygwin mailing list, but I wanted to note it for the record here, too.
> 
> Does this mean that our code to set FUNNYNAMES prerequiste is
> slightly broken?  IOW, should we check with a path with a colon in
> it, as well as whatever we use currently for FUNNYNAMES?
> 
> Something like the attached patch?  
> 
> Or does Cygwin otherwise work perfectly well with a path with a
> colon in it, but only $GIT_UNZIP command has problem with it?  If
> that is the case, then please disregard the attached.

The latter: Cygwin works perfectly with paths containing colons, except
that Cygwin's `unzip` is seemingly buggy and doesn't work.  The file
systems Cygwin runs on don't support colons in paths, but Cygwin hides
that problem by rewriting ASCII colons to some high Unicode code point
on the filesystem, meaning Cygwin-native applications see a regular
colon, while Windows-native applications see an unusual but perfectly
valid Unicode character.

I tested the same patch to FUNNYNAMES myself before reporting, and the
test fails exactly the same way.  If we wanted to catch this, I think
we'd need a test that explicitly attempted to unzip an archive
containing a path with a colon.

(The code to set FUNNYNAMES *is* slightly broken, per the discussions
around 6d340dfaef ("t9902: split test to run on appropriate systems",
2022-04-08), and my to-do list still features tidying up and
resubmitting the patch Ævar wrote in that discussion thread.  But it
wouldn't help here because this issue is specific to Cygwin's `unzip`,
rather than a general limitation of running on Cygwin.)

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v6+ 2/7] archive --add-virtual-file: allow paths containing colons
  2022-06-15 21:36                   ` Adam Dinwoodie
@ 2022-06-18 20:19                     ` Johannes Schindelin
  2022-06-18 22:05                       ` Junio C Hamano
  2022-06-20  9:41                       ` Adam Dinwoodie
  0 siblings, 2 replies; 140+ messages in thread
From: Johannes Schindelin @ 2022-06-18 20:19 UTC (permalink / raw)
  To: Adam Dinwoodie; +Cc: Junio C Hamano, git

[-- Attachment #1: Type: text/plain, Size: 3796 bytes --]

Hi Adam,

On Wed, 15 Jun 2022, Adam Dinwoodie wrote:

> On Wed, Jun 15, 2022 at 01:00:07PM -0700, Junio C Hamano wrote:
> > Adam Dinwoodie <adam@dinwoodie.org> writes:
> >
> > >> diff --git a/t/t5003-archive-zip.sh b/t/t5003-archive-zip.sh
> > >> index d6027189e2..3992d08158 100755
> > >> --- a/t/t5003-archive-zip.sh
> > >> +++ b/t/t5003-archive-zip.sh
> > >> @@ -207,13 +207,21 @@ check_zip with_untracked
> > >>  check_added with_untracked untracked untracked
> > >>
> > >>  test_expect_success UNZIP 'git archive --format=zip --add-virtual-file' '
> > >> +	if test_have_prereq FUNNYNAMES
> > >> +	then
> > >> +		PATHNAME="pathname with : colon"
> > >> +	else
> > >> +		PATHNAME="pathname without colon"
> > >> +	fi &&
> > >>  	git archive --format=zip >with_file_with_content.zip \
> > >> +		--add-virtual-file=\""$PATHNAME"\": \
> > >>  		--add-virtual-file=hello:world $EMPTY_TREE &&
> > >>  	test_when_finished "rm -rf tmp-unpack" &&
> > >>  	mkdir tmp-unpack && (
> > >>  		cd tmp-unpack &&
> > >>  		"$GIT_UNZIP" ../with_file_with_content.zip &&
> > >>  		test_path_is_file hello &&
> > >> +		test_path_is_file "$PATHNAME" &&
> > >>  		test world = $(cat hello)
> > >>  	)
> > >>  '
> > >
> > > This test is currently failing on Cygwin: it looks like it's exposing a
> > > bug in Cygwin that means files with colons in their name aren't
> > > correctly extracted from zip archives.  I'm going to report that to the
> > > Cygwin mailing list, but I wanted to note it for the record here, too.
> >
> > Does this mean that our code to set FUNNYNAMES prerequiste is
> > slightly broken?  IOW, should we check with a path with a colon in
> > it, as well as whatever we use currently for FUNNYNAMES?
> >
> > Something like the attached patch?
> >
> > Or does Cygwin otherwise work perfectly well with a path with a
> > colon in it, but only $GIT_UNZIP command has problem with it?  If
> > that is the case, then please disregard the attached.
>
> The latter: Cygwin works perfectly with paths containing colons, except
> that Cygwin's `unzip` is seemingly buggy and doesn't work.  The file
> systems Cygwin runs on don't support colons in paths, but Cygwin hides
> that problem by rewriting ASCII colons to some high Unicode code point
> on the filesystem,

Let me throw in a bit more detail: The forbidden characters are mapped
into the Unicode page U+f0XX, which is supposed to be used "for private
purposes". Even more detail can be found here:
https://github.com/cygwin/cygwin/blob/cygwin-3_3_5-release/winsup/cygwin/strfuncs.cc#L19-L23

> meaning Cygwin-native applications see a regular colon, while
> Windows-native applications see an unusual but perfectly valid Unicode
> character.

Now, I have two questions:

- Why does `unzip` not use Cygwin's regular functions (which should all be
  aware of that U+f0XX <-> U+00XX mapping)?

- Even more importantly: would the test case pass if we simply used
  another forbidden character, such as `?` or `*`?

> I tested the same patch to FUNNYNAMES myself before reporting, and the
> test fails exactly the same way.  If we wanted to catch this, I think
> we'd need a test that explicitly attempted to unzip an archive
> containing a path with a colon.
>
> (The code to set FUNNYNAMES *is* slightly broken, per the discussions
> around 6d340dfaef ("t9902: split test to run on appropriate systems",
> 2022-04-08), and my to-do list still features tidying up and
> resubmitting the patch Ævar wrote in that discussion thread.  But it
> wouldn't help here because this issue is specific to Cygwin's `unzip`,
> rather than a general limitation of running on Cygwin.)

I'd rather avoid changing FUNNYNAMES at this stage, if we can help it.

Thanks,
Dscho

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v6+ 2/7] archive --add-virtual-file: allow paths containing colons
  2022-06-18 20:19                     ` Johannes Schindelin
@ 2022-06-18 22:05                       ` Junio C Hamano
  2022-06-20  9:41                       ` Adam Dinwoodie
  1 sibling, 0 replies; 140+ messages in thread
From: Junio C Hamano @ 2022-06-18 22:05 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Adam Dinwoodie, git

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> I'd rather avoid changing FUNNYNAMES at this stage, if we can help it.

I wonder if it is sufficient to ask "unzip -l" the names of the
files in the archive, without having to materialize these files on
the filesystem.  Would that bypass the whole FUNNYNAMES business, or
is "unzip" paranoid enough to reject an archive, even when it is not
extracting into the local filesystem, with a path that it would not
be able to extract if it were asked to?

I do not know how standardized different implementations of "unzip"
is, and how similar output "unzip -l" implementations produce are,
but the following seems to pass for me locally.

 t/t5003-archive-zip.sh | 18 ++++--------------
 1 file changed, 4 insertions(+), 14 deletions(-)

diff --git c/t/t5003-archive-zip.sh w/t/t5003-archive-zip.sh
index 3992d08158..f2fdf2c235 100755
--- c/t/t5003-archive-zip.sh
+++ w/t/t5003-archive-zip.sh
@@ -207,23 +207,13 @@ check_zip with_untracked
 check_added with_untracked untracked untracked
 
 test_expect_success UNZIP 'git archive --format=zip --add-virtual-file' '
-	if test_have_prereq FUNNYNAMES
-	then
-		PATHNAME="pathname with : colon"
-	else
-		PATHNAME="pathname without colon"
-	fi &&
+	PATHNAME="pathname with : colon" &&
 	git archive --format=zip >with_file_with_content.zip \
 		--add-virtual-file=\""$PATHNAME"\": \
 		--add-virtual-file=hello:world $EMPTY_TREE &&
-	test_when_finished "rm -rf tmp-unpack" &&
-	mkdir tmp-unpack && (
-		cd tmp-unpack &&
-		"$GIT_UNZIP" ../with_file_with_content.zip &&
-		test_path_is_file hello &&
-		test_path_is_file "$PATHNAME" &&
-		test world = $(cat hello)
-	)
+	"$GIT_UNZIP" -l with_file_with_content.zip >toc &&
+	grep -e " $PATHNAME\$" toc &&
+	grep -e " hello\$" toc
 '
 
 test_expect_success 'git archive --format=zip --add-file twice' '

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v6+ 2/7] archive --add-virtual-file: allow paths containing colons
  2022-06-18 20:19                     ` Johannes Schindelin
  2022-06-18 22:05                       ` Junio C Hamano
@ 2022-06-20  9:41                       ` Adam Dinwoodie
  1 sibling, 0 replies; 140+ messages in thread
From: Adam Dinwoodie @ 2022-06-20  9:41 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Junio C Hamano, git

On Sat, Jun 18, 2022 at 10:19:28PM +0200, Johannes Schindelin wrote:
> Hi Adam,
> 
> On Wed, 15 Jun 2022, Adam Dinwoodie wrote:
> 
> > On Wed, Jun 15, 2022 at 01:00:07PM -0700, Junio C Hamano wrote:
> > > Adam Dinwoodie <adam@dinwoodie.org> writes:
> > >
> > > >> diff --git a/t/t5003-archive-zip.sh b/t/t5003-archive-zip.sh
> > > >> index d6027189e2..3992d08158 100755
> > > >> --- a/t/t5003-archive-zip.sh
> > > >> +++ b/t/t5003-archive-zip.sh
> > > >> @@ -207,13 +207,21 @@ check_zip with_untracked
> > > >>  check_added with_untracked untracked untracked
> > > >>
> > > >>  test_expect_success UNZIP 'git archive --format=zip --add-virtual-file' '
> > > >> +	if test_have_prereq FUNNYNAMES
> > > >> +	then
> > > >> +		PATHNAME="pathname with : colon"
> > > >> +	else
> > > >> +		PATHNAME="pathname without colon"
> > > >> +	fi &&
> > > >>  	git archive --format=zip >with_file_with_content.zip \
> > > >> +		--add-virtual-file=\""$PATHNAME"\": \
> > > >>  		--add-virtual-file=hello:world $EMPTY_TREE &&
> > > >>  	test_when_finished "rm -rf tmp-unpack" &&
> > > >>  	mkdir tmp-unpack && (
> > > >>  		cd tmp-unpack &&
> > > >>  		"$GIT_UNZIP" ../with_file_with_content.zip &&
> > > >>  		test_path_is_file hello &&
> > > >> +		test_path_is_file "$PATHNAME" &&
> > > >>  		test world = $(cat hello)
> > > >>  	)
> > > >>  '
> > > >
> > > > This test is currently failing on Cygwin: it looks like it's exposing a
> > > > bug in Cygwin that means files with colons in their name aren't
> > > > correctly extracted from zip archives.  I'm going to report that to the
> > > > Cygwin mailing list, but I wanted to note it for the record here, too.
> > >
> > > Does this mean that our code to set FUNNYNAMES prerequiste is
> > > slightly broken?  IOW, should we check with a path with a colon in
> > > it, as well as whatever we use currently for FUNNYNAMES?
> > >
> > > Something like the attached patch?
> > >
> > > Or does Cygwin otherwise work perfectly well with a path with a
> > > colon in it, but only $GIT_UNZIP command has problem with it?  If
> > > that is the case, then please disregard the attached.
> >
> > The latter: Cygwin works perfectly with paths containing colons, except
> > that Cygwin's `unzip` is seemingly buggy and doesn't work.  The file
> > systems Cygwin runs on don't support colons in paths, but Cygwin hides
> > that problem by rewriting ASCII colons to some high Unicode code point
> > on the filesystem,
> 
> Let me throw in a bit more detail: The forbidden characters are mapped
> into the Unicode page U+f0XX, which is supposed to be used "for private
> purposes". Even more detail can be found here:
> https://github.com/cygwin/cygwin/blob/cygwin-3_3_5-release/winsup/cygwin/strfuncs.cc#L19-L23
> 
> > meaning Cygwin-native applications see a regular colon, while
> > Windows-native applications see an unusual but perfectly valid Unicode
> > character.
> 
> Now, I have two questions:
> 
> - Why does `unzip` not use Cygwin's regular functions (which should all be
>   aware of that U+f0XX <-> U+00XX mapping)?

That is an excellent question!  This behaviour came from an `#ifdef
__CYGWIN__` in the upstream unzip package; with that #ifdef removed,
everything works as expected.  The folk on the Cygwin mailing list had
no idea *why* that #ifdef was there, given it's evidently unnecessary;
my best guess is that it was added a long time ago before Cygwin could
handle those characters in the general case.

Since my report, the Cygwin package has picked up a new maintainer who
has released a version of the unzip package with that #ifdef removed, so
this test is now passing.

> - Even more importantly: would the test case pass if we simply used
>   another forbidden character, such as `?` or `*`?

The set of characters that had special handling in unzip was "*:?|<> all
of which are handled appropriately by Cygwin applications in general,
and all of which had this unnecessary handling in `unzip`

> > I tested the same patch to FUNNYNAMES myself before reporting, and the
> > test fails exactly the same way.  If we wanted to catch this, I think
> > we'd need a test that explicitly attempted to unzip an archive
> > containing a path with a colon.
> >
> > (The code to set FUNNYNAMES *is* slightly broken, per the discussions
> > around 6d340dfaef ("t9902: split test to run on appropriate systems",
> > 2022-04-08), and my to-do list still features tidying up and
> > resubmitting the patch Ævar wrote in that discussion thread.  But it
> > wouldn't help here because this issue is specific to Cygwin's `unzip`,
> > rather than a general limitation of running on Cygwin.)
> 
> I'd rather avoid changing FUNNYNAMES at this stage, if we can help it.

Oh yes, I definitely wasn't proposing changing things for 2.37.0!  I
just wanted to acknowledge that there is a known issue here that has
been discussed on this list previously, that we (I) would hopefully get
around to fixing at some point.

Adam

^ permalink raw reply	[flat|nested] 140+ messages in thread

end of thread, other threads:[~2022-06-20  9:46 UTC | newest]

Thread overview: 140+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-01-26  8:41 [PATCH 0/5] scalar: implement the subcommand "diagnose" Johannes Schindelin via GitGitGadget
2022-01-26  8:41 ` [PATCH 1/5] Implement `scalar diagnose` Johannes Schindelin via GitGitGadget
2022-01-26  9:34   ` René Scharfe
2022-01-26 22:20     ` Taylor Blau
2022-02-06 21:34       ` Johannes Schindelin
2022-01-27 19:38   ` Elijah Newren
2022-01-26  8:41 ` [PATCH 2/5] scalar diagnose: include disk space information Johannes Schindelin via GitGitGadget
2022-01-26  8:41 ` [PATCH 3/5] scalar: teach `diagnose` to gather packfile info Matthew John Cheetham via GitGitGadget
2022-01-26 22:43   ` Taylor Blau
2022-01-27 15:14     ` Derrick Stolee
2022-02-06 21:38       ` Johannes Schindelin
2022-01-26  8:41 ` [PATCH 4/5] scalar: teach `diagnose` to gather loose objects information Matthew John Cheetham via GitGitGadget
2022-01-26 22:50   ` Taylor Blau
2022-01-27 15:17     ` Derrick Stolee
2022-01-27 18:59   ` Elijah Newren
2022-02-06 21:25     ` Johannes Schindelin
2022-01-26  8:41 ` [PATCH 5/5] scalar diagnose: show a spinner while staging content Johannes Schindelin via GitGitGadget
2022-01-27 15:19 ` [PATCH 0/5] scalar: implement the subcommand "diagnose" Derrick Stolee
2022-02-06 21:13   ` Johannes Schindelin
2022-02-06 22:39 ` [PATCH v2 0/6] " Johannes Schindelin via GitGitGadget
2022-02-06 22:39   ` [PATCH v2 1/6] archive: optionally add "virtual" files Johannes Schindelin via GitGitGadget
2022-02-07 19:55     ` René Scharfe
2022-02-07 23:30       ` Junio C Hamano
2022-02-08 13:12         ` Johannes Schindelin
2022-02-08 17:44           ` Junio C Hamano
2022-02-08 20:58             ` René Scharfe
2022-02-09 22:48               ` Junio C Hamano
2022-02-10 19:10                 ` René Scharfe
2022-02-10 19:23                   ` Junio C Hamano
2022-02-11 19:16                     ` René Scharfe
2022-02-11 21:27                       ` Junio C Hamano
2022-02-12  9:12                         ` René Scharfe
2022-02-13  6:25                           ` Junio C Hamano
2022-02-13  9:02                             ` René Scharfe
2022-02-14 17:22                               ` Junio C Hamano
2022-02-08 12:54       ` Johannes Schindelin
2022-02-06 22:39   ` [PATCH v2 2/6] scalar: validate the optional enlistment argument Johannes Schindelin via GitGitGadget
2022-02-06 22:39   ` [PATCH v2 3/6] Implement `scalar diagnose` Johannes Schindelin via GitGitGadget
2022-02-07 19:55     ` René Scharfe
2022-02-08 12:08       ` Johannes Schindelin
2022-02-06 22:39   ` [PATCH v2 4/6] scalar diagnose: include disk space information Johannes Schindelin via GitGitGadget
2022-02-06 22:39   ` [PATCH v2 5/6] scalar: teach `diagnose` to gather packfile info Matthew John Cheetham via GitGitGadget
2022-02-06 22:39   ` [PATCH v2 6/6] scalar: teach `diagnose` to gather loose objects information Matthew John Cheetham via GitGitGadget
2022-05-04 15:25   ` [PATCH v3 0/7] scalar: implement the subcommand "diagnose" Johannes Schindelin via GitGitGadget
2022-05-04 15:25     ` [PATCH v3 1/7] archive: optionally add "virtual" files Johannes Schindelin via GitGitGadget
2022-05-04 15:25     ` [PATCH v3 2/7] archive --add-file-with-contents: allow paths containing colons Johannes Schindelin via GitGitGadget
2022-05-07  2:06       ` Elijah Newren
2022-05-09 21:04         ` Johannes Schindelin
2022-05-04 15:25     ` [PATCH v3 3/7] scalar: validate the optional enlistment argument Johannes Schindelin via GitGitGadget
2022-05-04 15:25     ` [PATCH v3 4/7] Implement `scalar diagnose` Johannes Schindelin via GitGitGadget
2022-05-04 15:25     ` [PATCH v3 5/7] scalar diagnose: include disk space information Johannes Schindelin via GitGitGadget
2022-05-04 15:25     ` [PATCH v3 6/7] scalar: teach `diagnose` to gather packfile info Matthew John Cheetham via GitGitGadget
2022-05-04 15:25     ` [PATCH v3 7/7] scalar: teach `diagnose` to gather loose objects information Matthew John Cheetham via GitGitGadget
2022-05-07  2:23     ` [PATCH v3 0/7] scalar: implement the subcommand "diagnose" Elijah Newren
2022-05-10 19:26     ` [PATCH v4 " Johannes Schindelin via GitGitGadget
2022-05-10 19:26       ` [PATCH v4 1/7] archive: optionally add "virtual" files Johannes Schindelin via GitGitGadget
2022-05-10 21:48         ` Junio C Hamano
2022-05-10 22:06           ` rsbecker
2022-05-10 23:21             ` Junio C Hamano
2022-05-11 16:14               ` René Scharfe
2022-05-11 19:27                 ` Junio C Hamano
2022-05-12 16:16                   ` René Scharfe
2022-05-12 18:15                     ` Junio C Hamano
2022-05-12 21:31                       ` Junio C Hamano
2022-05-14  7:06                         ` René Scharfe
2022-05-12 22:31           ` [PATCH] fixup! " Junio C Hamano
2022-05-10 19:26       ` [PATCH v4 2/7] archive --add-file-with-contents: allow paths containing colons Johannes Schindelin via GitGitGadget
2022-05-10 21:56         ` Junio C Hamano
2022-05-10 22:23           ` rsbecker
2022-05-19 18:12             ` Johannes Schindelin
2022-05-19 18:09           ` Johannes Schindelin
2022-05-19 18:44             ` Junio C Hamano
2022-05-10 19:27       ` [PATCH v4 3/7] scalar: validate the optional enlistment argument Johannes Schindelin via GitGitGadget
2022-05-17 14:51         ` Ævar Arnfjörð Bjarmason
2022-05-18 17:35           ` Junio C Hamano
2022-05-20  7:30             ` Ævar Arnfjörð Bjarmason
2022-05-20 15:55               ` Johannes Schindelin
2022-05-21  9:54                 ` Ævar Arnfjörð Bjarmason
2022-05-22  5:50                   ` Junio C Hamano
2022-05-24 12:25                     ` Johannes Schindelin
2022-05-24 18:11                       ` Ævar Arnfjörð Bjarmason
2022-05-24 19:29                         ` Junio C Hamano
2022-05-25 10:31                           ` Johannes Schindelin
2022-05-10 19:27       ` [PATCH v4 4/7] Implement `scalar diagnose` Johannes Schindelin via GitGitGadget
2022-05-17 14:53         ` Ævar Arnfjörð Bjarmason
2022-05-10 19:27       ` [PATCH v4 5/7] scalar diagnose: include disk space information Johannes Schindelin via GitGitGadget
2022-05-10 19:27       ` [PATCH v4 6/7] scalar: teach `diagnose` to gather packfile info Matthew John Cheetham via GitGitGadget
2022-05-10 19:27       ` [PATCH v4 7/7] scalar: teach `diagnose` to gather loose objects information Matthew John Cheetham via GitGitGadget
2022-05-17 15:03       ` [PATCH v4 0/7] scalar: implement the subcommand "diagnose" Ævar Arnfjörð Bjarmason
2022-05-17 15:28         ` rsbecker
2022-05-19 18:17           ` Johannes Schindelin
2022-05-19 18:17       ` [PATCH v5 " Johannes Schindelin via GitGitGadget
2022-05-19 18:17         ` [PATCH v5 1/7] archive: optionally add "virtual" files Johannes Schindelin via GitGitGadget
2022-05-20 14:41           ` René Scharfe
2022-05-20 16:21             ` Junio C Hamano
2022-05-19 18:17         ` [PATCH v5 2/7] archive --add-file-with-contents: allow paths containing colons Johannes Schindelin via GitGitGadget
2022-05-19 18:17         ` [PATCH v5 3/7] scalar: validate the optional enlistment argument Johannes Schindelin via GitGitGadget
2022-05-19 18:18         ` [PATCH v5 4/7] Implement `scalar diagnose` Johannes Schindelin via GitGitGadget
2022-05-19 18:18         ` [PATCH v5 5/7] scalar diagnose: include disk space information Johannes Schindelin via GitGitGadget
2022-05-19 18:18         ` [PATCH v5 6/7] scalar: teach `diagnose` to gather packfile info Matthew John Cheetham via GitGitGadget
2022-05-19 18:18         ` [PATCH v5 7/7] scalar: teach `diagnose` to gather loose objects information Matthew John Cheetham via GitGitGadget
2022-05-19 19:23         ` [PATCH v5 0/7] scalar: implement the subcommand "diagnose" Junio C Hamano
2022-05-21 15:08         ` [PATCH v6 " Johannes Schindelin via GitGitGadget
2022-05-21 15:08           ` [PATCH v6 1/7] archive: optionally add "virtual" files Johannes Schindelin via GitGitGadget
2022-05-25 21:11             ` Junio C Hamano
2022-05-26  9:09               ` René Scharfe
2022-05-26 17:10                 ` Junio C Hamano
2022-05-26 18:57                   ` René Scharfe
2022-05-26 20:16                     ` Junio C Hamano
2022-05-27 17:02                       ` René Scharfe
2022-05-27 19:01                         ` Junio C Hamano
2022-05-28  6:57                           ` René Scharfe
2022-05-21 15:08           ` [PATCH v6 2/7] archive --add-virtual-file: allow paths containing colons Johannes Schindelin via GitGitGadget
2022-05-25 20:22             ` Junio C Hamano
2022-05-25 21:42               ` Junio C Hamano
2022-05-25 22:34                 ` Junio C Hamano
2022-05-21 15:08           ` [PATCH v6 3/7] scalar: validate the optional enlistment argument Johannes Schindelin via GitGitGadget
2022-05-21 15:08           ` [PATCH v6 4/7] Implement `scalar diagnose` Johannes Schindelin via GitGitGadget
2022-05-21 15:08           ` [PATCH v6 5/7] scalar diagnose: include disk space information Johannes Schindelin via GitGitGadget
2022-05-21 15:08           ` [PATCH v6 6/7] scalar: teach `diagnose` to gather packfile info Matthew John Cheetham via GitGitGadget
2022-05-21 15:08           ` [PATCH v6 7/7] scalar: teach `diagnose` to gather loose objects information Matthew John Cheetham via GitGitGadget
2022-05-28 23:11           ` [PATCH v6+ 0/7] js/scalar-diagnose rebased Junio C Hamano
2022-05-28 23:11             ` [PATCH v6+ 1/7] archive: optionally add "virtual" files Junio C Hamano
2022-05-28 23:11             ` [PATCH v6+ 2/7] archive --add-virtual-file: allow paths containing colons Junio C Hamano
2022-06-15 18:16               ` Adam Dinwoodie
2022-06-15 20:00                 ` Junio C Hamano
2022-06-15 21:36                   ` Adam Dinwoodie
2022-06-18 20:19                     ` Johannes Schindelin
2022-06-18 22:05                       ` Junio C Hamano
2022-06-20  9:41                       ` Adam Dinwoodie
2022-05-28 23:11             ` [PATCH v6+ 3/7] scalar: validate the optional enlistment argument Junio C Hamano
2022-05-28 23:11             ` [PATCH v6+ 4/7] scalar: implement `scalar diagnose` Junio C Hamano
2022-06-10  2:08               ` Ævar Arnfjörð Bjarmason
2022-06-10 16:44                 ` Junio C Hamano
2022-06-10 17:35                   ` Ævar Arnfjörð Bjarmason
2022-05-28 23:11             ` [PATCH v6+ 5/7] scalar diagnose: include disk space information Junio C Hamano
2022-05-28 23:11             ` [PATCH v6+ 6/7] scalar: teach `diagnose` to gather packfile info Junio C Hamano
2022-05-28 23:11             ` [PATCH v6+ 7/7] scalar: teach `diagnose` to gather loose objects information Junio C Hamano
2022-05-30 10:12             ` [PATCH v6+ 0/7] js/scalar-diagnose rebased Johannes Schindelin
2022-05-30 17:37               ` Junio C Hamano

Code repositories for project(s) associated with this inbox:

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).