git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/5] scalar: implement the subcommand "diagnose"
@ 2022-01-26  8:41 Johannes Schindelin via GitGitGadget
  2022-01-26  8:41 ` [PATCH 1/5] Implement `scalar diagnose` Johannes Schindelin via GitGitGadget
                   ` (6 more replies)
  0 siblings, 7 replies; 109+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2022-01-26  8:41 UTC (permalink / raw)
  To: git; +Cc: Johannes Schindelin

Over the course of the years, we developed a sub-command that gathers
diagnostic data into a .zip file that can then be attached to bug reports.
This sub-command turned out to be very useful in helping Scalar developers
identify and fix issues.

Johannes Schindelin (3):
  Implement `scalar diagnose`
  scalar diagnose: include disk space information
  scalar diagnose: show a spinner while staging content

Matthew John Cheetham (2):
  scalar: teach `diagnose` to gather packfile info
  scalar: teach `diagnose` to gather loose objects information

 contrib/scalar/scalar.c          | 336 +++++++++++++++++++++++++++++++
 contrib/scalar/scalar.txt        |  12 ++
 contrib/scalar/t/t9099-scalar.sh |  17 ++
 3 files changed, 365 insertions(+)


base-commit: ddc35d833dd6f9e8946b09cecd3311b8aa18d295
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1128%2Fdscho%2Fscalar-diagnose-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1128/dscho/scalar-diagnose-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1128
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH 1/5] Implement `scalar diagnose`
  2022-01-26  8:41 [PATCH 0/5] scalar: implement the subcommand "diagnose" Johannes Schindelin via GitGitGadget
@ 2022-01-26  8:41 ` Johannes Schindelin via GitGitGadget
  2022-01-26  9:34   ` René Scharfe
  2022-01-27 19:38   ` Elijah Newren
  2022-01-26  8:41 ` [PATCH 2/5] scalar diagnose: include disk space information Johannes Schindelin via GitGitGadget
                   ` (5 subsequent siblings)
  6 siblings, 2 replies; 109+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2022-01-26  8:41 UTC (permalink / raw)
  To: git; +Cc: Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

Over the course of Scalar's development, it became obvious that there is
a need for a command that can gather all kinds of useful information
that can help identify the most typical problems with large
worktrees/repositories.

The `diagnose` command is the culmination of this hard-won knowledge: it
gathers the installed hooks, the config, a couple statistics describing
the data shape, among other pieces of information, and then wraps
everything up in a tidy, neat `.zip` archive.

Note: originally, Scalar was implemented in C# using the .NET API, where
we had the luxury of a comprehensive standard library that includes
basic functionality such as writing a `.zip` file. In the C version, we
lack such a commodity. Rather than introducing a dependency on, say,
libzip, we slightly abuse Git's `archive` command: Instead of writing
the `.zip` file directly, we stage the file contents in a Git index of a
temporary, bare repository, only to let `git archive` have at it, and
finally removing the temporary repository.

Also note: Due to the frequently-spawned `git hash-object` processes,
this command is quite a bit slow on Windows. Should it turn out to be a
big problem, the lack of a batch mode of the `hash-object` command could
potentially be worked around via using `git fast-import` with a crafted
`stdin`.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 contrib/scalar/scalar.c          | 170 +++++++++++++++++++++++++++++++
 contrib/scalar/scalar.txt        |  12 +++
 contrib/scalar/t/t9099-scalar.sh |  13 +++
 3 files changed, 195 insertions(+)

diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
index 1ce9c2b00e8..13f2b0f4d5a 100644
--- a/contrib/scalar/scalar.c
+++ b/contrib/scalar/scalar.c
@@ -259,6 +259,108 @@ static int unregister_dir(void)
 	return res;
 }
 
+static int stage(const char *git_dir, struct strbuf *buf, const char *path)
+{
+	struct strbuf cacheinfo = STRBUF_INIT;
+	struct child_process cp = CHILD_PROCESS_INIT;
+	int res;
+
+	strbuf_addstr(&cacheinfo, "100644,");
+
+	cp.git_cmd = 1;
+	strvec_pushl(&cp.args, "--git-dir", git_dir,
+		     "hash-object", "-w", "--stdin", NULL);
+	res = pipe_command(&cp, buf->buf, buf->len, &cacheinfo, 256, NULL, 0);
+	if (!res) {
+		strbuf_rtrim(&cacheinfo);
+		strbuf_addch(&cacheinfo, ',');
+		/* We cannot stage `.git`, use `_git` instead. */
+		if (starts_with(path, ".git/"))
+			strbuf_addf(&cacheinfo, "_%s", path + 1);
+		else
+			strbuf_addstr(&cacheinfo, path);
+
+		child_process_init(&cp);
+		cp.git_cmd = 1;
+		strvec_pushl(&cp.args, "--git-dir", git_dir,
+			     "update-index", "--add", "--cacheinfo",
+			     cacheinfo.buf, NULL);
+		res = run_command(&cp);
+	}
+
+	strbuf_release(&cacheinfo);
+	return res;
+}
+
+static int stage_file(const char *git_dir, const char *path)
+{
+	struct strbuf buf = STRBUF_INIT;
+	int res;
+
+	if (strbuf_read_file(&buf, path, 0) < 0)
+		return error(_("could not read '%s'"), path);
+
+	res = stage(git_dir, &buf, path);
+
+	strbuf_release(&buf);
+	return res;
+}
+
+static int stage_directory(const char *git_dir, const char *path, int recurse)
+{
+	int at_root = !*path;
+	DIR *dir = opendir(at_root ? "." : path);
+	struct dirent *e;
+	struct strbuf buf = STRBUF_INIT;
+	size_t len;
+	int res = 0;
+
+	if (!dir)
+		return error(_("could not open directory '%s'"), path);
+
+	if (!at_root)
+		strbuf_addf(&buf, "%s/", path);
+	len = buf.len;
+
+	while (!res && (e = readdir(dir))) {
+		if (!strcmp(".", e->d_name) || !strcmp("..", e->d_name))
+			continue;
+
+		strbuf_setlen(&buf, len);
+		strbuf_addstr(&buf, e->d_name);
+
+		if ((e->d_type == DT_REG && stage_file(git_dir, buf.buf)) ||
+		    (e->d_type == DT_DIR && recurse &&
+		     stage_directory(git_dir, buf.buf, recurse)))
+			res = -1;
+	}
+
+	closedir(dir);
+	strbuf_release(&buf);
+	return res;
+}
+
+static int index_to_zip(const char *git_dir)
+{
+	struct child_process cp = CHILD_PROCESS_INIT;
+	struct strbuf oid = STRBUF_INIT;
+
+	cp.git_cmd = 1;
+	strvec_pushl(&cp.args, "--git-dir", git_dir, "write-tree", NULL);
+	if (pipe_command(&cp, NULL, 0, &oid, the_hash_algo->hexsz + 1,
+			 NULL, 0))
+		return error(_("could not write temporary tree object"));
+
+	strbuf_rtrim(&oid);
+	child_process_init(&cp);
+	cp.git_cmd = 1;
+	strvec_pushl(&cp.args, "--git-dir", git_dir, "archive", "-o", NULL);
+	strvec_pushf(&cp.args, "%s.zip", git_dir);
+	strvec_pushl(&cp.args, oid.buf, "--", NULL);
+	strbuf_release(&oid);
+	return run_command(&cp);
+}
+
 /* printf-style interface, expects `<key>=<value>` argument */
 static int set_config(const char *fmt, ...)
 {
@@ -499,6 +601,73 @@ cleanup:
 	return res;
 }
 
+static int cmd_diagnose(int argc, const char **argv)
+{
+	struct option options[] = {
+		OPT_END(),
+	};
+	const char * const usage[] = {
+		N_("scalar diagnose [<enlistment>]"),
+		NULL
+	};
+	struct strbuf tmp_dir = STRBUF_INIT;
+	time_t now = time(NULL);
+	struct tm tm;
+	struct strbuf path = STRBUF_INIT, buf = STRBUF_INIT;
+	int res = 0;
+
+	argc = parse_options(argc, argv, NULL, options,
+			     usage, 0);
+
+	setup_enlistment_directory(argc, argv, usage, options, &buf);
+
+	strbuf_addstr(&buf, "/.scalarDiagnostics/scalar_");
+	strbuf_addftime(&buf, "%Y%m%d_%H%M%S", localtime_r(&now, &tm), 0, 0);
+	if (run_git("init", "-q", "-b", "dummy", "--bare", buf.buf, NULL)) {
+		res = error(_("could not initialize temporary repository: %s"),
+			    buf.buf);
+		goto diagnose_cleanup;
+	}
+	strbuf_realpath(&tmp_dir, buf.buf, 1);
+
+	strbuf_reset(&buf);
+	strbuf_addf(&buf, "Collecting diagnostic info into temp folder %s\n\n",
+		    tmp_dir.buf);
+
+	get_version_info(&buf, 1);
+
+	strbuf_addf(&buf, "Enlistment root: %s\n", the_repository->worktree);
+	fwrite(buf.buf, buf.len, 1, stdout);
+
+	if ((res = stage(tmp_dir.buf, &buf, "diagnostics.log")))
+		goto diagnose_cleanup;
+
+	if ((res = stage_directory(tmp_dir.buf, ".git", 0)) ||
+	    (res = stage_directory(tmp_dir.buf, ".git/hooks", 0)) ||
+	    (res = stage_directory(tmp_dir.buf, ".git/info", 0)) ||
+	    (res = stage_directory(tmp_dir.buf, ".git/logs", 1)) ||
+	    (res = stage_directory(tmp_dir.buf, ".git/objects/info", 0)))
+		goto diagnose_cleanup;
+
+	res = index_to_zip(tmp_dir.buf);
+
+	if (!res)
+		res = remove_dir_recursively(&tmp_dir, 0);
+
+	if (!res)
+		printf("\n"
+		       "Diagnostics complete.\n"
+		       "All of the gathered info is captured in '%s.zip'\n",
+		       tmp_dir.buf);
+
+diagnose_cleanup:
+	strbuf_release(&tmp_dir);
+	strbuf_release(&path);
+	strbuf_release(&buf);
+
+	return res;
+}
+
 static int cmd_list(int argc, const char **argv)
 {
 	if (argc != 1)
@@ -800,6 +969,7 @@ static struct {
 	{ "reconfigure", cmd_reconfigure },
 	{ "delete", cmd_delete },
 	{ "version", cmd_version },
+	{ "diagnose", cmd_diagnose },
 	{ NULL, NULL},
 };
 
diff --git a/contrib/scalar/scalar.txt b/contrib/scalar/scalar.txt
index f416d637289..22583fe046e 100644
--- a/contrib/scalar/scalar.txt
+++ b/contrib/scalar/scalar.txt
@@ -14,6 +14,7 @@ scalar register [<enlistment>]
 scalar unregister [<enlistment>]
 scalar run ( all | config | commit-graph | fetch | loose-objects | pack-files ) [<enlistment>]
 scalar reconfigure [ --all | <enlistment> ]
+scalar diagnose [<enlistment>]
 scalar delete <enlistment>
 
 DESCRIPTION
@@ -129,6 +130,17 @@ reconfigure the enlistment.
 With the `--all` option, all enlistments currently registered with Scalar
 will be reconfigured. Use this option after each Scalar upgrade.
 
+Diagnose
+~~~~~~~~
+
+diagnose [<enlistment>]::
+    When reporting issues with Scalar, it is often helpful to provide the
+    information gathered by this command, including logs and certain
+    statistics describing the data shape of the current enlistment.
++
+The output of this command is a `.zip` file that is written into
+a directory adjacent to the worktree in the `src` directory.
+
 Delete
 ~~~~~~
 
diff --git a/contrib/scalar/t/t9099-scalar.sh b/contrib/scalar/t/t9099-scalar.sh
index 2e1502ad45e..ecd06e207c2 100755
--- a/contrib/scalar/t/t9099-scalar.sh
+++ b/contrib/scalar/t/t9099-scalar.sh
@@ -65,6 +65,19 @@ test_expect_success 'scalar clone' '
 	)
 '
 
+SQ="'"
+test_expect_success UNZIP 'scalar diagnose' '
+	scalar diagnose cloned >out &&
+	sed -n "s/.*$SQ\\(.*\\.zip\\)$SQ.*/\\1/p" <out >zip_path &&
+	zip_path=$(cat zip_path) &&
+	test -n "$zip_path" &&
+	unzip -v "$zip_path" &&
+	folder=${zip_path%.zip} &&
+	test_path_is_missing "$folder" &&
+	unzip -p "$zip_path" diagnostics.log >out &&
+	test_file_not_empty out
+'
+
 test_expect_success 'scalar reconfigure' '
 	git init one/src &&
 	scalar register one &&
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH 2/5] scalar diagnose: include disk space information
  2022-01-26  8:41 [PATCH 0/5] scalar: implement the subcommand "diagnose" Johannes Schindelin via GitGitGadget
  2022-01-26  8:41 ` [PATCH 1/5] Implement `scalar diagnose` Johannes Schindelin via GitGitGadget
@ 2022-01-26  8:41 ` Johannes Schindelin via GitGitGadget
  2022-01-26  8:41 ` [PATCH 3/5] scalar: teach `diagnose` to gather packfile info Matthew John Cheetham via GitGitGadget
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 109+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2022-01-26  8:41 UTC (permalink / raw)
  To: git; +Cc: Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

When analyzing problems with large worktrees/repositories, it is useful
to know how close to a "full disk" situation Scalar/Git operates. Let's
include this information.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 contrib/scalar/scalar.c | 53 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 53 insertions(+)

diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
index 13f2b0f4d5a..e26fb2fc018 100644
--- a/contrib/scalar/scalar.c
+++ b/contrib/scalar/scalar.c
@@ -361,6 +361,58 @@ static int index_to_zip(const char *git_dir)
 	return run_command(&cp);
 }
 
+#ifndef WIN32
+#include <sys/statvfs.h>
+#endif
+
+static int get_disk_info(struct strbuf *out)
+{
+#ifdef WIN32
+	struct strbuf buf = STRBUF_INIT;
+	char volume_name[MAX_PATH], fs_name[MAX_PATH];
+	DWORD serial_number, component_length, flags;
+	ULARGE_INTEGER avail2caller, total, avail;
+
+	strbuf_realpath(&buf, ".", 1);
+	if (!GetDiskFreeSpaceExA(buf.buf, &avail2caller, &total, &avail)) {
+		error(_("could not determine free disk size for '%s'"),
+		      buf.buf);
+		strbuf_release(&buf);
+		return -1;
+	}
+
+	strbuf_setlen(&buf, offset_1st_component(buf.buf));
+	if (!GetVolumeInformationA(buf.buf, volume_name, sizeof(volume_name),
+				   &serial_number, &component_length, &flags,
+				   fs_name, sizeof(fs_name))) {
+		error(_("could not get info for '%s'"), buf.buf);
+		strbuf_release(&buf);
+		return -1;
+	}
+	strbuf_addf(out, "Available space on '%s': ", buf.buf);
+	strbuf_humanise_bytes(out, avail2caller.QuadPart);
+	strbuf_addch(out, '\n');
+	strbuf_release(&buf);
+#else
+	struct strbuf buf = STRBUF_INIT;
+	struct statvfs stat;
+
+	strbuf_realpath(&buf, ".", 1);
+	if (statvfs(buf.buf, &stat) < 0) {
+		error_errno(_("could not determine free disk size for '%s'"),
+			    buf.buf);
+		strbuf_release(&buf);
+		return -1;
+	}
+
+	strbuf_addf(out, "Available space on '%s': ", buf.buf);
+	strbuf_humanise_bytes(out, st_mult(stat.f_bsize, stat.f_bavail));
+	strbuf_addf(out, " (mount flags 0x%lx)\n", stat.f_flag);
+	strbuf_release(&buf);
+#endif
+	return 0;
+}
+
 /* printf-style interface, expects `<key>=<value>` argument */
 static int set_config(const char *fmt, ...)
 {
@@ -637,6 +689,7 @@ static int cmd_diagnose(int argc, const char **argv)
 	get_version_info(&buf, 1);
 
 	strbuf_addf(&buf, "Enlistment root: %s\n", the_repository->worktree);
+	get_disk_info(&buf);
 	fwrite(buf.buf, buf.len, 1, stdout);
 
 	if ((res = stage(tmp_dir.buf, &buf, "diagnostics.log")))
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH 3/5] scalar: teach `diagnose` to gather packfile info
  2022-01-26  8:41 [PATCH 0/5] scalar: implement the subcommand "diagnose" Johannes Schindelin via GitGitGadget
  2022-01-26  8:41 ` [PATCH 1/5] Implement `scalar diagnose` Johannes Schindelin via GitGitGadget
  2022-01-26  8:41 ` [PATCH 2/5] scalar diagnose: include disk space information Johannes Schindelin via GitGitGadget
@ 2022-01-26  8:41 ` Matthew John Cheetham via GitGitGadget
  2022-01-26 22:43   ` Taylor Blau
  2022-01-26  8:41 ` [PATCH 4/5] scalar: teach `diagnose` to gather loose objects information Matthew John Cheetham via GitGitGadget
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 109+ messages in thread
From: Matthew John Cheetham via GitGitGadget @ 2022-01-26  8:41 UTC (permalink / raw)
  To: git; +Cc: Johannes Schindelin, Matthew John Cheetham

From: Matthew John Cheetham <mjcheetham@outlook.com>

Teach the `scalar diagnose` command to gather file size information
about pack files.

Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
---
 contrib/scalar/scalar.c          | 39 ++++++++++++++++++++++++++++++++
 contrib/scalar/t/t9099-scalar.sh |  2 ++
 2 files changed, 41 insertions(+)

diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
index e26fb2fc018..690933ffdf3 100644
--- a/contrib/scalar/scalar.c
+++ b/contrib/scalar/scalar.c
@@ -653,6 +653,39 @@ cleanup:
 	return res;
 }
 
+static void dir_file_stats(struct strbuf *buf, const char *path)
+{
+	DIR *dir = opendir(path);
+	struct dirent *e;
+	struct stat e_stat;
+	struct strbuf file_path = STRBUF_INIT;
+	size_t base_path_len;
+
+	if (!dir)
+		return;
+
+	strbuf_addstr(buf, "Contents of ");
+	strbuf_add_absolute_path(buf, path);
+	strbuf_addstr(buf, ":\n");
+
+	strbuf_add_absolute_path(&file_path, path);
+	strbuf_addch(&file_path, '/');
+	base_path_len = file_path.len;
+
+	while ((e = readdir(dir)) != NULL)
+		if (!is_dot_or_dotdot(e->d_name) && e->d_type == DT_REG) {
+			strbuf_setlen(&file_path, base_path_len);
+			strbuf_addstr(&file_path, e->d_name);
+			if (!stat(file_path.buf, &e_stat))
+				strbuf_addf(buf, "%-70s %16"PRIuMAX"\n",
+					    e->d_name,
+					    (uintmax_t)e_stat.st_size);
+		}
+
+	strbuf_release(&file_path);
+	closedir(dir);
+}
+
 static int cmd_diagnose(int argc, const char **argv)
 {
 	struct option options[] = {
@@ -695,6 +728,12 @@ static int cmd_diagnose(int argc, const char **argv)
 	if ((res = stage(tmp_dir.buf, &buf, "diagnostics.log")))
 		goto diagnose_cleanup;
 
+	strbuf_reset(&buf);
+	dir_file_stats(&buf, ".git/objects/pack");
+
+	if ((res = stage(tmp_dir.buf, &buf, "packs-local.txt")))
+		goto diagnose_cleanup;
+
 	if ((res = stage_directory(tmp_dir.buf, ".git", 0)) ||
 	    (res = stage_directory(tmp_dir.buf, ".git/hooks", 0)) ||
 	    (res = stage_directory(tmp_dir.buf, ".git/info", 0)) ||
diff --git a/contrib/scalar/t/t9099-scalar.sh b/contrib/scalar/t/t9099-scalar.sh
index ecd06e207c2..b1745851e31 100755
--- a/contrib/scalar/t/t9099-scalar.sh
+++ b/contrib/scalar/t/t9099-scalar.sh
@@ -75,6 +75,8 @@ test_expect_success UNZIP 'scalar diagnose' '
 	folder=${zip_path%.zip} &&
 	test_path_is_missing "$folder" &&
 	unzip -p "$zip_path" diagnostics.log >out &&
+	test_file_not_empty out &&
+	unzip -p "$zip_path" packs-local.txt >out &&
 	test_file_not_empty out
 '
 
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH 4/5] scalar: teach `diagnose` to gather loose objects information
  2022-01-26  8:41 [PATCH 0/5] scalar: implement the subcommand "diagnose" Johannes Schindelin via GitGitGadget
                   ` (2 preceding siblings ...)
  2022-01-26  8:41 ` [PATCH 3/5] scalar: teach `diagnose` to gather packfile info Matthew John Cheetham via GitGitGadget
@ 2022-01-26  8:41 ` Matthew John Cheetham via GitGitGadget
  2022-01-26 22:50   ` Taylor Blau
  2022-01-27 18:59   ` Elijah Newren
  2022-01-26  8:41 ` [PATCH 5/5] scalar diagnose: show a spinner while staging content Johannes Schindelin via GitGitGadget
                   ` (2 subsequent siblings)
  6 siblings, 2 replies; 109+ messages in thread
From: Matthew John Cheetham via GitGitGadget @ 2022-01-26  8:41 UTC (permalink / raw)
  To: git; +Cc: Johannes Schindelin, Matthew John Cheetham

From: Matthew John Cheetham <mjcheetham@outlook.com>

When operating at the scale that Scalar wants to support, certain data
shapes are more likely to cause undesirable performance issues, such as
large numbers or large sizes of loose objects.

By including statistics about this, `scalar diagnose` now makes it
easier to identify such scenarios.

Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
---
 contrib/scalar/scalar.c          | 60 ++++++++++++++++++++++++++++++++
 contrib/scalar/t/t9099-scalar.sh |  2 ++
 2 files changed, 62 insertions(+)

diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
index 690933ffdf3..c0ad4948215 100644
--- a/contrib/scalar/scalar.c
+++ b/contrib/scalar/scalar.c
@@ -686,6 +686,60 @@ static void dir_file_stats(struct strbuf *buf, const char *path)
 	closedir(dir);
 }
 
+static int count_files(char *path)
+{
+	DIR *dir = opendir(path);
+	struct dirent *e;
+	int count = 0;
+
+	if (!dir)
+		return 0;
+
+	while ((e = readdir(dir)) != NULL)
+		if (!is_dot_or_dotdot(e->d_name) && e->d_type == DT_REG)
+			count++;
+
+	closedir(dir);
+	return count;
+}
+
+static void loose_objs_stats(struct strbuf *buf, const char *path)
+{
+	DIR *dir = opendir(path);
+	struct dirent *e;
+	int count;
+	int total = 0;
+	unsigned char c;
+	struct strbuf count_path = STRBUF_INIT;
+	size_t base_path_len;
+
+	if (!dir)
+		return;
+
+	strbuf_addstr(buf, "Object directory stats for ");
+	strbuf_add_absolute_path(buf, path);
+	strbuf_addstr(buf, ":\n");
+
+	strbuf_add_absolute_path(&count_path, path);
+	strbuf_addch(&count_path, '/');
+	base_path_len = count_path.len;
+
+	while ((e = readdir(dir)) != NULL)
+		if (!is_dot_or_dotdot(e->d_name) &&
+		    e->d_type == DT_DIR && strlen(e->d_name) == 2 &&
+		    !hex_to_bytes(&c, e->d_name, 1)) {
+			strbuf_setlen(&count_path, base_path_len);
+			strbuf_addstr(&count_path, e->d_name);
+			total += (count = count_files(count_path.buf));
+			strbuf_addf(buf, "%s : %7d files\n", e->d_name, count);
+		}
+
+	strbuf_addf(buf, "Total: %d loose objects", total);
+
+	strbuf_release(&count_path);
+	closedir(dir);
+}
+
 static int cmd_diagnose(int argc, const char **argv)
 {
 	struct option options[] = {
@@ -734,6 +788,12 @@ static int cmd_diagnose(int argc, const char **argv)
 	if ((res = stage(tmp_dir.buf, &buf, "packs-local.txt")))
 		goto diagnose_cleanup;
 
+	strbuf_reset(&buf);
+	loose_objs_stats(&buf, ".git/objects");
+
+	if ((res = stage(tmp_dir.buf, &buf, "objects-local.txt")))
+		goto diagnose_cleanup;
+
 	if ((res = stage_directory(tmp_dir.buf, ".git", 0)) ||
 	    (res = stage_directory(tmp_dir.buf, ".git/hooks", 0)) ||
 	    (res = stage_directory(tmp_dir.buf, ".git/info", 0)) ||
diff --git a/contrib/scalar/t/t9099-scalar.sh b/contrib/scalar/t/t9099-scalar.sh
index b1745851e31..f2ec156d819 100755
--- a/contrib/scalar/t/t9099-scalar.sh
+++ b/contrib/scalar/t/t9099-scalar.sh
@@ -77,6 +77,8 @@ test_expect_success UNZIP 'scalar diagnose' '
 	unzip -p "$zip_path" diagnostics.log >out &&
 	test_file_not_empty out &&
 	unzip -p "$zip_path" packs-local.txt >out &&
+	test_file_not_empty out &&
+	unzip -p "$zip_path" objects-local.txt >out &&
 	test_file_not_empty out
 '
 
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH 5/5] scalar diagnose: show a spinner while staging content
  2022-01-26  8:41 [PATCH 0/5] scalar: implement the subcommand "diagnose" Johannes Schindelin via GitGitGadget
                   ` (3 preceding siblings ...)
  2022-01-26  8:41 ` [PATCH 4/5] scalar: teach `diagnose` to gather loose objects information Matthew John Cheetham via GitGitGadget
@ 2022-01-26  8:41 ` Johannes Schindelin via GitGitGadget
  2022-01-27 15:19 ` [PATCH 0/5] scalar: implement the subcommand "diagnose" Derrick Stolee
  2022-02-06 22:39 ` [PATCH v2 0/6] " Johannes Schindelin via GitGitGadget
  6 siblings, 0 replies; 109+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2022-01-26  8:41 UTC (permalink / raw)
  To: git; +Cc: Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

It can take a while to gather all the information that `scalar diagnose`
wants to accumulate. Typically this happens when the user is in need of
quick solutions and therefore their patience is tested already. By
showing a little spinner that spins around, we hope to help the user
muster just a tiny bit more patience until `scalar diagnose` is done.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 contrib/scalar/scalar.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
index c0ad4948215..224329f38f5 100644
--- a/contrib/scalar/scalar.c
+++ b/contrib/scalar/scalar.c
@@ -259,12 +259,26 @@ static int unregister_dir(void)
 	return res;
 }
 
+static void spinner(void)
+{
+	static const char whee[] = "|\010/\010-\010\\\010", *next = whee;
+
+	if (!next)
+		return;
+	if (write(2, next, 2) < 0)
+		next = NULL;
+	else
+		next = next[2] ? next + 2 : whee;
+}
+
 static int stage(const char *git_dir, struct strbuf *buf, const char *path)
 {
 	struct strbuf cacheinfo = STRBUF_INIT;
 	struct child_process cp = CHILD_PROCESS_INIT;
 	int res;
 
+	spinner();
+
 	strbuf_addstr(&cacheinfo, "100644,");
 
 	cp.git_cmd = 1;
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH 1/5] Implement `scalar diagnose`
  2022-01-26  8:41 ` [PATCH 1/5] Implement `scalar diagnose` Johannes Schindelin via GitGitGadget
@ 2022-01-26  9:34   ` René Scharfe
  2022-01-26 22:20     ` Taylor Blau
  2022-01-27 19:38   ` Elijah Newren
  1 sibling, 1 reply; 109+ messages in thread
From: René Scharfe @ 2022-01-26  9:34 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget, git; +Cc: Johannes Schindelin

Am 26.01.22 um 09:41 schrieb Johannes Schindelin via GitGitGadget:
> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>
> Over the course of Scalar's development, it became obvious that there is
> a need for a command that can gather all kinds of useful information
> that can help identify the most typical problems with large
> worktrees/repositories.
>
> The `diagnose` command is the culmination of this hard-won knowledge: it
> gathers the installed hooks, the config, a couple statistics describing
> the data shape, among other pieces of information, and then wraps
> everything up in a tidy, neat `.zip` archive.
>
> Note: originally, Scalar was implemented in C# using the .NET API, where
> we had the luxury of a comprehensive standard library that includes
> basic functionality such as writing a `.zip` file. In the C version, we
> lack such a commodity. Rather than introducing a dependency on, say,
> libzip, we slightly abuse Git's `archive` command: Instead of writing
> the `.zip` file directly, we stage the file contents in a Git index of a
> temporary, bare repository, only to let `git archive` have at it, and
> finally removing the temporary repository.

git archive allows you to include untracked files in an archive with its
option --add-file.  You can see an example in Git's Makefile; search for
GIT_ARCHIVE_EXTRA_FILES.  It still requires a tree argument, but the
empty tree object should suffice if you don't want to include any
tracked files.  It doesn't currently support streaming, though, i.e.
files are fully read into memory, so it's impractical for huge ones.

> Also note: Due to the frequently-spawned `git hash-object` processes,
> this command is quite a bit slow on Windows. Should it turn out to be a
> big problem, the lack of a batch mode of the `hash-object` command could
> potentially be worked around via using `git fast-import` with a crafted
> `stdin`.

Or we could add streaming support to git archive --add-file..

>
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
>  contrib/scalar/scalar.c          | 170 +++++++++++++++++++++++++++++++
>  contrib/scalar/scalar.txt        |  12 +++
>  contrib/scalar/t/t9099-scalar.sh |  13 +++
>  3 files changed, 195 insertions(+)
>
> diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
> index 1ce9c2b00e8..13f2b0f4d5a 100644
> --- a/contrib/scalar/scalar.c
> +++ b/contrib/scalar/scalar.c
> @@ -259,6 +259,108 @@ static int unregister_dir(void)
>  	return res;
>  }
>
> +static int stage(const char *git_dir, struct strbuf *buf, const char *path)
> +{
> +	struct strbuf cacheinfo = STRBUF_INIT;
> +	struct child_process cp = CHILD_PROCESS_INIT;
> +	int res;
> +
> +	strbuf_addstr(&cacheinfo, "100644,");
> +
> +	cp.git_cmd = 1;
> +	strvec_pushl(&cp.args, "--git-dir", git_dir,
> +		     "hash-object", "-w", "--stdin", NULL);
> +	res = pipe_command(&cp, buf->buf, buf->len, &cacheinfo, 256, NULL, 0);
> +	if (!res) {
> +		strbuf_rtrim(&cacheinfo);
> +		strbuf_addch(&cacheinfo, ',');
> +		/* We cannot stage `.git`, use `_git` instead. */
> +		if (starts_with(path, ".git/"))
> +			strbuf_addf(&cacheinfo, "_%s", path + 1);
> +		else
> +			strbuf_addstr(&cacheinfo, path);
> +
> +		child_process_init(&cp);
> +		cp.git_cmd = 1;
> +		strvec_pushl(&cp.args, "--git-dir", git_dir,
> +			     "update-index", "--add", "--cacheinfo",
> +			     cacheinfo.buf, NULL);
> +		res = run_command(&cp);
> +	}
> +
> +	strbuf_release(&cacheinfo);
> +	return res;
> +}
> +
> +static int stage_file(const char *git_dir, const char *path)
> +{
> +	struct strbuf buf = STRBUF_INIT;
> +	int res;
> +
> +	if (strbuf_read_file(&buf, path, 0) < 0)
> +		return error(_("could not read '%s'"), path);
> +
> +	res = stage(git_dir, &buf, path);
> +
> +	strbuf_release(&buf);
> +	return res;
> +}
> +
> +static int stage_directory(const char *git_dir, const char *path, int recurse)
> +{
> +	int at_root = !*path;
> +	DIR *dir = opendir(at_root ? "." : path);
> +	struct dirent *e;
> +	struct strbuf buf = STRBUF_INIT;
> +	size_t len;
> +	int res = 0;
> +
> +	if (!dir)
> +		return error(_("could not open directory '%s'"), path);
> +
> +	if (!at_root)
> +		strbuf_addf(&buf, "%s/", path);
> +	len = buf.len;
> +
> +	while (!res && (e = readdir(dir))) {
> +		if (!strcmp(".", e->d_name) || !strcmp("..", e->d_name))
> +			continue;
> +
> +		strbuf_setlen(&buf, len);
> +		strbuf_addstr(&buf, e->d_name);
> +
> +		if ((e->d_type == DT_REG && stage_file(git_dir, buf.buf)) ||
> +		    (e->d_type == DT_DIR && recurse &&
> +		     stage_directory(git_dir, buf.buf, recurse)))
> +			res = -1;
> +	}
> +
> +	closedir(dir);
> +	strbuf_release(&buf);
> +	return res;
> +}
> +
> +static int index_to_zip(const char *git_dir)
> +{
> +	struct child_process cp = CHILD_PROCESS_INIT;
> +	struct strbuf oid = STRBUF_INIT;
> +
> +	cp.git_cmd = 1;
> +	strvec_pushl(&cp.args, "--git-dir", git_dir, "write-tree", NULL);
> +	if (pipe_command(&cp, NULL, 0, &oid, the_hash_algo->hexsz + 1,
> +			 NULL, 0))
> +		return error(_("could not write temporary tree object"));
> +
> +	strbuf_rtrim(&oid);
> +	child_process_init(&cp);
> +	cp.git_cmd = 1;
> +	strvec_pushl(&cp.args, "--git-dir", git_dir, "archive", "-o", NULL);
> +	strvec_pushf(&cp.args, "%s.zip", git_dir);
> +	strvec_pushl(&cp.args, oid.buf, "--", NULL);
> +	strbuf_release(&oid);
> +	return run_command(&cp);
> +}
> +
>  /* printf-style interface, expects `<key>=<value>` argument */
>  static int set_config(const char *fmt, ...)
>  {
> @@ -499,6 +601,73 @@ cleanup:
>  	return res;
>  }
>
> +static int cmd_diagnose(int argc, const char **argv)
> +{
> +	struct option options[] = {
> +		OPT_END(),
> +	};
> +	const char * const usage[] = {
> +		N_("scalar diagnose [<enlistment>]"),
> +		NULL
> +	};
> +	struct strbuf tmp_dir = STRBUF_INIT;
> +	time_t now = time(NULL);
> +	struct tm tm;
> +	struct strbuf path = STRBUF_INIT, buf = STRBUF_INIT;
> +	int res = 0;
> +
> +	argc = parse_options(argc, argv, NULL, options,
> +			     usage, 0);
> +
> +	setup_enlistment_directory(argc, argv, usage, options, &buf);
> +
> +	strbuf_addstr(&buf, "/.scalarDiagnostics/scalar_");
> +	strbuf_addftime(&buf, "%Y%m%d_%H%M%S", localtime_r(&now, &tm), 0, 0);
> +	if (run_git("init", "-q", "-b", "dummy", "--bare", buf.buf, NULL)) {
> +		res = error(_("could not initialize temporary repository: %s"),
> +			    buf.buf);
> +		goto diagnose_cleanup;
> +	}
> +	strbuf_realpath(&tmp_dir, buf.buf, 1);
> +
> +	strbuf_reset(&buf);
> +	strbuf_addf(&buf, "Collecting diagnostic info into temp folder %s\n\n",
> +		    tmp_dir.buf);
> +
> +	get_version_info(&buf, 1);
> +
> +	strbuf_addf(&buf, "Enlistment root: %s\n", the_repository->worktree);
> +	fwrite(buf.buf, buf.len, 1, stdout);
> +
> +	if ((res = stage(tmp_dir.buf, &buf, "diagnostics.log")))
> +		goto diagnose_cleanup;
> +
> +	if ((res = stage_directory(tmp_dir.buf, ".git", 0)) ||
> +	    (res = stage_directory(tmp_dir.buf, ".git/hooks", 0)) ||
> +	    (res = stage_directory(tmp_dir.buf, ".git/info", 0)) ||
> +	    (res = stage_directory(tmp_dir.buf, ".git/logs", 1)) ||
> +	    (res = stage_directory(tmp_dir.buf, ".git/objects/info", 0)))
> +		goto diagnose_cleanup;
> +
> +	res = index_to_zip(tmp_dir.buf);
> +
> +	if (!res)
> +		res = remove_dir_recursively(&tmp_dir, 0);
> +
> +	if (!res)
> +		printf("\n"
> +		       "Diagnostics complete.\n"
> +		       "All of the gathered info is captured in '%s.zip'\n",
> +		       tmp_dir.buf);
> +
> +diagnose_cleanup:
> +	strbuf_release(&tmp_dir);
> +	strbuf_release(&path);
> +	strbuf_release(&buf);
> +
> +	return res;
> +}
> +
>  static int cmd_list(int argc, const char **argv)
>  {
>  	if (argc != 1)
> @@ -800,6 +969,7 @@ static struct {
>  	{ "reconfigure", cmd_reconfigure },
>  	{ "delete", cmd_delete },
>  	{ "version", cmd_version },
> +	{ "diagnose", cmd_diagnose },
>  	{ NULL, NULL},
>  };
>
> diff --git a/contrib/scalar/scalar.txt b/contrib/scalar/scalar.txt
> index f416d637289..22583fe046e 100644
> --- a/contrib/scalar/scalar.txt
> +++ b/contrib/scalar/scalar.txt
> @@ -14,6 +14,7 @@ scalar register [<enlistment>]
>  scalar unregister [<enlistment>]
>  scalar run ( all | config | commit-graph | fetch | loose-objects | pack-files ) [<enlistment>]
>  scalar reconfigure [ --all | <enlistment> ]
> +scalar diagnose [<enlistment>]
>  scalar delete <enlistment>
>
>  DESCRIPTION
> @@ -129,6 +130,17 @@ reconfigure the enlistment.
>  With the `--all` option, all enlistments currently registered with Scalar
>  will be reconfigured. Use this option after each Scalar upgrade.
>
> +Diagnose
> +~~~~~~~~
> +
> +diagnose [<enlistment>]::
> +    When reporting issues with Scalar, it is often helpful to provide the
> +    information gathered by this command, including logs and certain
> +    statistics describing the data shape of the current enlistment.
> ++
> +The output of this command is a `.zip` file that is written into
> +a directory adjacent to the worktree in the `src` directory.
> +
>  Delete
>  ~~~~~~
>
> diff --git a/contrib/scalar/t/t9099-scalar.sh b/contrib/scalar/t/t9099-scalar.sh
> index 2e1502ad45e..ecd06e207c2 100755
> --- a/contrib/scalar/t/t9099-scalar.sh
> +++ b/contrib/scalar/t/t9099-scalar.sh
> @@ -65,6 +65,19 @@ test_expect_success 'scalar clone' '
>  	)
>  '
>
> +SQ="'"
> +test_expect_success UNZIP 'scalar diagnose' '
> +	scalar diagnose cloned >out &&
> +	sed -n "s/.*$SQ\\(.*\\.zip\\)$SQ.*/\\1/p" <out >zip_path &&
> +	zip_path=$(cat zip_path) &&
> +	test -n "$zip_path" &&
> +	unzip -v "$zip_path" &&
> +	folder=${zip_path%.zip} &&
> +	test_path_is_missing "$folder" &&
> +	unzip -p "$zip_path" diagnostics.log >out &&
> +	test_file_not_empty out
> +'
> +
>  test_expect_success 'scalar reconfigure' '
>  	git init one/src &&
>  	scalar register one &&


^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH 1/5] Implement `scalar diagnose`
  2022-01-26  9:34   ` René Scharfe
@ 2022-01-26 22:20     ` Taylor Blau
  2022-02-06 21:34       ` Johannes Schindelin
  0 siblings, 1 reply; 109+ messages in thread
From: Taylor Blau @ 2022-01-26 22:20 UTC (permalink / raw)
  To: René Scharfe
  Cc: Johannes Schindelin via GitGitGadget, git, Johannes Schindelin

On Wed, Jan 26, 2022 at 10:34:04AM +0100, René Scharfe wrote:
> Am 26.01.22 um 09:41 schrieb Johannes Schindelin via GitGitGadget:
> > Note: originally, Scalar was implemented in C# using the .NET API, where
> > we had the luxury of a comprehensive standard library that includes
> > basic functionality such as writing a `.zip` file. In the C version, we
> > lack such a commodity. Rather than introducing a dependency on, say,
> > libzip, we slightly abuse Git's `archive` command: Instead of writing
> > the `.zip` file directly, we stage the file contents in a Git index of a
> > temporary, bare repository, only to let `git archive` have at it, and
> > finally removing the temporary repository.
>
> git archive allows you to include untracked files in an archive with its
> option --add-file.  You can see an example in Git's Makefile; search for
> GIT_ARCHIVE_EXTRA_FILES.  It still requires a tree argument, but the
> empty tree object should suffice if you don't want to include any
> tracked files.  It doesn't currently support streaming, though, i.e.
> files are fully read into memory, so it's impractical for huge ones.

Using `--add-file` would likely be preferable to setting up a temporary
repository just to invoke `git archive` in it. Johannes would be the
expert to ask whether or not big files are going to be a problem here
(based on a cursory scan of the new functions in scalar.c, I don't
expect this to be the case).

The new stage_directory() function _could_ add `--add-file` arguments in
a loop around readdir(), but it might also be nice to add a new
`--add-directory` function to `git archive` which would do the "heavy"
lifting for us.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH 3/5] scalar: teach `diagnose` to gather packfile info
  2022-01-26  8:41 ` [PATCH 3/5] scalar: teach `diagnose` to gather packfile info Matthew John Cheetham via GitGitGadget
@ 2022-01-26 22:43   ` Taylor Blau
  2022-01-27 15:14     ` Derrick Stolee
  0 siblings, 1 reply; 109+ messages in thread
From: Taylor Blau @ 2022-01-26 22:43 UTC (permalink / raw)
  To: Matthew John Cheetham via GitGitGadget
  Cc: git, Johannes Schindelin, Matthew John Cheetham

On Wed, Jan 26, 2022 at 08:41:45AM +0000, Matthew John Cheetham via GitGitGadget wrote:
> From: Matthew John Cheetham <mjcheetham@outlook.com>
>
> Teach the `scalar diagnose` command to gather file size information
> about pack files.
>
> Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
> ---
>  contrib/scalar/scalar.c          | 39 ++++++++++++++++++++++++++++++++
>  contrib/scalar/t/t9099-scalar.sh |  2 ++
>  2 files changed, 41 insertions(+)
>
> diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
> index e26fb2fc018..690933ffdf3 100644
> --- a/contrib/scalar/scalar.c
> +++ b/contrib/scalar/scalar.c
> @@ -653,6 +653,39 @@ cleanup:
>  	return res;
>  }
>
> +static void dir_file_stats(struct strbuf *buf, const char *path)
> +{
> +	DIR *dir = opendir(path);
> +	struct dirent *e;
> +	struct stat e_stat;
> +	struct strbuf file_path = STRBUF_INIT;
> +	size_t base_path_len;
> +
> +	if (!dir)
> +		return;
> +
> +	strbuf_addstr(buf, "Contents of ");
> +	strbuf_add_absolute_path(buf, path);
> +	strbuf_addstr(buf, ":\n");
> +
> +	strbuf_add_absolute_path(&file_path, path);
> +	strbuf_addch(&file_path, '/');
> +	base_path_len = file_path.len;
> +
> +	while ((e = readdir(dir)) != NULL)

Hmm. Is there a reason that this couldn't use
for_each_file_in_pack_dir() with a callback that just does the stat()
and buffer manipulation?

I don't think it's critical either way, but it would eliminate some of
the boilerplate that is shared between this implementation and the one
that already exists in for_each_file_in_pack_dir().

> +		if (!is_dot_or_dotdot(e->d_name) && e->d_type == DT_REG) {
> +			strbuf_setlen(&file_path, base_path_len);
> +			strbuf_addstr(&file_path, e->d_name);

For what it's worth, I think the callback would start here:

> +			if (!stat(file_path.buf, &e_stat))
> +				strbuf_addf(buf, "%-70s %16"PRIuMAX"\n",
> +					    e->d_name,
> +					    (uintmax_t)e_stat.st_size);

...and end here.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH 4/5] scalar: teach `diagnose` to gather loose objects information
  2022-01-26  8:41 ` [PATCH 4/5] scalar: teach `diagnose` to gather loose objects information Matthew John Cheetham via GitGitGadget
@ 2022-01-26 22:50   ` Taylor Blau
  2022-01-27 15:17     ` Derrick Stolee
  2022-01-27 18:59   ` Elijah Newren
  1 sibling, 1 reply; 109+ messages in thread
From: Taylor Blau @ 2022-01-26 22:50 UTC (permalink / raw)
  To: Matthew John Cheetham via GitGitGadget
  Cc: git, Johannes Schindelin, Matthew John Cheetham

On Wed, Jan 26, 2022 at 08:41:46AM +0000, Matthew John Cheetham via GitGitGadget wrote:
> +	while ((e = readdir(dir)) != NULL)
> +		if (!is_dot_or_dotdot(e->d_name) &&
> +		    e->d_type == DT_DIR && strlen(e->d_name) == 2 &&
> +		    !hex_to_bytes(&c, e->d_name, 1)) {

What is this call to hex_to_bytes() for? I assume it's checking to make
sure the directory we're looking at is one of the shards of loose
objects.

Similar to my suggestion on the previous patch, I think that we could
get rid of this function entirely and replace it with a call to
for_each_loose_file_in_objdir().

We'll pay a little bit of extra cost to parse out each loose object's
OID, but it should be negligible since we're not actually opening up
each object.

> diff --git a/contrib/scalar/t/t9099-scalar.sh b/contrib/scalar/t/t9099-scalar.sh
> index b1745851e31..f2ec156d819 100755
> --- a/contrib/scalar/t/t9099-scalar.sh
> +++ b/contrib/scalar/t/t9099-scalar.sh
> @@ -77,6 +77,8 @@ test_expect_success UNZIP 'scalar diagnose' '
>  	unzip -p "$zip_path" diagnostics.log >out &&
>  	test_file_not_empty out &&
>  	unzip -p "$zip_path" packs-local.txt >out &&
> +	test_file_not_empty out &&

A more comprehensive test (here, and in the earlier instances, too)
might be useful beyond just "does this file exist in the archive".

Constructing an example repository where the number of loose objects is
known ahead of time, and then finding that number in the output of
objects-local.txt might be worthwhile to give us some extra confidence
that this is working as intended.

> +	unzip -p "$zip_path" objects-local.txt >out &&
>  	test_file_not_empty out
>  '

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH 3/5] scalar: teach `diagnose` to gather packfile info
  2022-01-26 22:43   ` Taylor Blau
@ 2022-01-27 15:14     ` Derrick Stolee
  2022-02-06 21:38       ` Johannes Schindelin
  0 siblings, 1 reply; 109+ messages in thread
From: Derrick Stolee @ 2022-01-27 15:14 UTC (permalink / raw)
  To: Taylor Blau, Matthew John Cheetham via GitGitGadget
  Cc: git, Johannes Schindelin, Matthew John Cheetham

On 1/26/2022 5:43 PM, Taylor Blau wrote:
> On Wed, Jan 26, 2022 at 08:41:45AM +0000, Matthew John Cheetham via GitGitGadget wrote:
>> From: Matthew John Cheetham <mjcheetham@outlook.com>
>>
>> Teach the `scalar diagnose` command to gather file size information
>> about pack files.
>>
>> Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
>> ---
>>  contrib/scalar/scalar.c          | 39 ++++++++++++++++++++++++++++++++
>>  contrib/scalar/t/t9099-scalar.sh |  2 ++
>>  2 files changed, 41 insertions(+)
>>
>> diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
>> index e26fb2fc018..690933ffdf3 100644
>> --- a/contrib/scalar/scalar.c
>> +++ b/contrib/scalar/scalar.c
>> @@ -653,6 +653,39 @@ cleanup:
>>  	return res;
>>  }
>>
>> +static void dir_file_stats(struct strbuf *buf, const char *path)
>> +{
>> +	DIR *dir = opendir(path);
>> +	struct dirent *e;
>> +	struct stat e_stat;
>> +	struct strbuf file_path = STRBUF_INIT;
>> +	size_t base_path_len;
>> +
>> +	if (!dir)
>> +		return;
>> +
>> +	strbuf_addstr(buf, "Contents of ");
>> +	strbuf_add_absolute_path(buf, path);
>> +	strbuf_addstr(buf, ":\n");
>> +
>> +	strbuf_add_absolute_path(&file_path, path);
>> +	strbuf_addch(&file_path, '/');
>> +	base_path_len = file_path.len;
>> +
>> +	while ((e = readdir(dir)) != NULL)
> 
> Hmm. Is there a reason that this couldn't use
> for_each_file_in_pack_dir() with a callback that just does the stat()
> and buffer manipulation?
> 
> I don't think it's critical either way, but it would eliminate some of
> the boilerplate that is shared between this implementation and the one
> that already exists in for_each_file_in_pack_dir().

It's helpful to see if there are other crud files in the pack
directory. This method is also extended in microsoft/git to
scan the alternates directory (which we expect to exist as the
"shared objects cache).

We might want to modify the implementation in this series to
run dir_file_stats() on each odb in the_repository. This would
give us the data for the shared object cache for free while
being more general to other Git repos. (It would require us to
do some reaction work in microsoft/git and be a change of
behavior, but we are the only ones who have looked at these
diagnose files before, so that change will be easy to manage.)

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH 4/5] scalar: teach `diagnose` to gather loose objects information
  2022-01-26 22:50   ` Taylor Blau
@ 2022-01-27 15:17     ` Derrick Stolee
  0 siblings, 0 replies; 109+ messages in thread
From: Derrick Stolee @ 2022-01-27 15:17 UTC (permalink / raw)
  To: Taylor Blau, Matthew John Cheetham via GitGitGadget
  Cc: git, Johannes Schindelin, Matthew John Cheetham

On 1/26/2022 5:50 PM, Taylor Blau wrote:
> On Wed, Jan 26, 2022 at 08:41:46AM +0000, Matthew John Cheetham via GitGitGadget wrote:
>> +	while ((e = readdir(dir)) != NULL)
>> +		if (!is_dot_or_dotdot(e->d_name) &&
>> +		    e->d_type == DT_DIR && strlen(e->d_name) == 2 &&
>> +		    !hex_to_bytes(&c, e->d_name, 1)) {
> 
> What is this call to hex_to_bytes() for? I assume it's checking to make
> sure the directory we're looking at is one of the shards of loose
> objects.
> 
> Similar to my suggestion on the previous patch, I think that we could
> get rid of this function entirely and replace it with a call to
> for_each_loose_file_in_objdir().

There is a possibility that there are files other than loose objects
in these directories, so summarizing those counts might be helpful
information. For example: if somehow .git/objects/00/ was full of a
bunch of non-objects, it would still slow down Git commands that ask
for a short-sha starting with "00".

While this shouldn't be a normal case, the 'diagnose' command is
built to help us find these extremely odd scenarios because they
_have_ happened before (typically because of a VFS for Git bug
taught us how to look for these situations).

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH 0/5] scalar: implement the subcommand "diagnose"
  2022-01-26  8:41 [PATCH 0/5] scalar: implement the subcommand "diagnose" Johannes Schindelin via GitGitGadget
                   ` (4 preceding siblings ...)
  2022-01-26  8:41 ` [PATCH 5/5] scalar diagnose: show a spinner while staging content Johannes Schindelin via GitGitGadget
@ 2022-01-27 15:19 ` Derrick Stolee
  2022-02-06 21:13   ` Johannes Schindelin
  2022-02-06 22:39 ` [PATCH v2 0/6] " Johannes Schindelin via GitGitGadget
  6 siblings, 1 reply; 109+ messages in thread
From: Derrick Stolee @ 2022-01-27 15:19 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget, git
  Cc: Johannes Schindelin, Emily Shaffer

On 1/26/2022 3:41 AM, Johannes Schindelin via GitGitGadget wrote:
> Over the course of the years, we developed a sub-command that gathers
> diagnostic data into a .zip file that can then be attached to bug reports.
> This sub-command turned out to be very useful in helping Scalar developers
> identify and fix issues.

For historical context: The 'diagnose' command was implemented in VFS for
Git and ported to the C# version of Scalar before 'git bugreport' existed,
but they serve very similar purposes.

I wonder if 'scalar diagnose' could include some of the information
captured by 'git bugreport' or whether this implementation of 'diagnose'
could help inform 'git bugreport' in any way.

CC'ing Emily for thoughts.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH 4/5] scalar: teach `diagnose` to gather loose objects information
  2022-01-26  8:41 ` [PATCH 4/5] scalar: teach `diagnose` to gather loose objects information Matthew John Cheetham via GitGitGadget
  2022-01-26 22:50   ` Taylor Blau
@ 2022-01-27 18:59   ` Elijah Newren
  2022-02-06 21:25     ` Johannes Schindelin
  1 sibling, 1 reply; 109+ messages in thread
From: Elijah Newren @ 2022-01-27 18:59 UTC (permalink / raw)
  To: Matthew John Cheetham via GitGitGadget
  Cc: Git Mailing List, Johannes Schindelin, Matthew John Cheetham

On Wed, Jan 26, 2022 at 3:37 PM Matthew John Cheetham via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Matthew John Cheetham <mjcheetham@outlook.com>
>
> When operating at the scale that Scalar wants to support, certain data
> shapes are more likely to cause undesirable performance issues, such as
> large numbers or large sizes of loose objects.

Makes sense.

> By including statistics about this, `scalar diagnose` now makes it
> easier to identify such scenarios.
>
> Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
> ---
>  contrib/scalar/scalar.c          | 60 ++++++++++++++++++++++++++++++++
>  contrib/scalar/t/t9099-scalar.sh |  2 ++
>  2 files changed, 62 insertions(+)
>
> diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
> index 690933ffdf3..c0ad4948215 100644
> --- a/contrib/scalar/scalar.c
> +++ b/contrib/scalar/scalar.c
> @@ -686,6 +686,60 @@ static void dir_file_stats(struct strbuf *buf, const char *path)
>         closedir(dir);
>  }
>
> +static int count_files(char *path)
> +{
> +       DIR *dir = opendir(path);
> +       struct dirent *e;
> +       int count = 0;
> +
> +       if (!dir)
> +               return 0;
> +
> +       while ((e = readdir(dir)) != NULL)
> +               if (!is_dot_or_dotdot(e->d_name) && e->d_type == DT_REG)
> +                       count++;
> +
> +       closedir(dir);
> +       return count;
> +}
> +
> +static void loose_objs_stats(struct strbuf *buf, const char *path)
> +{
> +       DIR *dir = opendir(path);
> +       struct dirent *e;
> +       int count;
> +       int total = 0;
> +       unsigned char c;
> +       struct strbuf count_path = STRBUF_INIT;
> +       size_t base_path_len;
> +
> +       if (!dir)
> +               return;
> +
> +       strbuf_addstr(buf, "Object directory stats for ");
> +       strbuf_add_absolute_path(buf, path);
> +       strbuf_addstr(buf, ":\n");
> +
> +       strbuf_add_absolute_path(&count_path, path);
> +       strbuf_addch(&count_path, '/');
> +       base_path_len = count_path.len;
> +
> +       while ((e = readdir(dir)) != NULL)
> +               if (!is_dot_or_dotdot(e->d_name) &&
> +                   e->d_type == DT_DIR && strlen(e->d_name) == 2 &&
> +                   !hex_to_bytes(&c, e->d_name, 1)) {

You only recurse into directories, ignoring individual files.

> +                       strbuf_setlen(&count_path, base_path_len);
> +                       strbuf_addstr(&count_path, e->d_name);
> +                       total += (count = count_files(count_path.buf));
> +                       strbuf_addf(buf, "%s : %7d files\n", e->d_name, count);

This shows the number of files within a directory.

> +               }
> +
> +       strbuf_addf(buf, "Total: %d loose objects", total);

and this shows the total number of files across all the directories.

But the commit message suggested you also wanted to check for large
sizes of loose objects.  Did that get ripped out at some point with
the commit message not being updated, or is it perhaps going to be
included later?

> +
> +       strbuf_release(&count_path);
> +       closedir(dir);
> +}
> +
>  static int cmd_diagnose(int argc, const char **argv)
>  {
>         struct option options[] = {
> @@ -734,6 +788,12 @@ static int cmd_diagnose(int argc, const char **argv)
>         if ((res = stage(tmp_dir.buf, &buf, "packs-local.txt")))
>                 goto diagnose_cleanup;
>
> +       strbuf_reset(&buf);
> +       loose_objs_stats(&buf, ".git/objects");
> +
> +       if ((res = stage(tmp_dir.buf, &buf, "objects-local.txt")))
> +               goto diagnose_cleanup;
> +
>         if ((res = stage_directory(tmp_dir.buf, ".git", 0)) ||
>             (res = stage_directory(tmp_dir.buf, ".git/hooks", 0)) ||
>             (res = stage_directory(tmp_dir.buf, ".git/info", 0)) ||
> diff --git a/contrib/scalar/t/t9099-scalar.sh b/contrib/scalar/t/t9099-scalar.sh
> index b1745851e31..f2ec156d819 100755
> --- a/contrib/scalar/t/t9099-scalar.sh
> +++ b/contrib/scalar/t/t9099-scalar.sh
> @@ -77,6 +77,8 @@ test_expect_success UNZIP 'scalar diagnose' '
>         unzip -p "$zip_path" diagnostics.log >out &&
>         test_file_not_empty out &&
>         unzip -p "$zip_path" packs-local.txt >out &&
> +       test_file_not_empty out &&
> +       unzip -p "$zip_path" objects-local.txt >out &&
>         test_file_not_empty out
>  '
>
> --
> gitgitgadget

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH 1/5] Implement `scalar diagnose`
  2022-01-26  8:41 ` [PATCH 1/5] Implement `scalar diagnose` Johannes Schindelin via GitGitGadget
  2022-01-26  9:34   ` René Scharfe
@ 2022-01-27 19:38   ` Elijah Newren
  1 sibling, 0 replies; 109+ messages in thread
From: Elijah Newren @ 2022-01-27 19:38 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget
  Cc: Git Mailing List, Johannes Schindelin

On Wed, Jan 26, 2022 at 3:37 PM Johannes Schindelin via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>
> Over the course of Scalar's development, it became obvious that there is
> a need for a command that can gather all kinds of useful information
> that can help identify the most typical problems with large
> worktrees/repositories.
>
> The `diagnose` command is the culmination of this hard-won knowledge: it
> gathers the installed hooks, the config, a couple statistics describing
> the data shape, among other pieces of information, and then wraps
> everything up in a tidy, neat `.zip` archive.
>
> Note: originally, Scalar was implemented in C# using the .NET API, where
> we had the luxury of a comprehensive standard library that includes
> basic functionality such as writing a `.zip` file. In the C version, we
> lack such a commodity. Rather than introducing a dependency on, say,
> libzip, we slightly abuse Git's `archive` command: Instead of writing
> the `.zip` file directly, we stage the file contents in a Git index of a
> temporary, bare repository, only to let `git archive` have at it, and
> finally removing the temporary repository.
>
> Also note: Due to the frequently-spawned `git hash-object` processes,
> this command is quite a bit slow on Windows. Should it turn out to be a
> big problem, the lack of a batch mode of the `hash-object` command could
> potentially be worked around via using `git fast-import` with a crafted
> `stdin`.

hash-object and update-index processes, right?  You spawn one of each
for each object.

I was you investigate the fast-import idea because it gets rid of the
N hash-object processes, the N update-index processes, and the
write-tree process, instead giving you a single fast-import process as
a preliminary to calling out to git archive.  It'd also have the
advantage of providing just one pack instead of many loose objects.

But René's suggestion to use and extend archive's ability to handle
untracked files sounds like a better idea.

>
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
>  contrib/scalar/scalar.c          | 170 +++++++++++++++++++++++++++++++
>  contrib/scalar/scalar.txt        |  12 +++
>  contrib/scalar/t/t9099-scalar.sh |  13 +++
>  3 files changed, 195 insertions(+)
>
> diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
> index 1ce9c2b00e8..13f2b0f4d5a 100644
> --- a/contrib/scalar/scalar.c
> +++ b/contrib/scalar/scalar.c
> @@ -259,6 +259,108 @@ static int unregister_dir(void)
>         return res;
>  }
>
> +static int stage(const char *git_dir, struct strbuf *buf, const char *path)
> +{
> +       struct strbuf cacheinfo = STRBUF_INIT;
> +       struct child_process cp = CHILD_PROCESS_INIT;
> +       int res;
> +
> +       strbuf_addstr(&cacheinfo, "100644,");
> +
> +       cp.git_cmd = 1;
> +       strvec_pushl(&cp.args, "--git-dir", git_dir,
> +                    "hash-object", "-w", "--stdin", NULL);
> +       res = pipe_command(&cp, buf->buf, buf->len, &cacheinfo, 256, NULL, 0);
> +       if (!res) {
> +               strbuf_rtrim(&cacheinfo);
> +               strbuf_addch(&cacheinfo, ',');
> +               /* We cannot stage `.git`, use `_git` instead. */
> +               if (starts_with(path, ".git/"))
> +                       strbuf_addf(&cacheinfo, "_%s", path + 1);
> +               else
> +                       strbuf_addstr(&cacheinfo, path);
> +
> +               child_process_init(&cp);
> +               cp.git_cmd = 1;
> +               strvec_pushl(&cp.args, "--git-dir", git_dir,
> +                            "update-index", "--add", "--cacheinfo",
> +                            cacheinfo.buf, NULL);
> +               res = run_command(&cp);
> +       }
> +
> +       strbuf_release(&cacheinfo);
> +       return res;
> +}
> +
> +static int stage_file(const char *git_dir, const char *path)
> +{
> +       struct strbuf buf = STRBUF_INIT;
> +       int res;
> +
> +       if (strbuf_read_file(&buf, path, 0) < 0)
> +               return error(_("could not read '%s'"), path);
> +
> +       res = stage(git_dir, &buf, path);
> +
> +       strbuf_release(&buf);
> +       return res;
> +}
> +
> +static int stage_directory(const char *git_dir, const char *path, int recurse)
> +{
> +       int at_root = !*path;
> +       DIR *dir = opendir(at_root ? "." : path);
> +       struct dirent *e;
> +       struct strbuf buf = STRBUF_INIT;
> +       size_t len;
> +       int res = 0;
> +
> +       if (!dir)
> +               return error(_("could not open directory '%s'"), path);
> +
> +       if (!at_root)
> +               strbuf_addf(&buf, "%s/", path);
> +       len = buf.len;
> +
> +       while (!res && (e = readdir(dir))) {
> +               if (!strcmp(".", e->d_name) || !strcmp("..", e->d_name))
> +                       continue;
> +
> +               strbuf_setlen(&buf, len);
> +               strbuf_addstr(&buf, e->d_name);
> +
> +               if ((e->d_type == DT_REG && stage_file(git_dir, buf.buf)) ||
> +                   (e->d_type == DT_DIR && recurse &&
> +                    stage_directory(git_dir, buf.buf, recurse)))
> +                       res = -1;
> +       }
> +
> +       closedir(dir);
> +       strbuf_release(&buf);
> +       return res;
> +}
> +
> +static int index_to_zip(const char *git_dir)
> +{
> +       struct child_process cp = CHILD_PROCESS_INIT;
> +       struct strbuf oid = STRBUF_INIT;
> +
> +       cp.git_cmd = 1;
> +       strvec_pushl(&cp.args, "--git-dir", git_dir, "write-tree", NULL);
> +       if (pipe_command(&cp, NULL, 0, &oid, the_hash_algo->hexsz + 1,
> +                        NULL, 0))
> +               return error(_("could not write temporary tree object"));
> +
> +       strbuf_rtrim(&oid);
> +       child_process_init(&cp);
> +       cp.git_cmd = 1;
> +       strvec_pushl(&cp.args, "--git-dir", git_dir, "archive", "-o", NULL);
> +       strvec_pushf(&cp.args, "%s.zip", git_dir);
> +       strvec_pushl(&cp.args, oid.buf, "--", NULL);
> +       strbuf_release(&oid);
> +       return run_command(&cp);
> +}
> +
>  /* printf-style interface, expects `<key>=<value>` argument */
>  static int set_config(const char *fmt, ...)
>  {
> @@ -499,6 +601,73 @@ cleanup:
>         return res;
>  }
>
> +static int cmd_diagnose(int argc, const char **argv)
> +{
> +       struct option options[] = {
> +               OPT_END(),
> +       };
> +       const char * const usage[] = {
> +               N_("scalar diagnose [<enlistment>]"),
> +               NULL
> +       };
> +       struct strbuf tmp_dir = STRBUF_INIT;
> +       time_t now = time(NULL);
> +       struct tm tm;
> +       struct strbuf path = STRBUF_INIT, buf = STRBUF_INIT;
> +       int res = 0;
> +
> +       argc = parse_options(argc, argv, NULL, options,
> +                            usage, 0);
> +
> +       setup_enlistment_directory(argc, argv, usage, options, &buf);
> +
> +       strbuf_addstr(&buf, "/.scalarDiagnostics/scalar_");
> +       strbuf_addftime(&buf, "%Y%m%d_%H%M%S", localtime_r(&now, &tm), 0, 0);
> +       if (run_git("init", "-q", "-b", "dummy", "--bare", buf.buf, NULL)) {
> +               res = error(_("could not initialize temporary repository: %s"),
> +                           buf.buf);
> +               goto diagnose_cleanup;
> +       }
> +       strbuf_realpath(&tmp_dir, buf.buf, 1);
> +
> +       strbuf_reset(&buf);
> +       strbuf_addf(&buf, "Collecting diagnostic info into temp folder %s\n\n",
> +                   tmp_dir.buf);
> +
> +       get_version_info(&buf, 1);
> +
> +       strbuf_addf(&buf, "Enlistment root: %s\n", the_repository->worktree);
> +       fwrite(buf.buf, buf.len, 1, stdout);
> +
> +       if ((res = stage(tmp_dir.buf, &buf, "diagnostics.log")))
> +               goto diagnose_cleanup;
> +
> +       if ((res = stage_directory(tmp_dir.buf, ".git", 0)) ||
> +           (res = stage_directory(tmp_dir.buf, ".git/hooks", 0)) ||
> +           (res = stage_directory(tmp_dir.buf, ".git/info", 0)) ||
> +           (res = stage_directory(tmp_dir.buf, ".git/logs", 1)) ||
> +           (res = stage_directory(tmp_dir.buf, ".git/objects/info", 0)))
> +               goto diagnose_cleanup;
> +
> +       res = index_to_zip(tmp_dir.buf);
> +
> +       if (!res)
> +               res = remove_dir_recursively(&tmp_dir, 0);
> +
> +       if (!res)
> +               printf("\n"
> +                      "Diagnostics complete.\n"
> +                      "All of the gathered info is captured in '%s.zip'\n",
> +                      tmp_dir.buf);
> +
> +diagnose_cleanup:
> +       strbuf_release(&tmp_dir);
> +       strbuf_release(&path);
> +       strbuf_release(&buf);
> +
> +       return res;
> +}
> +
>  static int cmd_list(int argc, const char **argv)
>  {
>         if (argc != 1)
> @@ -800,6 +969,7 @@ static struct {
>         { "reconfigure", cmd_reconfigure },
>         { "delete", cmd_delete },
>         { "version", cmd_version },
> +       { "diagnose", cmd_diagnose },
>         { NULL, NULL},
>  };
>
> diff --git a/contrib/scalar/scalar.txt b/contrib/scalar/scalar.txt
> index f416d637289..22583fe046e 100644
> --- a/contrib/scalar/scalar.txt
> +++ b/contrib/scalar/scalar.txt
> @@ -14,6 +14,7 @@ scalar register [<enlistment>]
>  scalar unregister [<enlistment>]
>  scalar run ( all | config | commit-graph | fetch | loose-objects | pack-files ) [<enlistment>]
>  scalar reconfigure [ --all | <enlistment> ]
> +scalar diagnose [<enlistment>]
>  scalar delete <enlistment>
>
>  DESCRIPTION
> @@ -129,6 +130,17 @@ reconfigure the enlistment.
>  With the `--all` option, all enlistments currently registered with Scalar
>  will be reconfigured. Use this option after each Scalar upgrade.
>
> +Diagnose
> +~~~~~~~~
> +
> +diagnose [<enlistment>]::
> +    When reporting issues with Scalar, it is often helpful to provide the
> +    information gathered by this command, including logs and certain
> +    statistics describing the data shape of the current enlistment.
> ++
> +The output of this command is a `.zip` file that is written into
> +a directory adjacent to the worktree in the `src` directory.
> +
>  Delete
>  ~~~~~~
>
> diff --git a/contrib/scalar/t/t9099-scalar.sh b/contrib/scalar/t/t9099-scalar.sh
> index 2e1502ad45e..ecd06e207c2 100755
> --- a/contrib/scalar/t/t9099-scalar.sh
> +++ b/contrib/scalar/t/t9099-scalar.sh
> @@ -65,6 +65,19 @@ test_expect_success 'scalar clone' '
>         )
>  '
>
> +SQ="'"
> +test_expect_success UNZIP 'scalar diagnose' '
> +       scalar diagnose cloned >out &&
> +       sed -n "s/.*$SQ\\(.*\\.zip\\)$SQ.*/\\1/p" <out >zip_path &&
> +       zip_path=$(cat zip_path) &&
> +       test -n "$zip_path" &&
> +       unzip -v "$zip_path" &&
> +       folder=${zip_path%.zip} &&
> +       test_path_is_missing "$folder" &&
> +       unzip -p "$zip_path" diagnostics.log >out &&
> +       test_file_not_empty out
> +'
> +
>  test_expect_success 'scalar reconfigure' '
>         git init one/src &&
>         scalar register one &&
> --
> gitgitgadget
>

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH 0/5] scalar: implement the subcommand "diagnose"
  2022-01-27 15:19 ` [PATCH 0/5] scalar: implement the subcommand "diagnose" Derrick Stolee
@ 2022-02-06 21:13   ` Johannes Schindelin
  0 siblings, 0 replies; 109+ messages in thread
From: Johannes Schindelin @ 2022-02-06 21:13 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Johannes Schindelin via GitGitGadget, git, Emily Shaffer

Hi Stolee & Emily,

On Thu, 27 Jan 2022, Derrick Stolee wrote:

> On 1/26/2022 3:41 AM, Johannes Schindelin via GitGitGadget wrote:
> > Over the course of the years, we developed a sub-command that gathers
> > diagnostic data into a .zip file that can then be attached to bug reports.
> > This sub-command turned out to be very useful in helping Scalar developers
> > identify and fix issues.
>
> For historical context: The 'diagnose' command was implemented in VFS for
> Git and ported to the C# version of Scalar before 'git bugreport' existed,
> but they serve very similar purposes.
>
> I wonder if 'scalar diagnose' could include some of the information
> captured by 'git bugreport' or whether this implementation of 'diagnose'
> could help inform 'git bugreport' in any way.

Indeed, I think that the `bugreport` command could easily benefit from at
least the number of pack files and loose objects.

Ciao,
Dscho

>
> CC'ing Emily for thoughts.
>
> Thanks,
> -Stolee
>

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH 4/5] scalar: teach `diagnose` to gather loose objects information
  2022-01-27 18:59   ` Elijah Newren
@ 2022-02-06 21:25     ` Johannes Schindelin
  0 siblings, 0 replies; 109+ messages in thread
From: Johannes Schindelin @ 2022-02-06 21:25 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Matthew John Cheetham via GitGitGadget, Git Mailing List,
	Matthew John Cheetham

Hi Elijah,

On Thu, 27 Jan 2022, Elijah Newren wrote:

> On Wed, Jan 26, 2022 at 3:37 PM Matthew John Cheetham via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
> >
> > From: Matthew John Cheetham <mjcheetham@outlook.com>
> >
> > When operating at the scale that Scalar wants to support, certain data
> > shapes are more likely to cause undesirable performance issues, such as
> > large numbers or large sizes of loose objects.
>
> Makes sense.
>
> > By including statistics about this, `scalar diagnose` now makes it
> > easier to identify such scenarios.
> >
> > Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
> > ---
> >  contrib/scalar/scalar.c          | 60 ++++++++++++++++++++++++++++++++
> >  contrib/scalar/t/t9099-scalar.sh |  2 ++
> >  2 files changed, 62 insertions(+)
> >
> > diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
> > index 690933ffdf3..c0ad4948215 100644
> > --- a/contrib/scalar/scalar.c
> > +++ b/contrib/scalar/scalar.c
> > @@ -686,6 +686,60 @@ static void dir_file_stats(struct strbuf *buf, const char *path)
> >         closedir(dir);
> >  }
> >
> > +static int count_files(char *path)
> > +{
> > +       DIR *dir = opendir(path);
> > +       struct dirent *e;
> > +       int count = 0;
> > +
> > +       if (!dir)
> > +               return 0;
> > +
> > +       while ((e = readdir(dir)) != NULL)
> > +               if (!is_dot_or_dotdot(e->d_name) && e->d_type == DT_REG)
> > +                       count++;
> > +
> > +       closedir(dir);
> > +       return count;
> > +}
> > +
> > +static void loose_objs_stats(struct strbuf *buf, const char *path)
> > +{
> > +       DIR *dir = opendir(path);
> > +       struct dirent *e;
> > +       int count;
> > +       int total = 0;
> > +       unsigned char c;
> > +       struct strbuf count_path = STRBUF_INIT;
> > +       size_t base_path_len;
> > +
> > +       if (!dir)
> > +               return;
> > +
> > +       strbuf_addstr(buf, "Object directory stats for ");
> > +       strbuf_add_absolute_path(buf, path);
> > +       strbuf_addstr(buf, ":\n");
> > +
> > +       strbuf_add_absolute_path(&count_path, path);
> > +       strbuf_addch(&count_path, '/');
> > +       base_path_len = count_path.len;
> > +
> > +       while ((e = readdir(dir)) != NULL)
> > +               if (!is_dot_or_dotdot(e->d_name) &&
> > +                   e->d_type == DT_DIR && strlen(e->d_name) == 2 &&
> > +                   !hex_to_bytes(&c, e->d_name, 1)) {
>
> You only recurse into directories, ignoring individual files.
>
> > +                       strbuf_setlen(&count_path, base_path_len);
> > +                       strbuf_addstr(&count_path, e->d_name);
> > +                       total += (count = count_files(count_path.buf));
> > +                       strbuf_addf(buf, "%s : %7d files\n", e->d_name, count);
>
> This shows the number of files within a directory.
>
> > +               }
> > +
> > +       strbuf_addf(buf, "Total: %d loose objects", total);
>
> and this shows the total number of files across all the directories.
>
> But the commit message suggested you also wanted to check for large
> sizes of loose objects.  Did that get ripped out at some point with
> the commit message not being updated, or is it perhaps going to be
> included later?

No, there was no plan to include this information later, as the original
.NET implementation of `scalar diagnose` did not provide that information,
either (which I take as a strong sign that we never needed this type of
information to help users, at least not up until this point).

Besides, it would be kind of a difficult thing to say conclusively what
makes a loose file "big". Is it the zlib-compressed size on disk? Or the
unpacked size? Should there be a configurable threshold to determine when
an object is big? Should `core.bigFileThreshold` be co-opted for this?

Together with the fact that there was no need for this information in
practice, it makes me doubt that we should add this type of information. I
actually suspect that _iff_ information of that type would be helpful, a
more complete tool like git-sizer (https://github.com/github/git-sizer/)
would be needed, and I do not really want to subsume git-sizer's
functionality in `scalar diagnose`.

I rephrased the commit message.

Ciao,
Dscho

>
> > +
> > +       strbuf_release(&count_path);
> > +       closedir(dir);
> > +}
> > +
> >  static int cmd_diagnose(int argc, const char **argv)
> >  {
> >         struct option options[] = {
> > @@ -734,6 +788,12 @@ static int cmd_diagnose(int argc, const char **argv)
> >         if ((res = stage(tmp_dir.buf, &buf, "packs-local.txt")))
> >                 goto diagnose_cleanup;
> >
> > +       strbuf_reset(&buf);
> > +       loose_objs_stats(&buf, ".git/objects");
> > +
> > +       if ((res = stage(tmp_dir.buf, &buf, "objects-local.txt")))
> > +               goto diagnose_cleanup;
> > +
> >         if ((res = stage_directory(tmp_dir.buf, ".git", 0)) ||
> >             (res = stage_directory(tmp_dir.buf, ".git/hooks", 0)) ||
> >             (res = stage_directory(tmp_dir.buf, ".git/info", 0)) ||
> > diff --git a/contrib/scalar/t/t9099-scalar.sh b/contrib/scalar/t/t9099-scalar.sh
> > index b1745851e31..f2ec156d819 100755
> > --- a/contrib/scalar/t/t9099-scalar.sh
> > +++ b/contrib/scalar/t/t9099-scalar.sh
> > @@ -77,6 +77,8 @@ test_expect_success UNZIP 'scalar diagnose' '
> >         unzip -p "$zip_path" diagnostics.log >out &&
> >         test_file_not_empty out &&
> >         unzip -p "$zip_path" packs-local.txt >out &&
> > +       test_file_not_empty out &&
> > +       unzip -p "$zip_path" objects-local.txt >out &&
> >         test_file_not_empty out
> >  '
> >
> > --
> > gitgitgadget
>

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH 1/5] Implement `scalar diagnose`
  2022-01-26 22:20     ` Taylor Blau
@ 2022-02-06 21:34       ` Johannes Schindelin
  0 siblings, 0 replies; 109+ messages in thread
From: Johannes Schindelin @ 2022-02-06 21:34 UTC (permalink / raw)
  To: Taylor Blau; +Cc: René Scharfe, Johannes Schindelin via GitGitGadget, git

[-- Attachment #1: Type: text/plain, Size: 2461 bytes --]

Hi René & Taylor,

On Wed, 26 Jan 2022, Taylor Blau wrote:

> On Wed, Jan 26, 2022 at 10:34:04AM +0100, René Scharfe wrote:
> > Am 26.01.22 um 09:41 schrieb Johannes Schindelin via GitGitGadget:
> > > Note: originally, Scalar was implemented in C# using the .NET API, where
> > > we had the luxury of a comprehensive standard library that includes
> > > basic functionality such as writing a `.zip` file. In the C version, we
> > > lack such a commodity. Rather than introducing a dependency on, say,
> > > libzip, we slightly abuse Git's `archive` command: Instead of writing
> > > the `.zip` file directly, we stage the file contents in a Git index of a
> > > temporary, bare repository, only to let `git archive` have at it, and
> > > finally removing the temporary repository.
> >
> > git archive allows you to include untracked files in an archive with its
> > option --add-file.  You can see an example in Git's Makefile; search for
> > GIT_ARCHIVE_EXTRA_FILES.  It still requires a tree argument, but the
> > empty tree object should suffice if you don't want to include any
> > tracked files.  It doesn't currently support streaming, though, i.e.
> > files are fully read into memory, so it's impractical for huge ones.

That's a good point.

I did not want to invent any `fast-import`-like streaming protocol just
for the sake of supporting the "funny" use case of `scalar diagnose`, so I
invented a new option `--add-file-with-content=<path>:<content>` (with the
obvious limitation that the `<path>` cannot contain any colon, if that is
desired, users will still need to write out untracked files).

> Using `--add-file` would likely be preferable to setting up a temporary
> repository just to invoke `git archive` in it. Johannes would be the
> expert to ask whether or not big files are going to be a problem here
> (based on a cursory scan of the new functions in scalar.c, I don't
> expect this to be the case).

Indeed, it is unlikely that any large files are included.

> The new stage_directory() function _could_ add `--add-file` arguments in
> a loop around readdir(), but it might also be nice to add a new
> `--add-directory` function to `git archive` which would do the "heavy"
> lifting for us.

I went one step further and used `write_archive()` to do the
heavy-lifting. That way, we truly avoid spawning any separate process let
alone creating any throw-away repository.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH 3/5] scalar: teach `diagnose` to gather packfile info
  2022-01-27 15:14     ` Derrick Stolee
@ 2022-02-06 21:38       ` Johannes Schindelin
  0 siblings, 0 replies; 109+ messages in thread
From: Johannes Schindelin @ 2022-02-06 21:38 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Taylor Blau, Matthew John Cheetham via GitGitGadget, git,
	Matthew John Cheetham

Hi Stolee & Taylor,

On Thu, 27 Jan 2022, Derrick Stolee wrote:

> On 1/26/2022 5:43 PM, Taylor Blau wrote:
> > On Wed, Jan 26, 2022 at 08:41:45AM +0000, Matthew John Cheetham via GitGitGadget wrote:
> >> From: Matthew John Cheetham <mjcheetham@outlook.com>
> >>
> >> Teach the `scalar diagnose` command to gather file size information
> >> about pack files.
> >>
> >> Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
> >> ---
> >>  contrib/scalar/scalar.c          | 39 ++++++++++++++++++++++++++++++++
> >>  contrib/scalar/t/t9099-scalar.sh |  2 ++
> >>  2 files changed, 41 insertions(+)
> >>
> >> diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
> >> index e26fb2fc018..690933ffdf3 100644
> >> --- a/contrib/scalar/scalar.c
> >> +++ b/contrib/scalar/scalar.c
> >> @@ -653,6 +653,39 @@ cleanup:
> >>  	return res;
> >>  }
> >>
> >> +static void dir_file_stats(struct strbuf *buf, const char *path)
> >> +{
> >> +	DIR *dir = opendir(path);
> >> +	struct dirent *e;
> >> +	struct stat e_stat;
> >> +	struct strbuf file_path = STRBUF_INIT;
> >> +	size_t base_path_len;
> >> +
> >> +	if (!dir)
> >> +		return;
> >> +
> >> +	strbuf_addstr(buf, "Contents of ");
> >> +	strbuf_add_absolute_path(buf, path);
> >> +	strbuf_addstr(buf, ":\n");
> >> +
> >> +	strbuf_add_absolute_path(&file_path, path);
> >> +	strbuf_addch(&file_path, '/');
> >> +	base_path_len = file_path.len;
> >> +
> >> +	while ((e = readdir(dir)) != NULL)
> >
> > Hmm. Is there a reason that this couldn't use
> > for_each_file_in_pack_dir() with a callback that just does the stat()
> > and buffer manipulation?
> >
> > I don't think it's critical either way, but it would eliminate some of
> > the boilerplate that is shared between this implementation and the one
> > that already exists in for_each_file_in_pack_dir().
>
> It's helpful to see if there are other crud files in the pack
> directory. This method is also extended in microsoft/git to
> scan the alternates directory (which we expect to exist as the
> "shared objects cache).
>
> We might want to modify the implementation in this series to
> run dir_file_stats() on each odb in the_repository. This would
> give us the data for the shared object cache for free while
> being more general to other Git repos. (It would require us to
> do some reaction work in microsoft/git and be a change of
> behavior, but we are the only ones who have looked at these
> diagnose files before, so that change will be easy to manage.)

Good points all around. I went with the `for_each_file_in_pack_dir()`
approach, and threw in the now very simple change to also enumerate the
alternates, if there are any.

And yes, that will require some reaction work in microsoft/git, but for an
obvious improvement like this one, I don't grumble about the extra burden.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH v2 0/6] scalar: implement the subcommand "diagnose"
  2022-01-26  8:41 [PATCH 0/5] scalar: implement the subcommand "diagnose" Johannes Schindelin via GitGitGadget
                   ` (5 preceding siblings ...)
  2022-01-27 15:19 ` [PATCH 0/5] scalar: implement the subcommand "diagnose" Derrick Stolee
@ 2022-02-06 22:39 ` Johannes Schindelin via GitGitGadget
  2022-02-06 22:39   ` [PATCH v2 1/6] archive: optionally add "virtual" files Johannes Schindelin via GitGitGadget
                     ` (6 more replies)
  6 siblings, 7 replies; 109+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2022-02-06 22:39 UTC (permalink / raw)
  To: git
  Cc: René Scharfe, Taylor Blau, Derrick Stolee, Elijah Newren,
	Johannes Schindelin

Over the course of the years, we developed a sub-command that gathers
diagnostic data into a .zip file that can then be attached to bug reports.
This sub-command turned out to be very useful in helping Scalar developers
identify and fix issues.

Changes since v1:

 * Instead of creating a throw-away repository, staging the contents of the
   .zip file and then using git write-tree and git archive to write the .zip
   file, the patch series now introduces a new option to git archive and
   uses write_archive() directly (avoiding any separate process).
 * Since the command avoids separate processes, it is now blazing fast on
   Windows, and I dropped the spinner() function because it's no longer
   needed.
 * While reworking the test case, I noticed that scalar [...] <enlistment>
   failed to verify that the specified directory exists, and would happily
   "traverse to its parent directory" on its quest to find a Scalar
   enlistment. That is of course incorrect, and has been fixed as a "while
   at it" sort of preparatory commit.
 * I had forgotten to sign off on all the commits, which has been fixed.
 * Instead of some "home-grown" readdir()-based function, the code now uses
   for_each_file_in_pack_dir() to look through the pack directories.
 * If any alternates are configured, their pack directories are now included
   in the output.
 * The commit message that might be interpreted to promise information about
   large loose files has been corrected to no longer promise that.
 * The test cases have been adjusted to test a little bit more (e.g.
   verifying that specific paths are mentioned in the output, instead of
   merely verifying that the output is non-empty).

Johannes Schindelin (4):
  archive: optionally add "virtual" files
  scalar: validate the optional enlistment argument
  Implement `scalar diagnose`
  scalar diagnose: include disk space information

Matthew John Cheetham (2):
  scalar: teach `diagnose` to gather packfile info
  scalar: teach `diagnose` to gather loose objects information

 Documentation/git-archive.txt    |  11 ++
 archive.c                        |  51 +++++-
 contrib/scalar/scalar.c          | 291 ++++++++++++++++++++++++++++++-
 contrib/scalar/scalar.txt        |  12 ++
 contrib/scalar/t/t9099-scalar.sh |  27 +++
 t/t5003-archive-zip.sh           |  12 ++
 6 files changed, 394 insertions(+), 10 deletions(-)


base-commit: ddc35d833dd6f9e8946b09cecd3311b8aa18d295
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1128%2Fdscho%2Fscalar-diagnose-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1128/dscho/scalar-diagnose-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/1128

Range-diff vs v1:

 -:  ----------- > 1:  49ff3c1f2b3 archive: optionally add "virtual" files
 -:  ----------- > 2:  600da8d465e scalar: validate the optional enlistment argument
 1:  ce85506e7a4 ! 3:  0d570137bb6 Implement `scalar diagnose`
     @@ Commit message
          Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
      
       ## contrib/scalar/scalar.c ##
     +@@
     + #include "dir.h"
     + #include "packfile.h"
     + #include "help.h"
     ++#include "archive.h"
     + 
     + /*
     +  * Remove the deepest subdirectory in the provided path string. Path must not
      @@ contrib/scalar/scalar.c: static int unregister_dir(void)
       	return res;
       }
       
     -+static int stage(const char *git_dir, struct strbuf *buf, const char *path)
     -+{
     -+	struct strbuf cacheinfo = STRBUF_INIT;
     -+	struct child_process cp = CHILD_PROCESS_INIT;
     -+	int res;
     -+
     -+	strbuf_addstr(&cacheinfo, "100644,");
     -+
     -+	cp.git_cmd = 1;
     -+	strvec_pushl(&cp.args, "--git-dir", git_dir,
     -+		     "hash-object", "-w", "--stdin", NULL);
     -+	res = pipe_command(&cp, buf->buf, buf->len, &cacheinfo, 256, NULL, 0);
     -+	if (!res) {
     -+		strbuf_rtrim(&cacheinfo);
     -+		strbuf_addch(&cacheinfo, ',');
     -+		/* We cannot stage `.git`, use `_git` instead. */
     -+		if (starts_with(path, ".git/"))
     -+			strbuf_addf(&cacheinfo, "_%s", path + 1);
     -+		else
     -+			strbuf_addstr(&cacheinfo, path);
     -+
     -+		child_process_init(&cp);
     -+		cp.git_cmd = 1;
     -+		strvec_pushl(&cp.args, "--git-dir", git_dir,
     -+			     "update-index", "--add", "--cacheinfo",
     -+			     cacheinfo.buf, NULL);
     -+		res = run_command(&cp);
     -+	}
     -+
     -+	strbuf_release(&cacheinfo);
     -+	return res;
     -+}
     -+
     -+static int stage_file(const char *git_dir, const char *path)
     -+{
     -+	struct strbuf buf = STRBUF_INIT;
     -+	int res;
     -+
     -+	if (strbuf_read_file(&buf, path, 0) < 0)
     -+		return error(_("could not read '%s'"), path);
     -+
     -+	res = stage(git_dir, &buf, path);
     -+
     -+	strbuf_release(&buf);
     -+	return res;
     -+}
     -+
     -+static int stage_directory(const char *git_dir, const char *path, int recurse)
     ++static int add_directory_to_archiver(struct strvec *archiver_args,
     ++					  const char *path, int recurse)
      +{
      +	int at_root = !*path;
      +	DIR *dir = opendir(at_root ? "." : path);
     @@ contrib/scalar/scalar.c: static int unregister_dir(void)
      +	if (!at_root)
      +		strbuf_addf(&buf, "%s/", path);
      +	len = buf.len;
     ++	strvec_pushf(archiver_args, "--prefix=%s", buf.buf);
      +
      +	while (!res && (e = readdir(dir))) {
      +		if (!strcmp(".", e->d_name) || !strcmp("..", e->d_name))
     @@ contrib/scalar/scalar.c: static int unregister_dir(void)
      +		strbuf_setlen(&buf, len);
      +		strbuf_addstr(&buf, e->d_name);
      +
     -+		if ((e->d_type == DT_REG && stage_file(git_dir, buf.buf)) ||
     -+		    (e->d_type == DT_DIR && recurse &&
     -+		     stage_directory(git_dir, buf.buf, recurse)))
     ++		if (e->d_type == DT_REG)
     ++			strvec_pushf(archiver_args, "--add-file=%s", buf.buf);
     ++		else if (e->d_type != DT_DIR)
      +			res = -1;
     ++		else if (recurse)
     ++		     add_directory_to_archiver(archiver_args, buf.buf, recurse);
      +	}
      +
      +	closedir(dir);
      +	strbuf_release(&buf);
      +	return res;
      +}
     -+
     -+static int index_to_zip(const char *git_dir)
     -+{
     -+	struct child_process cp = CHILD_PROCESS_INIT;
     -+	struct strbuf oid = STRBUF_INIT;
     -+
     -+	cp.git_cmd = 1;
     -+	strvec_pushl(&cp.args, "--git-dir", git_dir, "write-tree", NULL);
     -+	if (pipe_command(&cp, NULL, 0, &oid, the_hash_algo->hexsz + 1,
     -+			 NULL, 0))
     -+		return error(_("could not write temporary tree object"));
     -+
     -+	strbuf_rtrim(&oid);
     -+	child_process_init(&cp);
     -+	cp.git_cmd = 1;
     -+	strvec_pushl(&cp.args, "--git-dir", git_dir, "archive", "-o", NULL);
     -+	strvec_pushf(&cp.args, "%s.zip", git_dir);
     -+	strvec_pushl(&cp.args, oid.buf, "--", NULL);
     -+	strbuf_release(&oid);
     -+	return run_command(&cp);
     -+}
      +
       /* printf-style interface, expects `<key>=<value>` argument */
       static int set_config(const char *fmt, ...)
     @@ contrib/scalar/scalar.c: cleanup:
      +		N_("scalar diagnose [<enlistment>]"),
      +		NULL
      +	};
     -+	struct strbuf tmp_dir = STRBUF_INIT;
     ++	struct strbuf zip_path = STRBUF_INIT;
     ++	struct strvec archiver_args = STRVEC_INIT;
     ++	char **argv_copy = NULL;
     ++	int stdout_fd = -1, archiver_fd = -1;
      +	time_t now = time(NULL);
      +	struct tm tm;
      +	struct strbuf path = STRBUF_INIT, buf = STRBUF_INIT;
     ++	size_t off;
      +	int res = 0;
      +
      +	argc = parse_options(argc, argv, NULL, options,
      +			     usage, 0);
      +
     -+	setup_enlistment_directory(argc, argv, usage, options, &buf);
     ++	setup_enlistment_directory(argc, argv, usage, options, &zip_path);
     ++
     ++	strbuf_addstr(&zip_path, "/.scalarDiagnostics/scalar_");
     ++	strbuf_addftime(&zip_path,
     ++			"%Y%m%d_%H%M%S", localtime_r(&now, &tm), 0, 0);
     ++	strbuf_addstr(&zip_path, ".zip");
     ++	switch (safe_create_leading_directories(zip_path.buf)) {
     ++	case SCLD_EXISTS:
     ++	case SCLD_OK:
     ++		break;
     ++	default:
     ++		error_errno(_("could not create directory for '%s'"),
     ++			    zip_path.buf);
     ++		goto diagnose_cleanup;
     ++	}
     ++	stdout_fd = dup(1);
     ++	if (stdout_fd < 0) {
     ++		res = error_errno(_("could not duplicate stdout"));
     ++		goto diagnose_cleanup;
     ++	}
      +
     -+	strbuf_addstr(&buf, "/.scalarDiagnostics/scalar_");
     -+	strbuf_addftime(&buf, "%Y%m%d_%H%M%S", localtime_r(&now, &tm), 0, 0);
     -+	if (run_git("init", "-q", "-b", "dummy", "--bare", buf.buf, NULL)) {
     -+		res = error(_("could not initialize temporary repository: %s"),
     -+			    buf.buf);
     ++	archiver_fd = xopen(zip_path.buf, O_CREAT | O_WRONLY | O_TRUNC, 0666);
     ++	if (archiver_fd < 0 || dup2(archiver_fd, 1) < 0) {
     ++		res = error_errno(_("could not redirect output"));
      +		goto diagnose_cleanup;
      +	}
     -+	strbuf_realpath(&tmp_dir, buf.buf, 1);
      +
     -+	strbuf_reset(&buf);
     -+	strbuf_addf(&buf, "Collecting diagnostic info into temp folder %s\n\n",
     -+		    tmp_dir.buf);
     ++	init_zip_archiver();
     ++	strvec_pushl(&archiver_args, "scalar-diagnose", "--format=zip", NULL);
      +
     ++	strbuf_reset(&buf);
     ++	strbuf_addstr(&buf,
     ++		      "--add-file-with-content=diagnostics.log:"
     ++		      "Collecting diagnostic info\n\n");
      +	get_version_info(&buf, 1);
      +
      +	strbuf_addf(&buf, "Enlistment root: %s\n", the_repository->worktree);
     -+	fwrite(buf.buf, buf.len, 1, stdout);
     -+
     -+	if ((res = stage(tmp_dir.buf, &buf, "diagnostics.log")))
     -+		goto diagnose_cleanup;
     -+
     -+	if ((res = stage_directory(tmp_dir.buf, ".git", 0)) ||
     -+	    (res = stage_directory(tmp_dir.buf, ".git/hooks", 0)) ||
     -+	    (res = stage_directory(tmp_dir.buf, ".git/info", 0)) ||
     -+	    (res = stage_directory(tmp_dir.buf, ".git/logs", 1)) ||
     -+	    (res = stage_directory(tmp_dir.buf, ".git/objects/info", 0)))
     ++	off = strchr(buf.buf, ':') + 1 - buf.buf;
     ++	write_or_die(stdout_fd, buf.buf + off, buf.len - off);
     ++	strvec_push(&archiver_args, buf.buf);
     ++
     ++	if ((res = add_directory_to_archiver(&archiver_args, ".git", 0)) ||
     ++	    (res = add_directory_to_archiver(&archiver_args, ".git/hooks", 0)) ||
     ++	    (res = add_directory_to_archiver(&archiver_args, ".git/info", 0)) ||
     ++	    (res = add_directory_to_archiver(&archiver_args, ".git/logs", 1)) ||
     ++	    (res = add_directory_to_archiver(&archiver_args, ".git/objects/info", 0)))
      +		goto diagnose_cleanup;
      +
     -+	res = index_to_zip(tmp_dir.buf);
     ++	strvec_pushl(&archiver_args, "--prefix=",
     ++		     oid_to_hex(the_hash_algo->empty_tree), "--", NULL);
      +
     -+	if (!res)
     -+		res = remove_dir_recursively(&tmp_dir, 0);
     ++	/* `write_archive()` modifies the `argv` passed to it. Let it. */
     ++	argv_copy = xmemdupz(archiver_args.v,
     ++			     sizeof(char *) * archiver_args.nr);
     ++	res = write_archive(archiver_args.nr, (const char **)argv_copy, NULL,
     ++			    the_repository, NULL, 0);
     ++	if (res) {
     ++		error(_("failed to write archive"));
     ++		goto diagnose_cleanup;
     ++	}
      +
      +	if (!res)
      +		printf("\n"
      +		       "Diagnostics complete.\n"
     -+		       "All of the gathered info is captured in '%s.zip'\n",
     -+		       tmp_dir.buf);
     ++		       "All of the gathered info is captured in '%s'\n",
     ++		       zip_path.buf);
      +
      +diagnose_cleanup:
     -+	strbuf_release(&tmp_dir);
     ++	if (archiver_fd >= 0) {
     ++		close(1);
     ++		dup2(stdout_fd, 1);
     ++	}
     ++	free(argv_copy);
     ++	strvec_clear(&archiver_args);
     ++	strbuf_release(&zip_path);
      +	strbuf_release(&path);
      +	strbuf_release(&buf);
      +
     @@ contrib/scalar/scalar.txt: reconfigure the enlistment.
       
      
       ## contrib/scalar/t/t9099-scalar.sh ##
     -@@ contrib/scalar/t/t9099-scalar.sh: test_expect_success 'scalar clone' '
     - 	)
     +@@ contrib/scalar/t/t9099-scalar.sh: test_expect_success '`scalar [...] <dir>` errors out when dir is missing' '
     + 	grep "cloned. does not exist" err
       '
       
      +SQ="'"
      +test_expect_success UNZIP 'scalar diagnose' '
     ++	scalar clone "file://$(pwd)" cloned --single-branch &&
      +	scalar diagnose cloned >out &&
      +	sed -n "s/.*$SQ\\(.*\\.zip\\)$SQ.*/\\1/p" <out >zip_path &&
      +	zip_path=$(cat zip_path) &&
     @@ contrib/scalar/t/t9099-scalar.sh: test_expect_success 'scalar clone' '
      +	test_file_not_empty out
      +'
      +
     - test_expect_success 'scalar reconfigure' '
     - 	git init one/src &&
     - 	scalar register one &&
     + test_done
 2:  f8885b27502 ! 4:  938e38b5a09 scalar diagnose: include disk space information
     @@ Commit message
          Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
      
       ## contrib/scalar/scalar.c ##
     -@@ contrib/scalar/scalar.c: static int index_to_zip(const char *git_dir)
     - 	return run_command(&cp);
     +@@ contrib/scalar/scalar.c: static int add_directory_to_archiver(struct strvec *archiver_args,
     + 	return res;
       }
       
      +#ifndef WIN32
     @@ contrib/scalar/scalar.c: static int cmd_diagnose(int argc, const char **argv)
       
       	strbuf_addf(&buf, "Enlistment root: %s\n", the_repository->worktree);
      +	get_disk_info(&buf);
     - 	fwrite(buf.buf, buf.len, 1, stdout);
     - 
     - 	if ((res = stage(tmp_dir.buf, &buf, "diagnostics.log")))
     + 	off = strchr(buf.buf, ':') + 1 - buf.buf;
     + 	write_or_die(stdout_fd, buf.buf + off, buf.len - off);
     + 	strvec_push(&archiver_args, buf.buf);
     +
     + ## contrib/scalar/t/t9099-scalar.sh ##
     +@@ contrib/scalar/t/t9099-scalar.sh: SQ="'"
     + test_expect_success UNZIP 'scalar diagnose' '
     + 	scalar clone "file://$(pwd)" cloned --single-branch &&
     + 	scalar diagnose cloned >out &&
     ++	grep "Available space" out &&
     + 	sed -n "s/.*$SQ\\(.*\\.zip\\)$SQ.*/\\1/p" <out >zip_path &&
     + 	zip_path=$(cat zip_path) &&
     + 	test -n "$zip_path" &&
 3:  330b36de799 ! 5:  bd9428919fa scalar: teach `diagnose` to gather packfile info
     @@ Metadata
       ## Commit message ##
          scalar: teach `diagnose` to gather packfile info
      
     -    Teach the `scalar diagnose` command to gather file size information
     -    about pack files.
     +    It's helpful to see if there are other crud files in the pack
     +    directory. Let's teach the `scalar diagnose` command to gather
     +    file size information about pack files.
     +
     +    While at it, also enumerate the pack files in the alternate
     +    object directories, if any are registered.
      
          Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
     +    Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
      
       ## contrib/scalar/scalar.c ##
     +@@
     + #include "packfile.h"
     + #include "help.h"
     + #include "archive.h"
     ++#include "object-store.h"
     + 
     + /*
     +  * Remove the deepest subdirectory in the provided path string. Path must not
      @@ contrib/scalar/scalar.c: cleanup:
       	return res;
       }
       
     -+static void dir_file_stats(struct strbuf *buf, const char *path)
     ++static void dir_file_stats_objects(const char *full_path, size_t full_path_len,
     ++				   const char *file_name, void *data)
      +{
     -+	DIR *dir = opendir(path);
     -+	struct dirent *e;
     -+	struct stat e_stat;
     -+	struct strbuf file_path = STRBUF_INIT;
     -+	size_t base_path_len;
     ++	struct strbuf *buf = data;
     ++	struct stat st;
      +
     -+	if (!dir)
     -+		return;
     ++	if (!stat(full_path, &st))
     ++		strbuf_addf(buf, "%-70s %16" PRIuMAX "\n", file_name,
     ++			    (uintmax_t)st.st_size);
     ++}
      +
     -+	strbuf_addstr(buf, "Contents of ");
     -+	strbuf_add_absolute_path(buf, path);
     -+	strbuf_addstr(buf, ":\n");
     ++static int dir_file_stats(struct object_directory *object_dir, void *data)
     ++{
     ++	struct strbuf *buf = data;
      +
     -+	strbuf_add_absolute_path(&file_path, path);
     -+	strbuf_addch(&file_path, '/');
     -+	base_path_len = file_path.len;
     ++	strbuf_addf(buf, "Contents of %s:\n", object_dir->path);
      +
     -+	while ((e = readdir(dir)) != NULL)
     -+		if (!is_dot_or_dotdot(e->d_name) && e->d_type == DT_REG) {
     -+			strbuf_setlen(&file_path, base_path_len);
     -+			strbuf_addstr(&file_path, e->d_name);
     -+			if (!stat(file_path.buf, &e_stat))
     -+				strbuf_addf(buf, "%-70s %16"PRIuMAX"\n",
     -+					    e->d_name,
     -+					    (uintmax_t)e_stat.st_size);
     -+		}
     ++	for_each_file_in_pack_dir(object_dir->path, dir_file_stats_objects,
     ++				  data);
      +
     -+	strbuf_release(&file_path);
     -+	closedir(dir);
     ++	return 0;
      +}
      +
       static int cmd_diagnose(int argc, const char **argv)
       {
       	struct option options[] = {
      @@ contrib/scalar/scalar.c: static int cmd_diagnose(int argc, const char **argv)
     - 	if ((res = stage(tmp_dir.buf, &buf, "diagnostics.log")))
     - 		goto diagnose_cleanup;
     + 	write_or_die(stdout_fd, buf.buf + off, buf.len - off);
     + 	strvec_push(&archiver_args, buf.buf);
       
      +	strbuf_reset(&buf);
     -+	dir_file_stats(&buf, ".git/objects/pack");
     -+
     -+	if ((res = stage(tmp_dir.buf, &buf, "packs-local.txt")))
     -+		goto diagnose_cleanup;
     ++	strbuf_addstr(&buf, "--add-file-with-content=packs-local.txt:");
     ++	dir_file_stats(the_repository->objects->odb, &buf);
     ++	foreach_alt_odb(dir_file_stats, &buf);
     ++	strvec_push(&archiver_args, buf.buf);
      +
     - 	if ((res = stage_directory(tmp_dir.buf, ".git", 0)) ||
     - 	    (res = stage_directory(tmp_dir.buf, ".git/hooks", 0)) ||
     - 	    (res = stage_directory(tmp_dir.buf, ".git/info", 0)) ||
     + 	if ((res = add_directory_to_archiver(&archiver_args, ".git", 0)) ||
     + 	    (res = add_directory_to_archiver(&archiver_args, ".git/hooks", 0)) ||
     + 	    (res = add_directory_to_archiver(&archiver_args, ".git/info", 0)) ||
      
       ## contrib/scalar/t/t9099-scalar.sh ##
     +@@ contrib/scalar/t/t9099-scalar.sh: test_expect_success '`scalar [...] <dir>` errors out when dir is missing' '
     + SQ="'"
     + test_expect_success UNZIP 'scalar diagnose' '
     + 	scalar clone "file://$(pwd)" cloned --single-branch &&
     ++	git repack &&
     ++	echo "$(pwd)/.git/objects/" >>cloned/src/.git/objects/info/alternates &&
     + 	scalar diagnose cloned >out &&
     + 	grep "Available space" out &&
     + 	sed -n "s/.*$SQ\\(.*\\.zip\\)$SQ.*/\\1/p" <out >zip_path &&
      @@ contrib/scalar/t/t9099-scalar.sh: test_expect_success UNZIP 'scalar diagnose' '
       	folder=${zip_path%.zip} &&
       	test_path_is_missing "$folder" &&
       	unzip -p "$zip_path" diagnostics.log >out &&
     +-	test_file_not_empty out
      +	test_file_not_empty out &&
      +	unzip -p "$zip_path" packs-local.txt >out &&
     - 	test_file_not_empty out
     ++	grep "$(pwd)/.git/objects" out
       '
       
     + test_done
 4:  213f2c94b73 ! 6:  7a8875be425 scalar: teach `diagnose` to gather loose objects information
     @@ Commit message
      
          When operating at the scale that Scalar wants to support, certain data
          shapes are more likely to cause undesirable performance issues, such as
     -    large numbers or large sizes of loose objects.
     +    large numbers of loose objects.
      
          By including statistics about this, `scalar diagnose` now makes it
          easier to identify such scenarios.
      
          Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
     +    Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
      
       ## contrib/scalar/scalar.c ##
     -@@ contrib/scalar/scalar.c: static void dir_file_stats(struct strbuf *buf, const char *path)
     - 	closedir(dir);
     +@@ contrib/scalar/scalar.c: static int dir_file_stats(struct object_directory *object_dir, void *data)
     + 	return 0;
       }
       
      +static int count_files(char *path)
     @@ contrib/scalar/scalar.c: static void dir_file_stats(struct strbuf *buf, const ch
       {
       	struct option options[] = {
      @@ contrib/scalar/scalar.c: static int cmd_diagnose(int argc, const char **argv)
     - 	if ((res = stage(tmp_dir.buf, &buf, "packs-local.txt")))
     - 		goto diagnose_cleanup;
     + 	foreach_alt_odb(dir_file_stats, &buf);
     + 	strvec_push(&archiver_args, buf.buf);
       
      +	strbuf_reset(&buf);
     ++	strbuf_addstr(&buf, "--add-file-with-content=objects-local.txt:");
      +	loose_objs_stats(&buf, ".git/objects");
     ++	strvec_push(&archiver_args, buf.buf);
      +
     -+	if ((res = stage(tmp_dir.buf, &buf, "objects-local.txt")))
     -+		goto diagnose_cleanup;
     -+
     - 	if ((res = stage_directory(tmp_dir.buf, ".git", 0)) ||
     - 	    (res = stage_directory(tmp_dir.buf, ".git/hooks", 0)) ||
     - 	    (res = stage_directory(tmp_dir.buf, ".git/info", 0)) ||
     + 	if ((res = add_directory_to_archiver(&archiver_args, ".git", 0)) ||
     + 	    (res = add_directory_to_archiver(&archiver_args, ".git/hooks", 0)) ||
     + 	    (res = add_directory_to_archiver(&archiver_args, ".git/info", 0)) ||
      
       ## contrib/scalar/t/t9099-scalar.sh ##
     +@@ contrib/scalar/t/t9099-scalar.sh: test_expect_success UNZIP 'scalar diagnose' '
     + 	scalar clone "file://$(pwd)" cloned --single-branch &&
     + 	git repack &&
     + 	echo "$(pwd)/.git/objects/" >>cloned/src/.git/objects/info/alternates &&
     ++	test_commit -C cloned/src loose &&
     + 	scalar diagnose cloned >out &&
     + 	grep "Available space" out &&
     + 	sed -n "s/.*$SQ\\(.*\\.zip\\)$SQ.*/\\1/p" <out >zip_path &&
      @@ contrib/scalar/t/t9099-scalar.sh: test_expect_success UNZIP 'scalar diagnose' '
       	unzip -p "$zip_path" diagnostics.log >out &&
       	test_file_not_empty out &&
       	unzip -p "$zip_path" packs-local.txt >out &&
     -+	test_file_not_empty out &&
     +-	grep "$(pwd)/.git/objects" out
     ++	grep "$(pwd)/.git/objects" out &&
      +	unzip -p "$zip_path" objects-local.txt >out &&
     - 	test_file_not_empty out
     ++	grep "^Total: [1-9]" out
       '
       
     + test_done
 5:  3a2cdce554a < -:  ----------- scalar diagnose: show a spinner while staging content

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH v2 1/6] archive: optionally add "virtual" files
  2022-02-06 22:39 ` [PATCH v2 0/6] " Johannes Schindelin via GitGitGadget
@ 2022-02-06 22:39   ` Johannes Schindelin via GitGitGadget
  2022-02-07 19:55     ` René Scharfe
  2022-02-06 22:39   ` [PATCH v2 2/6] scalar: validate the optional enlistment argument Johannes Schindelin via GitGitGadget
                     ` (5 subsequent siblings)
  6 siblings, 1 reply; 109+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2022-02-06 22:39 UTC (permalink / raw)
  To: git
  Cc: René Scharfe, Taylor Blau, Derrick Stolee, Elijah Newren,
	Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

With the `--add-file-with-content=<path>:<content>` option, `git
archive` now supports use cases where relatively trivial files need to
be added that do not exist on disk.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 Documentation/git-archive.txt | 11 ++++++++
 archive.c                     | 51 +++++++++++++++++++++++++++++------
 t/t5003-archive-zip.sh        | 12 +++++++++
 3 files changed, 66 insertions(+), 8 deletions(-)

diff --git a/Documentation/git-archive.txt b/Documentation/git-archive.txt
index bc4e76a7834..1b52a0a65a1 100644
--- a/Documentation/git-archive.txt
+++ b/Documentation/git-archive.txt
@@ -61,6 +61,17 @@ OPTIONS
 	by concatenating the value for `--prefix` (if any) and the
 	basename of <file>.
 
+--add-file-with-content=<path>:<content>::
+	Add the specified contents to the archive.  Can be repeated to add
+	multiple files.  The path of the file in the archive is built
+	by concatenating the value for `--prefix` (if any) and the
+	basename of <file>.
++
+The `<path>` cannot contain any colon, the file mode is limited to
+a regular file, and the option may be subject platform-dependent
+command-line limits. For non-trivial cases, write an untracked file
+and use `--add-file` instead.
+
 --worktree-attributes::
 	Look for attributes in .gitattributes files in the working tree
 	as well (see <<ATTRIBUTES>>).
diff --git a/archive.c b/archive.c
index a3bbb091256..172efd690c3 100644
--- a/archive.c
+++ b/archive.c
@@ -263,6 +263,7 @@ static int queue_or_write_archive_entry(const struct object_id *oid,
 struct extra_file_info {
 	char *base;
 	struct stat stat;
+	void *content;
 };
 
 int write_archive_entries(struct archiver_args *args,
@@ -337,7 +338,13 @@ int write_archive_entries(struct archiver_args *args,
 		strbuf_addstr(&path_in_archive, basename(path));
 
 		strbuf_reset(&content);
-		if (strbuf_read_file(&content, path, info->stat.st_size) < 0)
+		if (info->content)
+			err = write_entry(args, &fake_oid, path_in_archive.buf,
+					  path_in_archive.len,
+					  info->stat.st_mode,
+					  info->content, info->stat.st_size);
+		else if (strbuf_read_file(&content, path,
+					  info->stat.st_size) < 0)
 			err = error_errno(_("could not read '%s'"), path);
 		else
 			err = write_entry(args, &fake_oid, path_in_archive.buf,
@@ -493,6 +500,7 @@ static void extra_file_info_clear(void *util, const char *str)
 {
 	struct extra_file_info *info = util;
 	free(info->base);
+	free(info->content);
 	free(info);
 }
 
@@ -514,14 +522,38 @@ static int add_file_cb(const struct option *opt, const char *arg, int unset)
 	if (!arg)
 		return -1;
 
-	path = prefix_filename(args->prefix, arg);
-	item = string_list_append_nodup(&args->extra_files, path);
-	item->util = info = xmalloc(sizeof(*info));
+	info = xmalloc(sizeof(*info));
 	info->base = xstrdup_or_null(base);
-	if (stat(path, &info->stat))
-		die(_("File not found: %s"), path);
-	if (!S_ISREG(info->stat.st_mode))
-		die(_("Not a regular file: %s"), path);
+
+	if (strcmp(opt->long_name, "add-file-with-content")) {
+		path = prefix_filename(args->prefix, arg);
+		if (stat(path, &info->stat))
+			die(_("File not found: %s"), path);
+		if (!S_ISREG(info->stat.st_mode))
+			die(_("Not a regular file: %s"), path);
+		info->content = NULL; /* read the file later */
+	} else {
+		const char *colon = strchr(arg, ':');
+		char *p;
+
+		if (!colon)
+			die(_("missing colon: '%s'"), arg);
+
+		p = xstrndup(arg, colon - arg);
+		if (!args->prefix)
+			path = p;
+		else {
+			path = prefix_filename(args->prefix, p);
+			free(p);
+		}
+		memset(&info->stat, 0, sizeof(info->stat));
+		info->stat.st_mode = S_IFREG | 0644;
+		info->content = xstrdup(colon + 1);
+		info->stat.st_size = strlen(info->content);
+	}
+	item = string_list_append_nodup(&args->extra_files, path);
+	item->util = info;
+
 	return 0;
 }
 
@@ -554,6 +586,9 @@ static int parse_archive_args(int argc, const char **argv,
 		{ OPTION_CALLBACK, 0, "add-file", args, N_("file"),
 		  N_("add untracked file to archive"), 0, add_file_cb,
 		  (intptr_t)&base },
+		{ OPTION_CALLBACK, 0, "add-file-with-content", args,
+		  N_("file"), N_("add untracked file to archive"), 0,
+		  add_file_cb, (intptr_t)&base },
 		OPT_STRING('o', "output", &output, N_("file"),
 			N_("write the archive to this file")),
 		OPT_BOOL(0, "worktree-attributes", &worktree_attributes,
diff --git a/t/t5003-archive-zip.sh b/t/t5003-archive-zip.sh
index 1e6d18b140e..8ff1257f1a0 100755
--- a/t/t5003-archive-zip.sh
+++ b/t/t5003-archive-zip.sh
@@ -206,6 +206,18 @@ test_expect_success 'git archive --format=zip --add-file' '
 check_zip with_untracked
 check_added with_untracked untracked untracked
 
+test_expect_success UNZIP 'git archive --format=zip --add-file-with-content' '
+	git archive --format=zip >with_file_with_content.zip \
+		--add-file-with-content=hello:world $EMPTY_TREE &&
+	test_when_finished "rm -rf tmp-unpack" &&
+	mkdir tmp-unpack && (
+		cd tmp-unpack &&
+		"$GIT_UNZIP" ../with_file_with_content.zip &&
+		test_path_is_file hello &&
+		test world = $(cat hello)
+	)
+'
+
 test_expect_success 'git archive --format=zip --add-file twice' '
 	echo untracked >untracked &&
 	git archive --format=zip --prefix=one/ --add-file=untracked \
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH v2 2/6] scalar: validate the optional enlistment argument
  2022-02-06 22:39 ` [PATCH v2 0/6] " Johannes Schindelin via GitGitGadget
  2022-02-06 22:39   ` [PATCH v2 1/6] archive: optionally add "virtual" files Johannes Schindelin via GitGitGadget
@ 2022-02-06 22:39   ` Johannes Schindelin via GitGitGadget
  2022-02-06 22:39   ` [PATCH v2 3/6] Implement `scalar diagnose` Johannes Schindelin via GitGitGadget
                     ` (4 subsequent siblings)
  6 siblings, 0 replies; 109+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2022-02-06 22:39 UTC (permalink / raw)
  To: git
  Cc: René Scharfe, Taylor Blau, Derrick Stolee, Elijah Newren,
	Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

The `scalar` command needs a Scalar enlistment for many subcommands, and
looks in the current directory for such an enlistment (traversing the
parent directories until it finds one).

These is subcommands can also be called with an optional argument
specifying the enlistment. Here, too, we traverse parent directories as
needed, until we find an enlistment.

However, if the specified directory does not even exist, or is not a
directory, we should stop right there, with an error message.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 contrib/scalar/scalar.c          | 6 ++++--
 contrib/scalar/t/t9099-scalar.sh | 5 +++++
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
index 1ce9c2b00e8..00dcd4b50ef 100644
--- a/contrib/scalar/scalar.c
+++ b/contrib/scalar/scalar.c
@@ -43,9 +43,11 @@ static void setup_enlistment_directory(int argc, const char **argv,
 		usage_with_options(usagestr, options);
 
 	/* find the worktree, determine its corresponding root */
-	if (argc == 1)
+	if (argc == 1) {
 		strbuf_add_absolute_path(&path, argv[0]);
-	else if (strbuf_getcwd(&path) < 0)
+		if (!is_directory(path.buf))
+			die(_("'%s' does not exist"), path.buf);
+	} else if (strbuf_getcwd(&path) < 0)
 		die(_("need a working directory"));
 
 	strbuf_trim_trailing_dir_sep(&path);
diff --git a/contrib/scalar/t/t9099-scalar.sh b/contrib/scalar/t/t9099-scalar.sh
index 2e1502ad45e..9d83fdf25e8 100755
--- a/contrib/scalar/t/t9099-scalar.sh
+++ b/contrib/scalar/t/t9099-scalar.sh
@@ -85,4 +85,9 @@ test_expect_success 'scalar delete with enlistment' '
 	test_path_is_missing cloned
 '
 
+test_expect_success '`scalar [...] <dir>` errors out when dir is missing' '
+	! scalar run config cloned 2>err &&
+	grep "cloned. does not exist" err
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH v2 3/6] Implement `scalar diagnose`
  2022-02-06 22:39 ` [PATCH v2 0/6] " Johannes Schindelin via GitGitGadget
  2022-02-06 22:39   ` [PATCH v2 1/6] archive: optionally add "virtual" files Johannes Schindelin via GitGitGadget
  2022-02-06 22:39   ` [PATCH v2 2/6] scalar: validate the optional enlistment argument Johannes Schindelin via GitGitGadget
@ 2022-02-06 22:39   ` Johannes Schindelin via GitGitGadget
  2022-02-07 19:55     ` René Scharfe
  2022-02-06 22:39   ` [PATCH v2 4/6] scalar diagnose: include disk space information Johannes Schindelin via GitGitGadget
                     ` (3 subsequent siblings)
  6 siblings, 1 reply; 109+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2022-02-06 22:39 UTC (permalink / raw)
  To: git
  Cc: René Scharfe, Taylor Blau, Derrick Stolee, Elijah Newren,
	Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

Over the course of Scalar's development, it became obvious that there is
a need for a command that can gather all kinds of useful information
that can help identify the most typical problems with large
worktrees/repositories.

The `diagnose` command is the culmination of this hard-won knowledge: it
gathers the installed hooks, the config, a couple statistics describing
the data shape, among other pieces of information, and then wraps
everything up in a tidy, neat `.zip` archive.

Note: originally, Scalar was implemented in C# using the .NET API, where
we had the luxury of a comprehensive standard library that includes
basic functionality such as writing a `.zip` file. In the C version, we
lack such a commodity. Rather than introducing a dependency on, say,
libzip, we slightly abuse Git's `archive` command: Instead of writing
the `.zip` file directly, we stage the file contents in a Git index of a
temporary, bare repository, only to let `git archive` have at it, and
finally removing the temporary repository.

Also note: Due to the frequently-spawned `git hash-object` processes,
this command is quite a bit slow on Windows. Should it turn out to be a
big problem, the lack of a batch mode of the `hash-object` command could
potentially be worked around via using `git fast-import` with a crafted
`stdin`.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 contrib/scalar/scalar.c          | 143 +++++++++++++++++++++++++++++++
 contrib/scalar/scalar.txt        |  12 +++
 contrib/scalar/t/t9099-scalar.sh |  14 +++
 3 files changed, 169 insertions(+)

diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
index 00dcd4b50ef..30ce0799c7a 100644
--- a/contrib/scalar/scalar.c
+++ b/contrib/scalar/scalar.c
@@ -11,6 +11,7 @@
 #include "dir.h"
 #include "packfile.h"
 #include "help.h"
+#include "archive.h"
 
 /*
  * Remove the deepest subdirectory in the provided path string. Path must not
@@ -261,6 +262,44 @@ static int unregister_dir(void)
 	return res;
 }
 
+static int add_directory_to_archiver(struct strvec *archiver_args,
+					  const char *path, int recurse)
+{
+	int at_root = !*path;
+	DIR *dir = opendir(at_root ? "." : path);
+	struct dirent *e;
+	struct strbuf buf = STRBUF_INIT;
+	size_t len;
+	int res = 0;
+
+	if (!dir)
+		return error(_("could not open directory '%s'"), path);
+
+	if (!at_root)
+		strbuf_addf(&buf, "%s/", path);
+	len = buf.len;
+	strvec_pushf(archiver_args, "--prefix=%s", buf.buf);
+
+	while (!res && (e = readdir(dir))) {
+		if (!strcmp(".", e->d_name) || !strcmp("..", e->d_name))
+			continue;
+
+		strbuf_setlen(&buf, len);
+		strbuf_addstr(&buf, e->d_name);
+
+		if (e->d_type == DT_REG)
+			strvec_pushf(archiver_args, "--add-file=%s", buf.buf);
+		else if (e->d_type != DT_DIR)
+			res = -1;
+		else if (recurse)
+		     add_directory_to_archiver(archiver_args, buf.buf, recurse);
+	}
+
+	closedir(dir);
+	strbuf_release(&buf);
+	return res;
+}
+
 /* printf-style interface, expects `<key>=<value>` argument */
 static int set_config(const char *fmt, ...)
 {
@@ -501,6 +540,109 @@ cleanup:
 	return res;
 }
 
+static int cmd_diagnose(int argc, const char **argv)
+{
+	struct option options[] = {
+		OPT_END(),
+	};
+	const char * const usage[] = {
+		N_("scalar diagnose [<enlistment>]"),
+		NULL
+	};
+	struct strbuf zip_path = STRBUF_INIT;
+	struct strvec archiver_args = STRVEC_INIT;
+	char **argv_copy = NULL;
+	int stdout_fd = -1, archiver_fd = -1;
+	time_t now = time(NULL);
+	struct tm tm;
+	struct strbuf path = STRBUF_INIT, buf = STRBUF_INIT;
+	size_t off;
+	int res = 0;
+
+	argc = parse_options(argc, argv, NULL, options,
+			     usage, 0);
+
+	setup_enlistment_directory(argc, argv, usage, options, &zip_path);
+
+	strbuf_addstr(&zip_path, "/.scalarDiagnostics/scalar_");
+	strbuf_addftime(&zip_path,
+			"%Y%m%d_%H%M%S", localtime_r(&now, &tm), 0, 0);
+	strbuf_addstr(&zip_path, ".zip");
+	switch (safe_create_leading_directories(zip_path.buf)) {
+	case SCLD_EXISTS:
+	case SCLD_OK:
+		break;
+	default:
+		error_errno(_("could not create directory for '%s'"),
+			    zip_path.buf);
+		goto diagnose_cleanup;
+	}
+	stdout_fd = dup(1);
+	if (stdout_fd < 0) {
+		res = error_errno(_("could not duplicate stdout"));
+		goto diagnose_cleanup;
+	}
+
+	archiver_fd = xopen(zip_path.buf, O_CREAT | O_WRONLY | O_TRUNC, 0666);
+	if (archiver_fd < 0 || dup2(archiver_fd, 1) < 0) {
+		res = error_errno(_("could not redirect output"));
+		goto diagnose_cleanup;
+	}
+
+	init_zip_archiver();
+	strvec_pushl(&archiver_args, "scalar-diagnose", "--format=zip", NULL);
+
+	strbuf_reset(&buf);
+	strbuf_addstr(&buf,
+		      "--add-file-with-content=diagnostics.log:"
+		      "Collecting diagnostic info\n\n");
+	get_version_info(&buf, 1);
+
+	strbuf_addf(&buf, "Enlistment root: %s\n", the_repository->worktree);
+	off = strchr(buf.buf, ':') + 1 - buf.buf;
+	write_or_die(stdout_fd, buf.buf + off, buf.len - off);
+	strvec_push(&archiver_args, buf.buf);
+
+	if ((res = add_directory_to_archiver(&archiver_args, ".git", 0)) ||
+	    (res = add_directory_to_archiver(&archiver_args, ".git/hooks", 0)) ||
+	    (res = add_directory_to_archiver(&archiver_args, ".git/info", 0)) ||
+	    (res = add_directory_to_archiver(&archiver_args, ".git/logs", 1)) ||
+	    (res = add_directory_to_archiver(&archiver_args, ".git/objects/info", 0)))
+		goto diagnose_cleanup;
+
+	strvec_pushl(&archiver_args, "--prefix=",
+		     oid_to_hex(the_hash_algo->empty_tree), "--", NULL);
+
+	/* `write_archive()` modifies the `argv` passed to it. Let it. */
+	argv_copy = xmemdupz(archiver_args.v,
+			     sizeof(char *) * archiver_args.nr);
+	res = write_archive(archiver_args.nr, (const char **)argv_copy, NULL,
+			    the_repository, NULL, 0);
+	if (res) {
+		error(_("failed to write archive"));
+		goto diagnose_cleanup;
+	}
+
+	if (!res)
+		printf("\n"
+		       "Diagnostics complete.\n"
+		       "All of the gathered info is captured in '%s'\n",
+		       zip_path.buf);
+
+diagnose_cleanup:
+	if (archiver_fd >= 0) {
+		close(1);
+		dup2(stdout_fd, 1);
+	}
+	free(argv_copy);
+	strvec_clear(&archiver_args);
+	strbuf_release(&zip_path);
+	strbuf_release(&path);
+	strbuf_release(&buf);
+
+	return res;
+}
+
 static int cmd_list(int argc, const char **argv)
 {
 	if (argc != 1)
@@ -802,6 +944,7 @@ static struct {
 	{ "reconfigure", cmd_reconfigure },
 	{ "delete", cmd_delete },
 	{ "version", cmd_version },
+	{ "diagnose", cmd_diagnose },
 	{ NULL, NULL},
 };
 
diff --git a/contrib/scalar/scalar.txt b/contrib/scalar/scalar.txt
index f416d637289..22583fe046e 100644
--- a/contrib/scalar/scalar.txt
+++ b/contrib/scalar/scalar.txt
@@ -14,6 +14,7 @@ scalar register [<enlistment>]
 scalar unregister [<enlistment>]
 scalar run ( all | config | commit-graph | fetch | loose-objects | pack-files ) [<enlistment>]
 scalar reconfigure [ --all | <enlistment> ]
+scalar diagnose [<enlistment>]
 scalar delete <enlistment>
 
 DESCRIPTION
@@ -129,6 +130,17 @@ reconfigure the enlistment.
 With the `--all` option, all enlistments currently registered with Scalar
 will be reconfigured. Use this option after each Scalar upgrade.
 
+Diagnose
+~~~~~~~~
+
+diagnose [<enlistment>]::
+    When reporting issues with Scalar, it is often helpful to provide the
+    information gathered by this command, including logs and certain
+    statistics describing the data shape of the current enlistment.
++
+The output of this command is a `.zip` file that is written into
+a directory adjacent to the worktree in the `src` directory.
+
 Delete
 ~~~~~~
 
diff --git a/contrib/scalar/t/t9099-scalar.sh b/contrib/scalar/t/t9099-scalar.sh
index 9d83fdf25e8..bbd07a44426 100755
--- a/contrib/scalar/t/t9099-scalar.sh
+++ b/contrib/scalar/t/t9099-scalar.sh
@@ -90,4 +90,18 @@ test_expect_success '`scalar [...] <dir>` errors out when dir is missing' '
 	grep "cloned. does not exist" err
 '
 
+SQ="'"
+test_expect_success UNZIP 'scalar diagnose' '
+	scalar clone "file://$(pwd)" cloned --single-branch &&
+	scalar diagnose cloned >out &&
+	sed -n "s/.*$SQ\\(.*\\.zip\\)$SQ.*/\\1/p" <out >zip_path &&
+	zip_path=$(cat zip_path) &&
+	test -n "$zip_path" &&
+	unzip -v "$zip_path" &&
+	folder=${zip_path%.zip} &&
+	test_path_is_missing "$folder" &&
+	unzip -p "$zip_path" diagnostics.log >out &&
+	test_file_not_empty out
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH v2 4/6] scalar diagnose: include disk space information
  2022-02-06 22:39 ` [PATCH v2 0/6] " Johannes Schindelin via GitGitGadget
                     ` (2 preceding siblings ...)
  2022-02-06 22:39   ` [PATCH v2 3/6] Implement `scalar diagnose` Johannes Schindelin via GitGitGadget
@ 2022-02-06 22:39   ` Johannes Schindelin via GitGitGadget
  2022-02-06 22:39   ` [PATCH v2 5/6] scalar: teach `diagnose` to gather packfile info Matthew John Cheetham via GitGitGadget
                     ` (2 subsequent siblings)
  6 siblings, 0 replies; 109+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2022-02-06 22:39 UTC (permalink / raw)
  To: git
  Cc: René Scharfe, Taylor Blau, Derrick Stolee, Elijah Newren,
	Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

When analyzing problems with large worktrees/repositories, it is useful
to know how close to a "full disk" situation Scalar/Git operates. Let's
include this information.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 contrib/scalar/scalar.c          | 53 ++++++++++++++++++++++++++++++++
 contrib/scalar/t/t9099-scalar.sh |  1 +
 2 files changed, 54 insertions(+)

diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
index 30ce0799c7a..fd666376109 100644
--- a/contrib/scalar/scalar.c
+++ b/contrib/scalar/scalar.c
@@ -300,6 +300,58 @@ static int add_directory_to_archiver(struct strvec *archiver_args,
 	return res;
 }
 
+#ifndef WIN32
+#include <sys/statvfs.h>
+#endif
+
+static int get_disk_info(struct strbuf *out)
+{
+#ifdef WIN32
+	struct strbuf buf = STRBUF_INIT;
+	char volume_name[MAX_PATH], fs_name[MAX_PATH];
+	DWORD serial_number, component_length, flags;
+	ULARGE_INTEGER avail2caller, total, avail;
+
+	strbuf_realpath(&buf, ".", 1);
+	if (!GetDiskFreeSpaceExA(buf.buf, &avail2caller, &total, &avail)) {
+		error(_("could not determine free disk size for '%s'"),
+		      buf.buf);
+		strbuf_release(&buf);
+		return -1;
+	}
+
+	strbuf_setlen(&buf, offset_1st_component(buf.buf));
+	if (!GetVolumeInformationA(buf.buf, volume_name, sizeof(volume_name),
+				   &serial_number, &component_length, &flags,
+				   fs_name, sizeof(fs_name))) {
+		error(_("could not get info for '%s'"), buf.buf);
+		strbuf_release(&buf);
+		return -1;
+	}
+	strbuf_addf(out, "Available space on '%s': ", buf.buf);
+	strbuf_humanise_bytes(out, avail2caller.QuadPart);
+	strbuf_addch(out, '\n');
+	strbuf_release(&buf);
+#else
+	struct strbuf buf = STRBUF_INIT;
+	struct statvfs stat;
+
+	strbuf_realpath(&buf, ".", 1);
+	if (statvfs(buf.buf, &stat) < 0) {
+		error_errno(_("could not determine free disk size for '%s'"),
+			    buf.buf);
+		strbuf_release(&buf);
+		return -1;
+	}
+
+	strbuf_addf(out, "Available space on '%s': ", buf.buf);
+	strbuf_humanise_bytes(out, st_mult(stat.f_bsize, stat.f_bavail));
+	strbuf_addf(out, " (mount flags 0x%lx)\n", stat.f_flag);
+	strbuf_release(&buf);
+#endif
+	return 0;
+}
+
 /* printf-style interface, expects `<key>=<value>` argument */
 static int set_config(const char *fmt, ...)
 {
@@ -599,6 +651,7 @@ static int cmd_diagnose(int argc, const char **argv)
 	get_version_info(&buf, 1);
 
 	strbuf_addf(&buf, "Enlistment root: %s\n", the_repository->worktree);
+	get_disk_info(&buf);
 	off = strchr(buf.buf, ':') + 1 - buf.buf;
 	write_or_die(stdout_fd, buf.buf + off, buf.len - off);
 	strvec_push(&archiver_args, buf.buf);
diff --git a/contrib/scalar/t/t9099-scalar.sh b/contrib/scalar/t/t9099-scalar.sh
index bbd07a44426..f3d037823c8 100755
--- a/contrib/scalar/t/t9099-scalar.sh
+++ b/contrib/scalar/t/t9099-scalar.sh
@@ -94,6 +94,7 @@ SQ="'"
 test_expect_success UNZIP 'scalar diagnose' '
 	scalar clone "file://$(pwd)" cloned --single-branch &&
 	scalar diagnose cloned >out &&
+	grep "Available space" out &&
 	sed -n "s/.*$SQ\\(.*\\.zip\\)$SQ.*/\\1/p" <out >zip_path &&
 	zip_path=$(cat zip_path) &&
 	test -n "$zip_path" &&
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH v2 5/6] scalar: teach `diagnose` to gather packfile info
  2022-02-06 22:39 ` [PATCH v2 0/6] " Johannes Schindelin via GitGitGadget
                     ` (3 preceding siblings ...)
  2022-02-06 22:39   ` [PATCH v2 4/6] scalar diagnose: include disk space information Johannes Schindelin via GitGitGadget
@ 2022-02-06 22:39   ` Matthew John Cheetham via GitGitGadget
  2022-02-06 22:39   ` [PATCH v2 6/6] scalar: teach `diagnose` to gather loose objects information Matthew John Cheetham via GitGitGadget
  2022-05-04 15:25   ` [PATCH v3 0/7] scalar: implement the subcommand "diagnose" Johannes Schindelin via GitGitGadget
  6 siblings, 0 replies; 109+ messages in thread
From: Matthew John Cheetham via GitGitGadget @ 2022-02-06 22:39 UTC (permalink / raw)
  To: git
  Cc: René Scharfe, Taylor Blau, Derrick Stolee, Elijah Newren,
	Johannes Schindelin, Matthew John Cheetham

From: Matthew John Cheetham <mjcheetham@outlook.com>

It's helpful to see if there are other crud files in the pack
directory. Let's teach the `scalar diagnose` command to gather
file size information about pack files.

While at it, also enumerate the pack files in the alternate
object directories, if any are registered.

Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 contrib/scalar/scalar.c          | 30 ++++++++++++++++++++++++++++++
 contrib/scalar/t/t9099-scalar.sh |  6 +++++-
 2 files changed, 35 insertions(+), 1 deletion(-)

diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
index fd666376109..331d48b2a80 100644
--- a/contrib/scalar/scalar.c
+++ b/contrib/scalar/scalar.c
@@ -12,6 +12,7 @@
 #include "packfile.h"
 #include "help.h"
 #include "archive.h"
+#include "object-store.h"
 
 /*
  * Remove the deepest subdirectory in the provided path string. Path must not
@@ -592,6 +593,29 @@ cleanup:
 	return res;
 }
 
+static void dir_file_stats_objects(const char *full_path, size_t full_path_len,
+				   const char *file_name, void *data)
+{
+	struct strbuf *buf = data;
+	struct stat st;
+
+	if (!stat(full_path, &st))
+		strbuf_addf(buf, "%-70s %16" PRIuMAX "\n", file_name,
+			    (uintmax_t)st.st_size);
+}
+
+static int dir_file_stats(struct object_directory *object_dir, void *data)
+{
+	struct strbuf *buf = data;
+
+	strbuf_addf(buf, "Contents of %s:\n", object_dir->path);
+
+	for_each_file_in_pack_dir(object_dir->path, dir_file_stats_objects,
+				  data);
+
+	return 0;
+}
+
 static int cmd_diagnose(int argc, const char **argv)
 {
 	struct option options[] = {
@@ -656,6 +680,12 @@ static int cmd_diagnose(int argc, const char **argv)
 	write_or_die(stdout_fd, buf.buf + off, buf.len - off);
 	strvec_push(&archiver_args, buf.buf);
 
+	strbuf_reset(&buf);
+	strbuf_addstr(&buf, "--add-file-with-content=packs-local.txt:");
+	dir_file_stats(the_repository->objects->odb, &buf);
+	foreach_alt_odb(dir_file_stats, &buf);
+	strvec_push(&archiver_args, buf.buf);
+
 	if ((res = add_directory_to_archiver(&archiver_args, ".git", 0)) ||
 	    (res = add_directory_to_archiver(&archiver_args, ".git/hooks", 0)) ||
 	    (res = add_directory_to_archiver(&archiver_args, ".git/info", 0)) ||
diff --git a/contrib/scalar/t/t9099-scalar.sh b/contrib/scalar/t/t9099-scalar.sh
index f3d037823c8..e049221609d 100755
--- a/contrib/scalar/t/t9099-scalar.sh
+++ b/contrib/scalar/t/t9099-scalar.sh
@@ -93,6 +93,8 @@ test_expect_success '`scalar [...] <dir>` errors out when dir is missing' '
 SQ="'"
 test_expect_success UNZIP 'scalar diagnose' '
 	scalar clone "file://$(pwd)" cloned --single-branch &&
+	git repack &&
+	echo "$(pwd)/.git/objects/" >>cloned/src/.git/objects/info/alternates &&
 	scalar diagnose cloned >out &&
 	grep "Available space" out &&
 	sed -n "s/.*$SQ\\(.*\\.zip\\)$SQ.*/\\1/p" <out >zip_path &&
@@ -102,7 +104,9 @@ test_expect_success UNZIP 'scalar diagnose' '
 	folder=${zip_path%.zip} &&
 	test_path_is_missing "$folder" &&
 	unzip -p "$zip_path" diagnostics.log >out &&
-	test_file_not_empty out
+	test_file_not_empty out &&
+	unzip -p "$zip_path" packs-local.txt >out &&
+	grep "$(pwd)/.git/objects" out
 '
 
 test_done
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH v2 6/6] scalar: teach `diagnose` to gather loose objects information
  2022-02-06 22:39 ` [PATCH v2 0/6] " Johannes Schindelin via GitGitGadget
                     ` (4 preceding siblings ...)
  2022-02-06 22:39   ` [PATCH v2 5/6] scalar: teach `diagnose` to gather packfile info Matthew John Cheetham via GitGitGadget
@ 2022-02-06 22:39   ` Matthew John Cheetham via GitGitGadget
  2022-05-04 15:25   ` [PATCH v3 0/7] scalar: implement the subcommand "diagnose" Johannes Schindelin via GitGitGadget
  6 siblings, 0 replies; 109+ messages in thread
From: Matthew John Cheetham via GitGitGadget @ 2022-02-06 22:39 UTC (permalink / raw)
  To: git
  Cc: René Scharfe, Taylor Blau, Derrick Stolee, Elijah Newren,
	Johannes Schindelin, Matthew John Cheetham

From: Matthew John Cheetham <mjcheetham@outlook.com>

When operating at the scale that Scalar wants to support, certain data
shapes are more likely to cause undesirable performance issues, such as
large numbers of loose objects.

By including statistics about this, `scalar diagnose` now makes it
easier to identify such scenarios.

Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 contrib/scalar/scalar.c          | 59 ++++++++++++++++++++++++++++++++
 contrib/scalar/t/t9099-scalar.sh |  5 ++-
 2 files changed, 63 insertions(+), 1 deletion(-)

diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
index 331d48b2a80..537b97ae734 100644
--- a/contrib/scalar/scalar.c
+++ b/contrib/scalar/scalar.c
@@ -616,6 +616,60 @@ static int dir_file_stats(struct object_directory *object_dir, void *data)
 	return 0;
 }
 
+static int count_files(char *path)
+{
+	DIR *dir = opendir(path);
+	struct dirent *e;
+	int count = 0;
+
+	if (!dir)
+		return 0;
+
+	while ((e = readdir(dir)) != NULL)
+		if (!is_dot_or_dotdot(e->d_name) && e->d_type == DT_REG)
+			count++;
+
+	closedir(dir);
+	return count;
+}
+
+static void loose_objs_stats(struct strbuf *buf, const char *path)
+{
+	DIR *dir = opendir(path);
+	struct dirent *e;
+	int count;
+	int total = 0;
+	unsigned char c;
+	struct strbuf count_path = STRBUF_INIT;
+	size_t base_path_len;
+
+	if (!dir)
+		return;
+
+	strbuf_addstr(buf, "Object directory stats for ");
+	strbuf_add_absolute_path(buf, path);
+	strbuf_addstr(buf, ":\n");
+
+	strbuf_add_absolute_path(&count_path, path);
+	strbuf_addch(&count_path, '/');
+	base_path_len = count_path.len;
+
+	while ((e = readdir(dir)) != NULL)
+		if (!is_dot_or_dotdot(e->d_name) &&
+		    e->d_type == DT_DIR && strlen(e->d_name) == 2 &&
+		    !hex_to_bytes(&c, e->d_name, 1)) {
+			strbuf_setlen(&count_path, base_path_len);
+			strbuf_addstr(&count_path, e->d_name);
+			total += (count = count_files(count_path.buf));
+			strbuf_addf(buf, "%s : %7d files\n", e->d_name, count);
+		}
+
+	strbuf_addf(buf, "Total: %d loose objects", total);
+
+	strbuf_release(&count_path);
+	closedir(dir);
+}
+
 static int cmd_diagnose(int argc, const char **argv)
 {
 	struct option options[] = {
@@ -686,6 +740,11 @@ static int cmd_diagnose(int argc, const char **argv)
 	foreach_alt_odb(dir_file_stats, &buf);
 	strvec_push(&archiver_args, buf.buf);
 
+	strbuf_reset(&buf);
+	strbuf_addstr(&buf, "--add-file-with-content=objects-local.txt:");
+	loose_objs_stats(&buf, ".git/objects");
+	strvec_push(&archiver_args, buf.buf);
+
 	if ((res = add_directory_to_archiver(&archiver_args, ".git", 0)) ||
 	    (res = add_directory_to_archiver(&archiver_args, ".git/hooks", 0)) ||
 	    (res = add_directory_to_archiver(&archiver_args, ".git/info", 0)) ||
diff --git a/contrib/scalar/t/t9099-scalar.sh b/contrib/scalar/t/t9099-scalar.sh
index e049221609d..9b4eedbb0aa 100755
--- a/contrib/scalar/t/t9099-scalar.sh
+++ b/contrib/scalar/t/t9099-scalar.sh
@@ -95,6 +95,7 @@ test_expect_success UNZIP 'scalar diagnose' '
 	scalar clone "file://$(pwd)" cloned --single-branch &&
 	git repack &&
 	echo "$(pwd)/.git/objects/" >>cloned/src/.git/objects/info/alternates &&
+	test_commit -C cloned/src loose &&
 	scalar diagnose cloned >out &&
 	grep "Available space" out &&
 	sed -n "s/.*$SQ\\(.*\\.zip\\)$SQ.*/\\1/p" <out >zip_path &&
@@ -106,7 +107,9 @@ test_expect_success UNZIP 'scalar diagnose' '
 	unzip -p "$zip_path" diagnostics.log >out &&
 	test_file_not_empty out &&
 	unzip -p "$zip_path" packs-local.txt >out &&
-	grep "$(pwd)/.git/objects" out
+	grep "$(pwd)/.git/objects" out &&
+	unzip -p "$zip_path" objects-local.txt >out &&
+	grep "^Total: [1-9]" out
 '
 
 test_done
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v2 1/6] archive: optionally add "virtual" files
  2022-02-06 22:39   ` [PATCH v2 1/6] archive: optionally add "virtual" files Johannes Schindelin via GitGitGadget
@ 2022-02-07 19:55     ` René Scharfe
  2022-02-07 23:30       ` Junio C Hamano
  2022-02-08 12:54       ` Johannes Schindelin
  0 siblings, 2 replies; 109+ messages in thread
From: René Scharfe @ 2022-02-07 19:55 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget, git
  Cc: Taylor Blau, Derrick Stolee, Elijah Newren, Johannes Schindelin

Am 06.02.22 um 23:39 schrieb Johannes Schindelin via GitGitGadget:
> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>
> With the `--add-file-with-content=<path>:<content>` option, `git
> archive` now supports use cases where relatively trivial files need to
> be added that do not exist on disk.
>
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
>  Documentation/git-archive.txt | 11 ++++++++
>  archive.c                     | 51 +++++++++++++++++++++++++++++------
>  t/t5003-archive-zip.sh        | 12 +++++++++
>  3 files changed, 66 insertions(+), 8 deletions(-)
>
> diff --git a/Documentation/git-archive.txt b/Documentation/git-archive.txt
> index bc4e76a7834..1b52a0a65a1 100644
> --- a/Documentation/git-archive.txt
> +++ b/Documentation/git-archive.txt
> @@ -61,6 +61,17 @@ OPTIONS
>  	by concatenating the value for `--prefix` (if any) and the
>  	basename of <file>.
>
> +--add-file-with-content=<path>:<content>::
> +	Add the specified contents to the archive.  Can be repeated to add
> +	multiple files.  The path of the file in the archive is built
> +	by concatenating the value for `--prefix` (if any) and the
> +	basename of <file>.
> ++
> +The `<path>` cannot contain any colon, the file mode is limited to
> +a regular file, and the option may be subject platform-dependent

s/subject/& to/

> +command-line limits. For non-trivial cases, write an untracked file
> +and use `--add-file` instead.
> +

We could use that option in Git's own Makefile to add the file named
"version", which contains $GIT_VERSION.  Hmm, but it also contains a
terminating newline, which would be a bit tricky (but not impossible) to
add.  Would it make sense to add one automatically if it's missing (e.g.
with strbuf_complete_line)?  Not sure.

>  --worktree-attributes::
>  	Look for attributes in .gitattributes files in the working tree
>  	as well (see <<ATTRIBUTES>>).
> diff --git a/archive.c b/archive.c
> index a3bbb091256..172efd690c3 100644
> --- a/archive.c
> +++ b/archive.c
> @@ -263,6 +263,7 @@ static int queue_or_write_archive_entry(const struct object_id *oid,
>  struct extra_file_info {
>  	char *base;
>  	struct stat stat;
> +	void *content;
>  };
>
>  int write_archive_entries(struct archiver_args *args,
> @@ -337,7 +338,13 @@ int write_archive_entries(struct archiver_args *args,
>  		strbuf_addstr(&path_in_archive, basename(path));
>
>  		strbuf_reset(&content);
> -		if (strbuf_read_file(&content, path, info->stat.st_size) < 0)
> +		if (info->content)
> +			err = write_entry(args, &fake_oid, path_in_archive.buf,
> +					  path_in_archive.len,
> +					  info->stat.st_mode,
> +					  info->content, info->stat.st_size);
> +		else if (strbuf_read_file(&content, path,
> +					  info->stat.st_size) < 0)
>  			err = error_errno(_("could not read '%s'"), path);
>  		else
>  			err = write_entry(args, &fake_oid, path_in_archive.buf,
> @@ -493,6 +500,7 @@ static void extra_file_info_clear(void *util, const char *str)
>  {
>  	struct extra_file_info *info = util;
>  	free(info->base);
> +	free(info->content);
>  	free(info);
>  }
>
> @@ -514,14 +522,38 @@ static int add_file_cb(const struct option *opt, const char *arg, int unset)
>  	if (!arg)
>  		return -1;
>
> -	path = prefix_filename(args->prefix, arg);
> -	item = string_list_append_nodup(&args->extra_files, path);
> -	item->util = info = xmalloc(sizeof(*info));
> +	info = xmalloc(sizeof(*info));
>  	info->base = xstrdup_or_null(base);
> -	if (stat(path, &info->stat))
> -		die(_("File not found: %s"), path);
> -	if (!S_ISREG(info->stat.st_mode))
> -		die(_("Not a regular file: %s"), path);
> +
> +	if (strcmp(opt->long_name, "add-file-with-content")) {

Equivalent to:

	if (!strcmp(opt->long_name, "add-file")) {

I mention that because the inequality check confused me a bit at first.

> +		path = prefix_filename(args->prefix, arg);
> +		if (stat(path, &info->stat))
> +			die(_("File not found: %s"), path);
> +		if (!S_ISREG(info->stat.st_mode))
> +			die(_("Not a regular file: %s"), path);
> +		info->content = NULL; /* read the file later */
> +	} else {
> +		const char *colon = strchr(arg, ':');
> +		char *p;
> +
> +		if (!colon)
> +			die(_("missing colon: '%s'"), arg);
> +
> +		p = xstrndup(arg, colon - arg);
> +		if (!args->prefix)
> +			path = p;
> +		else {
> +			path = prefix_filename(args->prefix, p);
> +			free(p);
> +		}
> +		memset(&info->stat, 0, sizeof(info->stat));
> +		info->stat.st_mode = S_IFREG | 0644;
> +		info->content = xstrdup(colon + 1);
> +		info->stat.st_size = strlen(info->content);
> +	}
> +	item = string_list_append_nodup(&args->extra_files, path);
> +	item->util = info;
> +
>  	return 0;
>  }
>
> @@ -554,6 +586,9 @@ static int parse_archive_args(int argc, const char **argv,
>  		{ OPTION_CALLBACK, 0, "add-file", args, N_("file"),
>  		  N_("add untracked file to archive"), 0, add_file_cb,
>  		  (intptr_t)&base },
> +		{ OPTION_CALLBACK, 0, "add-file-with-content", args,
> +		  N_("file"), N_("add untracked file to archive"), 0,
                      ^^^^
"<file>" seems wrong, because there is no actual file.  It should rather
be "<name>:<content>" for the virtual one, right?

> +		  add_file_cb, (intptr_t)&base },
>  		OPT_STRING('o', "output", &output, N_("file"),
>  			N_("write the archive to this file")),
>  		OPT_BOOL(0, "worktree-attributes", &worktree_attributes,
> diff --git a/t/t5003-archive-zip.sh b/t/t5003-archive-zip.sh
> index 1e6d18b140e..8ff1257f1a0 100755
> --- a/t/t5003-archive-zip.sh
> +++ b/t/t5003-archive-zip.sh
> @@ -206,6 +206,18 @@ test_expect_success 'git archive --format=zip --add-file' '
>  check_zip with_untracked
>  check_added with_untracked untracked untracked
>
> +test_expect_success UNZIP 'git archive --format=zip --add-file-with-content' '
> +	git archive --format=zip >with_file_with_content.zip \
> +		--add-file-with-content=hello:world $EMPTY_TREE &&
> +	test_when_finished "rm -rf tmp-unpack" &&
> +	mkdir tmp-unpack && (
> +		cd tmp-unpack &&
> +		"$GIT_UNZIP" ../with_file_with_content.zip &&
> +		test_path_is_file hello &&
> +		test world = $(cat hello)
> +	)
> +'
> +
>  test_expect_success 'git archive --format=zip --add-file twice' '
>  	echo untracked >untracked &&
>  	git archive --format=zip --prefix=one/ --add-file=untracked \

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v2 3/6] Implement `scalar diagnose`
  2022-02-06 22:39   ` [PATCH v2 3/6] Implement `scalar diagnose` Johannes Schindelin via GitGitGadget
@ 2022-02-07 19:55     ` René Scharfe
  2022-02-08 12:08       ` Johannes Schindelin
  0 siblings, 1 reply; 109+ messages in thread
From: René Scharfe @ 2022-02-07 19:55 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget, git
  Cc: Taylor Blau, Derrick Stolee, Elijah Newren, Johannes Schindelin



Am 06.02.22 um 23:39 schrieb Johannes Schindelin via GitGitGadget:
> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>
> Over the course of Scalar's development, it became obvious that there is
> a need for a command that can gather all kinds of useful information
> that can help identify the most typical problems with large
> worktrees/repositories.
>
> The `diagnose` command is the culmination of this hard-won knowledge: it
> gathers the installed hooks, the config, a couple statistics describing
> the data shape, among other pieces of information, and then wraps
> everything up in a tidy, neat `.zip` archive.
>
> Note: originally, Scalar was implemented in C# using the .NET API, where
> we had the luxury of a comprehensive standard library that includes
> basic functionality such as writing a `.zip` file. In the C version, we
> lack such a commodity. Rather than introducing a dependency on, say,
> libzip, we slightly abuse Git's `archive` command: Instead of writing
> the `.zip` file directly, we stage the file contents in a Git index of a
> temporary, bare repository, only to let `git archive` have at it, and
> finally removing the temporary repository.
>
> Also note: Due to the frequently-spawned `git hash-object` processes,
> this command is quite a bit slow on Windows. Should it turn out to be a
> big problem, the lack of a batch mode of the `hash-object` command could
> potentially be worked around via using `git fast-import` with a crafted
> `stdin`.

The two paragraphs above are not in sync with the patch.

>
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
>  contrib/scalar/scalar.c          | 143 +++++++++++++++++++++++++++++++
>  contrib/scalar/scalar.txt        |  12 +++
>  contrib/scalar/t/t9099-scalar.sh |  14 +++
>  3 files changed, 169 insertions(+)
>
> diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
> index 00dcd4b50ef..30ce0799c7a 100644
> --- a/contrib/scalar/scalar.c
> +++ b/contrib/scalar/scalar.c
> @@ -11,6 +11,7 @@
>  #include "dir.h"
>  #include "packfile.h"
>  #include "help.h"
> +#include "archive.h"
>
>  /*
>   * Remove the deepest subdirectory in the provided path string. Path must not
> @@ -261,6 +262,44 @@ static int unregister_dir(void)
>  	return res;
>  }
>
> +static int add_directory_to_archiver(struct strvec *archiver_args,
> +					  const char *path, int recurse)
> +{
> +	int at_root = !*path;
> +	DIR *dir = opendir(at_root ? "." : path);
> +	struct dirent *e;
> +	struct strbuf buf = STRBUF_INIT;
> +	size_t len;
> +	int res = 0;
> +
> +	if (!dir)
> +		return error(_("could not open directory '%s'"), path);
> +
> +	if (!at_root)
> +		strbuf_addf(&buf, "%s/", path);
> +	len = buf.len;
> +	strvec_pushf(archiver_args, "--prefix=%s", buf.buf);
> +
> +	while (!res && (e = readdir(dir))) {
> +		if (!strcmp(".", e->d_name) || !strcmp("..", e->d_name))
> +			continue;
> +
> +		strbuf_setlen(&buf, len);
> +		strbuf_addstr(&buf, e->d_name);
> +
> +		if (e->d_type == DT_REG)
> +			strvec_pushf(archiver_args, "--add-file=%s", buf.buf);
> +		else if (e->d_type != DT_DIR)
> +			res = -1;
> +		else if (recurse)
> +		     add_directory_to_archiver(archiver_args, buf.buf, recurse);
> +	}
> +
> +	closedir(dir);
> +	strbuf_release(&buf);
> +	return res;
> +}
> +
>  /* printf-style interface, expects `<key>=<value>` argument */
>  static int set_config(const char *fmt, ...)
>  {
> @@ -501,6 +540,109 @@ cleanup:
>  	return res;
>  }
>
> +static int cmd_diagnose(int argc, const char **argv)
> +{
> +	struct option options[] = {
> +		OPT_END(),
> +	};
> +	const char * const usage[] = {
> +		N_("scalar diagnose [<enlistment>]"),
> +		NULL
> +	};
> +	struct strbuf zip_path = STRBUF_INIT;
> +	struct strvec archiver_args = STRVEC_INIT;
> +	char **argv_copy = NULL;
> +	int stdout_fd = -1, archiver_fd = -1;
> +	time_t now = time(NULL);
> +	struct tm tm;
> +	struct strbuf path = STRBUF_INIT, buf = STRBUF_INIT;
> +	size_t off;
> +	int res = 0;
> +
> +	argc = parse_options(argc, argv, NULL, options,
> +			     usage, 0);
> +
> +	setup_enlistment_directory(argc, argv, usage, options, &zip_path);
> +
> +	strbuf_addstr(&zip_path, "/.scalarDiagnostics/scalar_");
> +	strbuf_addftime(&zip_path,
> +			"%Y%m%d_%H%M%S", localtime_r(&now, &tm), 0, 0);
> +	strbuf_addstr(&zip_path, ".zip");
> +	switch (safe_create_leading_directories(zip_path.buf)) {
> +	case SCLD_EXISTS:
> +	case SCLD_OK:
> +		break;
> +	default:
> +		error_errno(_("could not create directory for '%s'"),
> +			    zip_path.buf);
> +		goto diagnose_cleanup;
> +	}
> +	stdout_fd = dup(1);
> +	if (stdout_fd < 0) {
> +		res = error_errno(_("could not duplicate stdout"));
> +		goto diagnose_cleanup;
> +	}
> +
> +	archiver_fd = xopen(zip_path.buf, O_CREAT | O_WRONLY | O_TRUNC, 0666);
> +	if (archiver_fd < 0 || dup2(archiver_fd, 1) < 0) {
> +		res = error_errno(_("could not redirect output"));
> +		goto diagnose_cleanup;
> +	}
> +
> +	init_zip_archiver();
> +	strvec_pushl(&archiver_args, "scalar-diagnose", "--format=zip", NULL);
> +
> +	strbuf_reset(&buf);
> +	strbuf_addstr(&buf,
> +		      "--add-file-with-content=diagnostics.log:"
> +		      "Collecting diagnostic info\n\n");
> +	get_version_info(&buf, 1);
> +
> +	strbuf_addf(&buf, "Enlistment root: %s\n", the_repository->worktree);
> +	off = strchr(buf.buf, ':') + 1 - buf.buf;
> +	write_or_die(stdout_fd, buf.buf + off, buf.len - off);
> +	strvec_push(&archiver_args, buf.buf);

Fun trick to reuse the buffer for both the ZIP entry and stdout. :)  I'd
have omitted the option from buf and added it like this, for simplicity:

	strvec_pushf(&archiver_args,
		     "--add-file-with-content=diagnostics.log:%s", buf.buf);

Just a thought.

> +
> +	if ((res = add_directory_to_archiver(&archiver_args, ".git", 0)) ||
> +	    (res = add_directory_to_archiver(&archiver_args, ".git/hooks", 0)) ||
> +	    (res = add_directory_to_archiver(&archiver_args, ".git/info", 0)) ||
> +	    (res = add_directory_to_archiver(&archiver_args, ".git/logs", 1)) ||
> +	    (res = add_directory_to_archiver(&archiver_args, ".git/objects/info", 0)))
> +		goto diagnose_cleanup;
> +
> +	strvec_pushl(&archiver_args, "--prefix=",
> +		     oid_to_hex(the_hash_algo->empty_tree), "--", NULL);
> +
> +	/* `write_archive()` modifies the `argv` passed to it. Let it. */
> +	argv_copy = xmemdupz(archiver_args.v,
> +			     sizeof(char *) * archiver_args.nr);

Leaking the whole thing would be fine as well for this command, but
cleaning up is tidier, of course.

> +	res = write_archive(archiver_args.nr, (const char **)argv_copy, NULL,
> +			    the_repository, NULL, 0);

Ah -- no shell means no command line length limits. :)

> +	if (res) {
> +		error(_("failed to write archive"));
> +		goto diagnose_cleanup;
> +	}
> +
> +	if (!res)
> +		printf("\n"
> +		       "Diagnostics complete.\n"
> +		       "All of the gathered info is captured in '%s'\n",
> +		       zip_path.buf);

Is this message appended to the ZIP file or does it go to stdout?

In any case: mixing write(2) and stdio(3) is not a good idea.  Using
fwrite(3) instead of write_or_die above and doing the stdout dup(2)
dance only tightly around the write_archive call would help, I think.

> +
> +diagnose_cleanup:
> +	if (archiver_fd >= 0) {
> +		close(1);
> +		dup2(stdout_fd, 1);
> +	}
> +	free(argv_copy);
> +	strvec_clear(&archiver_args);
> +	strbuf_release(&zip_path);
> +	strbuf_release(&path);
> +	strbuf_release(&buf);
> +
> +	return res;
> +}
> +
>  static int cmd_list(int argc, const char **argv)
>  {
>  	if (argc != 1)
> @@ -802,6 +944,7 @@ static struct {
>  	{ "reconfigure", cmd_reconfigure },
>  	{ "delete", cmd_delete },
>  	{ "version", cmd_version },
> +	{ "diagnose", cmd_diagnose },
>  	{ NULL, NULL},
>  };
>
> diff --git a/contrib/scalar/scalar.txt b/contrib/scalar/scalar.txt
> index f416d637289..22583fe046e 100644
> --- a/contrib/scalar/scalar.txt
> +++ b/contrib/scalar/scalar.txt
> @@ -14,6 +14,7 @@ scalar register [<enlistment>]
>  scalar unregister [<enlistment>]
>  scalar run ( all | config | commit-graph | fetch | loose-objects | pack-files ) [<enlistment>]
>  scalar reconfigure [ --all | <enlistment> ]
> +scalar diagnose [<enlistment>]
>  scalar delete <enlistment>
>
>  DESCRIPTION
> @@ -129,6 +130,17 @@ reconfigure the enlistment.
>  With the `--all` option, all enlistments currently registered with Scalar
>  will be reconfigured. Use this option after each Scalar upgrade.
>
> +Diagnose
> +~~~~~~~~
> +
> +diagnose [<enlistment>]::
> +    When reporting issues with Scalar, it is often helpful to provide the
> +    information gathered by this command, including logs and certain
> +    statistics describing the data shape of the current enlistment.
> ++
> +The output of this command is a `.zip` file that is written into
> +a directory adjacent to the worktree in the `src` directory.
> +
>  Delete
>  ~~~~~~
>
> diff --git a/contrib/scalar/t/t9099-scalar.sh b/contrib/scalar/t/t9099-scalar.sh
> index 9d83fdf25e8..bbd07a44426 100755
> --- a/contrib/scalar/t/t9099-scalar.sh
> +++ b/contrib/scalar/t/t9099-scalar.sh
> @@ -90,4 +90,18 @@ test_expect_success '`scalar [...] <dir>` errors out when dir is missing' '
>  	grep "cloned. does not exist" err
>  '
>
> +SQ="'"
> +test_expect_success UNZIP 'scalar diagnose' '
> +	scalar clone "file://$(pwd)" cloned --single-branch &&
> +	scalar diagnose cloned >out &&
> +	sed -n "s/.*$SQ\\(.*\\.zip\\)$SQ.*/\\1/p" <out >zip_path &&
> +	zip_path=$(cat zip_path) &&
> +	test -n "$zip_path" &&
> +	unzip -v "$zip_path" &&
> +	folder=${zip_path%.zip} &&
> +	test_path_is_missing "$folder" &&
> +	unzip -p "$zip_path" diagnostics.log >out &&
> +	test_file_not_empty out
> +'
> +
>  test_done

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v2 1/6] archive: optionally add "virtual" files
  2022-02-07 19:55     ` René Scharfe
@ 2022-02-07 23:30       ` Junio C Hamano
  2022-02-08 13:12         ` Johannes Schindelin
  2022-02-08 12:54       ` Johannes Schindelin
  1 sibling, 1 reply; 109+ messages in thread
From: Junio C Hamano @ 2022-02-07 23:30 UTC (permalink / raw)
  To: René Scharfe
  Cc: Johannes Schindelin via GitGitGadget, git, Taylor Blau,
	Derrick Stolee, Elijah Newren, Johannes Schindelin

René Scharfe <l.s.r@web.de> writes:

> We could use that option in Git's own Makefile to add the file named
> "version", which contains $GIT_VERSION.  Hmm, but it also contains a
> terminating newline, which would be a bit tricky (but not impossible) to
> add.  Would it make sense to add one automatically if it's missing (e.g.
> with strbuf_complete_line)?  Not sure.

I do not think it is a good UI to give raw file content from the
command line, which will be usable only for trivial, even single
liner files, and forces people to learn two parallel option, one
for trivial ones and the other for contents with meaningful size.

"--add-blob=<path>:<blob-object-name>" may be another option, useful
when you have done "hash-object -w" already, and can be used to add
single-liner, or an entire novel.

In any case, "--add-file=<file>", which we already have, would be
more appropriate feature to use to record our "version" file, so
there is no need to change our Makefile for it.


^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v2 3/6] Implement `scalar diagnose`
  2022-02-07 19:55     ` René Scharfe
@ 2022-02-08 12:08       ` Johannes Schindelin
  0 siblings, 0 replies; 109+ messages in thread
From: Johannes Schindelin @ 2022-02-08 12:08 UTC (permalink / raw)
  To: René Scharfe
  Cc: Johannes Schindelin via GitGitGadget, git, Taylor Blau,
	Derrick Stolee, Elijah Newren

[-- Attachment #1: Type: text/plain, Size: 4447 bytes --]

Hi René,

On Mon, 7 Feb 2022, René Scharfe wrote:

> > Note: originally, Scalar was implemented in C# using the .NET API, where
> > we had the luxury of a comprehensive standard library that includes
> > basic functionality such as writing a `.zip` file. In the C version, we
> > lack such a commodity. Rather than introducing a dependency on, say,
> > libzip, we slightly abuse Git's `archive` command: Instead of writing
> > the `.zip` file directly, we stage the file contents in a Git index of a
> > temporary, bare repository, only to let `git archive` have at it, and
> > finally removing the temporary repository.
> >
> > Also note: Due to the frequently-spawned `git hash-object` processes,
> > this command is quite a bit slow on Windows. Should it turn out to be a
> > big problem, the lack of a batch mode of the `hash-object` command could
> > potentially be worked around via using `git fast-import` with a crafted
> > `stdin`.
>
> The two paragraphs above are not in sync with the patch.

Whoopsie!

> > +	archiver_fd = xopen(zip_path.buf, O_CREAT | O_WRONLY | O_TRUNC, 0666);
> > +	if (archiver_fd < 0 || dup2(archiver_fd, 1) < 0) {
> > +		res = error_errno(_("could not redirect output"));
> > +		goto diagnose_cleanup;
> > +	}
> > +
> > +	init_zip_archiver();
> > +	strvec_pushl(&archiver_args, "scalar-diagnose", "--format=zip", NULL);
> > +
> > +	strbuf_reset(&buf);
> > +	strbuf_addstr(&buf,
> > +		      "--add-file-with-content=diagnostics.log:"
> > +		      "Collecting diagnostic info\n\n");
> > +	get_version_info(&buf, 1);
> > +
> > +	strbuf_addf(&buf, "Enlistment root: %s\n", the_repository->worktree);
> > +	off = strchr(buf.buf, ':') + 1 - buf.buf;
> > +	write_or_die(stdout_fd, buf.buf + off, buf.len - off);
> > +	strvec_push(&archiver_args, buf.buf);
>
> Fun trick to reuse the buffer for both the ZIP entry and stdout. :)  I'd
> have omitted the option from buf and added it like this, for simplicity:
>
> 	strvec_pushf(&archiver_args,
> 		     "--add-file-with-content=diagnostics.log:%s", buf.buf);
>
> Just a thought.

Oh, that's even better. I did not like that `off` pattern at all but
forgot to think of `pushf()`. Thanks!

> > +
> > +	if ((res = add_directory_to_archiver(&archiver_args, ".git", 0)) ||
> > +	    (res = add_directory_to_archiver(&archiver_args, ".git/hooks", 0)) ||
> > +	    (res = add_directory_to_archiver(&archiver_args, ".git/info", 0)) ||
> > +	    (res = add_directory_to_archiver(&archiver_args, ".git/logs", 1)) ||
> > +	    (res = add_directory_to_archiver(&archiver_args, ".git/objects/info", 0)))
> > +		goto diagnose_cleanup;
> > +
> > +	strvec_pushl(&archiver_args, "--prefix=",
> > +		     oid_to_hex(the_hash_algo->empty_tree), "--", NULL);
> > +
> > +	/* `write_archive()` modifies the `argv` passed to it. Let it. */
> > +	argv_copy = xmemdupz(archiver_args.v,
> > +			     sizeof(char *) * archiver_args.nr);
>
> Leaking the whole thing would be fine as well for this command, but
> cleaning up is tidier, of course.
>
> > +	res = write_archive(archiver_args.nr, (const char **)argv_copy, NULL,
> > +			    the_repository, NULL, 0);
>
> Ah -- no shell means no command line length limits. :)

Yes!!!

It also makes the command a ridiculous amount faster on Windows.

> > +	if (res) {
> > +		error(_("failed to write archive"));
> > +		goto diagnose_cleanup;
> > +	}
> > +
> > +	if (!res)
> > +		printf("\n"
> > +		       "Diagnostics complete.\n"
> > +		       "All of the gathered info is captured in '%s'\n",
> > +		       zip_path.buf);
>
> Is this message appended to the ZIP file or does it go to stdout?

It goes to `stdout`, this is for the user who runs `scalar diagnose`.

Hmm.

Now that you pointed it out, I think I want it to go to `stderr` instead.

> In any case: mixing write(2) and stdio(3) is not a good idea.  Using
> fwrite(3) instead of write_or_die above and doing the stdout dup(2)
> dance only tightly around the write_archive call would help, I think.

Sure, but let's print this message to `stderr` instead, that'll be much
cleaner, right?

Alternatively, I think I'd rather move the `printf()` below...

>
> > +
> > +diagnose_cleanup:
> > +	if (archiver_fd >= 0) {
> > +		close(1);
> > +		dup2(stdout_fd, 1);
> > +	}

... this re-redirection.

What do you think? `stdout` or `stderr`?

Thank you for your review!
Dscho

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v2 1/6] archive: optionally add "virtual" files
  2022-02-07 19:55     ` René Scharfe
  2022-02-07 23:30       ` Junio C Hamano
@ 2022-02-08 12:54       ` Johannes Schindelin
  1 sibling, 0 replies; 109+ messages in thread
From: Johannes Schindelin @ 2022-02-08 12:54 UTC (permalink / raw)
  To: René Scharfe
  Cc: Johannes Schindelin via GitGitGadget, git, Taylor Blau,
	Derrick Stolee, Elijah Newren

[-- Attachment #1: Type: text/plain, Size: 6224 bytes --]

Hi René,

On Mon, 7 Feb 2022, René Scharfe wrote:

> Am 06.02.22 um 23:39 schrieb Johannes Schindelin via GitGitGadget:
> > From: Johannes Schindelin <johannes.schindelin@gmx.de>
> >
> > With the `--add-file-with-content=<path>:<content>` option, `git
> > archive` now supports use cases where relatively trivial files need to
> > be added that do not exist on disk.
> >
> > Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> > ---
> >  Documentation/git-archive.txt | 11 ++++++++
> >  archive.c                     | 51 +++++++++++++++++++++++++++++------
> >  t/t5003-archive-zip.sh        | 12 +++++++++
> >  3 files changed, 66 insertions(+), 8 deletions(-)
> >
> > diff --git a/Documentation/git-archive.txt b/Documentation/git-archive.txt
> > index bc4e76a7834..1b52a0a65a1 100644
> > --- a/Documentation/git-archive.txt
> > +++ b/Documentation/git-archive.txt
> > @@ -61,6 +61,17 @@ OPTIONS
> >  	by concatenating the value for `--prefix` (if any) and the
> >  	basename of <file>.
> >
> > +--add-file-with-content=<path>:<content>::
> > +	Add the specified contents to the archive.  Can be repeated to add
> > +	multiple files.  The path of the file in the archive is built
> > +	by concatenating the value for `--prefix` (if any) and the
> > +	basename of <file>.
> > ++
> > +The `<path>` cannot contain any colon, the file mode is limited to
> > +a regular file, and the option may be subject platform-dependent
>
> s/subject/& to/

Thanks.

> > +command-line limits. For non-trivial cases, write an untracked file
> > +and use `--add-file` instead.
> > +
>
> We could use that option in Git's own Makefile to add the file named
> "version", which contains $GIT_VERSION.

We could do that, that opportunity is a side effect of this patch series.

> Hmm, but it also contains a terminating newline, which would be a bit
> tricky (but not impossible) to add.  Would it make sense to add one
> automatically if it's missing (e.g. with strbuf_complete_line)?  Not
> sure.

It is really easy:

	LF='
	'

	git archive --add-file-with-content=version:"$GIT_VERSION$LF" ...

(That's shell script, in the Makefile it would need those `\`
continuations.)

> >  --worktree-attributes::
> >  	Look for attributes in .gitattributes files in the working tree
> >  	as well (see <<ATTRIBUTES>>).
> > diff --git a/archive.c b/archive.c
> > index a3bbb091256..172efd690c3 100644
> > --- a/archive.c
> > +++ b/archive.c
> > @@ -263,6 +263,7 @@ static int queue_or_write_archive_entry(const struct object_id *oid,
> >  struct extra_file_info {
> >  	char *base;
> >  	struct stat stat;
> > +	void *content;
> >  };
> >
> >  int write_archive_entries(struct archiver_args *args,
> > @@ -337,7 +338,13 @@ int write_archive_entries(struct archiver_args *args,
> >  		strbuf_addstr(&path_in_archive, basename(path));
> >
> >  		strbuf_reset(&content);
> > -		if (strbuf_read_file(&content, path, info->stat.st_size) < 0)
> > +		if (info->content)
> > +			err = write_entry(args, &fake_oid, path_in_archive.buf,
> > +					  path_in_archive.len,
> > +					  info->stat.st_mode,
> > +					  info->content, info->stat.st_size);
> > +		else if (strbuf_read_file(&content, path,
> > +					  info->stat.st_size) < 0)
> >  			err = error_errno(_("could not read '%s'"), path);
> >  		else
> >  			err = write_entry(args, &fake_oid, path_in_archive.buf,
> > @@ -493,6 +500,7 @@ static void extra_file_info_clear(void *util, const char *str)
> >  {
> >  	struct extra_file_info *info = util;
> >  	free(info->base);
> > +	free(info->content);
> >  	free(info);
> >  }
> >
> > @@ -514,14 +522,38 @@ static int add_file_cb(const struct option *opt, const char *arg, int unset)
> >  	if (!arg)
> >  		return -1;
> >
> > -	path = prefix_filename(args->prefix, arg);
> > -	item = string_list_append_nodup(&args->extra_files, path);
> > -	item->util = info = xmalloc(sizeof(*info));
> > +	info = xmalloc(sizeof(*info));
> >  	info->base = xstrdup_or_null(base);
> > -	if (stat(path, &info->stat))
> > -		die(_("File not found: %s"), path);
> > -	if (!S_ISREG(info->stat.st_mode))
> > -		die(_("Not a regular file: %s"), path);
> > +
> > +	if (strcmp(opt->long_name, "add-file-with-content")) {
>
> Equivalent to:
>
> 	if (!strcmp(opt->long_name, "add-file")) {
>
> I mention that because the inequality check confused me a bit at first.

Good point. For some reason I thought it would be clearer to handle
everything but `--add-file-with-content` here, but that "everything but"
is only `--add-file`, so I sowed more confusion. Sorry about that.

>
> > +		path = prefix_filename(args->prefix, arg);
> > +		if (stat(path, &info->stat))
> > +			die(_("File not found: %s"), path);
> > +		if (!S_ISREG(info->stat.st_mode))
> > +			die(_("Not a regular file: %s"), path);
> > +		info->content = NULL; /* read the file later */
> > +	} else {
> > +		const char *colon = strchr(arg, ':');
> > +		char *p;
> > +
> > +		if (!colon)
> > +			die(_("missing colon: '%s'"), arg);
> > +
> > +		p = xstrndup(arg, colon - arg);
> > +		if (!args->prefix)
> > +			path = p;
> > +		else {
> > +			path = prefix_filename(args->prefix, p);
> > +			free(p);
> > +		}
> > +		memset(&info->stat, 0, sizeof(info->stat));
> > +		info->stat.st_mode = S_IFREG | 0644;
> > +		info->content = xstrdup(colon + 1);
> > +		info->stat.st_size = strlen(info->content);
> > +	}
> > +	item = string_list_append_nodup(&args->extra_files, path);
> > +	item->util = info;
> > +
> >  	return 0;
> >  }
> >
> > @@ -554,6 +586,9 @@ static int parse_archive_args(int argc, const char **argv,
> >  		{ OPTION_CALLBACK, 0, "add-file", args, N_("file"),
> >  		  N_("add untracked file to archive"), 0, add_file_cb,
> >  		  (intptr_t)&base },
> > +		{ OPTION_CALLBACK, 0, "add-file-with-content", args,
> > +		  N_("file"), N_("add untracked file to archive"), 0,
>                       ^^^^
> "<file>" seems wrong, because there is no actual file.  It should rather
> be "<name>:<content>" for the virtual one, right?

Or `<path>:<content>`. Yes.

Again, thank you for your clear and helpful review,
Dscho

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v2 1/6] archive: optionally add "virtual" files
  2022-02-07 23:30       ` Junio C Hamano
@ 2022-02-08 13:12         ` Johannes Schindelin
  2022-02-08 17:44           ` Junio C Hamano
  0 siblings, 1 reply; 109+ messages in thread
From: Johannes Schindelin @ 2022-02-08 13:12 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: René Scharfe, Johannes Schindelin via GitGitGadget, git,
	Taylor Blau, Derrick Stolee, Elijah Newren

[-- Attachment #1: Type: text/plain, Size: 2099 bytes --]

Hi Junio,

On Mon, 7 Feb 2022, Junio C Hamano wrote:

> René Scharfe <l.s.r@web.de> writes:
>
> > We could use that option in Git's own Makefile to add the file named
> > "version", which contains $GIT_VERSION.  Hmm, but it also contains a
> > terminating newline, which would be a bit tricky (but not impossible) to
> > add.  Would it make sense to add one automatically if it's missing (e.g.
> > with strbuf_complete_line)?  Not sure.
>
> I do not think it is a good UI to give raw file content from the
> command line, which will be usable only for trivial, even single
> liner files, and forces people to learn two parallel option, one
> for trivial ones and the other for contents with meaningful size.

Nevertheless, it is still the most elegant way that I can think of to
generate a diagnostic `.zip` file without messing up the very things that
are to be diagnosed: the repository and the worktree.

> "--add-blob=<path>:<blob-object-name>" may be another option, useful
> when you have done "hash-object -w" already, and can be used to add
> single-liner, or an entire novel.

This would mess with the repository. Granted, it is unlikely that adding a
tiny blob will all of a sudden work around a bug that the user wanted to
report, but less big mutations have been known to subtly change a bug's
manifested symptoms.

So I really do not want to do that, not in `scalar diagnose.

> In any case, "--add-file=<file>", which we already have, would be
> more appropriate feature to use to record our "version" file, so
> there is no need to change our Makefile for it.

Same here. It is bad enough that `scalar diagnose` has to create a
directory in the current enlistment. Let's not make the situation even
worse.

The most elegant solution would have been that streaming `--add-file` mode
suggested by René, I think, but that's too involved to implement just to
benefit `scalar diagnose`. It's not like we can simply stream the contents
via `stdin`, as there are more than one "virtual" file we need to add to
that `.zip` file.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v2 1/6] archive: optionally add "virtual" files
  2022-02-08 13:12         ` Johannes Schindelin
@ 2022-02-08 17:44           ` Junio C Hamano
  2022-02-08 20:58             ` René Scharfe
  0 siblings, 1 reply; 109+ messages in thread
From: Junio C Hamano @ 2022-02-08 17:44 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: René Scharfe, Johannes Schindelin via GitGitGadget, git,
	Taylor Blau, Derrick Stolee, Elijah Newren

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

>> > We could use that option in Git's own Makefile to add the file named
>> > "version", which contains $GIT_VERSION.  Hmm, but it also contains a
>> > terminating newline, which would be a bit tricky (but not impossible) to
>> > add.  Would it make sense to add one automatically if it's missing (e.g.
>> > with strbuf_complete_line)?  Not sure.
>>
>> I do not think it is a good UI to give raw file content from the
>> command line, which will be usable only for trivial, even single
>> liner files, and forces people to learn two parallel option, one
>> for trivial ones and the other for contents with meaningful size.
>
> Nevertheless, it is still the most elegant way that I can think of to
> generate a diagnostic `.zip` file without messing up the very things that
> are to be diagnosed: the repository and the worktree.

Puzzled.  Are you feeding contents of a .zip file from the command
line?

I was mostly worried about busting command line argument limit by
trying to feed too many bytes, as the ceiling is fairly low on some
platforms.  Another worry was that when <contents> can have
arbitrary bytes, with --opt=<path>:<contents> syntax, the input
becomes ambiguous (i.e. "which colon is the <path> separator?"),
without some way to escape a colon in the payload.

For a single-liner, --add-file-with-contents=<path>:<contents> would
be an OK way, and my comment was not a strong objection against this
new option existing.  It was primarily an objection against changing
the way to add the 'version' file in our "make dist" procedure to
use it anyway.

But now I think about it more, I am becoming less happy about it
existing in the first place.

This will throw another monkey wrench to Konstantin's plan [*] to
make "git archive" output verifiable with the signature on original
Git objects, but it is not a new problem ;-)


[Reference]

* https://lore.kernel.org/git/20220207213449.ljqjhdx4f45a3lx5@meerkat.local/

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v2 1/6] archive: optionally add "virtual" files
  2022-02-08 17:44           ` Junio C Hamano
@ 2022-02-08 20:58             ` René Scharfe
  2022-02-09 22:48               ` Junio C Hamano
  0 siblings, 1 reply; 109+ messages in thread
From: René Scharfe @ 2022-02-08 20:58 UTC (permalink / raw)
  To: Junio C Hamano, Johannes Schindelin
  Cc: Johannes Schindelin via GitGitGadget, git, Taylor Blau,
	Derrick Stolee, Elijah Newren

Am 08.02.22 um 18:44 schrieb Junio C Hamano:
> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
>>>> We could use that option in Git's own Makefile to add the file named
>>>> "version", which contains $GIT_VERSION.  Hmm, but it also contains a
>>>> terminating newline, which would be a bit tricky (but not impossible) to
>>>> add.  Would it make sense to add one automatically if it's missing (e.g.
>>>> with strbuf_complete_line)?  Not sure.
>>>
>>> I do not think it is a good UI to give raw file content from the
>>> command line, which will be usable only for trivial, even single
>>> liner files, and forces people to learn two parallel option, one
>>> for trivial ones and the other for contents with meaningful size.
>>
>> Nevertheless, it is still the most elegant way that I can think of to
>> generate a diagnostic `.zip` file without messing up the very things that
>> are to be diagnosed: the repository and the worktree.
>
> Puzzled.  Are you feeding contents of a .zip file from the command
> line?

Kind of.  Command line arguments are built and handed to write_archive()
in-process.  It's done by patch 3 and extended by 5 and 6.

The number of files is relatively low and they aren't huge, right?
Staging their content in the object database would be messy, but $TMPDIR
might be able to take them with a low impact.  Unless the problem to
diagnose is that this directory is full -- but you don't need a fancy
report for that. :)

Currently there is no easy way to write a temporary file with a chosen
name.  diff.c would benefit from such a thing when running an external
diff program; currently it adds a random prefix.  git archive --add-file
also uses the filename (and discards the directory part).  The patch
below adds a function to create temporary files with a chosen name.
Perhaps it would be useful here as well, instead of the new option?

> I was mostly worried about busting command line argument limit by
> trying to feed too many bytes, as the ceiling is fairly low on some
> platforms.

Command line length limits don't apply to the way scalar uses the new
option.

> Another worry was that when <contents> can have
> arbitrary bytes, with --opt=<path>:<contents> syntax, the input
> becomes ambiguous (i.e. "which colon is the <path> separator?"),
> without some way to escape a colon in the payload.

The first colon is the separator here.

> For a single-liner, --add-file-with-contents=<path>:<contents> would
> be an OK way, and my comment was not a strong objection against this
> new option existing.  It was primarily an objection against changing
> the way to add the 'version' file in our "make dist" procedure to
> use it anyway.
>
> But now I think about it more, I am becoming less happy about it
> existing in the first place.
>
> This will throw another monkey wrench to Konstantin's plan [*] to
> make "git archive" output verifiable with the signature on original
> Git objects, but it is not a new problem ;-)
>
>
> [Reference]
>
> * https://lore.kernel.org/git/20220207213449.ljqjhdx4f45a3lx5@meerkat.local/

I don't see the conflict: If an untracked file is added to an archive
using --add-file, --add-file-with-content, or ZIP or tar then we'd
*want* the verification against a signed commit or tag to fail, no?  A
different signature would be required for the non-tracked parts.

René


--- >8 ---
Subject: [PATCH] tempfile: add mks_tempfile_dt()

Add a function to create a temporary file with a certain name in a
temporary directory created using mkdtemp(3).  Its result is more
sightly than the paths created by mks_tempfile_ts(), which include
a random prefix.  That's useful for files passed to a program that
displays their name, e.g. an external diff tool.

Signed-off-by: René Scharfe <l.s.r@web.de>
---
 tempfile.c | 63 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 tempfile.h | 13 +++++++++++
 2 files changed, 76 insertions(+)

diff --git a/tempfile.c b/tempfile.c
index 94aa18f3f7..2024c82691 100644
--- a/tempfile.c
+++ b/tempfile.c
@@ -56,6 +56,20 @@

 static VOLATILE_LIST_HEAD(tempfile_list);

+static void remove_template_directory(struct tempfile *tempfile,
+				      int in_signal_handler)
+{
+	if (tempfile->directorylen > 0 &&
+	    tempfile->directorylen < tempfile->filename.len &&
+	    tempfile->filename.buf[tempfile->directorylen] == '/') {
+		strbuf_setlen(&tempfile->filename, tempfile->directorylen);
+		if (in_signal_handler)
+			rmdir(tempfile->filename.buf);
+		else
+			rmdir_or_warn(tempfile->filename.buf);
+	}
+}
+
 static void remove_tempfiles(int in_signal_handler)
 {
 	pid_t me = getpid();
@@ -74,6 +88,7 @@ static void remove_tempfiles(int in_signal_handler)
 			unlink(p->filename.buf);
 		else
 			unlink_or_warn(p->filename.buf);
+		remove_template_directory(p, in_signal_handler);

 		p->active = 0;
 	}
@@ -100,6 +115,7 @@ static struct tempfile *new_tempfile(void)
 	tempfile->owner = 0;
 	INIT_LIST_HEAD(&tempfile->list);
 	strbuf_init(&tempfile->filename, 0);
+	tempfile->directorylen = 0;
 	return tempfile;
 }

@@ -198,6 +214,52 @@ struct tempfile *mks_tempfile_tsm(const char *filename_template, int suffixlen,
 	return tempfile;
 }

+struct tempfile *mks_tempfile_dt(const char *directory_template,
+				 const char *filename)
+{
+	struct tempfile *tempfile;
+	const char *tmpdir;
+	struct strbuf sb = STRBUF_INIT;
+	int fd;
+	size_t directorylen;
+
+	if (!ends_with(directory_template, "XXXXXX")) {
+		errno = EINVAL;
+		return NULL;
+	}
+
+	tmpdir = getenv("TMPDIR");
+	if (!tmpdir)
+		tmpdir = "/tmp";
+
+	strbuf_addf(&sb, "%s/%s", tmpdir, directory_template);
+	directorylen = sb.len;
+	if (!mkdtemp(sb.buf)) {
+		int orig_errno = errno;
+		strbuf_release(&sb);
+		errno = orig_errno;
+		return NULL;
+	}
+
+	strbuf_addf(&sb, "/%s", filename);
+	fd = open(sb.buf, O_CREAT | O_EXCL | O_RDWR, 0600);
+	if (fd < 0) {
+		int orig_errno = errno;
+		strbuf_setlen(&sb, directorylen);
+		rmdir(sb.buf);
+		strbuf_release(&sb);
+		errno = orig_errno;
+		return NULL;
+	}
+
+	tempfile = new_tempfile();
+	strbuf_swap(&tempfile->filename, &sb);
+	tempfile->directorylen = directorylen;
+	tempfile->fd = fd;
+	activate_tempfile(tempfile);
+	return tempfile;
+}
+
 struct tempfile *xmks_tempfile_m(const char *filename_template, int mode)
 {
 	struct tempfile *tempfile;
@@ -316,6 +378,7 @@ void delete_tempfile(struct tempfile **tempfile_p)

 	close_tempfile_gently(tempfile);
 	unlink_or_warn(tempfile->filename.buf);
+	remove_template_directory(tempfile, 0);
 	deactivate_tempfile(tempfile);
 	*tempfile_p = NULL;
 }
diff --git a/tempfile.h b/tempfile.h
index 4de3bc77d2..d7804a214a 100644
--- a/tempfile.h
+++ b/tempfile.h
@@ -82,6 +82,7 @@ struct tempfile {
 	FILE *volatile fp;
 	volatile pid_t owner;
 	struct strbuf filename;
+	size_t directorylen;
 };

 /*
@@ -198,6 +199,18 @@ static inline struct tempfile *xmks_tempfile(const char *filename_template)
 	return xmks_tempfile_m(filename_template, 0600);
 }

+/*
+ * Attempt to create a temporary directory in $TMPDIR and to create and
+ * open a file in that new directory. Derive the directory name from the
+ * template in the manner of mkdtemp(). Arrange for directory and file
+ * to be deleted if the program exits before they are deleted
+ * explicitly. On success return a tempfile whose "filename" member
+ * contains the full path of the file and its "fd" member is open for
+ * writing the file. On error return NULL and set errno appropriately.
+ */
+struct tempfile *mks_tempfile_dt(const char *directory_template,
+				 const char *filename);
+
 /*
  * Associate a stdio stream with the temporary file (which must still
  * be open). Return `NULL` (*without* deleting the file) on error. The
--
2.35.1

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v2 1/6] archive: optionally add "virtual" files
  2022-02-08 20:58             ` René Scharfe
@ 2022-02-09 22:48               ` Junio C Hamano
  2022-02-10 19:10                 ` René Scharfe
  0 siblings, 1 reply; 109+ messages in thread
From: Junio C Hamano @ 2022-02-09 22:48 UTC (permalink / raw)
  To: René Scharfe
  Cc: Johannes Schindelin, Johannes Schindelin via GitGitGadget, git,
	Taylor Blau, Derrick Stolee, Elijah Newren

René Scharfe <l.s.r@web.de> writes:

>>> Nevertheless, it is still the most elegant way that I can think of to
>>> generate a diagnostic `.zip` file without messing up the very things that
>>> are to be diagnosed: the repository and the worktree.
>>
>> Puzzled.  Are you feeding contents of a .zip file from the command
>> line?
>
> Kind of.  Command line arguments are built and handed to write_archive()
> in-process.  It's done by patch 3 and extended by 5 and 6.

I meant to ask if this is doing

    git archive --store-contents-at-path="report.zip:$(cat diag.zip)"

as I misunderstood what 'the diagnostic .zip file' referred to.
That was a reference to the output of the "git archive" command.

> The number of files is relatively low and they aren't huge, right?

As long as it is expected to fit on the command line, that's fine.
But if the question is "it is OK to add a new option with known
limitation", then it should be stated a bit differently.

"We add this option for use cases where we handle only small number
of one-liner files", and it is OK.  We may however want to do
something imilar to what we do to the "-m '<message>'" option used
by "git commit" and "git merge", i.e. add the final LF when it is
missing to make it a complete line, to hint the fact that this is
meant to add a small number of single liner files.

>> Another worry was that when <contents> can have
>> arbitrary bytes, with --opt=<path>:<contents> syntax, the input
>> becomes ambiguous (i.e. "which colon is the <path> separator?"),
>> without some way to escape a colon in the payload.
>
> The first colon is the separator here.

Meaning you cannot have a colon in the path, which is not exactly
pleasing limitation.  I know you may not be able to do so on Windows
or CIFS mounted on non-Windows, but we do not limit ourselves to
portable filename character set (POSIX.1 3.282), either.

>> This will throw another monkey wrench to Konstantin's plan [*] to
>> make "git archive" output verifiable with the signature on original
>> Git objects, but it is not a new problem ;-)
>>
>>
>> [Reference]
>>
>> * https://lore.kernel.org/git/20220207213449.ljqjhdx4f45a3lx5@meerkat.local/
>
> I don't see the conflict: If an untracked file is added to an archive
> using --add-file, --add-file-with-content, or ZIP or tar then we'd
> *want* the verification against a signed commit or tag to fail, no?  A
> different signature would be required for the non-tracked parts.

Yes, which is exactly how this (and existing --add-file) makes
Konstantin's plan much less useful.

Thanks.

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v2 1/6] archive: optionally add "virtual" files
  2022-02-09 22:48               ` Junio C Hamano
@ 2022-02-10 19:10                 ` René Scharfe
  2022-02-10 19:23                   ` Junio C Hamano
  0 siblings, 1 reply; 109+ messages in thread
From: René Scharfe @ 2022-02-10 19:10 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Johannes Schindelin, Johannes Schindelin via GitGitGadget, git,
	Taylor Blau, Derrick Stolee, Elijah Newren

Am 09.02.22 um 23:48 schrieb Junio C Hamano:
> René Scharfe <l.s.r@web.de> writes:
>
>> The number of files is relatively low and they aren't huge, right?
>
> As long as it is expected to fit on the command line, that's fine.
> But if the question is "it is OK to add a new option with known
> limitation", then it should be stated a bit differently.

I asked this question to find out if writing the files to $TMPDIR and
adding them with --add-file instead of with --add-file-with-content
would be feasible in patches 3 to 6.  git archive would not have to be
changed in that case.

>>> This will throw another monkey wrench to Konstantin's plan [*] to
>>> make "git archive" output verifiable with the signature on original
>>> Git objects, but it is not a new problem ;-)
>>>
>>>
>>> [Reference]
>>>
>>> * https://lore.kernel.org/git/20220207213449.ljqjhdx4f45a3lx5@meerkat.local/
>>
>> I don't see the conflict: If an untracked file is added to an archive
>> using --add-file, --add-file-with-content, or ZIP or tar then we'd
>> *want* the verification against a signed commit or tag to fail, no?  A
>> different signature would be required for the non-tracked parts.
>
> Yes, which is exactly how this (and existing --add-file) makes
> Konstantin's plan much less useful.
People added untracked files to archives before --add-file existed.

--add-file-with-content could be used to add the .GIT_ARCHIVE_SIG file.

Additional untracked files would need a manifest to specify which files
are (not) covered by the signed commit/tag.  Or the .GIT_ARCHIVE_SIG
files could be added just after the signed files as a rule, before any
other untracked files, as some kind of a separator.

Just listing untracked files and verifying the others might still be
useful.  Warning about untracked files shadowing tracked ones would be
very useful.

Some equivalent to the .GIT_ARCHIVE_SIG file containing a signature of
the untracked files could optionally be added at the end to allow full
verification -- but would require signing at archive creation time.

René

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v2 1/6] archive: optionally add "virtual" files
  2022-02-10 19:10                 ` René Scharfe
@ 2022-02-10 19:23                   ` Junio C Hamano
  2022-02-11 19:16                     ` René Scharfe
  0 siblings, 1 reply; 109+ messages in thread
From: Junio C Hamano @ 2022-02-10 19:23 UTC (permalink / raw)
  To: René Scharfe
  Cc: Johannes Schindelin, Johannes Schindelin via GitGitGadget, git,
	Taylor Blau, Derrick Stolee, Elijah Newren

René Scharfe <l.s.r@web.de> writes:

>> Yes, which is exactly how this (and existing --add-file) makes
>> Konstantin's plan much less useful.
> People added untracked files to archives before --add-file existed.
>
> --add-file-with-content could be used to add the .GIT_ARCHIVE_SIG file.
>
> Additional untracked files would need a manifest to specify which files
> are (not) covered by the signed commit/tag.  Or the .GIT_ARCHIVE_SIG
> files could be added just after the signed files as a rule, before any
> other untracked files, as some kind of a separator.

Or if people do not _exclude_ tracked files from the archive, then
the verifier who has a tarball and a Git tree object can consult the
tree object to see which ones are added untracked cruft.

> Just listing untracked files and verifying the others might still be
> useful.  Warning about untracked files shadowing tracked ones would be
> very useful.

Yup.

> Some equivalent to the .GIT_ARCHIVE_SIG file containing a signature of
> the untracked files could optionally be added at the end to allow full
> verification -- but would require signing at archive creation time.

Yeah, and at that point, it is not much more convenient than just
signing the whole archive (sans the SIG part, obviously), which is
what people have always done ;-)

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v2 1/6] archive: optionally add "virtual" files
  2022-02-10 19:23                   ` Junio C Hamano
@ 2022-02-11 19:16                     ` René Scharfe
  2022-02-11 21:27                       ` Junio C Hamano
  0 siblings, 1 reply; 109+ messages in thread
From: René Scharfe @ 2022-02-11 19:16 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Johannes Schindelin, Johannes Schindelin via GitGitGadget, git,
	Taylor Blau, Derrick Stolee, Elijah Newren

Am 10.02.22 um 20:23 schrieb Junio C Hamano:
> René Scharfe <l.s.r@web.de> writes:
>
>>> Yes, which is exactly how this (and existing --add-file) makes
>>> Konstantin's plan much less useful.

A harder obstacle to verification would be end-of-line conversion.
Retrying a failed signature check after applying convert_to_git() might
work, but not for files that have mixed line endings in the repository
and end up being homogenized during checkout (and thus archiving).

>> People added untracked files to archives before --add-file existed.
>>
>> --add-file-with-content could be used to add the .GIT_ARCHIVE_SIG file.
>>
>> Additional untracked files would need a manifest to specify which files
>> are (not) covered by the signed commit/tag.  Or the .GIT_ARCHIVE_SIG
>> files could be added just after the signed files as a rule, before any
>> other untracked files, as some kind of a separator.
>
> Or if people do not _exclude_ tracked files from the archive, then
> the verifier who has a tarball and a Git tree object can consult the
> tree object to see which ones are added untracked cruft.

True, but if you have the tree objects then you probably also have the
blobs and don't need the archive?  Or is this some kind of sparse
checkout scenario?

>> Some equivalent to the .GIT_ARCHIVE_SIG file containing a signature of
>> the untracked files could optionally be added at the end to allow full
>> verification -- but would require signing at archive creation time.
>
> Yeah, and at that point, it is not much more convenient than just
> signing the whole archive (sans the SIG part, obviously), which is
> what people have always done ;-)

Indeed.

René

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v2 1/6] archive: optionally add "virtual" files
  2022-02-11 19:16                     ` René Scharfe
@ 2022-02-11 21:27                       ` Junio C Hamano
  2022-02-12  9:12                         ` René Scharfe
  0 siblings, 1 reply; 109+ messages in thread
From: Junio C Hamano @ 2022-02-11 21:27 UTC (permalink / raw)
  To: René Scharfe
  Cc: Johannes Schindelin, Johannes Schindelin via GitGitGadget, git,
	Taylor Blau, Derrick Stolee, Elijah Newren

René Scharfe <l.s.r@web.de> writes:

>> Or if people do not _exclude_ tracked files from the archive, then
>> the verifier who has a tarball and a Git tree object can consult the
>> tree object to see which ones are added untracked cruft.
>
> True, but if you have the tree objects then you probably also have the
> blobs and don't need the archive?  Or is this some kind of sparse
> checkout scenario?

My phrasing was too loose.  This is a "how to verify a distro
tarball" (without having a copy of the project repository, but with
some common tools like "git") scenario.

The verifier has a tarball.  In addition, the verifier knows the
object name of the Git tree object the tarball was taken from, and
somehow trusts that the object name is genuine.  We can do either
"untar + git-add . && git write-tree" or its equivalent to see how
the contents hashes to the expected tree (or not).

How the verifier trusts the object name is out of scope (it may come
from a copy of a signed tag object and a copy of the commit object
that the tag points at and the contents of signed tag object, with
its known format, would allow you to write a stand alone tool to
verify the PGP signature).

Line-end normalization and smudge filter rules may get in the way,
if we truly did "untar" to the filesystem, but I thought "git
archive" didn't do smudge conversion and core.crlf handling when
creating the archive?



^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v2 1/6] archive: optionally add "virtual" files
  2022-02-11 21:27                       ` Junio C Hamano
@ 2022-02-12  9:12                         ` René Scharfe
  2022-02-13  6:25                           ` Junio C Hamano
  0 siblings, 1 reply; 109+ messages in thread
From: René Scharfe @ 2022-02-12  9:12 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Johannes Schindelin, Johannes Schindelin via GitGitGadget, git,
	Taylor Blau, Derrick Stolee, Elijah Newren

Am 11.02.22 um 22:27 schrieb Junio C Hamano:
> René Scharfe <l.s.r@web.de> writes:
>
>>> Or if people do not _exclude_ tracked files from the archive, then
>>> the verifier who has a tarball and a Git tree object can consult the
>>> tree object to see which ones are added untracked cruft.
>>
>> True, but if you have the tree objects then you probably also have the
>> blobs and don't need the archive?  Or is this some kind of sparse
>> checkout scenario?
>
> My phrasing was too loose.  This is a "how to verify a distro
> tarball" (without having a copy of the project repository, but with
> some common tools like "git") scenario.
>
> The verifier has a tarball.  In addition, the verifier knows the
> object name of the Git tree object the tarball was taken from, and
> somehow trusts that the object name is genuine.  We can do either
> "untar + git-add . && git write-tree" or its equivalent to see how
> the contents hashes to the expected tree (or not).
>
> How the verifier trusts the object name is out of scope (it may come
> from a copy of a signed tag object and a copy of the commit object
> that the tag points at and the contents of signed tag object, with
> its known format, would allow you to write a stand alone tool to
> verify the PGP signature).

Right, but the tree hash does not directly allow to see which objects
are tracked or not.  This information is necessary to reconstruct the
signed tree.  (Having tracked files first, then the signature file and
then untracked files in the archive would be an easy way to transmit
it.)

> Line-end normalization and smudge filter rules may get in the way,
> if we truly did "untar" to the filesystem, but I thought "git
> archive" didn't do smudge conversion and core.crlf handling when
> creating the archive?

git archive uses convert_to_working_tree() to archive the same file
contents as tar or zip would.

René

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v2 1/6] archive: optionally add "virtual" files
  2022-02-12  9:12                         ` René Scharfe
@ 2022-02-13  6:25                           ` Junio C Hamano
  2022-02-13  9:02                             ` René Scharfe
  0 siblings, 1 reply; 109+ messages in thread
From: Junio C Hamano @ 2022-02-13  6:25 UTC (permalink / raw)
  To: René Scharfe
  Cc: Johannes Schindelin, Johannes Schindelin via GitGitGadget, git,
	Taylor Blau, Derrick Stolee, Elijah Newren

René Scharfe <l.s.r@web.de> writes:

>> The verifier has a tarball.  In addition, the verifier knows the
>> object name of the Git tree object the tarball was taken from, and
>> somehow trusts that the object name is genuine.  We can do either
>> "untar + git-add . && git write-tree" or its equivalent to see how
>> the contents hashes to the expected tree (or not).
> ...
> Right, but the tree hash does not directly allow to see which objects
> are tracked or not.

Ah, of course---it was silly of me to overlook this obvious fact X-<.
So we do need some extra "manifest" to declare what's untracked etc.,
if we allow --add-file etc. to munge the tree when creating a tarball
out of it.


^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v2 1/6] archive: optionally add "virtual" files
  2022-02-13  6:25                           ` Junio C Hamano
@ 2022-02-13  9:02                             ` René Scharfe
  2022-02-14 17:22                               ` Junio C Hamano
  0 siblings, 1 reply; 109+ messages in thread
From: René Scharfe @ 2022-02-13  9:02 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Johannes Schindelin, Johannes Schindelin via GitGitGadget, git,
	Taylor Blau, Derrick Stolee, Elijah Newren

Am 13.02.22 um 07:25 schrieb Junio C Hamano:
>
> So we do need some extra "manifest" to declare what's untracked etc.,
> if we allow --add-file etc. to munge the tree when creating a tarball
> out of it.

Right, or get that information from the order of files in the archive,
by having tracked files come first, then the signature file with a
certain name and then untracked files.

René

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v2 1/6] archive: optionally add "virtual" files
  2022-02-13  9:02                             ` René Scharfe
@ 2022-02-14 17:22                               ` Junio C Hamano
  0 siblings, 0 replies; 109+ messages in thread
From: Junio C Hamano @ 2022-02-14 17:22 UTC (permalink / raw)
  To: René Scharfe
  Cc: Johannes Schindelin, Johannes Schindelin via GitGitGadget, git,
	Taylor Blau, Derrick Stolee, Elijah Newren

René Scharfe <l.s.r@web.de> writes:

> Am 13.02.22 um 07:25 schrieb Junio C Hamano:
>>
>> So we do need some extra "manifest" to declare what's untracked etc.,
>> if we allow --add-file etc. to munge the tree when creating a tarball
>> out of it.
>
> Right, or get that information from the order of files in the archive,
> by having tracked files come first, then the signature file with a
> certain name and then untracked files.

That sounds like a workable approach, modulo that the details of the
"signature file with a certain name" part needs to be worked out.

We should make sure that we clearly document that "--add-file=" and
friends add their material after the contents that come from the
tree-ish, and make sure that the program does so and will stay doing
so.  Otherwise users cannot easily create an archive that follows
the above rule.

Thanks.

^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH v3 0/7] scalar: implement the subcommand "diagnose"
  2022-02-06 22:39 ` [PATCH v2 0/6] " Johannes Schindelin via GitGitGadget
                     ` (5 preceding siblings ...)
  2022-02-06 22:39   ` [PATCH v2 6/6] scalar: teach `diagnose` to gather loose objects information Matthew John Cheetham via GitGitGadget
@ 2022-05-04 15:25   ` Johannes Schindelin via GitGitGadget
  2022-05-04 15:25     ` [PATCH v3 1/7] archive: optionally add "virtual" files Johannes Schindelin via GitGitGadget
                       ` (8 more replies)
  6 siblings, 9 replies; 109+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2022-05-04 15:25 UTC (permalink / raw)
  To: git
  Cc: René Scharfe, Taylor Blau, Derrick Stolee, Elijah Newren,
	Johannes Schindelin

Over the course of the years, we developed a sub-command that gathers
diagnostic data into a .zip file that can then be attached to bug reports.
This sub-command turned out to be very useful in helping Scalar developers
identify and fix issues.

Changes since v2:

 * Clarified in the commit message what the biggest benefit of
   --add-file-with-content is.
 * The <path> part of the -add-file-with-content argument can now contain
   colons. To do this, the path needs to start and end in double-quote
   characters (which are stripped), and the backslash serves as escape
   character in that case (to allow the path to contain both colons and
   double-quotes).
 * Fixed incorrect grammar.
 * Instead of strcmp(<what-we-don't-want>), we now say
   !strcmp(<what-we-want>).
 * The help text for --add-file-with-content was improved a tiny bit.
 * Adjusted the commit message that still talked about spawning plenty of
   processes and about a throw-away repository for the sake of generating a
   .zip file.
 * Simplified the code that shows the diagnostics and adds them to the .zip
   file.
 * The final message that reports that the archive is complete is now
   printed to stderr instead of stdout.

Changes since v1:

 * Instead of creating a throw-away repository, staging the contents of the
   .zip file and then using git write-tree and git archive to write the .zip
   file, the patch series now introduces a new option to git archive and
   uses write_archive() directly (avoiding any separate process).
 * Since the command avoids separate processes, it is now blazing fast on
   Windows, and I dropped the spinner() function because it's no longer
   needed.
 * While reworking the test case, I noticed that scalar [...] <enlistment>
   failed to verify that the specified directory exists, and would happily
   "traverse to its parent directory" on its quest to find a Scalar
   enlistment. That is of course incorrect, and has been fixed as a "while
   at it" sort of preparatory commit.
 * I had forgotten to sign off on all the commits, which has been fixed.
 * Instead of some "home-grown" readdir()-based function, the code now uses
   for_each_file_in_pack_dir() to look through the pack directories.
 * If any alternates are configured, their pack directories are now included
   in the output.
 * The commit message that might be interpreted to promise information about
   large loose files has been corrected to no longer promise that.
 * The test cases have been adjusted to test a little bit more (e.g.
   verifying that specific paths are mentioned in the output, instead of
   merely verifying that the output is non-empty).

Johannes Schindelin (5):
  archive: optionally add "virtual" files
  archive --add-file-with-contents: allow paths containing colons
  scalar: validate the optional enlistment argument
  Implement `scalar diagnose`
  scalar diagnose: include disk space information

Matthew John Cheetham (2):
  scalar: teach `diagnose` to gather packfile info
  scalar: teach `diagnose` to gather loose objects information

 Documentation/git-archive.txt    |  16 ++
 archive.c                        |  75 +++++++-
 contrib/scalar/scalar.c          | 289 ++++++++++++++++++++++++++++++-
 contrib/scalar/scalar.txt        |  12 ++
 contrib/scalar/t/t9099-scalar.sh |  27 +++
 t/t5003-archive-zip.sh           |  20 +++
 6 files changed, 429 insertions(+), 10 deletions(-)


base-commit: ddc35d833dd6f9e8946b09cecd3311b8aa18d295
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1128%2Fdscho%2Fscalar-diagnose-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1128/dscho/scalar-diagnose-v3
Pull-Request: https://github.com/gitgitgadget/git/pull/1128

Range-diff vs v2:

 1:  49ff3c1f2b3 ! 1:  45662cf582a archive: optionally add "virtual" files
     @@ Commit message
          archive` now supports use cases where relatively trivial files need to
          be added that do not exist on disk.
      
     +    This will allow us to generate `.zip` files with generated content,
     +    without having to add said content to the object database and without
     +    having to write it out to disk.
     +
          Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
      
       ## Documentation/git-archive.txt ##
     @@ Documentation/git-archive.txt: OPTIONS
      +	basename of <file>.
      ++
      +The `<path>` cannot contain any colon, the file mode is limited to
     -+a regular file, and the option may be subject platform-dependent
     ++a regular file, and the option may be subject to platform-dependent
      +command-line limits. For non-trivial cases, write an untracked file
      +and use `--add-file` instead.
      +
     @@ archive.c: static int add_file_cb(const struct option *opt, const char *arg, int
      -	if (!S_ISREG(info->stat.st_mode))
      -		die(_("Not a regular file: %s"), path);
      +
     -+	if (strcmp(opt->long_name, "add-file-with-content")) {
     ++	if (!strcmp(opt->long_name, "add-file")) {
      +		path = prefix_filename(args->prefix, arg);
      +		if (stat(path, &info->stat))
      +			die(_("File not found: %s"), path);
     @@ archive.c: static int parse_archive_args(int argc, const char **argv,
       		  N_("add untracked file to archive"), 0, add_file_cb,
       		  (intptr_t)&base },
      +		{ OPTION_CALLBACK, 0, "add-file-with-content", args,
     -+		  N_("file"), N_("add untracked file to archive"), 0,
     ++		  N_("path:content"), N_("add untracked file to archive"), 0,
      +		  add_file_cb, (intptr_t)&base },
       		OPT_STRING('o', "output", &output, N_("file"),
       			N_("write the archive to this file")),
 -:  ----------- > 2:  ce4b1b680c9 archive --add-file-with-contents: allow paths containing colons
 2:  600da8d465e = 3:  5a3eeb55409 scalar: validate the optional enlistment argument
 3:  0d570137bb6 ! 4:  dfe821d10fe Implement `scalar diagnose`
     @@ Commit message
          we had the luxury of a comprehensive standard library that includes
          basic functionality such as writing a `.zip` file. In the C version, we
          lack such a commodity. Rather than introducing a dependency on, say,
     -    libzip, we slightly abuse Git's `archive` command: Instead of writing
     -    the `.zip` file directly, we stage the file contents in a Git index of a
     -    temporary, bare repository, only to let `git archive` have at it, and
     -    finally removing the temporary repository.
     -
     -    Also note: Due to the frequently-spawned `git hash-object` processes,
     -    this command is quite a bit slow on Windows. Should it turn out to be a
     -    big problem, the lack of a batch mode of the `hash-object` command could
     -    potentially be worked around via using `git fast-import` with a crafted
     -    `stdin`.
     +    libzip, we slightly abuse Git's `archive` machinery: we write out a
     +    `.zip` of the empty try, augmented by a couple files that are added via
     +    the `--add-file*` options. We are careful trying not to modify the
     +    current repository in any way lest the very circumstances that required
     +    `scalar diagnose` to be run are changed by the `diagnose` run itself.
      
          Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
      
     @@ contrib/scalar/scalar.c: cleanup:
      +	time_t now = time(NULL);
      +	struct tm tm;
      +	struct strbuf path = STRBUF_INIT, buf = STRBUF_INIT;
     -+	size_t off;
      +	int res = 0;
      +
      +	argc = parse_options(argc, argv, NULL, options,
     @@ contrib/scalar/scalar.c: cleanup:
      +	strvec_pushl(&archiver_args, "scalar-diagnose", "--format=zip", NULL);
      +
      +	strbuf_reset(&buf);
     -+	strbuf_addstr(&buf,
     -+		      "--add-file-with-content=diagnostics.log:"
     -+		      "Collecting diagnostic info\n\n");
     ++	strbuf_addstr(&buf, "Collecting diagnostic info\n\n");
      +	get_version_info(&buf, 1);
      +
      +	strbuf_addf(&buf, "Enlistment root: %s\n", the_repository->worktree);
     -+	off = strchr(buf.buf, ':') + 1 - buf.buf;
     -+	write_or_die(stdout_fd, buf.buf + off, buf.len - off);
     -+	strvec_push(&archiver_args, buf.buf);
     ++	write_or_die(stdout_fd, buf.buf, buf.len);
     ++	strvec_pushf(&archiver_args,
     ++		     "--add-file-with-content=diagnostics.log:%.*s",
     ++		     (int)buf.len, buf.buf);
      +
      +	if ((res = add_directory_to_archiver(&archiver_args, ".git", 0)) ||
      +	    (res = add_directory_to_archiver(&archiver_args, ".git/hooks", 0)) ||
     @@ contrib/scalar/scalar.c: cleanup:
      +	}
      +
      +	if (!res)
     -+		printf("\n"
     ++		fprintf(stderr, "\n"
      +		       "Diagnostics complete.\n"
      +		       "All of the gathered info is captured in '%s'\n",
      +		       zip_path.buf);
 4:  938e38b5a09 ! 5:  bb162abd383 scalar diagnose: include disk space information
     @@ contrib/scalar/scalar.c: static int cmd_diagnose(int argc, const char **argv)
       
       	strbuf_addf(&buf, "Enlistment root: %s\n", the_repository->worktree);
      +	get_disk_info(&buf);
     - 	off = strchr(buf.buf, ':') + 1 - buf.buf;
     - 	write_or_die(stdout_fd, buf.buf + off, buf.len - off);
     - 	strvec_push(&archiver_args, buf.buf);
     + 	write_or_die(stdout_fd, buf.buf, buf.len);
     + 	strvec_pushf(&archiver_args,
     + 		     "--add-file-with-content=diagnostics.log:%.*s",
      
       ## contrib/scalar/t/t9099-scalar.sh ##
      @@ contrib/scalar/t/t9099-scalar.sh: SQ="'"
 5:  bd9428919fa ! 6:  32aaad7cce1 scalar: teach `diagnose` to gather packfile info
     @@ contrib/scalar/scalar.c: cleanup:
       {
       	struct option options[] = {
      @@ contrib/scalar/scalar.c: static int cmd_diagnose(int argc, const char **argv)
     - 	write_or_die(stdout_fd, buf.buf + off, buf.len - off);
     - 	strvec_push(&archiver_args, buf.buf);
     + 		     "--add-file-with-content=diagnostics.log:%.*s",
     + 		     (int)buf.len, buf.buf);
       
      +	strbuf_reset(&buf);
      +	strbuf_addstr(&buf, "--add-file-with-content=packs-local.txt:");
 6:  7a8875be425 = 7:  322932f0bb8 scalar: teach `diagnose` to gather loose objects information

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH v3 1/7] archive: optionally add "virtual" files
  2022-05-04 15:25   ` [PATCH v3 0/7] scalar: implement the subcommand "diagnose" Johannes Schindelin via GitGitGadget
@ 2022-05-04 15:25     ` Johannes Schindelin via GitGitGadget
  2022-05-04 15:25     ` [PATCH v3 2/7] archive --add-file-with-contents: allow paths containing colons Johannes Schindelin via GitGitGadget
                       ` (7 subsequent siblings)
  8 siblings, 0 replies; 109+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2022-05-04 15:25 UTC (permalink / raw)
  To: git
  Cc: René Scharfe, Taylor Blau, Derrick Stolee, Elijah Newren,
	Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

With the `--add-file-with-content=<path>:<content>` option, `git
archive` now supports use cases where relatively trivial files need to
be added that do not exist on disk.

This will allow us to generate `.zip` files with generated content,
without having to add said content to the object database and without
having to write it out to disk.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 Documentation/git-archive.txt | 11 ++++++++
 archive.c                     | 51 +++++++++++++++++++++++++++++------
 t/t5003-archive-zip.sh        | 12 +++++++++
 3 files changed, 66 insertions(+), 8 deletions(-)

diff --git a/Documentation/git-archive.txt b/Documentation/git-archive.txt
index bc4e76a7834..a0edc9167b2 100644
--- a/Documentation/git-archive.txt
+++ b/Documentation/git-archive.txt
@@ -61,6 +61,17 @@ OPTIONS
 	by concatenating the value for `--prefix` (if any) and the
 	basename of <file>.
 
+--add-file-with-content=<path>:<content>::
+	Add the specified contents to the archive.  Can be repeated to add
+	multiple files.  The path of the file in the archive is built
+	by concatenating the value for `--prefix` (if any) and the
+	basename of <file>.
++
+The `<path>` cannot contain any colon, the file mode is limited to
+a regular file, and the option may be subject to platform-dependent
+command-line limits. For non-trivial cases, write an untracked file
+and use `--add-file` instead.
+
 --worktree-attributes::
 	Look for attributes in .gitattributes files in the working tree
 	as well (see <<ATTRIBUTES>>).
diff --git a/archive.c b/archive.c
index a3bbb091256..d798624cd5f 100644
--- a/archive.c
+++ b/archive.c
@@ -263,6 +263,7 @@ static int queue_or_write_archive_entry(const struct object_id *oid,
 struct extra_file_info {
 	char *base;
 	struct stat stat;
+	void *content;
 };
 
 int write_archive_entries(struct archiver_args *args,
@@ -337,7 +338,13 @@ int write_archive_entries(struct archiver_args *args,
 		strbuf_addstr(&path_in_archive, basename(path));
 
 		strbuf_reset(&content);
-		if (strbuf_read_file(&content, path, info->stat.st_size) < 0)
+		if (info->content)
+			err = write_entry(args, &fake_oid, path_in_archive.buf,
+					  path_in_archive.len,
+					  info->stat.st_mode,
+					  info->content, info->stat.st_size);
+		else if (strbuf_read_file(&content, path,
+					  info->stat.st_size) < 0)
 			err = error_errno(_("could not read '%s'"), path);
 		else
 			err = write_entry(args, &fake_oid, path_in_archive.buf,
@@ -493,6 +500,7 @@ static void extra_file_info_clear(void *util, const char *str)
 {
 	struct extra_file_info *info = util;
 	free(info->base);
+	free(info->content);
 	free(info);
 }
 
@@ -514,14 +522,38 @@ static int add_file_cb(const struct option *opt, const char *arg, int unset)
 	if (!arg)
 		return -1;
 
-	path = prefix_filename(args->prefix, arg);
-	item = string_list_append_nodup(&args->extra_files, path);
-	item->util = info = xmalloc(sizeof(*info));
+	info = xmalloc(sizeof(*info));
 	info->base = xstrdup_or_null(base);
-	if (stat(path, &info->stat))
-		die(_("File not found: %s"), path);
-	if (!S_ISREG(info->stat.st_mode))
-		die(_("Not a regular file: %s"), path);
+
+	if (!strcmp(opt->long_name, "add-file")) {
+		path = prefix_filename(args->prefix, arg);
+		if (stat(path, &info->stat))
+			die(_("File not found: %s"), path);
+		if (!S_ISREG(info->stat.st_mode))
+			die(_("Not a regular file: %s"), path);
+		info->content = NULL; /* read the file later */
+	} else {
+		const char *colon = strchr(arg, ':');
+		char *p;
+
+		if (!colon)
+			die(_("missing colon: '%s'"), arg);
+
+		p = xstrndup(arg, colon - arg);
+		if (!args->prefix)
+			path = p;
+		else {
+			path = prefix_filename(args->prefix, p);
+			free(p);
+		}
+		memset(&info->stat, 0, sizeof(info->stat));
+		info->stat.st_mode = S_IFREG | 0644;
+		info->content = xstrdup(colon + 1);
+		info->stat.st_size = strlen(info->content);
+	}
+	item = string_list_append_nodup(&args->extra_files, path);
+	item->util = info;
+
 	return 0;
 }
 
@@ -554,6 +586,9 @@ static int parse_archive_args(int argc, const char **argv,
 		{ OPTION_CALLBACK, 0, "add-file", args, N_("file"),
 		  N_("add untracked file to archive"), 0, add_file_cb,
 		  (intptr_t)&base },
+		{ OPTION_CALLBACK, 0, "add-file-with-content", args,
+		  N_("path:content"), N_("add untracked file to archive"), 0,
+		  add_file_cb, (intptr_t)&base },
 		OPT_STRING('o', "output", &output, N_("file"),
 			N_("write the archive to this file")),
 		OPT_BOOL(0, "worktree-attributes", &worktree_attributes,
diff --git a/t/t5003-archive-zip.sh b/t/t5003-archive-zip.sh
index 1e6d18b140e..8ff1257f1a0 100755
--- a/t/t5003-archive-zip.sh
+++ b/t/t5003-archive-zip.sh
@@ -206,6 +206,18 @@ test_expect_success 'git archive --format=zip --add-file' '
 check_zip with_untracked
 check_added with_untracked untracked untracked
 
+test_expect_success UNZIP 'git archive --format=zip --add-file-with-content' '
+	git archive --format=zip >with_file_with_content.zip \
+		--add-file-with-content=hello:world $EMPTY_TREE &&
+	test_when_finished "rm -rf tmp-unpack" &&
+	mkdir tmp-unpack && (
+		cd tmp-unpack &&
+		"$GIT_UNZIP" ../with_file_with_content.zip &&
+		test_path_is_file hello &&
+		test world = $(cat hello)
+	)
+'
+
 test_expect_success 'git archive --format=zip --add-file twice' '
 	echo untracked >untracked &&
 	git archive --format=zip --prefix=one/ --add-file=untracked \
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH v3 2/7] archive --add-file-with-contents: allow paths containing colons
  2022-05-04 15:25   ` [PATCH v3 0/7] scalar: implement the subcommand "diagnose" Johannes Schindelin via GitGitGadget
  2022-05-04 15:25     ` [PATCH v3 1/7] archive: optionally add "virtual" files Johannes Schindelin via GitGitGadget
@ 2022-05-04 15:25     ` Johannes Schindelin via GitGitGadget
  2022-05-07  2:06       ` Elijah Newren
  2022-05-04 15:25     ` [PATCH v3 3/7] scalar: validate the optional enlistment argument Johannes Schindelin via GitGitGadget
                       ` (6 subsequent siblings)
  8 siblings, 1 reply; 109+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2022-05-04 15:25 UTC (permalink / raw)
  To: git
  Cc: René Scharfe, Taylor Blau, Derrick Stolee, Elijah Newren,
	Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

By allowing the path to be enclosed in double-quotes, we can avoid
the limitation that paths cannot contain colons.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 Documentation/git-archive.txt | 13 +++++++++----
 archive.c                     | 34 +++++++++++++++++++++++++++++-----
 t/t5003-archive-zip.sh        |  8 ++++++++
 3 files changed, 46 insertions(+), 9 deletions(-)

diff --git a/Documentation/git-archive.txt b/Documentation/git-archive.txt
index a0edc9167b2..1789ce4c232 100644
--- a/Documentation/git-archive.txt
+++ b/Documentation/git-archive.txt
@@ -67,10 +67,15 @@ OPTIONS
 	by concatenating the value for `--prefix` (if any) and the
 	basename of <file>.
 +
-The `<path>` cannot contain any colon, the file mode is limited to
-a regular file, and the option may be subject to platform-dependent
-command-line limits. For non-trivial cases, write an untracked file
-and use `--add-file` instead.
+The `<path>` argument can start and end with a literal double-quote
+character. In this case, the backslash is interpreted as escape
+character. The path must be quoted if it contains a colon, to avoid
+the colon from being misinterpreted as the separator between the
+path and the contents.
++
+The file mode is limited to a regular file, and the option may be
+subject to platform-dependent command-line limits. For non-trivial
+cases, write an untracked file and use `--add-file` instead.
 
 --worktree-attributes::
 	Look for attributes in .gitattributes files in the working tree
diff --git a/archive.c b/archive.c
index d798624cd5f..3b751027143 100644
--- a/archive.c
+++ b/archive.c
@@ -533,13 +533,37 @@ static int add_file_cb(const struct option *opt, const char *arg, int unset)
 			die(_("Not a regular file: %s"), path);
 		info->content = NULL; /* read the file later */
 	} else {
-		const char *colon = strchr(arg, ':');
 		char *p;
 
-		if (!colon)
-			die(_("missing colon: '%s'"), arg);
+		if (*arg != '"') {
+			const char *colon = strchr(arg, ':');
+
+			if (!colon)
+				die(_("missing colon: '%s'"), arg);
+			p = xstrndup(arg, colon - arg);
+			arg = colon + 1;
+		} else {
+			struct strbuf buf = STRBUF_INIT;
+			const char *orig = arg;
+
+			for (;;) {
+				if (!*(++arg))
+					die(_("unclosed quote: '%s'"), orig);
+				if (*arg == '"')
+					break;
+				if (*arg == '\\' && *(++arg) == '\0')
+					die(_("trailing backslash: '%s"), orig);
+				else
+					strbuf_addch(&buf, *arg);
+			}
+
+			if (*(++arg) != ':')
+				die(_("missing colon: '%s'"), orig);
+
+			p = strbuf_detach(&buf, NULL);
+			arg++;
+		}
 
-		p = xstrndup(arg, colon - arg);
 		if (!args->prefix)
 			path = p;
 		else {
@@ -548,7 +572,7 @@ static int add_file_cb(const struct option *opt, const char *arg, int unset)
 		}
 		memset(&info->stat, 0, sizeof(info->stat));
 		info->stat.st_mode = S_IFREG | 0644;
-		info->content = xstrdup(colon + 1);
+		info->content = xstrdup(arg);
 		info->stat.st_size = strlen(info->content);
 	}
 	item = string_list_append_nodup(&args->extra_files, path);
diff --git a/t/t5003-archive-zip.sh b/t/t5003-archive-zip.sh
index 8ff1257f1a0..5b8bbfc2692 100755
--- a/t/t5003-archive-zip.sh
+++ b/t/t5003-archive-zip.sh
@@ -207,13 +207,21 @@ check_zip with_untracked
 check_added with_untracked untracked untracked
 
 test_expect_success UNZIP 'git archive --format=zip --add-file-with-content' '
+	if test_have_prereq FUNNYNAMES
+	then
+		QUOTED=quoted:colon
+	else
+		QUOTED=quoted
+	fi &&
 	git archive --format=zip >with_file_with_content.zip \
+		--add-file-with-content=\"$QUOTED\": \
 		--add-file-with-content=hello:world $EMPTY_TREE &&
 	test_when_finished "rm -rf tmp-unpack" &&
 	mkdir tmp-unpack && (
 		cd tmp-unpack &&
 		"$GIT_UNZIP" ../with_file_with_content.zip &&
 		test_path_is_file hello &&
+		test_path_is_file $QUOTED &&
 		test world = $(cat hello)
 	)
 '
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH v3 3/7] scalar: validate the optional enlistment argument
  2022-05-04 15:25   ` [PATCH v3 0/7] scalar: implement the subcommand "diagnose" Johannes Schindelin via GitGitGadget
  2022-05-04 15:25     ` [PATCH v3 1/7] archive: optionally add "virtual" files Johannes Schindelin via GitGitGadget
  2022-05-04 15:25     ` [PATCH v3 2/7] archive --add-file-with-contents: allow paths containing colons Johannes Schindelin via GitGitGadget
@ 2022-05-04 15:25     ` Johannes Schindelin via GitGitGadget
  2022-05-04 15:25     ` [PATCH v3 4/7] Implement `scalar diagnose` Johannes Schindelin via GitGitGadget
                       ` (5 subsequent siblings)
  8 siblings, 0 replies; 109+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2022-05-04 15:25 UTC (permalink / raw)
  To: git
  Cc: René Scharfe, Taylor Blau, Derrick Stolee, Elijah Newren,
	Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

The `scalar` command needs a Scalar enlistment for many subcommands, and
looks in the current directory for such an enlistment (traversing the
parent directories until it finds one).

These is subcommands can also be called with an optional argument
specifying the enlistment. Here, too, we traverse parent directories as
needed, until we find an enlistment.

However, if the specified directory does not even exist, or is not a
directory, we should stop right there, with an error message.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 contrib/scalar/scalar.c          | 6 ++++--
 contrib/scalar/t/t9099-scalar.sh | 5 +++++
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
index 1ce9c2b00e8..00dcd4b50ef 100644
--- a/contrib/scalar/scalar.c
+++ b/contrib/scalar/scalar.c
@@ -43,9 +43,11 @@ static void setup_enlistment_directory(int argc, const char **argv,
 		usage_with_options(usagestr, options);
 
 	/* find the worktree, determine its corresponding root */
-	if (argc == 1)
+	if (argc == 1) {
 		strbuf_add_absolute_path(&path, argv[0]);
-	else if (strbuf_getcwd(&path) < 0)
+		if (!is_directory(path.buf))
+			die(_("'%s' does not exist"), path.buf);
+	} else if (strbuf_getcwd(&path) < 0)
 		die(_("need a working directory"));
 
 	strbuf_trim_trailing_dir_sep(&path);
diff --git a/contrib/scalar/t/t9099-scalar.sh b/contrib/scalar/t/t9099-scalar.sh
index 2e1502ad45e..9d83fdf25e8 100755
--- a/contrib/scalar/t/t9099-scalar.sh
+++ b/contrib/scalar/t/t9099-scalar.sh
@@ -85,4 +85,9 @@ test_expect_success 'scalar delete with enlistment' '
 	test_path_is_missing cloned
 '
 
+test_expect_success '`scalar [...] <dir>` errors out when dir is missing' '
+	! scalar run config cloned 2>err &&
+	grep "cloned. does not exist" err
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH v3 4/7] Implement `scalar diagnose`
  2022-05-04 15:25   ` [PATCH v3 0/7] scalar: implement the subcommand "diagnose" Johannes Schindelin via GitGitGadget
                       ` (2 preceding siblings ...)
  2022-05-04 15:25     ` [PATCH v3 3/7] scalar: validate the optional enlistment argument Johannes Schindelin via GitGitGadget
@ 2022-05-04 15:25     ` Johannes Schindelin via GitGitGadget
  2022-05-04 15:25     ` [PATCH v3 5/7] scalar diagnose: include disk space information Johannes Schindelin via GitGitGadget
                       ` (4 subsequent siblings)
  8 siblings, 0 replies; 109+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2022-05-04 15:25 UTC (permalink / raw)
  To: git
  Cc: René Scharfe, Taylor Blau, Derrick Stolee, Elijah Newren,
	Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

Over the course of Scalar's development, it became obvious that there is
a need for a command that can gather all kinds of useful information
that can help identify the most typical problems with large
worktrees/repositories.

The `diagnose` command is the culmination of this hard-won knowledge: it
gathers the installed hooks, the config, a couple statistics describing
the data shape, among other pieces of information, and then wraps
everything up in a tidy, neat `.zip` archive.

Note: originally, Scalar was implemented in C# using the .NET API, where
we had the luxury of a comprehensive standard library that includes
basic functionality such as writing a `.zip` file. In the C version, we
lack such a commodity. Rather than introducing a dependency on, say,
libzip, we slightly abuse Git's `archive` machinery: we write out a
`.zip` of the empty try, augmented by a couple files that are added via
the `--add-file*` options. We are careful trying not to modify the
current repository in any way lest the very circumstances that required
`scalar diagnose` to be run are changed by the `diagnose` run itself.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 contrib/scalar/scalar.c          | 141 +++++++++++++++++++++++++++++++
 contrib/scalar/scalar.txt        |  12 +++
 contrib/scalar/t/t9099-scalar.sh |  14 +++
 3 files changed, 167 insertions(+)

diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
index 00dcd4b50ef..a290e52e1d2 100644
--- a/contrib/scalar/scalar.c
+++ b/contrib/scalar/scalar.c
@@ -11,6 +11,7 @@
 #include "dir.h"
 #include "packfile.h"
 #include "help.h"
+#include "archive.h"
 
 /*
  * Remove the deepest subdirectory in the provided path string. Path must not
@@ -261,6 +262,44 @@ static int unregister_dir(void)
 	return res;
 }
 
+static int add_directory_to_archiver(struct strvec *archiver_args,
+					  const char *path, int recurse)
+{
+	int at_root = !*path;
+	DIR *dir = opendir(at_root ? "." : path);
+	struct dirent *e;
+	struct strbuf buf = STRBUF_INIT;
+	size_t len;
+	int res = 0;
+
+	if (!dir)
+		return error(_("could not open directory '%s'"), path);
+
+	if (!at_root)
+		strbuf_addf(&buf, "%s/", path);
+	len = buf.len;
+	strvec_pushf(archiver_args, "--prefix=%s", buf.buf);
+
+	while (!res && (e = readdir(dir))) {
+		if (!strcmp(".", e->d_name) || !strcmp("..", e->d_name))
+			continue;
+
+		strbuf_setlen(&buf, len);
+		strbuf_addstr(&buf, e->d_name);
+
+		if (e->d_type == DT_REG)
+			strvec_pushf(archiver_args, "--add-file=%s", buf.buf);
+		else if (e->d_type != DT_DIR)
+			res = -1;
+		else if (recurse)
+		     add_directory_to_archiver(archiver_args, buf.buf, recurse);
+	}
+
+	closedir(dir);
+	strbuf_release(&buf);
+	return res;
+}
+
 /* printf-style interface, expects `<key>=<value>` argument */
 static int set_config(const char *fmt, ...)
 {
@@ -501,6 +540,107 @@ cleanup:
 	return res;
 }
 
+static int cmd_diagnose(int argc, const char **argv)
+{
+	struct option options[] = {
+		OPT_END(),
+	};
+	const char * const usage[] = {
+		N_("scalar diagnose [<enlistment>]"),
+		NULL
+	};
+	struct strbuf zip_path = STRBUF_INIT;
+	struct strvec archiver_args = STRVEC_INIT;
+	char **argv_copy = NULL;
+	int stdout_fd = -1, archiver_fd = -1;
+	time_t now = time(NULL);
+	struct tm tm;
+	struct strbuf path = STRBUF_INIT, buf = STRBUF_INIT;
+	int res = 0;
+
+	argc = parse_options(argc, argv, NULL, options,
+			     usage, 0);
+
+	setup_enlistment_directory(argc, argv, usage, options, &zip_path);
+
+	strbuf_addstr(&zip_path, "/.scalarDiagnostics/scalar_");
+	strbuf_addftime(&zip_path,
+			"%Y%m%d_%H%M%S", localtime_r(&now, &tm), 0, 0);
+	strbuf_addstr(&zip_path, ".zip");
+	switch (safe_create_leading_directories(zip_path.buf)) {
+	case SCLD_EXISTS:
+	case SCLD_OK:
+		break;
+	default:
+		error_errno(_("could not create directory for '%s'"),
+			    zip_path.buf);
+		goto diagnose_cleanup;
+	}
+	stdout_fd = dup(1);
+	if (stdout_fd < 0) {
+		res = error_errno(_("could not duplicate stdout"));
+		goto diagnose_cleanup;
+	}
+
+	archiver_fd = xopen(zip_path.buf, O_CREAT | O_WRONLY | O_TRUNC, 0666);
+	if (archiver_fd < 0 || dup2(archiver_fd, 1) < 0) {
+		res = error_errno(_("could not redirect output"));
+		goto diagnose_cleanup;
+	}
+
+	init_zip_archiver();
+	strvec_pushl(&archiver_args, "scalar-diagnose", "--format=zip", NULL);
+
+	strbuf_reset(&buf);
+	strbuf_addstr(&buf, "Collecting diagnostic info\n\n");
+	get_version_info(&buf, 1);
+
+	strbuf_addf(&buf, "Enlistment root: %s\n", the_repository->worktree);
+	write_or_die(stdout_fd, buf.buf, buf.len);
+	strvec_pushf(&archiver_args,
+		     "--add-file-with-content=diagnostics.log:%.*s",
+		     (int)buf.len, buf.buf);
+
+	if ((res = add_directory_to_archiver(&archiver_args, ".git", 0)) ||
+	    (res = add_directory_to_archiver(&archiver_args, ".git/hooks", 0)) ||
+	    (res = add_directory_to_archiver(&archiver_args, ".git/info", 0)) ||
+	    (res = add_directory_to_archiver(&archiver_args, ".git/logs", 1)) ||
+	    (res = add_directory_to_archiver(&archiver_args, ".git/objects/info", 0)))
+		goto diagnose_cleanup;
+
+	strvec_pushl(&archiver_args, "--prefix=",
+		     oid_to_hex(the_hash_algo->empty_tree), "--", NULL);
+
+	/* `write_archive()` modifies the `argv` passed to it. Let it. */
+	argv_copy = xmemdupz(archiver_args.v,
+			     sizeof(char *) * archiver_args.nr);
+	res = write_archive(archiver_args.nr, (const char **)argv_copy, NULL,
+			    the_repository, NULL, 0);
+	if (res) {
+		error(_("failed to write archive"));
+		goto diagnose_cleanup;
+	}
+
+	if (!res)
+		fprintf(stderr, "\n"
+		       "Diagnostics complete.\n"
+		       "All of the gathered info is captured in '%s'\n",
+		       zip_path.buf);
+
+diagnose_cleanup:
+	if (archiver_fd >= 0) {
+		close(1);
+		dup2(stdout_fd, 1);
+	}
+	free(argv_copy);
+	strvec_clear(&archiver_args);
+	strbuf_release(&zip_path);
+	strbuf_release(&path);
+	strbuf_release(&buf);
+
+	return res;
+}
+
 static int cmd_list(int argc, const char **argv)
 {
 	if (argc != 1)
@@ -802,6 +942,7 @@ static struct {
 	{ "reconfigure", cmd_reconfigure },
 	{ "delete", cmd_delete },
 	{ "version", cmd_version },
+	{ "diagnose", cmd_diagnose },
 	{ NULL, NULL},
 };
 
diff --git a/contrib/scalar/scalar.txt b/contrib/scalar/scalar.txt
index f416d637289..22583fe046e 100644
--- a/contrib/scalar/scalar.txt
+++ b/contrib/scalar/scalar.txt
@@ -14,6 +14,7 @@ scalar register [<enlistment>]
 scalar unregister [<enlistment>]
 scalar run ( all | config | commit-graph | fetch | loose-objects | pack-files ) [<enlistment>]
 scalar reconfigure [ --all | <enlistment> ]
+scalar diagnose [<enlistment>]
 scalar delete <enlistment>
 
 DESCRIPTION
@@ -129,6 +130,17 @@ reconfigure the enlistment.
 With the `--all` option, all enlistments currently registered with Scalar
 will be reconfigured. Use this option after each Scalar upgrade.
 
+Diagnose
+~~~~~~~~
+
+diagnose [<enlistment>]::
+    When reporting issues with Scalar, it is often helpful to provide the
+    information gathered by this command, including logs and certain
+    statistics describing the data shape of the current enlistment.
++
+The output of this command is a `.zip` file that is written into
+a directory adjacent to the worktree in the `src` directory.
+
 Delete
 ~~~~~~
 
diff --git a/contrib/scalar/t/t9099-scalar.sh b/contrib/scalar/t/t9099-scalar.sh
index 9d83fdf25e8..bbd07a44426 100755
--- a/contrib/scalar/t/t9099-scalar.sh
+++ b/contrib/scalar/t/t9099-scalar.sh
@@ -90,4 +90,18 @@ test_expect_success '`scalar [...] <dir>` errors out when dir is missing' '
 	grep "cloned. does not exist" err
 '
 
+SQ="'"
+test_expect_success UNZIP 'scalar diagnose' '
+	scalar clone "file://$(pwd)" cloned --single-branch &&
+	scalar diagnose cloned >out &&
+	sed -n "s/.*$SQ\\(.*\\.zip\\)$SQ.*/\\1/p" <out >zip_path &&
+	zip_path=$(cat zip_path) &&
+	test -n "$zip_path" &&
+	unzip -v "$zip_path" &&
+	folder=${zip_path%.zip} &&
+	test_path_is_missing "$folder" &&
+	unzip -p "$zip_path" diagnostics.log >out &&
+	test_file_not_empty out
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH v3 5/7] scalar diagnose: include disk space information
  2022-05-04 15:25   ` [PATCH v3 0/7] scalar: implement the subcommand "diagnose" Johannes Schindelin via GitGitGadget
                       ` (3 preceding siblings ...)
  2022-05-04 15:25     ` [PATCH v3 4/7] Implement `scalar diagnose` Johannes Schindelin via GitGitGadget
@ 2022-05-04 15:25     ` Johannes Schindelin via GitGitGadget
  2022-05-04 15:25     ` [PATCH v3 6/7] scalar: teach `diagnose` to gather packfile info Matthew John Cheetham via GitGitGadget
                       ` (3 subsequent siblings)
  8 siblings, 0 replies; 109+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2022-05-04 15:25 UTC (permalink / raw)
  To: git
  Cc: René Scharfe, Taylor Blau, Derrick Stolee, Elijah Newren,
	Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

When analyzing problems with large worktrees/repositories, it is useful
to know how close to a "full disk" situation Scalar/Git operates. Let's
include this information.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 contrib/scalar/scalar.c          | 53 ++++++++++++++++++++++++++++++++
 contrib/scalar/t/t9099-scalar.sh |  1 +
 2 files changed, 54 insertions(+)

diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
index a290e52e1d2..df44902c909 100644
--- a/contrib/scalar/scalar.c
+++ b/contrib/scalar/scalar.c
@@ -300,6 +300,58 @@ static int add_directory_to_archiver(struct strvec *archiver_args,
 	return res;
 }
 
+#ifndef WIN32
+#include <sys/statvfs.h>
+#endif
+
+static int get_disk_info(struct strbuf *out)
+{
+#ifdef WIN32
+	struct strbuf buf = STRBUF_INIT;
+	char volume_name[MAX_PATH], fs_name[MAX_PATH];
+	DWORD serial_number, component_length, flags;
+	ULARGE_INTEGER avail2caller, total, avail;
+
+	strbuf_realpath(&buf, ".", 1);
+	if (!GetDiskFreeSpaceExA(buf.buf, &avail2caller, &total, &avail)) {
+		error(_("could not determine free disk size for '%s'"),
+		      buf.buf);
+		strbuf_release(&buf);
+		return -1;
+	}
+
+	strbuf_setlen(&buf, offset_1st_component(buf.buf));
+	if (!GetVolumeInformationA(buf.buf, volume_name, sizeof(volume_name),
+				   &serial_number, &component_length, &flags,
+				   fs_name, sizeof(fs_name))) {
+		error(_("could not get info for '%s'"), buf.buf);
+		strbuf_release(&buf);
+		return -1;
+	}
+	strbuf_addf(out, "Available space on '%s': ", buf.buf);
+	strbuf_humanise_bytes(out, avail2caller.QuadPart);
+	strbuf_addch(out, '\n');
+	strbuf_release(&buf);
+#else
+	struct strbuf buf = STRBUF_INIT;
+	struct statvfs stat;
+
+	strbuf_realpath(&buf, ".", 1);
+	if (statvfs(buf.buf, &stat) < 0) {
+		error_errno(_("could not determine free disk size for '%s'"),
+			    buf.buf);
+		strbuf_release(&buf);
+		return -1;
+	}
+
+	strbuf_addf(out, "Available space on '%s': ", buf.buf);
+	strbuf_humanise_bytes(out, st_mult(stat.f_bsize, stat.f_bavail));
+	strbuf_addf(out, " (mount flags 0x%lx)\n", stat.f_flag);
+	strbuf_release(&buf);
+#endif
+	return 0;
+}
+
 /* printf-style interface, expects `<key>=<value>` argument */
 static int set_config(const char *fmt, ...)
 {
@@ -596,6 +648,7 @@ static int cmd_diagnose(int argc, const char **argv)
 	get_version_info(&buf, 1);
 
 	strbuf_addf(&buf, "Enlistment root: %s\n", the_repository->worktree);
+	get_disk_info(&buf);
 	write_or_die(stdout_fd, buf.buf, buf.len);
 	strvec_pushf(&archiver_args,
 		     "--add-file-with-content=diagnostics.log:%.*s",
diff --git a/contrib/scalar/t/t9099-scalar.sh b/contrib/scalar/t/t9099-scalar.sh
index bbd07a44426..f3d037823c8 100755
--- a/contrib/scalar/t/t9099-scalar.sh
+++ b/contrib/scalar/t/t9099-scalar.sh
@@ -94,6 +94,7 @@ SQ="'"
 test_expect_success UNZIP 'scalar diagnose' '
 	scalar clone "file://$(pwd)" cloned --single-branch &&
 	scalar diagnose cloned >out &&
+	grep "Available space" out &&
 	sed -n "s/.*$SQ\\(.*\\.zip\\)$SQ.*/\\1/p" <out >zip_path &&
 	zip_path=$(cat zip_path) &&
 	test -n "$zip_path" &&
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH v3 6/7] scalar: teach `diagnose` to gather packfile info
  2022-05-04 15:25   ` [PATCH v3 0/7] scalar: implement the subcommand "diagnose" Johannes Schindelin via GitGitGadget
                       ` (4 preceding siblings ...)
  2022-05-04 15:25     ` [PATCH v3 5/7] scalar diagnose: include disk space information Johannes Schindelin via GitGitGadget
@ 2022-05-04 15:25     ` Matthew John Cheetham via GitGitGadget
  2022-05-04 15:25     ` [PATCH v3 7/7] scalar: teach `diagnose` to gather loose objects information Matthew John Cheetham via GitGitGadget
                       ` (2 subsequent siblings)
  8 siblings, 0 replies; 109+ messages in thread
From: Matthew John Cheetham via GitGitGadget @ 2022-05-04 15:25 UTC (permalink / raw)
  To: git
  Cc: René Scharfe, Taylor Blau, Derrick Stolee, Elijah Newren,
	Johannes Schindelin, Matthew John Cheetham

From: Matthew John Cheetham <mjcheetham@outlook.com>

It's helpful to see if there are other crud files in the pack
directory. Let's teach the `scalar diagnose` command to gather
file size information about pack files.

While at it, also enumerate the pack files in the alternate
object directories, if any are registered.

Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 contrib/scalar/scalar.c          | 30 ++++++++++++++++++++++++++++++
 contrib/scalar/t/t9099-scalar.sh |  6 +++++-
 2 files changed, 35 insertions(+), 1 deletion(-)

diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
index df44902c909..9adde8cf4b9 100644
--- a/contrib/scalar/scalar.c
+++ b/contrib/scalar/scalar.c
@@ -12,6 +12,7 @@
 #include "packfile.h"
 #include "help.h"
 #include "archive.h"
+#include "object-store.h"
 
 /*
  * Remove the deepest subdirectory in the provided path string. Path must not
@@ -592,6 +593,29 @@ cleanup:
 	return res;
 }
 
+static void dir_file_stats_objects(const char *full_path, size_t full_path_len,
+				   const char *file_name, void *data)
+{
+	struct strbuf *buf = data;
+	struct stat st;
+
+	if (!stat(full_path, &st))
+		strbuf_addf(buf, "%-70s %16" PRIuMAX "\n", file_name,
+			    (uintmax_t)st.st_size);
+}
+
+static int dir_file_stats(struct object_directory *object_dir, void *data)
+{
+	struct strbuf *buf = data;
+
+	strbuf_addf(buf, "Contents of %s:\n", object_dir->path);
+
+	for_each_file_in_pack_dir(object_dir->path, dir_file_stats_objects,
+				  data);
+
+	return 0;
+}
+
 static int cmd_diagnose(int argc, const char **argv)
 {
 	struct option options[] = {
@@ -654,6 +678,12 @@ static int cmd_diagnose(int argc, const char **argv)
 		     "--add-file-with-content=diagnostics.log:%.*s",
 		     (int)buf.len, buf.buf);
 
+	strbuf_reset(&buf);
+	strbuf_addstr(&buf, "--add-file-with-content=packs-local.txt:");
+	dir_file_stats(the_repository->objects->odb, &buf);
+	foreach_alt_odb(dir_file_stats, &buf);
+	strvec_push(&archiver_args, buf.buf);
+
 	if ((res = add_directory_to_archiver(&archiver_args, ".git", 0)) ||
 	    (res = add_directory_to_archiver(&archiver_args, ".git/hooks", 0)) ||
 	    (res = add_directory_to_archiver(&archiver_args, ".git/info", 0)) ||
diff --git a/contrib/scalar/t/t9099-scalar.sh b/contrib/scalar/t/t9099-scalar.sh
index f3d037823c8..e049221609d 100755
--- a/contrib/scalar/t/t9099-scalar.sh
+++ b/contrib/scalar/t/t9099-scalar.sh
@@ -93,6 +93,8 @@ test_expect_success '`scalar [...] <dir>` errors out when dir is missing' '
 SQ="'"
 test_expect_success UNZIP 'scalar diagnose' '
 	scalar clone "file://$(pwd)" cloned --single-branch &&
+	git repack &&
+	echo "$(pwd)/.git/objects/" >>cloned/src/.git/objects/info/alternates &&
 	scalar diagnose cloned >out &&
 	grep "Available space" out &&
 	sed -n "s/.*$SQ\\(.*\\.zip\\)$SQ.*/\\1/p" <out >zip_path &&
@@ -102,7 +104,9 @@ test_expect_success UNZIP 'scalar diagnose' '
 	folder=${zip_path%.zip} &&
 	test_path_is_missing "$folder" &&
 	unzip -p "$zip_path" diagnostics.log >out &&
-	test_file_not_empty out
+	test_file_not_empty out &&
+	unzip -p "$zip_path" packs-local.txt >out &&
+	grep "$(pwd)/.git/objects" out
 '
 
 test_done
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH v3 7/7] scalar: teach `diagnose` to gather loose objects information
  2022-05-04 15:25   ` [PATCH v3 0/7] scalar: implement the subcommand "diagnose" Johannes Schindelin via GitGitGadget
                       ` (5 preceding siblings ...)
  2022-05-04 15:25     ` [PATCH v3 6/7] scalar: teach `diagnose` to gather packfile info Matthew John Cheetham via GitGitGadget
@ 2022-05-04 15:25     ` Matthew John Cheetham via GitGitGadget
  2022-05-07  2:23     ` [PATCH v3 0/7] scalar: implement the subcommand "diagnose" Elijah Newren
  2022-05-10 19:26     ` [PATCH v4 " Johannes Schindelin via GitGitGadget
  8 siblings, 0 replies; 109+ messages in thread
From: Matthew John Cheetham via GitGitGadget @ 2022-05-04 15:25 UTC (permalink / raw)
  To: git
  Cc: René Scharfe, Taylor Blau, Derrick Stolee, Elijah Newren,
	Johannes Schindelin, Matthew John Cheetham

From: Matthew John Cheetham <mjcheetham@outlook.com>

When operating at the scale that Scalar wants to support, certain data
shapes are more likely to cause undesirable performance issues, such as
large numbers of loose objects.

By including statistics about this, `scalar diagnose` now makes it
easier to identify such scenarios.

Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 contrib/scalar/scalar.c          | 59 ++++++++++++++++++++++++++++++++
 contrib/scalar/t/t9099-scalar.sh |  5 ++-
 2 files changed, 63 insertions(+), 1 deletion(-)

diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
index 9adde8cf4b9..f2fe3858eca 100644
--- a/contrib/scalar/scalar.c
+++ b/contrib/scalar/scalar.c
@@ -616,6 +616,60 @@ static int dir_file_stats(struct object_directory *object_dir, void *data)
 	return 0;
 }
 
+static int count_files(char *path)
+{
+	DIR *dir = opendir(path);
+	struct dirent *e;
+	int count = 0;
+
+	if (!dir)
+		return 0;
+
+	while ((e = readdir(dir)) != NULL)
+		if (!is_dot_or_dotdot(e->d_name) && e->d_type == DT_REG)
+			count++;
+
+	closedir(dir);
+	return count;
+}
+
+static void loose_objs_stats(struct strbuf *buf, const char *path)
+{
+	DIR *dir = opendir(path);
+	struct dirent *e;
+	int count;
+	int total = 0;
+	unsigned char c;
+	struct strbuf count_path = STRBUF_INIT;
+	size_t base_path_len;
+
+	if (!dir)
+		return;
+
+	strbuf_addstr(buf, "Object directory stats for ");
+	strbuf_add_absolute_path(buf, path);
+	strbuf_addstr(buf, ":\n");
+
+	strbuf_add_absolute_path(&count_path, path);
+	strbuf_addch(&count_path, '/');
+	base_path_len = count_path.len;
+
+	while ((e = readdir(dir)) != NULL)
+		if (!is_dot_or_dotdot(e->d_name) &&
+		    e->d_type == DT_DIR && strlen(e->d_name) == 2 &&
+		    !hex_to_bytes(&c, e->d_name, 1)) {
+			strbuf_setlen(&count_path, base_path_len);
+			strbuf_addstr(&count_path, e->d_name);
+			total += (count = count_files(count_path.buf));
+			strbuf_addf(buf, "%s : %7d files\n", e->d_name, count);
+		}
+
+	strbuf_addf(buf, "Total: %d loose objects", total);
+
+	strbuf_release(&count_path);
+	closedir(dir);
+}
+
 static int cmd_diagnose(int argc, const char **argv)
 {
 	struct option options[] = {
@@ -684,6 +738,11 @@ static int cmd_diagnose(int argc, const char **argv)
 	foreach_alt_odb(dir_file_stats, &buf);
 	strvec_push(&archiver_args, buf.buf);
 
+	strbuf_reset(&buf);
+	strbuf_addstr(&buf, "--add-file-with-content=objects-local.txt:");
+	loose_objs_stats(&buf, ".git/objects");
+	strvec_push(&archiver_args, buf.buf);
+
 	if ((res = add_directory_to_archiver(&archiver_args, ".git", 0)) ||
 	    (res = add_directory_to_archiver(&archiver_args, ".git/hooks", 0)) ||
 	    (res = add_directory_to_archiver(&archiver_args, ".git/info", 0)) ||
diff --git a/contrib/scalar/t/t9099-scalar.sh b/contrib/scalar/t/t9099-scalar.sh
index e049221609d..9b4eedbb0aa 100755
--- a/contrib/scalar/t/t9099-scalar.sh
+++ b/contrib/scalar/t/t9099-scalar.sh
@@ -95,6 +95,7 @@ test_expect_success UNZIP 'scalar diagnose' '
 	scalar clone "file://$(pwd)" cloned --single-branch &&
 	git repack &&
 	echo "$(pwd)/.git/objects/" >>cloned/src/.git/objects/info/alternates &&
+	test_commit -C cloned/src loose &&
 	scalar diagnose cloned >out &&
 	grep "Available space" out &&
 	sed -n "s/.*$SQ\\(.*\\.zip\\)$SQ.*/\\1/p" <out >zip_path &&
@@ -106,7 +107,9 @@ test_expect_success UNZIP 'scalar diagnose' '
 	unzip -p "$zip_path" diagnostics.log >out &&
 	test_file_not_empty out &&
 	unzip -p "$zip_path" packs-local.txt >out &&
-	grep "$(pwd)/.git/objects" out
+	grep "$(pwd)/.git/objects" out &&
+	unzip -p "$zip_path" objects-local.txt >out &&
+	grep "^Total: [1-9]" out
 '
 
 test_done
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v3 2/7] archive --add-file-with-contents: allow paths containing colons
  2022-05-04 15:25     ` [PATCH v3 2/7] archive --add-file-with-contents: allow paths containing colons Johannes Schindelin via GitGitGadget
@ 2022-05-07  2:06       ` Elijah Newren
  2022-05-09 21:04         ` Johannes Schindelin
  0 siblings, 1 reply; 109+ messages in thread
From: Elijah Newren @ 2022-05-07  2:06 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget
  Cc: Git Mailing List, René Scharfe, Taylor Blau, Derrick Stolee,
	Johannes Schindelin

On Wed, May 4, 2022 at 8:25 AM Johannes Schindelin via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>
> By allowing the path to be enclosed in double-quotes, we can avoid
> the limitation that paths cannot contain colons.
>
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
>  Documentation/git-archive.txt | 13 +++++++++----
>  archive.c                     | 34 +++++++++++++++++++++++++++++-----
>  t/t5003-archive-zip.sh        |  8 ++++++++
>  3 files changed, 46 insertions(+), 9 deletions(-)
>
> diff --git a/Documentation/git-archive.txt b/Documentation/git-archive.txt
> index a0edc9167b2..1789ce4c232 100644
> --- a/Documentation/git-archive.txt
> +++ b/Documentation/git-archive.txt
> @@ -67,10 +67,15 @@ OPTIONS
>         by concatenating the value for `--prefix` (if any) and the
>         basename of <file>.
>  +
> -The `<path>` cannot contain any colon, the file mode is limited to
> -a regular file, and the option may be subject to platform-dependent
> -command-line limits. For non-trivial cases, write an untracked file
> -and use `--add-file` instead.
> +The `<path>` argument can start and end with a literal double-quote
> +character. In this case, the backslash is interpreted as escape
> +character. The path must be quoted if it contains a colon, to avoid
> +the colon from being misinterpreted as the separator between the
> +path and the contents.

The path must also be quoted if it begins or ends with a double-quote, right?

Also, would people want to be able to pass a pathname from the output
of e.g. `git ls-files -o`, which may quote additional characters?

> ++
> +The file mode is limited to a regular file, and the option may be
> +subject to platform-dependent command-line limits. For non-trivial
> +cases, write an untracked file and use `--add-file` instead.
>
>  --worktree-attributes::
>         Look for attributes in .gitattributes files in the working tree
> diff --git a/archive.c b/archive.c
> index d798624cd5f..3b751027143 100644
> --- a/archive.c
> +++ b/archive.c
> @@ -533,13 +533,37 @@ static int add_file_cb(const struct option *opt, const char *arg, int unset)
>                         die(_("Not a regular file: %s"), path);
>                 info->content = NULL; /* read the file later */
>         } else {
> -               const char *colon = strchr(arg, ':');
>                 char *p;
>
> -               if (!colon)
> -                       die(_("missing colon: '%s'"), arg);
> +               if (*arg != '"') {
> +                       const char *colon = strchr(arg, ':');
> +
> +                       if (!colon)
> +                               die(_("missing colon: '%s'"), arg);
> +                       p = xstrndup(arg, colon - arg);
> +                       arg = colon + 1;
> +               } else {
> +                       struct strbuf buf = STRBUF_INIT;
> +                       const char *orig = arg;
> +
> +                       for (;;) {
> +                               if (!*(++arg))
> +                                       die(_("unclosed quote: '%s'"), orig);
> +                               if (*arg == '"')
> +                                       break;
> +                               if (*arg == '\\' && *(++arg) == '\0')
> +                                       die(_("trailing backslash: '%s"), orig);
> +                               else
> +                                       strbuf_addch(&buf, *arg);
> +                       }
> +
> +                       if (*(++arg) != ':')
> +                               die(_("missing colon: '%s'"), orig);
> +
> +                       p = strbuf_detach(&buf, NULL);
> +                       arg++;
> +               }

Should we use unquote_c_style() here instead of rolling another parser
to do unquoting?  That would have the added benefit of allowing people
to use filenames from the output of various git commands that do
special quoting -- such as octal sequences for non-ascii characters.

>
> -               p = xstrndup(arg, colon - arg);
>                 if (!args->prefix)
>                         path = p;
>                 else {
> @@ -548,7 +572,7 @@ static int add_file_cb(const struct option *opt, const char *arg, int unset)
>                 }
>                 memset(&info->stat, 0, sizeof(info->stat));
>                 info->stat.st_mode = S_IFREG | 0644;
> -               info->content = xstrdup(colon + 1);
> +               info->content = xstrdup(arg);
>                 info->stat.st_size = strlen(info->content);
>         }
>         item = string_list_append_nodup(&args->extra_files, path);
> diff --git a/t/t5003-archive-zip.sh b/t/t5003-archive-zip.sh
> index 8ff1257f1a0..5b8bbfc2692 100755
> --- a/t/t5003-archive-zip.sh
> +++ b/t/t5003-archive-zip.sh
> @@ -207,13 +207,21 @@ check_zip with_untracked
>  check_added with_untracked untracked untracked
>
>  test_expect_success UNZIP 'git archive --format=zip --add-file-with-content' '
> +       if test_have_prereq FUNNYNAMES
> +       then
> +               QUOTED=quoted:colon
> +       else
> +               QUOTED=quoted
> +       fi &&
>         git archive --format=zip >with_file_with_content.zip \
> +               --add-file-with-content=\"$QUOTED\": \
>                 --add-file-with-content=hello:world $EMPTY_TREE &&
>         test_when_finished "rm -rf tmp-unpack" &&
>         mkdir tmp-unpack && (
>                 cd tmp-unpack &&
>                 "$GIT_UNZIP" ../with_file_with_content.zip &&
>                 test_path_is_file hello &&
> +               test_path_is_file $QUOTED &&
>                 test world = $(cat hello)
>         )
>  '
> --
> gitgitgadget

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v3 0/7] scalar: implement the subcommand "diagnose"
  2022-05-04 15:25   ` [PATCH v3 0/7] scalar: implement the subcommand "diagnose" Johannes Schindelin via GitGitGadget
                       ` (6 preceding siblings ...)
  2022-05-04 15:25     ` [PATCH v3 7/7] scalar: teach `diagnose` to gather loose objects information Matthew John Cheetham via GitGitGadget
@ 2022-05-07  2:23     ` Elijah Newren
  2022-05-10 19:26     ` [PATCH v4 " Johannes Schindelin via GitGitGadget
  8 siblings, 0 replies; 109+ messages in thread
From: Elijah Newren @ 2022-05-07  2:23 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget
  Cc: Git Mailing List, René Scharfe, Taylor Blau, Derrick Stolee,
	Johannes Schindelin

On Wed, May 4, 2022 at 8:25 AM Johannes Schindelin via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> Over the course of the years, we developed a sub-command that gathers
> diagnostic data into a .zip file that can then be attached to bug reports.
> This sub-command turned out to be very useful in helping Scalar developers
> identify and fix issues.
>
> Changes since v2:
>
>  * Clarified in the commit message what the biggest benefit of
>    --add-file-with-content is.
>  * The <path> part of the -add-file-with-content argument can now contain
>    colons. To do this, the path needs to start and end in double-quote
>    characters (which are stripped), and the backslash serves as escape
>    character in that case (to allow the path to contain both colons and
>    double-quotes).

You addressed all my previous feedback from an earlier round.  The
only thing I noticed in this round is I wonder if we should use
unquote_c_style() for this, as commented on the patch in question.

>  * Fixed incorrect grammar.
>  * Instead of strcmp(<what-we-don't-want>), we now say
>    !strcmp(<what-we-want>).
>  * The help text for --add-file-with-content was improved a tiny bit.
>  * Adjusted the commit message that still talked about spawning plenty of
>    processes and about a throw-away repository for the sake of generating a
>    .zip file.
>  * Simplified the code that shows the diagnostics and adds them to the .zip
>    file.
>  * The final message that reports that the archive is complete is now
>    printed to stderr instead of stdout.
>
> Changes since v1:
>
>  * Instead of creating a throw-away repository, staging the contents of the
>    .zip file and then using git write-tree and git archive to write the .zip
>    file, the patch series now introduces a new option to git archive and
>    uses write_archive() directly (avoiding any separate process).
>  * Since the command avoids separate processes, it is now blazing fast on
>    Windows, and I dropped the spinner() function because it's no longer
>    needed.
>  * While reworking the test case, I noticed that scalar [...] <enlistment>
>    failed to verify that the specified directory exists, and would happily
>    "traverse to its parent directory" on its quest to find a Scalar
>    enlistment. That is of course incorrect, and has been fixed as a "while
>    at it" sort of preparatory commit.
>  * I had forgotten to sign off on all the commits, which has been fixed.
>  * Instead of some "home-grown" readdir()-based function, the code now uses
>    for_each_file_in_pack_dir() to look through the pack directories.
>  * If any alternates are configured, their pack directories are now included
>    in the output.
>  * The commit message that might be interpreted to promise information about
>    large loose files has been corrected to no longer promise that.
>  * The test cases have been adjusted to test a little bit more (e.g.
>    verifying that specific paths are mentioned in the output, instead of
>    merely verifying that the output is non-empty).
>
> Johannes Schindelin (5):
>   archive: optionally add "virtual" files
>   archive --add-file-with-contents: allow paths containing colons
>   scalar: validate the optional enlistment argument
>   Implement `scalar diagnose`
>   scalar diagnose: include disk space information
>
> Matthew John Cheetham (2):
>   scalar: teach `diagnose` to gather packfile info
>   scalar: teach `diagnose` to gather loose objects information
>
>  Documentation/git-archive.txt    |  16 ++
>  archive.c                        |  75 +++++++-
>  contrib/scalar/scalar.c          | 289 ++++++++++++++++++++++++++++++-
>  contrib/scalar/scalar.txt        |  12 ++
>  contrib/scalar/t/t9099-scalar.sh |  27 +++
>  t/t5003-archive-zip.sh           |  20 +++
>  6 files changed, 429 insertions(+), 10 deletions(-)
>
>
> base-commit: ddc35d833dd6f9e8946b09cecd3311b8aa18d295
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1128%2Fdscho%2Fscalar-diagnose-v3
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1128/dscho/scalar-diagnose-v3
> Pull-Request: https://github.com/gitgitgadget/git/pull/1128
>
> Range-diff vs v2:
>
>  1:  49ff3c1f2b3 ! 1:  45662cf582a archive: optionally add "virtual" files
>      @@ Commit message
>           archive` now supports use cases where relatively trivial files need to
>           be added that do not exist on disk.
>
>      +    This will allow us to generate `.zip` files with generated content,
>      +    without having to add said content to the object database and without
>      +    having to write it out to disk.
>      +
>           Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
>
>        ## Documentation/git-archive.txt ##
>      @@ Documentation/git-archive.txt: OPTIONS
>       + basename of <file>.
>       ++
>       +The `<path>` cannot contain any colon, the file mode is limited to
>      -+a regular file, and the option may be subject platform-dependent
>      ++a regular file, and the option may be subject to platform-dependent
>       +command-line limits. For non-trivial cases, write an untracked file
>       +and use `--add-file` instead.
>       +
>      @@ archive.c: static int add_file_cb(const struct option *opt, const char *arg, int
>       - if (!S_ISREG(info->stat.st_mode))
>       -         die(_("Not a regular file: %s"), path);
>       +
>      -+ if (strcmp(opt->long_name, "add-file-with-content")) {
>      ++ if (!strcmp(opt->long_name, "add-file")) {
>       +         path = prefix_filename(args->prefix, arg);
>       +         if (stat(path, &info->stat))
>       +                 die(_("File not found: %s"), path);
>      @@ archive.c: static int parse_archive_args(int argc, const char **argv,
>                   N_("add untracked file to archive"), 0, add_file_cb,
>                   (intptr_t)&base },
>       +         { OPTION_CALLBACK, 0, "add-file-with-content", args,
>      -+           N_("file"), N_("add untracked file to archive"), 0,
>      ++           N_("path:content"), N_("add untracked file to archive"), 0,
>       +           add_file_cb, (intptr_t)&base },
>                 OPT_STRING('o', "output", &output, N_("file"),
>                         N_("write the archive to this file")),
>  -:  ----------- > 2:  ce4b1b680c9 archive --add-file-with-contents: allow paths containing colons
>  2:  600da8d465e = 3:  5a3eeb55409 scalar: validate the optional enlistment argument
>  3:  0d570137bb6 ! 4:  dfe821d10fe Implement `scalar diagnose`
>      @@ Commit message
>           we had the luxury of a comprehensive standard library that includes
>           basic functionality such as writing a `.zip` file. In the C version, we
>           lack such a commodity. Rather than introducing a dependency on, say,
>      -    libzip, we slightly abuse Git's `archive` command: Instead of writing
>      -    the `.zip` file directly, we stage the file contents in a Git index of a
>      -    temporary, bare repository, only to let `git archive` have at it, and
>      -    finally removing the temporary repository.
>      -
>      -    Also note: Due to the frequently-spawned `git hash-object` processes,
>      -    this command is quite a bit slow on Windows. Should it turn out to be a
>      -    big problem, the lack of a batch mode of the `hash-object` command could
>      -    potentially be worked around via using `git fast-import` with a crafted
>      -    `stdin`.
>      +    libzip, we slightly abuse Git's `archive` machinery: we write out a
>      +    `.zip` of the empty try, augmented by a couple files that are added via
>      +    the `--add-file*` options. We are careful trying not to modify the
>      +    current repository in any way lest the very circumstances that required
>      +    `scalar diagnose` to be run are changed by the `diagnose` run itself.
>
>           Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
>
>      @@ contrib/scalar/scalar.c: cleanup:
>       + time_t now = time(NULL);
>       + struct tm tm;
>       + struct strbuf path = STRBUF_INIT, buf = STRBUF_INIT;
>      -+ size_t off;
>       + int res = 0;
>       +
>       + argc = parse_options(argc, argv, NULL, options,
>      @@ contrib/scalar/scalar.c: cleanup:
>       + strvec_pushl(&archiver_args, "scalar-diagnose", "--format=zip", NULL);
>       +
>       + strbuf_reset(&buf);
>      -+ strbuf_addstr(&buf,
>      -+               "--add-file-with-content=diagnostics.log:"
>      -+               "Collecting diagnostic info\n\n");
>      ++ strbuf_addstr(&buf, "Collecting diagnostic info\n\n");
>       + get_version_info(&buf, 1);
>       +
>       + strbuf_addf(&buf, "Enlistment root: %s\n", the_repository->worktree);
>      -+ off = strchr(buf.buf, ':') + 1 - buf.buf;
>      -+ write_or_die(stdout_fd, buf.buf + off, buf.len - off);
>      -+ strvec_push(&archiver_args, buf.buf);
>      ++ write_or_die(stdout_fd, buf.buf, buf.len);
>      ++ strvec_pushf(&archiver_args,
>      ++              "--add-file-with-content=diagnostics.log:%.*s",
>      ++              (int)buf.len, buf.buf);
>       +
>       + if ((res = add_directory_to_archiver(&archiver_args, ".git", 0)) ||
>       +     (res = add_directory_to_archiver(&archiver_args, ".git/hooks", 0)) ||
>      @@ contrib/scalar/scalar.c: cleanup:
>       + }
>       +
>       + if (!res)
>      -+         printf("\n"
>      ++         fprintf(stderr, "\n"
>       +                "Diagnostics complete.\n"
>       +                "All of the gathered info is captured in '%s'\n",
>       +                zip_path.buf);
>  4:  938e38b5a09 ! 5:  bb162abd383 scalar diagnose: include disk space information
>      @@ contrib/scalar/scalar.c: static int cmd_diagnose(int argc, const char **argv)
>
>         strbuf_addf(&buf, "Enlistment root: %s\n", the_repository->worktree);
>       + get_disk_info(&buf);
>      -  off = strchr(buf.buf, ':') + 1 - buf.buf;
>      -  write_or_die(stdout_fd, buf.buf + off, buf.len - off);
>      -  strvec_push(&archiver_args, buf.buf);
>      +  write_or_die(stdout_fd, buf.buf, buf.len);
>      +  strvec_pushf(&archiver_args,
>      +               "--add-file-with-content=diagnostics.log:%.*s",
>
>        ## contrib/scalar/t/t9099-scalar.sh ##
>       @@ contrib/scalar/t/t9099-scalar.sh: SQ="'"
>  5:  bd9428919fa ! 6:  32aaad7cce1 scalar: teach `diagnose` to gather packfile info
>      @@ contrib/scalar/scalar.c: cleanup:
>        {
>         struct option options[] = {
>       @@ contrib/scalar/scalar.c: static int cmd_diagnose(int argc, const char **argv)
>      -  write_or_die(stdout_fd, buf.buf + off, buf.len - off);
>      -  strvec_push(&archiver_args, buf.buf);
>      +               "--add-file-with-content=diagnostics.log:%.*s",
>      +               (int)buf.len, buf.buf);
>
>       + strbuf_reset(&buf);
>       + strbuf_addstr(&buf, "--add-file-with-content=packs-local.txt:");
>  6:  7a8875be425 = 7:  322932f0bb8 scalar: teach `diagnose` to gather loose objects information
>
> --
> gitgitgadget

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v3 2/7] archive --add-file-with-contents: allow paths containing colons
  2022-05-07  2:06       ` Elijah Newren
@ 2022-05-09 21:04         ` Johannes Schindelin
  0 siblings, 0 replies; 109+ messages in thread
From: Johannes Schindelin @ 2022-05-09 21:04 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Johannes Schindelin via GitGitGadget, Git Mailing List,
	René Scharfe, Taylor Blau, Derrick Stolee

Hi Elijah,

On Fri, 6 May 2022, Elijah Newren wrote:

> On Wed, May 4, 2022 at 8:25 AM Johannes Schindelin via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
> >
> > From: Johannes Schindelin <johannes.schindelin@gmx.de>
> >
> > By allowing the path to be enclosed in double-quotes, we can avoid
> > the limitation that paths cannot contain colons.
> >
> > Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> > ---
> >  Documentation/git-archive.txt | 13 +++++++++----
> >  archive.c                     | 34 +++++++++++++++++++++++++++++-----
> >  t/t5003-archive-zip.sh        |  8 ++++++++
> >  3 files changed, 46 insertions(+), 9 deletions(-)
> >
> > diff --git a/Documentation/git-archive.txt b/Documentation/git-archive.txt
> > index a0edc9167b2..1789ce4c232 100644
> > --- a/Documentation/git-archive.txt
> > +++ b/Documentation/git-archive.txt
> > @@ -67,10 +67,15 @@ OPTIONS
> >         by concatenating the value for `--prefix` (if any) and the
> >         basename of <file>.
> >  +
> > -The `<path>` cannot contain any colon, the file mode is limited to
> > -a regular file, and the option may be subject to platform-dependent
> > -command-line limits. For non-trivial cases, write an untracked file
> > -and use `--add-file` instead.
> > +The `<path>` argument can start and end with a literal double-quote
> > +character. In this case, the backslash is interpreted as escape
> > +character. The path must be quoted if it contains a colon, to avoid
> > +the colon from being misinterpreted as the separator between the
> > +path and the contents.
>
> The path must also be quoted if it begins or ends with a double-quote, right?

True.

> Also, would people want to be able to pass a pathname from the output
> of e.g. `git ls-files -o`, which may quote additional characters?

Also true.

> > ++
> > +The file mode is limited to a regular file, and the option may be
> > +subject to platform-dependent command-line limits. For non-trivial
> > +cases, write an untracked file and use `--add-file` instead.
> >
> >  --worktree-attributes::
> >         Look for attributes in .gitattributes files in the working tree
> > diff --git a/archive.c b/archive.c
> > index d798624cd5f..3b751027143 100644
> > --- a/archive.c
> > +++ b/archive.c
> > @@ -533,13 +533,37 @@ static int add_file_cb(const struct option *opt, const char *arg, int unset)
> >                         die(_("Not a regular file: %s"), path);
> >                 info->content = NULL; /* read the file later */
> >         } else {
> > -               const char *colon = strchr(arg, ':');
> >                 char *p;
> >
> > -               if (!colon)
> > -                       die(_("missing colon: '%s'"), arg);
> > +               if (*arg != '"') {
> > +                       const char *colon = strchr(arg, ':');
> > +
> > +                       if (!colon)
> > +                               die(_("missing colon: '%s'"), arg);
> > +                       p = xstrndup(arg, colon - arg);
> > +                       arg = colon + 1;
> > +               } else {
> > +                       struct strbuf buf = STRBUF_INIT;
> > +                       const char *orig = arg;
> > +
> > +                       for (;;) {
> > +                               if (!*(++arg))
> > +                                       die(_("unclosed quote: '%s'"), orig);
> > +                               if (*arg == '"')
> > +                                       break;
> > +                               if (*arg == '\\' && *(++arg) == '\0')
> > +                                       die(_("trailing backslash: '%s"), orig);
> > +                               else
> > +                                       strbuf_addch(&buf, *arg);
> > +                       }
> > +
> > +                       if (*(++arg) != ':')
> > +                               die(_("missing colon: '%s'"), orig);
> > +
> > +                       p = strbuf_detach(&buf, NULL);
> > +                       arg++;
> > +               }
>
> Should we use unquote_c_style() here instead of rolling another parser
> to do unquoting?  That would have the added benefit of allowing people
> to use filenames from the output of various git commands that do
> special quoting -- such as octal sequences for non-ascii characters.

Yep, let's do that. I somehow missed that function while glimpsing at
`quote.h`.

Thank you for your review!
Dscho

> >
> > -               p = xstrndup(arg, colon - arg);
> >                 if (!args->prefix)
> >                         path = p;
> >                 else {
> > @@ -548,7 +572,7 @@ static int add_file_cb(const struct option *opt, const char *arg, int unset)
> >                 }
> >                 memset(&info->stat, 0, sizeof(info->stat));
> >                 info->stat.st_mode = S_IFREG | 0644;
> > -               info->content = xstrdup(colon + 1);
> > +               info->content = xstrdup(arg);
> >                 info->stat.st_size = strlen(info->content);
> >         }
> >         item = string_list_append_nodup(&args->extra_files, path);
> > diff --git a/t/t5003-archive-zip.sh b/t/t5003-archive-zip.sh
> > index 8ff1257f1a0..5b8bbfc2692 100755
> > --- a/t/t5003-archive-zip.sh
> > +++ b/t/t5003-archive-zip.sh
> > @@ -207,13 +207,21 @@ check_zip with_untracked
> >  check_added with_untracked untracked untracked
> >
> >  test_expect_success UNZIP 'git archive --format=zip --add-file-with-content' '
> > +       if test_have_prereq FUNNYNAMES
> > +       then
> > +               QUOTED=quoted:colon
> > +       else
> > +               QUOTED=quoted
> > +       fi &&
> >         git archive --format=zip >with_file_with_content.zip \
> > +               --add-file-with-content=\"$QUOTED\": \
> >                 --add-file-with-content=hello:world $EMPTY_TREE &&
> >         test_when_finished "rm -rf tmp-unpack" &&
> >         mkdir tmp-unpack && (
> >                 cd tmp-unpack &&
> >                 "$GIT_UNZIP" ../with_file_with_content.zip &&
> >                 test_path_is_file hello &&
> > +               test_path_is_file $QUOTED &&
> >                 test world = $(cat hello)
> >         )
> >  '
> > --
> > gitgitgadget
>
>

^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH v4 0/7] scalar: implement the subcommand "diagnose"
  2022-05-04 15:25   ` [PATCH v3 0/7] scalar: implement the subcommand "diagnose" Johannes Schindelin via GitGitGadget
                       ` (7 preceding siblings ...)
  2022-05-07  2:23     ` [PATCH v3 0/7] scalar: implement the subcommand "diagnose" Elijah Newren
@ 2022-05-10 19:26     ` Johannes Schindelin via GitGitGadget
  2022-05-10 19:26       ` [PATCH v4 1/7] archive: optionally add "virtual" files Johannes Schindelin via GitGitGadget
                         ` (8 more replies)
  8 siblings, 9 replies; 109+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2022-05-10 19:26 UTC (permalink / raw)
  To: git
  Cc: René Scharfe, Taylor Blau, Derrick Stolee, Elijah Newren,
	Johannes Schindelin

Over the course of the years, we developed a sub-command that gathers
diagnostic data into a .zip file that can then be attached to bug reports.
This sub-command turned out to be very useful in helping Scalar developers
identify and fix issues.

Changes since v3:

 * We're now using unquote_c_style() instead of rolling our own unquoter.
 * Fixed the added regression test.
 * As pointed out by Scalar's Functional Tests, the
   add_directory_to_archiver() function should not fail when scalar diagnose
   encounters FSMonitor's Unix socket, but only warn instead.
 * Related: add_directory_to_archiver() needs to propagate errors from
   processing subdirectories so that the top-level call returns an error,
   too.

Changes since v2:

 * Clarified in the commit message what the biggest benefit of
   --add-file-with-content is.
 * The <path> part of the -add-file-with-content argument can now contain
   colons. To do this, the path needs to start and end in double-quote
   characters (which are stripped), and the backslash serves as escape
   character in that case (to allow the path to contain both colons and
   double-quotes).
 * Fixed incorrect grammar.
 * Instead of strcmp(<what-we-don't-want>), we now say
   !strcmp(<what-we-want>).
 * The help text for --add-file-with-content was improved a tiny bit.
 * Adjusted the commit message that still talked about spawning plenty of
   processes and about a throw-away repository for the sake of generating a
   .zip file.
 * Simplified the code that shows the diagnostics and adds them to the .zip
   file.
 * The final message that reports that the archive is complete is now
   printed to stderr instead of stdout.

Changes since v1:

 * Instead of creating a throw-away repository, staging the contents of the
   .zip file and then using git write-tree and git archive to write the .zip
   file, the patch series now introduces a new option to git archive and
   uses write_archive() directly (avoiding any separate process).
 * Since the command avoids separate processes, it is now blazing fast on
   Windows, and I dropped the spinner() function because it's no longer
   needed.
 * While reworking the test case, I noticed that scalar [...] <enlistment>
   failed to verify that the specified directory exists, and would happily
   "traverse to its parent directory" on its quest to find a Scalar
   enlistment. That is of course incorrect, and has been fixed as a "while
   at it" sort of preparatory commit.
 * I had forgotten to sign off on all the commits, which has been fixed.
 * Instead of some "home-grown" readdir()-based function, the code now uses
   for_each_file_in_pack_dir() to look through the pack directories.
 * If any alternates are configured, their pack directories are now included
   in the output.
 * The commit message that might be interpreted to promise information about
   large loose files has been corrected to no longer promise that.
 * The test cases have been adjusted to test a little bit more (e.g.
   verifying that specific paths are mentioned in the output, instead of
   merely verifying that the output is non-empty).

Johannes Schindelin (5):
  archive: optionally add "virtual" files
  archive --add-file-with-contents: allow paths containing colons
  scalar: validate the optional enlistment argument
  Implement `scalar diagnose`
  scalar diagnose: include disk space information

Matthew John Cheetham (2):
  scalar: teach `diagnose` to gather packfile info
  scalar: teach `diagnose` to gather loose objects information

 Documentation/git-archive.txt    |  17 ++
 archive.c                        |  61 ++++++-
 contrib/scalar/scalar.c          | 292 ++++++++++++++++++++++++++++++-
 contrib/scalar/scalar.txt        |  12 ++
 contrib/scalar/t/t9099-scalar.sh |  27 +++
 t/t5003-archive-zip.sh           |  20 +++
 6 files changed, 419 insertions(+), 10 deletions(-)


base-commit: ddc35d833dd6f9e8946b09cecd3311b8aa18d295
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1128%2Fdscho%2Fscalar-diagnose-v4
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1128/dscho/scalar-diagnose-v4
Pull-Request: https://github.com/gitgitgadget/git/pull/1128

Range-diff vs v3:

 1:  45662cf582a = 1:  45662cf582a archive: optionally add "virtual" files
 2:  ce4b1b680c9 ! 2:  fdba4ed6f4d archive --add-file-with-contents: allow paths containing colons
     @@ Documentation/git-archive.txt: OPTIONS
      -command-line limits. For non-trivial cases, write an untracked file
      -and use `--add-file` instead.
      +The `<path>` argument can start and end with a literal double-quote
     -+character. In this case, the backslash is interpreted as escape
     -+character. The path must be quoted if it contains a colon, to avoid
     -+the colon from being misinterpreted as the separator between the
     -+path and the contents.
     ++character; The contained file name is interpreted as a C-style string,
     ++i.e. the backslash is interpreted as escape character. The path must
     ++be quoted if it contains a colon, to avoid the colon from being
     ++misinterpreted as the separator between the path and the contents, or
     ++if the path begins or ends with a double-quote character.
      ++
      +The file mode is limited to a regular file, and the option may be
      +subject to platform-dependent command-line limits. For non-trivial
     @@ Documentation/git-archive.txt: OPTIONS
       	Look for attributes in .gitattributes files in the working tree
      
       ## archive.c ##
     +@@
     + #include "parse-options.h"
     + #include "unpack-trees.h"
     + #include "dir.h"
     ++#include "quote.h"
     + 
     + static char const * const archive_usage[] = {
     + 	N_("git archive [<options>] <tree-ish> [<path>...]"),
      @@ archive.c: static int add_file_cb(const struct option *opt, const char *arg, int unset)
       			die(_("Not a regular file: %s"), path);
       		info->content = NULL; /* read the file later */
       	} else {
      -		const char *colon = strchr(arg, ':');
     - 		char *p;
     +-		char *p;
     ++		struct strbuf buf = STRBUF_INIT;
     ++		const char *p = arg;
     ++
     ++		if (*p != '"')
     ++			p = strchr(p, ':');
     ++		else if (unquote_c_style(&buf, p, &p) < 0)
     ++			die(_("unclosed quote: '%s'"), arg);
       
      -		if (!colon)
     --			die(_("missing colon: '%s'"), arg);
     -+		if (*arg != '"') {
     -+			const char *colon = strchr(arg, ':');
     -+
     -+			if (!colon)
     -+				die(_("missing colon: '%s'"), arg);
     -+			p = xstrndup(arg, colon - arg);
     -+			arg = colon + 1;
     -+		} else {
     -+			struct strbuf buf = STRBUF_INIT;
     -+			const char *orig = arg;
     -+
     -+			for (;;) {
     -+				if (!*(++arg))
     -+					die(_("unclosed quote: '%s'"), orig);
     -+				if (*arg == '"')
     -+					break;
     -+				if (*arg == '\\' && *(++arg) == '\0')
     -+					die(_("trailing backslash: '%s"), orig);
     -+				else
     -+					strbuf_addch(&buf, *arg);
     -+			}
     -+
     -+			if (*(++arg) != ':')
     -+				die(_("missing colon: '%s'"), orig);
     -+
     -+			p = strbuf_detach(&buf, NULL);
     -+			arg++;
     -+		}
     ++		if (!p || *p != ':')
     + 			die(_("missing colon: '%s'"), arg);
       
      -		p = xstrndup(arg, colon - arg);
     - 		if (!args->prefix)
     - 			path = p;
     - 		else {
     -@@ archive.c: static int add_file_cb(const struct option *opt, const char *arg, int unset)
     +-		if (!args->prefix)
     +-			path = p;
     +-		else {
     +-			path = prefix_filename(args->prefix, p);
     +-			free(p);
     ++		if (p == arg)
     ++			die(_("empty file name: '%s'"), arg);
     ++
     ++		path = buf.len ?
     ++			strbuf_detach(&buf, NULL) : xstrndup(arg, p - arg);
     ++
     ++		if (args->prefix) {
     ++			char *save = path;
     ++			path = prefix_filename(args->prefix, path);
     ++			free(save);
       		}
       		memset(&info->stat, 0, sizeof(info->stat));
       		info->stat.st_mode = S_IFREG | 0644;
      -		info->content = xstrdup(colon + 1);
     -+		info->content = xstrdup(arg);
     ++		info->content = xstrdup(p + 1);
       		info->stat.st_size = strlen(info->content);
       	}
       	item = string_list_append_nodup(&args->extra_files, path);
 3:  5a3eeb55409 = 3:  da9f52a8240 scalar: validate the optional enlistment argument
 4:  dfe821d10fe ! 4:  87bdc22322b Implement `scalar diagnose`
     @@ contrib/scalar/scalar.c: static int unregister_dir(void)
      +		if (e->d_type == DT_REG)
      +			strvec_pushf(archiver_args, "--add-file=%s", buf.buf);
      +		else if (e->d_type != DT_DIR)
     ++			warning(_("skipping '%s', which is neither file nor "
     ++				  "directory"), buf.buf);
     ++		else if (recurse &&
     ++			 add_directory_to_archiver(archiver_args,
     ++						   buf.buf, recurse) < 0)
      +			res = -1;
     -+		else if (recurse)
     -+		     add_directory_to_archiver(archiver_args, buf.buf, recurse);
      +	}
      +
      +	closedir(dir);
     @@ contrib/scalar/t/t9099-scalar.sh: test_expect_success '`scalar [...] <dir>` erro
      +SQ="'"
      +test_expect_success UNZIP 'scalar diagnose' '
      +	scalar clone "file://$(pwd)" cloned --single-branch &&
     -+	scalar diagnose cloned >out &&
     -+	sed -n "s/.*$SQ\\(.*\\.zip\\)$SQ.*/\\1/p" <out >zip_path &&
     ++	scalar diagnose cloned >out 2>err &&
     ++	sed -n "s/.*$SQ\\(.*\\.zip\\)$SQ.*/\\1/p" <err >zip_path &&
      +	zip_path=$(cat zip_path) &&
      +	test -n "$zip_path" &&
      +	unzip -v "$zip_path" &&
 5:  bb162abd383 ! 5:  3f63b197d42 scalar diagnose: include disk space information
     @@ contrib/scalar/t/t9099-scalar.sh
      @@ contrib/scalar/t/t9099-scalar.sh: SQ="'"
       test_expect_success UNZIP 'scalar diagnose' '
       	scalar clone "file://$(pwd)" cloned --single-branch &&
     - 	scalar diagnose cloned >out &&
     + 	scalar diagnose cloned >out 2>err &&
      +	grep "Available space" out &&
     - 	sed -n "s/.*$SQ\\(.*\\.zip\\)$SQ.*/\\1/p" <out >zip_path &&
     + 	sed -n "s/.*$SQ\\(.*\\.zip\\)$SQ.*/\\1/p" <err >zip_path &&
       	zip_path=$(cat zip_path) &&
       	test -n "$zip_path" &&
 6:  32aaad7cce1 ! 6:  fc1319338fc scalar: teach `diagnose` to gather packfile info
     @@ contrib/scalar/t/t9099-scalar.sh: test_expect_success '`scalar [...] <dir>` erro
       	scalar clone "file://$(pwd)" cloned --single-branch &&
      +	git repack &&
      +	echo "$(pwd)/.git/objects/" >>cloned/src/.git/objects/info/alternates &&
     - 	scalar diagnose cloned >out &&
     + 	scalar diagnose cloned >out 2>err &&
       	grep "Available space" out &&
     - 	sed -n "s/.*$SQ\\(.*\\.zip\\)$SQ.*/\\1/p" <out >zip_path &&
     + 	sed -n "s/.*$SQ\\(.*\\.zip\\)$SQ.*/\\1/p" <err >zip_path &&
      @@ contrib/scalar/t/t9099-scalar.sh: test_expect_success UNZIP 'scalar diagnose' '
       	folder=${zip_path%.zip} &&
       	test_path_is_missing "$folder" &&
 7:  322932f0bb8 ! 7:  e8f5b42f7b7 scalar: teach `diagnose` to gather loose objects information
     @@ contrib/scalar/t/t9099-scalar.sh: test_expect_success UNZIP 'scalar diagnose' '
       	git repack &&
       	echo "$(pwd)/.git/objects/" >>cloned/src/.git/objects/info/alternates &&
      +	test_commit -C cloned/src loose &&
     - 	scalar diagnose cloned >out &&
     + 	scalar diagnose cloned >out 2>err &&
       	grep "Available space" out &&
     - 	sed -n "s/.*$SQ\\(.*\\.zip\\)$SQ.*/\\1/p" <out >zip_path &&
     + 	sed -n "s/.*$SQ\\(.*\\.zip\\)$SQ.*/\\1/p" <err >zip_path &&
      @@ contrib/scalar/t/t9099-scalar.sh: test_expect_success UNZIP 'scalar diagnose' '
       	unzip -p "$zip_path" diagnostics.log >out &&
       	test_file_not_empty out &&

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH v4 1/7] archive: optionally add "virtual" files
  2022-05-10 19:26     ` [PATCH v4 " Johannes Schindelin via GitGitGadget
@ 2022-05-10 19:26       ` Johannes Schindelin via GitGitGadget
  2022-05-10 21:48         ` Junio C Hamano
  2022-05-10 19:26       ` [PATCH v4 2/7] archive --add-file-with-contents: allow paths containing colons Johannes Schindelin via GitGitGadget
                         ` (7 subsequent siblings)
  8 siblings, 1 reply; 109+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2022-05-10 19:26 UTC (permalink / raw)
  To: git
  Cc: René Scharfe, Taylor Blau, Derrick Stolee, Elijah Newren,
	Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

With the `--add-file-with-content=<path>:<content>` option, `git
archive` now supports use cases where relatively trivial files need to
be added that do not exist on disk.

This will allow us to generate `.zip` files with generated content,
without having to add said content to the object database and without
having to write it out to disk.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 Documentation/git-archive.txt | 11 ++++++++
 archive.c                     | 51 +++++++++++++++++++++++++++++------
 t/t5003-archive-zip.sh        | 12 +++++++++
 3 files changed, 66 insertions(+), 8 deletions(-)

diff --git a/Documentation/git-archive.txt b/Documentation/git-archive.txt
index bc4e76a7834..a0edc9167b2 100644
--- a/Documentation/git-archive.txt
+++ b/Documentation/git-archive.txt
@@ -61,6 +61,17 @@ OPTIONS
 	by concatenating the value for `--prefix` (if any) and the
 	basename of <file>.
 
+--add-file-with-content=<path>:<content>::
+	Add the specified contents to the archive.  Can be repeated to add
+	multiple files.  The path of the file in the archive is built
+	by concatenating the value for `--prefix` (if any) and the
+	basename of <file>.
++
+The `<path>` cannot contain any colon, the file mode is limited to
+a regular file, and the option may be subject to platform-dependent
+command-line limits. For non-trivial cases, write an untracked file
+and use `--add-file` instead.
+
 --worktree-attributes::
 	Look for attributes in .gitattributes files in the working tree
 	as well (see <<ATTRIBUTES>>).
diff --git a/archive.c b/archive.c
index a3bbb091256..d798624cd5f 100644
--- a/archive.c
+++ b/archive.c
@@ -263,6 +263,7 @@ static int queue_or_write_archive_entry(const struct object_id *oid,
 struct extra_file_info {
 	char *base;
 	struct stat stat;
+	void *content;
 };
 
 int write_archive_entries(struct archiver_args *args,
@@ -337,7 +338,13 @@ int write_archive_entries(struct archiver_args *args,
 		strbuf_addstr(&path_in_archive, basename(path));
 
 		strbuf_reset(&content);
-		if (strbuf_read_file(&content, path, info->stat.st_size) < 0)
+		if (info->content)
+			err = write_entry(args, &fake_oid, path_in_archive.buf,
+					  path_in_archive.len,
+					  info->stat.st_mode,
+					  info->content, info->stat.st_size);
+		else if (strbuf_read_file(&content, path,
+					  info->stat.st_size) < 0)
 			err = error_errno(_("could not read '%s'"), path);
 		else
 			err = write_entry(args, &fake_oid, path_in_archive.buf,
@@ -493,6 +500,7 @@ static void extra_file_info_clear(void *util, const char *str)
 {
 	struct extra_file_info *info = util;
 	free(info->base);
+	free(info->content);
 	free(info);
 }
 
@@ -514,14 +522,38 @@ static int add_file_cb(const struct option *opt, const char *arg, int unset)
 	if (!arg)
 		return -1;
 
-	path = prefix_filename(args->prefix, arg);
-	item = string_list_append_nodup(&args->extra_files, path);
-	item->util = info = xmalloc(sizeof(*info));
+	info = xmalloc(sizeof(*info));
 	info->base = xstrdup_or_null(base);
-	if (stat(path, &info->stat))
-		die(_("File not found: %s"), path);
-	if (!S_ISREG(info->stat.st_mode))
-		die(_("Not a regular file: %s"), path);
+
+	if (!strcmp(opt->long_name, "add-file")) {
+		path = prefix_filename(args->prefix, arg);
+		if (stat(path, &info->stat))
+			die(_("File not found: %s"), path);
+		if (!S_ISREG(info->stat.st_mode))
+			die(_("Not a regular file: %s"), path);
+		info->content = NULL; /* read the file later */
+	} else {
+		const char *colon = strchr(arg, ':');
+		char *p;
+
+		if (!colon)
+			die(_("missing colon: '%s'"), arg);
+
+		p = xstrndup(arg, colon - arg);
+		if (!args->prefix)
+			path = p;
+		else {
+			path = prefix_filename(args->prefix, p);
+			free(p);
+		}
+		memset(&info->stat, 0, sizeof(info->stat));
+		info->stat.st_mode = S_IFREG | 0644;
+		info->content = xstrdup(colon + 1);
+		info->stat.st_size = strlen(info->content);
+	}
+	item = string_list_append_nodup(&args->extra_files, path);
+	item->util = info;
+
 	return 0;
 }
 
@@ -554,6 +586,9 @@ static int parse_archive_args(int argc, const char **argv,
 		{ OPTION_CALLBACK, 0, "add-file", args, N_("file"),
 		  N_("add untracked file to archive"), 0, add_file_cb,
 		  (intptr_t)&base },
+		{ OPTION_CALLBACK, 0, "add-file-with-content", args,
+		  N_("path:content"), N_("add untracked file to archive"), 0,
+		  add_file_cb, (intptr_t)&base },
 		OPT_STRING('o', "output", &output, N_("file"),
 			N_("write the archive to this file")),
 		OPT_BOOL(0, "worktree-attributes", &worktree_attributes,
diff --git a/t/t5003-archive-zip.sh b/t/t5003-archive-zip.sh
index 1e6d18b140e..8ff1257f1a0 100755
--- a/t/t5003-archive-zip.sh
+++ b/t/t5003-archive-zip.sh
@@ -206,6 +206,18 @@ test_expect_success 'git archive --format=zip --add-file' '
 check_zip with_untracked
 check_added with_untracked untracked untracked
 
+test_expect_success UNZIP 'git archive --format=zip --add-file-with-content' '
+	git archive --format=zip >with_file_with_content.zip \
+		--add-file-with-content=hello:world $EMPTY_TREE &&
+	test_when_finished "rm -rf tmp-unpack" &&
+	mkdir tmp-unpack && (
+		cd tmp-unpack &&
+		"$GIT_UNZIP" ../with_file_with_content.zip &&
+		test_path_is_file hello &&
+		test world = $(cat hello)
+	)
+'
+
 test_expect_success 'git archive --format=zip --add-file twice' '
 	echo untracked >untracked &&
 	git archive --format=zip --prefix=one/ --add-file=untracked \
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH v4 2/7] archive --add-file-with-contents: allow paths containing colons
  2022-05-10 19:26     ` [PATCH v4 " Johannes Schindelin via GitGitGadget
  2022-05-10 19:26       ` [PATCH v4 1/7] archive: optionally add "virtual" files Johannes Schindelin via GitGitGadget
@ 2022-05-10 19:26       ` Johannes Schindelin via GitGitGadget
  2022-05-10 21:56         ` Junio C Hamano
  2022-05-10 19:27       ` [PATCH v4 3/7] scalar: validate the optional enlistment argument Johannes Schindelin via GitGitGadget
                         ` (6 subsequent siblings)
  8 siblings, 1 reply; 109+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2022-05-10 19:26 UTC (permalink / raw)
  To: git
  Cc: René Scharfe, Taylor Blau, Derrick Stolee, Elijah Newren,
	Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

By allowing the path to be enclosed in double-quotes, we can avoid
the limitation that paths cannot contain colons.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 Documentation/git-archive.txt | 14 ++++++++++----
 archive.c                     | 30 ++++++++++++++++++++----------
 t/t5003-archive-zip.sh        |  8 ++++++++
 3 files changed, 38 insertions(+), 14 deletions(-)

diff --git a/Documentation/git-archive.txt b/Documentation/git-archive.txt
index a0edc9167b2..21eab5690ad 100644
--- a/Documentation/git-archive.txt
+++ b/Documentation/git-archive.txt
@@ -67,10 +67,16 @@ OPTIONS
 	by concatenating the value for `--prefix` (if any) and the
 	basename of <file>.
 +
-The `<path>` cannot contain any colon, the file mode is limited to
-a regular file, and the option may be subject to platform-dependent
-command-line limits. For non-trivial cases, write an untracked file
-and use `--add-file` instead.
+The `<path>` argument can start and end with a literal double-quote
+character; The contained file name is interpreted as a C-style string,
+i.e. the backslash is interpreted as escape character. The path must
+be quoted if it contains a colon, to avoid the colon from being
+misinterpreted as the separator between the path and the contents, or
+if the path begins or ends with a double-quote character.
++
+The file mode is limited to a regular file, and the option may be
+subject to platform-dependent command-line limits. For non-trivial
+cases, write an untracked file and use `--add-file` instead.
 
 --worktree-attributes::
 	Look for attributes in .gitattributes files in the working tree
diff --git a/archive.c b/archive.c
index d798624cd5f..477eba60ac3 100644
--- a/archive.c
+++ b/archive.c
@@ -9,6 +9,7 @@
 #include "parse-options.h"
 #include "unpack-trees.h"
 #include "dir.h"
+#include "quote.h"
 
 static char const * const archive_usage[] = {
 	N_("git archive [<options>] <tree-ish> [<path>...]"),
@@ -533,22 +534,31 @@ static int add_file_cb(const struct option *opt, const char *arg, int unset)
 			die(_("Not a regular file: %s"), path);
 		info->content = NULL; /* read the file later */
 	} else {
-		const char *colon = strchr(arg, ':');
-		char *p;
+		struct strbuf buf = STRBUF_INIT;
+		const char *p = arg;
+
+		if (*p != '"')
+			p = strchr(p, ':');
+		else if (unquote_c_style(&buf, p, &p) < 0)
+			die(_("unclosed quote: '%s'"), arg);
 
-		if (!colon)
+		if (!p || *p != ':')
 			die(_("missing colon: '%s'"), arg);
 
-		p = xstrndup(arg, colon - arg);
-		if (!args->prefix)
-			path = p;
-		else {
-			path = prefix_filename(args->prefix, p);
-			free(p);
+		if (p == arg)
+			die(_("empty file name: '%s'"), arg);
+
+		path = buf.len ?
+			strbuf_detach(&buf, NULL) : xstrndup(arg, p - arg);
+
+		if (args->prefix) {
+			char *save = path;
+			path = prefix_filename(args->prefix, path);
+			free(save);
 		}
 		memset(&info->stat, 0, sizeof(info->stat));
 		info->stat.st_mode = S_IFREG | 0644;
-		info->content = xstrdup(colon + 1);
+		info->content = xstrdup(p + 1);
 		info->stat.st_size = strlen(info->content);
 	}
 	item = string_list_append_nodup(&args->extra_files, path);
diff --git a/t/t5003-archive-zip.sh b/t/t5003-archive-zip.sh
index 8ff1257f1a0..5b8bbfc2692 100755
--- a/t/t5003-archive-zip.sh
+++ b/t/t5003-archive-zip.sh
@@ -207,13 +207,21 @@ check_zip with_untracked
 check_added with_untracked untracked untracked
 
 test_expect_success UNZIP 'git archive --format=zip --add-file-with-content' '
+	if test_have_prereq FUNNYNAMES
+	then
+		QUOTED=quoted:colon
+	else
+		QUOTED=quoted
+	fi &&
 	git archive --format=zip >with_file_with_content.zip \
+		--add-file-with-content=\"$QUOTED\": \
 		--add-file-with-content=hello:world $EMPTY_TREE &&
 	test_when_finished "rm -rf tmp-unpack" &&
 	mkdir tmp-unpack && (
 		cd tmp-unpack &&
 		"$GIT_UNZIP" ../with_file_with_content.zip &&
 		test_path_is_file hello &&
+		test_path_is_file $QUOTED &&
 		test world = $(cat hello)
 	)
 '
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH v4 3/7] scalar: validate the optional enlistment argument
  2022-05-10 19:26     ` [PATCH v4 " Johannes Schindelin via GitGitGadget
  2022-05-10 19:26       ` [PATCH v4 1/7] archive: optionally add "virtual" files Johannes Schindelin via GitGitGadget
  2022-05-10 19:26       ` [PATCH v4 2/7] archive --add-file-with-contents: allow paths containing colons Johannes Schindelin via GitGitGadget
@ 2022-05-10 19:27       ` Johannes Schindelin via GitGitGadget
  2022-05-17 14:51         ` Ævar Arnfjörð Bjarmason
  2022-05-10 19:27       ` [PATCH v4 4/7] Implement `scalar diagnose` Johannes Schindelin via GitGitGadget
                         ` (5 subsequent siblings)
  8 siblings, 1 reply; 109+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2022-05-10 19:27 UTC (permalink / raw)
  To: git
  Cc: René Scharfe, Taylor Blau, Derrick Stolee, Elijah Newren,
	Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

The `scalar` command needs a Scalar enlistment for many subcommands, and
looks in the current directory for such an enlistment (traversing the
parent directories until it finds one).

These is subcommands can also be called with an optional argument
specifying the enlistment. Here, too, we traverse parent directories as
needed, until we find an enlistment.

However, if the specified directory does not even exist, or is not a
directory, we should stop right there, with an error message.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 contrib/scalar/scalar.c          | 6 ++++--
 contrib/scalar/t/t9099-scalar.sh | 5 +++++
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
index 1ce9c2b00e8..00dcd4b50ef 100644
--- a/contrib/scalar/scalar.c
+++ b/contrib/scalar/scalar.c
@@ -43,9 +43,11 @@ static void setup_enlistment_directory(int argc, const char **argv,
 		usage_with_options(usagestr, options);
 
 	/* find the worktree, determine its corresponding root */
-	if (argc == 1)
+	if (argc == 1) {
 		strbuf_add_absolute_path(&path, argv[0]);
-	else if (strbuf_getcwd(&path) < 0)
+		if (!is_directory(path.buf))
+			die(_("'%s' does not exist"), path.buf);
+	} else if (strbuf_getcwd(&path) < 0)
 		die(_("need a working directory"));
 
 	strbuf_trim_trailing_dir_sep(&path);
diff --git a/contrib/scalar/t/t9099-scalar.sh b/contrib/scalar/t/t9099-scalar.sh
index 2e1502ad45e..9d83fdf25e8 100755
--- a/contrib/scalar/t/t9099-scalar.sh
+++ b/contrib/scalar/t/t9099-scalar.sh
@@ -85,4 +85,9 @@ test_expect_success 'scalar delete with enlistment' '
 	test_path_is_missing cloned
 '
 
+test_expect_success '`scalar [...] <dir>` errors out when dir is missing' '
+	! scalar run config cloned 2>err &&
+	grep "cloned. does not exist" err
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH v4 4/7] Implement `scalar diagnose`
  2022-05-10 19:26     ` [PATCH v4 " Johannes Schindelin via GitGitGadget
                         ` (2 preceding siblings ...)
  2022-05-10 19:27       ` [PATCH v4 3/7] scalar: validate the optional enlistment argument Johannes Schindelin via GitGitGadget
@ 2022-05-10 19:27       ` Johannes Schindelin via GitGitGadget
  2022-05-17 14:53         ` Ævar Arnfjörð Bjarmason
  2022-05-10 19:27       ` [PATCH v4 5/7] scalar diagnose: include disk space information Johannes Schindelin via GitGitGadget
                         ` (4 subsequent siblings)
  8 siblings, 1 reply; 109+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2022-05-10 19:27 UTC (permalink / raw)
  To: git
  Cc: René Scharfe, Taylor Blau, Derrick Stolee, Elijah Newren,
	Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

Over the course of Scalar's development, it became obvious that there is
a need for a command that can gather all kinds of useful information
that can help identify the most typical problems with large
worktrees/repositories.

The `diagnose` command is the culmination of this hard-won knowledge: it
gathers the installed hooks, the config, a couple statistics describing
the data shape, among other pieces of information, and then wraps
everything up in a tidy, neat `.zip` archive.

Note: originally, Scalar was implemented in C# using the .NET API, where
we had the luxury of a comprehensive standard library that includes
basic functionality such as writing a `.zip` file. In the C version, we
lack such a commodity. Rather than introducing a dependency on, say,
libzip, we slightly abuse Git's `archive` machinery: we write out a
`.zip` of the empty try, augmented by a couple files that are added via
the `--add-file*` options. We are careful trying not to modify the
current repository in any way lest the very circumstances that required
`scalar diagnose` to be run are changed by the `diagnose` run itself.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 contrib/scalar/scalar.c          | 144 +++++++++++++++++++++++++++++++
 contrib/scalar/scalar.txt        |  12 +++
 contrib/scalar/t/t9099-scalar.sh |  14 +++
 3 files changed, 170 insertions(+)

diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
index 00dcd4b50ef..367a2c50e25 100644
--- a/contrib/scalar/scalar.c
+++ b/contrib/scalar/scalar.c
@@ -11,6 +11,7 @@
 #include "dir.h"
 #include "packfile.h"
 #include "help.h"
+#include "archive.h"
 
 /*
  * Remove the deepest subdirectory in the provided path string. Path must not
@@ -261,6 +262,47 @@ static int unregister_dir(void)
 	return res;
 }
 
+static int add_directory_to_archiver(struct strvec *archiver_args,
+					  const char *path, int recurse)
+{
+	int at_root = !*path;
+	DIR *dir = opendir(at_root ? "." : path);
+	struct dirent *e;
+	struct strbuf buf = STRBUF_INIT;
+	size_t len;
+	int res = 0;
+
+	if (!dir)
+		return error(_("could not open directory '%s'"), path);
+
+	if (!at_root)
+		strbuf_addf(&buf, "%s/", path);
+	len = buf.len;
+	strvec_pushf(archiver_args, "--prefix=%s", buf.buf);
+
+	while (!res && (e = readdir(dir))) {
+		if (!strcmp(".", e->d_name) || !strcmp("..", e->d_name))
+			continue;
+
+		strbuf_setlen(&buf, len);
+		strbuf_addstr(&buf, e->d_name);
+
+		if (e->d_type == DT_REG)
+			strvec_pushf(archiver_args, "--add-file=%s", buf.buf);
+		else if (e->d_type != DT_DIR)
+			warning(_("skipping '%s', which is neither file nor "
+				  "directory"), buf.buf);
+		else if (recurse &&
+			 add_directory_to_archiver(archiver_args,
+						   buf.buf, recurse) < 0)
+			res = -1;
+	}
+
+	closedir(dir);
+	strbuf_release(&buf);
+	return res;
+}
+
 /* printf-style interface, expects `<key>=<value>` argument */
 static int set_config(const char *fmt, ...)
 {
@@ -501,6 +543,107 @@ cleanup:
 	return res;
 }
 
+static int cmd_diagnose(int argc, const char **argv)
+{
+	struct option options[] = {
+		OPT_END(),
+	};
+	const char * const usage[] = {
+		N_("scalar diagnose [<enlistment>]"),
+		NULL
+	};
+	struct strbuf zip_path = STRBUF_INIT;
+	struct strvec archiver_args = STRVEC_INIT;
+	char **argv_copy = NULL;
+	int stdout_fd = -1, archiver_fd = -1;
+	time_t now = time(NULL);
+	struct tm tm;
+	struct strbuf path = STRBUF_INIT, buf = STRBUF_INIT;
+	int res = 0;
+
+	argc = parse_options(argc, argv, NULL, options,
+			     usage, 0);
+
+	setup_enlistment_directory(argc, argv, usage, options, &zip_path);
+
+	strbuf_addstr(&zip_path, "/.scalarDiagnostics/scalar_");
+	strbuf_addftime(&zip_path,
+			"%Y%m%d_%H%M%S", localtime_r(&now, &tm), 0, 0);
+	strbuf_addstr(&zip_path, ".zip");
+	switch (safe_create_leading_directories(zip_path.buf)) {
+	case SCLD_EXISTS:
+	case SCLD_OK:
+		break;
+	default:
+		error_errno(_("could not create directory for '%s'"),
+			    zip_path.buf);
+		goto diagnose_cleanup;
+	}
+	stdout_fd = dup(1);
+	if (stdout_fd < 0) {
+		res = error_errno(_("could not duplicate stdout"));
+		goto diagnose_cleanup;
+	}
+
+	archiver_fd = xopen(zip_path.buf, O_CREAT | O_WRONLY | O_TRUNC, 0666);
+	if (archiver_fd < 0 || dup2(archiver_fd, 1) < 0) {
+		res = error_errno(_("could not redirect output"));
+		goto diagnose_cleanup;
+	}
+
+	init_zip_archiver();
+	strvec_pushl(&archiver_args, "scalar-diagnose", "--format=zip", NULL);
+
+	strbuf_reset(&buf);
+	strbuf_addstr(&buf, "Collecting diagnostic info\n\n");
+	get_version_info(&buf, 1);
+
+	strbuf_addf(&buf, "Enlistment root: %s\n", the_repository->worktree);
+	write_or_die(stdout_fd, buf.buf, buf.len);
+	strvec_pushf(&archiver_args,
+		     "--add-file-with-content=diagnostics.log:%.*s",
+		     (int)buf.len, buf.buf);
+
+	if ((res = add_directory_to_archiver(&archiver_args, ".git", 0)) ||
+	    (res = add_directory_to_archiver(&archiver_args, ".git/hooks", 0)) ||
+	    (res = add_directory_to_archiver(&archiver_args, ".git/info", 0)) ||
+	    (res = add_directory_to_archiver(&archiver_args, ".git/logs", 1)) ||
+	    (res = add_directory_to_archiver(&archiver_args, ".git/objects/info", 0)))
+		goto diagnose_cleanup;
+
+	strvec_pushl(&archiver_args, "--prefix=",
+		     oid_to_hex(the_hash_algo->empty_tree), "--", NULL);
+
+	/* `write_archive()` modifies the `argv` passed to it. Let it. */
+	argv_copy = xmemdupz(archiver_args.v,
+			     sizeof(char *) * archiver_args.nr);
+	res = write_archive(archiver_args.nr, (const char **)argv_copy, NULL,
+			    the_repository, NULL, 0);
+	if (res) {
+		error(_("failed to write archive"));
+		goto diagnose_cleanup;
+	}
+
+	if (!res)
+		fprintf(stderr, "\n"
+		       "Diagnostics complete.\n"
+		       "All of the gathered info is captured in '%s'\n",
+		       zip_path.buf);
+
+diagnose_cleanup:
+	if (archiver_fd >= 0) {
+		close(1);
+		dup2(stdout_fd, 1);
+	}
+	free(argv_copy);
+	strvec_clear(&archiver_args);
+	strbuf_release(&zip_path);
+	strbuf_release(&path);
+	strbuf_release(&buf);
+
+	return res;
+}
+
 static int cmd_list(int argc, const char **argv)
 {
 	if (argc != 1)
@@ -802,6 +945,7 @@ static struct {
 	{ "reconfigure", cmd_reconfigure },
 	{ "delete", cmd_delete },
 	{ "version", cmd_version },
+	{ "diagnose", cmd_diagnose },
 	{ NULL, NULL},
 };
 
diff --git a/contrib/scalar/scalar.txt b/contrib/scalar/scalar.txt
index f416d637289..22583fe046e 100644
--- a/contrib/scalar/scalar.txt
+++ b/contrib/scalar/scalar.txt
@@ -14,6 +14,7 @@ scalar register [<enlistment>]
 scalar unregister [<enlistment>]
 scalar run ( all | config | commit-graph | fetch | loose-objects | pack-files ) [<enlistment>]
 scalar reconfigure [ --all | <enlistment> ]
+scalar diagnose [<enlistment>]
 scalar delete <enlistment>
 
 DESCRIPTION
@@ -129,6 +130,17 @@ reconfigure the enlistment.
 With the `--all` option, all enlistments currently registered with Scalar
 will be reconfigured. Use this option after each Scalar upgrade.
 
+Diagnose
+~~~~~~~~
+
+diagnose [<enlistment>]::
+    When reporting issues with Scalar, it is often helpful to provide the
+    information gathered by this command, including logs and certain
+    statistics describing the data shape of the current enlistment.
++
+The output of this command is a `.zip` file that is written into
+a directory adjacent to the worktree in the `src` directory.
+
 Delete
 ~~~~~~
 
diff --git a/contrib/scalar/t/t9099-scalar.sh b/contrib/scalar/t/t9099-scalar.sh
index 9d83fdf25e8..6802d317258 100755
--- a/contrib/scalar/t/t9099-scalar.sh
+++ b/contrib/scalar/t/t9099-scalar.sh
@@ -90,4 +90,18 @@ test_expect_success '`scalar [...] <dir>` errors out when dir is missing' '
 	grep "cloned. does not exist" err
 '
 
+SQ="'"
+test_expect_success UNZIP 'scalar diagnose' '
+	scalar clone "file://$(pwd)" cloned --single-branch &&
+	scalar diagnose cloned >out 2>err &&
+	sed -n "s/.*$SQ\\(.*\\.zip\\)$SQ.*/\\1/p" <err >zip_path &&
+	zip_path=$(cat zip_path) &&
+	test -n "$zip_path" &&
+	unzip -v "$zip_path" &&
+	folder=${zip_path%.zip} &&
+	test_path_is_missing "$folder" &&
+	unzip -p "$zip_path" diagnostics.log >out &&
+	test_file_not_empty out
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH v4 5/7] scalar diagnose: include disk space information
  2022-05-10 19:26     ` [PATCH v4 " Johannes Schindelin via GitGitGadget
                         ` (3 preceding siblings ...)
  2022-05-10 19:27       ` [PATCH v4 4/7] Implement `scalar diagnose` Johannes Schindelin via GitGitGadget
@ 2022-05-10 19:27       ` Johannes Schindelin via GitGitGadget
  2022-05-10 19:27       ` [PATCH v4 6/7] scalar: teach `diagnose` to gather packfile info Matthew John Cheetham via GitGitGadget
                         ` (3 subsequent siblings)
  8 siblings, 0 replies; 109+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2022-05-10 19:27 UTC (permalink / raw)
  To: git
  Cc: René Scharfe, Taylor Blau, Derrick Stolee, Elijah Newren,
	Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

When analyzing problems with large worktrees/repositories, it is useful
to know how close to a "full disk" situation Scalar/Git operates. Let's
include this information.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 contrib/scalar/scalar.c          | 53 ++++++++++++++++++++++++++++++++
 contrib/scalar/t/t9099-scalar.sh |  1 +
 2 files changed, 54 insertions(+)

diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
index 367a2c50e25..34cbec59b45 100644
--- a/contrib/scalar/scalar.c
+++ b/contrib/scalar/scalar.c
@@ -303,6 +303,58 @@ static int add_directory_to_archiver(struct strvec *archiver_args,
 	return res;
 }
 
+#ifndef WIN32
+#include <sys/statvfs.h>
+#endif
+
+static int get_disk_info(struct strbuf *out)
+{
+#ifdef WIN32
+	struct strbuf buf = STRBUF_INIT;
+	char volume_name[MAX_PATH], fs_name[MAX_PATH];
+	DWORD serial_number, component_length, flags;
+	ULARGE_INTEGER avail2caller, total, avail;
+
+	strbuf_realpath(&buf, ".", 1);
+	if (!GetDiskFreeSpaceExA(buf.buf, &avail2caller, &total, &avail)) {
+		error(_("could not determine free disk size for '%s'"),
+		      buf.buf);
+		strbuf_release(&buf);
+		return -1;
+	}
+
+	strbuf_setlen(&buf, offset_1st_component(buf.buf));
+	if (!GetVolumeInformationA(buf.buf, volume_name, sizeof(volume_name),
+				   &serial_number, &component_length, &flags,
+				   fs_name, sizeof(fs_name))) {
+		error(_("could not get info for '%s'"), buf.buf);
+		strbuf_release(&buf);
+		return -1;
+	}
+	strbuf_addf(out, "Available space on '%s': ", buf.buf);
+	strbuf_humanise_bytes(out, avail2caller.QuadPart);
+	strbuf_addch(out, '\n');
+	strbuf_release(&buf);
+#else
+	struct strbuf buf = STRBUF_INIT;
+	struct statvfs stat;
+
+	strbuf_realpath(&buf, ".", 1);
+	if (statvfs(buf.buf, &stat) < 0) {
+		error_errno(_("could not determine free disk size for '%s'"),
+			    buf.buf);
+		strbuf_release(&buf);
+		return -1;
+	}
+
+	strbuf_addf(out, "Available space on '%s': ", buf.buf);
+	strbuf_humanise_bytes(out, st_mult(stat.f_bsize, stat.f_bavail));
+	strbuf_addf(out, " (mount flags 0x%lx)\n", stat.f_flag);
+	strbuf_release(&buf);
+#endif
+	return 0;
+}
+
 /* printf-style interface, expects `<key>=<value>` argument */
 static int set_config(const char *fmt, ...)
 {
@@ -599,6 +651,7 @@ static int cmd_diagnose(int argc, const char **argv)
 	get_version_info(&buf, 1);
 
 	strbuf_addf(&buf, "Enlistment root: %s\n", the_repository->worktree);
+	get_disk_info(&buf);
 	write_or_die(stdout_fd, buf.buf, buf.len);
 	strvec_pushf(&archiver_args,
 		     "--add-file-with-content=diagnostics.log:%.*s",
diff --git a/contrib/scalar/t/t9099-scalar.sh b/contrib/scalar/t/t9099-scalar.sh
index 6802d317258..934b2485d91 100755
--- a/contrib/scalar/t/t9099-scalar.sh
+++ b/contrib/scalar/t/t9099-scalar.sh
@@ -94,6 +94,7 @@ SQ="'"
 test_expect_success UNZIP 'scalar diagnose' '
 	scalar clone "file://$(pwd)" cloned --single-branch &&
 	scalar diagnose cloned >out 2>err &&
+	grep "Available space" out &&
 	sed -n "s/.*$SQ\\(.*\\.zip\\)$SQ.*/\\1/p" <err >zip_path &&
 	zip_path=$(cat zip_path) &&
 	test -n "$zip_path" &&
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH v4 6/7] scalar: teach `diagnose` to gather packfile info
  2022-05-10 19:26     ` [PATCH v4 " Johannes Schindelin via GitGitGadget
                         ` (4 preceding siblings ...)
  2022-05-10 19:27       ` [PATCH v4 5/7] scalar diagnose: include disk space information Johannes Schindelin via GitGitGadget
@ 2022-05-10 19:27       ` Matthew John Cheetham via GitGitGadget
  2022-05-10 19:27       ` [PATCH v4 7/7] scalar: teach `diagnose` to gather loose objects information Matthew John Cheetham via GitGitGadget
                         ` (2 subsequent siblings)
  8 siblings, 0 replies; 109+ messages in thread
From: Matthew John Cheetham via GitGitGadget @ 2022-05-10 19:27 UTC (permalink / raw)
  To: git
  Cc: René Scharfe, Taylor Blau, Derrick Stolee, Elijah Newren,
	Johannes Schindelin, Matthew John Cheetham

From: Matthew John Cheetham <mjcheetham@outlook.com>

It's helpful to see if there are other crud files in the pack
directory. Let's teach the `scalar diagnose` command to gather
file size information about pack files.

While at it, also enumerate the pack files in the alternate
object directories, if any are registered.

Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 contrib/scalar/scalar.c          | 30 ++++++++++++++++++++++++++++++
 contrib/scalar/t/t9099-scalar.sh |  6 +++++-
 2 files changed, 35 insertions(+), 1 deletion(-)

diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
index 34cbec59b45..e8e0a5ec473 100644
--- a/contrib/scalar/scalar.c
+++ b/contrib/scalar/scalar.c
@@ -12,6 +12,7 @@
 #include "packfile.h"
 #include "help.h"
 #include "archive.h"
+#include "object-store.h"
 
 /*
  * Remove the deepest subdirectory in the provided path string. Path must not
@@ -595,6 +596,29 @@ cleanup:
 	return res;
 }
 
+static void dir_file_stats_objects(const char *full_path, size_t full_path_len,
+				   const char *file_name, void *data)
+{
+	struct strbuf *buf = data;
+	struct stat st;
+
+	if (!stat(full_path, &st))
+		strbuf_addf(buf, "%-70s %16" PRIuMAX "\n", file_name,
+			    (uintmax_t)st.st_size);
+}
+
+static int dir_file_stats(struct object_directory *object_dir, void *data)
+{
+	struct strbuf *buf = data;
+
+	strbuf_addf(buf, "Contents of %s:\n", object_dir->path);
+
+	for_each_file_in_pack_dir(object_dir->path, dir_file_stats_objects,
+				  data);
+
+	return 0;
+}
+
 static int cmd_diagnose(int argc, const char **argv)
 {
 	struct option options[] = {
@@ -657,6 +681,12 @@ static int cmd_diagnose(int argc, const char **argv)
 		     "--add-file-with-content=diagnostics.log:%.*s",
 		     (int)buf.len, buf.buf);
 
+	strbuf_reset(&buf);
+	strbuf_addstr(&buf, "--add-file-with-content=packs-local.txt:");
+	dir_file_stats(the_repository->objects->odb, &buf);
+	foreach_alt_odb(dir_file_stats, &buf);
+	strvec_push(&archiver_args, buf.buf);
+
 	if ((res = add_directory_to_archiver(&archiver_args, ".git", 0)) ||
 	    (res = add_directory_to_archiver(&archiver_args, ".git/hooks", 0)) ||
 	    (res = add_directory_to_archiver(&archiver_args, ".git/info", 0)) ||
diff --git a/contrib/scalar/t/t9099-scalar.sh b/contrib/scalar/t/t9099-scalar.sh
index 934b2485d91..3dd5650cceb 100755
--- a/contrib/scalar/t/t9099-scalar.sh
+++ b/contrib/scalar/t/t9099-scalar.sh
@@ -93,6 +93,8 @@ test_expect_success '`scalar [...] <dir>` errors out when dir is missing' '
 SQ="'"
 test_expect_success UNZIP 'scalar diagnose' '
 	scalar clone "file://$(pwd)" cloned --single-branch &&
+	git repack &&
+	echo "$(pwd)/.git/objects/" >>cloned/src/.git/objects/info/alternates &&
 	scalar diagnose cloned >out 2>err &&
 	grep "Available space" out &&
 	sed -n "s/.*$SQ\\(.*\\.zip\\)$SQ.*/\\1/p" <err >zip_path &&
@@ -102,7 +104,9 @@ test_expect_success UNZIP 'scalar diagnose' '
 	folder=${zip_path%.zip} &&
 	test_path_is_missing "$folder" &&
 	unzip -p "$zip_path" diagnostics.log >out &&
-	test_file_not_empty out
+	test_file_not_empty out &&
+	unzip -p "$zip_path" packs-local.txt >out &&
+	grep "$(pwd)/.git/objects" out
 '
 
 test_done
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH v4 7/7] scalar: teach `diagnose` to gather loose objects information
  2022-05-10 19:26     ` [PATCH v4 " Johannes Schindelin via GitGitGadget
                         ` (5 preceding siblings ...)
  2022-05-10 19:27       ` [PATCH v4 6/7] scalar: teach `diagnose` to gather packfile info Matthew John Cheetham via GitGitGadget
@ 2022-05-10 19:27       ` Matthew John Cheetham via GitGitGadget
  2022-05-17 15:03       ` [PATCH v4 0/7] scalar: implement the subcommand "diagnose" Ævar Arnfjörð Bjarmason
  2022-05-19 18:17       ` [PATCH v5 " Johannes Schindelin via GitGitGadget
  8 siblings, 0 replies; 109+ messages in thread
From: Matthew John Cheetham via GitGitGadget @ 2022-05-10 19:27 UTC (permalink / raw)
  To: git
  Cc: René Scharfe, Taylor Blau, Derrick Stolee, Elijah Newren,
	Johannes Schindelin, Matthew John Cheetham

From: Matthew John Cheetham <mjcheetham@outlook.com>

When operating at the scale that Scalar wants to support, certain data
shapes are more likely to cause undesirable performance issues, such as
large numbers of loose objects.

By including statistics about this, `scalar diagnose` now makes it
easier to identify such scenarios.

Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 contrib/scalar/scalar.c          | 59 ++++++++++++++++++++++++++++++++
 contrib/scalar/t/t9099-scalar.sh |  5 ++-
 2 files changed, 63 insertions(+), 1 deletion(-)

diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
index e8e0a5ec473..03da7452d83 100644
--- a/contrib/scalar/scalar.c
+++ b/contrib/scalar/scalar.c
@@ -619,6 +619,60 @@ static int dir_file_stats(struct object_directory *object_dir, void *data)
 	return 0;
 }
 
+static int count_files(char *path)
+{
+	DIR *dir = opendir(path);
+	struct dirent *e;
+	int count = 0;
+
+	if (!dir)
+		return 0;
+
+	while ((e = readdir(dir)) != NULL)
+		if (!is_dot_or_dotdot(e->d_name) && e->d_type == DT_REG)
+			count++;
+
+	closedir(dir);
+	return count;
+}
+
+static void loose_objs_stats(struct strbuf *buf, const char *path)
+{
+	DIR *dir = opendir(path);
+	struct dirent *e;
+	int count;
+	int total = 0;
+	unsigned char c;
+	struct strbuf count_path = STRBUF_INIT;
+	size_t base_path_len;
+
+	if (!dir)
+		return;
+
+	strbuf_addstr(buf, "Object directory stats for ");
+	strbuf_add_absolute_path(buf, path);
+	strbuf_addstr(buf, ":\n");
+
+	strbuf_add_absolute_path(&count_path, path);
+	strbuf_addch(&count_path, '/');
+	base_path_len = count_path.len;
+
+	while ((e = readdir(dir)) != NULL)
+		if (!is_dot_or_dotdot(e->d_name) &&
+		    e->d_type == DT_DIR && strlen(e->d_name) == 2 &&
+		    !hex_to_bytes(&c, e->d_name, 1)) {
+			strbuf_setlen(&count_path, base_path_len);
+			strbuf_addstr(&count_path, e->d_name);
+			total += (count = count_files(count_path.buf));
+			strbuf_addf(buf, "%s : %7d files\n", e->d_name, count);
+		}
+
+	strbuf_addf(buf, "Total: %d loose objects", total);
+
+	strbuf_release(&count_path);
+	closedir(dir);
+}
+
 static int cmd_diagnose(int argc, const char **argv)
 {
 	struct option options[] = {
@@ -687,6 +741,11 @@ static int cmd_diagnose(int argc, const char **argv)
 	foreach_alt_odb(dir_file_stats, &buf);
 	strvec_push(&archiver_args, buf.buf);
 
+	strbuf_reset(&buf);
+	strbuf_addstr(&buf, "--add-file-with-content=objects-local.txt:");
+	loose_objs_stats(&buf, ".git/objects");
+	strvec_push(&archiver_args, buf.buf);
+
 	if ((res = add_directory_to_archiver(&archiver_args, ".git", 0)) ||
 	    (res = add_directory_to_archiver(&archiver_args, ".git/hooks", 0)) ||
 	    (res = add_directory_to_archiver(&archiver_args, ".git/info", 0)) ||
diff --git a/contrib/scalar/t/t9099-scalar.sh b/contrib/scalar/t/t9099-scalar.sh
index 3dd5650cceb..72023a1ca1d 100755
--- a/contrib/scalar/t/t9099-scalar.sh
+++ b/contrib/scalar/t/t9099-scalar.sh
@@ -95,6 +95,7 @@ test_expect_success UNZIP 'scalar diagnose' '
 	scalar clone "file://$(pwd)" cloned --single-branch &&
 	git repack &&
 	echo "$(pwd)/.git/objects/" >>cloned/src/.git/objects/info/alternates &&
+	test_commit -C cloned/src loose &&
 	scalar diagnose cloned >out 2>err &&
 	grep "Available space" out &&
 	sed -n "s/.*$SQ\\(.*\\.zip\\)$SQ.*/\\1/p" <err >zip_path &&
@@ -106,7 +107,9 @@ test_expect_success UNZIP 'scalar diagnose' '
 	unzip -p "$zip_path" diagnostics.log >out &&
 	test_file_not_empty out &&
 	unzip -p "$zip_path" packs-local.txt >out &&
-	grep "$(pwd)/.git/objects" out
+	grep "$(pwd)/.git/objects" out &&
+	unzip -p "$zip_path" objects-local.txt >out &&
+	grep "^Total: [1-9]" out
 '
 
 test_done
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 1/7] archive: optionally add "virtual" files
  2022-05-10 19:26       ` [PATCH v4 1/7] archive: optionally add "virtual" files Johannes Schindelin via GitGitGadget
@ 2022-05-10 21:48         ` Junio C Hamano
  2022-05-10 22:06           ` rsbecker
  2022-05-12 22:31           ` [PATCH] fixup! " Junio C Hamano
  0 siblings, 2 replies; 109+ messages in thread
From: Junio C Hamano @ 2022-05-10 21:48 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget
  Cc: git, René Scharfe, Taylor Blau, Derrick Stolee,
	Elijah Newren, Johannes Schindelin

"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> @@ -514,14 +522,38 @@ static int add_file_cb(const struct option *opt, const char *arg, int unset)
>  	if (!arg)
>  		return -1;
>  
> -	path = prefix_filename(args->prefix, arg);
> -	item = string_list_append_nodup(&args->extra_files, path);
> -	item->util = info = xmalloc(sizeof(*info));
> +	info = xmalloc(sizeof(*info));
>  	info->base = xstrdup_or_null(base);
> -	if (stat(path, &info->stat))
> -		die(_("File not found: %s"), path);
> -	if (!S_ISREG(info->stat.st_mode))
> -		die(_("Not a regular file: %s"), path);
> +
> +	if (!strcmp(opt->long_name, "add-file")) {
> +		path = prefix_filename(args->prefix, arg);
> +		if (stat(path, &info->stat))
> +			die(_("File not found: %s"), path);
> +		if (!S_ISREG(info->stat.st_mode))
> +			die(_("Not a regular file: %s"), path);
> +		info->content = NULL; /* read the file later */
> +	} else {

This pretends that this new one will stay to be the only other
option that uses the same callback in the future.  To be more
defensive, it should do

	} else if (!strcmp(opt->long_name, "...")) {

and end the if/else if/else cascade with

	} else {
		BUG("add_file_cb called for unknown option");
	}

> +		const char *colon = strchr(arg, ':');
> +		char *p;
> +
> +		if (!colon)
> +			die(_("missing colon: '%s'"), arg);
> +
> +		p = xstrndup(arg, colon - arg);
> +		if (!args->prefix)
> +			path = p;
> +		else {
> +			path = prefix_filename(args->prefix, p);
> +			free(p);
> +		}
> +		memset(&info->stat, 0, sizeof(info->stat));
> +		info->stat.st_mode = S_IFREG | 0644;

I can sympathize with the desire to omit the mode bits because it
may not be useful for the immediate purpose of "scalar diagnose"
where the extracting end won't care what the file's permission bits
are, but by letting this "mode is hardcoded" thing squat here would
later make it more work when other people want to add an option that
truely lets the caller add a "vitual" file, in response to end-user
complaints that they cannot use the existing one to add an
exectuable file, for example.  I do not care too much about the
pathname limitation that does not allow a colon in it, simply
because it is unusual enough, but I am not sure about hardcoded
permission bits.

If we did "--add-virtual-file=<path>:0644:<contents>" instead from
day one, it certainly adds a few more lines of logic to this patch,
and the calling "scalar diagnose" may have to pass a few more bytes,
but I suspect that such a change would help the project in the
longer run.

Thanks.

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 2/7] archive --add-file-with-contents: allow paths containing colons
  2022-05-10 19:26       ` [PATCH v4 2/7] archive --add-file-with-contents: allow paths containing colons Johannes Schindelin via GitGitGadget
@ 2022-05-10 21:56         ` Junio C Hamano
  2022-05-10 22:23           ` rsbecker
  2022-05-19 18:09           ` Johannes Schindelin
  0 siblings, 2 replies; 109+ messages in thread
From: Junio C Hamano @ 2022-05-10 21:56 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget
  Cc: git, René Scharfe, Taylor Blau, Derrick Stolee,
	Elijah Newren, Johannes Schindelin

"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>
> By allowing the path to be enclosed in double-quotes, we can avoid
> the limitation that paths cannot contain colons.
> ...
> +		struct strbuf buf = STRBUF_INIT;
> +		const char *p = arg;
> +
> +		if (*p != '"')
> +			p = strchr(p, ':');
> +		else if (unquote_c_style(&buf, p, &p) < 0)
> +			die(_("unclosed quote: '%s'"), arg);

Even though I do not think people necessarily would want to use
colons in their pathnames (it has problems interoperating with other
systems), lifting the limitation is a good thing to do.  I totally
forgot that we designed unquote_c_style() to self terminate and
return the end pointer to the caller so the caller does not have to
worry, which is very nice. 

Even if this step weren't here in the series, I would have thought
the mode bits issue was more serious than "no colons in path"
limitation, but given that we address this unusual corner case
limitation, I would think we should address the hardcoded mode bits
at the same time.

> diff --git a/t/t5003-archive-zip.sh b/t/t5003-archive-zip.sh
> index 8ff1257f1a0..5b8bbfc2692 100755
> --- a/t/t5003-archive-zip.sh
> +++ b/t/t5003-archive-zip.sh
> @@ -207,13 +207,21 @@ check_zip with_untracked
>  check_added with_untracked untracked untracked
>  
>  test_expect_success UNZIP 'git archive --format=zip --add-file-with-content' '
> +	if test_have_prereq FUNNYNAMES
> +	then
> +		QUOTED=quoted:colon
> +	else
> +		QUOTED=quoted
> +	fi &&

;-)

>  	git archive --format=zip >with_file_with_content.zip \
> +		--add-file-with-content=\"$QUOTED\": \
>  		--add-file-with-content=hello:world $EMPTY_TREE &&
>  	test_when_finished "rm -rf tmp-unpack" &&
>  	mkdir tmp-unpack && (
>  		cd tmp-unpack &&
>  		"$GIT_UNZIP" ../with_file_with_content.zip &&
>  		test_path_is_file hello &&
> +		test_path_is_file $QUOTED &&

Looks OK, even though it probably is a good idea to have dq around
$QUOTED, so that future developers can easily insert SP into its
value to use a bit more common but still a bit more problematic
pathnames in the test.

Thanks.

^ permalink raw reply	[flat|nested] 109+ messages in thread

* RE: [PATCH v4 1/7] archive: optionally add "virtual" files
  2022-05-10 21:48         ` Junio C Hamano
@ 2022-05-10 22:06           ` rsbecker
  2022-05-10 23:21             ` Junio C Hamano
  2022-05-12 22:31           ` [PATCH] fixup! " Junio C Hamano
  1 sibling, 1 reply; 109+ messages in thread
From: rsbecker @ 2022-05-10 22:06 UTC (permalink / raw)
  To: 'Junio C Hamano', 'Johannes Schindelin via GitGitGadget'
  Cc: git, 'René Scharfe', 'Taylor Blau',
	'Derrick Stolee', 'Elijah Newren',
	'Johannes Schindelin'

On May 10, 2022 5:48 PM, Junio C Hamano wrote:
>"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
>writes:
>
>> @@ -514,14 +522,38 @@ static int add_file_cb(const struct option *opt, const
>char *arg, int unset)
>>  	if (!arg)
>>  		return -1;
>>
>> -	path = prefix_filename(args->prefix, arg);
>> -	item = string_list_append_nodup(&args->extra_files, path);
>> -	item->util = info = xmalloc(sizeof(*info));
>> +	info = xmalloc(sizeof(*info));
>>  	info->base = xstrdup_or_null(base);
>> -	if (stat(path, &info->stat))
>> -		die(_("File not found: %s"), path);
>> -	if (!S_ISREG(info->stat.st_mode))
>> -		die(_("Not a regular file: %s"), path);
>> +
>> +	if (!strcmp(opt->long_name, "add-file")) {
>> +		path = prefix_filename(args->prefix, arg);
>> +		if (stat(path, &info->stat))
>> +			die(_("File not found: %s"), path);
>> +		if (!S_ISREG(info->stat.st_mode))
>> +			die(_("Not a regular file: %s"), path);
>> +		info->content = NULL; /* read the file later */
>> +	} else {
>
>This pretends that this new one will stay to be the only other option that uses the
>same callback in the future.  To be more defensive, it should do
>
>	} else if (!strcmp(opt->long_name, "...")) {
>
>and end the if/else if/else cascade with
>
>	} else {
>		BUG("add_file_cb called for unknown option");
>	}
>
>> +		const char *colon = strchr(arg, ':');
>> +		char *p;
>> +
>> +		if (!colon)
>> +			die(_("missing colon: '%s'"), arg);
>> +
>> +		p = xstrndup(arg, colon - arg);
>> +		if (!args->prefix)
>> +			path = p;
>> +		else {
>> +			path = prefix_filename(args->prefix, p);
>> +			free(p);
>> +		}
>> +		memset(&info->stat, 0, sizeof(info->stat));
>> +		info->stat.st_mode = S_IFREG | 0644;
>
>I can sympathize with the desire to omit the mode bits because it may not be
>useful for the immediate purpose of "scalar diagnose"
>where the extracting end won't care what the file's permission bits are, but by
>letting this "mode is hardcoded" thing squat here would later make it more work
>when other people want to add an option that truely lets the caller add a "vitual"
>file, in response to end-user complaints that they cannot use the existing one to
>add an exectuable file, for example.  I do not care too much about the pathname
>limitation that does not allow a colon in it, simply because it is unusual enough, but
>I am not sure about hardcoded permission bits.
>
>If we did "--add-virtual-file=<path>:0644:<contents>" instead from day one, it
>certainly adds a few more lines of logic to this patch, and the calling "scalar
>diagnose" may have to pass a few more bytes, but I suspect that such a change
>would help the project in the longer run.

Would not core.filemode=false somewhat simulate this? The consumer-client would not care/do anything with the mode anyway. Or am I missing something?
--Randall


^ permalink raw reply	[flat|nested] 109+ messages in thread

* RE: [PATCH v4 2/7] archive --add-file-with-contents: allow paths containing colons
  2022-05-10 21:56         ` Junio C Hamano
@ 2022-05-10 22:23           ` rsbecker
  2022-05-19 18:12             ` Johannes Schindelin
  2022-05-19 18:09           ` Johannes Schindelin
  1 sibling, 1 reply; 109+ messages in thread
From: rsbecker @ 2022-05-10 22:23 UTC (permalink / raw)
  To: 'Junio C Hamano', 'Johannes Schindelin via GitGitGadget'
  Cc: git, 'René Scharfe', 'Taylor Blau',
	'Derrick Stolee', 'Elijah Newren',
	'Johannes Schindelin'

On May 10, 2022 5:57 PM, Junio C Hamano wrote:
>"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
>writes:
>
>> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>>
>> By allowing the path to be enclosed in double-quotes, we can avoid the
>> limitation that paths cannot contain colons.
>> ...
>> +		struct strbuf buf = STRBUF_INIT;
>> +		const char *p = arg;
>> +
>> +		if (*p != '"')
>> +			p = strchr(p, ':');
>> +		else if (unquote_c_style(&buf, p, &p) < 0)
>> +			die(_("unclosed quote: '%s'"), arg);
>
>Even though I do not think people necessarily would want to use colons in their
>pathnames (it has problems interoperating with other systems), lifting the
>limitation is a good thing to do.  I totally forgot that we designed
>unquote_c_style() to self terminate and return the end pointer to the caller so the
>caller does not have to worry, which is very nice.
>
>Even if this step weren't here in the series, I would have thought the mode bits
>issue was more serious than "no colons in path"
>limitation, but given that we address this unusual corner case limitation, I would
>think we should address the hardcoded mode bits at the same time.
>
>> diff --git a/t/t5003-archive-zip.sh b/t/t5003-archive-zip.sh index
>> 8ff1257f1a0..5b8bbfc2692 100755
>> --- a/t/t5003-archive-zip.sh
>> +++ b/t/t5003-archive-zip.sh
>> @@ -207,13 +207,21 @@ check_zip with_untracked  check_added
>> with_untracked untracked untracked
>>
>>  test_expect_success UNZIP 'git archive --format=zip --add-file-with-content' '
>> +	if test_have_prereq FUNNYNAMES
>> +	then
>> +		QUOTED=quoted:colon
>> +	else
>> +		QUOTED=quoted
>> +	fi &&
>
>;-)
>
>>  	git archive --format=zip >with_file_with_content.zip \
>> +		--add-file-with-content=\"$QUOTED\": \
>>  		--add-file-with-content=hello:world $EMPTY_TREE &&
>>  	test_when_finished "rm -rf tmp-unpack" &&
>>  	mkdir tmp-unpack && (
>>  		cd tmp-unpack &&
>>  		"$GIT_UNZIP" ../with_file_with_content.zip &&
>>  		test_path_is_file hello &&
>> +		test_path_is_file $QUOTED &&
>
>Looks OK, even though it probably is a good idea to have dq around $QUOTED, so
>that future developers can easily insert SP into its value to use a bit more common
>but still a bit more problematic pathnames in the test.

A test case for .gitignore in this would be good too. People on our exotic platform do this stuff as a matter of course. As an example, a name of $Z3P4:12399334 being used as a named pipe (associated with the unique name of a process) actually has been seen in the wild recently. My solution was to wild card this and/or contain it in an ignored directory.
Regards,
Randall


^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 1/7] archive: optionally add "virtual" files
  2022-05-10 22:06           ` rsbecker
@ 2022-05-10 23:21             ` Junio C Hamano
  2022-05-11 16:14               ` René Scharfe
  0 siblings, 1 reply; 109+ messages in thread
From: Junio C Hamano @ 2022-05-10 23:21 UTC (permalink / raw)
  To: rsbecker
  Cc: 'Johannes Schindelin via GitGitGadget',
	git, 'René Scharfe', 'Taylor Blau',
	'Derrick Stolee', 'Elijah Newren',
	'Johannes Schindelin'

<rsbecker@nexbridge.com> writes:

>>If we did "--add-virtual-file=<path>:0644:<contents>" instead from day one, it
>>certainly adds a few more lines of logic to this patch, and the calling "scalar
>>diagnose" may have to pass a few more bytes, but I suspect that such a change
>>would help the project in the longer run.
>
> Would not core.filemode=false somewhat simulate this? The
> consumer-client would not care/do anything with the mode
> anyway. Or am I missing something?

Or I must be missing something.  This is part of "git archive" where
its output is a tarball (or a zipfile) in which each entry knows its
permission bits (or at least, if it is executable).  Running "tar xf" 
or "unzip" on the receiving end of the output of this command should
set the executable bit (and other permission bits) correctly I would
certainly hope, so it does matter, no?

I did say "scalar diagnose" may not care.  But a patch to "git
archive" will affect other people, and among them there would be
people who say "gee, now I can add a handful of files from the
command line with their contents, without actually having them in
throw-away untracked files, when running 'git archive'.  That's
handy!", try it out and get disappointed by their inability to
create executable files that way.  And obviously I care more about
"git archive" than "scalar diagnose".  I very welcome to enhance the
former to support the need for the latter.  I do not see a good
reason to stop at a half-feature added to the former, even that
added half is enough to satisfy the latter, when the other half is
not all that hard to add, and it is reasonably expected that users
other than "scalar diagnose" would naturally want the other half,
too.



^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 1/7] archive: optionally add "virtual" files
  2022-05-10 23:21             ` Junio C Hamano
@ 2022-05-11 16:14               ` René Scharfe
  2022-05-11 19:27                 ` Junio C Hamano
  0 siblings, 1 reply; 109+ messages in thread
From: René Scharfe @ 2022-05-11 16:14 UTC (permalink / raw)
  To: Junio C Hamano, rsbecker
  Cc: 'Johannes Schindelin via GitGitGadget',
	git, 'Taylor Blau', 'Derrick Stolee',
	'Elijah Newren', 'Johannes Schindelin'

Am 11.05.22 um 01:21 schrieb Junio C Hamano:
> <rsbecker@nexbridge.com> writes:
>
>>> If we did "--add-virtual-file=<path>:0644:<contents>" instead from day one, it
>>> certainly adds a few more lines of logic to this patch, and the calling "scalar
>>> diagnose" may have to pass a few more bytes, but I suspect that such a change
>>> would help the project in the longer run.

> I did say "scalar diagnose" may not care.  But a patch to "git
> archive" will affect other people, and among them there would be
> people who say "gee, now I can add a handful of files from the
> command line with their contents, without actually having them in
> throw-away untracked files, when running 'git archive'.  That's
> handy!", try it out and get disappointed by their inability to
> create executable files that way.

Which might motivate them to contribute a patch to add that feature.
Give them a chance! :)

> And obviously I care more about
> "git archive" than "scalar diagnose".  I very welcome to enhance the
> former to support the need for the latter.  I do not see a good
> reason to stop at a half-feature added to the former, even that
> added half is enough to satisfy the latter, when the other half is
> not all that hard to add, and it is reasonably expected that users
> other than "scalar diagnose" would naturally want the other half,
> too.

FWIW, I'd already be satisfied by a convincing outline of a way towards
a complete solution to accept the partial feature, just to be sure we
don't paint ourselves into a corner.  But I'm bad at both strategy and
saying no, so that's that.

Regarding file modes: We only effectively support the executable bit,
so an additional option --add-virtual-executable-file=<path>:<contents>
would suffice.  It would also prevent the false impression that
arbitrary file modes can be used ("I said 0123 and got 0644, bug!").
And it would not even be the longest Git option..

René

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 1/7] archive: optionally add "virtual" files
  2022-05-11 16:14               ` René Scharfe
@ 2022-05-11 19:27                 ` Junio C Hamano
  2022-05-12 16:16                   ` René Scharfe
  0 siblings, 1 reply; 109+ messages in thread
From: Junio C Hamano @ 2022-05-11 19:27 UTC (permalink / raw)
  To: René Scharfe
  Cc: rsbecker, 'Johannes Schindelin via GitGitGadget',
	git, 'Taylor Blau', 'Derrick Stolee',
	'Elijah Newren', 'Johannes Schindelin'

René Scharfe <l.s.r@web.de> writes:

> Am 11.05.22 um 01:21 schrieb Junio C Hamano:
>> <rsbecker@nexbridge.com> writes:
>>
>>>> If we did "--add-virtual-file=<path>:0644:<contents>" instead from day one, it
>>>> certainly adds a few more lines of logic to this patch, and the calling "scalar
>>>> diagnose" may have to pass a few more bytes, but I suspect that such a change
>>>> would help the project in the longer run.
>
>> I did say "scalar diagnose" may not care.  But a patch to "git
>> archive" will affect other people, and among them there would be
>> people who say "gee, now I can add a handful of files from the
>> command line with their contents, without actually having them in
>> throw-away untracked files, when running 'git archive'.  That's
>> handy!", try it out and get disappointed by their inability to
>> create executable files that way.
>
> Which might motivate them to contribute a patch to add that feature.
> Give them a chance! :)

Yes, but there is no way to reuse the same option in a backward
compatible way to later add the mode information, and that is why we
want to be careful before a half-feature squats on an option.

> FWIW, I'd already be satisfied by a convincing outline of a way towards
> a complete solution to accept the partial feature, just to be sure we
> don't paint ourselves into a corner.

Exactly.  As you say, an extra and separate option can be used.  I
do not know if that is a workaround because we didn't design the
first option to take an additional option, or a welcome feature.

> Regarding file modes: We only effectively support the executable bit,
> so an additional option --add-virtual-executable-file=<path>:<contents>
> would suffice.

While I do not think we want to support more than one "is it
executable or not?" bit, I am not so sure about what the current
code does, though, for these "not from a tree, but added as extra
files" entries.

If you add an extra file from an on-disk untracked file, the
add_file_cb() callback picks up the full st.st_mode for the file,
and write_archive_entries() in its loop over args->extra_files pass
the full info->stat.st_mode down to write_entry(), which is used by
archive-tar.c::write_tar_entry() to obtain mode bits pretty much
as-is.  For tracked paths, we probably are normalizing the blobs
between 0644 and 0755 way before the values are passed as "mode"
parameter to the write_entry() functions, but for these extra files,
there is no such massaging.

So, I am OK with --add-virtual-executable=<path>:<contents> (but the
point still stands that the way the code in the patch squats in the
codepath makes it necessary to first refator it before it can
happen) as a separate option.  We may want to massage the mode bit
we grab from these extra files, if we were to go that route, though.

Thanks.


^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 1/7] archive: optionally add "virtual" files
  2022-05-11 19:27                 ` Junio C Hamano
@ 2022-05-12 16:16                   ` René Scharfe
  2022-05-12 18:15                     ` Junio C Hamano
  0 siblings, 1 reply; 109+ messages in thread
From: René Scharfe @ 2022-05-12 16:16 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: rsbecker, 'Johannes Schindelin via GitGitGadget',
	git, 'Taylor Blau', 'Derrick Stolee',
	'Elijah Newren', 'Johannes Schindelin'

Am 11.05.22 um 21:27 schrieb Junio C Hamano:
> René Scharfe <l.s.r@web.de> writes:
>
>> Regarding file modes: We only effectively support the executable bit,
>> so an additional option --add-virtual-executable-file=<path>:<contents>
>> would suffice.
>
> While I do not think we want to support more than one "is it
> executable or not?" bit, I am not so sure about what the current
> code does, though, for these "not from a tree, but added as extra
> files" entries.
>
> If you add an extra file from an on-disk untracked file, the
> add_file_cb() callback picks up the full st.st_mode for the file,
> and write_archive_entries() in its loop over args->extra_files pass
> the full info->stat.st_mode down to write_entry(), which is used by
> archive-tar.c::write_tar_entry() to obtain mode bits pretty much
> as-is.

Good point.  write_tar_entry() actually normalizes the permission bits
and applies tar.umask (0002 by default):

	if (S_ISDIR(mode) || S_ISGITLINK(mode)) {
		*header.typeflag = TYPEFLAG_DIR;
		mode = (mode | 0777) & ~tar_umask;
	} else if (S_ISLNK(mode)) {
		*header.typeflag = TYPEFLAG_LNK;
		mode |= 0777;
	} else if (S_ISREG(mode)) {
		*header.typeflag = TYPEFLAG_REG;
		mode = (mode | ((mode & 0100) ? 0777 : 0666)) & ~tar_umask;

But write_zip_entry() only normalizes (drops) the permission bits of
non-executable files:

                attr2 = S_ISLNK(mode) ? ((mode | 0777) << 16) :
                        (mode & 0111) ? ((mode) << 16) : 0;
                if (S_ISLNK(mode) || (mode & 0111))
                        creator_version = 0x0317;

attr2 corresponds to the field "external file attributes" mentioned in
the ZIP format specification, APPNOTE.TXT.  It's interpreted based on
the "version made by" (creator_version here); that 0x03 part above
means "UNIX".  The default is MS-DOS (FAT filesystem), with effectivly
no support for file permissions.

So we currently leak permission bits of executable files into ZIP
archives, but not tar files. :-|  Normalizing those to 0755 would be
more consistent.

> For tracked paths, we probably are normalizing the blobs
> between 0644 and 0755 way before the values are passed as "mode"
> parameter to the write_entry() functions, but for these extra files,
> there is no such massaging.

Right, mode values from read_tree() pass through canon_mode(), so only
untracked files (those appended with --add-file) are affected by the
leakage mentioned above.

René

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 1/7] archive: optionally add "virtual" files
  2022-05-12 16:16                   ` René Scharfe
@ 2022-05-12 18:15                     ` Junio C Hamano
  2022-05-12 21:31                       ` Junio C Hamano
  0 siblings, 1 reply; 109+ messages in thread
From: Junio C Hamano @ 2022-05-12 18:15 UTC (permalink / raw)
  To: René Scharfe
  Cc: rsbecker, 'Johannes Schindelin via GitGitGadget',
	git, 'Taylor Blau', 'Derrick Stolee',
	'Elijah Newren', 'Johannes Schindelin'

René Scharfe <l.s.r@web.de> writes:

> Good point.  write_tar_entry() actually normalizes the permission bits
> and applies tar.umask (0002 by default):
>
> 	if (S_ISDIR(mode) || S_ISGITLINK(mode)) {
> 		*header.typeflag = TYPEFLAG_DIR;
> 		mode = (mode | 0777) & ~tar_umask;
> 	} else if (S_ISLNK(mode)) {
> 		*header.typeflag = TYPEFLAG_LNK;
> 		mode |= 0777;
> 	} else if (S_ISREG(mode)) {
> 		*header.typeflag = TYPEFLAG_REG;
> 		mode = (mode | ((mode & 0100) ? 0777 : 0666)) & ~tar_umask;

Yeah, this side seems to care only about u+x bit, so
"add-executable" as a separate option would fly we..

> But write_zip_entry() only normalizes (drops) the permission bits of
> non-executable files:
>
>                 attr2 = S_ISLNK(mode) ? ((mode | 0777) << 16) :
>                         (mode & 0111) ? ((mode) << 16) : 0;
>                 if (S_ISLNK(mode) || (mode & 0111))
>                         creator_version = 0x0317;
>
> attr2 corresponds to the field "external file attributes" mentioned in
> the ZIP format specification, APPNOTE.TXT.  It's interpreted based on
> the "version made by" (creator_version here); that 0x03 part above
> means "UNIX".  The default is MS-DOS (FAT filesystem), with effectivly
> no support for file permissions.
>
> So we currently leak permission bits of executable files into ZIP
> archives, but not tar files. :-|  Normalizing those to 0755 would be
> more consistent.

Yup.

>> For tracked paths, we probably are normalizing the blobs
>> between 0644 and 0755 way before the values are passed as "mode"
>> parameter to the write_entry() functions, but for these extra files,
>> there is no such massaging.
>
> Right, mode values from read_tree() pass through canon_mode(), so only
> untracked files (those appended with --add-file) are affected by the
> leakage mentioned above.

Thanks for sanity-checking.


^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 1/7] archive: optionally add "virtual" files
  2022-05-12 18:15                     ` Junio C Hamano
@ 2022-05-12 21:31                       ` Junio C Hamano
  2022-05-14  7:06                         ` René Scharfe
  0 siblings, 1 reply; 109+ messages in thread
From: Junio C Hamano @ 2022-05-12 21:31 UTC (permalink / raw)
  To: René Scharfe
  Cc: rsbecker, 'Johannes Schindelin via GitGitGadget',
	git, 'Taylor Blau', 'Derrick Stolee',
	'Elijah Newren', 'Johannes Schindelin'

Junio C Hamano <gitster@pobox.com> writes:

>> So we currently leak permission bits of executable files into ZIP
>> archives, but not tar files. :-|  Normalizing those to 0755 would be
>> more consistent.

Today, I was scanning the "What's cooking" draft and saw too many
topics that are marked with "Expecting a reroll".  It turns out that
this "mode bits" thing will not be a blocker to make us wait for a
reroll of the topic, so let's handle it separately, before we
forget, as an independent fix outside the series under discussion.

Thanks.

--- >8 ---
Subject: [PATCH] archive: do not let on-disk mode leak to zip archives

When the "--add-file" option is used to add the contents from an
untracked file to the archive, the permission mode bits for these
files are sent to the archive-backend specific "write_entry()"
method as-is.  We normalize the mode bits for tracked files way
before we pass them to the write_entry() method; we should do the
same here.

This is not strictly needed for "tar" archive-backend, as it has its
own code to further clean them up, but "zip" archive-backend is not
so well prepared.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 archive.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/archive.c b/archive.c
index e29d0e00f6..12a08af531 100644
--- a/archive.c
+++ b/archive.c
@@ -342,7 +342,7 @@ int write_archive_entries(struct archiver_args *args,
 		else
 			err = write_entry(args, &fake_oid, path_in_archive.buf,
 					  path_in_archive.len,
-					  info->stat.st_mode,
+					  canon_mode(info->stat.st_mode),
 					  content.buf, content.len);
 		if (err)
 			break;
-- 
2.36.1-338-g1c7f76a54c


^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH] fixup! archive: optionally add "virtual" files
  2022-05-10 21:48         ` Junio C Hamano
  2022-05-10 22:06           ` rsbecker
@ 2022-05-12 22:31           ` Junio C Hamano
  1 sibling, 0 replies; 109+ messages in thread
From: Junio C Hamano @ 2022-05-12 22:31 UTC (permalink / raw)
  To: git
  Cc: Johannes Schindelin via GitGitGadget, René Scharfe,
	Taylor Blau, Derrick Stolee, Elijah Newren, Johannes Schindelin

Do not let add_file_cb() assume that two existing callers are the
only ones, and checking that the caller is not one of them is
sufficient to determine it is the other one.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---

 * To be squashed to the commit with the title in the series.

   The "What's cooking" report is getting crowded with too many
   topics marked as "Expecting a reroll", and I'm trying to do
   easier ones myself to see how much reduction we can make.

 archive.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/archive.c b/archive.c
index 477eba60ac..98c7449ea1 100644
--- a/archive.c
+++ b/archive.c
@@ -533,7 +533,7 @@ static int add_file_cb(const struct option *opt, const char *arg, int unset)
 		if (!S_ISREG(info->stat.st_mode))
 			die(_("Not a regular file: %s"), path);
 		info->content = NULL; /* read the file later */
-	} else {
+	} else if (!strcmp(opt->long_name, "add-file-with-content")) {
 		struct strbuf buf = STRBUF_INIT;
 		const char *p = arg;
 
@@ -560,6 +560,8 @@ static int add_file_cb(const struct option *opt, const char *arg, int unset)
 		info->stat.st_mode = S_IFREG | 0644;
 		info->content = xstrdup(p + 1);
 		info->stat.st_size = strlen(info->content);
+	} else {
+		BUG("add_file_cb() called for %s", opt->long_name);
 	}
 	item = string_list_append_nodup(&args->extra_files, path);
 	item->util = info;
-- 
2.36.1-338-g1c7f76a54c


^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 1/7] archive: optionally add "virtual" files
  2022-05-12 21:31                       ` Junio C Hamano
@ 2022-05-14  7:06                         ` René Scharfe
  0 siblings, 0 replies; 109+ messages in thread
From: René Scharfe @ 2022-05-14  7:06 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: rsbecker, 'Johannes Schindelin via GitGitGadget',
	git, 'Taylor Blau', 'Derrick Stolee',
	'Elijah Newren', 'Johannes Schindelin'

Am 12.05.22 um 23:31 schrieb Junio C Hamano:
> Junio C Hamano <gitster@pobox.com> writes:
>
>>> So we currently leak permission bits of executable files into ZIP
>>> archives, but not tar files. :-|  Normalizing those to 0755 would be
>>> more consistent.
>
> Today, I was scanning the "What's cooking" draft and saw too many
> topics that are marked with "Expecting a reroll".  It turns out that
> this "mode bits" thing will not be a blocker to make us wait for a
> reroll of the topic, so let's handle it separately, before we
> forget, as an independent fix outside the series under discussion.
>
> Thanks.
>
> --- >8 ---
> Subject: [PATCH] archive: do not let on-disk mode leak to zip archives
>
> When the "--add-file" option is used to add the contents from an
> untracked file to the archive, the permission mode bits for these
> files are sent to the archive-backend specific "write_entry()"
> method as-is.  We normalize the mode bits for tracked files way
> before we pass them to the write_entry() method; we should do the
> same here.
>
> This is not strictly needed for "tar" archive-backend, as it has its
> own code to further clean them up, but "zip" archive-backend is not
> so well prepared.
>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
>  archive.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/archive.c b/archive.c
> index e29d0e00f6..12a08af531 100644
> --- a/archive.c
> +++ b/archive.c
> @@ -342,7 +342,7 @@ int write_archive_entries(struct archiver_args *args,
>  		else
>  			err = write_entry(args, &fake_oid, path_in_archive.buf,
>  					  path_in_archive.len,
> -					  info->stat.st_mode,
> +					  canon_mode(info->stat.st_mode),
>  					  content.buf, content.len);
>  		if (err)
>  			break;

Looks good to me, thank you!

René

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 3/7] scalar: validate the optional enlistment argument
  2022-05-10 19:27       ` [PATCH v4 3/7] scalar: validate the optional enlistment argument Johannes Schindelin via GitGitGadget
@ 2022-05-17 14:51         ` Ævar Arnfjörð Bjarmason
  2022-05-18 17:35           ` Junio C Hamano
  0 siblings, 1 reply; 109+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-05-17 14:51 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget
  Cc: git, René Scharfe, Taylor Blau, Derrick Stolee,
	Elijah Newren, Johannes Schindelin


On Tue, May 10 2022, Johannes Schindelin via GitGitGadget wrote:

> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>
> The `scalar` command needs a Scalar enlistment for many subcommands, and
> looks in the current directory for such an enlistment (traversing the
> parent directories until it finds one).
>
> These is subcommands can also be called with an optional argument
> specifying the enlistment. Here, too, we traverse parent directories as
> needed, until we find an enlistment.
>
> However, if the specified directory does not even exist, or is not a
> directory, we should stop right there, with an error message.
>
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
>  contrib/scalar/scalar.c          | 6 ++++--
>  contrib/scalar/t/t9099-scalar.sh | 5 +++++
>  2 files changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
> index 1ce9c2b00e8..00dcd4b50ef 100644
> --- a/contrib/scalar/scalar.c
> +++ b/contrib/scalar/scalar.c
> @@ -43,9 +43,11 @@ static void setup_enlistment_directory(int argc, const char **argv,
>  		usage_with_options(usagestr, options);
>  
>  	/* find the worktree, determine its corresponding root */
> -	if (argc == 1)
> +	if (argc == 1) {
>  		strbuf_add_absolute_path(&path, argv[0]);
> -	else if (strbuf_getcwd(&path) < 0)
> +		if (!is_directory(path.buf))
> +			die(_("'%s' does not exist"), path.buf);
> +	} else if (strbuf_getcwd(&path) < 0)
>  		die(_("need a working directory"));
>  
>  	strbuf_trim_trailing_dir_sep(&path);
> diff --git a/contrib/scalar/t/t9099-scalar.sh b/contrib/scalar/t/t9099-scalar.sh
> index 2e1502ad45e..9d83fdf25e8 100755
> --- a/contrib/scalar/t/t9099-scalar.sh
> +++ b/contrib/scalar/t/t9099-scalar.sh
> @@ -85,4 +85,9 @@ test_expect_success 'scalar delete with enlistment' '
>  	test_path_is_missing cloned
>  '
>  
> +test_expect_success '`scalar [...] <dir>` errors out when dir is missing' '
> +	! scalar run config cloned 2>err &&

Needs to use test_must_fail, not !

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 4/7] Implement `scalar diagnose`
  2022-05-10 19:27       ` [PATCH v4 4/7] Implement `scalar diagnose` Johannes Schindelin via GitGitGadget
@ 2022-05-17 14:53         ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 109+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-05-17 14:53 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget
  Cc: git, René Scharfe, Taylor Blau, Derrick Stolee,
	Elijah Newren, Johannes Schindelin


On Tue, May 10 2022, Johannes Schindelin via GitGitGadget wrote:

> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>
> Over the course of Scalar's development, it became obvious that there is
> a need for a command that can gather all kinds of useful information
> that can help identify the most typical problems with large
> worktrees/repositories.
>
> The `diagnose` command is the culmination of this hard-won knowledge: it
> gathers the installed hooks, the config, a couple statistics describing
> the data shape, among other pieces of information, and then wraps
> everything up in a tidy, neat `.zip` archive.
>
> Note: originally, Scalar was implemented in C# using the .NET API, where
> we had the luxury of a comprehensive standard library that includes
> basic functionality such as writing a `.zip` file. In the C version, we
> lack such a commodity. Rather than introducing a dependency on, say,
> libzip, we slightly abuse Git's `archive` machinery: we write out a
> `.zip` of the empty try, augmented by a couple files that are added via
> the `--add-file*` options. We are careful trying not to modify the
> current repository in any way lest the very circumstances that required
> `scalar diagnose` to be run are changed by the `diagnose` run itself.
>
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
>  contrib/scalar/scalar.c          | 144 +++++++++++++++++++++++++++++++
>  contrib/scalar/scalar.txt        |  12 +++
>  contrib/scalar/t/t9099-scalar.sh |  14 +++
>  3 files changed, 170 insertions(+)
>
> diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
> index 00dcd4b50ef..367a2c50e25 100644
> --- a/contrib/scalar/scalar.c
> +++ b/contrib/scalar/scalar.c
> @@ -11,6 +11,7 @@
>  #include "dir.h"
>  #include "packfile.h"
>  #include "help.h"
> +#include "archive.h"
>  
>  /*
>   * Remove the deepest subdirectory in the provided path string. Path must not
> @@ -261,6 +262,47 @@ static int unregister_dir(void)
>  	return res;
>  }
>  
> +static int add_directory_to_archiver(struct strvec *archiver_args,
> +					  const char *path, int recurse)
> +{
> +	int at_root = !*path;
> +	DIR *dir = opendir(at_root ? "." : path);
> +	struct dirent *e;
> +	struct strbuf buf = STRBUF_INIT;
> +	size_t len;
> +	int res = 0;
> +
> +	if (!dir)
> +		return error(_("could not open directory '%s'"), path);


s/error/error_errno/, surely?

> +	strbuf_addstr(&zip_path, "/.scalarDiagnostics/scalar_");
> +	strbuf_addftime(&zip_path,
> +			"%Y%m%d_%H%M%S", localtime_r(&now, &tm), 0, 0);

Would we be worse off if we stole this timestamp from some known file
(or HEAD), and thus made a second run of this reproducable?

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 0/7] scalar: implement the subcommand "diagnose"
  2022-05-10 19:26     ` [PATCH v4 " Johannes Schindelin via GitGitGadget
                         ` (6 preceding siblings ...)
  2022-05-10 19:27       ` [PATCH v4 7/7] scalar: teach `diagnose` to gather loose objects information Matthew John Cheetham via GitGitGadget
@ 2022-05-17 15:03       ` Ævar Arnfjörð Bjarmason
  2022-05-17 15:28         ` rsbecker
  2022-05-19 18:17       ` [PATCH v5 " Johannes Schindelin via GitGitGadget
  8 siblings, 1 reply; 109+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-05-17 15:03 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget
  Cc: git, René Scharfe, Taylor Blau, Derrick Stolee,
	Elijah Newren, Johannes Schindelin


On Tue, May 10 2022, Johannes Schindelin via GitGitGadget wrote:

> Over the course of the years, we developed a sub-command that gathers
> diagnostic data into a .zip file that can then be attached to bug reports.
> This sub-command turned out to be very useful in helping Scalar developers
> identify and fix issues.

I don't mind this as some intermediate step, but re the context of the
plan for scalar "eventually going away" (discussed in previous threads)
I wonder why (especially re the earlier thread upthread at [1]) this
isn't being added to "git bugreport".

Is the plan to integrate this into "git bugreport" eventually?

1. https://lore.kernel.org/git/nycvar.QRO.7.76.6.2202062213030.347@tvgsbejvaqbjf.bet/

^ permalink raw reply	[flat|nested] 109+ messages in thread

* RE: [PATCH v4 0/7] scalar: implement the subcommand "diagnose"
  2022-05-17 15:03       ` [PATCH v4 0/7] scalar: implement the subcommand "diagnose" Ævar Arnfjörð Bjarmason
@ 2022-05-17 15:28         ` rsbecker
  2022-05-19 18:17           ` Johannes Schindelin
  0 siblings, 1 reply; 109+ messages in thread
From: rsbecker @ 2022-05-17 15:28 UTC (permalink / raw)
  To: 'Ævar Arnfjörð Bjarmason',
	'Johannes Schindelin via GitGitGadget'
  Cc: git, 'René Scharfe', 'Taylor Blau',
	'Derrick Stolee', 'Elijah Newren',
	'Johannes Schindelin'

On May 17, 2022 11:03 AM, Ævar Arnfjörð Bjarmason wrote:
>On Tue, May 10 2022, Johannes Schindelin via GitGitGadget wrote:
>
>> Over the course of the years, we developed a sub-command that gathers
>> diagnostic data into a .zip file that can then be attached to bug reports.
>> This sub-command turned out to be very useful in helping Scalar
>> developers identify and fix issues.
>
>I don't mind this as some intermediate step, but re the context of the plan for
>scalar "eventually going away" (discussed in previous threads) I wonder why
>(especially re the earlier thread upthread at [1]) this isn't being added to "git
>bugreport".
>
>Is the plan to integrate this into "git bugreport" eventually?
>
>1.
>https://lore.kernel.org/git/nycvar.QRO.7.76.6.2202062213030.347@tvgsbejvaqbjf.
>bet/

Could this also not be useful in fsck, as --diagnose? That's the go-to command when there are issues for many users.
--Randall


^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 3/7] scalar: validate the optional enlistment argument
  2022-05-17 14:51         ` Ævar Arnfjörð Bjarmason
@ 2022-05-18 17:35           ` Junio C Hamano
  2022-05-20  7:30             ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 109+ messages in thread
From: Junio C Hamano @ 2022-05-18 17:35 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Johannes Schindelin via GitGitGadget, git, René Scharfe,
	Taylor Blau, Derrick Stolee, Elijah Newren, Johannes Schindelin

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

>> +test_expect_success '`scalar [...] <dir>` errors out when dir is missing' '
>> +	! scalar run config cloned 2>err &&
>
> Needs to use test_must_fail, not !

Good eyes and careful reading are very much appreciated, but in this
case, doesn't such an improvement depend on an update to teach
test_must_fail_acceptable about scalar being whitelisted?



^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 2/7] archive --add-file-with-contents: allow paths containing colons
  2022-05-10 21:56         ` Junio C Hamano
  2022-05-10 22:23           ` rsbecker
@ 2022-05-19 18:09           ` Johannes Schindelin
  2022-05-19 18:44             ` Junio C Hamano
  1 sibling, 1 reply; 109+ messages in thread
From: Johannes Schindelin @ 2022-05-19 18:09 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Johannes Schindelin via GitGitGadget, git, René Scharfe,
	Taylor Blau, Derrick Stolee, Elijah Newren

Hi Junio,

On Tue, 10 May 2022, Junio C Hamano wrote:

> "Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
> writes:
>
> > From: Johannes Schindelin <johannes.schindelin@gmx.de>
> >
> > By allowing the path to be enclosed in double-quotes, we can avoid
> > the limitation that paths cannot contain colons.
> > ...
> > +		struct strbuf buf = STRBUF_INIT;
> > +		const char *p = arg;
> > +
> > +		if (*p != '"')
> > +			p = strchr(p, ':');
> > +		else if (unquote_c_style(&buf, p, &p) < 0)
> > +			die(_("unclosed quote: '%s'"), arg);
>
> Even though I do not think people necessarily would want to use
> colons in their pathnames (it has problems interoperating with other
> systems), lifting the limitation is a good thing to do.  I totally
> forgot that we designed unquote_c_style() to self terminate and
> return the end pointer to the caller so the caller does not have to
> worry, which is very nice.
>
> Even if this step weren't here in the series, I would have thought
> the mode bits issue was more serious than "no colons in path"
> limitation, but given that we address this unusual corner case
> limitation, I would think we should address the hardcoded mode bits
> at the same time.
>
> > diff --git a/t/t5003-archive-zip.sh b/t/t5003-archive-zip.sh
> > index 8ff1257f1a0..5b8bbfc2692 100755
> > --- a/t/t5003-archive-zip.sh
> > +++ b/t/t5003-archive-zip.sh
> > @@ -207,13 +207,21 @@ check_zip with_untracked
> >  check_added with_untracked untracked untracked
> >
> >  test_expect_success UNZIP 'git archive --format=zip --add-file-with-content' '
> > +	if test_have_prereq FUNNYNAMES
> > +	then
> > +		QUOTED=quoted:colon
> > +	else
> > +		QUOTED=quoted
> > +	fi &&
>
> ;-)
>
> >  	git archive --format=zip >with_file_with_content.zip \
> > +		--add-file-with-content=\"$QUOTED\": \
> >  		--add-file-with-content=hello:world $EMPTY_TREE &&
> >  	test_when_finished "rm -rf tmp-unpack" &&
> >  	mkdir tmp-unpack && (
> >  		cd tmp-unpack &&
> >  		"$GIT_UNZIP" ../with_file_with_content.zip &&
> >  		test_path_is_file hello &&
> > +		test_path_is_file $QUOTED &&
>
> Looks OK, even though it probably is a good idea to have dq around
> $QUOTED, so that future developers can easily insert SP into its
> value to use a bit more common but still a bit more problematic
> pathnames in the test.

I actually decided against this because reading

	"$QUOTED"

would mislead future me to think that the double quotes that enclose
$QUOTED are the quotes that the variable's name talks about. But the
quotes are actually the escaped ones that are passed to `git archive`
above.

So, to help future Dscho should they read this code six months from now or
even later, I wanted to specifically only add quotes to the `git archive`
call to make the intention abundantly clear.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 109+ messages in thread

* RE: [PATCH v4 2/7] archive --add-file-with-contents: allow paths containing colons
  2022-05-10 22:23           ` rsbecker
@ 2022-05-19 18:12             ` Johannes Schindelin
  0 siblings, 0 replies; 109+ messages in thread
From: Johannes Schindelin @ 2022-05-19 18:12 UTC (permalink / raw)
  To: rsbecker
  Cc: 'Junio C Hamano',
	'Johannes Schindelin via GitGitGadget',
	git, 'René Scharfe', 'Taylor Blau',
	'Derrick Stolee', 'Elijah Newren'

Hi Randall,

On Tue, 10 May 2022, rsbecker@nexbridge.com wrote:

> On May 10, 2022 5:57 PM, Junio C Hamano wrote:
> >"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
> >writes:
> >
> >>  	git archive --format=zip >with_file_with_content.zip \
> >> +		--add-file-with-content=\"$QUOTED\": \
> >>  		--add-file-with-content=hello:world $EMPTY_TREE &&
> >>  	test_when_finished "rm -rf tmp-unpack" &&
> >>  	mkdir tmp-unpack && (
> >>  		cd tmp-unpack &&
> >>  		"$GIT_UNZIP" ../with_file_with_content.zip &&
> >>  		test_path_is_file hello &&
> >> +		test_path_is_file $QUOTED &&
> >
> >Looks OK, even though it probably is a good idea to have dq around $QUOTED, so
> >that future developers can easily insert SP into its value to use a bit more common
> >but still a bit more problematic pathnames in the test.
>
> A test case for .gitignore in this would be good too. People on our
> exotic platform do this stuff as a matter of course. As an example, a
> name of $Z3P4:12399334 being used as a named pipe (associated with the
> unique name of a process) actually has been seen in the wild recently.
> My solution was to wild card this and/or contain it in an ignored
> directory.

The `--add-file-with-content` option, which this test case is all about,
specifically does not heed `.gitignore`. Is this what you want to test? If
so, I don't think that's necessary. Unless you expect some future version
to introduce a patch by mistake that makes `--add-file-with-content`
subject to the `.gitignore` rules.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 109+ messages in thread

* RE: [PATCH v4 0/7] scalar: implement the subcommand "diagnose"
  2022-05-17 15:28         ` rsbecker
@ 2022-05-19 18:17           ` Johannes Schindelin
  0 siblings, 0 replies; 109+ messages in thread
From: Johannes Schindelin @ 2022-05-19 18:17 UTC (permalink / raw)
  To: rsbecker
  Cc: 'Ævar Arnfjörð Bjarmason',
	'Johannes Schindelin via GitGitGadget',
	git, 'René Scharfe', 'Taylor Blau',
	'Derrick Stolee', 'Elijah Newren'

[-- Attachment #1: Type: text/plain, Size: 1433 bytes --]

Hi Randall and Ævar,

On Tue, 17 May 2022, rsbecker@nexbridge.com wrote:

> On May 17, 2022 11:03 AM, Ævar Arnfjörð Bjarmason wrote:
> >On Tue, May 10 2022, Johannes Schindelin via GitGitGadget wrote:
> >
> >> Over the course of the years, we developed a sub-command that gathers
> >> diagnostic data into a .zip file that can then be attached to bug
> >> reports. This sub-command turned out to be very useful in helping
> >> Scalar developers identify and fix issues.
> >
> >I don't mind this as some intermediate step, but re the context of the
> >plan for scalar "eventually going away" (discussed in previous threads)
> >I wonder why (especially re the earlier thread upthread at [1]) this
> >isn't being added to "git bugreport".
> >
> >Is the plan to integrate this into "git bugreport" eventually?

Potentially a variation of the `scalar diagnose` code could be useful in
`git bugreport`, opt-in via a new option.

But that's not the purpose of this patch series.

> Could this also not be useful in fsck, as --diagnose? That's the go-to
> command when there are issues for many users.

I can see where you're coming from, but `fsck`'s mission is to verify the
integrity of the local Git database. That is very different from the
mission of `scalar diagnose`, which is to help diagnose issues (whether
they are truly bugs or usage patterns causing unfortunate performance).

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH v5 0/7] scalar: implement the subcommand "diagnose"
  2022-05-10 19:26     ` [PATCH v4 " Johannes Schindelin via GitGitGadget
                         ` (7 preceding siblings ...)
  2022-05-17 15:03       ` [PATCH v4 0/7] scalar: implement the subcommand "diagnose" Ævar Arnfjörð Bjarmason
@ 2022-05-19 18:17       ` Johannes Schindelin via GitGitGadget
  2022-05-19 18:17         ` [PATCH v5 1/7] archive: optionally add "virtual" files Johannes Schindelin via GitGitGadget
                           ` (8 more replies)
  8 siblings, 9 replies; 109+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2022-05-19 18:17 UTC (permalink / raw)
  To: git
  Cc: René Scharfe, Taylor Blau, Derrick Stolee, Elijah Newren,
	rsbecker, Ævar Arnfjörð Bjarmason,
	Johannes Schindelin

Over the course of the years, we developed a sub-command that gathers
diagnostic data into a .zip file that can then be attached to bug reports.
This sub-command turned out to be very useful in helping Scalar developers
identify and fix issues.

Changes since v4:

 * Squashed in Junio's suggested fixups
 * Renamed the option from --add-file-with-content=<name>:<content> to
   --add-virtual-file=<name>:<content>
 * Fixed one instance where I had used error() instead of error_errno().

Changes since v3:

 * We're now using unquote_c_style() instead of rolling our own unquoter.
 * Fixed the added regression test.
 * As pointed out by Scalar's Functional Tests, the
   add_directory_to_archiver() function should not fail when scalar diagnose
   encounters FSMonitor's Unix socket, but only warn instead.
 * Related: add_directory_to_archiver() needs to propagate errors from
   processing subdirectories so that the top-level call returns an error,
   too.

Changes since v2:

 * Clarified in the commit message what the biggest benefit of
   --add-file-with-content is.
 * The <path> part of the -add-file-with-content argument can now contain
   colons. To do this, the path needs to start and end in double-quote
   characters (which are stripped), and the backslash serves as escape
   character in that case (to allow the path to contain both colons and
   double-quotes).
 * Fixed incorrect grammar.
 * Instead of strcmp(<what-we-don't-want>), we now say
   !strcmp(<what-we-want>).
 * The help text for --add-file-with-content was improved a tiny bit.
 * Adjusted the commit message that still talked about spawning plenty of
   processes and about a throw-away repository for the sake of generating a
   .zip file.
 * Simplified the code that shows the diagnostics and adds them to the .zip
   file.
 * The final message that reports that the archive is complete is now
   printed to stderr instead of stdout.

Changes since v1:

 * Instead of creating a throw-away repository, staging the contents of the
   .zip file and then using git write-tree and git archive to write the .zip
   file, the patch series now introduces a new option to git archive and
   uses write_archive() directly (avoiding any separate process).
 * Since the command avoids separate processes, it is now blazing fast on
   Windows, and I dropped the spinner() function because it's no longer
   needed.
 * While reworking the test case, I noticed that scalar [...] <enlistment>
   failed to verify that the specified directory exists, and would happily
   "traverse to its parent directory" on its quest to find a Scalar
   enlistment. That is of course incorrect, and has been fixed as a "while
   at it" sort of preparatory commit.
 * I had forgotten to sign off on all the commits, which has been fixed.
 * Instead of some "home-grown" readdir()-based function, the code now uses
   for_each_file_in_pack_dir() to look through the pack directories.
 * If any alternates are configured, their pack directories are now included
   in the output.
 * The commit message that might be interpreted to promise information about
   large loose files has been corrected to no longer promise that.
 * The test cases have been adjusted to test a little bit more (e.g.
   verifying that specific paths are mentioned in the output, instead of
   merely verifying that the output is non-empty).

Johannes Schindelin (5):
  archive: optionally add "virtual" files
  archive --add-file-with-contents: allow paths containing colons
  scalar: validate the optional enlistment argument
  Implement `scalar diagnose`
  scalar diagnose: include disk space information

Matthew John Cheetham (2):
  scalar: teach `diagnose` to gather packfile info
  scalar: teach `diagnose` to gather loose objects information

 Documentation/git-archive.txt    |  17 ++
 archive.c                        |  63 ++++++-
 contrib/scalar/scalar.c          | 292 ++++++++++++++++++++++++++++++-
 contrib/scalar/scalar.txt        |  12 ++
 contrib/scalar/t/t9099-scalar.sh |  27 +++
 t/t5003-archive-zip.sh           |  20 +++
 6 files changed, 421 insertions(+), 10 deletions(-)


base-commit: ddc35d833dd6f9e8946b09cecd3311b8aa18d295
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1128%2Fdscho%2Fscalar-diagnose-v5
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1128/dscho/scalar-diagnose-v5
Pull-Request: https://github.com/gitgitgadget/git/pull/1128

Range-diff vs v4:

 1:  45662cf582a ! 1:  42e73fb0aac archive: optionally add "virtual" files
     @@ Documentation/git-archive.txt: OPTIONS
       	by concatenating the value for `--prefix` (if any) and the
       	basename of <file>.
       
     -+--add-file-with-content=<path>:<content>::
     ++--add-virtual-file=<path>:<content>::
      +	Add the specified contents to the archive.  Can be repeated to add
      +	multiple files.  The path of the file in the archive is built
      +	by concatenating the value for `--prefix` (if any) and the
     @@ archive.c: static int add_file_cb(const struct option *opt, const char *arg, int
      +		if (!S_ISREG(info->stat.st_mode))
      +			die(_("Not a regular file: %s"), path);
      +		info->content = NULL; /* read the file later */
     -+	} else {
     ++	} else if (!strcmp(opt->long_name, "add-virtual-file")) {
      +		const char *colon = strchr(arg, ':');
      +		char *p;
      +
     @@ archive.c: static int add_file_cb(const struct option *opt, const char *arg, int
      +		info->stat.st_mode = S_IFREG | 0644;
      +		info->content = xstrdup(colon + 1);
      +		info->stat.st_size = strlen(info->content);
     ++	} else {
     ++		BUG("add_file_cb() called for %s", opt->long_name);
      +	}
      +	item = string_list_append_nodup(&args->extra_files, path);
      +	item->util = info;
     @@ archive.c: static int parse_archive_args(int argc, const char **argv,
       		{ OPTION_CALLBACK, 0, "add-file", args, N_("file"),
       		  N_("add untracked file to archive"), 0, add_file_cb,
       		  (intptr_t)&base },
     -+		{ OPTION_CALLBACK, 0, "add-file-with-content", args,
     ++		{ OPTION_CALLBACK, 0, "add-virtual-file", args,
      +		  N_("path:content"), N_("add untracked file to archive"), 0,
      +		  add_file_cb, (intptr_t)&base },
       		OPT_STRING('o', "output", &output, N_("file"),
     @@ t/t5003-archive-zip.sh: test_expect_success 'git archive --format=zip --add-file
       check_zip with_untracked
       check_added with_untracked untracked untracked
       
     -+test_expect_success UNZIP 'git archive --format=zip --add-file-with-content' '
     ++test_expect_success UNZIP 'git archive --format=zip --add-virtual-file' '
      +	git archive --format=zip >with_file_with_content.zip \
     -+		--add-file-with-content=hello:world $EMPTY_TREE &&
     ++		--add-virtual-file=hello:world $EMPTY_TREE &&
      +	test_when_finished "rm -rf tmp-unpack" &&
      +	mkdir tmp-unpack && (
      +		cd tmp-unpack &&
 2:  fdba4ed6f4d ! 2:  b5ebd61066a archive --add-file-with-contents: allow paths containing colons
     @@ archive.c
      @@ archive.c: static int add_file_cb(const struct option *opt, const char *arg, int unset)
       			die(_("Not a regular file: %s"), path);
       		info->content = NULL; /* read the file later */
     - 	} else {
     + 	} else if (!strcmp(opt->long_name, "add-virtual-file")) {
      -		const char *colon = strchr(arg, ':');
      -		char *p;
      +		struct strbuf buf = STRBUF_INIT;
     @@ archive.c: static int add_file_cb(const struct option *opt, const char *arg, int
      -		info->content = xstrdup(colon + 1);
      +		info->content = xstrdup(p + 1);
       		info->stat.st_size = strlen(info->content);
     - 	}
     - 	item = string_list_append_nodup(&args->extra_files, path);
     + 	} else {
     + 		BUG("add_file_cb() called for %s", opt->long_name);
      
       ## t/t5003-archive-zip.sh ##
      @@ t/t5003-archive-zip.sh: check_zip with_untracked
       check_added with_untracked untracked untracked
       
     - test_expect_success UNZIP 'git archive --format=zip --add-file-with-content' '
     + test_expect_success UNZIP 'git archive --format=zip --add-virtual-file' '
      +	if test_have_prereq FUNNYNAMES
      +	then
      +		QUOTED=quoted:colon
     @@ t/t5003-archive-zip.sh: check_zip with_untracked
      +		QUOTED=quoted
      +	fi &&
       	git archive --format=zip >with_file_with_content.zip \
     -+		--add-file-with-content=\"$QUOTED\": \
     - 		--add-file-with-content=hello:world $EMPTY_TREE &&
     ++		--add-virtual-file=\"$QUOTED\": \
     + 		--add-virtual-file=hello:world $EMPTY_TREE &&
       	test_when_finished "rm -rf tmp-unpack" &&
       	mkdir tmp-unpack && (
       		cd tmp-unpack &&
 3:  da9f52a8240 = 3:  f1ba69c02d7 scalar: validate the optional enlistment argument
 4:  87bdc22322b ! 4:  3fb90194744 Implement `scalar diagnose`
     @@ contrib/scalar/scalar.c: static int unregister_dir(void)
      +	int res = 0;
      +
      +	if (!dir)
     -+		return error(_("could not open directory '%s'"), path);
     ++		return error_errno(_("could not open directory '%s'"), path);
      +
      +	if (!at_root)
      +		strbuf_addf(&buf, "%s/", path);
     @@ contrib/scalar/scalar.c: cleanup:
      +	strbuf_addf(&buf, "Enlistment root: %s\n", the_repository->worktree);
      +	write_or_die(stdout_fd, buf.buf, buf.len);
      +	strvec_pushf(&archiver_args,
     -+		     "--add-file-with-content=diagnostics.log:%.*s",
     ++		     "--add-virtual-file=diagnostics.log:%.*s",
      +		     (int)buf.len, buf.buf);
      +
      +	if ((res = add_directory_to_archiver(&archiver_args, ".git", 0)) ||
 5:  3f63b197d42 ! 5:  2e645b08a9e scalar diagnose: include disk space information
     @@ contrib/scalar/scalar.c: static int cmd_diagnose(int argc, const char **argv)
      +	get_disk_info(&buf);
       	write_or_die(stdout_fd, buf.buf, buf.len);
       	strvec_pushf(&archiver_args,
     - 		     "--add-file-with-content=diagnostics.log:%.*s",
     + 		     "--add-virtual-file=diagnostics.log:%.*s",
      
       ## contrib/scalar/t/t9099-scalar.sh ##
      @@ contrib/scalar/t/t9099-scalar.sh: SQ="'"
 6:  fc1319338fc ! 6:  0fa20d73750 scalar: teach `diagnose` to gather packfile info
     @@ contrib/scalar/scalar.c: cleanup:
       {
       	struct option options[] = {
      @@ contrib/scalar/scalar.c: static int cmd_diagnose(int argc, const char **argv)
     - 		     "--add-file-with-content=diagnostics.log:%.*s",
     + 		     "--add-virtual-file=diagnostics.log:%.*s",
       		     (int)buf.len, buf.buf);
       
      +	strbuf_reset(&buf);
     -+	strbuf_addstr(&buf, "--add-file-with-content=packs-local.txt:");
     ++	strbuf_addstr(&buf, "--add-virtual-file=packs-local.txt:");
      +	dir_file_stats(the_repository->objects->odb, &buf);
      +	foreach_alt_odb(dir_file_stats, &buf);
      +	strvec_push(&archiver_args, buf.buf);
 7:  e8f5b42f7b7 ! 7:  62e173b47cf scalar: teach `diagnose` to gather loose objects information
     @@ contrib/scalar/scalar.c: static int cmd_diagnose(int argc, const char **argv)
       	strvec_push(&archiver_args, buf.buf);
       
      +	strbuf_reset(&buf);
     -+	strbuf_addstr(&buf, "--add-file-with-content=objects-local.txt:");
     ++	strbuf_addstr(&buf, "--add-virtual-file=objects-local.txt:");
      +	loose_objs_stats(&buf, ".git/objects");
      +	strvec_push(&archiver_args, buf.buf);
      +

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH v5 1/7] archive: optionally add "virtual" files
  2022-05-19 18:17       ` [PATCH v5 " Johannes Schindelin via GitGitGadget
@ 2022-05-19 18:17         ` Johannes Schindelin via GitGitGadget
  2022-05-20 14:41           ` René Scharfe
  2022-05-19 18:17         ` [PATCH v5 2/7] archive --add-file-with-contents: allow paths containing colons Johannes Schindelin via GitGitGadget
                           ` (7 subsequent siblings)
  8 siblings, 1 reply; 109+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2022-05-19 18:17 UTC (permalink / raw)
  To: git
  Cc: René Scharfe, Taylor Blau, Derrick Stolee, Elijah Newren,
	rsbecker, Ævar Arnfjörð Bjarmason,
	Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

With the `--add-file-with-content=<path>:<content>` option, `git
archive` now supports use cases where relatively trivial files need to
be added that do not exist on disk.

This will allow us to generate `.zip` files with generated content,
without having to add said content to the object database and without
having to write it out to disk.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 Documentation/git-archive.txt | 11 ++++++++
 archive.c                     | 53 +++++++++++++++++++++++++++++------
 t/t5003-archive-zip.sh        | 12 ++++++++
 3 files changed, 68 insertions(+), 8 deletions(-)

diff --git a/Documentation/git-archive.txt b/Documentation/git-archive.txt
index bc4e76a7834..893cb1075bf 100644
--- a/Documentation/git-archive.txt
+++ b/Documentation/git-archive.txt
@@ -61,6 +61,17 @@ OPTIONS
 	by concatenating the value for `--prefix` (if any) and the
 	basename of <file>.
 
+--add-virtual-file=<path>:<content>::
+	Add the specified contents to the archive.  Can be repeated to add
+	multiple files.  The path of the file in the archive is built
+	by concatenating the value for `--prefix` (if any) and the
+	basename of <file>.
++
+The `<path>` cannot contain any colon, the file mode is limited to
+a regular file, and the option may be subject to platform-dependent
+command-line limits. For non-trivial cases, write an untracked file
+and use `--add-file` instead.
+
 --worktree-attributes::
 	Look for attributes in .gitattributes files in the working tree
 	as well (see <<ATTRIBUTES>>).
diff --git a/archive.c b/archive.c
index a3bbb091256..d20e16fa819 100644
--- a/archive.c
+++ b/archive.c
@@ -263,6 +263,7 @@ static int queue_or_write_archive_entry(const struct object_id *oid,
 struct extra_file_info {
 	char *base;
 	struct stat stat;
+	void *content;
 };
 
 int write_archive_entries(struct archiver_args *args,
@@ -337,7 +338,13 @@ int write_archive_entries(struct archiver_args *args,
 		strbuf_addstr(&path_in_archive, basename(path));
 
 		strbuf_reset(&content);
-		if (strbuf_read_file(&content, path, info->stat.st_size) < 0)
+		if (info->content)
+			err = write_entry(args, &fake_oid, path_in_archive.buf,
+					  path_in_archive.len,
+					  info->stat.st_mode,
+					  info->content, info->stat.st_size);
+		else if (strbuf_read_file(&content, path,
+					  info->stat.st_size) < 0)
 			err = error_errno(_("could not read '%s'"), path);
 		else
 			err = write_entry(args, &fake_oid, path_in_archive.buf,
@@ -493,6 +500,7 @@ static void extra_file_info_clear(void *util, const char *str)
 {
 	struct extra_file_info *info = util;
 	free(info->base);
+	free(info->content);
 	free(info);
 }
 
@@ -514,14 +522,40 @@ static int add_file_cb(const struct option *opt, const char *arg, int unset)
 	if (!arg)
 		return -1;
 
-	path = prefix_filename(args->prefix, arg);
-	item = string_list_append_nodup(&args->extra_files, path);
-	item->util = info = xmalloc(sizeof(*info));
+	info = xmalloc(sizeof(*info));
 	info->base = xstrdup_or_null(base);
-	if (stat(path, &info->stat))
-		die(_("File not found: %s"), path);
-	if (!S_ISREG(info->stat.st_mode))
-		die(_("Not a regular file: %s"), path);
+
+	if (!strcmp(opt->long_name, "add-file")) {
+		path = prefix_filename(args->prefix, arg);
+		if (stat(path, &info->stat))
+			die(_("File not found: %s"), path);
+		if (!S_ISREG(info->stat.st_mode))
+			die(_("Not a regular file: %s"), path);
+		info->content = NULL; /* read the file later */
+	} else if (!strcmp(opt->long_name, "add-virtual-file")) {
+		const char *colon = strchr(arg, ':');
+		char *p;
+
+		if (!colon)
+			die(_("missing colon: '%s'"), arg);
+
+		p = xstrndup(arg, colon - arg);
+		if (!args->prefix)
+			path = p;
+		else {
+			path = prefix_filename(args->prefix, p);
+			free(p);
+		}
+		memset(&info->stat, 0, sizeof(info->stat));
+		info->stat.st_mode = S_IFREG | 0644;
+		info->content = xstrdup(colon + 1);
+		info->stat.st_size = strlen(info->content);
+	} else {
+		BUG("add_file_cb() called for %s", opt->long_name);
+	}
+	item = string_list_append_nodup(&args->extra_files, path);
+	item->util = info;
+
 	return 0;
 }
 
@@ -554,6 +588,9 @@ static int parse_archive_args(int argc, const char **argv,
 		{ OPTION_CALLBACK, 0, "add-file", args, N_("file"),
 		  N_("add untracked file to archive"), 0, add_file_cb,
 		  (intptr_t)&base },
+		{ OPTION_CALLBACK, 0, "add-virtual-file", args,
+		  N_("path:content"), N_("add untracked file to archive"), 0,
+		  add_file_cb, (intptr_t)&base },
 		OPT_STRING('o', "output", &output, N_("file"),
 			N_("write the archive to this file")),
 		OPT_BOOL(0, "worktree-attributes", &worktree_attributes,
diff --git a/t/t5003-archive-zip.sh b/t/t5003-archive-zip.sh
index 1e6d18b140e..ebc26e89a9b 100755
--- a/t/t5003-archive-zip.sh
+++ b/t/t5003-archive-zip.sh
@@ -206,6 +206,18 @@ test_expect_success 'git archive --format=zip --add-file' '
 check_zip with_untracked
 check_added with_untracked untracked untracked
 
+test_expect_success UNZIP 'git archive --format=zip --add-virtual-file' '
+	git archive --format=zip >with_file_with_content.zip \
+		--add-virtual-file=hello:world $EMPTY_TREE &&
+	test_when_finished "rm -rf tmp-unpack" &&
+	mkdir tmp-unpack && (
+		cd tmp-unpack &&
+		"$GIT_UNZIP" ../with_file_with_content.zip &&
+		test_path_is_file hello &&
+		test world = $(cat hello)
+	)
+'
+
 test_expect_success 'git archive --format=zip --add-file twice' '
 	echo untracked >untracked &&
 	git archive --format=zip --prefix=one/ --add-file=untracked \
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH v5 2/7] archive --add-file-with-contents: allow paths containing colons
  2022-05-19 18:17       ` [PATCH v5 " Johannes Schindelin via GitGitGadget
  2022-05-19 18:17         ` [PATCH v5 1/7] archive: optionally add "virtual" files Johannes Schindelin via GitGitGadget
@ 2022-05-19 18:17         ` Johannes Schindelin via GitGitGadget
  2022-05-19 18:17         ` [PATCH v5 3/7] scalar: validate the optional enlistment argument Johannes Schindelin via GitGitGadget
                           ` (6 subsequent siblings)
  8 siblings, 0 replies; 109+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2022-05-19 18:17 UTC (permalink / raw)
  To: git
  Cc: René Scharfe, Taylor Blau, Derrick Stolee, Elijah Newren,
	rsbecker, Ævar Arnfjörð Bjarmason,
	Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

By allowing the path to be enclosed in double-quotes, we can avoid
the limitation that paths cannot contain colons.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 Documentation/git-archive.txt | 14 ++++++++++----
 archive.c                     | 30 ++++++++++++++++++++----------
 t/t5003-archive-zip.sh        |  8 ++++++++
 3 files changed, 38 insertions(+), 14 deletions(-)

diff --git a/Documentation/git-archive.txt b/Documentation/git-archive.txt
index 893cb1075bf..54de945a84e 100644
--- a/Documentation/git-archive.txt
+++ b/Documentation/git-archive.txt
@@ -67,10 +67,16 @@ OPTIONS
 	by concatenating the value for `--prefix` (if any) and the
 	basename of <file>.
 +
-The `<path>` cannot contain any colon, the file mode is limited to
-a regular file, and the option may be subject to platform-dependent
-command-line limits. For non-trivial cases, write an untracked file
-and use `--add-file` instead.
+The `<path>` argument can start and end with a literal double-quote
+character; The contained file name is interpreted as a C-style string,
+i.e. the backslash is interpreted as escape character. The path must
+be quoted if it contains a colon, to avoid the colon from being
+misinterpreted as the separator between the path and the contents, or
+if the path begins or ends with a double-quote character.
++
+The file mode is limited to a regular file, and the option may be
+subject to platform-dependent command-line limits. For non-trivial
+cases, write an untracked file and use `--add-file` instead.
 
 --worktree-attributes::
 	Look for attributes in .gitattributes files in the working tree
diff --git a/archive.c b/archive.c
index d20e16fa819..b7756b91200 100644
--- a/archive.c
+++ b/archive.c
@@ -9,6 +9,7 @@
 #include "parse-options.h"
 #include "unpack-trees.h"
 #include "dir.h"
+#include "quote.h"
 
 static char const * const archive_usage[] = {
 	N_("git archive [<options>] <tree-ish> [<path>...]"),
@@ -533,22 +534,31 @@ static int add_file_cb(const struct option *opt, const char *arg, int unset)
 			die(_("Not a regular file: %s"), path);
 		info->content = NULL; /* read the file later */
 	} else if (!strcmp(opt->long_name, "add-virtual-file")) {
-		const char *colon = strchr(arg, ':');
-		char *p;
+		struct strbuf buf = STRBUF_INIT;
+		const char *p = arg;
+
+		if (*p != '"')
+			p = strchr(p, ':');
+		else if (unquote_c_style(&buf, p, &p) < 0)
+			die(_("unclosed quote: '%s'"), arg);
 
-		if (!colon)
+		if (!p || *p != ':')
 			die(_("missing colon: '%s'"), arg);
 
-		p = xstrndup(arg, colon - arg);
-		if (!args->prefix)
-			path = p;
-		else {
-			path = prefix_filename(args->prefix, p);
-			free(p);
+		if (p == arg)
+			die(_("empty file name: '%s'"), arg);
+
+		path = buf.len ?
+			strbuf_detach(&buf, NULL) : xstrndup(arg, p - arg);
+
+		if (args->prefix) {
+			char *save = path;
+			path = prefix_filename(args->prefix, path);
+			free(save);
 		}
 		memset(&info->stat, 0, sizeof(info->stat));
 		info->stat.st_mode = S_IFREG | 0644;
-		info->content = xstrdup(colon + 1);
+		info->content = xstrdup(p + 1);
 		info->stat.st_size = strlen(info->content);
 	} else {
 		BUG("add_file_cb() called for %s", opt->long_name);
diff --git a/t/t5003-archive-zip.sh b/t/t5003-archive-zip.sh
index ebc26e89a9b..50932a866c9 100755
--- a/t/t5003-archive-zip.sh
+++ b/t/t5003-archive-zip.sh
@@ -207,13 +207,21 @@ check_zip with_untracked
 check_added with_untracked untracked untracked
 
 test_expect_success UNZIP 'git archive --format=zip --add-virtual-file' '
+	if test_have_prereq FUNNYNAMES
+	then
+		QUOTED=quoted:colon
+	else
+		QUOTED=quoted
+	fi &&
 	git archive --format=zip >with_file_with_content.zip \
+		--add-virtual-file=\"$QUOTED\": \
 		--add-virtual-file=hello:world $EMPTY_TREE &&
 	test_when_finished "rm -rf tmp-unpack" &&
 	mkdir tmp-unpack && (
 		cd tmp-unpack &&
 		"$GIT_UNZIP" ../with_file_with_content.zip &&
 		test_path_is_file hello &&
+		test_path_is_file $QUOTED &&
 		test world = $(cat hello)
 	)
 '
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH v5 3/7] scalar: validate the optional enlistment argument
  2022-05-19 18:17       ` [PATCH v5 " Johannes Schindelin via GitGitGadget
  2022-05-19 18:17         ` [PATCH v5 1/7] archive: optionally add "virtual" files Johannes Schindelin via GitGitGadget
  2022-05-19 18:17         ` [PATCH v5 2/7] archive --add-file-with-contents: allow paths containing colons Johannes Schindelin via GitGitGadget
@ 2022-05-19 18:17         ` Johannes Schindelin via GitGitGadget
  2022-05-19 18:18         ` [PATCH v5 4/7] Implement `scalar diagnose` Johannes Schindelin via GitGitGadget
                           ` (5 subsequent siblings)
  8 siblings, 0 replies; 109+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2022-05-19 18:17 UTC (permalink / raw)
  To: git
  Cc: René Scharfe, Taylor Blau, Derrick Stolee, Elijah Newren,
	rsbecker, Ævar Arnfjörð Bjarmason,
	Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

The `scalar` command needs a Scalar enlistment for many subcommands, and
looks in the current directory for such an enlistment (traversing the
parent directories until it finds one).

These is subcommands can also be called with an optional argument
specifying the enlistment. Here, too, we traverse parent directories as
needed, until we find an enlistment.

However, if the specified directory does not even exist, or is not a
directory, we should stop right there, with an error message.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 contrib/scalar/scalar.c          | 6 ++++--
 contrib/scalar/t/t9099-scalar.sh | 5 +++++
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
index 1ce9c2b00e8..00dcd4b50ef 100644
--- a/contrib/scalar/scalar.c
+++ b/contrib/scalar/scalar.c
@@ -43,9 +43,11 @@ static void setup_enlistment_directory(int argc, const char **argv,
 		usage_with_options(usagestr, options);
 
 	/* find the worktree, determine its corresponding root */
-	if (argc == 1)
+	if (argc == 1) {
 		strbuf_add_absolute_path(&path, argv[0]);
-	else if (strbuf_getcwd(&path) < 0)
+		if (!is_directory(path.buf))
+			die(_("'%s' does not exist"), path.buf);
+	} else if (strbuf_getcwd(&path) < 0)
 		die(_("need a working directory"));
 
 	strbuf_trim_trailing_dir_sep(&path);
diff --git a/contrib/scalar/t/t9099-scalar.sh b/contrib/scalar/t/t9099-scalar.sh
index 2e1502ad45e..9d83fdf25e8 100755
--- a/contrib/scalar/t/t9099-scalar.sh
+++ b/contrib/scalar/t/t9099-scalar.sh
@@ -85,4 +85,9 @@ test_expect_success 'scalar delete with enlistment' '
 	test_path_is_missing cloned
 '
 
+test_expect_success '`scalar [...] <dir>` errors out when dir is missing' '
+	! scalar run config cloned 2>err &&
+	grep "cloned. does not exist" err
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH v5 4/7] Implement `scalar diagnose`
  2022-05-19 18:17       ` [PATCH v5 " Johannes Schindelin via GitGitGadget
                           ` (2 preceding siblings ...)
  2022-05-19 18:17         ` [PATCH v5 3/7] scalar: validate the optional enlistment argument Johannes Schindelin via GitGitGadget
@ 2022-05-19 18:18         ` Johannes Schindelin via GitGitGadget
  2022-05-19 18:18         ` [PATCH v5 5/7] scalar diagnose: include disk space information Johannes Schindelin via GitGitGadget
                           ` (4 subsequent siblings)
  8 siblings, 0 replies; 109+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2022-05-19 18:18 UTC (permalink / raw)
  To: git
  Cc: René Scharfe, Taylor Blau, Derrick Stolee, Elijah Newren,
	rsbecker, Ævar Arnfjörð Bjarmason,
	Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

Over the course of Scalar's development, it became obvious that there is
a need for a command that can gather all kinds of useful information
that can help identify the most typical problems with large
worktrees/repositories.

The `diagnose` command is the culmination of this hard-won knowledge: it
gathers the installed hooks, the config, a couple statistics describing
the data shape, among other pieces of information, and then wraps
everything up in a tidy, neat `.zip` archive.

Note: originally, Scalar was implemented in C# using the .NET API, where
we had the luxury of a comprehensive standard library that includes
basic functionality such as writing a `.zip` file. In the C version, we
lack such a commodity. Rather than introducing a dependency on, say,
libzip, we slightly abuse Git's `archive` machinery: we write out a
`.zip` of the empty try, augmented by a couple files that are added via
the `--add-file*` options. We are careful trying not to modify the
current repository in any way lest the very circumstances that required
`scalar diagnose` to be run are changed by the `diagnose` run itself.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 contrib/scalar/scalar.c          | 144 +++++++++++++++++++++++++++++++
 contrib/scalar/scalar.txt        |  12 +++
 contrib/scalar/t/t9099-scalar.sh |  14 +++
 3 files changed, 170 insertions(+)

diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
index 00dcd4b50ef..53213f9a3b9 100644
--- a/contrib/scalar/scalar.c
+++ b/contrib/scalar/scalar.c
@@ -11,6 +11,7 @@
 #include "dir.h"
 #include "packfile.h"
 #include "help.h"
+#include "archive.h"
 
 /*
  * Remove the deepest subdirectory in the provided path string. Path must not
@@ -261,6 +262,47 @@ static int unregister_dir(void)
 	return res;
 }
 
+static int add_directory_to_archiver(struct strvec *archiver_args,
+					  const char *path, int recurse)
+{
+	int at_root = !*path;
+	DIR *dir = opendir(at_root ? "." : path);
+	struct dirent *e;
+	struct strbuf buf = STRBUF_INIT;
+	size_t len;
+	int res = 0;
+
+	if (!dir)
+		return error_errno(_("could not open directory '%s'"), path);
+
+	if (!at_root)
+		strbuf_addf(&buf, "%s/", path);
+	len = buf.len;
+	strvec_pushf(archiver_args, "--prefix=%s", buf.buf);
+
+	while (!res && (e = readdir(dir))) {
+		if (!strcmp(".", e->d_name) || !strcmp("..", e->d_name))
+			continue;
+
+		strbuf_setlen(&buf, len);
+		strbuf_addstr(&buf, e->d_name);
+
+		if (e->d_type == DT_REG)
+			strvec_pushf(archiver_args, "--add-file=%s", buf.buf);
+		else if (e->d_type != DT_DIR)
+			warning(_("skipping '%s', which is neither file nor "
+				  "directory"), buf.buf);
+		else if (recurse &&
+			 add_directory_to_archiver(archiver_args,
+						   buf.buf, recurse) < 0)
+			res = -1;
+	}
+
+	closedir(dir);
+	strbuf_release(&buf);
+	return res;
+}
+
 /* printf-style interface, expects `<key>=<value>` argument */
 static int set_config(const char *fmt, ...)
 {
@@ -501,6 +543,107 @@ cleanup:
 	return res;
 }
 
+static int cmd_diagnose(int argc, const char **argv)
+{
+	struct option options[] = {
+		OPT_END(),
+	};
+	const char * const usage[] = {
+		N_("scalar diagnose [<enlistment>]"),
+		NULL
+	};
+	struct strbuf zip_path = STRBUF_INIT;
+	struct strvec archiver_args = STRVEC_INIT;
+	char **argv_copy = NULL;
+	int stdout_fd = -1, archiver_fd = -1;
+	time_t now = time(NULL);
+	struct tm tm;
+	struct strbuf path = STRBUF_INIT, buf = STRBUF_INIT;
+	int res = 0;
+
+	argc = parse_options(argc, argv, NULL, options,
+			     usage, 0);
+
+	setup_enlistment_directory(argc, argv, usage, options, &zip_path);
+
+	strbuf_addstr(&zip_path, "/.scalarDiagnostics/scalar_");
+	strbuf_addftime(&zip_path,
+			"%Y%m%d_%H%M%S", localtime_r(&now, &tm), 0, 0);
+	strbuf_addstr(&zip_path, ".zip");
+	switch (safe_create_leading_directories(zip_path.buf)) {
+	case SCLD_EXISTS:
+	case SCLD_OK:
+		break;
+	default:
+		error_errno(_("could not create directory for '%s'"),
+			    zip_path.buf);
+		goto diagnose_cleanup;
+	}
+	stdout_fd = dup(1);
+	if (stdout_fd < 0) {
+		res = error_errno(_("could not duplicate stdout"));
+		goto diagnose_cleanup;
+	}
+
+	archiver_fd = xopen(zip_path.buf, O_CREAT | O_WRONLY | O_TRUNC, 0666);
+	if (archiver_fd < 0 || dup2(archiver_fd, 1) < 0) {
+		res = error_errno(_("could not redirect output"));
+		goto diagnose_cleanup;
+	}
+
+	init_zip_archiver();
+	strvec_pushl(&archiver_args, "scalar-diagnose", "--format=zip", NULL);
+
+	strbuf_reset(&buf);
+	strbuf_addstr(&buf, "Collecting diagnostic info\n\n");
+	get_version_info(&buf, 1);
+
+	strbuf_addf(&buf, "Enlistment root: %s\n", the_repository->worktree);
+	write_or_die(stdout_fd, buf.buf, buf.len);
+	strvec_pushf(&archiver_args,
+		     "--add-virtual-file=diagnostics.log:%.*s",
+		     (int)buf.len, buf.buf);
+
+	if ((res = add_directory_to_archiver(&archiver_args, ".git", 0)) ||
+	    (res = add_directory_to_archiver(&archiver_args, ".git/hooks", 0)) ||
+	    (res = add_directory_to_archiver(&archiver_args, ".git/info", 0)) ||
+	    (res = add_directory_to_archiver(&archiver_args, ".git/logs", 1)) ||
+	    (res = add_directory_to_archiver(&archiver_args, ".git/objects/info", 0)))
+		goto diagnose_cleanup;
+
+	strvec_pushl(&archiver_args, "--prefix=",
+		     oid_to_hex(the_hash_algo->empty_tree), "--", NULL);
+
+	/* `write_archive()` modifies the `argv` passed to it. Let it. */
+	argv_copy = xmemdupz(archiver_args.v,
+			     sizeof(char *) * archiver_args.nr);
+	res = write_archive(archiver_args.nr, (const char **)argv_copy, NULL,
+			    the_repository, NULL, 0);
+	if (res) {
+		error(_("failed to write archive"));
+		goto diagnose_cleanup;
+	}
+
+	if (!res)
+		fprintf(stderr, "\n"
+		       "Diagnostics complete.\n"
+		       "All of the gathered info is captured in '%s'\n",
+		       zip_path.buf);
+
+diagnose_cleanup:
+	if (archiver_fd >= 0) {
+		close(1);
+		dup2(stdout_fd, 1);
+	}
+	free(argv_copy);
+	strvec_clear(&archiver_args);
+	strbuf_release(&zip_path);
+	strbuf_release(&path);
+	strbuf_release(&buf);
+
+	return res;
+}
+
 static int cmd_list(int argc, const char **argv)
 {
 	if (argc != 1)
@@ -802,6 +945,7 @@ static struct {
 	{ "reconfigure", cmd_reconfigure },
 	{ "delete", cmd_delete },
 	{ "version", cmd_version },
+	{ "diagnose", cmd_diagnose },
 	{ NULL, NULL},
 };
 
diff --git a/contrib/scalar/scalar.txt b/contrib/scalar/scalar.txt
index f416d637289..22583fe046e 100644
--- a/contrib/scalar/scalar.txt
+++ b/contrib/scalar/scalar.txt
@@ -14,6 +14,7 @@ scalar register [<enlistment>]
 scalar unregister [<enlistment>]
 scalar run ( all | config | commit-graph | fetch | loose-objects | pack-files ) [<enlistment>]
 scalar reconfigure [ --all | <enlistment> ]
+scalar diagnose [<enlistment>]
 scalar delete <enlistment>
 
 DESCRIPTION
@@ -129,6 +130,17 @@ reconfigure the enlistment.
 With the `--all` option, all enlistments currently registered with Scalar
 will be reconfigured. Use this option after each Scalar upgrade.
 
+Diagnose
+~~~~~~~~
+
+diagnose [<enlistment>]::
+    When reporting issues with Scalar, it is often helpful to provide the
+    information gathered by this command, including logs and certain
+    statistics describing the data shape of the current enlistment.
++
+The output of this command is a `.zip` file that is written into
+a directory adjacent to the worktree in the `src` directory.
+
 Delete
 ~~~~~~
 
diff --git a/contrib/scalar/t/t9099-scalar.sh b/contrib/scalar/t/t9099-scalar.sh
index 9d83fdf25e8..6802d317258 100755
--- a/contrib/scalar/t/t9099-scalar.sh
+++ b/contrib/scalar/t/t9099-scalar.sh
@@ -90,4 +90,18 @@ test_expect_success '`scalar [...] <dir>` errors out when dir is missing' '
 	grep "cloned. does not exist" err
 '
 
+SQ="'"
+test_expect_success UNZIP 'scalar diagnose' '
+	scalar clone "file://$(pwd)" cloned --single-branch &&
+	scalar diagnose cloned >out 2>err &&
+	sed -n "s/.*$SQ\\(.*\\.zip\\)$SQ.*/\\1/p" <err >zip_path &&
+	zip_path=$(cat zip_path) &&
+	test -n "$zip_path" &&
+	unzip -v "$zip_path" &&
+	folder=${zip_path%.zip} &&
+	test_path_is_missing "$folder" &&
+	unzip -p "$zip_path" diagnostics.log >out &&
+	test_file_not_empty out
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH v5 5/7] scalar diagnose: include disk space information
  2022-05-19 18:17       ` [PATCH v5 " Johannes Schindelin via GitGitGadget
                           ` (3 preceding siblings ...)
  2022-05-19 18:18         ` [PATCH v5 4/7] Implement `scalar diagnose` Johannes Schindelin via GitGitGadget
@ 2022-05-19 18:18         ` Johannes Schindelin via GitGitGadget
  2022-05-19 18:18         ` [PATCH v5 6/7] scalar: teach `diagnose` to gather packfile info Matthew John Cheetham via GitGitGadget
                           ` (3 subsequent siblings)
  8 siblings, 0 replies; 109+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2022-05-19 18:18 UTC (permalink / raw)
  To: git
  Cc: René Scharfe, Taylor Blau, Derrick Stolee, Elijah Newren,
	rsbecker, Ævar Arnfjörð Bjarmason,
	Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

When analyzing problems with large worktrees/repositories, it is useful
to know how close to a "full disk" situation Scalar/Git operates. Let's
include this information.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 contrib/scalar/scalar.c          | 53 ++++++++++++++++++++++++++++++++
 contrib/scalar/t/t9099-scalar.sh |  1 +
 2 files changed, 54 insertions(+)

diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
index 53213f9a3b9..0a9e25a57f8 100644
--- a/contrib/scalar/scalar.c
+++ b/contrib/scalar/scalar.c
@@ -303,6 +303,58 @@ static int add_directory_to_archiver(struct strvec *archiver_args,
 	return res;
 }
 
+#ifndef WIN32
+#include <sys/statvfs.h>
+#endif
+
+static int get_disk_info(struct strbuf *out)
+{
+#ifdef WIN32
+	struct strbuf buf = STRBUF_INIT;
+	char volume_name[MAX_PATH], fs_name[MAX_PATH];
+	DWORD serial_number, component_length, flags;
+	ULARGE_INTEGER avail2caller, total, avail;
+
+	strbuf_realpath(&buf, ".", 1);
+	if (!GetDiskFreeSpaceExA(buf.buf, &avail2caller, &total, &avail)) {
+		error(_("could not determine free disk size for '%s'"),
+		      buf.buf);
+		strbuf_release(&buf);
+		return -1;
+	}
+
+	strbuf_setlen(&buf, offset_1st_component(buf.buf));
+	if (!GetVolumeInformationA(buf.buf, volume_name, sizeof(volume_name),
+				   &serial_number, &component_length, &flags,
+				   fs_name, sizeof(fs_name))) {
+		error(_("could not get info for '%s'"), buf.buf);
+		strbuf_release(&buf);
+		return -1;
+	}
+	strbuf_addf(out, "Available space on '%s': ", buf.buf);
+	strbuf_humanise_bytes(out, avail2caller.QuadPart);
+	strbuf_addch(out, '\n');
+	strbuf_release(&buf);
+#else
+	struct strbuf buf = STRBUF_INIT;
+	struct statvfs stat;
+
+	strbuf_realpath(&buf, ".", 1);
+	if (statvfs(buf.buf, &stat) < 0) {
+		error_errno(_("could not determine free disk size for '%s'"),
+			    buf.buf);
+		strbuf_release(&buf);
+		return -1;
+	}
+
+	strbuf_addf(out, "Available space on '%s': ", buf.buf);
+	strbuf_humanise_bytes(out, st_mult(stat.f_bsize, stat.f_bavail));
+	strbuf_addf(out, " (mount flags 0x%lx)\n", stat.f_flag);
+	strbuf_release(&buf);
+#endif
+	return 0;
+}
+
 /* printf-style interface, expects `<key>=<value>` argument */
 static int set_config(const char *fmt, ...)
 {
@@ -599,6 +651,7 @@ static int cmd_diagnose(int argc, const char **argv)
 	get_version_info(&buf, 1);
 
 	strbuf_addf(&buf, "Enlistment root: %s\n", the_repository->worktree);
+	get_disk_info(&buf);
 	write_or_die(stdout_fd, buf.buf, buf.len);
 	strvec_pushf(&archiver_args,
 		     "--add-virtual-file=diagnostics.log:%.*s",
diff --git a/contrib/scalar/t/t9099-scalar.sh b/contrib/scalar/t/t9099-scalar.sh
index 6802d317258..934b2485d91 100755
--- a/contrib/scalar/t/t9099-scalar.sh
+++ b/contrib/scalar/t/t9099-scalar.sh
@@ -94,6 +94,7 @@ SQ="'"
 test_expect_success UNZIP 'scalar diagnose' '
 	scalar clone "file://$(pwd)" cloned --single-branch &&
 	scalar diagnose cloned >out 2>err &&
+	grep "Available space" out &&
 	sed -n "s/.*$SQ\\(.*\\.zip\\)$SQ.*/\\1/p" <err >zip_path &&
 	zip_path=$(cat zip_path) &&
 	test -n "$zip_path" &&
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH v5 6/7] scalar: teach `diagnose` to gather packfile info
  2022-05-19 18:17       ` [PATCH v5 " Johannes Schindelin via GitGitGadget
                           ` (4 preceding siblings ...)
  2022-05-19 18:18         ` [PATCH v5 5/7] scalar diagnose: include disk space information Johannes Schindelin via GitGitGadget
@ 2022-05-19 18:18         ` Matthew John Cheetham via GitGitGadget
  2022-05-19 18:18         ` [PATCH v5 7/7] scalar: teach `diagnose` to gather loose objects information Matthew John Cheetham via GitGitGadget
                           ` (2 subsequent siblings)
  8 siblings, 0 replies; 109+ messages in thread
From: Matthew John Cheetham via GitGitGadget @ 2022-05-19 18:18 UTC (permalink / raw)
  To: git
  Cc: René Scharfe, Taylor Blau, Derrick Stolee, Elijah Newren,
	rsbecker, Ævar Arnfjörð Bjarmason,
	Johannes Schindelin, Matthew John Cheetham

From: Matthew John Cheetham <mjcheetham@outlook.com>

It's helpful to see if there are other crud files in the pack
directory. Let's teach the `scalar diagnose` command to gather
file size information about pack files.

While at it, also enumerate the pack files in the alternate
object directories, if any are registered.

Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 contrib/scalar/scalar.c          | 30 ++++++++++++++++++++++++++++++
 contrib/scalar/t/t9099-scalar.sh |  6 +++++-
 2 files changed, 35 insertions(+), 1 deletion(-)

diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
index 0a9e25a57f8..d302c27e114 100644
--- a/contrib/scalar/scalar.c
+++ b/contrib/scalar/scalar.c
@@ -12,6 +12,7 @@
 #include "packfile.h"
 #include "help.h"
 #include "archive.h"
+#include "object-store.h"
 
 /*
  * Remove the deepest subdirectory in the provided path string. Path must not
@@ -595,6 +596,29 @@ cleanup:
 	return res;
 }
 
+static void dir_file_stats_objects(const char *full_path, size_t full_path_len,
+				   const char *file_name, void *data)
+{
+	struct strbuf *buf = data;
+	struct stat st;
+
+	if (!stat(full_path, &st))
+		strbuf_addf(buf, "%-70s %16" PRIuMAX "\n", file_name,
+			    (uintmax_t)st.st_size);
+}
+
+static int dir_file_stats(struct object_directory *object_dir, void *data)
+{
+	struct strbuf *buf = data;
+
+	strbuf_addf(buf, "Contents of %s:\n", object_dir->path);
+
+	for_each_file_in_pack_dir(object_dir->path, dir_file_stats_objects,
+				  data);
+
+	return 0;
+}
+
 static int cmd_diagnose(int argc, const char **argv)
 {
 	struct option options[] = {
@@ -657,6 +681,12 @@ static int cmd_diagnose(int argc, const char **argv)
 		     "--add-virtual-file=diagnostics.log:%.*s",
 		     (int)buf.len, buf.buf);
 
+	strbuf_reset(&buf);
+	strbuf_addstr(&buf, "--add-virtual-file=packs-local.txt:");
+	dir_file_stats(the_repository->objects->odb, &buf);
+	foreach_alt_odb(dir_file_stats, &buf);
+	strvec_push(&archiver_args, buf.buf);
+
 	if ((res = add_directory_to_archiver(&archiver_args, ".git", 0)) ||
 	    (res = add_directory_to_archiver(&archiver_args, ".git/hooks", 0)) ||
 	    (res = add_directory_to_archiver(&archiver_args, ".git/info", 0)) ||
diff --git a/contrib/scalar/t/t9099-scalar.sh b/contrib/scalar/t/t9099-scalar.sh
index 934b2485d91..3dd5650cceb 100755
--- a/contrib/scalar/t/t9099-scalar.sh
+++ b/contrib/scalar/t/t9099-scalar.sh
@@ -93,6 +93,8 @@ test_expect_success '`scalar [...] <dir>` errors out when dir is missing' '
 SQ="'"
 test_expect_success UNZIP 'scalar diagnose' '
 	scalar clone "file://$(pwd)" cloned --single-branch &&
+	git repack &&
+	echo "$(pwd)/.git/objects/" >>cloned/src/.git/objects/info/alternates &&
 	scalar diagnose cloned >out 2>err &&
 	grep "Available space" out &&
 	sed -n "s/.*$SQ\\(.*\\.zip\\)$SQ.*/\\1/p" <err >zip_path &&
@@ -102,7 +104,9 @@ test_expect_success UNZIP 'scalar diagnose' '
 	folder=${zip_path%.zip} &&
 	test_path_is_missing "$folder" &&
 	unzip -p "$zip_path" diagnostics.log >out &&
-	test_file_not_empty out
+	test_file_not_empty out &&
+	unzip -p "$zip_path" packs-local.txt >out &&
+	grep "$(pwd)/.git/objects" out
 '
 
 test_done
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH v5 7/7] scalar: teach `diagnose` to gather loose objects information
  2022-05-19 18:17       ` [PATCH v5 " Johannes Schindelin via GitGitGadget
                           ` (5 preceding siblings ...)
  2022-05-19 18:18         ` [PATCH v5 6/7] scalar: teach `diagnose` to gather packfile info Matthew John Cheetham via GitGitGadget
@ 2022-05-19 18:18         ` Matthew John Cheetham via GitGitGadget
  2022-05-19 19:23         ` [PATCH v5 0/7] scalar: implement the subcommand "diagnose" Junio C Hamano
  2022-05-21 15:08         ` [PATCH v6 " Johannes Schindelin via GitGitGadget
  8 siblings, 0 replies; 109+ messages in thread
From: Matthew John Cheetham via GitGitGadget @ 2022-05-19 18:18 UTC (permalink / raw)
  To: git
  Cc: René Scharfe, Taylor Blau, Derrick Stolee, Elijah Newren,
	rsbecker, Ævar Arnfjörð Bjarmason,
	Johannes Schindelin, Matthew John Cheetham

From: Matthew John Cheetham <mjcheetham@outlook.com>

When operating at the scale that Scalar wants to support, certain data
shapes are more likely to cause undesirable performance issues, such as
large numbers of loose objects.

By including statistics about this, `scalar diagnose` now makes it
easier to identify such scenarios.

Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 contrib/scalar/scalar.c          | 59 ++++++++++++++++++++++++++++++++
 contrib/scalar/t/t9099-scalar.sh |  5 ++-
 2 files changed, 63 insertions(+), 1 deletion(-)

diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
index d302c27e114..0c278681758 100644
--- a/contrib/scalar/scalar.c
+++ b/contrib/scalar/scalar.c
@@ -619,6 +619,60 @@ static int dir_file_stats(struct object_directory *object_dir, void *data)
 	return 0;
 }
 
+static int count_files(char *path)
+{
+	DIR *dir = opendir(path);
+	struct dirent *e;
+	int count = 0;
+
+	if (!dir)
+		return 0;
+
+	while ((e = readdir(dir)) != NULL)
+		if (!is_dot_or_dotdot(e->d_name) && e->d_type == DT_REG)
+			count++;
+
+	closedir(dir);
+	return count;
+}
+
+static void loose_objs_stats(struct strbuf *buf, const char *path)
+{
+	DIR *dir = opendir(path);
+	struct dirent *e;
+	int count;
+	int total = 0;
+	unsigned char c;
+	struct strbuf count_path = STRBUF_INIT;
+	size_t base_path_len;
+
+	if (!dir)
+		return;
+
+	strbuf_addstr(buf, "Object directory stats for ");
+	strbuf_add_absolute_path(buf, path);
+	strbuf_addstr(buf, ":\n");
+
+	strbuf_add_absolute_path(&count_path, path);
+	strbuf_addch(&count_path, '/');
+	base_path_len = count_path.len;
+
+	while ((e = readdir(dir)) != NULL)
+		if (!is_dot_or_dotdot(e->d_name) &&
+		    e->d_type == DT_DIR && strlen(e->d_name) == 2 &&
+		    !hex_to_bytes(&c, e->d_name, 1)) {
+			strbuf_setlen(&count_path, base_path_len);
+			strbuf_addstr(&count_path, e->d_name);
+			total += (count = count_files(count_path.buf));
+			strbuf_addf(buf, "%s : %7d files\n", e->d_name, count);
+		}
+
+	strbuf_addf(buf, "Total: %d loose objects", total);
+
+	strbuf_release(&count_path);
+	closedir(dir);
+}
+
 static int cmd_diagnose(int argc, const char **argv)
 {
 	struct option options[] = {
@@ -687,6 +741,11 @@ static int cmd_diagnose(int argc, const char **argv)
 	foreach_alt_odb(dir_file_stats, &buf);
 	strvec_push(&archiver_args, buf.buf);
 
+	strbuf_reset(&buf);
+	strbuf_addstr(&buf, "--add-virtual-file=objects-local.txt:");
+	loose_objs_stats(&buf, ".git/objects");
+	strvec_push(&archiver_args, buf.buf);
+
 	if ((res = add_directory_to_archiver(&archiver_args, ".git", 0)) ||
 	    (res = add_directory_to_archiver(&archiver_args, ".git/hooks", 0)) ||
 	    (res = add_directory_to_archiver(&archiver_args, ".git/info", 0)) ||
diff --git a/contrib/scalar/t/t9099-scalar.sh b/contrib/scalar/t/t9099-scalar.sh
index 3dd5650cceb..72023a1ca1d 100755
--- a/contrib/scalar/t/t9099-scalar.sh
+++ b/contrib/scalar/t/t9099-scalar.sh
@@ -95,6 +95,7 @@ test_expect_success UNZIP 'scalar diagnose' '
 	scalar clone "file://$(pwd)" cloned --single-branch &&
 	git repack &&
 	echo "$(pwd)/.git/objects/" >>cloned/src/.git/objects/info/alternates &&
+	test_commit -C cloned/src loose &&
 	scalar diagnose cloned >out 2>err &&
 	grep "Available space" out &&
 	sed -n "s/.*$SQ\\(.*\\.zip\\)$SQ.*/\\1/p" <err >zip_path &&
@@ -106,7 +107,9 @@ test_expect_success UNZIP 'scalar diagnose' '
 	unzip -p "$zip_path" diagnostics.log >out &&
 	test_file_not_empty out &&
 	unzip -p "$zip_path" packs-local.txt >out &&
-	grep "$(pwd)/.git/objects" out
+	grep "$(pwd)/.git/objects" out &&
+	unzip -p "$zip_path" objects-local.txt >out &&
+	grep "^Total: [1-9]" out
 '
 
 test_done
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 2/7] archive --add-file-with-contents: allow paths containing colons
  2022-05-19 18:09           ` Johannes Schindelin
@ 2022-05-19 18:44             ` Junio C Hamano
  0 siblings, 0 replies; 109+ messages in thread
From: Junio C Hamano @ 2022-05-19 18:44 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Johannes Schindelin via GitGitGadget, git, René Scharfe,
	Taylor Blau, Derrick Stolee, Elijah Newren

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

>> >  	git archive --format=zip >with_file_with_content.zip \
>> > +		--add-file-with-content=\"$QUOTED\": \
>> >  		--add-file-with-content=hello:world $EMPTY_TREE &&
>> >  	test_when_finished "rm -rf tmp-unpack" &&
>> >  	mkdir tmp-unpack && (
>> >  		cd tmp-unpack &&
>> >  		"$GIT_UNZIP" ../with_file_with_content.zip &&
>> >  		test_path_is_file hello &&
>> > +		test_path_is_file $QUOTED &&
>>
>> Looks OK, even though it probably is a good idea to have dq around
>> $QUOTED, so that future developers can easily insert SP into its
>> value to use a bit more common but still a bit more problematic
>> pathnames in the test.
>
> I actually decided against this because reading
>
> 	"$QUOTED"
>
> would mislead future me to think that the double quotes that enclose
> $QUOTED are the quotes that the variable's name talks about. But the
> quotes are actually the escaped ones that are passed to `git archive`
> above.

>
> So, to help future Dscho should they read this code six months from now or
> even later, I wanted to specifically only add quotes to the `git archive`
> call to make the intention abundantly clear.

If you find "$QUOTED" misleads any reader to think QUOTED may have
some quote characters in there, you could rename it, of course, to
signal what the value is (e.g. $PATHNAME) better.

But I think you misunderstood my comment completely.

What I meant was to write these lines like:

	--add-file-with-content=\""$QUOTED"\":
	test_path_is_file "$QUOTED"

Because the value in QUOTED can have $IFS whitespaces in it (after
all, allowing random letters like colon, quotes and whitespaces is
why we are adding this unquote_c_style() call), and without the
extra double quotes to protect the parameter expansion of $QUOTED,
the command line is broken.

So, don't decide against it; the reasoning behind that decision is
simply wrong.

Thanks.


^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v5 0/7] scalar: implement the subcommand "diagnose"
  2022-05-19 18:17       ` [PATCH v5 " Johannes Schindelin via GitGitGadget
                           ` (6 preceding siblings ...)
  2022-05-19 18:18         ` [PATCH v5 7/7] scalar: teach `diagnose` to gather loose objects information Matthew John Cheetham via GitGitGadget
@ 2022-05-19 19:23         ` Junio C Hamano
  2022-05-21 15:08         ` [PATCH v6 " Johannes Schindelin via GitGitGadget
  8 siblings, 0 replies; 109+ messages in thread
From: Junio C Hamano @ 2022-05-19 19:23 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget
  Cc: git, René Scharfe, Taylor Blau, Derrick Stolee,
	Elijah Newren, rsbecker, Ævar Arnfjörð Bjarmason,
	Johannes Schindelin

"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> Changes since v4:
>
>  * Squashed in Junio's suggested fixups
>  * Renamed the option from --add-file-with-content=<name>:<content> to
>    --add-virtual-file=<name>:<content>

;-)  5 letters shorter and is a good name.

>  * Fixed one instance where I had used error() instead of error_errno().

Looks good.

Thanks.  Will replace and queue.

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 3/7] scalar: validate the optional enlistment argument
  2022-05-18 17:35           ` Junio C Hamano
@ 2022-05-20  7:30             ` Ævar Arnfjörð Bjarmason
  2022-05-20 15:55               ` Johannes Schindelin
  0 siblings, 1 reply; 109+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-05-20  7:30 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Johannes Schindelin via GitGitGadget, git, René Scharfe,
	Taylor Blau, Derrick Stolee, Elijah Newren, Johannes Schindelin


On Wed, May 18 2022, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>
>>> +test_expect_success '`scalar [...] <dir>` errors out when dir is missing' '
>>> +	! scalar run config cloned 2>err &&
>>
>> Needs to use test_must_fail, not !
>
> Good eyes and careful reading are very much appreciated, but in this
> case, doesn't such an improvement depend on an update to teach
> test_must_fail_acceptable about scalar being whitelisted?

Yes, I think so (but haven't tested it just now), but it's a relatively
small change to t/test-lib-functions.sh.

I was just noting the potential hidden segfault etc., the issue remains
in v5.

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v5 1/7] archive: optionally add "virtual" files
  2022-05-19 18:17         ` [PATCH v5 1/7] archive: optionally add "virtual" files Johannes Schindelin via GitGitGadget
@ 2022-05-20 14:41           ` René Scharfe
  2022-05-20 16:21             ` Junio C Hamano
  0 siblings, 1 reply; 109+ messages in thread
From: René Scharfe @ 2022-05-20 14:41 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget, git
  Cc: Taylor Blau, Derrick Stolee, Elijah Newren, rsbecker,
	Ævar Arnfjörð Bjarmason, Johannes Schindelin

Am 19.05.22 um 20:17 schrieb Johannes Schindelin via GitGitGadget:
> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>
> With the `--add-file-with-content=<path>:<content>` option, `git
            ^^^^^^^^^^^^^^^^^^^^^^^
That's still the old option name.  Same in the subject of patch 2.

> archive` now supports use cases where relatively trivial files need to
> be added that do not exist on disk.
>
> This will allow us to generate `.zip` files with generated content,
> without having to add said content to the object database and without
> having to write it out to disk.
>
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
>  Documentation/git-archive.txt | 11 ++++++++
>  archive.c                     | 53 +++++++++++++++++++++++++++++------
>  t/t5003-archive-zip.sh        | 12 ++++++++
>  3 files changed, 68 insertions(+), 8 deletions(-)
>
> diff --git a/Documentation/git-archive.txt b/Documentation/git-archive.txt
> index bc4e76a7834..893cb1075bf 100644
> --- a/Documentation/git-archive.txt
> +++ b/Documentation/git-archive.txt
> @@ -61,6 +61,17 @@ OPTIONS
>  	by concatenating the value for `--prefix` (if any) and the
>  	basename of <file>.
>
> +--add-virtual-file=<path>:<content>::
> +	Add the specified contents to the archive.  Can be repeated to add
> +	multiple files.  The path of the file in the archive is built
> +	by concatenating the value for `--prefix` (if any) and the
> +	basename of <file>.
> ++
> +The `<path>` cannot contain any colon, the file mode is limited to
> +a regular file, and the option may be subject to platform-dependent
> +command-line limits. For non-trivial cases, write an untracked file
> +and use `--add-file` instead.
> +
>  --worktree-attributes::
>  	Look for attributes in .gitattributes files in the working tree
>  	as well (see <<ATTRIBUTES>>).
> diff --git a/archive.c b/archive.c
> index a3bbb091256..d20e16fa819 100644
> --- a/archive.c
> +++ b/archive.c
> @@ -263,6 +263,7 @@ static int queue_or_write_archive_entry(const struct object_id *oid,
>  struct extra_file_info {
>  	char *base;
>  	struct stat stat;
> +	void *content;
>  };
>
>  int write_archive_entries(struct archiver_args *args,
> @@ -337,7 +338,13 @@ int write_archive_entries(struct archiver_args *args,
>  		strbuf_addstr(&path_in_archive, basename(path));
>
>  		strbuf_reset(&content);
> -		if (strbuf_read_file(&content, path, info->stat.st_size) < 0)
> +		if (info->content)
> +			err = write_entry(args, &fake_oid, path_in_archive.buf,
> +					  path_in_archive.len,
> +					  info->stat.st_mode,
> +					  info->content, info->stat.st_size);
> +		else if (strbuf_read_file(&content, path,
> +					  info->stat.st_size) < 0)
>  			err = error_errno(_("could not read '%s'"), path);
>  		else
>  			err = write_entry(args, &fake_oid, path_in_archive.buf,
> @@ -493,6 +500,7 @@ static void extra_file_info_clear(void *util, const char *str)
>  {
>  	struct extra_file_info *info = util;
>  	free(info->base);
> +	free(info->content);
>  	free(info);
>  }
>
> @@ -514,14 +522,40 @@ static int add_file_cb(const struct option *opt, const char *arg, int unset)
>  	if (!arg)
>  		return -1;
>
> -	path = prefix_filename(args->prefix, arg);
> -	item = string_list_append_nodup(&args->extra_files, path);
> -	item->util = info = xmalloc(sizeof(*info));
> +	info = xmalloc(sizeof(*info));
>  	info->base = xstrdup_or_null(base);
> -	if (stat(path, &info->stat))
> -		die(_("File not found: %s"), path);
> -	if (!S_ISREG(info->stat.st_mode))
> -		die(_("Not a regular file: %s"), path);
> +
> +	if (!strcmp(opt->long_name, "add-file")) {
> +		path = prefix_filename(args->prefix, arg);
> +		if (stat(path, &info->stat))
> +			die(_("File not found: %s"), path);
> +		if (!S_ISREG(info->stat.st_mode))
> +			die(_("Not a regular file: %s"), path);
> +		info->content = NULL; /* read the file later */
> +	} else if (!strcmp(opt->long_name, "add-virtual-file")) {
> +		const char *colon = strchr(arg, ':');
> +		char *p;
> +
> +		if (!colon)
> +			die(_("missing colon: '%s'"), arg);
> +
> +		p = xstrndup(arg, colon - arg);
> +		if (!args->prefix)
> +			path = p;
> +		else {
> +			path = prefix_filename(args->prefix, p);
> +			free(p);
> +		}
> +		memset(&info->stat, 0, sizeof(info->stat));
> +		info->stat.st_mode = S_IFREG | 0644;
> +		info->content = xstrdup(colon + 1);
> +		info->stat.st_size = strlen(info->content);
> +	} else {
> +		BUG("add_file_cb() called for %s", opt->long_name);
> +	}
> +	item = string_list_append_nodup(&args->extra_files, path);
> +	item->util = info;
> +
>  	return 0;
>  }
>
> @@ -554,6 +588,9 @@ static int parse_archive_args(int argc, const char **argv,
>  		{ OPTION_CALLBACK, 0, "add-file", args, N_("file"),
>  		  N_("add untracked file to archive"), 0, add_file_cb,
>  		  (intptr_t)&base },
> +		{ OPTION_CALLBACK, 0, "add-virtual-file", args,
> +		  N_("path:content"), N_("add untracked file to archive"), 0,
> +		  add_file_cb, (intptr_t)&base },
>  		OPT_STRING('o', "output", &output, N_("file"),
>  			N_("write the archive to this file")),
>  		OPT_BOOL(0, "worktree-attributes", &worktree_attributes,
> diff --git a/t/t5003-archive-zip.sh b/t/t5003-archive-zip.sh
> index 1e6d18b140e..ebc26e89a9b 100755
> --- a/t/t5003-archive-zip.sh
> +++ b/t/t5003-archive-zip.sh
> @@ -206,6 +206,18 @@ test_expect_success 'git archive --format=zip --add-file' '
>  check_zip with_untracked
>  check_added with_untracked untracked untracked
>
> +test_expect_success UNZIP 'git archive --format=zip --add-virtual-file' '
> +	git archive --format=zip >with_file_with_content.zip \
> +		--add-virtual-file=hello:world $EMPTY_TREE &&
> +	test_when_finished "rm -rf tmp-unpack" &&
> +	mkdir tmp-unpack && (
> +		cd tmp-unpack &&
> +		"$GIT_UNZIP" ../with_file_with_content.zip &&
> +		test_path_is_file hello &&
> +		test world = $(cat hello)
> +	)
> +'
> +
>  test_expect_success 'git archive --format=zip --add-file twice' '
>  	echo untracked >untracked &&
>  	git archive --format=zip --prefix=one/ --add-file=untracked \

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 3/7] scalar: validate the optional enlistment argument
  2022-05-20  7:30             ` Ævar Arnfjörð Bjarmason
@ 2022-05-20 15:55               ` Johannes Schindelin
  2022-05-21  9:54                 ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 109+ messages in thread
From: Johannes Schindelin @ 2022-05-20 15:55 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Junio C Hamano, Johannes Schindelin via GitGitGadget, git,
	René Scharfe, Taylor Blau, Derrick Stolee, Elijah Newren

[-- Attachment #1: Type: text/plain, Size: 974 bytes --]

Hi Ævar,

On Fri, 20 May 2022, Ævar Arnfjörð Bjarmason wrote:

>
> On Wed, May 18 2022, Junio C Hamano wrote:
>
> > Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
> >
> >>> +test_expect_success '`scalar [...] <dir>` errors out when dir is missing' '
> >>> +	! scalar run config cloned 2>err &&
> >>
> >> Needs to use test_must_fail, not !
> >
> > Good eyes and careful reading are very much appreciated, but in this
> > case, doesn't such an improvement depend on an update to teach
> > test_must_fail_acceptable about scalar being whitelisted?
>
> Yes, I think so (but haven't tested it just now), but it's a relatively
> small change to t/test-lib-functions.sh.

Let it be noted that I fully agree with Junio that good eyes and careful
reading are very much appreciated. And that in this case, that would have
implied noticing that `test_must_fail` is reserved for Git commands.

Scalar is not (yet?) a Git command.

Ciao,
Johannes

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v5 1/7] archive: optionally add "virtual" files
  2022-05-20 14:41           ` René Scharfe
@ 2022-05-20 16:21             ` Junio C Hamano
  0 siblings, 0 replies; 109+ messages in thread
From: Junio C Hamano @ 2022-05-20 16:21 UTC (permalink / raw)
  To: René Scharfe
  Cc: Johannes Schindelin via GitGitGadget, git, Taylor Blau,
	Derrick Stolee, Elijah Newren, rsbecker,
	Ævar Arnfjörð Bjarmason, Johannes Schindelin

René Scharfe <l.s.r@web.de> writes:

> Am 19.05.22 um 20:17 schrieb Johannes Schindelin via GitGitGadget:
>> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>>
>> With the `--add-file-with-content=<path>:<content>` option, `git
>             ^^^^^^^^^^^^^^^^^^^^^^^
> That's still the old option name.  Same in the subject of patch 2.

Good eyes, and thanks for catching what I missed---the risk of
relying too much on the range-diff X-<.

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 3/7] scalar: validate the optional enlistment argument
  2022-05-20 15:55               ` Johannes Schindelin
@ 2022-05-21  9:54                 ` Ævar Arnfjörð Bjarmason
  2022-05-22  5:50                   ` Junio C Hamano
  0 siblings, 1 reply; 109+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-05-21  9:54 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Junio C Hamano, Johannes Schindelin via GitGitGadget, git,
	René Scharfe, Taylor Blau, Derrick Stolee, Elijah Newren


On Fri, May 20 2022, Johannes Schindelin wrote:

> Hi Ævar,
>
> On Fri, 20 May 2022, Ævar Arnfjörð Bjarmason wrote:
>
>>
>> On Wed, May 18 2022, Junio C Hamano wrote:
>>
>> > Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>> >
>> >>> +test_expect_success '`scalar [...] <dir>` errors out when dir is missing' '
>> >>> +	! scalar run config cloned 2>err &&
>> >>
>> >> Needs to use test_must_fail, not !
>> >
>> > Good eyes and careful reading are very much appreciated, but in this
>> > case, doesn't such an improvement depend on an update to teach
>> > test_must_fail_acceptable about scalar being whitelisted?
>>
>> Yes, I think so (but haven't tested it just now), but it's a relatively
>> small change to t/test-lib-functions.sh.
>
> Let it be noted that I fully agree with Junio that good eyes and careful
> reading are very much appreciated. And that in this case, that would have
> implied noticing that `test_must_fail` is reserved for Git commands.
>
> Scalar is not (yet?) a Git command.

"test-tool" isn't "git" either, so I think this argument is a
non-starter.

As the documentation for "test_must_fail" notes the distinction is
whether something is "system-supplied". I.e. we're not going to test
whether "grep" segfaults, but we should test our own code to see if it
segfaults.

The scalar code is code we ship and test, so we should use the helper
that doesn't hide a segfault.

I don't understand why you wouldn't think that's the obvious fix here,
adding "scalar" to that whitelist is a one-line fix, and clearly yields
a more useful end result than a test silently hiding segfaults.

^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH v6 0/7] scalar: implement the subcommand "diagnose"
  2022-05-19 18:17       ` [PATCH v5 " Johannes Schindelin via GitGitGadget
                           ` (7 preceding siblings ...)
  2022-05-19 19:23         ` [PATCH v5 0/7] scalar: implement the subcommand "diagnose" Junio C Hamano
@ 2022-05-21 15:08         ` Johannes Schindelin via GitGitGadget
  2022-05-21 15:08           ` [PATCH v6 1/7] archive: optionally add "virtual" files Johannes Schindelin via GitGitGadget
                             ` (6 more replies)
  8 siblings, 7 replies; 109+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2022-05-21 15:08 UTC (permalink / raw)
  To: git
  Cc: René Scharfe, Taylor Blau, Derrick Stolee, Elijah Newren,
	rsbecker, Ævar Arnfjörð Bjarmason,
	Johannes Schindelin

Over the course of the years, we developed a sub-command that gathers
diagnostic data into a .zip file that can then be attached to bug reports.
This sub-command turned out to be very useful in helping Scalar developers
identify and fix issues.

Changes since v5:

 * Reworded the missed mentions of the old name of the --add-virtual-file
   option (thanks René!).
 * Renamed misleading variable name from $QUOTED to $PATHNAME (thanks
   Junio!).

Changes since v4:

 * Squashed in Junio's suggested fixups
 * Renamed the option from --add-file-with-content=<name>:<content> to
   --add-virtual-file=<name>:<content>
 * Fixed one instance where I had used error() instead of error_errno().

Changes since v3:

 * We're now using unquote_c_style() instead of rolling our own unquoter.
 * Fixed the added regression test.
 * As pointed out by Scalar's Functional Tests, the
   add_directory_to_archiver() function should not fail when scalar diagnose
   encounters FSMonitor's Unix socket, but only warn instead.
 * Related: add_directory_to_archiver() needs to propagate errors from
   processing subdirectories so that the top-level call returns an error,
   too.

Changes since v2:

 * Clarified in the commit message what the biggest benefit of
   --add-file-with-content is.
 * The <path> part of the -add-file-with-content argument can now contain
   colons. To do this, the path needs to start and end in double-quote
   characters (which are stripped), and the backslash serves as escape
   character in that case (to allow the path to contain both colons and
   double-quotes).
 * Fixed incorrect grammar.
 * Instead of strcmp(<what-we-don't-want>), we now say
   !strcmp(<what-we-want>).
 * The help text for --add-file-with-content was improved a tiny bit.
 * Adjusted the commit message that still talked about spawning plenty of
   processes and about a throw-away repository for the sake of generating a
   .zip file.
 * Simplified the code that shows the diagnostics and adds them to the .zip
   file.
 * The final message that reports that the archive is complete is now
   printed to stderr instead of stdout.

Changes since v1:

 * Instead of creating a throw-away repository, staging the contents of the
   .zip file and then using git write-tree and git archive to write the .zip
   file, the patch series now introduces a new option to git archive and
   uses write_archive() directly (avoiding any separate process).
 * Since the command avoids separate processes, it is now blazing fast on
   Windows, and I dropped the spinner() function because it's no longer
   needed.
 * While reworking the test case, I noticed that scalar [...] <enlistment>
   failed to verify that the specified directory exists, and would happily
   "traverse to its parent directory" on its quest to find a Scalar
   enlistment. That is of course incorrect, and has been fixed as a "while
   at it" sort of preparatory commit.
 * I had forgotten to sign off on all the commits, which has been fixed.
 * Instead of some "home-grown" readdir()-based function, the code now uses
   for_each_file_in_pack_dir() to look through the pack directories.
 * If any alternates are configured, their pack directories are now included
   in the output.
 * The commit message that might be interpreted to promise information about
   large loose files has been corrected to no longer promise that.
 * The test cases have been adjusted to test a little bit more (e.g.
   verifying that specific paths are mentioned in the output, instead of
   merely verifying that the output is non-empty).

Johannes Schindelin (5):
  archive: optionally add "virtual" files
  archive --add-virtual-file: allow paths containing colons
  scalar: validate the optional enlistment argument
  Implement `scalar diagnose`
  scalar diagnose: include disk space information

Matthew John Cheetham (2):
  scalar: teach `diagnose` to gather packfile info
  scalar: teach `diagnose` to gather loose objects information

 Documentation/git-archive.txt    |  17 ++
 archive.c                        |  63 ++++++-
 contrib/scalar/scalar.c          | 292 ++++++++++++++++++++++++++++++-
 contrib/scalar/scalar.txt        |  12 ++
 contrib/scalar/t/t9099-scalar.sh |  27 +++
 t/t5003-archive-zip.sh           |  20 +++
 6 files changed, 421 insertions(+), 10 deletions(-)


base-commit: ddc35d833dd6f9e8946b09cecd3311b8aa18d295
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1128%2Fdscho%2Fscalar-diagnose-v6
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1128/dscho/scalar-diagnose-v6
Pull-Request: https://github.com/gitgitgadget/git/pull/1128

Range-diff vs v5:

 1:  42e73fb0aac ! 1:  0005cfae31d archive: optionally add "virtual" files
     @@ Metadata
       ## Commit message ##
          archive: optionally add "virtual" files
      
     -    With the `--add-file-with-content=<path>:<content>` option, `git
     -    archive` now supports use cases where relatively trivial files need to
     -    be added that do not exist on disk.
     +    With the `--add-virtual-file=<path>:<content>` option, `git archive` now
     +    supports use cases where relatively trivial files need to be added that
     +    do not exist on disk.
      
          This will allow us to generate `.zip` files with generated content,
          without having to add said content to the object database and without
 2:  b5ebd61066a ! 2:  7eebcf27b45 archive --add-file-with-contents: allow paths containing colons
     @@ Metadata
      Author: Johannes Schindelin <Johannes.Schindelin@gmx.de>
      
       ## Commit message ##
     -    archive --add-file-with-contents: allow paths containing colons
     +    archive --add-virtual-file: allow paths containing colons
      
          By allowing the path to be enclosed in double-quotes, we can avoid
          the limitation that paths cannot contain colons.
     @@ t/t5003-archive-zip.sh: check_zip with_untracked
       test_expect_success UNZIP 'git archive --format=zip --add-virtual-file' '
      +	if test_have_prereq FUNNYNAMES
      +	then
     -+		QUOTED=quoted:colon
     ++		PATHNAME=quoted:colon
      +	else
     -+		QUOTED=quoted
     ++		PATHNAME=quoted
      +	fi &&
       	git archive --format=zip >with_file_with_content.zip \
     -+		--add-virtual-file=\"$QUOTED\": \
     ++		--add-virtual-file=\"$PATHNAME\": \
       		--add-virtual-file=hello:world $EMPTY_TREE &&
       	test_when_finished "rm -rf tmp-unpack" &&
       	mkdir tmp-unpack && (
       		cd tmp-unpack &&
       		"$GIT_UNZIP" ../with_file_with_content.zip &&
       		test_path_is_file hello &&
     -+		test_path_is_file $QUOTED &&
     ++		test_path_is_file $PATHNAME &&
       		test world = $(cat hello)
       	)
       '
 3:  f1ba69c02d7 = 3:  ca83ddd5eed scalar: validate the optional enlistment argument
 4:  3fb90194744 = 4:  89c13a45e00 Implement `scalar diagnose`
 5:  2e645b08a9e = 5:  8ffbaad3086 scalar diagnose: include disk space information
 6:  0fa20d73750 = 6:  15cd7f17896 scalar: teach `diagnose` to gather packfile info
 7:  62e173b47cf = 7:  a4a74d5ef58 scalar: teach `diagnose` to gather loose objects information

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH v6 1/7] archive: optionally add "virtual" files
  2022-05-21 15:08         ` [PATCH v6 " Johannes Schindelin via GitGitGadget
@ 2022-05-21 15:08           ` Johannes Schindelin via GitGitGadget
  2022-05-21 15:08           ` [PATCH v6 2/7] archive --add-virtual-file: allow paths containing colons Johannes Schindelin via GitGitGadget
                             ` (5 subsequent siblings)
  6 siblings, 0 replies; 109+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2022-05-21 15:08 UTC (permalink / raw)
  To: git
  Cc: René Scharfe, Taylor Blau, Derrick Stolee, Elijah Newren,
	rsbecker, Ævar Arnfjörð Bjarmason,
	Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

With the `--add-virtual-file=<path>:<content>` option, `git archive` now
supports use cases where relatively trivial files need to be added that
do not exist on disk.

This will allow us to generate `.zip` files with generated content,
without having to add said content to the object database and without
having to write it out to disk.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 Documentation/git-archive.txt | 11 ++++++++
 archive.c                     | 53 +++++++++++++++++++++++++++++------
 t/t5003-archive-zip.sh        | 12 ++++++++
 3 files changed, 68 insertions(+), 8 deletions(-)

diff --git a/Documentation/git-archive.txt b/Documentation/git-archive.txt
index bc4e76a7834..893cb1075bf 100644
--- a/Documentation/git-archive.txt
+++ b/Documentation/git-archive.txt
@@ -61,6 +61,17 @@ OPTIONS
 	by concatenating the value for `--prefix` (if any) and the
 	basename of <file>.
 
+--add-virtual-file=<path>:<content>::
+	Add the specified contents to the archive.  Can be repeated to add
+	multiple files.  The path of the file in the archive is built
+	by concatenating the value for `--prefix` (if any) and the
+	basename of <file>.
++
+The `<path>` cannot contain any colon, the file mode is limited to
+a regular file, and the option may be subject to platform-dependent
+command-line limits. For non-trivial cases, write an untracked file
+and use `--add-file` instead.
+
 --worktree-attributes::
 	Look for attributes in .gitattributes files in the working tree
 	as well (see <<ATTRIBUTES>>).
diff --git a/archive.c b/archive.c
index a3bbb091256..d20e16fa819 100644
--- a/archive.c
+++ b/archive.c
@@ -263,6 +263,7 @@ static int queue_or_write_archive_entry(const struct object_id *oid,
 struct extra_file_info {
 	char *base;
 	struct stat stat;
+	void *content;
 };
 
 int write_archive_entries(struct archiver_args *args,
@@ -337,7 +338,13 @@ int write_archive_entries(struct archiver_args *args,
 		strbuf_addstr(&path_in_archive, basename(path));
 
 		strbuf_reset(&content);
-		if (strbuf_read_file(&content, path, info->stat.st_size) < 0)
+		if (info->content)
+			err = write_entry(args, &fake_oid, path_in_archive.buf,
+					  path_in_archive.len,
+					  info->stat.st_mode,
+					  info->content, info->stat.st_size);
+		else if (strbuf_read_file(&content, path,
+					  info->stat.st_size) < 0)
 			err = error_errno(_("could not read '%s'"), path);
 		else
 			err = write_entry(args, &fake_oid, path_in_archive.buf,
@@ -493,6 +500,7 @@ static void extra_file_info_clear(void *util, const char *str)
 {
 	struct extra_file_info *info = util;
 	free(info->base);
+	free(info->content);
 	free(info);
 }
 
@@ -514,14 +522,40 @@ static int add_file_cb(const struct option *opt, const char *arg, int unset)
 	if (!arg)
 		return -1;
 
-	path = prefix_filename(args->prefix, arg);
-	item = string_list_append_nodup(&args->extra_files, path);
-	item->util = info = xmalloc(sizeof(*info));
+	info = xmalloc(sizeof(*info));
 	info->base = xstrdup_or_null(base);
-	if (stat(path, &info->stat))
-		die(_("File not found: %s"), path);
-	if (!S_ISREG(info->stat.st_mode))
-		die(_("Not a regular file: %s"), path);
+
+	if (!strcmp(opt->long_name, "add-file")) {
+		path = prefix_filename(args->prefix, arg);
+		if (stat(path, &info->stat))
+			die(_("File not found: %s"), path);
+		if (!S_ISREG(info->stat.st_mode))
+			die(_("Not a regular file: %s"), path);
+		info->content = NULL; /* read the file later */
+	} else if (!strcmp(opt->long_name, "add-virtual-file")) {
+		const char *colon = strchr(arg, ':');
+		char *p;
+
+		if (!colon)
+			die(_("missing colon: '%s'"), arg);
+
+		p = xstrndup(arg, colon - arg);
+		if (!args->prefix)
+			path = p;
+		else {
+			path = prefix_filename(args->prefix, p);
+			free(p);
+		}
+		memset(&info->stat, 0, sizeof(info->stat));
+		info->stat.st_mode = S_IFREG | 0644;
+		info->content = xstrdup(colon + 1);
+		info->stat.st_size = strlen(info->content);
+	} else {
+		BUG("add_file_cb() called for %s", opt->long_name);
+	}
+	item = string_list_append_nodup(&args->extra_files, path);
+	item->util = info;
+
 	return 0;
 }
 
@@ -554,6 +588,9 @@ static int parse_archive_args(int argc, const char **argv,
 		{ OPTION_CALLBACK, 0, "add-file", args, N_("file"),
 		  N_("add untracked file to archive"), 0, add_file_cb,
 		  (intptr_t)&base },
+		{ OPTION_CALLBACK, 0, "add-virtual-file", args,
+		  N_("path:content"), N_("add untracked file to archive"), 0,
+		  add_file_cb, (intptr_t)&base },
 		OPT_STRING('o', "output", &output, N_("file"),
 			N_("write the archive to this file")),
 		OPT_BOOL(0, "worktree-attributes", &worktree_attributes,
diff --git a/t/t5003-archive-zip.sh b/t/t5003-archive-zip.sh
index 1e6d18b140e..ebc26e89a9b 100755
--- a/t/t5003-archive-zip.sh
+++ b/t/t5003-archive-zip.sh
@@ -206,6 +206,18 @@ test_expect_success 'git archive --format=zip --add-file' '
 check_zip with_untracked
 check_added with_untracked untracked untracked
 
+test_expect_success UNZIP 'git archive --format=zip --add-virtual-file' '
+	git archive --format=zip >with_file_with_content.zip \
+		--add-virtual-file=hello:world $EMPTY_TREE &&
+	test_when_finished "rm -rf tmp-unpack" &&
+	mkdir tmp-unpack && (
+		cd tmp-unpack &&
+		"$GIT_UNZIP" ../with_file_with_content.zip &&
+		test_path_is_file hello &&
+		test world = $(cat hello)
+	)
+'
+
 test_expect_success 'git archive --format=zip --add-file twice' '
 	echo untracked >untracked &&
 	git archive --format=zip --prefix=one/ --add-file=untracked \
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH v6 2/7] archive --add-virtual-file: allow paths containing colons
  2022-05-21 15:08         ` [PATCH v6 " Johannes Schindelin via GitGitGadget
  2022-05-21 15:08           ` [PATCH v6 1/7] archive: optionally add "virtual" files Johannes Schindelin via GitGitGadget
@ 2022-05-21 15:08           ` Johannes Schindelin via GitGitGadget
  2022-05-21 15:08           ` [PATCH v6 3/7] scalar: validate the optional enlistment argument Johannes Schindelin via GitGitGadget
                             ` (4 subsequent siblings)
  6 siblings, 0 replies; 109+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2022-05-21 15:08 UTC (permalink / raw)
  To: git
  Cc: René Scharfe, Taylor Blau, Derrick Stolee, Elijah Newren,
	rsbecker, Ævar Arnfjörð Bjarmason,
	Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

By allowing the path to be enclosed in double-quotes, we can avoid
the limitation that paths cannot contain colons.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 Documentation/git-archive.txt | 14 ++++++++++----
 archive.c                     | 30 ++++++++++++++++++++----------
 t/t5003-archive-zip.sh        |  8 ++++++++
 3 files changed, 38 insertions(+), 14 deletions(-)

diff --git a/Documentation/git-archive.txt b/Documentation/git-archive.txt
index 893cb1075bf..54de945a84e 100644
--- a/Documentation/git-archive.txt
+++ b/Documentation/git-archive.txt
@@ -67,10 +67,16 @@ OPTIONS
 	by concatenating the value for `--prefix` (if any) and the
 	basename of <file>.
 +
-The `<path>` cannot contain any colon, the file mode is limited to
-a regular file, and the option may be subject to platform-dependent
-command-line limits. For non-trivial cases, write an untracked file
-and use `--add-file` instead.
+The `<path>` argument can start and end with a literal double-quote
+character; The contained file name is interpreted as a C-style string,
+i.e. the backslash is interpreted as escape character. The path must
+be quoted if it contains a colon, to avoid the colon from being
+misinterpreted as the separator between the path and the contents, or
+if the path begins or ends with a double-quote character.
++
+The file mode is limited to a regular file, and the option may be
+subject to platform-dependent command-line limits. For non-trivial
+cases, write an untracked file and use `--add-file` instead.
 
 --worktree-attributes::
 	Look for attributes in .gitattributes files in the working tree
diff --git a/archive.c b/archive.c
index d20e16fa819..b7756b91200 100644
--- a/archive.c
+++ b/archive.c
@@ -9,6 +9,7 @@
 #include "parse-options.h"
 #include "unpack-trees.h"
 #include "dir.h"
+#include "quote.h"
 
 static char const * const archive_usage[] = {
 	N_("git archive [<options>] <tree-ish> [<path>...]"),
@@ -533,22 +534,31 @@ static int add_file_cb(const struct option *opt, const char *arg, int unset)
 			die(_("Not a regular file: %s"), path);
 		info->content = NULL; /* read the file later */
 	} else if (!strcmp(opt->long_name, "add-virtual-file")) {
-		const char *colon = strchr(arg, ':');
-		char *p;
+		struct strbuf buf = STRBUF_INIT;
+		const char *p = arg;
+
+		if (*p != '"')
+			p = strchr(p, ':');
+		else if (unquote_c_style(&buf, p, &p) < 0)
+			die(_("unclosed quote: '%s'"), arg);
 
-		if (!colon)
+		if (!p || *p != ':')
 			die(_("missing colon: '%s'"), arg);
 
-		p = xstrndup(arg, colon - arg);
-		if (!args->prefix)
-			path = p;
-		else {
-			path = prefix_filename(args->prefix, p);
-			free(p);
+		if (p == arg)
+			die(_("empty file name: '%s'"), arg);
+
+		path = buf.len ?
+			strbuf_detach(&buf, NULL) : xstrndup(arg, p - arg);
+
+		if (args->prefix) {
+			char *save = path;
+			path = prefix_filename(args->prefix, path);
+			free(save);
 		}
 		memset(&info->stat, 0, sizeof(info->stat));
 		info->stat.st_mode = S_IFREG | 0644;
-		info->content = xstrdup(colon + 1);
+		info->content = xstrdup(p + 1);
 		info->stat.st_size = strlen(info->content);
 	} else {
 		BUG("add_file_cb() called for %s", opt->long_name);
diff --git a/t/t5003-archive-zip.sh b/t/t5003-archive-zip.sh
index ebc26e89a9b..3a5a052e8ce 100755
--- a/t/t5003-archive-zip.sh
+++ b/t/t5003-archive-zip.sh
@@ -207,13 +207,21 @@ check_zip with_untracked
 check_added with_untracked untracked untracked
 
 test_expect_success UNZIP 'git archive --format=zip --add-virtual-file' '
+	if test_have_prereq FUNNYNAMES
+	then
+		PATHNAME=quoted:colon
+	else
+		PATHNAME=quoted
+	fi &&
 	git archive --format=zip >with_file_with_content.zip \
+		--add-virtual-file=\"$PATHNAME\": \
 		--add-virtual-file=hello:world $EMPTY_TREE &&
 	test_when_finished "rm -rf tmp-unpack" &&
 	mkdir tmp-unpack && (
 		cd tmp-unpack &&
 		"$GIT_UNZIP" ../with_file_with_content.zip &&
 		test_path_is_file hello &&
+		test_path_is_file $PATHNAME &&
 		test world = $(cat hello)
 	)
 '
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH v6 3/7] scalar: validate the optional enlistment argument
  2022-05-21 15:08         ` [PATCH v6 " Johannes Schindelin via GitGitGadget
  2022-05-21 15:08           ` [PATCH v6 1/7] archive: optionally add "virtual" files Johannes Schindelin via GitGitGadget
  2022-05-21 15:08           ` [PATCH v6 2/7] archive --add-virtual-file: allow paths containing colons Johannes Schindelin via GitGitGadget
@ 2022-05-21 15:08           ` Johannes Schindelin via GitGitGadget
  2022-05-21 15:08           ` [PATCH v6 4/7] Implement `scalar diagnose` Johannes Schindelin via GitGitGadget
                             ` (3 subsequent siblings)
  6 siblings, 0 replies; 109+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2022-05-21 15:08 UTC (permalink / raw)
  To: git
  Cc: René Scharfe, Taylor Blau, Derrick Stolee, Elijah Newren,
	rsbecker, Ævar Arnfjörð Bjarmason,
	Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

The `scalar` command needs a Scalar enlistment for many subcommands, and
looks in the current directory for such an enlistment (traversing the
parent directories until it finds one).

These is subcommands can also be called with an optional argument
specifying the enlistment. Here, too, we traverse parent directories as
needed, until we find an enlistment.

However, if the specified directory does not even exist, or is not a
directory, we should stop right there, with an error message.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 contrib/scalar/scalar.c          | 6 ++++--
 contrib/scalar/t/t9099-scalar.sh | 5 +++++
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
index 1ce9c2b00e8..00dcd4b50ef 100644
--- a/contrib/scalar/scalar.c
+++ b/contrib/scalar/scalar.c
@@ -43,9 +43,11 @@ static void setup_enlistment_directory(int argc, const char **argv,
 		usage_with_options(usagestr, options);
 
 	/* find the worktree, determine its corresponding root */
-	if (argc == 1)
+	if (argc == 1) {
 		strbuf_add_absolute_path(&path, argv[0]);
-	else if (strbuf_getcwd(&path) < 0)
+		if (!is_directory(path.buf))
+			die(_("'%s' does not exist"), path.buf);
+	} else if (strbuf_getcwd(&path) < 0)
 		die(_("need a working directory"));
 
 	strbuf_trim_trailing_dir_sep(&path);
diff --git a/contrib/scalar/t/t9099-scalar.sh b/contrib/scalar/t/t9099-scalar.sh
index 2e1502ad45e..9d83fdf25e8 100755
--- a/contrib/scalar/t/t9099-scalar.sh
+++ b/contrib/scalar/t/t9099-scalar.sh
@@ -85,4 +85,9 @@ test_expect_success 'scalar delete with enlistment' '
 	test_path_is_missing cloned
 '
 
+test_expect_success '`scalar [...] <dir>` errors out when dir is missing' '
+	! scalar run config cloned 2>err &&
+	grep "cloned. does not exist" err
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH v6 4/7] Implement `scalar diagnose`
  2022-05-21 15:08         ` [PATCH v6 " Johannes Schindelin via GitGitGadget
                             ` (2 preceding siblings ...)
  2022-05-21 15:08           ` [PATCH v6 3/7] scalar: validate the optional enlistment argument Johannes Schindelin via GitGitGadget
@ 2022-05-21 15:08           ` Johannes Schindelin via GitGitGadget
  2022-05-21 15:08           ` [PATCH v6 5/7] scalar diagnose: include disk space information Johannes Schindelin via GitGitGadget
                             ` (2 subsequent siblings)
  6 siblings, 0 replies; 109+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2022-05-21 15:08 UTC (permalink / raw)
  To: git
  Cc: René Scharfe, Taylor Blau, Derrick Stolee, Elijah Newren,
	rsbecker, Ævar Arnfjörð Bjarmason,
	Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

Over the course of Scalar's development, it became obvious that there is
a need for a command that can gather all kinds of useful information
that can help identify the most typical problems with large
worktrees/repositories.

The `diagnose` command is the culmination of this hard-won knowledge: it
gathers the installed hooks, the config, a couple statistics describing
the data shape, among other pieces of information, and then wraps
everything up in a tidy, neat `.zip` archive.

Note: originally, Scalar was implemented in C# using the .NET API, where
we had the luxury of a comprehensive standard library that includes
basic functionality such as writing a `.zip` file. In the C version, we
lack such a commodity. Rather than introducing a dependency on, say,
libzip, we slightly abuse Git's `archive` machinery: we write out a
`.zip` of the empty try, augmented by a couple files that are added via
the `--add-file*` options. We are careful trying not to modify the
current repository in any way lest the very circumstances that required
`scalar diagnose` to be run are changed by the `diagnose` run itself.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 contrib/scalar/scalar.c          | 144 +++++++++++++++++++++++++++++++
 contrib/scalar/scalar.txt        |  12 +++
 contrib/scalar/t/t9099-scalar.sh |  14 +++
 3 files changed, 170 insertions(+)

diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
index 00dcd4b50ef..53213f9a3b9 100644
--- a/contrib/scalar/scalar.c
+++ b/contrib/scalar/scalar.c
@@ -11,6 +11,7 @@
 #include "dir.h"
 #include "packfile.h"
 #include "help.h"
+#include "archive.h"
 
 /*
  * Remove the deepest subdirectory in the provided path string. Path must not
@@ -261,6 +262,47 @@ static int unregister_dir(void)
 	return res;
 }
 
+static int add_directory_to_archiver(struct strvec *archiver_args,
+					  const char *path, int recurse)
+{
+	int at_root = !*path;
+	DIR *dir = opendir(at_root ? "." : path);
+	struct dirent *e;
+	struct strbuf buf = STRBUF_INIT;
+	size_t len;
+	int res = 0;
+
+	if (!dir)
+		return error_errno(_("could not open directory '%s'"), path);
+
+	if (!at_root)
+		strbuf_addf(&buf, "%s/", path);
+	len = buf.len;
+	strvec_pushf(archiver_args, "--prefix=%s", buf.buf);
+
+	while (!res && (e = readdir(dir))) {
+		if (!strcmp(".", e->d_name) || !strcmp("..", e->d_name))
+			continue;
+
+		strbuf_setlen(&buf, len);
+		strbuf_addstr(&buf, e->d_name);
+
+		if (e->d_type == DT_REG)
+			strvec_pushf(archiver_args, "--add-file=%s", buf.buf);
+		else if (e->d_type != DT_DIR)
+			warning(_("skipping '%s', which is neither file nor "
+				  "directory"), buf.buf);
+		else if (recurse &&
+			 add_directory_to_archiver(archiver_args,
+						   buf.buf, recurse) < 0)
+			res = -1;
+	}
+
+	closedir(dir);
+	strbuf_release(&buf);
+	return res;
+}
+
 /* printf-style interface, expects `<key>=<value>` argument */
 static int set_config(const char *fmt, ...)
 {
@@ -501,6 +543,107 @@ cleanup:
 	return res;
 }
 
+static int cmd_diagnose(int argc, const char **argv)
+{
+	struct option options[] = {
+		OPT_END(),
+	};
+	const char * const usage[] = {
+		N_("scalar diagnose [<enlistment>]"),
+		NULL
+	};
+	struct strbuf zip_path = STRBUF_INIT;
+	struct strvec archiver_args = STRVEC_INIT;
+	char **argv_copy = NULL;
+	int stdout_fd = -1, archiver_fd = -1;
+	time_t now = time(NULL);
+	struct tm tm;
+	struct strbuf path = STRBUF_INIT, buf = STRBUF_INIT;
+	int res = 0;
+
+	argc = parse_options(argc, argv, NULL, options,
+			     usage, 0);
+
+	setup_enlistment_directory(argc, argv, usage, options, &zip_path);
+
+	strbuf_addstr(&zip_path, "/.scalarDiagnostics/scalar_");
+	strbuf_addftime(&zip_path,
+			"%Y%m%d_%H%M%S", localtime_r(&now, &tm), 0, 0);
+	strbuf_addstr(&zip_path, ".zip");
+	switch (safe_create_leading_directories(zip_path.buf)) {
+	case SCLD_EXISTS:
+	case SCLD_OK:
+		break;
+	default:
+		error_errno(_("could not create directory for '%s'"),
+			    zip_path.buf);
+		goto diagnose_cleanup;
+	}
+	stdout_fd = dup(1);
+	if (stdout_fd < 0) {
+		res = error_errno(_("could not duplicate stdout"));
+		goto diagnose_cleanup;
+	}
+
+	archiver_fd = xopen(zip_path.buf, O_CREAT | O_WRONLY | O_TRUNC, 0666);
+	if (archiver_fd < 0 || dup2(archiver_fd, 1) < 0) {
+		res = error_errno(_("could not redirect output"));
+		goto diagnose_cleanup;
+	}
+
+	init_zip_archiver();
+	strvec_pushl(&archiver_args, "scalar-diagnose", "--format=zip", NULL);
+
+	strbuf_reset(&buf);
+	strbuf_addstr(&buf, "Collecting diagnostic info\n\n");
+	get_version_info(&buf, 1);
+
+	strbuf_addf(&buf, "Enlistment root: %s\n", the_repository->worktree);
+	write_or_die(stdout_fd, buf.buf, buf.len);
+	strvec_pushf(&archiver_args,
+		     "--add-virtual-file=diagnostics.log:%.*s",
+		     (int)buf.len, buf.buf);
+
+	if ((res = add_directory_to_archiver(&archiver_args, ".git", 0)) ||
+	    (res = add_directory_to_archiver(&archiver_args, ".git/hooks", 0)) ||
+	    (res = add_directory_to_archiver(&archiver_args, ".git/info", 0)) ||
+	    (res = add_directory_to_archiver(&archiver_args, ".git/logs", 1)) ||
+	    (res = add_directory_to_archiver(&archiver_args, ".git/objects/info", 0)))
+		goto diagnose_cleanup;
+
+	strvec_pushl(&archiver_args, "--prefix=",
+		     oid_to_hex(the_hash_algo->empty_tree), "--", NULL);
+
+	/* `write_archive()` modifies the `argv` passed to it. Let it. */
+	argv_copy = xmemdupz(archiver_args.v,
+			     sizeof(char *) * archiver_args.nr);
+	res = write_archive(archiver_args.nr, (const char **)argv_copy, NULL,
+			    the_repository, NULL, 0);
+	if (res) {
+		error(_("failed to write archive"));
+		goto diagnose_cleanup;
+	}
+
+	if (!res)
+		fprintf(stderr, "\n"
+		       "Diagnostics complete.\n"
+		       "All of the gathered info is captured in '%s'\n",
+		       zip_path.buf);
+
+diagnose_cleanup:
+	if (archiver_fd >= 0) {
+		close(1);
+		dup2(stdout_fd, 1);
+	}
+	free(argv_copy);
+	strvec_clear(&archiver_args);
+	strbuf_release(&zip_path);
+	strbuf_release(&path);
+	strbuf_release(&buf);
+
+	return res;
+}
+
 static int cmd_list(int argc, const char **argv)
 {
 	if (argc != 1)
@@ -802,6 +945,7 @@ static struct {
 	{ "reconfigure", cmd_reconfigure },
 	{ "delete", cmd_delete },
 	{ "version", cmd_version },
+	{ "diagnose", cmd_diagnose },
 	{ NULL, NULL},
 };
 
diff --git a/contrib/scalar/scalar.txt b/contrib/scalar/scalar.txt
index f416d637289..22583fe046e 100644
--- a/contrib/scalar/scalar.txt
+++ b/contrib/scalar/scalar.txt
@@ -14,6 +14,7 @@ scalar register [<enlistment>]
 scalar unregister [<enlistment>]
 scalar run ( all | config | commit-graph | fetch | loose-objects | pack-files ) [<enlistment>]
 scalar reconfigure [ --all | <enlistment> ]
+scalar diagnose [<enlistment>]
 scalar delete <enlistment>
 
 DESCRIPTION
@@ -129,6 +130,17 @@ reconfigure the enlistment.
 With the `--all` option, all enlistments currently registered with Scalar
 will be reconfigured. Use this option after each Scalar upgrade.
 
+Diagnose
+~~~~~~~~
+
+diagnose [<enlistment>]::
+    When reporting issues with Scalar, it is often helpful to provide the
+    information gathered by this command, including logs and certain
+    statistics describing the data shape of the current enlistment.
++
+The output of this command is a `.zip` file that is written into
+a directory adjacent to the worktree in the `src` directory.
+
 Delete
 ~~~~~~
 
diff --git a/contrib/scalar/t/t9099-scalar.sh b/contrib/scalar/t/t9099-scalar.sh
index 9d83fdf25e8..6802d317258 100755
--- a/contrib/scalar/t/t9099-scalar.sh
+++ b/contrib/scalar/t/t9099-scalar.sh
@@ -90,4 +90,18 @@ test_expect_success '`scalar [...] <dir>` errors out when dir is missing' '
 	grep "cloned. does not exist" err
 '
 
+SQ="'"
+test_expect_success UNZIP 'scalar diagnose' '
+	scalar clone "file://$(pwd)" cloned --single-branch &&
+	scalar diagnose cloned >out 2>err &&
+	sed -n "s/.*$SQ\\(.*\\.zip\\)$SQ.*/\\1/p" <err >zip_path &&
+	zip_path=$(cat zip_path) &&
+	test -n "$zip_path" &&
+	unzip -v "$zip_path" &&
+	folder=${zip_path%.zip} &&
+	test_path_is_missing "$folder" &&
+	unzip -p "$zip_path" diagnostics.log >out &&
+	test_file_not_empty out
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH v6 5/7] scalar diagnose: include disk space information
  2022-05-21 15:08         ` [PATCH v6 " Johannes Schindelin via GitGitGadget
                             ` (3 preceding siblings ...)
  2022-05-21 15:08           ` [PATCH v6 4/7] Implement `scalar diagnose` Johannes Schindelin via GitGitGadget
@ 2022-05-21 15:08           ` Johannes Schindelin via GitGitGadget
  2022-05-21 15:08           ` [PATCH v6 6/7] scalar: teach `diagnose` to gather packfile info Matthew John Cheetham via GitGitGadget
  2022-05-21 15:08           ` [PATCH v6 7/7] scalar: teach `diagnose` to gather loose objects information Matthew John Cheetham via GitGitGadget
  6 siblings, 0 replies; 109+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2022-05-21 15:08 UTC (permalink / raw)
  To: git
  Cc: René Scharfe, Taylor Blau, Derrick Stolee, Elijah Newren,
	rsbecker, Ævar Arnfjörð Bjarmason,
	Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

When analyzing problems with large worktrees/repositories, it is useful
to know how close to a "full disk" situation Scalar/Git operates. Let's
include this information.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 contrib/scalar/scalar.c          | 53 ++++++++++++++++++++++++++++++++
 contrib/scalar/t/t9099-scalar.sh |  1 +
 2 files changed, 54 insertions(+)

diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
index 53213f9a3b9..0a9e25a57f8 100644
--- a/contrib/scalar/scalar.c
+++ b/contrib/scalar/scalar.c
@@ -303,6 +303,58 @@ static int add_directory_to_archiver(struct strvec *archiver_args,
 	return res;
 }
 
+#ifndef WIN32
+#include <sys/statvfs.h>
+#endif
+
+static int get_disk_info(struct strbuf *out)
+{
+#ifdef WIN32
+	struct strbuf buf = STRBUF_INIT;
+	char volume_name[MAX_PATH], fs_name[MAX_PATH];
+	DWORD serial_number, component_length, flags;
+	ULARGE_INTEGER avail2caller, total, avail;
+
+	strbuf_realpath(&buf, ".", 1);
+	if (!GetDiskFreeSpaceExA(buf.buf, &avail2caller, &total, &avail)) {
+		error(_("could not determine free disk size for '%s'"),
+		      buf.buf);
+		strbuf_release(&buf);
+		return -1;
+	}
+
+	strbuf_setlen(&buf, offset_1st_component(buf.buf));
+	if (!GetVolumeInformationA(buf.buf, volume_name, sizeof(volume_name),
+				   &serial_number, &component_length, &flags,
+				   fs_name, sizeof(fs_name))) {
+		error(_("could not get info for '%s'"), buf.buf);
+		strbuf_release(&buf);
+		return -1;
+	}
+	strbuf_addf(out, "Available space on '%s': ", buf.buf);
+	strbuf_humanise_bytes(out, avail2caller.QuadPart);
+	strbuf_addch(out, '\n');
+	strbuf_release(&buf);
+#else
+	struct strbuf buf = STRBUF_INIT;
+	struct statvfs stat;
+
+	strbuf_realpath(&buf, ".", 1);
+	if (statvfs(buf.buf, &stat) < 0) {
+		error_errno(_("could not determine free disk size for '%s'"),
+			    buf.buf);
+		strbuf_release(&buf);
+		return -1;
+	}
+
+	strbuf_addf(out, "Available space on '%s': ", buf.buf);
+	strbuf_humanise_bytes(out, st_mult(stat.f_bsize, stat.f_bavail));
+	strbuf_addf(out, " (mount flags 0x%lx)\n", stat.f_flag);
+	strbuf_release(&buf);
+#endif
+	return 0;
+}
+
 /* printf-style interface, expects `<key>=<value>` argument */
 static int set_config(const char *fmt, ...)
 {
@@ -599,6 +651,7 @@ static int cmd_diagnose(int argc, const char **argv)
 	get_version_info(&buf, 1);
 
 	strbuf_addf(&buf, "Enlistment root: %s\n", the_repository->worktree);
+	get_disk_info(&buf);
 	write_or_die(stdout_fd, buf.buf, buf.len);
 	strvec_pushf(&archiver_args,
 		     "--add-virtual-file=diagnostics.log:%.*s",
diff --git a/contrib/scalar/t/t9099-scalar.sh b/contrib/scalar/t/t9099-scalar.sh
index 6802d317258..934b2485d91 100755
--- a/contrib/scalar/t/t9099-scalar.sh
+++ b/contrib/scalar/t/t9099-scalar.sh
@@ -94,6 +94,7 @@ SQ="'"
 test_expect_success UNZIP 'scalar diagnose' '
 	scalar clone "file://$(pwd)" cloned --single-branch &&
 	scalar diagnose cloned >out 2>err &&
+	grep "Available space" out &&
 	sed -n "s/.*$SQ\\(.*\\.zip\\)$SQ.*/\\1/p" <err >zip_path &&
 	zip_path=$(cat zip_path) &&
 	test -n "$zip_path" &&
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH v6 6/7] scalar: teach `diagnose` to gather packfile info
  2022-05-21 15:08         ` [PATCH v6 " Johannes Schindelin via GitGitGadget
                             ` (4 preceding siblings ...)
  2022-05-21 15:08           ` [PATCH v6 5/7] scalar diagnose: include disk space information Johannes Schindelin via GitGitGadget
@ 2022-05-21 15:08           ` Matthew John Cheetham via GitGitGadget
  2022-05-21 15:08           ` [PATCH v6 7/7] scalar: teach `diagnose` to gather loose objects information Matthew John Cheetham via GitGitGadget
  6 siblings, 0 replies; 109+ messages in thread
From: Matthew John Cheetham via GitGitGadget @ 2022-05-21 15:08 UTC (permalink / raw)
  To: git
  Cc: René Scharfe, Taylor Blau, Derrick Stolee, Elijah Newren,
	rsbecker, Ævar Arnfjörð Bjarmason,
	Johannes Schindelin, Matthew John Cheetham

From: Matthew John Cheetham <mjcheetham@outlook.com>

It's helpful to see if there are other crud files in the pack
directory. Let's teach the `scalar diagnose` command to gather
file size information about pack files.

While at it, also enumerate the pack files in the alternate
object directories, if any are registered.

Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 contrib/scalar/scalar.c          | 30 ++++++++++++++++++++++++++++++
 contrib/scalar/t/t9099-scalar.sh |  6 +++++-
 2 files changed, 35 insertions(+), 1 deletion(-)

diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
index 0a9e25a57f8..d302c27e114 100644
--- a/contrib/scalar/scalar.c
+++ b/contrib/scalar/scalar.c
@@ -12,6 +12,7 @@
 #include "packfile.h"
 #include "help.h"
 #include "archive.h"
+#include "object-store.h"
 
 /*
  * Remove the deepest subdirectory in the provided path string. Path must not
@@ -595,6 +596,29 @@ cleanup:
 	return res;
 }
 
+static void dir_file_stats_objects(const char *full_path, size_t full_path_len,
+				   const char *file_name, void *data)
+{
+	struct strbuf *buf = data;
+	struct stat st;
+
+	if (!stat(full_path, &st))
+		strbuf_addf(buf, "%-70s %16" PRIuMAX "\n", file_name,
+			    (uintmax_t)st.st_size);
+}
+
+static int dir_file_stats(struct object_directory *object_dir, void *data)
+{
+	struct strbuf *buf = data;
+
+	strbuf_addf(buf, "Contents of %s:\n", object_dir->path);
+
+	for_each_file_in_pack_dir(object_dir->path, dir_file_stats_objects,
+				  data);
+
+	return 0;
+}
+
 static int cmd_diagnose(int argc, const char **argv)
 {
 	struct option options[] = {
@@ -657,6 +681,12 @@ static int cmd_diagnose(int argc, const char **argv)
 		     "--add-virtual-file=diagnostics.log:%.*s",
 		     (int)buf.len, buf.buf);
 
+	strbuf_reset(&buf);
+	strbuf_addstr(&buf, "--add-virtual-file=packs-local.txt:");
+	dir_file_stats(the_repository->objects->odb, &buf);
+	foreach_alt_odb(dir_file_stats, &buf);
+	strvec_push(&archiver_args, buf.buf);
+
 	if ((res = add_directory_to_archiver(&archiver_args, ".git", 0)) ||
 	    (res = add_directory_to_archiver(&archiver_args, ".git/hooks", 0)) ||
 	    (res = add_directory_to_archiver(&archiver_args, ".git/info", 0)) ||
diff --git a/contrib/scalar/t/t9099-scalar.sh b/contrib/scalar/t/t9099-scalar.sh
index 934b2485d91..3dd5650cceb 100755
--- a/contrib/scalar/t/t9099-scalar.sh
+++ b/contrib/scalar/t/t9099-scalar.sh
@@ -93,6 +93,8 @@ test_expect_success '`scalar [...] <dir>` errors out when dir is missing' '
 SQ="'"
 test_expect_success UNZIP 'scalar diagnose' '
 	scalar clone "file://$(pwd)" cloned --single-branch &&
+	git repack &&
+	echo "$(pwd)/.git/objects/" >>cloned/src/.git/objects/info/alternates &&
 	scalar diagnose cloned >out 2>err &&
 	grep "Available space" out &&
 	sed -n "s/.*$SQ\\(.*\\.zip\\)$SQ.*/\\1/p" <err >zip_path &&
@@ -102,7 +104,9 @@ test_expect_success UNZIP 'scalar diagnose' '
 	folder=${zip_path%.zip} &&
 	test_path_is_missing "$folder" &&
 	unzip -p "$zip_path" diagnostics.log >out &&
-	test_file_not_empty out
+	test_file_not_empty out &&
+	unzip -p "$zip_path" packs-local.txt >out &&
+	grep "$(pwd)/.git/objects" out
 '
 
 test_done
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH v6 7/7] scalar: teach `diagnose` to gather loose objects information
  2022-05-21 15:08         ` [PATCH v6 " Johannes Schindelin via GitGitGadget
                             ` (5 preceding siblings ...)
  2022-05-21 15:08           ` [PATCH v6 6/7] scalar: teach `diagnose` to gather packfile info Matthew John Cheetham via GitGitGadget
@ 2022-05-21 15:08           ` Matthew John Cheetham via GitGitGadget
  6 siblings, 0 replies; 109+ messages in thread
From: Matthew John Cheetham via GitGitGadget @ 2022-05-21 15:08 UTC (permalink / raw)
  To: git
  Cc: René Scharfe, Taylor Blau, Derrick Stolee, Elijah Newren,
	rsbecker, Ævar Arnfjörð Bjarmason,
	Johannes Schindelin, Matthew John Cheetham

From: Matthew John Cheetham <mjcheetham@outlook.com>

When operating at the scale that Scalar wants to support, certain data
shapes are more likely to cause undesirable performance issues, such as
large numbers of loose objects.

By including statistics about this, `scalar diagnose` now makes it
easier to identify such scenarios.

Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 contrib/scalar/scalar.c          | 59 ++++++++++++++++++++++++++++++++
 contrib/scalar/t/t9099-scalar.sh |  5 ++-
 2 files changed, 63 insertions(+), 1 deletion(-)

diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
index d302c27e114..0c278681758 100644
--- a/contrib/scalar/scalar.c
+++ b/contrib/scalar/scalar.c
@@ -619,6 +619,60 @@ static int dir_file_stats(struct object_directory *object_dir, void *data)
 	return 0;
 }
 
+static int count_files(char *path)
+{
+	DIR *dir = opendir(path);
+	struct dirent *e;
+	int count = 0;
+
+	if (!dir)
+		return 0;
+
+	while ((e = readdir(dir)) != NULL)
+		if (!is_dot_or_dotdot(e->d_name) && e->d_type == DT_REG)
+			count++;
+
+	closedir(dir);
+	return count;
+}
+
+static void loose_objs_stats(struct strbuf *buf, const char *path)
+{
+	DIR *dir = opendir(path);
+	struct dirent *e;
+	int count;
+	int total = 0;
+	unsigned char c;
+	struct strbuf count_path = STRBUF_INIT;
+	size_t base_path_len;
+
+	if (!dir)
+		return;
+
+	strbuf_addstr(buf, "Object directory stats for ");
+	strbuf_add_absolute_path(buf, path);
+	strbuf_addstr(buf, ":\n");
+
+	strbuf_add_absolute_path(&count_path, path);
+	strbuf_addch(&count_path, '/');
+	base_path_len = count_path.len;
+
+	while ((e = readdir(dir)) != NULL)
+		if (!is_dot_or_dotdot(e->d_name) &&
+		    e->d_type == DT_DIR && strlen(e->d_name) == 2 &&
+		    !hex_to_bytes(&c, e->d_name, 1)) {
+			strbuf_setlen(&count_path, base_path_len);
+			strbuf_addstr(&count_path, e->d_name);
+			total += (count = count_files(count_path.buf));
+			strbuf_addf(buf, "%s : %7d files\n", e->d_name, count);
+		}
+
+	strbuf_addf(buf, "Total: %d loose objects", total);
+
+	strbuf_release(&count_path);
+	closedir(dir);
+}
+
 static int cmd_diagnose(int argc, const char **argv)
 {
 	struct option options[] = {
@@ -687,6 +741,11 @@ static int cmd_diagnose(int argc, const char **argv)
 	foreach_alt_odb(dir_file_stats, &buf);
 	strvec_push(&archiver_args, buf.buf);
 
+	strbuf_reset(&buf);
+	strbuf_addstr(&buf, "--add-virtual-file=objects-local.txt:");
+	loose_objs_stats(&buf, ".git/objects");
+	strvec_push(&archiver_args, buf.buf);
+
 	if ((res = add_directory_to_archiver(&archiver_args, ".git", 0)) ||
 	    (res = add_directory_to_archiver(&archiver_args, ".git/hooks", 0)) ||
 	    (res = add_directory_to_archiver(&archiver_args, ".git/info", 0)) ||
diff --git a/contrib/scalar/t/t9099-scalar.sh b/contrib/scalar/t/t9099-scalar.sh
index 3dd5650cceb..72023a1ca1d 100755
--- a/contrib/scalar/t/t9099-scalar.sh
+++ b/contrib/scalar/t/t9099-scalar.sh
@@ -95,6 +95,7 @@ test_expect_success UNZIP 'scalar diagnose' '
 	scalar clone "file://$(pwd)" cloned --single-branch &&
 	git repack &&
 	echo "$(pwd)/.git/objects/" >>cloned/src/.git/objects/info/alternates &&
+	test_commit -C cloned/src loose &&
 	scalar diagnose cloned >out 2>err &&
 	grep "Available space" out &&
 	sed -n "s/.*$SQ\\(.*\\.zip\\)$SQ.*/\\1/p" <err >zip_path &&
@@ -106,7 +107,9 @@ test_expect_success UNZIP 'scalar diagnose' '
 	unzip -p "$zip_path" diagnostics.log >out &&
 	test_file_not_empty out &&
 	unzip -p "$zip_path" packs-local.txt >out &&
-	grep "$(pwd)/.git/objects" out
+	grep "$(pwd)/.git/objects" out &&
+	unzip -p "$zip_path" objects-local.txt >out &&
+	grep "^Total: [1-9]" out
 '
 
 test_done
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 3/7] scalar: validate the optional enlistment argument
  2022-05-21  9:54                 ` Ævar Arnfjörð Bjarmason
@ 2022-05-22  5:50                   ` Junio C Hamano
  2022-05-24 12:25                     ` Johannes Schindelin
  0 siblings, 1 reply; 109+ messages in thread
From: Junio C Hamano @ 2022-05-22  5:50 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Johannes Schindelin, Johannes Schindelin via GitGitGadget, git,
	René Scharfe, Taylor Blau, Derrick Stolee, Elijah Newren

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

>> Scalar is not (yet?) a Git command.
>
> "test-tool" isn't "git" either, so I think this argument is a
> non-starter.
>
> As the documentation for "test_must_fail" notes the distinction is
> whether something is "system-supplied". I.e. we're not going to test
> whether "grep" segfaults, but we should test our own code to see if it
> segfaults.
>
> The scalar code is code we ship and test, so we should use the helper
> that doesn't hide a segfault.
>
> I don't understand why you wouldn't think that's the obvious fix here,
> adding "scalar" to that whitelist is a one-line fix, and clearly yields
> a more useful end result than a test silently hiding segfaults.

FWIW, I don't, either.


^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 3/7] scalar: validate the optional enlistment argument
  2022-05-22  5:50                   ` Junio C Hamano
@ 2022-05-24 12:25                     ` Johannes Schindelin
  2022-05-24 18:11                       ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 109+ messages in thread
From: Johannes Schindelin @ 2022-05-24 12:25 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Ævar Arnfjörð Bjarmason,
	Johannes Schindelin via GitGitGadget, git, René Scharfe,
	Taylor Blau, Derrick Stolee, Elijah Newren

[-- Attachment #1: Type: text/plain, Size: 1252 bytes --]

Hi Junio,

On Sat, 21 May 2022, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>
> >> Scalar is not (yet?) a Git command.
> >
> > "test-tool" isn't "git" either, so I think this argument is a
> > non-starter.
> >
> > As the documentation for "test_must_fail" notes the distinction is
> > whether something is "system-supplied". I.e. we're not going to test
> > whether "grep" segfaults, but we should test our own code to see if it
> > segfaults.
> >
> > The scalar code is code we ship and test, so we should use the helper
> > that doesn't hide a segfault.
> >
> > I don't understand why you wouldn't think that's the obvious fix here,
> > adding "scalar" to that whitelist is a one-line fix, and clearly yields
> > a more useful end result than a test silently hiding segfaults.
>
> FWIW, I don't, either.

Because we are still talking about code that lives as much encapsulated
inside `contrib/scalar/` as possible.

The `! scalar` call is in `contrib/scalar/t/t9099-scalar.sh`.

To make it work with Git's test suite, you would have to bleed an
implementation detail of something inside `contrib/` into
`t/test-lib-functions.sh`.

Not what we want, at this stage.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 3/7] scalar: validate the optional enlistment argument
  2022-05-24 12:25                     ` Johannes Schindelin
@ 2022-05-24 18:11                       ` Ævar Arnfjörð Bjarmason
  2022-05-24 19:29                         ` Junio C Hamano
  0 siblings, 1 reply; 109+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-05-24 18:11 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Junio C Hamano, Johannes Schindelin via GitGitGadget, git,
	René Scharfe, Taylor Blau, Derrick Stolee, Elijah Newren


On Tue, May 24 2022, Johannes Schindelin wrote:

> Hi Junio,
>
> On Sat, 21 May 2022, Junio C Hamano wrote:
>
>> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>>
>> >> Scalar is not (yet?) a Git command.
>> >
>> > "test-tool" isn't "git" either, so I think this argument is a
>> > non-starter.
>> >
>> > As the documentation for "test_must_fail" notes the distinction is
>> > whether something is "system-supplied". I.e. we're not going to test
>> > whether "grep" segfaults, but we should test our own code to see if it
>> > segfaults.
>> >
>> > The scalar code is code we ship and test, so we should use the helper
>> > that doesn't hide a segfault.
>> >
>> > I don't understand why you wouldn't think that's the obvious fix here,
>> > adding "scalar" to that whitelist is a one-line fix, and clearly yields
>> > a more useful end result than a test silently hiding segfaults.
>>
>> FWIW, I don't, either.
>
> Because we are still talking about code that lives as much encapsulated
> inside `contrib/scalar/` as possible.
>
> The `! scalar` call is in `contrib/scalar/t/t9099-scalar.sh`.
>
> To make it work with Git's test suite, you would have to bleed an
> implementation detail of something inside `contrib/` into
> `t/test-lib-functions.sh`.

The "scalar" command is already built by the top-level Makefile, so I
don't think the distinction you're trying to maintain here even exists
in practice.

I.e. if we ran with this strict reasoning then surely "scalar" belongs
on there just as much as "test-tool" does.

Both are built by our main build process, and thus should have
corresponding adjustments in our main test code, just as is already the
case for both "git" and "test-tool".

But even if that wasn't the case I'd still be of the view that we should
add "scalar" to that list.

It's just a matter of potential time sinks in the future. If we
introduce a hidden segfault in the scalar code and don't notice for some
time because we're using that test pattern that's going to suck, and
likely to waste a lot of time. We might even ship a broken command to
users.

Whereas having "scalar" on that list is going to be a relatively easy
matter of grepping and doing some boilerplate changes if and when we
ever "git rm" it entirely, or "promote it" from contrib or whatever.

I also think that just getting rid of that whitelist entirely is an
acceptable solution. Perhaps it's just being overzealous in forbidding
everything except "git", we should still not use it for the likes of
"grep", but we could just leave that to the documentation.

But I suspect Junio would disagree with that, so in lieu of that ...

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 3/7] scalar: validate the optional enlistment argument
  2022-05-24 18:11                       ` Ævar Arnfjörð Bjarmason
@ 2022-05-24 19:29                         ` Junio C Hamano
  0 siblings, 0 replies; 109+ messages in thread
From: Junio C Hamano @ 2022-05-24 19:29 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Johannes Schindelin, Johannes Schindelin via GitGitGadget, git,
	René Scharfe, Taylor Blau, Derrick Stolee, Elijah Newren

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> Both are built by our main build process, and thus should have
> corresponding adjustments in our main test code, just as is already the
> case for both "git" and "test-tool".
>
> But even if that wasn't the case I'd still be of the view that we should
> add "scalar" to that list.
>
> It's just a matter of potential time sinks in the future. If we
> introduce a hidden segfault in the scalar code and don't notice for some
> time because we're using that test pattern that's going to suck, and
> likely to waste a lot of time. We might even ship a broken command to
> users.
>
> Whereas having "scalar" on that list is going to be a relatively easy
> matter of grepping and doing some boilerplate changes if and when we
> ever "git rm" it entirely, or "promote it" from contrib or whatever.

In addition, it already is an actual time sink that causes us send a
lot more bytes back and forth than the number of bytes necessary to
send a reroll that adds one liner to the same step.

> I also think that just getting rid of that whitelist entirely is an
> acceptable solution. Perhaps it's just being overzealous in forbidding
> everything except "git", we should still not use it for the likes of
> "grep", but we could just leave that to the documentation.

It indeed is tempting entry into a slippery slope, and I'd see it as
a change bigger than we could comfortably make as a "while at it"
change.

We can stop arguing and instead send in a reroll that squashes in
something like this, which shouldn't be controversial, I would say.

 t/test-lib-functions.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git i/t/test-lib-functions.sh w/t/test-lib-functions.sh
index 93c03380d4..8899eaabed 100644
--- i/t/test-lib-functions.sh
+++ w/t/test-lib-functions.sh
@@ -1106,7 +1106,7 @@ test_must_fail_acceptable () {
 	fi
 
 	case "$1" in
-	git|__git*|test-tool|test_terminal)
+	git|__git*|scalar|test-tool|test_terminal)
 		return 0
 		;;
 	*)




^ permalink raw reply	[flat|nested] 109+ messages in thread

end of thread, other threads:[~2022-05-24 19:29 UTC | newest]

Thread overview: 109+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-01-26  8:41 [PATCH 0/5] scalar: implement the subcommand "diagnose" Johannes Schindelin via GitGitGadget
2022-01-26  8:41 ` [PATCH 1/5] Implement `scalar diagnose` Johannes Schindelin via GitGitGadget
2022-01-26  9:34   ` René Scharfe
2022-01-26 22:20     ` Taylor Blau
2022-02-06 21:34       ` Johannes Schindelin
2022-01-27 19:38   ` Elijah Newren
2022-01-26  8:41 ` [PATCH 2/5] scalar diagnose: include disk space information Johannes Schindelin via GitGitGadget
2022-01-26  8:41 ` [PATCH 3/5] scalar: teach `diagnose` to gather packfile info Matthew John Cheetham via GitGitGadget
2022-01-26 22:43   ` Taylor Blau
2022-01-27 15:14     ` Derrick Stolee
2022-02-06 21:38       ` Johannes Schindelin
2022-01-26  8:41 ` [PATCH 4/5] scalar: teach `diagnose` to gather loose objects information Matthew John Cheetham via GitGitGadget
2022-01-26 22:50   ` Taylor Blau
2022-01-27 15:17     ` Derrick Stolee
2022-01-27 18:59   ` Elijah Newren
2022-02-06 21:25     ` Johannes Schindelin
2022-01-26  8:41 ` [PATCH 5/5] scalar diagnose: show a spinner while staging content Johannes Schindelin via GitGitGadget
2022-01-27 15:19 ` [PATCH 0/5] scalar: implement the subcommand "diagnose" Derrick Stolee
2022-02-06 21:13   ` Johannes Schindelin
2022-02-06 22:39 ` [PATCH v2 0/6] " Johannes Schindelin via GitGitGadget
2022-02-06 22:39   ` [PATCH v2 1/6] archive: optionally add "virtual" files Johannes Schindelin via GitGitGadget
2022-02-07 19:55     ` René Scharfe
2022-02-07 23:30       ` Junio C Hamano
2022-02-08 13:12         ` Johannes Schindelin
2022-02-08 17:44           ` Junio C Hamano
2022-02-08 20:58             ` René Scharfe
2022-02-09 22:48               ` Junio C Hamano
2022-02-10 19:10                 ` René Scharfe
2022-02-10 19:23                   ` Junio C Hamano
2022-02-11 19:16                     ` René Scharfe
2022-02-11 21:27                       ` Junio C Hamano
2022-02-12  9:12                         ` René Scharfe
2022-02-13  6:25                           ` Junio C Hamano
2022-02-13  9:02                             ` René Scharfe
2022-02-14 17:22                               ` Junio C Hamano
2022-02-08 12:54       ` Johannes Schindelin
2022-02-06 22:39   ` [PATCH v2 2/6] scalar: validate the optional enlistment argument Johannes Schindelin via GitGitGadget
2022-02-06 22:39   ` [PATCH v2 3/6] Implement `scalar diagnose` Johannes Schindelin via GitGitGadget
2022-02-07 19:55     ` René Scharfe
2022-02-08 12:08       ` Johannes Schindelin
2022-02-06 22:39   ` [PATCH v2 4/6] scalar diagnose: include disk space information Johannes Schindelin via GitGitGadget
2022-02-06 22:39   ` [PATCH v2 5/6] scalar: teach `diagnose` to gather packfile info Matthew John Cheetham via GitGitGadget
2022-02-06 22:39   ` [PATCH v2 6/6] scalar: teach `diagnose` to gather loose objects information Matthew John Cheetham via GitGitGadget
2022-05-04 15:25   ` [PATCH v3 0/7] scalar: implement the subcommand "diagnose" Johannes Schindelin via GitGitGadget
2022-05-04 15:25     ` [PATCH v3 1/7] archive: optionally add "virtual" files Johannes Schindelin via GitGitGadget
2022-05-04 15:25     ` [PATCH v3 2/7] archive --add-file-with-contents: allow paths containing colons Johannes Schindelin via GitGitGadget
2022-05-07  2:06       ` Elijah Newren
2022-05-09 21:04         ` Johannes Schindelin
2022-05-04 15:25     ` [PATCH v3 3/7] scalar: validate the optional enlistment argument Johannes Schindelin via GitGitGadget
2022-05-04 15:25     ` [PATCH v3 4/7] Implement `scalar diagnose` Johannes Schindelin via GitGitGadget
2022-05-04 15:25     ` [PATCH v3 5/7] scalar diagnose: include disk space information Johannes Schindelin via GitGitGadget
2022-05-04 15:25     ` [PATCH v3 6/7] scalar: teach `diagnose` to gather packfile info Matthew John Cheetham via GitGitGadget
2022-05-04 15:25     ` [PATCH v3 7/7] scalar: teach `diagnose` to gather loose objects information Matthew John Cheetham via GitGitGadget
2022-05-07  2:23     ` [PATCH v3 0/7] scalar: implement the subcommand "diagnose" Elijah Newren
2022-05-10 19:26     ` [PATCH v4 " Johannes Schindelin via GitGitGadget
2022-05-10 19:26       ` [PATCH v4 1/7] archive: optionally add "virtual" files Johannes Schindelin via GitGitGadget
2022-05-10 21:48         ` Junio C Hamano
2022-05-10 22:06           ` rsbecker
2022-05-10 23:21             ` Junio C Hamano
2022-05-11 16:14               ` René Scharfe
2022-05-11 19:27                 ` Junio C Hamano
2022-05-12 16:16                   ` René Scharfe
2022-05-12 18:15                     ` Junio C Hamano
2022-05-12 21:31                       ` Junio C Hamano
2022-05-14  7:06                         ` René Scharfe
2022-05-12 22:31           ` [PATCH] fixup! " Junio C Hamano
2022-05-10 19:26       ` [PATCH v4 2/7] archive --add-file-with-contents: allow paths containing colons Johannes Schindelin via GitGitGadget
2022-05-10 21:56         ` Junio C Hamano
2022-05-10 22:23           ` rsbecker
2022-05-19 18:12             ` Johannes Schindelin
2022-05-19 18:09           ` Johannes Schindelin
2022-05-19 18:44             ` Junio C Hamano
2022-05-10 19:27       ` [PATCH v4 3/7] scalar: validate the optional enlistment argument Johannes Schindelin via GitGitGadget
2022-05-17 14:51         ` Ævar Arnfjörð Bjarmason
2022-05-18 17:35           ` Junio C Hamano
2022-05-20  7:30             ` Ævar Arnfjörð Bjarmason
2022-05-20 15:55               ` Johannes Schindelin
2022-05-21  9:54                 ` Ævar Arnfjörð Bjarmason
2022-05-22  5:50                   ` Junio C Hamano
2022-05-24 12:25                     ` Johannes Schindelin
2022-05-24 18:11                       ` Ævar Arnfjörð Bjarmason
2022-05-24 19:29                         ` Junio C Hamano
2022-05-10 19:27       ` [PATCH v4 4/7] Implement `scalar diagnose` Johannes Schindelin via GitGitGadget
2022-05-17 14:53         ` Ævar Arnfjörð Bjarmason
2022-05-10 19:27       ` [PATCH v4 5/7] scalar diagnose: include disk space information Johannes Schindelin via GitGitGadget
2022-05-10 19:27       ` [PATCH v4 6/7] scalar: teach `diagnose` to gather packfile info Matthew John Cheetham via GitGitGadget
2022-05-10 19:27       ` [PATCH v4 7/7] scalar: teach `diagnose` to gather loose objects information Matthew John Cheetham via GitGitGadget
2022-05-17 15:03       ` [PATCH v4 0/7] scalar: implement the subcommand "diagnose" Ævar Arnfjörð Bjarmason
2022-05-17 15:28         ` rsbecker
2022-05-19 18:17           ` Johannes Schindelin
2022-05-19 18:17       ` [PATCH v5 " Johannes Schindelin via GitGitGadget
2022-05-19 18:17         ` [PATCH v5 1/7] archive: optionally add "virtual" files Johannes Schindelin via GitGitGadget
2022-05-20 14:41           ` René Scharfe
2022-05-20 16:21             ` Junio C Hamano
2022-05-19 18:17         ` [PATCH v5 2/7] archive --add-file-with-contents: allow paths containing colons Johannes Schindelin via GitGitGadget
2022-05-19 18:17         ` [PATCH v5 3/7] scalar: validate the optional enlistment argument Johannes Schindelin via GitGitGadget
2022-05-19 18:18         ` [PATCH v5 4/7] Implement `scalar diagnose` Johannes Schindelin via GitGitGadget
2022-05-19 18:18         ` [PATCH v5 5/7] scalar diagnose: include disk space information Johannes Schindelin via GitGitGadget
2022-05-19 18:18         ` [PATCH v5 6/7] scalar: teach `diagnose` to gather packfile info Matthew John Cheetham via GitGitGadget
2022-05-19 18:18         ` [PATCH v5 7/7] scalar: teach `diagnose` to gather loose objects information Matthew John Cheetham via GitGitGadget
2022-05-19 19:23         ` [PATCH v5 0/7] scalar: implement the subcommand "diagnose" Junio C Hamano
2022-05-21 15:08         ` [PATCH v6 " Johannes Schindelin via GitGitGadget
2022-05-21 15:08           ` [PATCH v6 1/7] archive: optionally add "virtual" files Johannes Schindelin via GitGitGadget
2022-05-21 15:08           ` [PATCH v6 2/7] archive --add-virtual-file: allow paths containing colons Johannes Schindelin via GitGitGadget
2022-05-21 15:08           ` [PATCH v6 3/7] scalar: validate the optional enlistment argument Johannes Schindelin via GitGitGadget
2022-05-21 15:08           ` [PATCH v6 4/7] Implement `scalar diagnose` Johannes Schindelin via GitGitGadget
2022-05-21 15:08           ` [PATCH v6 5/7] scalar diagnose: include disk space information Johannes Schindelin via GitGitGadget
2022-05-21 15:08           ` [PATCH v6 6/7] scalar: teach `diagnose` to gather packfile info Matthew John Cheetham via GitGitGadget
2022-05-21 15:08           ` [PATCH v6 7/7] scalar: teach `diagnose` to gather loose objects information Matthew John Cheetham via GitGitGadget

Code repositories for project(s) associated with this inbox:

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).