git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / mirror / code / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: git@vger.kernel.org
Cc: Matthew John Cheetham <mjcheetham@outlook.com>
Subject: [PATCH v6+ 7/7] scalar: teach `diagnose` to gather loose objects information
Date: Sat, 28 May 2022 16:11:18 -0700	[thread overview]
Message-ID: <20220528231118.3504387-8-gitster@pobox.com> (raw)
In-Reply-To: <20220528231118.3504387-1-gitster@pobox.com>

From: Matthew John Cheetham <mjcheetham@outlook.com>

When operating at the scale that Scalar wants to support, certain data
shapes are more likely to cause undesirable performance issues, such as
large numbers of loose objects.

By including statistics about this, `scalar diagnose` now makes it
easier to identify such scenarios.

Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 contrib/scalar/scalar.c          | 59 ++++++++++++++++++++++++++++++++
 contrib/scalar/t/t9099-scalar.sh |  5 ++-
 2 files changed, 63 insertions(+), 1 deletion(-)

diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
index f745519038..28176914e5 100644
--- a/contrib/scalar/scalar.c
+++ b/contrib/scalar/scalar.c
@@ -618,6 +618,60 @@ static int dir_file_stats(struct object_directory *object_dir, void *data)
 	return 0;
 }
 
+static int count_files(char *path)
+{
+	DIR *dir = opendir(path);
+	struct dirent *e;
+	int count = 0;
+
+	if (!dir)
+		return 0;
+
+	while ((e = readdir(dir)) != NULL)
+		if (!is_dot_or_dotdot(e->d_name) && e->d_type == DT_REG)
+			count++;
+
+	closedir(dir);
+	return count;
+}
+
+static void loose_objs_stats(struct strbuf *buf, const char *path)
+{
+	DIR *dir = opendir(path);
+	struct dirent *e;
+	int count;
+	int total = 0;
+	unsigned char c;
+	struct strbuf count_path = STRBUF_INIT;
+	size_t base_path_len;
+
+	if (!dir)
+		return;
+
+	strbuf_addstr(buf, "Object directory stats for ");
+	strbuf_add_absolute_path(buf, path);
+	strbuf_addstr(buf, ":\n");
+
+	strbuf_add_absolute_path(&count_path, path);
+	strbuf_addch(&count_path, '/');
+	base_path_len = count_path.len;
+
+	while ((e = readdir(dir)) != NULL)
+		if (!is_dot_or_dotdot(e->d_name) &&
+		    e->d_type == DT_DIR && strlen(e->d_name) == 2 &&
+		    !hex_to_bytes(&c, e->d_name, 1)) {
+			strbuf_setlen(&count_path, base_path_len);
+			strbuf_addstr(&count_path, e->d_name);
+			total += (count = count_files(count_path.buf));
+			strbuf_addf(buf, "%s : %7d files\n", e->d_name, count);
+		}
+
+	strbuf_addf(buf, "Total: %d loose objects", total);
+
+	strbuf_release(&count_path);
+	closedir(dir);
+}
+
 static int cmd_diagnose(int argc, const char **argv)
 {
 	struct option options[] = {
@@ -686,6 +740,11 @@ static int cmd_diagnose(int argc, const char **argv)
 	foreach_alt_odb(dir_file_stats, &buf);
 	strvec_push(&archiver_args, buf.buf);
 
+	strbuf_reset(&buf);
+	strbuf_addstr(&buf, "--add-virtual-file=objects-local.txt:");
+	loose_objs_stats(&buf, ".git/objects");
+	strvec_push(&archiver_args, buf.buf);
+
 	if ((res = add_directory_to_archiver(&archiver_args, ".git", 0)) ||
 	    (res = add_directory_to_archiver(&archiver_args, ".git/hooks", 0)) ||
 	    (res = add_directory_to_archiver(&archiver_args, ".git/info", 0)) ||
diff --git a/contrib/scalar/t/t9099-scalar.sh b/contrib/scalar/t/t9099-scalar.sh
index 2603e2278f..10b1172a8a 100755
--- a/contrib/scalar/t/t9099-scalar.sh
+++ b/contrib/scalar/t/t9099-scalar.sh
@@ -103,6 +103,7 @@ test_expect_success UNZIP 'scalar diagnose' '
 	scalar clone "file://$(pwd)" cloned --single-branch &&
 	git repack &&
 	echo "$(pwd)/.git/objects/" >>cloned/src/.git/objects/info/alternates &&
+	test_commit -C cloned/src loose &&
 	scalar diagnose cloned >out 2>err &&
 	grep "Available space" out &&
 	sed -n "s/.*$SQ\\(.*\\.zip\\)$SQ.*/\\1/p" <err >zip_path &&
@@ -114,7 +115,9 @@ test_expect_success UNZIP 'scalar diagnose' '
 	unzip -p "$zip_path" diagnostics.log >out &&
 	test_file_not_empty out &&
 	unzip -p "$zip_path" packs-local.txt >out &&
-	grep "$(pwd)/.git/objects" out
+	grep "$(pwd)/.git/objects" out &&
+	unzip -p "$zip_path" objects-local.txt >out &&
+	grep "^Total: [1-9]" out
 '
 
 test_done
-- 
2.36.1-385-g60203f3fdb


  parent reply	other threads:[~2022-05-28 23:13 UTC|newest]

Thread overview: 140+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-01-26  8:41 [PATCH 0/5] scalar: implement the subcommand "diagnose" Johannes Schindelin via GitGitGadget
2022-01-26  8:41 ` [PATCH 1/5] Implement `scalar diagnose` Johannes Schindelin via GitGitGadget
2022-01-26  9:34   ` René Scharfe
2022-01-26 22:20     ` Taylor Blau
2022-02-06 21:34       ` Johannes Schindelin
2022-01-27 19:38   ` Elijah Newren
2022-01-26  8:41 ` [PATCH 2/5] scalar diagnose: include disk space information Johannes Schindelin via GitGitGadget
2022-01-26  8:41 ` [PATCH 3/5] scalar: teach `diagnose` to gather packfile info Matthew John Cheetham via GitGitGadget
2022-01-26 22:43   ` Taylor Blau
2022-01-27 15:14     ` Derrick Stolee
2022-02-06 21:38       ` Johannes Schindelin
2022-01-26  8:41 ` [PATCH 4/5] scalar: teach `diagnose` to gather loose objects information Matthew John Cheetham via GitGitGadget
2022-01-26 22:50   ` Taylor Blau
2022-01-27 15:17     ` Derrick Stolee
2022-01-27 18:59   ` Elijah Newren
2022-02-06 21:25     ` Johannes Schindelin
2022-01-26  8:41 ` [PATCH 5/5] scalar diagnose: show a spinner while staging content Johannes Schindelin via GitGitGadget
2022-01-27 15:19 ` [PATCH 0/5] scalar: implement the subcommand "diagnose" Derrick Stolee
2022-02-06 21:13   ` Johannes Schindelin
2022-02-06 22:39 ` [PATCH v2 0/6] " Johannes Schindelin via GitGitGadget
2022-02-06 22:39   ` [PATCH v2 1/6] archive: optionally add "virtual" files Johannes Schindelin via GitGitGadget
2022-02-07 19:55     ` René Scharfe
2022-02-07 23:30       ` Junio C Hamano
2022-02-08 13:12         ` Johannes Schindelin
2022-02-08 17:44           ` Junio C Hamano
2022-02-08 20:58             ` René Scharfe
2022-02-09 22:48               ` Junio C Hamano
2022-02-10 19:10                 ` René Scharfe
2022-02-10 19:23                   ` Junio C Hamano
2022-02-11 19:16                     ` René Scharfe
2022-02-11 21:27                       ` Junio C Hamano
2022-02-12  9:12                         ` René Scharfe
2022-02-13  6:25                           ` Junio C Hamano
2022-02-13  9:02                             ` René Scharfe
2022-02-14 17:22                               ` Junio C Hamano
2022-02-08 12:54       ` Johannes Schindelin
2022-02-06 22:39   ` [PATCH v2 2/6] scalar: validate the optional enlistment argument Johannes Schindelin via GitGitGadget
2022-02-06 22:39   ` [PATCH v2 3/6] Implement `scalar diagnose` Johannes Schindelin via GitGitGadget
2022-02-07 19:55     ` René Scharfe
2022-02-08 12:08       ` Johannes Schindelin
2022-02-06 22:39   ` [PATCH v2 4/6] scalar diagnose: include disk space information Johannes Schindelin via GitGitGadget
2022-02-06 22:39   ` [PATCH v2 5/6] scalar: teach `diagnose` to gather packfile info Matthew John Cheetham via GitGitGadget
2022-02-06 22:39   ` [PATCH v2 6/6] scalar: teach `diagnose` to gather loose objects information Matthew John Cheetham via GitGitGadget
2022-05-04 15:25   ` [PATCH v3 0/7] scalar: implement the subcommand "diagnose" Johannes Schindelin via GitGitGadget
2022-05-04 15:25     ` [PATCH v3 1/7] archive: optionally add "virtual" files Johannes Schindelin via GitGitGadget
2022-05-04 15:25     ` [PATCH v3 2/7] archive --add-file-with-contents: allow paths containing colons Johannes Schindelin via GitGitGadget
2022-05-07  2:06       ` Elijah Newren
2022-05-09 21:04         ` Johannes Schindelin
2022-05-04 15:25     ` [PATCH v3 3/7] scalar: validate the optional enlistment argument Johannes Schindelin via GitGitGadget
2022-05-04 15:25     ` [PATCH v3 4/7] Implement `scalar diagnose` Johannes Schindelin via GitGitGadget
2022-05-04 15:25     ` [PATCH v3 5/7] scalar diagnose: include disk space information Johannes Schindelin via GitGitGadget
2022-05-04 15:25     ` [PATCH v3 6/7] scalar: teach `diagnose` to gather packfile info Matthew John Cheetham via GitGitGadget
2022-05-04 15:25     ` [PATCH v3 7/7] scalar: teach `diagnose` to gather loose objects information Matthew John Cheetham via GitGitGadget
2022-05-07  2:23     ` [PATCH v3 0/7] scalar: implement the subcommand "diagnose" Elijah Newren
2022-05-10 19:26     ` [PATCH v4 " Johannes Schindelin via GitGitGadget
2022-05-10 19:26       ` [PATCH v4 1/7] archive: optionally add "virtual" files Johannes Schindelin via GitGitGadget
2022-05-10 21:48         ` Junio C Hamano
2022-05-10 22:06           ` rsbecker
2022-05-10 23:21             ` Junio C Hamano
2022-05-11 16:14               ` René Scharfe
2022-05-11 19:27                 ` Junio C Hamano
2022-05-12 16:16                   ` René Scharfe
2022-05-12 18:15                     ` Junio C Hamano
2022-05-12 21:31                       ` Junio C Hamano
2022-05-14  7:06                         ` René Scharfe
2022-05-12 22:31           ` [PATCH] fixup! " Junio C Hamano
2022-05-10 19:26       ` [PATCH v4 2/7] archive --add-file-with-contents: allow paths containing colons Johannes Schindelin via GitGitGadget
2022-05-10 21:56         ` Junio C Hamano
2022-05-10 22:23           ` rsbecker
2022-05-19 18:12             ` Johannes Schindelin
2022-05-19 18:09           ` Johannes Schindelin
2022-05-19 18:44             ` Junio C Hamano
2022-05-10 19:27       ` [PATCH v4 3/7] scalar: validate the optional enlistment argument Johannes Schindelin via GitGitGadget
2022-05-17 14:51         ` Ævar Arnfjörð Bjarmason
2022-05-18 17:35           ` Junio C Hamano
2022-05-20  7:30             ` Ævar Arnfjörð Bjarmason
2022-05-20 15:55               ` Johannes Schindelin
2022-05-21  9:54                 ` Ævar Arnfjörð Bjarmason
2022-05-22  5:50                   ` Junio C Hamano
2022-05-24 12:25                     ` Johannes Schindelin
2022-05-24 18:11                       ` Ævar Arnfjörð Bjarmason
2022-05-24 19:29                         ` Junio C Hamano
2022-05-25 10:31                           ` Johannes Schindelin
2022-05-10 19:27       ` [PATCH v4 4/7] Implement `scalar diagnose` Johannes Schindelin via GitGitGadget
2022-05-17 14:53         ` Ævar Arnfjörð Bjarmason
2022-05-10 19:27       ` [PATCH v4 5/7] scalar diagnose: include disk space information Johannes Schindelin via GitGitGadget
2022-05-10 19:27       ` [PATCH v4 6/7] scalar: teach `diagnose` to gather packfile info Matthew John Cheetham via GitGitGadget
2022-05-10 19:27       ` [PATCH v4 7/7] scalar: teach `diagnose` to gather loose objects information Matthew John Cheetham via GitGitGadget
2022-05-17 15:03       ` [PATCH v4 0/7] scalar: implement the subcommand "diagnose" Ævar Arnfjörð Bjarmason
2022-05-17 15:28         ` rsbecker
2022-05-19 18:17           ` Johannes Schindelin
2022-05-19 18:17       ` [PATCH v5 " Johannes Schindelin via GitGitGadget
2022-05-19 18:17         ` [PATCH v5 1/7] archive: optionally add "virtual" files Johannes Schindelin via GitGitGadget
2022-05-20 14:41           ` René Scharfe
2022-05-20 16:21             ` Junio C Hamano
2022-05-19 18:17         ` [PATCH v5 2/7] archive --add-file-with-contents: allow paths containing colons Johannes Schindelin via GitGitGadget
2022-05-19 18:17         ` [PATCH v5 3/7] scalar: validate the optional enlistment argument Johannes Schindelin via GitGitGadget
2022-05-19 18:18         ` [PATCH v5 4/7] Implement `scalar diagnose` Johannes Schindelin via GitGitGadget
2022-05-19 18:18         ` [PATCH v5 5/7] scalar diagnose: include disk space information Johannes Schindelin via GitGitGadget
2022-05-19 18:18         ` [PATCH v5 6/7] scalar: teach `diagnose` to gather packfile info Matthew John Cheetham via GitGitGadget
2022-05-19 18:18         ` [PATCH v5 7/7] scalar: teach `diagnose` to gather loose objects information Matthew John Cheetham via GitGitGadget
2022-05-19 19:23         ` [PATCH v5 0/7] scalar: implement the subcommand "diagnose" Junio C Hamano
2022-05-21 15:08         ` [PATCH v6 " Johannes Schindelin via GitGitGadget
2022-05-21 15:08           ` [PATCH v6 1/7] archive: optionally add "virtual" files Johannes Schindelin via GitGitGadget
2022-05-25 21:11             ` Junio C Hamano
2022-05-26  9:09               ` René Scharfe
2022-05-26 17:10                 ` Junio C Hamano
2022-05-26 18:57                   ` René Scharfe
2022-05-26 20:16                     ` Junio C Hamano
2022-05-27 17:02                       ` René Scharfe
2022-05-27 19:01                         ` Junio C Hamano
2022-05-28  6:57                           ` René Scharfe
2022-05-21 15:08           ` [PATCH v6 2/7] archive --add-virtual-file: allow paths containing colons Johannes Schindelin via GitGitGadget
2022-05-25 20:22             ` Junio C Hamano
2022-05-25 21:42               ` Junio C Hamano
2022-05-25 22:34                 ` Junio C Hamano
2022-05-21 15:08           ` [PATCH v6 3/7] scalar: validate the optional enlistment argument Johannes Schindelin via GitGitGadget
2022-05-21 15:08           ` [PATCH v6 4/7] Implement `scalar diagnose` Johannes Schindelin via GitGitGadget
2022-05-21 15:08           ` [PATCH v6 5/7] scalar diagnose: include disk space information Johannes Schindelin via GitGitGadget
2022-05-21 15:08           ` [PATCH v6 6/7] scalar: teach `diagnose` to gather packfile info Matthew John Cheetham via GitGitGadget
2022-05-21 15:08           ` [PATCH v6 7/7] scalar: teach `diagnose` to gather loose objects information Matthew John Cheetham via GitGitGadget
2022-05-28 23:11           ` [PATCH v6+ 0/7] js/scalar-diagnose rebased Junio C Hamano
2022-05-28 23:11             ` [PATCH v6+ 1/7] archive: optionally add "virtual" files Junio C Hamano
2022-05-28 23:11             ` [PATCH v6+ 2/7] archive --add-virtual-file: allow paths containing colons Junio C Hamano
2022-06-15 18:16               ` Adam Dinwoodie
2022-06-15 20:00                 ` Junio C Hamano
2022-06-15 21:36                   ` Adam Dinwoodie
2022-06-18 20:19                     ` Johannes Schindelin
2022-06-18 22:05                       ` Junio C Hamano
2022-06-20  9:41                       ` Adam Dinwoodie
2022-05-28 23:11             ` [PATCH v6+ 3/7] scalar: validate the optional enlistment argument Junio C Hamano
2022-05-28 23:11             ` [PATCH v6+ 4/7] scalar: implement `scalar diagnose` Junio C Hamano
2022-06-10  2:08               ` Ævar Arnfjörð Bjarmason
2022-06-10 16:44                 ` Junio C Hamano
2022-06-10 17:35                   ` Ævar Arnfjörð Bjarmason
2022-05-28 23:11             ` [PATCH v6+ 5/7] scalar diagnose: include disk space information Junio C Hamano
2022-05-28 23:11             ` [PATCH v6+ 6/7] scalar: teach `diagnose` to gather packfile info Junio C Hamano
2022-05-28 23:11             ` Junio C Hamano [this message]
2022-05-30 10:12             ` [PATCH v6+ 0/7] js/scalar-diagnose rebased Johannes Schindelin
2022-05-30 17:37               ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220528231118.3504387-8-gitster@pobox.com \
    --to=gitster@pobox.com \
    --cc=git@vger.kernel.org \
    --cc=mjcheetham@outlook.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this inbox:

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).