git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / mirror / code / Atom feed
From: Elijah Newren <newren@gmail.com>
To: Matthew John Cheetham via GitGitGadget <gitgitgadget@gmail.com>
Cc: Git Mailing List <git@vger.kernel.org>,
	Johannes Schindelin <johannes.schindelin@gmx.de>,
	Matthew John Cheetham <mjcheetham@outlook.com>
Subject: Re: [PATCH 4/5] scalar: teach `diagnose` to gather loose objects information
Date: Thu, 27 Jan 2022 10:59:08 -0800	[thread overview]
Message-ID: <CABPp-BHeLzinXkX3WgqBNYntJwY_ZAm5D7VdOR7KQahvLOuV=w@mail.gmail.com> (raw)
In-Reply-To: <213f2c94b73f90fc758c2e3872804cf640cb2005.1643186507.git.gitgitgadget@gmail.com>

On Wed, Jan 26, 2022 at 3:37 PM Matthew John Cheetham via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Matthew John Cheetham <mjcheetham@outlook.com>
>
> When operating at the scale that Scalar wants to support, certain data
> shapes are more likely to cause undesirable performance issues, such as
> large numbers or large sizes of loose objects.

Makes sense.

> By including statistics about this, `scalar diagnose` now makes it
> easier to identify such scenarios.
>
> Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
> ---
>  contrib/scalar/scalar.c          | 60 ++++++++++++++++++++++++++++++++
>  contrib/scalar/t/t9099-scalar.sh |  2 ++
>  2 files changed, 62 insertions(+)
>
> diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
> index 690933ffdf3..c0ad4948215 100644
> --- a/contrib/scalar/scalar.c
> +++ b/contrib/scalar/scalar.c
> @@ -686,6 +686,60 @@ static void dir_file_stats(struct strbuf *buf, const char *path)
>         closedir(dir);
>  }
>
> +static int count_files(char *path)
> +{
> +       DIR *dir = opendir(path);
> +       struct dirent *e;
> +       int count = 0;
> +
> +       if (!dir)
> +               return 0;
> +
> +       while ((e = readdir(dir)) != NULL)
> +               if (!is_dot_or_dotdot(e->d_name) && e->d_type == DT_REG)
> +                       count++;
> +
> +       closedir(dir);
> +       return count;
> +}
> +
> +static void loose_objs_stats(struct strbuf *buf, const char *path)
> +{
> +       DIR *dir = opendir(path);
> +       struct dirent *e;
> +       int count;
> +       int total = 0;
> +       unsigned char c;
> +       struct strbuf count_path = STRBUF_INIT;
> +       size_t base_path_len;
> +
> +       if (!dir)
> +               return;
> +
> +       strbuf_addstr(buf, "Object directory stats for ");
> +       strbuf_add_absolute_path(buf, path);
> +       strbuf_addstr(buf, ":\n");
> +
> +       strbuf_add_absolute_path(&count_path, path);
> +       strbuf_addch(&count_path, '/');
> +       base_path_len = count_path.len;
> +
> +       while ((e = readdir(dir)) != NULL)
> +               if (!is_dot_or_dotdot(e->d_name) &&
> +                   e->d_type == DT_DIR && strlen(e->d_name) == 2 &&
> +                   !hex_to_bytes(&c, e->d_name, 1)) {

You only recurse into directories, ignoring individual files.

> +                       strbuf_setlen(&count_path, base_path_len);
> +                       strbuf_addstr(&count_path, e->d_name);
> +                       total += (count = count_files(count_path.buf));
> +                       strbuf_addf(buf, "%s : %7d files\n", e->d_name, count);

This shows the number of files within a directory.

> +               }
> +
> +       strbuf_addf(buf, "Total: %d loose objects", total);

and this shows the total number of files across all the directories.

But the commit message suggested you also wanted to check for large
sizes of loose objects.  Did that get ripped out at some point with
the commit message not being updated, or is it perhaps going to be
included later?

> +
> +       strbuf_release(&count_path);
> +       closedir(dir);
> +}
> +
>  static int cmd_diagnose(int argc, const char **argv)
>  {
>         struct option options[] = {
> @@ -734,6 +788,12 @@ static int cmd_diagnose(int argc, const char **argv)
>         if ((res = stage(tmp_dir.buf, &buf, "packs-local.txt")))
>                 goto diagnose_cleanup;
>
> +       strbuf_reset(&buf);
> +       loose_objs_stats(&buf, ".git/objects");
> +
> +       if ((res = stage(tmp_dir.buf, &buf, "objects-local.txt")))
> +               goto diagnose_cleanup;
> +
>         if ((res = stage_directory(tmp_dir.buf, ".git", 0)) ||
>             (res = stage_directory(tmp_dir.buf, ".git/hooks", 0)) ||
>             (res = stage_directory(tmp_dir.buf, ".git/info", 0)) ||
> diff --git a/contrib/scalar/t/t9099-scalar.sh b/contrib/scalar/t/t9099-scalar.sh
> index b1745851e31..f2ec156d819 100755
> --- a/contrib/scalar/t/t9099-scalar.sh
> +++ b/contrib/scalar/t/t9099-scalar.sh
> @@ -77,6 +77,8 @@ test_expect_success UNZIP 'scalar diagnose' '
>         unzip -p "$zip_path" diagnostics.log >out &&
>         test_file_not_empty out &&
>         unzip -p "$zip_path" packs-local.txt >out &&
> +       test_file_not_empty out &&
> +       unzip -p "$zip_path" objects-local.txt >out &&
>         test_file_not_empty out
>  '
>
> --
> gitgitgadget

  parent reply	other threads:[~2022-01-27 18:59 UTC|newest]

Thread overview: 140+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-01-26  8:41 [PATCH 0/5] scalar: implement the subcommand "diagnose" Johannes Schindelin via GitGitGadget
2022-01-26  8:41 ` [PATCH 1/5] Implement `scalar diagnose` Johannes Schindelin via GitGitGadget
2022-01-26  9:34   ` René Scharfe
2022-01-26 22:20     ` Taylor Blau
2022-02-06 21:34       ` Johannes Schindelin
2022-01-27 19:38   ` Elijah Newren
2022-01-26  8:41 ` [PATCH 2/5] scalar diagnose: include disk space information Johannes Schindelin via GitGitGadget
2022-01-26  8:41 ` [PATCH 3/5] scalar: teach `diagnose` to gather packfile info Matthew John Cheetham via GitGitGadget
2022-01-26 22:43   ` Taylor Blau
2022-01-27 15:14     ` Derrick Stolee
2022-02-06 21:38       ` Johannes Schindelin
2022-01-26  8:41 ` [PATCH 4/5] scalar: teach `diagnose` to gather loose objects information Matthew John Cheetham via GitGitGadget
2022-01-26 22:50   ` Taylor Blau
2022-01-27 15:17     ` Derrick Stolee
2022-01-27 18:59   ` Elijah Newren [this message]
2022-02-06 21:25     ` Johannes Schindelin
2022-01-26  8:41 ` [PATCH 5/5] scalar diagnose: show a spinner while staging content Johannes Schindelin via GitGitGadget
2022-01-27 15:19 ` [PATCH 0/5] scalar: implement the subcommand "diagnose" Derrick Stolee
2022-02-06 21:13   ` Johannes Schindelin
2022-02-06 22:39 ` [PATCH v2 0/6] " Johannes Schindelin via GitGitGadget
2022-02-06 22:39   ` [PATCH v2 1/6] archive: optionally add "virtual" files Johannes Schindelin via GitGitGadget
2022-02-07 19:55     ` René Scharfe
2022-02-07 23:30       ` Junio C Hamano
2022-02-08 13:12         ` Johannes Schindelin
2022-02-08 17:44           ` Junio C Hamano
2022-02-08 20:58             ` René Scharfe
2022-02-09 22:48               ` Junio C Hamano
2022-02-10 19:10                 ` René Scharfe
2022-02-10 19:23                   ` Junio C Hamano
2022-02-11 19:16                     ` René Scharfe
2022-02-11 21:27                       ` Junio C Hamano
2022-02-12  9:12                         ` René Scharfe
2022-02-13  6:25                           ` Junio C Hamano
2022-02-13  9:02                             ` René Scharfe
2022-02-14 17:22                               ` Junio C Hamano
2022-02-08 12:54       ` Johannes Schindelin
2022-02-06 22:39   ` [PATCH v2 2/6] scalar: validate the optional enlistment argument Johannes Schindelin via GitGitGadget
2022-02-06 22:39   ` [PATCH v2 3/6] Implement `scalar diagnose` Johannes Schindelin via GitGitGadget
2022-02-07 19:55     ` René Scharfe
2022-02-08 12:08       ` Johannes Schindelin
2022-02-06 22:39   ` [PATCH v2 4/6] scalar diagnose: include disk space information Johannes Schindelin via GitGitGadget
2022-02-06 22:39   ` [PATCH v2 5/6] scalar: teach `diagnose` to gather packfile info Matthew John Cheetham via GitGitGadget
2022-02-06 22:39   ` [PATCH v2 6/6] scalar: teach `diagnose` to gather loose objects information Matthew John Cheetham via GitGitGadget
2022-05-04 15:25   ` [PATCH v3 0/7] scalar: implement the subcommand "diagnose" Johannes Schindelin via GitGitGadget
2022-05-04 15:25     ` [PATCH v3 1/7] archive: optionally add "virtual" files Johannes Schindelin via GitGitGadget
2022-05-04 15:25     ` [PATCH v3 2/7] archive --add-file-with-contents: allow paths containing colons Johannes Schindelin via GitGitGadget
2022-05-07  2:06       ` Elijah Newren
2022-05-09 21:04         ` Johannes Schindelin
2022-05-04 15:25     ` [PATCH v3 3/7] scalar: validate the optional enlistment argument Johannes Schindelin via GitGitGadget
2022-05-04 15:25     ` [PATCH v3 4/7] Implement `scalar diagnose` Johannes Schindelin via GitGitGadget
2022-05-04 15:25     ` [PATCH v3 5/7] scalar diagnose: include disk space information Johannes Schindelin via GitGitGadget
2022-05-04 15:25     ` [PATCH v3 6/7] scalar: teach `diagnose` to gather packfile info Matthew John Cheetham via GitGitGadget
2022-05-04 15:25     ` [PATCH v3 7/7] scalar: teach `diagnose` to gather loose objects information Matthew John Cheetham via GitGitGadget
2022-05-07  2:23     ` [PATCH v3 0/7] scalar: implement the subcommand "diagnose" Elijah Newren
2022-05-10 19:26     ` [PATCH v4 " Johannes Schindelin via GitGitGadget
2022-05-10 19:26       ` [PATCH v4 1/7] archive: optionally add "virtual" files Johannes Schindelin via GitGitGadget
2022-05-10 21:48         ` Junio C Hamano
2022-05-10 22:06           ` rsbecker
2022-05-10 23:21             ` Junio C Hamano
2022-05-11 16:14               ` René Scharfe
2022-05-11 19:27                 ` Junio C Hamano
2022-05-12 16:16                   ` René Scharfe
2022-05-12 18:15                     ` Junio C Hamano
2022-05-12 21:31                       ` Junio C Hamano
2022-05-14  7:06                         ` René Scharfe
2022-05-12 22:31           ` [PATCH] fixup! " Junio C Hamano
2022-05-10 19:26       ` [PATCH v4 2/7] archive --add-file-with-contents: allow paths containing colons Johannes Schindelin via GitGitGadget
2022-05-10 21:56         ` Junio C Hamano
2022-05-10 22:23           ` rsbecker
2022-05-19 18:12             ` Johannes Schindelin
2022-05-19 18:09           ` Johannes Schindelin
2022-05-19 18:44             ` Junio C Hamano
2022-05-10 19:27       ` [PATCH v4 3/7] scalar: validate the optional enlistment argument Johannes Schindelin via GitGitGadget
2022-05-17 14:51         ` Ævar Arnfjörð Bjarmason
2022-05-18 17:35           ` Junio C Hamano
2022-05-20  7:30             ` Ævar Arnfjörð Bjarmason
2022-05-20 15:55               ` Johannes Schindelin
2022-05-21  9:54                 ` Ævar Arnfjörð Bjarmason
2022-05-22  5:50                   ` Junio C Hamano
2022-05-24 12:25                     ` Johannes Schindelin
2022-05-24 18:11                       ` Ævar Arnfjörð Bjarmason
2022-05-24 19:29                         ` Junio C Hamano
2022-05-25 10:31                           ` Johannes Schindelin
2022-05-10 19:27       ` [PATCH v4 4/7] Implement `scalar diagnose` Johannes Schindelin via GitGitGadget
2022-05-17 14:53         ` Ævar Arnfjörð Bjarmason
2022-05-10 19:27       ` [PATCH v4 5/7] scalar diagnose: include disk space information Johannes Schindelin via GitGitGadget
2022-05-10 19:27       ` [PATCH v4 6/7] scalar: teach `diagnose` to gather packfile info Matthew John Cheetham via GitGitGadget
2022-05-10 19:27       ` [PATCH v4 7/7] scalar: teach `diagnose` to gather loose objects information Matthew John Cheetham via GitGitGadget
2022-05-17 15:03       ` [PATCH v4 0/7] scalar: implement the subcommand "diagnose" Ævar Arnfjörð Bjarmason
2022-05-17 15:28         ` rsbecker
2022-05-19 18:17           ` Johannes Schindelin
2022-05-19 18:17       ` [PATCH v5 " Johannes Schindelin via GitGitGadget
2022-05-19 18:17         ` [PATCH v5 1/7] archive: optionally add "virtual" files Johannes Schindelin via GitGitGadget
2022-05-20 14:41           ` René Scharfe
2022-05-20 16:21             ` Junio C Hamano
2022-05-19 18:17         ` [PATCH v5 2/7] archive --add-file-with-contents: allow paths containing colons Johannes Schindelin via GitGitGadget
2022-05-19 18:17         ` [PATCH v5 3/7] scalar: validate the optional enlistment argument Johannes Schindelin via GitGitGadget
2022-05-19 18:18         ` [PATCH v5 4/7] Implement `scalar diagnose` Johannes Schindelin via GitGitGadget
2022-05-19 18:18         ` [PATCH v5 5/7] scalar diagnose: include disk space information Johannes Schindelin via GitGitGadget
2022-05-19 18:18         ` [PATCH v5 6/7] scalar: teach `diagnose` to gather packfile info Matthew John Cheetham via GitGitGadget
2022-05-19 18:18         ` [PATCH v5 7/7] scalar: teach `diagnose` to gather loose objects information Matthew John Cheetham via GitGitGadget
2022-05-19 19:23         ` [PATCH v5 0/7] scalar: implement the subcommand "diagnose" Junio C Hamano
2022-05-21 15:08         ` [PATCH v6 " Johannes Schindelin via GitGitGadget
2022-05-21 15:08           ` [PATCH v6 1/7] archive: optionally add "virtual" files Johannes Schindelin via GitGitGadget
2022-05-25 21:11             ` Junio C Hamano
2022-05-26  9:09               ` René Scharfe
2022-05-26 17:10                 ` Junio C Hamano
2022-05-26 18:57                   ` René Scharfe
2022-05-26 20:16                     ` Junio C Hamano
2022-05-27 17:02                       ` René Scharfe
2022-05-27 19:01                         ` Junio C Hamano
2022-05-28  6:57                           ` René Scharfe
2022-05-21 15:08           ` [PATCH v6 2/7] archive --add-virtual-file: allow paths containing colons Johannes Schindelin via GitGitGadget
2022-05-25 20:22             ` Junio C Hamano
2022-05-25 21:42               ` Junio C Hamano
2022-05-25 22:34                 ` Junio C Hamano
2022-05-21 15:08           ` [PATCH v6 3/7] scalar: validate the optional enlistment argument Johannes Schindelin via GitGitGadget
2022-05-21 15:08           ` [PATCH v6 4/7] Implement `scalar diagnose` Johannes Schindelin via GitGitGadget
2022-05-21 15:08           ` [PATCH v6 5/7] scalar diagnose: include disk space information Johannes Schindelin via GitGitGadget
2022-05-21 15:08           ` [PATCH v6 6/7] scalar: teach `diagnose` to gather packfile info Matthew John Cheetham via GitGitGadget
2022-05-21 15:08           ` [PATCH v6 7/7] scalar: teach `diagnose` to gather loose objects information Matthew John Cheetham via GitGitGadget
2022-05-28 23:11           ` [PATCH v6+ 0/7] js/scalar-diagnose rebased Junio C Hamano
2022-05-28 23:11             ` [PATCH v6+ 1/7] archive: optionally add "virtual" files Junio C Hamano
2022-05-28 23:11             ` [PATCH v6+ 2/7] archive --add-virtual-file: allow paths containing colons Junio C Hamano
2022-06-15 18:16               ` Adam Dinwoodie
2022-06-15 20:00                 ` Junio C Hamano
2022-06-15 21:36                   ` Adam Dinwoodie
2022-06-18 20:19                     ` Johannes Schindelin
2022-06-18 22:05                       ` Junio C Hamano
2022-06-20  9:41                       ` Adam Dinwoodie
2022-05-28 23:11             ` [PATCH v6+ 3/7] scalar: validate the optional enlistment argument Junio C Hamano
2022-05-28 23:11             ` [PATCH v6+ 4/7] scalar: implement `scalar diagnose` Junio C Hamano
2022-06-10  2:08               ` Ævar Arnfjörð Bjarmason
2022-06-10 16:44                 ` Junio C Hamano
2022-06-10 17:35                   ` Ævar Arnfjörð Bjarmason
2022-05-28 23:11             ` [PATCH v6+ 5/7] scalar diagnose: include disk space information Junio C Hamano
2022-05-28 23:11             ` [PATCH v6+ 6/7] scalar: teach `diagnose` to gather packfile info Junio C Hamano
2022-05-28 23:11             ` [PATCH v6+ 7/7] scalar: teach `diagnose` to gather loose objects information Junio C Hamano
2022-05-30 10:12             ` [PATCH v6+ 0/7] js/scalar-diagnose rebased Johannes Schindelin
2022-05-30 17:37               ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CABPp-BHeLzinXkX3WgqBNYntJwY_ZAm5D7VdOR7KQahvLOuV=w@mail.gmail.com' \
    --to=newren@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=johannes.schindelin@gmx.de \
    --cc=mjcheetham@outlook.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this inbox:

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).