From: Johannes Schindelin <Johannes.Schindelin@gmx.de>
To: Elijah Newren <newren@gmail.com>
Cc: Matthew John Cheetham via GitGitGadget <gitgitgadget@gmail.com>,
Git Mailing List <git@vger.kernel.org>,
Matthew John Cheetham <mjcheetham@outlook.com>
Subject: Re: [PATCH 4/5] scalar: teach `diagnose` to gather loose objects information
Date: Sun, 6 Feb 2022 22:25:33 +0100 (CET) [thread overview]
Message-ID: <nycvar.QRO.7.76.6.2202062214110.347@tvgsbejvaqbjf.bet> (raw)
In-Reply-To: <CABPp-BHeLzinXkX3WgqBNYntJwY_ZAm5D7VdOR7KQahvLOuV=w@mail.gmail.com>
Hi Elijah,
On Thu, 27 Jan 2022, Elijah Newren wrote:
> On Wed, Jan 26, 2022 at 3:37 PM Matthew John Cheetham via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
> >
> > From: Matthew John Cheetham <mjcheetham@outlook.com>
> >
> > When operating at the scale that Scalar wants to support, certain data
> > shapes are more likely to cause undesirable performance issues, such as
> > large numbers or large sizes of loose objects.
>
> Makes sense.
>
> > By including statistics about this, `scalar diagnose` now makes it
> > easier to identify such scenarios.
> >
> > Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
> > ---
> > contrib/scalar/scalar.c | 60 ++++++++++++++++++++++++++++++++
> > contrib/scalar/t/t9099-scalar.sh | 2 ++
> > 2 files changed, 62 insertions(+)
> >
> > diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
> > index 690933ffdf3..c0ad4948215 100644
> > --- a/contrib/scalar/scalar.c
> > +++ b/contrib/scalar/scalar.c
> > @@ -686,6 +686,60 @@ static void dir_file_stats(struct strbuf *buf, const char *path)
> > closedir(dir);
> > }
> >
> > +static int count_files(char *path)
> > +{
> > + DIR *dir = opendir(path);
> > + struct dirent *e;
> > + int count = 0;
> > +
> > + if (!dir)
> > + return 0;
> > +
> > + while ((e = readdir(dir)) != NULL)
> > + if (!is_dot_or_dotdot(e->d_name) && e->d_type == DT_REG)
> > + count++;
> > +
> > + closedir(dir);
> > + return count;
> > +}
> > +
> > +static void loose_objs_stats(struct strbuf *buf, const char *path)
> > +{
> > + DIR *dir = opendir(path);
> > + struct dirent *e;
> > + int count;
> > + int total = 0;
> > + unsigned char c;
> > + struct strbuf count_path = STRBUF_INIT;
> > + size_t base_path_len;
> > +
> > + if (!dir)
> > + return;
> > +
> > + strbuf_addstr(buf, "Object directory stats for ");
> > + strbuf_add_absolute_path(buf, path);
> > + strbuf_addstr(buf, ":\n");
> > +
> > + strbuf_add_absolute_path(&count_path, path);
> > + strbuf_addch(&count_path, '/');
> > + base_path_len = count_path.len;
> > +
> > + while ((e = readdir(dir)) != NULL)
> > + if (!is_dot_or_dotdot(e->d_name) &&
> > + e->d_type == DT_DIR && strlen(e->d_name) == 2 &&
> > + !hex_to_bytes(&c, e->d_name, 1)) {
>
> You only recurse into directories, ignoring individual files.
>
> > + strbuf_setlen(&count_path, base_path_len);
> > + strbuf_addstr(&count_path, e->d_name);
> > + total += (count = count_files(count_path.buf));
> > + strbuf_addf(buf, "%s : %7d files\n", e->d_name, count);
>
> This shows the number of files within a directory.
>
> > + }
> > +
> > + strbuf_addf(buf, "Total: %d loose objects", total);
>
> and this shows the total number of files across all the directories.
>
> But the commit message suggested you also wanted to check for large
> sizes of loose objects. Did that get ripped out at some point with
> the commit message not being updated, or is it perhaps going to be
> included later?
No, there was no plan to include this information later, as the original
.NET implementation of `scalar diagnose` did not provide that information,
either (which I take as a strong sign that we never needed this type of
information to help users, at least not up until this point).
Besides, it would be kind of a difficult thing to say conclusively what
makes a loose file "big". Is it the zlib-compressed size on disk? Or the
unpacked size? Should there be a configurable threshold to determine when
an object is big? Should `core.bigFileThreshold` be co-opted for this?
Together with the fact that there was no need for this information in
practice, it makes me doubt that we should add this type of information. I
actually suspect that _iff_ information of that type would be helpful, a
more complete tool like git-sizer (https://github.com/github/git-sizer/)
would be needed, and I do not really want to subsume git-sizer's
functionality in `scalar diagnose`.
I rephrased the commit message.
Ciao,
Dscho
>
> > +
> > + strbuf_release(&count_path);
> > + closedir(dir);
> > +}
> > +
> > static int cmd_diagnose(int argc, const char **argv)
> > {
> > struct option options[] = {
> > @@ -734,6 +788,12 @@ static int cmd_diagnose(int argc, const char **argv)
> > if ((res = stage(tmp_dir.buf, &buf, "packs-local.txt")))
> > goto diagnose_cleanup;
> >
> > + strbuf_reset(&buf);
> > + loose_objs_stats(&buf, ".git/objects");
> > +
> > + if ((res = stage(tmp_dir.buf, &buf, "objects-local.txt")))
> > + goto diagnose_cleanup;
> > +
> > if ((res = stage_directory(tmp_dir.buf, ".git", 0)) ||
> > (res = stage_directory(tmp_dir.buf, ".git/hooks", 0)) ||
> > (res = stage_directory(tmp_dir.buf, ".git/info", 0)) ||
> > diff --git a/contrib/scalar/t/t9099-scalar.sh b/contrib/scalar/t/t9099-scalar.sh
> > index b1745851e31..f2ec156d819 100755
> > --- a/contrib/scalar/t/t9099-scalar.sh
> > +++ b/contrib/scalar/t/t9099-scalar.sh
> > @@ -77,6 +77,8 @@ test_expect_success UNZIP 'scalar diagnose' '
> > unzip -p "$zip_path" diagnostics.log >out &&
> > test_file_not_empty out &&
> > unzip -p "$zip_path" packs-local.txt >out &&
> > + test_file_not_empty out &&
> > + unzip -p "$zip_path" objects-local.txt >out &&
> > test_file_not_empty out
> > '
> >
> > --
> > gitgitgadget
>
next prev parent reply other threads:[~2022-02-06 21:25 UTC|newest]
Thread overview: 140+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-01-26 8:41 [PATCH 0/5] scalar: implement the subcommand "diagnose" Johannes Schindelin via GitGitGadget
2022-01-26 8:41 ` [PATCH 1/5] Implement `scalar diagnose` Johannes Schindelin via GitGitGadget
2022-01-26 9:34 ` René Scharfe
2022-01-26 22:20 ` Taylor Blau
2022-02-06 21:34 ` Johannes Schindelin
2022-01-27 19:38 ` Elijah Newren
2022-01-26 8:41 ` [PATCH 2/5] scalar diagnose: include disk space information Johannes Schindelin via GitGitGadget
2022-01-26 8:41 ` [PATCH 3/5] scalar: teach `diagnose` to gather packfile info Matthew John Cheetham via GitGitGadget
2022-01-26 22:43 ` Taylor Blau
2022-01-27 15:14 ` Derrick Stolee
2022-02-06 21:38 ` Johannes Schindelin
2022-01-26 8:41 ` [PATCH 4/5] scalar: teach `diagnose` to gather loose objects information Matthew John Cheetham via GitGitGadget
2022-01-26 22:50 ` Taylor Blau
2022-01-27 15:17 ` Derrick Stolee
2022-01-27 18:59 ` Elijah Newren
2022-02-06 21:25 ` Johannes Schindelin [this message]
2022-01-26 8:41 ` [PATCH 5/5] scalar diagnose: show a spinner while staging content Johannes Schindelin via GitGitGadget
2022-01-27 15:19 ` [PATCH 0/5] scalar: implement the subcommand "diagnose" Derrick Stolee
2022-02-06 21:13 ` Johannes Schindelin
2022-02-06 22:39 ` [PATCH v2 0/6] " Johannes Schindelin via GitGitGadget
2022-02-06 22:39 ` [PATCH v2 1/6] archive: optionally add "virtual" files Johannes Schindelin via GitGitGadget
2022-02-07 19:55 ` René Scharfe
2022-02-07 23:30 ` Junio C Hamano
2022-02-08 13:12 ` Johannes Schindelin
2022-02-08 17:44 ` Junio C Hamano
2022-02-08 20:58 ` René Scharfe
2022-02-09 22:48 ` Junio C Hamano
2022-02-10 19:10 ` René Scharfe
2022-02-10 19:23 ` Junio C Hamano
2022-02-11 19:16 ` René Scharfe
2022-02-11 21:27 ` Junio C Hamano
2022-02-12 9:12 ` René Scharfe
2022-02-13 6:25 ` Junio C Hamano
2022-02-13 9:02 ` René Scharfe
2022-02-14 17:22 ` Junio C Hamano
2022-02-08 12:54 ` Johannes Schindelin
2022-02-06 22:39 ` [PATCH v2 2/6] scalar: validate the optional enlistment argument Johannes Schindelin via GitGitGadget
2022-02-06 22:39 ` [PATCH v2 3/6] Implement `scalar diagnose` Johannes Schindelin via GitGitGadget
2022-02-07 19:55 ` René Scharfe
2022-02-08 12:08 ` Johannes Schindelin
2022-02-06 22:39 ` [PATCH v2 4/6] scalar diagnose: include disk space information Johannes Schindelin via GitGitGadget
2022-02-06 22:39 ` [PATCH v2 5/6] scalar: teach `diagnose` to gather packfile info Matthew John Cheetham via GitGitGadget
2022-02-06 22:39 ` [PATCH v2 6/6] scalar: teach `diagnose` to gather loose objects information Matthew John Cheetham via GitGitGadget
2022-05-04 15:25 ` [PATCH v3 0/7] scalar: implement the subcommand "diagnose" Johannes Schindelin via GitGitGadget
2022-05-04 15:25 ` [PATCH v3 1/7] archive: optionally add "virtual" files Johannes Schindelin via GitGitGadget
2022-05-04 15:25 ` [PATCH v3 2/7] archive --add-file-with-contents: allow paths containing colons Johannes Schindelin via GitGitGadget
2022-05-07 2:06 ` Elijah Newren
2022-05-09 21:04 ` Johannes Schindelin
2022-05-04 15:25 ` [PATCH v3 3/7] scalar: validate the optional enlistment argument Johannes Schindelin via GitGitGadget
2022-05-04 15:25 ` [PATCH v3 4/7] Implement `scalar diagnose` Johannes Schindelin via GitGitGadget
2022-05-04 15:25 ` [PATCH v3 5/7] scalar diagnose: include disk space information Johannes Schindelin via GitGitGadget
2022-05-04 15:25 ` [PATCH v3 6/7] scalar: teach `diagnose` to gather packfile info Matthew John Cheetham via GitGitGadget
2022-05-04 15:25 ` [PATCH v3 7/7] scalar: teach `diagnose` to gather loose objects information Matthew John Cheetham via GitGitGadget
2022-05-07 2:23 ` [PATCH v3 0/7] scalar: implement the subcommand "diagnose" Elijah Newren
2022-05-10 19:26 ` [PATCH v4 " Johannes Schindelin via GitGitGadget
2022-05-10 19:26 ` [PATCH v4 1/7] archive: optionally add "virtual" files Johannes Schindelin via GitGitGadget
2022-05-10 21:48 ` Junio C Hamano
2022-05-10 22:06 ` rsbecker
2022-05-10 23:21 ` Junio C Hamano
2022-05-11 16:14 ` René Scharfe
2022-05-11 19:27 ` Junio C Hamano
2022-05-12 16:16 ` René Scharfe
2022-05-12 18:15 ` Junio C Hamano
2022-05-12 21:31 ` Junio C Hamano
2022-05-14 7:06 ` René Scharfe
2022-05-12 22:31 ` [PATCH] fixup! " Junio C Hamano
2022-05-10 19:26 ` [PATCH v4 2/7] archive --add-file-with-contents: allow paths containing colons Johannes Schindelin via GitGitGadget
2022-05-10 21:56 ` Junio C Hamano
2022-05-10 22:23 ` rsbecker
2022-05-19 18:12 ` Johannes Schindelin
2022-05-19 18:09 ` Johannes Schindelin
2022-05-19 18:44 ` Junio C Hamano
2022-05-10 19:27 ` [PATCH v4 3/7] scalar: validate the optional enlistment argument Johannes Schindelin via GitGitGadget
2022-05-17 14:51 ` Ævar Arnfjörð Bjarmason
2022-05-18 17:35 ` Junio C Hamano
2022-05-20 7:30 ` Ævar Arnfjörð Bjarmason
2022-05-20 15:55 ` Johannes Schindelin
2022-05-21 9:54 ` Ævar Arnfjörð Bjarmason
2022-05-22 5:50 ` Junio C Hamano
2022-05-24 12:25 ` Johannes Schindelin
2022-05-24 18:11 ` Ævar Arnfjörð Bjarmason
2022-05-24 19:29 ` Junio C Hamano
2022-05-25 10:31 ` Johannes Schindelin
2022-05-10 19:27 ` [PATCH v4 4/7] Implement `scalar diagnose` Johannes Schindelin via GitGitGadget
2022-05-17 14:53 ` Ævar Arnfjörð Bjarmason
2022-05-10 19:27 ` [PATCH v4 5/7] scalar diagnose: include disk space information Johannes Schindelin via GitGitGadget
2022-05-10 19:27 ` [PATCH v4 6/7] scalar: teach `diagnose` to gather packfile info Matthew John Cheetham via GitGitGadget
2022-05-10 19:27 ` [PATCH v4 7/7] scalar: teach `diagnose` to gather loose objects information Matthew John Cheetham via GitGitGadget
2022-05-17 15:03 ` [PATCH v4 0/7] scalar: implement the subcommand "diagnose" Ævar Arnfjörð Bjarmason
2022-05-17 15:28 ` rsbecker
2022-05-19 18:17 ` Johannes Schindelin
2022-05-19 18:17 ` [PATCH v5 " Johannes Schindelin via GitGitGadget
2022-05-19 18:17 ` [PATCH v5 1/7] archive: optionally add "virtual" files Johannes Schindelin via GitGitGadget
2022-05-20 14:41 ` René Scharfe
2022-05-20 16:21 ` Junio C Hamano
2022-05-19 18:17 ` [PATCH v5 2/7] archive --add-file-with-contents: allow paths containing colons Johannes Schindelin via GitGitGadget
2022-05-19 18:17 ` [PATCH v5 3/7] scalar: validate the optional enlistment argument Johannes Schindelin via GitGitGadget
2022-05-19 18:18 ` [PATCH v5 4/7] Implement `scalar diagnose` Johannes Schindelin via GitGitGadget
2022-05-19 18:18 ` [PATCH v5 5/7] scalar diagnose: include disk space information Johannes Schindelin via GitGitGadget
2022-05-19 18:18 ` [PATCH v5 6/7] scalar: teach `diagnose` to gather packfile info Matthew John Cheetham via GitGitGadget
2022-05-19 18:18 ` [PATCH v5 7/7] scalar: teach `diagnose` to gather loose objects information Matthew John Cheetham via GitGitGadget
2022-05-19 19:23 ` [PATCH v5 0/7] scalar: implement the subcommand "diagnose" Junio C Hamano
2022-05-21 15:08 ` [PATCH v6 " Johannes Schindelin via GitGitGadget
2022-05-21 15:08 ` [PATCH v6 1/7] archive: optionally add "virtual" files Johannes Schindelin via GitGitGadget
2022-05-25 21:11 ` Junio C Hamano
2022-05-26 9:09 ` René Scharfe
2022-05-26 17:10 ` Junio C Hamano
2022-05-26 18:57 ` René Scharfe
2022-05-26 20:16 ` Junio C Hamano
2022-05-27 17:02 ` René Scharfe
2022-05-27 19:01 ` Junio C Hamano
2022-05-28 6:57 ` René Scharfe
2022-05-21 15:08 ` [PATCH v6 2/7] archive --add-virtual-file: allow paths containing colons Johannes Schindelin via GitGitGadget
2022-05-25 20:22 ` Junio C Hamano
2022-05-25 21:42 ` Junio C Hamano
2022-05-25 22:34 ` Junio C Hamano
2022-05-21 15:08 ` [PATCH v6 3/7] scalar: validate the optional enlistment argument Johannes Schindelin via GitGitGadget
2022-05-21 15:08 ` [PATCH v6 4/7] Implement `scalar diagnose` Johannes Schindelin via GitGitGadget
2022-05-21 15:08 ` [PATCH v6 5/7] scalar diagnose: include disk space information Johannes Schindelin via GitGitGadget
2022-05-21 15:08 ` [PATCH v6 6/7] scalar: teach `diagnose` to gather packfile info Matthew John Cheetham via GitGitGadget
2022-05-21 15:08 ` [PATCH v6 7/7] scalar: teach `diagnose` to gather loose objects information Matthew John Cheetham via GitGitGadget
2022-05-28 23:11 ` [PATCH v6+ 0/7] js/scalar-diagnose rebased Junio C Hamano
2022-05-28 23:11 ` [PATCH v6+ 1/7] archive: optionally add "virtual" files Junio C Hamano
2022-05-28 23:11 ` [PATCH v6+ 2/7] archive --add-virtual-file: allow paths containing colons Junio C Hamano
2022-06-15 18:16 ` Adam Dinwoodie
2022-06-15 20:00 ` Junio C Hamano
2022-06-15 21:36 ` Adam Dinwoodie
2022-06-18 20:19 ` Johannes Schindelin
2022-06-18 22:05 ` Junio C Hamano
2022-06-20 9:41 ` Adam Dinwoodie
2022-05-28 23:11 ` [PATCH v6+ 3/7] scalar: validate the optional enlistment argument Junio C Hamano
2022-05-28 23:11 ` [PATCH v6+ 4/7] scalar: implement `scalar diagnose` Junio C Hamano
2022-06-10 2:08 ` Ævar Arnfjörð Bjarmason
2022-06-10 16:44 ` Junio C Hamano
2022-06-10 17:35 ` Ævar Arnfjörð Bjarmason
2022-05-28 23:11 ` [PATCH v6+ 5/7] scalar diagnose: include disk space information Junio C Hamano
2022-05-28 23:11 ` [PATCH v6+ 6/7] scalar: teach `diagnose` to gather packfile info Junio C Hamano
2022-05-28 23:11 ` [PATCH v6+ 7/7] scalar: teach `diagnose` to gather loose objects information Junio C Hamano
2022-05-30 10:12 ` [PATCH v6+ 0/7] js/scalar-diagnose rebased Johannes Schindelin
2022-05-30 17:37 ` Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=nycvar.QRO.7.76.6.2202062214110.347@tvgsbejvaqbjf.bet \
--to=johannes.schindelin@gmx.de \
--cc=git@vger.kernel.org \
--cc=gitgitgadget@gmail.com \
--cc=mjcheetham@outlook.com \
--cc=newren@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).