From: Johannes Schindelin <Johannes.Schindelin@gmx.de> To: Elijah Newren <newren@gmail.com> Cc: Matthew John Cheetham via GitGitGadget <gitgitgadget@gmail.com>, Git Mailing List <git@vger.kernel.org>, Matthew John Cheetham <mjcheetham@outlook.com> Subject: Re: [PATCH 4/5] scalar: teach `diagnose` to gather loose objects information Date: Sun, 6 Feb 2022 22:25:33 +0100 (CET) [thread overview] Message-ID: <nycvar.QRO.7.76.6.2202062214110.347@tvgsbejvaqbjf.bet> (raw) In-Reply-To: <CABPp-BHeLzinXkX3WgqBNYntJwY_ZAm5D7VdOR7KQahvLOuV=w@mail.gmail.com> Hi Elijah, On Thu, 27 Jan 2022, Elijah Newren wrote: > On Wed, Jan 26, 2022 at 3:37 PM Matthew John Cheetham via GitGitGadget > <gitgitgadget@gmail.com> wrote: > > > > From: Matthew John Cheetham <mjcheetham@outlook.com> > > > > When operating at the scale that Scalar wants to support, certain data > > shapes are more likely to cause undesirable performance issues, such as > > large numbers or large sizes of loose objects. > > Makes sense. > > > By including statistics about this, `scalar diagnose` now makes it > > easier to identify such scenarios. > > > > Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com> > > --- > > contrib/scalar/scalar.c | 60 ++++++++++++++++++++++++++++++++ > > contrib/scalar/t/t9099-scalar.sh | 2 ++ > > 2 files changed, 62 insertions(+) > > > > diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c > > index 690933ffdf3..c0ad4948215 100644 > > --- a/contrib/scalar/scalar.c > > +++ b/contrib/scalar/scalar.c > > @@ -686,6 +686,60 @@ static void dir_file_stats(struct strbuf *buf, const char *path) > > closedir(dir); > > } > > > > +static int count_files(char *path) > > +{ > > + DIR *dir = opendir(path); > > + struct dirent *e; > > + int count = 0; > > + > > + if (!dir) > > + return 0; > > + > > + while ((e = readdir(dir)) != NULL) > > + if (!is_dot_or_dotdot(e->d_name) && e->d_type == DT_REG) > > + count++; > > + > > + closedir(dir); > > + return count; > > +} > > + > > +static void loose_objs_stats(struct strbuf *buf, const char *path) > > +{ > > + DIR *dir = opendir(path); > > + struct dirent *e; > > + int count; > > + int total = 0; > > + unsigned char c; > > + struct strbuf count_path = STRBUF_INIT; > > + size_t base_path_len; > > + > > + if (!dir) > > + return; > > + > > + strbuf_addstr(buf, "Object directory stats for "); > > + strbuf_add_absolute_path(buf, path); > > + strbuf_addstr(buf, ":\n"); > > + > > + strbuf_add_absolute_path(&count_path, path); > > + strbuf_addch(&count_path, '/'); > > + base_path_len = count_path.len; > > + > > + while ((e = readdir(dir)) != NULL) > > + if (!is_dot_or_dotdot(e->d_name) && > > + e->d_type == DT_DIR && strlen(e->d_name) == 2 && > > + !hex_to_bytes(&c, e->d_name, 1)) { > > You only recurse into directories, ignoring individual files. > > > + strbuf_setlen(&count_path, base_path_len); > > + strbuf_addstr(&count_path, e->d_name); > > + total += (count = count_files(count_path.buf)); > > + strbuf_addf(buf, "%s : %7d files\n", e->d_name, count); > > This shows the number of files within a directory. > > > + } > > + > > + strbuf_addf(buf, "Total: %d loose objects", total); > > and this shows the total number of files across all the directories. > > But the commit message suggested you also wanted to check for large > sizes of loose objects. Did that get ripped out at some point with > the commit message not being updated, or is it perhaps going to be > included later? No, there was no plan to include this information later, as the original .NET implementation of `scalar diagnose` did not provide that information, either (which I take as a strong sign that we never needed this type of information to help users, at least not up until this point). Besides, it would be kind of a difficult thing to say conclusively what makes a loose file "big". Is it the zlib-compressed size on disk? Or the unpacked size? Should there be a configurable threshold to determine when an object is big? Should `core.bigFileThreshold` be co-opted for this? Together with the fact that there was no need for this information in practice, it makes me doubt that we should add this type of information. I actually suspect that _iff_ information of that type would be helpful, a more complete tool like git-sizer (https://github.com/github/git-sizer/) would be needed, and I do not really want to subsume git-sizer's functionality in `scalar diagnose`. I rephrased the commit message. Ciao, Dscho > > > + > > + strbuf_release(&count_path); > > + closedir(dir); > > +} > > + > > static int cmd_diagnose(int argc, const char **argv) > > { > > struct option options[] = { > > @@ -734,6 +788,12 @@ static int cmd_diagnose(int argc, const char **argv) > > if ((res = stage(tmp_dir.buf, &buf, "packs-local.txt"))) > > goto diagnose_cleanup; > > > > + strbuf_reset(&buf); > > + loose_objs_stats(&buf, ".git/objects"); > > + > > + if ((res = stage(tmp_dir.buf, &buf, "objects-local.txt"))) > > + goto diagnose_cleanup; > > + > > if ((res = stage_directory(tmp_dir.buf, ".git", 0)) || > > (res = stage_directory(tmp_dir.buf, ".git/hooks", 0)) || > > (res = stage_directory(tmp_dir.buf, ".git/info", 0)) || > > diff --git a/contrib/scalar/t/t9099-scalar.sh b/contrib/scalar/t/t9099-scalar.sh > > index b1745851e31..f2ec156d819 100755 > > --- a/contrib/scalar/t/t9099-scalar.sh > > +++ b/contrib/scalar/t/t9099-scalar.sh > > @@ -77,6 +77,8 @@ test_expect_success UNZIP 'scalar diagnose' ' > > unzip -p "$zip_path" diagnostics.log >out && > > test_file_not_empty out && > > unzip -p "$zip_path" packs-local.txt >out && > > + test_file_not_empty out && > > + unzip -p "$zip_path" objects-local.txt >out && > > test_file_not_empty out > > ' > > > > -- > > gitgitgadget >
next prev parent reply other threads:[~2022-02-06 21:25 UTC|newest] Thread overview: 140+ messages / expand[flat|nested] mbox.gz Atom feed top 2022-01-26 8:41 [PATCH 0/5] scalar: implement the subcommand "diagnose" Johannes Schindelin via GitGitGadget 2022-01-26 8:41 ` [PATCH 1/5] Implement `scalar diagnose` Johannes Schindelin via GitGitGadget 2022-01-26 9:34 ` René Scharfe 2022-01-26 22:20 ` Taylor Blau 2022-02-06 21:34 ` Johannes Schindelin 2022-01-27 19:38 ` Elijah Newren 2022-01-26 8:41 ` [PATCH 2/5] scalar diagnose: include disk space information Johannes Schindelin via GitGitGadget 2022-01-26 8:41 ` [PATCH 3/5] scalar: teach `diagnose` to gather packfile info Matthew John Cheetham via GitGitGadget 2022-01-26 22:43 ` Taylor Blau 2022-01-27 15:14 ` Derrick Stolee 2022-02-06 21:38 ` Johannes Schindelin 2022-01-26 8:41 ` [PATCH 4/5] scalar: teach `diagnose` to gather loose objects information Matthew John Cheetham via GitGitGadget 2022-01-26 22:50 ` Taylor Blau 2022-01-27 15:17 ` Derrick Stolee 2022-01-27 18:59 ` Elijah Newren 2022-02-06 21:25 ` Johannes Schindelin [this message] 2022-01-26 8:41 ` [PATCH 5/5] scalar diagnose: show a spinner while staging content Johannes Schindelin via GitGitGadget 2022-01-27 15:19 ` [PATCH 0/5] scalar: implement the subcommand "diagnose" Derrick Stolee 2022-02-06 21:13 ` Johannes Schindelin 2022-02-06 22:39 ` [PATCH v2 0/6] " Johannes Schindelin via GitGitGadget 2022-02-06 22:39 ` [PATCH v2 1/6] archive: optionally add "virtual" files Johannes Schindelin via GitGitGadget 2022-02-07 19:55 ` René Scharfe 2022-02-07 23:30 ` Junio C Hamano 2022-02-08 13:12 ` Johannes Schindelin 2022-02-08 17:44 ` Junio C Hamano 2022-02-08 20:58 ` René Scharfe 2022-02-09 22:48 ` Junio C Hamano 2022-02-10 19:10 ` René Scharfe 2022-02-10 19:23 ` Junio C Hamano 2022-02-11 19:16 ` René Scharfe 2022-02-11 21:27 ` Junio C Hamano 2022-02-12 9:12 ` René Scharfe 2022-02-13 6:25 ` Junio C Hamano 2022-02-13 9:02 ` René Scharfe 2022-02-14 17:22 ` Junio C Hamano 2022-02-08 12:54 ` Johannes Schindelin 2022-02-06 22:39 ` [PATCH v2 2/6] scalar: validate the optional enlistment argument Johannes Schindelin via GitGitGadget 2022-02-06 22:39 ` [PATCH v2 3/6] Implement `scalar diagnose` Johannes Schindelin via GitGitGadget 2022-02-07 19:55 ` René Scharfe 2022-02-08 12:08 ` Johannes Schindelin 2022-02-06 22:39 ` [PATCH v2 4/6] scalar diagnose: include disk space information Johannes Schindelin via GitGitGadget 2022-02-06 22:39 ` [PATCH v2 5/6] scalar: teach `diagnose` to gather packfile info Matthew John Cheetham via GitGitGadget 2022-02-06 22:39 ` [PATCH v2 6/6] scalar: teach `diagnose` to gather loose objects information Matthew John Cheetham via GitGitGadget 2022-05-04 15:25 ` [PATCH v3 0/7] scalar: implement the subcommand "diagnose" Johannes Schindelin via GitGitGadget 2022-05-04 15:25 ` [PATCH v3 1/7] archive: optionally add "virtual" files Johannes Schindelin via GitGitGadget 2022-05-04 15:25 ` [PATCH v3 2/7] archive --add-file-with-contents: allow paths containing colons Johannes Schindelin via GitGitGadget 2022-05-07 2:06 ` Elijah Newren 2022-05-09 21:04 ` Johannes Schindelin 2022-05-04 15:25 ` [PATCH v3 3/7] scalar: validate the optional enlistment argument Johannes Schindelin via GitGitGadget 2022-05-04 15:25 ` [PATCH v3 4/7] Implement `scalar diagnose` Johannes Schindelin via GitGitGadget 2022-05-04 15:25 ` [PATCH v3 5/7] scalar diagnose: include disk space information Johannes Schindelin via GitGitGadget 2022-05-04 15:25 ` [PATCH v3 6/7] scalar: teach `diagnose` to gather packfile info Matthew John Cheetham via GitGitGadget 2022-05-04 15:25 ` [PATCH v3 7/7] scalar: teach `diagnose` to gather loose objects information Matthew John Cheetham via GitGitGadget 2022-05-07 2:23 ` [PATCH v3 0/7] scalar: implement the subcommand "diagnose" Elijah Newren 2022-05-10 19:26 ` [PATCH v4 " Johannes Schindelin via GitGitGadget 2022-05-10 19:26 ` [PATCH v4 1/7] archive: optionally add "virtual" files Johannes Schindelin via GitGitGadget 2022-05-10 21:48 ` Junio C Hamano 2022-05-10 22:06 ` rsbecker 2022-05-10 23:21 ` Junio C Hamano 2022-05-11 16:14 ` René Scharfe 2022-05-11 19:27 ` Junio C Hamano 2022-05-12 16:16 ` René Scharfe 2022-05-12 18:15 ` Junio C Hamano 2022-05-12 21:31 ` Junio C Hamano 2022-05-14 7:06 ` René Scharfe 2022-05-12 22:31 ` [PATCH] fixup! " Junio C Hamano 2022-05-10 19:26 ` [PATCH v4 2/7] archive --add-file-with-contents: allow paths containing colons Johannes Schindelin via GitGitGadget 2022-05-10 21:56 ` Junio C Hamano 2022-05-10 22:23 ` rsbecker 2022-05-19 18:12 ` Johannes Schindelin 2022-05-19 18:09 ` Johannes Schindelin 2022-05-19 18:44 ` Junio C Hamano 2022-05-10 19:27 ` [PATCH v4 3/7] scalar: validate the optional enlistment argument Johannes Schindelin via GitGitGadget 2022-05-17 14:51 ` Ævar Arnfjörð Bjarmason 2022-05-18 17:35 ` Junio C Hamano 2022-05-20 7:30 ` Ævar Arnfjörð Bjarmason 2022-05-20 15:55 ` Johannes Schindelin 2022-05-21 9:54 ` Ævar Arnfjörð Bjarmason 2022-05-22 5:50 ` Junio C Hamano 2022-05-24 12:25 ` Johannes Schindelin 2022-05-24 18:11 ` Ævar Arnfjörð Bjarmason 2022-05-24 19:29 ` Junio C Hamano 2022-05-25 10:31 ` Johannes Schindelin 2022-05-10 19:27 ` [PATCH v4 4/7] Implement `scalar diagnose` Johannes Schindelin via GitGitGadget 2022-05-17 14:53 ` Ævar Arnfjörð Bjarmason 2022-05-10 19:27 ` [PATCH v4 5/7] scalar diagnose: include disk space information Johannes Schindelin via GitGitGadget 2022-05-10 19:27 ` [PATCH v4 6/7] scalar: teach `diagnose` to gather packfile info Matthew John Cheetham via GitGitGadget 2022-05-10 19:27 ` [PATCH v4 7/7] scalar: teach `diagnose` to gather loose objects information Matthew John Cheetham via GitGitGadget 2022-05-17 15:03 ` [PATCH v4 0/7] scalar: implement the subcommand "diagnose" Ævar Arnfjörð Bjarmason 2022-05-17 15:28 ` rsbecker 2022-05-19 18:17 ` Johannes Schindelin 2022-05-19 18:17 ` [PATCH v5 " Johannes Schindelin via GitGitGadget 2022-05-19 18:17 ` [PATCH v5 1/7] archive: optionally add "virtual" files Johannes Schindelin via GitGitGadget 2022-05-20 14:41 ` René Scharfe 2022-05-20 16:21 ` Junio C Hamano 2022-05-19 18:17 ` [PATCH v5 2/7] archive --add-file-with-contents: allow paths containing colons Johannes Schindelin via GitGitGadget 2022-05-19 18:17 ` [PATCH v5 3/7] scalar: validate the optional enlistment argument Johannes Schindelin via GitGitGadget 2022-05-19 18:18 ` [PATCH v5 4/7] Implement `scalar diagnose` Johannes Schindelin via GitGitGadget 2022-05-19 18:18 ` [PATCH v5 5/7] scalar diagnose: include disk space information Johannes Schindelin via GitGitGadget 2022-05-19 18:18 ` [PATCH v5 6/7] scalar: teach `diagnose` to gather packfile info Matthew John Cheetham via GitGitGadget 2022-05-19 18:18 ` [PATCH v5 7/7] scalar: teach `diagnose` to gather loose objects information Matthew John Cheetham via GitGitGadget 2022-05-19 19:23 ` [PATCH v5 0/7] scalar: implement the subcommand "diagnose" Junio C Hamano 2022-05-21 15:08 ` [PATCH v6 " Johannes Schindelin via GitGitGadget 2022-05-21 15:08 ` [PATCH v6 1/7] archive: optionally add "virtual" files Johannes Schindelin via GitGitGadget 2022-05-25 21:11 ` Junio C Hamano 2022-05-26 9:09 ` René Scharfe 2022-05-26 17:10 ` Junio C Hamano 2022-05-26 18:57 ` René Scharfe 2022-05-26 20:16 ` Junio C Hamano 2022-05-27 17:02 ` René Scharfe 2022-05-27 19:01 ` Junio C Hamano 2022-05-28 6:57 ` René Scharfe 2022-05-21 15:08 ` [PATCH v6 2/7] archive --add-virtual-file: allow paths containing colons Johannes Schindelin via GitGitGadget 2022-05-25 20:22 ` Junio C Hamano 2022-05-25 21:42 ` Junio C Hamano 2022-05-25 22:34 ` Junio C Hamano 2022-05-21 15:08 ` [PATCH v6 3/7] scalar: validate the optional enlistment argument Johannes Schindelin via GitGitGadget 2022-05-21 15:08 ` [PATCH v6 4/7] Implement `scalar diagnose` Johannes Schindelin via GitGitGadget 2022-05-21 15:08 ` [PATCH v6 5/7] scalar diagnose: include disk space information Johannes Schindelin via GitGitGadget 2022-05-21 15:08 ` [PATCH v6 6/7] scalar: teach `diagnose` to gather packfile info Matthew John Cheetham via GitGitGadget 2022-05-21 15:08 ` [PATCH v6 7/7] scalar: teach `diagnose` to gather loose objects information Matthew John Cheetham via GitGitGadget 2022-05-28 23:11 ` [PATCH v6+ 0/7] js/scalar-diagnose rebased Junio C Hamano 2022-05-28 23:11 ` [PATCH v6+ 1/7] archive: optionally add "virtual" files Junio C Hamano 2022-05-28 23:11 ` [PATCH v6+ 2/7] archive --add-virtual-file: allow paths containing colons Junio C Hamano 2022-06-15 18:16 ` Adam Dinwoodie 2022-06-15 20:00 ` Junio C Hamano 2022-06-15 21:36 ` Adam Dinwoodie 2022-06-18 20:19 ` Johannes Schindelin 2022-06-18 22:05 ` Junio C Hamano 2022-06-20 9:41 ` Adam Dinwoodie 2022-05-28 23:11 ` [PATCH v6+ 3/7] scalar: validate the optional enlistment argument Junio C Hamano 2022-05-28 23:11 ` [PATCH v6+ 4/7] scalar: implement `scalar diagnose` Junio C Hamano 2022-06-10 2:08 ` Ævar Arnfjörð Bjarmason 2022-06-10 16:44 ` Junio C Hamano 2022-06-10 17:35 ` Ævar Arnfjörð Bjarmason 2022-05-28 23:11 ` [PATCH v6+ 5/7] scalar diagnose: include disk space information Junio C Hamano 2022-05-28 23:11 ` [PATCH v6+ 6/7] scalar: teach `diagnose` to gather packfile info Junio C Hamano 2022-05-28 23:11 ` [PATCH v6+ 7/7] scalar: teach `diagnose` to gather loose objects information Junio C Hamano 2022-05-30 10:12 ` [PATCH v6+ 0/7] js/scalar-diagnose rebased Johannes Schindelin 2022-05-30 17:37 ` Junio C Hamano
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style List information: http://vger.kernel.org/majordomo-info.html * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=nycvar.QRO.7.76.6.2202062214110.347@tvgsbejvaqbjf.bet \ --to=johannes.schindelin@gmx.de \ --cc=git@vger.kernel.org \ --cc=gitgitgadget@gmail.com \ --cc=mjcheetham@outlook.com \ --cc=newren@gmail.com \ --subject='Re: [PATCH 4/5] scalar: teach `diagnose` to gather loose objects information' \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
Code repositories for project(s) associated with this inbox: https://80x24.org/mirrors/git.git This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).