git@vger.kernel.org mailing list mirror (one of many)
From: Johannes Schindelin <Johannes.Schindelin@gmx.de>
To: Emily Shaffer <emilyshaffer@google.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH v3 6/9] bugreport: count loose objects
Date: Mon, 28 Oct 2019 16:07:40 +0100 (CET)
Message-ID: <nycvar.QRO.7.76.6.1910281540550.46@tvgsbejvaqbjf.bet>
In-Reply-To: <20191025025129.250049-7-emilyshaffer@google.com>

Hi Emily,

On Thu, 24 Oct 2019, Emily Shaffer wrote:

> The number of unpacked objects in a user's repository may help us
> understand the root of the problem they're seeing, especially if a
> command is running unusually slowly.
>
> Rather than directly invoking 'git-count-objects', which may sometimes
> fail unexpectedly on Git for Windows, manually count the contents of
> .git/objects. Additionally, since we may wish to inspect other
> directories' contents for bugreport in the future, put the directory
> listing into a helper function.

Thank you, much appreciated!

I guess the next step is to count the number of packs, and the number of
submodules ;-)

>
> Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
> ---
>  bugreport.c         | 72 +++++++++++++++++++++++++++++++++++++++++++++
>  bugreport.h         |  6 ++++
>  builtin/bugreport.c |  4 +++
>  3 files changed, 82 insertions(+)
>
> diff --git a/bugreport.c b/bugreport.c
> index 9d7f44ff28..54e1d47103 100644
> --- a/bugreport.c
> +++ b/bugreport.c
> @@ -5,8 +5,11 @@
>  #include "exec-cmd.h"
>  #include "help.h"
>  #include "run-command.h"
> +#include "strbuf.h"

Why not append this to the end of the `#include` list, as is common in
Git's commit history?

>  #include "version.h"
>
> +#include "dirent.h"

This header (although with pointy brackets instead of double quotes) is
already included in `git-compat-util.h`.

> +
>  /**
>   * A sorted list of config options which we will add to the bugreport. Managed
>   * by 'gather_whitelist(...)'.
> @@ -147,3 +150,72 @@ void get_populated_hooks(struct strbuf *hook_info)
>  		}
>  	}
>  }
> +
> +/**
> + * Fill 'contents' with the contents of the dir at 'dirpath'.

Since you start this comment in JavaDoc style, there should be an almost
empty line after this one ("almost" because it still contains the
asterisk, of course).

> + * If 'filter' is nonzero, the contents are filtered on d_type as 'type' - see
> + * 'man readdir'. opendir() doesn't take string length as an arg, so don't
> + * bother passing it in.
> + */
> +void list_contents_of_dir(struct string_list *contents, struct strbuf *dirpath,

Shouldn't this be `static`?

> +			  int filter, unsigned char type)
> +{
> +	struct dirent *dir = NULL;
> +	DIR *dh = NULL;
> +
> +	dh = opendir(dirpath->buf);
> +	while (dh && (dir = readdir(dh))) {
> +		if (!filter || type == dir->d_type) {
> +			string_list_append(contents, dir->d_name);
> +		}
> +	}
> +}
> +
> +
> +void get_object_counts(struct strbuf *obj_info)

Oops. This function is no longer used.

> +{
> +	struct child_process cp = CHILD_PROCESS_INIT;
> +	struct strbuf std_out = STRBUF_INIT;
> +
> +	argv_array_push(&cp.args, "count-objects");
> +	argv_array_push(&cp.args, "-vH");
> +	cp.git_cmd = 1;
> +	capture_command(&cp, &std_out, 0);
> +
> +	strbuf_reset(obj_info);
> +	strbuf_addstr(obj_info, "git-count-objects -vH:\n");
> +	strbuf_addbuf(obj_info, &std_out);
> +}
> +
> +void get_loose_object_summary(struct strbuf *obj_info)
> +{
> +	struct strbuf dirpath = STRBUF_INIT;
> +	struct string_list subdirs = STRING_LIST_INIT_DUP;
> +	struct string_list_item *subdir;
> +
> +	strbuf_reset(obj_info);
> +
> +	strbuf_addstr(&dirpath, get_object_directory());
> +	strbuf_complete(&dirpath, '/');
> +
> +	list_contents_of_dir(&subdirs, &dirpath, 1, DT_DIR);
> +
> +	for_each_string_list_item(subdir, &subdirs)
> +	{
> +		struct strbuf subdir_buf = STRBUF_INIT;
> +		struct string_list objects = STRING_LIST_INIT_DUP;
> +
> +		/*
> +		 * Only interested in loose objects - so dirs named with the
> +		 * first byte of the object ID
> +		 */
> +		if (strlen(subdir->string) != 2 || !strcmp(subdir->string, ".."))
> +			continue;
> +
> +		strbuf_addbuf(&subdir_buf, &dirpath);
> +		strbuf_addstr(&subdir_buf, subdir->string);
> +		list_contents_of_dir(&objects, &subdir_buf, 0, 0);
> +		strbuf_addf(obj_info, "%s: %d objects\n", subdir->string,
> +			    objects.nr);

Hmm. Not only does this leak `objects`, it also throws away the contents
that we so painfully constructed.

Wouldn't it make more sense to do something like this instead?

static int is_hex(const char *string, size_t count)
{
	for (; count; string++, count--)
		if (hexval(*string) < 0)
			return 0;
	return 1;
}

static ssize_t count_loose_objects(struct strbuf *objects_path)
{
	ssize_t ret = 0;
	size_t len;
	struct dirent *d;
	DIR *dir, *subdir;

	dir = opendir(objects_path->buf);
	if (!dir)
		return -1;

	strbuf_complete(objects_path, '/');
	len = objects_path->len;
	while ((d = readdir(dir))) {
		if (d->d_type != DT_DIR)
			continue;
		strbuf_setlen(objects_path, len);
		strbuf_addstr(objects_path, d->d_name);
		subdir = opendir(objects_path->buf);
		if (!subdir)
			continue;
		/* file names lack the two hex digits that name the directory */
		while ((d = readdir(subdir)))
			if (d->d_type == DT_REG &&
			    is_hex(d->d_name, the_repository->hash_algo->hexsz - 2))
				ret++;
		closedir(subdir);
	}
	closedir(dir);
	strbuf_setlen(objects_path, len);
	return ret;
}

Ciao,
Dscho

> +	}
> +}
> diff --git a/bugreport.h b/bugreport.h
> index 942a5436e3..09ad0c2599 100644
> --- a/bugreport.h
> +++ b/bugreport.h
> @@ -18,3 +18,9 @@ void get_whitelisted_config(struct strbuf *sys_info);
>   * contents of hook_info will be discarded.
>   */
>  void get_populated_hooks(struct strbuf *hook_info);
> +
> +/**
> + * Adds the output of `git count-object -vH`. The previous contents of hook_info
> + * will be discarded.
> + */
> +void get_loose_object_summary(struct strbuf *obj_info);
> diff --git a/builtin/bugreport.c b/builtin/bugreport.c
> index a0eefba498..b2ab194207 100644
> --- a/builtin/bugreport.c
> +++ b/builtin/bugreport.c
> @@ -64,6 +64,10 @@ int cmd_bugreport(int argc, const char **argv, const char *prefix)
>  	get_populated_hooks(&buffer);
>  	strbuf_write(&buffer, report);
>
> +	add_header(report, "Object Counts");
> +	get_loose_object_summary(&buffer);
> +	strbuf_write(&buffer, report);
> +
>  	fclose(report);
>
>  	launch_editor(report_path.buf, NULL, NULL);
> --
> 2.24.0.rc0.303.g954a862665-goog
>
>
