git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH] rev-list: support `--human-readable` option when applied `disk-usage`
@ 2022-08-05  7:54 Li Linchao via GitGitGadget
  2022-08-05 10:03 ` Ævar Arnfjörð Bjarmason
  2022-08-08  8:35 ` [PATCH v2] rev-list: support human-readable output for `--disk-usage` Li Linchao via GitGitGadget
  0 siblings, 2 replies; 17+ messages in thread
From: Li Linchao via GitGitGadget @ 2022-08-05  7:54 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Li Linchao, Li Linchao

From: Li Linchao <lilinchao@oschina.cn>

The '--disk-usage' option for git-rev-list was introduced in 16950f8384
(rev-list: add --disk-usage option for calculating disk usage, 2021-02-09).
This is very useful for people inspect their git repo's objects usage
infomation, but the result number is quit hard for human to read.

Teach git rev-list to output more human readable result when using
'--disk-usage' to calculate objects disk usage.

Signed-off-by: Li Linchao <lilinchao@oschina.cn>
---
    rev-list: support --human-readable option when applied disk-usage
    
    The '--disk-usage' option for git-rev-list was introduced in 16950f8384
    (rev-list: add --disk-usage option for calculating disk usage,
    2021-02-09). This is very useful for people inspect their git repo's
    objects usage infomation, but the result number is quit hard for human
    to read.
    
    Teach git rev-list to output more human readable result when using
    '--disk-usage' to calculate objects disk usage.
    
    Signed-off-by: Li Linchao lilinchao@oschina.cn

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1313%2FCactusinhand%2Fllc%2Fadd-human-readable-option-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1313/Cactusinhand/llc/add-human-readable-option-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1313

 Documentation/rev-list-options.txt |  5 +++++
 builtin/rev-list.c                 | 31 ++++++++++++++++++++++++++----
 t/t6115-rev-list-du.sh             | 18 +++++++++++++++++
 3 files changed, 50 insertions(+), 4 deletions(-)

diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt
index 195e74eec63..d30301a9159 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -249,6 +249,11 @@ ifdef::git-rev-list[]
 	faster (especially with `--use-bitmap-index`). See the `CAVEATS`
 	section in linkgit:git-cat-file[1] for the limitations of what
 	"on-disk storage" means.
+
+-H::
+--human-readable::
+	Print on-disk objects size in human readable format. This option
+	must be combined with `--disk-usage` together.
 endif::git-rev-list[]
 
 --cherry-mark::
diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 30fd8e83eaf..be677f29070 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -81,6 +81,7 @@ static int arg_show_object_names = 1;
 
 static int show_disk_usage;
 static off_t total_disk_usage;
+static int human_readable;
 
 static off_t get_object_disk_usage(struct object *obj)
 {
@@ -473,6 +474,8 @@ static int try_bitmap_disk_usage(struct rev_info *revs,
 				 int filter_provided_objects)
 {
 	struct bitmap_index *bitmap_git;
+	struct strbuf bitmap_size_buf = STRBUF_INIT;
+	off_t size_from_bitmap;
 
 	if (!show_disk_usage)
 		return -1;
@@ -481,8 +484,13 @@ static int try_bitmap_disk_usage(struct rev_info *revs,
 	if (!bitmap_git)
 		return -1;
 
-	printf("%"PRIuMAX"\n",
-	       (uintmax_t)get_disk_usage_from_bitmap(bitmap_git, revs));
+	size_from_bitmap = get_disk_usage_from_bitmap(bitmap_git, revs);
+	if (human_readable) {
+		strbuf_humanise_bytes(&bitmap_size_buf, size_from_bitmap);
+		printf("%s\n", bitmap_size_buf.buf);
+	} else
+		printf("%"PRIuMAX"\n", (uintmax_t)size_from_bitmap);
+	strbuf_release(&bitmap_size_buf);
 	return 0;
 }
 
@@ -490,6 +498,7 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 {
 	struct rev_info revs;
 	struct rev_list_info info;
+	struct strbuf disk_buf = STRBUF_INIT;
 	struct setup_revision_opt s_r_opt = {
 		.allow_exclude_promisor_objects = 1,
 	};
@@ -630,9 +639,17 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 			continue;
 		}
 
+		if (!strcmp(arg, "--human-readable") || !strcmp(arg, "-H")) {
+			human_readable = 1;
+			continue;
+		}
+
 		usage(rev_list_usage);
 
 	}
+
+	if (!show_disk_usage && human_readable)
+		die(_("option '%s' should be used with '%s' together"), "--human-readable/-H", "--disk-usage");
 	if (revs.commit_format != CMIT_FMT_USERFORMAT)
 		revs.include_header = 1;
 	if (revs.commit_format != CMIT_FMT_UNSPECIFIED) {
@@ -752,10 +769,16 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 			printf("%d\n", revs.count_left + revs.count_right);
 	}
 
-	if (show_disk_usage)
-		printf("%"PRIuMAX"\n", (uintmax_t)total_disk_usage);
+	if (show_disk_usage) {
+		if (human_readable) {
+			strbuf_humanise_bytes(&disk_buf, total_disk_usage);
+			printf("%s\n", disk_buf.buf);
+		} else
+			printf("%"PRIuMAX"\n", (uintmax_t)total_disk_usage);
+	}
 
 cleanup:
 	release_revisions(&revs);
+	strbuf_release(&disk_buf);
 	return ret;
 }
diff --git a/t/t6115-rev-list-du.sh b/t/t6115-rev-list-du.sh
index b4aef32b713..614ebb72aaa 100755
--- a/t/t6115-rev-list-du.sh
+++ b/t/t6115-rev-list-du.sh
@@ -48,4 +48,22 @@ check_du HEAD
 check_du --objects HEAD
 check_du --objects HEAD^..HEAD
 
+
+test_expect_success 'rev-list --disk-usage with --human-readable' '
+	git rev-list --objects HEAD --disk-usage --human-readable >actual &&
+	test_i18ngrep -e "446 bytes" actual
+'
+
+test_expect_success 'rev-list --disk-usage with bitmap and --human-readable' '
+	git rev-list --objects HEAD --use-bitmap-index --disk-usage -H >actual &&
+	test_i18ngrep -e "446 bytes" actual
+'
+
+test_expect_success 'rev-list use --human-readable without --disk-usage' '
+	test_must_fail git rev-list --objects HEAD --human-readable 2> err &&
+	echo "fatal: option '\''--human-readable/-H'\'' should be used with" \
+	"'\''--disk-usage'\'' together" >expect &&
+	test_cmp err expect
+'
+
 test_done

base-commit: 4af7188bc97f70277d0f10d56d5373022b1fa385
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH] rev-list: support `--human-readable` option when applied `disk-usage`
  2022-08-05  7:54 [PATCH] rev-list: support `--human-readable` option when applied `disk-usage` Li Linchao via GitGitGadget
@ 2022-08-05 10:03 ` Ævar Arnfjörð Bjarmason
  2022-08-05 11:01   ` lilinchao
  2022-08-08  8:35 ` [PATCH v2] rev-list: support human-readable output for `--disk-usage` Li Linchao via GitGitGadget
  1 sibling, 1 reply; 17+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-08-05 10:03 UTC (permalink / raw)
  To: Li Linchao via GitGitGadget; +Cc: git, Jeff King, Li Linchao


On Fri, Aug 05 2022, Li Linchao via GitGitGadget wrote:

> From: Li Linchao <lilinchao@oschina.cn>
>
> The '--disk-usage' option for git-rev-list was introduced in 16950f8384
> (rev-list: add --disk-usage option for calculating disk usage, 2021-02-09).
> This is very useful for people inspect their git repo's objects usage
> infomation, but the result number is quit hard for human to read.

s/the result number/the resulting number/
s/for human/for a human/

>
> Teach git rev-list to output more human readable result when using

s/to output more human/to output a human/

> '--disk-usage' to calculate objects disk usage.

For this I'd just s/ to calculate objects disk usage//. I.e. we already
discussed what --disk-usage does...

> +
> +-H::
> +--human-readable::
> +	Print on-disk objects size in human readable format. This option
> +	must be combined with `--disk-usage` together.
>  endif::git-rev-list[]

I'd really prefer if we didn't squat on -H, rev-list is overridden
enough, but how about:

	--disk-usage
	--disk-usage=human

Rather than introducing a new option?

>  	struct bitmap_index *bitmap_git;
> +	struct strbuf bitmap_size_buf = STRBUF_INIT;
> +	off_t size_from_bitmap;
>  
>  	if (!show_disk_usage)
>  		return -1;
> @@ -481,8 +484,13 @@ static int try_bitmap_disk_usage(struct rev_info *revs,
>  	if (!bitmap_git)
>  		return -1;
>  
> -	printf("%"PRIuMAX"\n",
> -	       (uintmax_t)get_disk_usage_from_bitmap(bitmap_git, revs));
> +	size_from_bitmap = get_disk_usage_from_bitmap(bitmap_git, revs);
> +	if (human_readable) {
> +		strbuf_humanise_bytes(&bitmap_size_buf, size_from_bitmap);
> +		printf("%s\n", bitmap_size_buf.buf);
> +	} else
> +		printf("%"PRIuMAX"\n", (uintmax_t)size_from_bitmap);
> +	strbuf_release(&bitmap_size_buf);

I think this would be better if we just use the strbuf unconditionally
(and a short &sb is conventional in such a short one-use function). So just:

	if (human_readable)
        	strbuf_humanise_bytes(&sb, size_from_bitmap);
	else
		strbuf_addf(&sb, "%"PRIuMAX", (uintmax_t)size_from_bitmap);
	puts(sb.buf);

It gets you rid of the need for {} braces, and I think makes for a nicer
read.

> -	if (show_disk_usage)
> -		printf("%"PRIuMAX"\n", (uintmax_t)total_disk_usage);
> +	if (show_disk_usage) {
> +		if (human_readable) {
> +			strbuf_humanise_bytes(&disk_buf, total_disk_usage);
> +			printf("%s\n", disk_buf.buf);
> +		} else
> +			printf("%"PRIuMAX"\n", (uintmax_t)total_disk_usage);
> +	}

Ditto, and we could make the &sb scoped to that "if (show_disk_usage)".

> +test_expect_success 'rev-list --disk-usage with --human-readable' '
> +	git rev-list --objects HEAD --disk-usage --human-readable >actual &&
> +	test_i18ngrep -e "446 bytes" actual

use grep, not test_i18ngrep (the latter should be going away entirely).

But actually we should use test_cmp here, isn't that the *entire*
output? I.e. won't this pass?

	echo 446 bytes >expect &&
	... >expect &&
	test_cmp expect actual

If so let's test what we really mean, i.e. we want *this* to be the
output, not to have output that has that sub-string on any arbitrary
amount of lines somewhere...

In this case it's unlikely to do the wrong thing, but it's a good habit
to get into...

> +test_expect_success 'rev-list --disk-usage with bitmap and --human-readable' '
> +	git rev-list --objects HEAD --use-bitmap-index --disk-usage -H >actual &&
> +	test_i18ngrep -e "446 bytes" actual

ditto.


> +'
> +
> +test_expect_success 'rev-list use --human-readable without --disk-usage' '
> +	test_must_fail git rev-list --objects HEAD --human-readable 2> err &&
> +	echo "fatal: option '\''--human-readable/-H'\'' should be used with" \
> +	"'\''--disk-usage'\'' together" >expect &&

You can make this a bit nicer by not using echo, use a here-doc instead:

	cat >expect <<-\EOF
        fatal: ...
	EOF

But you'll still need the '\'' quoting, but I thing it'll be better, and
avoids the line-wrapping (which we try to avoid for this sort of thing).

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Re: [PATCH] rev-list: support `--human-readable` option when applied `disk-usage`
  2022-08-05 10:03 ` Ævar Arnfjörð Bjarmason
@ 2022-08-05 11:01   ` lilinchao
  0 siblings, 0 replies; 17+ messages in thread
From: lilinchao @ 2022-08-05 11:01 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason,
	Li Linchao via GitGitGadget
  Cc: git, Jeff King

>
>On Fri, Aug 05 2022, Li Linchao via GitGitGadget wrote:
>
>> From: Li Linchao <lilinchao@oschina.cn>
>>
>> The '--disk-usage' option for git-rev-list was introduced in 16950f8384
>> (rev-list: add --disk-usage option for calculating disk usage, 2021-02-09).
>> This is very useful for people inspect their git repo's objects usage
>> infomation, but the result number is quit hard for human to read.
>
>s/the result number/the resulting number/
>s/for human/for a human/
>
>>
>> Teach git rev-list to output more human readable result when using
>
>s/to output more human/to output a human/
>
>> '--disk-usage' to calculate objects disk usage.
>
>For this I'd just s/ to calculate objects disk usage//. I.e. we already
>discussed what --disk-usage does... 
OK
>
>> +
>> +-H::
>> +--human-readable::
>> +	Print on-disk objects size in human readable format. This option
>> +	must be combined with `--disk-usage` together.
>>  endif::git-rev-list[]
>
>I'd really prefer if we didn't squat on -H, rev-list is overridden
>enough, but how about:
>
>	--disk-usage
>	--disk-usage=human
>
>Rather than introducing a new option? 
Yes, this makes sense.
>
>>  struct bitmap_index *bitmap_git;
>> +	struct strbuf bitmap_size_buf = STRBUF_INIT;
>> +	off_t size_from_bitmap;
>> 
>>  if (!show_disk_usage)
>>  return -1;
>> @@ -481,8 +484,13 @@ static int try_bitmap_disk_usage(struct rev_info *revs,
>>  if (!bitmap_git)
>>  return -1;
>> 
>> -	printf("%"PRIuMAX"\n",
>> -	       (uintmax_t)get_disk_usage_from_bitmap(bitmap_git, revs));
>> +	size_from_bitmap = get_disk_usage_from_bitmap(bitmap_git, revs);
>> +	if (human_readable) {
>> +	strbuf_humanise_bytes(&bitmap_size_buf, size_from_bitmap);
>> +	printf("%s\n", bitmap_size_buf.buf);
>> +	} else
>> +	printf("%"PRIuMAX"\n", (uintmax_t)size_from_bitmap);
>> +	strbuf_release(&bitmap_size_buf);
>
>I think this would be better if we just use the strbuf unconditionally
>(and a short &sb is conventional in such a short one-use function). So just:
>
>	if (human_readable)
>        strbuf_humanise_bytes(&sb, size_from_bitmap);
>	else
>	strbuf_addf(&sb, "%"PRIuMAX", (uintmax_t)size_from_bitmap);
>	puts(sb.buf);
>
>It gets you rid of the need for {} braces, and I think makes for a nicer
>read. 
Agree
>
>> -	if (show_disk_usage)
>> -	printf("%"PRIuMAX"\n", (uintmax_t)total_disk_usage);
>> +	if (show_disk_usage) {
>> +	if (human_readable) {
>> +	strbuf_humanise_bytes(&disk_buf, total_disk_usage);
>> +	printf("%s\n", disk_buf.buf);
>> +	} else
>> +	printf("%"PRIuMAX"\n", (uintmax_t)total_disk_usage);
>> +	}
>
>Ditto, and we could make the &sb scoped to that "if (show_disk_usage)".
>
>> +test_expect_success 'rev-list --disk-usage with --human-readable' '
>> +	git rev-list --objects HEAD --disk-usage --human-readable >actual &&
>> +	test_i18ngrep -e "446 bytes" actual
>
>use grep, not test_i18ngrep (the latter should be going away entirely). 
OK
>
>But actually we should use test_cmp here, isn't that the *entire*
>output? I.e. won't this pass?
>
>	echo 446 bytes >expect &&
>	... >expect &&
>	test_cmp expect actual
>
>If so let's test what we really mean, i.e. we want *this* to be the
>output, not to have output that has that sub-string on any arbitrary
>amount of lines somewhere...
>
>In this case it's unlikely to do the wrong thing, but it's a good habit
>to get into...
>
>> +test_expect_success 'rev-list --disk-usage with bitmap and --human-readable' '
>> +	git rev-list --objects HEAD --use-bitmap-index --disk-usage -H >actual &&
>> +	test_i18ngrep -e "446 bytes" actual
>
>ditto. 
The output here is just "446  bytes" if we use '--disk-usage' option in this rest repo.
But Github CI/linux-sha256 reminded me that I made a mistake that
I should avoid to hardcore actual size here.
>
>
>> +'
>> +
>> +test_expect_success 'rev-list use --human-readable without --disk-usage' '
>> +	test_must_fail git rev-list --objects HEAD --human-readable 2> err &&
>> +	echo "fatal: option '\''--human-readable/-H'\'' should be used with" \
>> +	"'\''--disk-usage'\'' together" >expect &&
>
>You can make this a bit nicer by not using echo, use a here-doc instead:
>
>	cat >expect <<-\EOF
>        fatal: ...
>	EOF
>
>But you'll still need the '\'' quoting, but I thing it'll be better, and
>avoids the line-wrapping (which we try to avoid for this sort of thing). 
OK.

Many thanks for all your review comments :)

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v2] rev-list: support human-readable output for `--disk-usage`
  2022-08-05  7:54 [PATCH] rev-list: support `--human-readable` option when applied `disk-usage` Li Linchao via GitGitGadget
  2022-08-05 10:03 ` Ævar Arnfjörð Bjarmason
@ 2022-08-08  8:35 ` Li Linchao via GitGitGadget
  2022-08-08  9:37   ` lilinchao
                     ` (2 more replies)
  1 sibling, 3 replies; 17+ messages in thread
From: Li Linchao via GitGitGadget @ 2022-08-08  8:35 UTC (permalink / raw)
  To: git
  Cc: Jeff King, Ævar Arnfjörð Bjarmason, Li Linchao,
	Li Linchao

From: Li Linchao <lilinchao@oschina.cn>

The '--disk-usage' option for git-rev-list was introduced in 16950f8384
(rev-list: add --disk-usage option for calculating disk usage, 2021-02-09).
This is very useful for people inspect their git repo's objects usage
infomation, but the resulting number is quit hard for a human to read.

Teach git rev-list to output a human readable result when using
'--disk-usage'.

Signed-off-by: Li Linchao <lilinchao@oschina.cn>
---
    rev-list: support human-readable output for disk-usage
    
    The '--disk-usage' option for git-rev-list was introduced in 16950f8384
    (rev-list: add --disk-usage option for calculating disk usage,
    2021-02-09). This is very useful for people inspect their git repo's
    objects usage infomation, but the result number is quit hard for human
    to read.
    
    Teach git rev-list to output more human readable result when using
    '--disk-usage' to calculate objects disk usage.
    
    Signed-off-by: Li Linchao lilinchao@oschina.cn

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1313%2FCactusinhand%2Fllc%2Fadd-human-readable-option-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1313/Cactusinhand/llc/add-human-readable-option-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/1313

Range-diff vs v1:

 1:  7f8a7d3f61d ! 1:  7e34d16efe4 rev-list: support `--human-readable` option when applied `disk-usage`
     @@ Metadata
      Author: Li Linchao <lilinchao@oschina.cn>
      
       ## Commit message ##
     -    rev-list: support `--human-readable` option when applied `disk-usage`
     +    rev-list: support human-readable output for `--disk-usage`
      
          The '--disk-usage' option for git-rev-list was introduced in 16950f8384
          (rev-list: add --disk-usage option for calculating disk usage, 2021-02-09).
          This is very useful for people inspect their git repo's objects usage
     -    infomation, but the result number is quit hard for human to read.
     +    infomation, but the resulting number is quit hard for a human to read.
      
     -    Teach git rev-list to output more human readable result when using
     -    '--disk-usage' to calculate objects disk usage.
     +    Teach git rev-list to output a human readable result when using
     +    '--disk-usage'.
      
          Signed-off-by: Li Linchao <lilinchao@oschina.cn>
      
       ## Documentation/rev-list-options.txt ##
      @@ Documentation/rev-list-options.txt: ifdef::git-rev-list[]
     - 	faster (especially with `--use-bitmap-index`). See the `CAVEATS`
     - 	section in linkgit:git-cat-file[1] for the limitations of what
     - 	"on-disk storage" means.
     -+
     -+-H::
     -+--human-readable::
     -+	Print on-disk objects size in human readable format. This option
     -+	must be combined with `--disk-usage` together.
     - endif::git-rev-list[]
     + 	to `/dev/null` as the output does not have to be formatted.
       
     - --cherry-mark::
     + --disk-usage::
     ++--disk-usage=human::
     + 	Suppress normal output; instead, print the sum of the bytes used
     +-	for on-disk storage by the selected commits or objects. This is
     ++	for on-disk storage by the selected commits or objects.
     ++	When it accepts a value `human`, like: `--disk-usage=human`, this
     ++	means to print objects size in human readable format. This is
     + 	equivalent to piping the output into `git cat-file
     + 	--batch-check='%(objectsize:disk)'`, except that it runs much
     + 	faster (especially with `--use-bitmap-index`). See the `CAVEATS`
      
       ## builtin/rev-list.c ##
     +@@ builtin/rev-list.c: static const char rev_list_usage[] =
     + "    --parents\n"
     + "    --children\n"
     + "    --objects | --objects-edge\n"
     ++"    --disk-usage | --disk-usage=human\n"
     + "    --unpacked\n"
     + "    --header | --pretty\n"
     + "    --[no-]object-names\n"
      @@ builtin/rev-list.c: static int arg_show_object_names = 1;
       
       static int show_disk_usage;
     @@ builtin/rev-list.c: static int try_bitmap_disk_usage(struct rev_info *revs,
       				 int filter_provided_objects)
       {
       	struct bitmap_index *bitmap_git;
     -+	struct strbuf bitmap_size_buf = STRBUF_INIT;
     ++	struct strbuf disk_buf = STRBUF_INIT;
      +	off_t size_from_bitmap;
       
       	if (!show_disk_usage)
     @@ builtin/rev-list.c: static int try_bitmap_disk_usage(struct rev_info *revs,
      -	printf("%"PRIuMAX"\n",
      -	       (uintmax_t)get_disk_usage_from_bitmap(bitmap_git, revs));
      +	size_from_bitmap = get_disk_usage_from_bitmap(bitmap_git, revs);
     -+	if (human_readable) {
     -+		strbuf_humanise_bytes(&bitmap_size_buf, size_from_bitmap);
     -+		printf("%s\n", bitmap_size_buf.buf);
     -+	} else
     -+		printf("%"PRIuMAX"\n", (uintmax_t)size_from_bitmap);
     -+	strbuf_release(&bitmap_size_buf);
     ++	if (human_readable)
     ++		strbuf_humanise_bytes(&disk_buf, size_from_bitmap);
     ++	else
     ++		strbuf_addf(&disk_buf, "%"PRIuMAX"", (uintmax_t)size_from_bitmap);
     ++	puts(disk_buf.buf);
     ++	strbuf_release(&disk_buf);
       	return 0;
       }
       
     -@@ builtin/rev-list.c: int cmd_rev_list(int argc, const char **argv, const char *prefix)
     - {
     - 	struct rev_info revs;
     - 	struct rev_list_info info;
     -+	struct strbuf disk_buf = STRBUF_INIT;
     - 	struct setup_revision_opt s_r_opt = {
     - 		.allow_exclude_promisor_objects = 1,
     - 	};
      @@ builtin/rev-list.c: int cmd_rev_list(int argc, const char **argv, const char *prefix)
       			continue;
       		}
       
     -+		if (!strcmp(arg, "--human-readable") || !strcmp(arg, "-H")) {
     -+			human_readable = 1;
     -+			continue;
     -+		}
     -+
     - 		usage(rev_list_usage);
     +-		if (!strcmp(arg, "--disk-usage")) {
     +-			show_disk_usage = 1;
     +-			info.flags |= REV_LIST_QUIET;
     +-			continue;
     ++		if (skip_prefix(arg, "--disk-usage", &arg)) {
     ++			if (*arg == '=') {
     ++				if (!strcmp(++arg, "human")) {
     ++					human_readable = 1;
     ++					show_disk_usage = 1;
     ++					info.flags |= REV_LIST_QUIET;
     ++					continue;
     ++				} else
     ++					die(_("invalid value for '%s': '%s', try --disk-usage=human"), "--disk-usage", arg);
     ++			} else {
     ++				show_disk_usage = 1;
     ++				info.flags |= REV_LIST_QUIET;
     ++				continue;
     ++			}
     + 		}
       
     - 	}
     -+
     -+	if (!show_disk_usage && human_readable)
     -+		die(_("option '%s' should be used with '%s' together"), "--human-readable/-H", "--disk-usage");
     - 	if (revs.commit_format != CMIT_FMT_USERFORMAT)
     - 		revs.include_header = 1;
     - 	if (revs.commit_format != CMIT_FMT_UNSPECIFIED) {
     + 		usage(rev_list_usage);
      @@ builtin/rev-list.c: int cmd_rev_list(int argc, const char **argv, const char *prefix)
       			printf("%d\n", revs.count_left + revs.count_right);
       	}
     @@ builtin/rev-list.c: int cmd_rev_list(int argc, const char **argv, const char *pr
      -	if (show_disk_usage)
      -		printf("%"PRIuMAX"\n", (uintmax_t)total_disk_usage);
      +	if (show_disk_usage) {
     -+		if (human_readable) {
     ++		struct strbuf disk_buf = STRBUF_INIT;
     ++		if (human_readable)
      +			strbuf_humanise_bytes(&disk_buf, total_disk_usage);
     -+			printf("%s\n", disk_buf.buf);
     -+		} else
     -+			printf("%"PRIuMAX"\n", (uintmax_t)total_disk_usage);
     ++		else
     ++			strbuf_addf(&disk_buf, "%"PRIuMAX"", (uintmax_t)total_disk_usage);
     ++		puts(disk_buf.buf);
     ++		strbuf_release(&disk_buf);
      +	}
       
       cleanup:
       	release_revisions(&revs);
     -+	strbuf_release(&disk_buf);
     - 	return ret;
     - }
      
       ## t/t6115-rev-list-du.sh ##
      @@ t/t6115-rev-list-du.sh: check_du HEAD
       check_du --objects HEAD
       check_du --objects HEAD^..HEAD
       
     -+
     -+test_expect_success 'rev-list --disk-usage with --human-readable' '
     -+	git rev-list --objects HEAD --disk-usage --human-readable >actual &&
     -+	test_i18ngrep -e "446 bytes" actual
     ++# As mentioned above, don't use hardcode sizes as actual size, but use the
     ++# output from git cat-file.
     ++test_expect_success 'rev-list --disk-usage=human' '
     ++	git rev-list --objects HEAD --disk-usage=human >actual &&
     ++	disk_usage_slow --objects HEAD >actual_size &&
     ++	grep "$(cat actual_size) bytes" actual
      +'
      +
     -+test_expect_success 'rev-list --disk-usage with bitmap and --human-readable' '
     -+	git rev-list --objects HEAD --use-bitmap-index --disk-usage -H >actual &&
     -+	test_i18ngrep -e "446 bytes" actual
     ++test_expect_success 'rev-list --disk-usage=human with bitmaps' '
     ++	git rev-list --objects HEAD --use-bitmap-index --disk-usage=human >actual &&
     ++	disk_usage_slow --objects HEAD >actual_size &&
     ++	grep "$(cat actual_size) bytes" actual
      +'
      +
     -+test_expect_success 'rev-list use --human-readable without --disk-usage' '
     -+	test_must_fail git rev-list --objects HEAD --human-readable 2> err &&
     -+	echo "fatal: option '\''--human-readable/-H'\'' should be used with" \
     -+	"'\''--disk-usage'\'' together" >expect &&
     ++test_expect_success 'rev-list use --disk-usage unproperly' '
     ++	test_must_fail git rev-list --objects HEAD --disk-usage=typo 2>err &&
     ++	cat >expect <<-\EOF &&
     ++	fatal: invalid value for '\''--disk-usage'\'': '\''typo'\'', try --disk-usage=human
     ++	EOF
      +	test_cmp err expect
      +'
      +


 Documentation/rev-list-options.txt |  5 +++-
 builtin/rev-list.c                 | 42 ++++++++++++++++++++++++------
 t/t6115-rev-list-du.sh             | 22 ++++++++++++++++
 3 files changed, 60 insertions(+), 9 deletions(-)

diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt
index 195e74eec63..9966ce4ef91 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -242,8 +242,11 @@ ifdef::git-rev-list[]
 	to `/dev/null` as the output does not have to be formatted.
 
 --disk-usage::
+--disk-usage=human::
 	Suppress normal output; instead, print the sum of the bytes used
-	for on-disk storage by the selected commits or objects. This is
+	for on-disk storage by the selected commits or objects.
+	When it accepts a value `human`, like: `--disk-usage=human`, this
+	means to print objects size in human readable format. This is
 	equivalent to piping the output into `git cat-file
 	--batch-check='%(objectsize:disk)'`, except that it runs much
 	faster (especially with `--use-bitmap-index`). See the `CAVEATS`
diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 30fd8e83eaf..05ef232dcbe 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -46,6 +46,7 @@ static const char rev_list_usage[] =
 "    --parents\n"
 "    --children\n"
 "    --objects | --objects-edge\n"
+"    --disk-usage | --disk-usage=human\n"
 "    --unpacked\n"
 "    --header | --pretty\n"
 "    --[no-]object-names\n"
@@ -81,6 +82,7 @@ static int arg_show_object_names = 1;
 
 static int show_disk_usage;
 static off_t total_disk_usage;
+static int human_readable;
 
 static off_t get_object_disk_usage(struct object *obj)
 {
@@ -473,6 +475,8 @@ static int try_bitmap_disk_usage(struct rev_info *revs,
 				 int filter_provided_objects)
 {
 	struct bitmap_index *bitmap_git;
+	struct strbuf disk_buf = STRBUF_INIT;
+	off_t size_from_bitmap;
 
 	if (!show_disk_usage)
 		return -1;
@@ -481,8 +485,13 @@ static int try_bitmap_disk_usage(struct rev_info *revs,
 	if (!bitmap_git)
 		return -1;
 
-	printf("%"PRIuMAX"\n",
-	       (uintmax_t)get_disk_usage_from_bitmap(bitmap_git, revs));
+	size_from_bitmap = get_disk_usage_from_bitmap(bitmap_git, revs);
+	if (human_readable)
+		strbuf_humanise_bytes(&disk_buf, size_from_bitmap);
+	else
+		strbuf_addf(&disk_buf, "%"PRIuMAX"", (uintmax_t)size_from_bitmap);
+	puts(disk_buf.buf);
+	strbuf_release(&disk_buf);
 	return 0;
 }
 
@@ -624,10 +633,20 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 			continue;
 		}
 
-		if (!strcmp(arg, "--disk-usage")) {
-			show_disk_usage = 1;
-			info.flags |= REV_LIST_QUIET;
-			continue;
+		if (skip_prefix(arg, "--disk-usage", &arg)) {
+			if (*arg == '=') {
+				if (!strcmp(++arg, "human")) {
+					human_readable = 1;
+					show_disk_usage = 1;
+					info.flags |= REV_LIST_QUIET;
+					continue;
+				} else
+					die(_("invalid value for '%s': '%s', try --disk-usage=human"), "--disk-usage", arg);
+			} else {
+				show_disk_usage = 1;
+				info.flags |= REV_LIST_QUIET;
+				continue;
+			}
 		}
 
 		usage(rev_list_usage);
@@ -752,8 +771,15 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 			printf("%d\n", revs.count_left + revs.count_right);
 	}
 
-	if (show_disk_usage)
-		printf("%"PRIuMAX"\n", (uintmax_t)total_disk_usage);
+	if (show_disk_usage) {
+		struct strbuf disk_buf = STRBUF_INIT;
+		if (human_readable)
+			strbuf_humanise_bytes(&disk_buf, total_disk_usage);
+		else
+			strbuf_addf(&disk_buf, "%"PRIuMAX"", (uintmax_t)total_disk_usage);
+		puts(disk_buf.buf);
+		strbuf_release(&disk_buf);
+	}
 
 cleanup:
 	release_revisions(&revs);
diff --git a/t/t6115-rev-list-du.sh b/t/t6115-rev-list-du.sh
index b4aef32b713..b34841a4ba8 100755
--- a/t/t6115-rev-list-du.sh
+++ b/t/t6115-rev-list-du.sh
@@ -48,4 +48,26 @@ check_du HEAD
 check_du --objects HEAD
 check_du --objects HEAD^..HEAD
 
+# As mentioned above, don't use hardcode sizes as actual size, but use the
+# output from git cat-file.
+test_expect_success 'rev-list --disk-usage=human' '
+	git rev-list --objects HEAD --disk-usage=human >actual &&
+	disk_usage_slow --objects HEAD >actual_size &&
+	grep "$(cat actual_size) bytes" actual
+'
+
+test_expect_success 'rev-list --disk-usage=human with bitmaps' '
+	git rev-list --objects HEAD --use-bitmap-index --disk-usage=human >actual &&
+	disk_usage_slow --objects HEAD >actual_size &&
+	grep "$(cat actual_size) bytes" actual
+'
+
+test_expect_success 'rev-list use --disk-usage unproperly' '
+	test_must_fail git rev-list --objects HEAD --disk-usage=typo 2>err &&
+	cat >expect <<-\EOF &&
+	fatal: invalid value for '\''--disk-usage'\'': '\''typo'\'', try --disk-usage=human
+	EOF
+	test_cmp err expect
+'
+
 test_done

base-commit: 679aad9e82d0dfd8ef3d1f98fa4629665496cec9
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v2] rev-list: support human-readable output for `--disk-usage`
  2022-08-08  8:35 ` [PATCH v2] rev-list: support human-readable output for `--disk-usage` Li Linchao via GitGitGadget
@ 2022-08-08  9:37   ` lilinchao
  2022-08-09 13:22   ` Jeff King
  2022-08-10  6:01   ` [PATCH v3] " Li Linchao via GitGitGadget
  2 siblings, 0 replies; 17+ messages in thread
From: lilinchao @ 2022-08-08  9:37 UTC (permalink / raw)
  To: Li Linchao via GitGitGadget, git
  Cc: Jeff King, Ævar Arnfjörð Bjarmason

>       static int show_disk_usage;
>     @@ builtin/rev-list.c: static int try_bitmap_disk_usage(struct rev_info *revs,
>       int filter_provided_objects)
>       {
>       struct bitmap_index *bitmap_git;
>     -+	struct strbuf bitmap_size_buf = STRBUF_INIT;
>     ++	struct strbuf disk_buf = STRBUF_INIT;
>      +	off_t size_from_bitmap;
In next iteration, will move these two lines to more close to their caller place to
avoid early return.
>      
>       if (!show_disk_usage)
>     @@ builtin/rev-list.c: static int try_bitmap_disk_usage(struct rev_info *revs,
>      -	printf("%"PRIuMAX"\n",
>      -	       (uintmax_t)get_disk_usage_from_bitmap(bitmap_git, revs));
>      +	size_from_bitmap = get_disk_usage_from_bitmap(bitmap_git, revs);
>     -+	if (human_readable) {
>     -+	strbuf_humanise_bytes(&bitmap_size_buf, size_from_bitmap);
>     -+	printf("%s\n", bitmap_size_buf.buf);
>     -+	} else
>     -+	printf("%"PRIuMAX"\n", (uintmax_t)size_from_bitmap);
>     -+	strbuf_release(&bitmap_size_buf);
>     ++	if (human_readable)
>     ++	strbuf_humanise_bytes(&disk_buf, size_from_bitmap);
>     ++	else
>     ++	strbuf_addf(&disk_buf, "%"PRIuMAX"", (uintmax_t)size_from_bitmap);
>     ++	puts(disk_buf.buf);
>     ++	strbuf_release(&disk_buf);
>       return 0;
>       }


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2] rev-list: support human-readable output for `--disk-usage`
  2022-08-08  8:35 ` [PATCH v2] rev-list: support human-readable output for `--disk-usage` Li Linchao via GitGitGadget
  2022-08-08  9:37   ` lilinchao
@ 2022-08-09 13:22   ` Jeff King
  2022-08-09 16:46     ` lilinchao
  2022-08-10  6:01   ` [PATCH v3] " Li Linchao via GitGitGadget
  2 siblings, 1 reply; 17+ messages in thread
From: Jeff King @ 2022-08-09 13:22 UTC (permalink / raw)
  To: Li Linchao via GitGitGadget
  Cc: git, Ævar Arnfjörð Bjarmason, Li Linchao

On Mon, Aug 08, 2022 at 08:35:21AM +0000, Li Linchao via GitGitGadget wrote:

> The '--disk-usage' option for git-rev-list was introduced in 16950f8384
> (rev-list: add --disk-usage option for calculating disk usage, 2021-02-09).
> This is very useful for people inspect their git repo's objects usage
> infomation, but the resulting number is quit hard for a human to read.
> 
> Teach git rev-list to output a human readable result when using
> '--disk-usage'.

OK. When adding --disk-usage, I never really dreamed people would use it
for human output, since "du .git" is usually a suitable approximation. :)
But I don't have any real objection. I'm curious what your use case is
like, if you don't mind sharing. We used it at GitHub for computing
per-fork sizes for analysis, etc (so the result was always fed into
another script).

>  Documentation/rev-list-options.txt |  5 +++-
>  builtin/rev-list.c                 | 42 ++++++++++++++++++++++++------
>  t/t6115-rev-list-du.sh             | 22 ++++++++++++++++
>  3 files changed, 60 insertions(+), 9 deletions(-)

The patch itself looks pretty sensible (and thanks Ævar for the first
round of review; the suggestions there all looked good). A few small
comments:

> @@ -481,8 +485,13 @@ static int try_bitmap_disk_usage(struct rev_info *revs,
>  	if (!bitmap_git)
>  		return -1;
>  
> -	printf("%"PRIuMAX"\n",
> -	       (uintmax_t)get_disk_usage_from_bitmap(bitmap_git, revs));
> +	size_from_bitmap = get_disk_usage_from_bitmap(bitmap_git, revs);
> +	if (human_readable)
> +		strbuf_humanise_bytes(&disk_buf, size_from_bitmap);
> +	else
> +		strbuf_addf(&disk_buf, "%"PRIuMAX"", (uintmax_t)size_from_bitmap);
> +	puts(disk_buf.buf);
> +	strbuf_release(&disk_buf);

It's not a lot of duplicated lines, but since it is implementing policy
logic, I think it would be nice to move the formatting decision into a
function. Something like:

  static void show_disk_usage(off_t size)
  {
	struct strbuf sb = STRBUF_INIT;
	if (human_readable)
		strbuf_humanise_bytes(&sb, size);
	else
		strbuf_addf(&sb, "%"PRIuMAX, (uintmax_t)size_from_bitmap);
	puts(sb.buf);
	strbuf_release(&sb);
  }

and then you can call it from here, and from the non-bitmap path below.

(Also, while typing it out, I noticed that you don't need the extra ""
after PRIuMAX; that just concatenates an empty string).

> -		if (!strcmp(arg, "--disk-usage")) {
> -			show_disk_usage = 1;
> -			info.flags |= REV_LIST_QUIET;
> -			continue;
> +		if (skip_prefix(arg, "--disk-usage", &arg)) {
> +			if (*arg == '=') {
> +				if (!strcmp(++arg, "human")) {
> +					human_readable = 1;
> +					show_disk_usage = 1;
> +					info.flags |= REV_LIST_QUIET;
> +					continue;
> +				} else
> +					die(_("invalid value for '%s': '%s', try --disk-usage=human"), "--disk-usage", arg);
> +			} else {
> +				show_disk_usage = 1;
> +				info.flags |= REV_LIST_QUIET;
> +				continue;
> +			}
>  		}

We can put the common parts of each side of the conditional into the
outer block to avoid repeating ourselves. Also, your code matches
--show-disk-usage-without-an-equals, since it nows uses skip_prefix().
You could fix that by checking for '\0' in *arg. So together, something
like:

  if (skip_prefix(arg, "--disk-usage", &arg)) {
	if (*arg == '=') {
		if (!strcmp(++arg, "human"))
			human_readable = 1;
		else
			die(...);
	} else if (*arg) {
		/*
		 * Arguably should goto a label to continue chain of ifs?
		 * Doesn't matter unless we try to add --disk-usage-foo
		 * afterwards
		 */
		usage(rev_list_usage);
	}
	show_disk_usage = 1;
	info.flags |= REV_LIST_QUIET;
	continue;
  }

-Peff

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Re: [PATCH v2] rev-list: support human-readable output for `--disk-usage`
  2022-08-09 13:22   ` Jeff King
@ 2022-08-09 16:46     ` lilinchao
  0 siblings, 0 replies; 17+ messages in thread
From: lilinchao @ 2022-08-09 16:46 UTC (permalink / raw)
  To: Jeff King, Li Linchao via GitGitGadget
  Cc: git, Ævar Arnfjörð Bjarmason

>On Mon, Aug 08, 2022 at 08:35:21AM +0000, Li Linchao via GitGitGadget wrote:
>
>> The '--disk-usage' option for git-rev-list was introduced in 16950f8384
>> (rev-list: add --disk-usage option for calculating disk usage, 2021-02-09).
>> This is very useful for people inspect their git repo's objects usage
>> infomation, but the resulting number is quit hard for a human to read.
>>
>> Teach git rev-list to output a human readable result when using
>> '--disk-usage'.
>
>OK. When adding --disk-usage, I never really dreamed people would use it
>for human output, since "du .git" is usually a suitable approximation. :)
>But I don't have any real objection. I'm curious what your use case is
>like, if you don't mind sharing. We used it at GitHub for computing
>per-fork sizes for analysis, etc (so the result was always fed into
>another script).
Sometimes, I need objects disk size for analysis too, but I do this by hand
not by script. And I also noticed that git-count-objects has an option '-H'
for better output, so I think if it could be applied to "git-rev-list --disk-usage".
>
>>  Documentation/rev-list-options.txt |  5 +++-
>>  builtin/rev-list.c                 | 42 ++++++++++++++++++++++++------
>>  t/t6115-rev-list-du.sh             | 22 ++++++++++++++++
>>  3 files changed, 60 insertions(+), 9 deletions(-)
>
>The patch itself looks pretty sensible (and thanks Ævar for the first
>round of review; the suggestions there all looked good). A few small
>comments:
>
>> @@ -481,8 +485,13 @@ static int try_bitmap_disk_usage(struct rev_info *revs,
>>  if (!bitmap_git)
>>  return -1;
>> 
>> -	printf("%"PRIuMAX"\n",
>> -	       (uintmax_t)get_disk_usage_from_bitmap(bitmap_git, revs));
>> +	size_from_bitmap = get_disk_usage_from_bitmap(bitmap_git, revs);
>> +	if (human_readable)
>> +	strbuf_humanise_bytes(&disk_buf, size_from_bitmap);
>> +	else
>> +	strbuf_addf(&disk_buf, "%"PRIuMAX"", (uintmax_t)size_from_bitmap);
>> +	puts(disk_buf.buf);
>> +	strbuf_release(&disk_buf);
>
>It's not a lot of duplicated lines, but since it is implementing policy
>logic, I think it would be nice to move the formatting decision into a
>function. Something like:
>
>  static void show_disk_usage(off_t size)
>  {
>	struct strbuf sb = STRBUF_INIT;
>	if (human_readable)
>	strbuf_humanise_bytes(&sb, size);
>	else
>	strbuf_addf(&sb, "%"PRIuMAX, (uintmax_t)size_from_bitmap);
>	puts(sb.buf);
>	strbuf_release(&sb);
>  }
>
>and then you can call it from here, and from the non-bitmap path below.
Oh, this is really good.
>
>(Also, while typing it out, I noticed that you don't need the extra ""
>after PRIuMAX; that just concatenates an empty string).
>
>> -	if (!strcmp(arg, "--disk-usage")) {
>> -	show_disk_usage = 1;
>> -	info.flags |= REV_LIST_QUIET;
>> -	continue;
>> +	if (skip_prefix(arg, "--disk-usage", &arg)) {
>> +	if (*arg == '=') {
>> +	if (!strcmp(++arg, "human")) {
>> +	human_readable = 1;
>> +	show_disk_usage = 1;
>> +	info.flags |= REV_LIST_QUIET;
>> +	continue;
>> +	} else
>> +	die(_("invalid value for '%s': '%s', try --disk-usage=human"), "--disk-usage", arg);
>> +	} else {
>> +	show_disk_usage = 1;
>> +	info.flags |= REV_LIST_QUIET;
>> +	continue;
>> +	}
>>  }
>
>We can put the common parts of each side of the conditional into the
>outer block to avoid repeating ourselves. Also, your code matches
>--show-disk-usage-without-an-equals, since it nows uses skip_prefix().
>You could fix that by checking for '\0' in *arg. So together, something
>like:
>
>  if (skip_prefix(arg, "--disk-usage", &arg)) {
>	if (*arg == '=') {
>	if (!strcmp(++arg, "human"))
>	human_readable = 1;
>	else
>	die(...);
>	} else if (*arg) {
>	/*
>	* Arguably should goto a label to continue chain of ifs?
>	* Doesn't matter unless we try to add --disk-usage-foo
>	* afterwards
>	*/
>	usage(rev_list_usage);
>	}
>	show_disk_usage = 1;
>	info.flags |= REV_LIST_QUIET;
>	continue;
>  }
Thank you a lot for your very nice suggestions.

>
>-Peff

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v3] rev-list: support human-readable output for `--disk-usage`
  2022-08-08  8:35 ` [PATCH v2] rev-list: support human-readable output for `--disk-usage` Li Linchao via GitGitGadget
  2022-08-08  9:37   ` lilinchao
  2022-08-09 13:22   ` Jeff King
@ 2022-08-10  6:01   ` Li Linchao via GitGitGadget
  2022-08-10  7:18     ` Johannes Sixt
  2022-08-10 11:14     ` [PATCH v4] " Li Linchao via GitGitGadget
  2 siblings, 2 replies; 17+ messages in thread
From: Li Linchao via GitGitGadget @ 2022-08-10  6:01 UTC (permalink / raw)
  To: git
  Cc: Jeff King, Ævar Arnfjörð Bjarmason, Li Linchao,
	Li Linchao

From: Li Linchao <lilinchao@oschina.cn>

The '--disk-usage' option for git-rev-list was introduced in 16950f8384
(rev-list: add --disk-usage option for calculating disk usage, 2021-02-09).
This is very useful for people inspect their git repo's objects usage
infomation, but the resulting number is quit hard for a human to read.

Teach git rev-list to output a human readable result when using
'--disk-usage'.

Signed-off-by: Li Linchao <lilinchao@oschina.cn>
---
    rev-list: support human-readable output for disk-usage
    
    The '--disk-usage' option for git-rev-list was introduced in 16950f8384
    (rev-list: add --disk-usage option for calculating disk usage,
    2021-02-09). This is very useful for people inspect their git repo's
    objects usage infomation, but the result number is quit hard for human
    to read.
    
    Teach git rev-list to output more human readable result when using
    '--disk-usage' to calculate objects disk usage.
    
    Signed-off-by: Li Linchao lilinchao@oschina.cn

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1313%2FCactusinhand%2Fllc%2Fadd-human-readable-option-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1313/Cactusinhand/llc/add-human-readable-option-v3
Pull-Request: https://github.com/gitgitgadget/git/pull/1313

Range-diff vs v2:

 1:  7e34d16efe4 ! 1:  000a6b37ec9 rev-list: support human-readable output for `--disk-usage`
     @@ builtin/rev-list.c: static int arg_show_object_names = 1;
       
       static off_t get_object_disk_usage(struct object *obj)
       {
     +@@ builtin/rev-list.c: static int show_object_fast(
     + 	return 1;
     + }
     + 
     ++static void print_disk_usage(off_t size)
     ++{
     ++	struct strbuf sb = STRBUF_INIT;
     ++	if (human_readable)
     ++		strbuf_humanise_bytes(&sb, size);
     ++	else
     ++		strbuf_addf(&sb, "%"PRIuMAX, (uintmax_t)size);
     ++	puts(sb.buf);
     ++	strbuf_release(&sb);
     ++}
     ++
     + static inline int parse_missing_action_value(const char *value)
     + {
     + 	if (!strcmp(value, "error")) {
      @@ builtin/rev-list.c: static int try_bitmap_disk_usage(struct rev_info *revs,
       				 int filter_provided_objects)
       {
       	struct bitmap_index *bitmap_git;
     -+	struct strbuf disk_buf = STRBUF_INIT;
      +	off_t size_from_bitmap;
       
       	if (!show_disk_usage)
     @@ builtin/rev-list.c: static int try_bitmap_disk_usage(struct rev_info *revs,
      -	printf("%"PRIuMAX"\n",
      -	       (uintmax_t)get_disk_usage_from_bitmap(bitmap_git, revs));
      +	size_from_bitmap = get_disk_usage_from_bitmap(bitmap_git, revs);
     -+	if (human_readable)
     -+		strbuf_humanise_bytes(&disk_buf, size_from_bitmap);
     -+	else
     -+		strbuf_addf(&disk_buf, "%"PRIuMAX"", (uintmax_t)size_from_bitmap);
     -+	puts(disk_buf.buf);
     -+	strbuf_release(&disk_buf);
     ++	print_disk_usage(size_from_bitmap);
       	return 0;
       }
       
     @@ builtin/rev-list.c: int cmd_rev_list(int argc, const char **argv, const char *pr
       		}
       
      -		if (!strcmp(arg, "--disk-usage")) {
     --			show_disk_usage = 1;
     --			info.flags |= REV_LIST_QUIET;
     --			continue;
      +		if (skip_prefix(arg, "--disk-usage", &arg)) {
      +			if (*arg == '=') {
      +				if (!strcmp(++arg, "human")) {
      +					human_readable = 1;
     -+					show_disk_usage = 1;
     -+					info.flags |= REV_LIST_QUIET;
     -+					continue;
      +				} else
      +					die(_("invalid value for '%s': '%s', try --disk-usage=human"), "--disk-usage", arg);
     -+			} else {
     -+				show_disk_usage = 1;
     -+				info.flags |= REV_LIST_QUIET;
     -+				continue;
     ++			} else if (*arg) {
     ++				/*
     ++				* Arguably should goto a label to continue chain of ifs?
     ++				* Doesn't matter unless we try to add --disk-usage-foo
     ++				* afterwards
     ++				*/
     ++				usage(rev_list_usage);
      +			}
     - 		}
     - 
     - 		usage(rev_list_usage);
     + 			show_disk_usage = 1;
     + 			info.flags |= REV_LIST_QUIET;
     + 			continue;
      @@ builtin/rev-list.c: int cmd_rev_list(int argc, const char **argv, const char *prefix)
     - 			printf("%d\n", revs.count_left + revs.count_right);
       	}
       
     --	if (show_disk_usage)
     + 	if (show_disk_usage)
      -		printf("%"PRIuMAX"\n", (uintmax_t)total_disk_usage);
     -+	if (show_disk_usage) {
     -+		struct strbuf disk_buf = STRBUF_INIT;
     -+		if (human_readable)
     -+			strbuf_humanise_bytes(&disk_buf, total_disk_usage);
     -+		else
     -+			strbuf_addf(&disk_buf, "%"PRIuMAX"", (uintmax_t)total_disk_usage);
     -+		puts(disk_buf.buf);
     -+		strbuf_release(&disk_buf);
     -+	}
     ++		print_disk_usage(total_disk_usage);
       
       cleanup:
       	release_revisions(&revs);


 Documentation/rev-list-options.txt |  5 ++++-
 builtin/rev-list.c                 | 35 ++++++++++++++++++++++++++----
 t/t6115-rev-list-du.sh             | 22 +++++++++++++++++++
 3 files changed, 57 insertions(+), 5 deletions(-)

diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt
index 195e74eec63..9966ce4ef91 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -242,8 +242,11 @@ ifdef::git-rev-list[]
 	to `/dev/null` as the output does not have to be formatted.
 
 --disk-usage::
+--disk-usage=human::
 	Suppress normal output; instead, print the sum of the bytes used
-	for on-disk storage by the selected commits or objects. This is
+	for on-disk storage by the selected commits or objects.
+	When it accepts a value `human`, like: `--disk-usage=human`, this
+	means to print objects size in human readable format. This is
 	equivalent to piping the output into `git cat-file
 	--batch-check='%(objectsize:disk)'`, except that it runs much
 	faster (especially with `--use-bitmap-index`). See the `CAVEATS`
diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 30fd8e83eaf..df42e1b667e 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -46,6 +46,7 @@ static const char rev_list_usage[] =
 "    --parents\n"
 "    --children\n"
 "    --objects | --objects-edge\n"
+"    --disk-usage | --disk-usage=human\n"
 "    --unpacked\n"
 "    --header | --pretty\n"
 "    --[no-]object-names\n"
@@ -81,6 +82,7 @@ static int arg_show_object_names = 1;
 
 static int show_disk_usage;
 static off_t total_disk_usage;
+static int human_readable;
 
 static off_t get_object_disk_usage(struct object *obj)
 {
@@ -368,6 +370,17 @@ static int show_object_fast(
 	return 1;
 }
 
+static void print_disk_usage(off_t size)
+{
+	struct strbuf sb = STRBUF_INIT;
+	if (human_readable)
+		strbuf_humanise_bytes(&sb, size);
+	else
+		strbuf_addf(&sb, "%"PRIuMAX, (uintmax_t)size);
+	puts(sb.buf);
+	strbuf_release(&sb);
+}
+
 static inline int parse_missing_action_value(const char *value)
 {
 	if (!strcmp(value, "error")) {
@@ -473,6 +486,7 @@ static int try_bitmap_disk_usage(struct rev_info *revs,
 				 int filter_provided_objects)
 {
 	struct bitmap_index *bitmap_git;
+	off_t size_from_bitmap;
 
 	if (!show_disk_usage)
 		return -1;
@@ -481,8 +495,8 @@ static int try_bitmap_disk_usage(struct rev_info *revs,
 	if (!bitmap_git)
 		return -1;
 
-	printf("%"PRIuMAX"\n",
-	       (uintmax_t)get_disk_usage_from_bitmap(bitmap_git, revs));
+	size_from_bitmap = get_disk_usage_from_bitmap(bitmap_git, revs);
+	print_disk_usage(size_from_bitmap);
 	return 0;
 }
 
@@ -624,7 +638,20 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 			continue;
 		}
 
-		if (!strcmp(arg, "--disk-usage")) {
+		if (skip_prefix(arg, "--disk-usage", &arg)) {
+			if (*arg == '=') {
+				if (!strcmp(++arg, "human")) {
+					human_readable = 1;
+				} else
+					die(_("invalid value for '%s': '%s', try --disk-usage=human"), "--disk-usage", arg);
+			} else if (*arg) {
+				/*
+				* Arguably should goto a label to continue chain of ifs?
+				* Doesn't matter unless we try to add --disk-usage-foo
+				* afterwards
+				*/
+				usage(rev_list_usage);
+			}
 			show_disk_usage = 1;
 			info.flags |= REV_LIST_QUIET;
 			continue;
@@ -753,7 +780,7 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 	}
 
 	if (show_disk_usage)
-		printf("%"PRIuMAX"\n", (uintmax_t)total_disk_usage);
+		print_disk_usage(total_disk_usage);
 
 cleanup:
 	release_revisions(&revs);
diff --git a/t/t6115-rev-list-du.sh b/t/t6115-rev-list-du.sh
index b4aef32b713..b34841a4ba8 100755
--- a/t/t6115-rev-list-du.sh
+++ b/t/t6115-rev-list-du.sh
@@ -48,4 +48,26 @@ check_du HEAD
 check_du --objects HEAD
 check_du --objects HEAD^..HEAD
 
+# As mentioned above, don't use hardcode sizes as actual size, but use the
+# output from git cat-file.
+test_expect_success 'rev-list --disk-usage=human' '
+	git rev-list --objects HEAD --disk-usage=human >actual &&
+	disk_usage_slow --objects HEAD >actual_size &&
+	grep "$(cat actual_size) bytes" actual
+'
+
+test_expect_success 'rev-list --disk-usage=human with bitmaps' '
+	git rev-list --objects HEAD --use-bitmap-index --disk-usage=human >actual &&
+	disk_usage_slow --objects HEAD >actual_size &&
+	grep "$(cat actual_size) bytes" actual
+'
+
+test_expect_success 'rev-list use --disk-usage unproperly' '
+	test_must_fail git rev-list --objects HEAD --disk-usage=typo 2>err &&
+	cat >expect <<-\EOF &&
+	fatal: invalid value for '\''--disk-usage'\'': '\''typo'\'', try --disk-usage=human
+	EOF
+	test_cmp err expect
+'
+
 test_done

base-commit: 679aad9e82d0dfd8ef3d1f98fa4629665496cec9
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v3] rev-list: support human-readable output for `--disk-usage`
  2022-08-10  6:01   ` [PATCH v3] " Li Linchao via GitGitGadget
@ 2022-08-10  7:18     ` Johannes Sixt
  2022-08-10 11:14     ` [PATCH v4] " Li Linchao via GitGitGadget
  1 sibling, 0 replies; 17+ messages in thread
From: Johannes Sixt @ 2022-08-10  7:18 UTC (permalink / raw)
  To: Li Linchao via GitGitGadget
  Cc: Jeff King, Ævar Arnfjörð Bjarmason, git,
	Li Linchao

Am 10.08.22 um 08:01 schrieb Li Linchao via GitGitGadget:
>  --disk-usage::
> +--disk-usage=human::
>  	Suppress normal output; instead, print the sum of the bytes used
> -	for on-disk storage by the selected commits or objects. This is
> +	for on-disk storage by the selected commits or objects.
> +	When it accepts a value `human`, like: `--disk-usage=human`, this
> +	means to print objects size in human readable format. This is
>  	equivalent to piping the output into `git cat-file
>  	--batch-check='%(objectsize:disk)'`, except that it runs much
>  	faster (especially with `--use-bitmap-index`). See the `CAVEATS`

The original paragraph flows very well and explains what the option does
and how it computes the result. Please do not interrupt the flow of the
text with a whole sentence that should be just a parenthetical remark,
but add a sentence at the end, not in the middle.

You added a new feature, and I understand that it is important *to you*.
But do make it a habit to ask yourself if it is also important for the
general audience. Generally, a new feature is not as important as
existing features, otherwise, it would have been added earlier, wouldn't it?

-- Hannes

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v4] rev-list: support human-readable output for `--disk-usage`
  2022-08-10  6:01   ` [PATCH v3] " Li Linchao via GitGitGadget
  2022-08-10  7:18     ` Johannes Sixt
@ 2022-08-10 11:14     ` Li Linchao via GitGitGadget
  2022-08-10 17:34       ` Junio C Hamano
  2022-08-11  4:47       ` [PATCH v5] " Li Linchao via GitGitGadget
  1 sibling, 2 replies; 17+ messages in thread
From: Li Linchao via GitGitGadget @ 2022-08-10 11:14 UTC (permalink / raw)
  To: git
  Cc: Jeff King, Ævar Arnfjörð Bjarmason, Johannes Sixt,
	Li Linchao, Li Linchao

From: Li Linchao <lilinchao@oschina.cn>

The '--disk-usage' option for git-rev-list was introduced in 16950f8384
(rev-list: add --disk-usage option for calculating disk usage, 2021-02-09).
This is very useful for people inspect their git repo's objects usage
infomation, but the resulting number is quit hard for a human to read.

Teach git rev-list to output a human readable result when using
'--disk-usage'.

Signed-off-by: Li Linchao <lilinchao@oschina.cn>
---
    rev-list: support human-readable output for disk-usage
    
    The '--disk-usage' option for git-rev-list was introduced in 16950f8384
    (rev-list: add --disk-usage option for calculating disk usage,
    2021-02-09). This is very useful for people inspect their git repo's
    objects usage infomation, but the result number is quit hard for human
    to read.
    
    Teach git rev-list to output more human readable result when using
    '--disk-usage' to calculate objects disk usage.
    
    Signed-off-by: Li Linchao lilinchao@oschina.cn

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1313%2FCactusinhand%2Fllc%2Fadd-human-readable-option-v4
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1313/Cactusinhand/llc/add-human-readable-option-v4
Pull-Request: https://github.com/gitgitgadget/git/pull/1313

Range-diff vs v3:

 1:  000a6b37ec9 ! 1:  e56da057a9a rev-list: support human-readable output for `--disk-usage`
     @@ Documentation/rev-list-options.txt: ifdef::git-rev-list[]
       --disk-usage::
      +--disk-usage=human::
       	Suppress normal output; instead, print the sum of the bytes used
     --	for on-disk storage by the selected commits or objects. This is
     -+	for on-disk storage by the selected commits or objects.
     -+	When it accepts a value `human`, like: `--disk-usage=human`, this
     -+	means to print objects size in human readable format. This is
     + 	for on-disk storage by the selected commits or objects. This is
       	equivalent to piping the output into `git cat-file
     - 	--batch-check='%(objectsize:disk)'`, except that it runs much
     +@@ Documentation/rev-list-options.txt: ifdef::git-rev-list[]
       	faster (especially with `--use-bitmap-index`). See the `CAVEATS`
     + 	section in linkgit:git-cat-file[1] for the limitations of what
     + 	"on-disk storage" means.
     ++	When it accepts a value `human`, like: `--disk-usage=human`, this
     ++	means to print objects size in human readable format.
     + endif::git-rev-list[]
     + 
     + --cherry-mark::
      
       ## builtin/rev-list.c ##
      @@ builtin/rev-list.c: static const char rev_list_usage[] =


 Documentation/rev-list-options.txt |  3 +++
 builtin/rev-list.c                 | 35 ++++++++++++++++++++++++++----
 t/t6115-rev-list-du.sh             | 22 +++++++++++++++++++
 3 files changed, 56 insertions(+), 4 deletions(-)

diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt
index 195e74eec63..5d3880874fc 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -242,6 +242,7 @@ ifdef::git-rev-list[]
 	to `/dev/null` as the output does not have to be formatted.
 
 --disk-usage::
+--disk-usage=human::
 	Suppress normal output; instead, print the sum of the bytes used
 	for on-disk storage by the selected commits or objects. This is
 	equivalent to piping the output into `git cat-file
@@ -249,6 +250,8 @@ ifdef::git-rev-list[]
 	faster (especially with `--use-bitmap-index`). See the `CAVEATS`
 	section in linkgit:git-cat-file[1] for the limitations of what
 	"on-disk storage" means.
+	When it accepts a value `human`, like: `--disk-usage=human`, this
+	means to print objects size in human readable format.
 endif::git-rev-list[]
 
 --cherry-mark::
diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 30fd8e83eaf..df42e1b667e 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -46,6 +46,7 @@ static const char rev_list_usage[] =
 "    --parents\n"
 "    --children\n"
 "    --objects | --objects-edge\n"
+"    --disk-usage | --disk-usage=human\n"
 "    --unpacked\n"
 "    --header | --pretty\n"
 "    --[no-]object-names\n"
@@ -81,6 +82,7 @@ static int arg_show_object_names = 1;
 
 static int show_disk_usage;
 static off_t total_disk_usage;
+static int human_readable;
 
 static off_t get_object_disk_usage(struct object *obj)
 {
@@ -368,6 +370,17 @@ static int show_object_fast(
 	return 1;
 }
 
+static void print_disk_usage(off_t size)
+{
+	struct strbuf sb = STRBUF_INIT;
+	if (human_readable)
+		strbuf_humanise_bytes(&sb, size);
+	else
+		strbuf_addf(&sb, "%"PRIuMAX, (uintmax_t)size);
+	puts(sb.buf);
+	strbuf_release(&sb);
+}
+
 static inline int parse_missing_action_value(const char *value)
 {
 	if (!strcmp(value, "error")) {
@@ -473,6 +486,7 @@ static int try_bitmap_disk_usage(struct rev_info *revs,
 				 int filter_provided_objects)
 {
 	struct bitmap_index *bitmap_git;
+	off_t size_from_bitmap;
 
 	if (!show_disk_usage)
 		return -1;
@@ -481,8 +495,8 @@ static int try_bitmap_disk_usage(struct rev_info *revs,
 	if (!bitmap_git)
 		return -1;
 
-	printf("%"PRIuMAX"\n",
-	       (uintmax_t)get_disk_usage_from_bitmap(bitmap_git, revs));
+	size_from_bitmap = get_disk_usage_from_bitmap(bitmap_git, revs);
+	print_disk_usage(size_from_bitmap);
 	return 0;
 }
 
@@ -624,7 +638,20 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 			continue;
 		}
 
-		if (!strcmp(arg, "--disk-usage")) {
+		if (skip_prefix(arg, "--disk-usage", &arg)) {
+			if (*arg == '=') {
+				if (!strcmp(++arg, "human")) {
+					human_readable = 1;
+				} else
+					die(_("invalid value for '%s': '%s', try --disk-usage=human"), "--disk-usage", arg);
+			} else if (*arg) {
+				/*
+				* Arguably should goto a label to continue chain of ifs?
+				* Doesn't matter unless we try to add --disk-usage-foo
+				* afterwards
+				*/
+				usage(rev_list_usage);
+			}
 			show_disk_usage = 1;
 			info.flags |= REV_LIST_QUIET;
 			continue;
@@ -753,7 +780,7 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 	}
 
 	if (show_disk_usage)
-		printf("%"PRIuMAX"\n", (uintmax_t)total_disk_usage);
+		print_disk_usage(total_disk_usage);
 
 cleanup:
 	release_revisions(&revs);
diff --git a/t/t6115-rev-list-du.sh b/t/t6115-rev-list-du.sh
index b4aef32b713..b34841a4ba8 100755
--- a/t/t6115-rev-list-du.sh
+++ b/t/t6115-rev-list-du.sh
@@ -48,4 +48,26 @@ check_du HEAD
 check_du --objects HEAD
 check_du --objects HEAD^..HEAD
 
+# As mentioned above, don't use hardcode sizes as actual size, but use the
+# output from git cat-file.
+test_expect_success 'rev-list --disk-usage=human' '
+	git rev-list --objects HEAD --disk-usage=human >actual &&
+	disk_usage_slow --objects HEAD >actual_size &&
+	grep "$(cat actual_size) bytes" actual
+'
+
+test_expect_success 'rev-list --disk-usage=human with bitmaps' '
+	git rev-list --objects HEAD --use-bitmap-index --disk-usage=human >actual &&
+	disk_usage_slow --objects HEAD >actual_size &&
+	grep "$(cat actual_size) bytes" actual
+'
+
+test_expect_success 'rev-list use --disk-usage unproperly' '
+	test_must_fail git rev-list --objects HEAD --disk-usage=typo 2>err &&
+	cat >expect <<-\EOF &&
+	fatal: invalid value for '\''--disk-usage'\'': '\''typo'\'', try --disk-usage=human
+	EOF
+	test_cmp err expect
+'
+
 test_done

base-commit: 679aad9e82d0dfd8ef3d1f98fa4629665496cec9
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v4] rev-list: support human-readable output for `--disk-usage`
  2022-08-10 11:14     ` [PATCH v4] " Li Linchao via GitGitGadget
@ 2022-08-10 17:34       ` Junio C Hamano
  2022-08-10 21:20         ` Jeff King
  2022-08-11  4:47       ` [PATCH v5] " Li Linchao via GitGitGadget
  1 sibling, 1 reply; 17+ messages in thread
From: Junio C Hamano @ 2022-08-10 17:34 UTC (permalink / raw)
  To: Li Linchao via GitGitGadget
  Cc: git, Jeff King, Ævar Arnfjörð Bjarmason,
	Johannes Sixt, Li Linchao

"Li Linchao via GitGitGadget" <gitgitgadget@gmail.com> writes:

> diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt
> index 195e74eec63..5d3880874fc 100644
> --- a/Documentation/rev-list-options.txt
> +++ b/Documentation/rev-list-options.txt
> @@ -242,6 +242,7 @@ ifdef::git-rev-list[]
>  	to `/dev/null` as the output does not have to be formatted.
>  
>  --disk-usage::
> +--disk-usage=human::
>  	Suppress normal output; instead, print the sum of the bytes used
>  	for on-disk storage by the selected commits or objects. This is
>  	equivalent to piping the output into `git cat-file
> @@ -249,6 +250,8 @@ ifdef::git-rev-list[]
>  	faster (especially with `--use-bitmap-index`). See the `CAVEATS`
>  	section in linkgit:git-cat-file[1] for the limitations of what
>  	"on-disk storage" means.
> +	When it accepts a value `human`, like: `--disk-usage=human`, this
> +	means to print objects size in human readable format.

"When it accepts" sounds wrong, because it implies there are two
cases, i.e. the user gives "--disk-usage=human" and the command
somehow decides to accept it, or to reject it, which is not what is
going on.  How about phrasing it more like ...

    With the optional value `human`, on-disk storage size is shown
    in human-readable string (e.g. 12.23 KiB, 3.50 MiB).

perhaps?

> @@ -46,6 +46,7 @@ static const char rev_list_usage[] =
>  "    --parents\n"
>  "    --children\n"
>  "    --objects | --objects-edge\n"
> +"    --disk-usage | --disk-usage=human\n"

Writing it like

	"--disk-usage[=human]\n"

would be more in line with ...

>  "    --[no-]object-names\n"

... this one.

> @@ -368,6 +370,17 @@ static int show_object_fast(
>  	return 1;
>  }
>  
> +static void print_disk_usage(off_t size)
> +{
> +	struct strbuf sb = STRBUF_INIT;
> +	if (human_readable)
> +		strbuf_humanise_bytes(&sb, size);
> +	else
> +		strbuf_addf(&sb, "%"PRIuMAX, (uintmax_t)size);
> +	puts(sb.buf);
> +	strbuf_release(&sb);
> +}

Hmph, I am not sure if we want to make it a helper like this.  The
normal case does not need to prepare the string into a strbuf but
just can send the output to the standard output stream.

It is probably easy to fix, like so:

	if (!human_readable) {
		printf("%" PRIuMAX "\n", disk_usage);
	} else {
		strbuf sb = STRBUF_INIT;
		strbuf_humanise_bytes(&sb, disk_usage);
		puts(sb.buf);
		strbuf_release(&sb);
	}

>  static inline int parse_missing_action_value(const char *value)
>  {
>  	if (!strcmp(value, "error")) {
> @@ -473,6 +486,7 @@ static int try_bitmap_disk_usage(struct rev_info *revs,
>  				 int filter_provided_objects)
>  {
>  	struct bitmap_index *bitmap_git;
> +	off_t size_from_bitmap;

Questionably named variable.

>  	if (!show_disk_usage)
>  		return -1;
> @@ -481,8 +495,8 @@ static int try_bitmap_disk_usage(struct rev_info *revs,
>  	if (!bitmap_git)
>  		return -1;
>  
> -	printf("%"PRIuMAX"\n",
> -	       (uintmax_t)get_disk_usage_from_bitmap(bitmap_git, revs));
> +	size_from_bitmap = get_disk_usage_from_bitmap(bitmap_git, revs);
> +	print_disk_usage(size_from_bitmap);

It makes sense to make the function declare how it gets disk usage
in its name, but once we call the function to get what we want,
there is no need to keep saying we got it from bitmap.  If we ever
gained another function that obtains the disk usage from other
means, then this part of the code would become

	if (use_bitmap)
		variable = get_disk_usage_from_bitmap(...);
	else
		variable = get_disk_usage_from_other_means(...);

	print_disk_usage(variable);

Now, what should that variable be named?  It is clear that it
shouldn't be named "from_bitmap" at all, because it may very well
have come from somewhere else.

You call it simply 'size' in print_disk_usage(), and that would
probably be a good name to use here.  Or even better, "disk_usage".

However, I think all of the above badness stems from the horrible
way the existing code deals with "use_bitmap_index", and it is not
in the scope of this patch to fix it, let's keep the above code
as-is.  This is only called from the "if (use_bitmap_index)" so
the size can only be from bitmap.

> @@ -624,7 +638,20 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
>  			continue;
>  		}
>  
> -		if (!strcmp(arg, "--disk-usage")) {
> +		if (skip_prefix(arg, "--disk-usage", &arg)) {
> +			if (*arg == '=') {
> +				if (!strcmp(++arg, "human")) {
> +					human_readable = 1;
> +				} else
> +					die(_("invalid value for '%s': '%s', try --disk-usage=human"), "--disk-usage", arg);

This makes the hardcoded "--disk-usage=human" subject to
translation, which it should not.  Perhaps like this:

	die(_("invalid value for '%s': '%s', the only allowed format is "%s"),
	    "--disk-usage=<format>", arg, "human");

The main point is to ensure that "human" that we are not allowing
the user to type in their language is left out of the
_("translatable string").

> +			} else if (*arg) {
> +				/*
> +				* Arguably should goto a label to continue chain of ifs?
> +				* Doesn't matter unless we try to add --disk-usage-foo
> +				* afterwards
> +				*/

Wrong indentation for a long comment?

> +				usage(rev_list_usage);
> +			}

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4] rev-list: support human-readable output for `--disk-usage`
  2022-08-10 17:34       ` Junio C Hamano
@ 2022-08-10 21:20         ` Jeff King
  2022-08-10 21:25           ` Junio C Hamano
  2022-08-11  5:20           ` Junio C Hamano
  0 siblings, 2 replies; 17+ messages in thread
From: Jeff King @ 2022-08-10 21:20 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Li Linchao via GitGitGadget, git,
	Ævar Arnfjörð Bjarmason, Johannes Sixt, Li Linchao

On Wed, Aug 10, 2022 at 10:34:32AM -0700, Junio C Hamano wrote:

> > +static void print_disk_usage(off_t size)
> > +{
> > +	struct strbuf sb = STRBUF_INIT;
> > +	if (human_readable)
> > +		strbuf_humanise_bytes(&sb, size);
> > +	else
> > +		strbuf_addf(&sb, "%"PRIuMAX, (uintmax_t)size);
> > +	puts(sb.buf);
> > +	strbuf_release(&sb);
> > +}
> 
> Hmph, I am not sure if we want to make it a helper like this.  The
> normal case does not need to prepare the string into a strbuf but
> just can send the output to the standard output stream.
> 
> It is probably easy to fix, like so:
> 
> 	if (!human_readable) {
> 		printf("%" PRIuMAX "\n", disk_usage);
> 	} else {
> 		strbuf sb = STRBUF_INIT;
> 		strbuf_humanise_bytes(&sb, disk_usage);
> 		puts(sb.buf);
> 		strbuf_release(&sb);
> 	}

It was my suggestion to turn it into a helper, because the same code
needs to be present in two distant spots (the bitmap and non-bitmap
cases).

I don't care much between "printf directly vs strbuf" for the non-human
case, but it was an earlier review suggestion to connect them. I do
think the result is a little easier to follow, but mostly I want to make
it clear that the author is getting stuck between warring review
comments here. ;)

> > @@ -481,8 +495,8 @@ static int try_bitmap_disk_usage(struct rev_info *revs,
> >  	if (!bitmap_git)
> >  		return -1;
> >  
> > -	printf("%"PRIuMAX"\n",
> > -	       (uintmax_t)get_disk_usage_from_bitmap(bitmap_git, revs));
> > +	size_from_bitmap = get_disk_usage_from_bitmap(bitmap_git, revs);
> > +	print_disk_usage(size_from_bitmap);
> 
> It makes sense to make the function declare how it gets disk usage
> in its name, but once we call the function to get what we want,
> there is no need to keep saying we got it from bitmap.  If we ever
> gained another function that obtains the disk usage from other
> means, then this part of the code would become

Keep in mind that we are in try_bitmap_disk_usage() here. :) There is
indeed similar code to use other means, but it's far away, and this code
will always use bitmaps.

That said, I'd have just written:

  print_disk_usage(get_disk_usage_from_bitmap(bitmap_git, revs));

since the variable is not otherwise used. But arguably that's harder to
read.

-Peff

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4] rev-list: support human-readable output for `--disk-usage`
  2022-08-10 21:20         ` Jeff King
@ 2022-08-10 21:25           ` Junio C Hamano
  2022-08-11  5:20           ` Junio C Hamano
  1 sibling, 0 replies; 17+ messages in thread
From: Junio C Hamano @ 2022-08-10 21:25 UTC (permalink / raw)
  To: Jeff King
  Cc: Li Linchao via GitGitGadget, git,
	Ævar Arnfjörð Bjarmason, Johannes Sixt, Li Linchao

Jeff King <peff@peff.net> writes:

> It was my suggestion to turn it into a helper, because the same code
> needs to be present in two distant spots (the bitmap and non-bitmap
> cases).
> ...
> Keep in mind that we are in try_bitmap_disk_usage() here.

Yeah, both of them I notice in a later parts of the message you are
responding to ;-)

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v5] rev-list: support human-readable output for `--disk-usage`
  2022-08-10 11:14     ` [PATCH v4] " Li Linchao via GitGitGadget
  2022-08-10 17:34       ` Junio C Hamano
@ 2022-08-11  4:47       ` Li Linchao via GitGitGadget
  2022-08-11 20:49         ` Junio C Hamano
  1 sibling, 1 reply; 17+ messages in thread
From: Li Linchao via GitGitGadget @ 2022-08-11  4:47 UTC (permalink / raw)
  To: git; +Cc: Li Linchao, Li Linchao

From: Li Linchao <lilinchao@oschina.cn>

The '--disk-usage' option for git-rev-list was introduced in 16950f8384
(rev-list: add --disk-usage option for calculating disk usage, 2021-02-09).
This is very useful for people inspect their git repo's objects usage
infomation, but the resulting number is quit hard for a human to read.

Teach git rev-list to output a human readable result when using
'--disk-usage'.

Signed-off-by: Li Linchao <lilinchao@oschina.cn>
---
    rev-list: support human-readable output for disk-usage
    
    Signed-off-by: Li Linchao lilinchao@oschina.cn Cc: Jeff King
    peff@peff.net cc: Ævar Arnfjörð Bjarmason avarab@gmail.com cc: Johannes
    Sixt j6t@kdbg.org

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1313%2FCactusinhand%2Fllc%2Fadd-human-readable-option-v5
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1313/Cactusinhand/llc/add-human-readable-option-v5
Pull-Request: https://github.com/gitgitgadget/git/pull/1313

Range-diff vs v4:

 1:  e56da057a9a ! 1:  7fbf4b636d6 rev-list: support human-readable output for `--disk-usage`
     @@ Documentation/rev-list-options.txt: ifdef::git-rev-list[]
       	faster (especially with `--use-bitmap-index`). See the `CAVEATS`
       	section in linkgit:git-cat-file[1] for the limitations of what
       	"on-disk storage" means.
     -+	When it accepts a value `human`, like: `--disk-usage=human`, this
     -+	means to print objects size in human readable format.
     ++	With the optional value `human`, on-disk storage size is shown
     ++	in human-readable string(e.g. 12.24 Kib, 3.50 Mib).
       endif::git-rev-list[]
       
       --cherry-mark::
     @@ builtin/rev-list.c: static const char rev_list_usage[] =
       "    --parents\n"
       "    --children\n"
       "    --objects | --objects-edge\n"
     -+"    --disk-usage | --disk-usage=human\n"
     ++"    --disk-usage[=human]\n"
       "    --unpacked\n"
       "    --header | --pretty\n"
       "    --[no-]object-names\n"
     @@ builtin/rev-list.c: int cmd_rev_list(int argc, const char **argv, const char *pr
      +				if (!strcmp(++arg, "human")) {
      +					human_readable = 1;
      +				} else
     -+					die(_("invalid value for '%s': '%s', try --disk-usage=human"), "--disk-usage", arg);
     ++					die(_("invalid value for '%s': '%s', the only allowed format is '%s'"),
     ++					    "--disk-usage=<format>", arg, "human");
      +			} else if (*arg) {
      +				/*
     -+				* Arguably should goto a label to continue chain of ifs?
     -+				* Doesn't matter unless we try to add --disk-usage-foo
     -+				* afterwards
     -+				*/
     ++				 * Arguably should goto a label to continue chain of ifs?
     ++				 * Doesn't matter unless we try to add --disk-usage-foo
     ++				 * afterwards.
     ++				 */
      +				usage(rev_list_usage);
      +			}
       			show_disk_usage = 1;
     @@ t/t6115-rev-list-du.sh: check_du HEAD
      +test_expect_success 'rev-list use --disk-usage unproperly' '
      +	test_must_fail git rev-list --objects HEAD --disk-usage=typo 2>err &&
      +	cat >expect <<-\EOF &&
     -+	fatal: invalid value for '\''--disk-usage'\'': '\''typo'\'', try --disk-usage=human
     ++	fatal: invalid value for '\''--disk-usage=<format>'\'': '\''typo'\'', the only allowed format is '\''human'\''
      +	EOF
      +	test_cmp err expect
      +'


 Documentation/rev-list-options.txt |  3 +++
 builtin/rev-list.c                 | 36 ++++++++++++++++++++++++++----
 t/t6115-rev-list-du.sh             | 22 ++++++++++++++++++
 3 files changed, 57 insertions(+), 4 deletions(-)

diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt
index 195e74eec63..bd08d18576f 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -242,6 +242,7 @@ ifdef::git-rev-list[]
 	to `/dev/null` as the output does not have to be formatted.
 
 --disk-usage::
+--disk-usage=human::
 	Suppress normal output; instead, print the sum of the bytes used
 	for on-disk storage by the selected commits or objects. This is
 	equivalent to piping the output into `git cat-file
@@ -249,6 +250,8 @@ ifdef::git-rev-list[]
 	faster (especially with `--use-bitmap-index`). See the `CAVEATS`
 	section in linkgit:git-cat-file[1] for the limitations of what
 	"on-disk storage" means.
+	With the optional value `human`, on-disk storage size is shown
+	in human-readable string(e.g. 12.24 Kib, 3.50 Mib).
 endif::git-rev-list[]
 
 --cherry-mark::
diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 30fd8e83eaf..fba6f5d51f3 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -46,6 +46,7 @@ static const char rev_list_usage[] =
 "    --parents\n"
 "    --children\n"
 "    --objects | --objects-edge\n"
+"    --disk-usage[=human]\n"
 "    --unpacked\n"
 "    --header | --pretty\n"
 "    --[no-]object-names\n"
@@ -81,6 +82,7 @@ static int arg_show_object_names = 1;
 
 static int show_disk_usage;
 static off_t total_disk_usage;
+static int human_readable;
 
 static off_t get_object_disk_usage(struct object *obj)
 {
@@ -368,6 +370,17 @@ static int show_object_fast(
 	return 1;
 }
 
+static void print_disk_usage(off_t size)
+{
+	struct strbuf sb = STRBUF_INIT;
+	if (human_readable)
+		strbuf_humanise_bytes(&sb, size);
+	else
+		strbuf_addf(&sb, "%"PRIuMAX, (uintmax_t)size);
+	puts(sb.buf);
+	strbuf_release(&sb);
+}
+
 static inline int parse_missing_action_value(const char *value)
 {
 	if (!strcmp(value, "error")) {
@@ -473,6 +486,7 @@ static int try_bitmap_disk_usage(struct rev_info *revs,
 				 int filter_provided_objects)
 {
 	struct bitmap_index *bitmap_git;
+	off_t size_from_bitmap;
 
 	if (!show_disk_usage)
 		return -1;
@@ -481,8 +495,8 @@ static int try_bitmap_disk_usage(struct rev_info *revs,
 	if (!bitmap_git)
 		return -1;
 
-	printf("%"PRIuMAX"\n",
-	       (uintmax_t)get_disk_usage_from_bitmap(bitmap_git, revs));
+	size_from_bitmap = get_disk_usage_from_bitmap(bitmap_git, revs);
+	print_disk_usage(size_from_bitmap);
 	return 0;
 }
 
@@ -624,7 +638,21 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 			continue;
 		}
 
-		if (!strcmp(arg, "--disk-usage")) {
+		if (skip_prefix(arg, "--disk-usage", &arg)) {
+			if (*arg == '=') {
+				if (!strcmp(++arg, "human")) {
+					human_readable = 1;
+				} else
+					die(_("invalid value for '%s': '%s', the only allowed format is '%s'"),
+					    "--disk-usage=<format>", arg, "human");
+			} else if (*arg) {
+				/*
+				 * Arguably should goto a label to continue chain of ifs?
+				 * Doesn't matter unless we try to add --disk-usage-foo
+				 * afterwards.
+				 */
+				usage(rev_list_usage);
+			}
 			show_disk_usage = 1;
 			info.flags |= REV_LIST_QUIET;
 			continue;
@@ -753,7 +781,7 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 	}
 
 	if (show_disk_usage)
-		printf("%"PRIuMAX"\n", (uintmax_t)total_disk_usage);
+		print_disk_usage(total_disk_usage);
 
 cleanup:
 	release_revisions(&revs);
diff --git a/t/t6115-rev-list-du.sh b/t/t6115-rev-list-du.sh
index b4aef32b713..d59111dedec 100755
--- a/t/t6115-rev-list-du.sh
+++ b/t/t6115-rev-list-du.sh
@@ -48,4 +48,26 @@ check_du HEAD
 check_du --objects HEAD
 check_du --objects HEAD^..HEAD
 
+# As mentioned above, don't use hardcode sizes as actual size, but use the
+# output from git cat-file.
+test_expect_success 'rev-list --disk-usage=human' '
+	git rev-list --objects HEAD --disk-usage=human >actual &&
+	disk_usage_slow --objects HEAD >actual_size &&
+	grep "$(cat actual_size) bytes" actual
+'
+
+test_expect_success 'rev-list --disk-usage=human with bitmaps' '
+	git rev-list --objects HEAD --use-bitmap-index --disk-usage=human >actual &&
+	disk_usage_slow --objects HEAD >actual_size &&
+	grep "$(cat actual_size) bytes" actual
+'
+
+test_expect_success 'rev-list use --disk-usage unproperly' '
+	test_must_fail git rev-list --objects HEAD --disk-usage=typo 2>err &&
+	cat >expect <<-\EOF &&
+	fatal: invalid value for '\''--disk-usage=<format>'\'': '\''typo'\'', the only allowed format is '\''human'\''
+	EOF
+	test_cmp err expect
+'
+
 test_done

base-commit: 679aad9e82d0dfd8ef3d1f98fa4629665496cec9
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v4] rev-list: support human-readable output for `--disk-usage`
  2022-08-10 21:20         ` Jeff King
  2022-08-10 21:25           ` Junio C Hamano
@ 2022-08-11  5:20           ` Junio C Hamano
  2022-08-11  8:38             ` Jeff King
  1 sibling, 1 reply; 17+ messages in thread
From: Junio C Hamano @ 2022-08-11  5:20 UTC (permalink / raw)
  To: Jeff King
  Cc: Li Linchao via GitGitGadget, git,
	Ævar Arnfjörð Bjarmason, Johannes Sixt, Li Linchao

Jeff King <peff@peff.net> writes:

> I don't care much between "printf directly vs strbuf" for the non-human
> case, but it was an earlier review suggestion to connect them. I do
> think the result is a little easier to follow, but mostly I want to make
> it clear that the author is getting stuck between warring review
> comments here. ;)

The function does look like it can later be called millions of
times, at which time we may want to measure, but as the only use of
this helper function is to get called just once at the end with a
single number and print it (as opposed to getting called once per
each object), I do not care too much about the normal case being
made unnecessarily more expensive, either.

I still do not see the point of changing it to print to a strbuf and
puts() the result, though.  It does not make the code shorter, more
efficient, or more readable.  Once "if (we are producing humanize
format)" condition is hit, both of its branches can either be (1)
responsible to print the number to the standard output stream, using
whatever implementation, or (2) responsible to print the number to a
strbuf, so that somebody outside the if statement will be
respohnsible for printing that string to the standard output stream.

The patch chooses (2), which is more complex, for no good reason.  A
good thing about (1) is that the non-human codepath can STAY to be
the same as before, which is one fewer chance to introduce
unnecessary bugs.

Anyway...

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4] rev-list: support human-readable output for `--disk-usage`
  2022-08-11  5:20           ` Junio C Hamano
@ 2022-08-11  8:38             ` Jeff King
  0 siblings, 0 replies; 17+ messages in thread
From: Jeff King @ 2022-08-11  8:38 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Li Linchao via GitGitGadget, git,
	Ævar Arnfjörð Bjarmason, Johannes Sixt, Li Linchao

On Wed, Aug 10, 2022 at 10:20:14PM -0700, Junio C Hamano wrote:

> I still do not see the point of changing it to print to a strbuf and
> puts() the result, though.  It does not make the code shorter, more
> efficient, or more readable.  Once "if (we are producing humanize
> format)" condition is hit, both of its branches can either be (1)
> responsible to print the number to the standard output stream, using
> whatever implementation, or (2) responsible to print the number to a
> strbuf, so that somebody outside the if statement will be
> respohnsible for printing that string to the standard output stream.

I was not the original reviewer who suggested that change, but FWIW, I
think the argument for cases like these is something like this. Between:

  if (some_cond) {
    foo(do_some_prep());
  } else {
    foo(do_some_other_prep());
  }

and:

  if (some_cond) {
    x = do_some_prep();
  } else {
    x = do_some_other_prep();
  }
  foo(x);

the latter makes it more clear that foo() is called on both sides of the
conditional. In this case the "foo" is not exactly the same: on one it
is printing a strbuf, and on the other it is a printf that
direct-formats. But the principle is the same, if you want to make it
clear that both sides will result in printing something.

As you note, it loses the possibility for one side of the conditional to
do its "foo" in a more efficient way, but I don't think that's very
important for this particular call site.

> The patch chooses (2), which is more complex, for no good reason.  A
> good thing about (1) is that the non-human codepath can STAY to be
> the same as before, which is one fewer chance to introduce
> unnecessary bugs.

True.

Again, I don't care much either way. But I am not quite on the "I do not
see the point at all" side.

-Peff

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v5] rev-list: support human-readable output for `--disk-usage`
  2022-08-11  4:47       ` [PATCH v5] " Li Linchao via GitGitGadget
@ 2022-08-11 20:49         ` Junio C Hamano
  0 siblings, 0 replies; 17+ messages in thread
From: Junio C Hamano @ 2022-08-11 20:49 UTC (permalink / raw)
  To: Li Linchao via GitGitGadget; +Cc: git, Li Linchao

"Li Linchao via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Li Linchao <lilinchao@oschina.cn>
>
> The '--disk-usage' option for git-rev-list was introduced in 16950f8384
> (rev-list: add --disk-usage option for calculating disk usage, 2021-02-09).
> This is very useful for people inspect their git repo's objects usage
> infomation, but the resulting number is quit hard for a human to read.
>
> Teach git rev-list to output a human readable result when using
> '--disk-usage'.
>
> Signed-off-by: Li Linchao <lilinchao@oschina.cn>
> ---

Thanks, queued.

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2022-08-11 20:49 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-05  7:54 [PATCH] rev-list: support `--human-readable` option when applied `disk-usage` Li Linchao via GitGitGadget
2022-08-05 10:03 ` Ævar Arnfjörð Bjarmason
2022-08-05 11:01   ` lilinchao
2022-08-08  8:35 ` [PATCH v2] rev-list: support human-readable output for `--disk-usage` Li Linchao via GitGitGadget
2022-08-08  9:37   ` lilinchao
2022-08-09 13:22   ` Jeff King
2022-08-09 16:46     ` lilinchao
2022-08-10  6:01   ` [PATCH v3] " Li Linchao via GitGitGadget
2022-08-10  7:18     ` Johannes Sixt
2022-08-10 11:14     ` [PATCH v4] " Li Linchao via GitGitGadget
2022-08-10 17:34       ` Junio C Hamano
2022-08-10 21:20         ` Jeff King
2022-08-10 21:25           ` Junio C Hamano
2022-08-11  5:20           ` Junio C Hamano
2022-08-11  8:38             ` Jeff King
2022-08-11  4:47       ` [PATCH v5] " Li Linchao via GitGitGadget
2022-08-11 20:49         ` Junio C Hamano

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).