* [PATCH 0/2] rev-list --disk-usage @ 2021-01-27 22:11 Jeff King 2021-01-27 22:12 ` [PATCH 1/2] t: add --no-tag option to test_commit Jeff King ` (3 more replies) 0 siblings, 4 replies; 30+ messages in thread From: Jeff King @ 2021-01-27 22:11 UTC (permalink / raw) To: git; +Cc: Taylor Blau This series teaches rev-list to compute the on-disk size used by a set of objects. You can do the same thing with cat-file, but this is much faster (see the timings in the second commit). We've been running it for about 5 years at GitHub. I hesitated sending it upstream because it's a bit weird and special-purpose. But it does come in handy for debugging, analyzing repos, etc. So maybe others will find it useful. The first patch is just a test-script enhancement to let test_commit avoid creating tags. During some recent refactoring, we actually broke the --disk-usage feature but the test script didn't catch it because the tags were being picked up by "--all". Since this is at least the third time I've run into that in our test suite, I thought I'd make it a little more convenient to avoid. :) [1/2]: t: add --no-tag option to test_commit [2/2]: rev-list: add --disk-usage option for calculating disk usage Documentation/rev-list-options.txt | 9 ++++++ builtin/rev-list.c | 49 ++++++++++++++++++++++++++++ pack-bitmap.c | 50 +++++++++++++++++++++++++++++ pack-bitmap.h | 2 ++ t/t4208-log-magic-pathspec.sh | 9 ++---- t/t6114-rev-list-du.sh | 51 ++++++++++++++++++++++++++++++ t/test-lib-functions.sh | 9 +++++- 7 files changed, 171 insertions(+), 8 deletions(-) create mode 100755 t/t6114-rev-list-du.sh -Peff ^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH 1/2] t: add --no-tag option to test_commit 2021-01-27 22:11 [PATCH 0/2] rev-list --disk-usage Jeff King @ 2021-01-27 22:12 ` Jeff King 2021-01-27 22:48 ` Taylor Blau 2021-01-27 22:17 ` [PATCH 2/2] rev-list: add --disk-usage option for calculating disk usage Jeff King ` (2 subsequent siblings) 3 siblings, 1 reply; 30+ messages in thread From: Jeff King @ 2021-01-27 22:12 UTC (permalink / raw) To: git; +Cc: Taylor Blau One of the conveniences that test_commit offers is making a tag for each commit. This makes it easy to refer to the commits in subsequent commands. But it can also be a pain if you care about reachability, because those tags keep the commits reachable even if they are rewound from the branch they're made on. The alternative is that scripts have to call test_tick, git-add, and git-commit themselves. Let's add a --no-tag option to give them the one-liner convenience of using test_commit. This is in preparation for the next patch, which will add some more calls. But I cleaned up an existing site to show off the feature. There are probably more cleanups possible. Signed-off-by: Jeff King <peff@peff.net> --- t/t4208-log-magic-pathspec.sh | 9 ++------- t/test-lib-functions.sh | 9 ++++++++- 2 files changed, 10 insertions(+), 8 deletions(-) diff --git a/t/t4208-log-magic-pathspec.sh b/t/t4208-log-magic-pathspec.sh index 5e10136e9a..7f0c1dcc0f 100755 --- a/t/t4208-log-magic-pathspec.sh +++ b/t/t4208-log-magic-pathspec.sh @@ -31,13 +31,8 @@ test_expect_success '"git log :/a -- " should not be ambiguous' ' test_expect_success '"git log :/detached -- " should find a commit only in HEAD' ' test_when_finished "git checkout main" && git checkout --detach && - # Must manually call `test_tick` instead of using `test_commit`, - # because the latter additionally creates a tag, which would make - # the commit reachable not only via HEAD. - test_tick && - git commit --allow-empty -m detached && - test_tick && - git commit --allow-empty -m something-else && + test_commit --no-tag detached && + test_commit --no-tag something-else && git log :/detached -- ' diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh index 6bca002316..1587241ba0 100644 --- a/t/test-lib-functions.sh +++ b/t/test-lib-functions.sh @@ -202,6 +202,7 @@ test_commit () { author= && signoff= && indir= && + no_tag= && while test $# != 0 do case "$1" in @@ -222,6 +223,9 @@ test_commit () { indir="$2" shift ;; + --no-tag) + no_tag=yes + ;; *) break ;; @@ -244,7 +248,10 @@ test_commit () { git ${indir:+ -C "$indir"} commit \ ${author:+ --author "$author"} \ $signoff -m "$1" && - git ${indir:+ -C "$indir"} tag "${4:-$1}" + if test -z "$no_tag" + then + git ${indir:+ -C "$indir"} tag "${4:-$1}" + fi } # Call test_merge with the arguments "<message> <commit>", where <commit> -- 2.30.0.758.g9692d13bf2 ^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: [PATCH 1/2] t: add --no-tag option to test_commit 2021-01-27 22:12 ` [PATCH 1/2] t: add --no-tag option to test_commit Jeff King @ 2021-01-27 22:48 ` Taylor Blau 0 siblings, 0 replies; 30+ messages in thread From: Taylor Blau @ 2021-01-27 22:48 UTC (permalink / raw) To: Jeff King; +Cc: git, Taylor Blau On Wed, Jan 27, 2021 at 05:12:25PM -0500, Jeff King wrote: > The alternative is that scripts have to call test_tick, git-add, and > git-commit themselves. Let's add a --no-tag option to give them the > one-liner convenience of using test_commit. Thanks for finding a spot that does this and making it more readable with the new --no-tag option. This patch looks obviously correct. I'm sure that (as you note) there are more cleanups possible, but I'm happy to just grab an easy one and let future refactorings clean up the remaining ones. Thanks, Taylor ^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH 2/2] rev-list: add --disk-usage option for calculating disk usage 2021-01-27 22:11 [PATCH 0/2] rev-list --disk-usage Jeff King 2021-01-27 22:12 ` [PATCH 1/2] t: add --no-tag option to test_commit Jeff King @ 2021-01-27 22:17 ` Jeff King 2021-01-27 22:57 ` Taylor Blau ` (2 more replies) 2021-01-27 22:46 ` [PATCH 0/2] rev-list --disk-usage Taylor Blau 2021-02-09 10:52 ` [PATCH v2] " Jeff King 3 siblings, 3 replies; 30+ messages in thread From: Jeff King @ 2021-01-27 22:17 UTC (permalink / raw) To: git; +Cc: Taylor Blau It can sometimes be useful to see which refs are contributing to the overall repository size (e.g., does some branch have a bunch of objects not found elsewhere in history, which indicates that deleting it would shrink the size of a clone). You can find that out by generating a list of objects, getting their sizes from cat-file, and then summing them, like: git rev-list --objects main..branch cut -d' ' -f1 | git cat-file --batch-check='%(objectsize:disk)' | perl -lne '$total += $_; END { print $total }' Though note that the caveats from git-cat-file(1) apply here. We "blame" base objects more than their deltas, even though the relationship could easily be flipped. Still, it can be a useful rough measure. But one problem is that it's slow to run. Teaching rev-list to sum up the sizes can be much faster for two reasons: 1. It skips all of the piping of object names and sizes. 2. If bitmaps are in use, for objects that are in the bitmapped packfile we can skip the oid_object_info() lookup entirely, and just ask the revindex for the on-disk size. This patch implements a --disk-usage option which produces the same answer in a fraction of the time. Here are some timings using a clone of torvalds/linux: [rev-list piped to cat-file, no bitmaps] $ time git rev-list --objects --all | cut -d' ' -f1 | git cat-file --buffer --batch-check='%(objectsize:disk)' | perl -lne '$total += $_; END { print $total }' 1455691059 real 0m34.336s user 0m46.533s sys 0m2.953s [internal, no bitmaps] $ time git rev-list --disk-usage --all 1455691059 real 0m32.662s user 0m32.306s sys 0m0.353s The wall-clock times aren't that different because of parallelism, but notice the CPU savings between the two. We saved 35% of the CPU just by avoiding the pipes. But the real win is with bitmaps. If we use them without the new option: [rev-list piped to cat-file, bitmaps] $ time git rev-list --objects --all --use-bitmap-index | cut -d' ' -f1 | git cat-file --batch-check='%(objectsize:disk)' | perl -lne '$total += $_; END { print $total }' real 0m9.954s user 0m11.234s sys 0m8.522s then we're faster to generate the list of objects, but we still spend a lot of time piping and looking things up. But if we do both together: [internal, bitmaps] $ time git rev-list --disk-usage --all --use-bitmap-index 1455691059 real 0m0.235s user 0m0.186s sys 0m0.049s then we get the same answer much faster. For "--all", that answer will correspond closely to "du objects/pack", of course. But we're actually checking reachability here, so we're still fast when we ask for more interesting things: $ time git rev-list --disk-usage --all --use-bitmap-index v5.0..v5.10 374798628 real 0m0.429s user 0m0.356s sys 0m0.072s Signed-off-by: Jeff King <peff@peff.net> --- This _could_ be made more flexible, but I didn't think it was worth the complexity. Some obvious things one might want are: - not counting up all reachable objects (i.e., requiring --objects for this output, and omitting it just counts up commits). This could be handled in the bitmap case with some extra code (OR-ing with the type bitmaps). But after 5 years of this patch, I've never wanted that once. The disk usage of just some of the objects isn't really that useful (and of course you can still get it by piping to cat-file). - an option to output the sizes of specific objects along with their oids. But if you want to get to this level of flexibility, I think you're better off just using cat-file (and if we are concerned about the pipe costs, we should teach rev-list to understand cat-file's custom formats). Documentation/rev-list-options.txt | 9 ++++++ builtin/rev-list.c | 49 ++++++++++++++++++++++++++++ pack-bitmap.c | 50 +++++++++++++++++++++++++++++ pack-bitmap.h | 2 ++ t/t6114-rev-list-du.sh | 51 ++++++++++++++++++++++++++++++ 5 files changed, 161 insertions(+) create mode 100755 t/t6114-rev-list-du.sh diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt index 002379056a..1e5826f26d 100644 --- a/Documentation/rev-list-options.txt +++ b/Documentation/rev-list-options.txt @@ -222,6 +222,15 @@ ifdef::git-rev-list[] test the exit status to see if a range of objects is fully connected (or not). It is faster than redirecting stdout to `/dev/null` as the output does not have to be formatted. + +--disk-usage:: + Suppress normal output; instead, print the sum of the bytes used + for on-disk storage by the selected objects. This is equivalent + to piping the output of `rev-list --objects` into + `git cat-file --batch-check='%(objectsize:disk)', except that it + runs much faster (especially with `--use-bitmap-index`). See the + `CAVEATS` section in linkgit:git-cat-file[1] for the limitations + of what "on-disk storage" means. endif::git-rev-list[] --cherry-mark:: diff --git a/builtin/rev-list.c b/builtin/rev-list.c index 25c6c3b38d..2262b613dd 100644 --- a/builtin/rev-list.c +++ b/builtin/rev-list.c @@ -80,6 +80,19 @@ static int arg_show_object_names = 1; #define DEFAULT_OIDSET_SIZE (16*1024) +static int show_disk_usage; +static off_t total_disk_usage; + +static off_t get_object_disk_usage(struct object *obj) +{ + off_t size; + struct object_info oi = OBJECT_INFO_INIT; + oi.disk_sizep = &size; + if (oid_object_info_extended(the_repository, &obj->oid, &oi, 0) < 0) + die(_("unable to get disk usage of %s"), oid_to_hex(&obj->oid)); + return size; +} + static void finish_commit(struct commit *commit); static void show_commit(struct commit *commit, void *data) { @@ -88,6 +101,9 @@ static void show_commit(struct commit *commit, void *data) display_progress(progress, ++progress_counter); + if (show_disk_usage) + total_disk_usage += get_object_disk_usage(&commit->object); + if (info->flags & REV_LIST_QUIET) { finish_commit(commit); return; @@ -258,6 +274,8 @@ static void show_object(struct object *obj, const char *name, void *cb_data) if (finish_object(obj, name, cb_data)) return; display_progress(progress, ++progress_counter); + if (show_disk_usage) + total_disk_usage += get_object_disk_usage(obj); if (info->flags & REV_LIST_QUIET) return; @@ -452,6 +470,23 @@ static int try_bitmap_traversal(struct rev_info *revs, return 0; } +static int try_bitmap_disk_usage(struct rev_info *revs, + struct list_objects_filter_options *filter) +{ + struct bitmap_index *bitmap_git; + + if (!show_disk_usage) + return -1; + + bitmap_git = prepare_bitmap_walk(revs, filter); + if (!bitmap_git) + return -1; + + printf("%"PRIuMAX"\n", + (uintmax_t)get_disk_usage_from_bitmap(bitmap_git)); + return 0; +} + int cmd_rev_list(int argc, const char **argv, const char *prefix) { struct rev_info revs; @@ -584,6 +619,15 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix) continue; } + if (!strcmp(arg, "--disk-usage")) { + show_disk_usage = 1; + revs.tag_objects = 1; + revs.tree_objects = 1; + revs.blob_objects = 1; + info.flags |= REV_LIST_QUIET; + continue; + } + usage(rev_list_usage); } @@ -626,6 +670,8 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix) if (use_bitmap_index) { if (!try_bitmap_count(&revs, &filter_options)) return 0; + if (!try_bitmap_disk_usage(&revs, &filter_options)) + return 0; if (!try_bitmap_traversal(&revs, &filter_options)) return 0; } @@ -690,5 +736,8 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix) printf("%d\n", revs.count_left + revs.count_right); } + if (show_disk_usage) + printf("%"PRIuMAX"\n", (uintmax_t)total_disk_usage); + return 0; } diff --git a/pack-bitmap.c b/pack-bitmap.c index 60fe20fb87..ba36b9c6a0 100644 --- a/pack-bitmap.c +++ b/pack-bitmap.c @@ -1430,3 +1430,53 @@ int bitmap_has_oid_in_uninteresting(struct bitmap_index *bitmap_git, return bitmap_git && bitmap_walk_contains(bitmap_git, bitmap_git->haves, oid); } + +off_t get_disk_usage_from_bitmap(struct bitmap_index *bitmap_git) +{ + struct bitmap *result = bitmap_git->result; + struct packed_git *pack = bitmap_git->pack; + struct eindex *eindex = &bitmap_git->ext_index; + struct object_info oi = OBJECT_INFO_INIT; + off_t object_size; + off_t total = 0; + size_t i; + + oi.disk_sizep = &object_size; + + for (i = 0; i < result->word_alloc; i++) { + eword_t word = result->words[i]; + size_t base = (i * BITS_IN_EWORD); + unsigned offset; + + for (offset = 0; offset < BITS_IN_EWORD; offset++) { + size_t pos; + + if ((word >> offset) == 0) + break; + + offset += ewah_bit_ctz64(word >> offset); + pos = base + offset; + + /* + * If it's in the pack, we can use the fast path + * and just check the revindex. Otherwise, we + * fall back to looking it up. + */ + if (pos < pack->num_objects) { + object_size = + pack_pos_to_offset(pack, pos + 1) - + pack_pos_to_offset(pack, pos); + } else { + struct object *obj; + obj = eindex->objects[pos - pack->num_objects]; + if (oid_object_info_extended(the_repository, &obj->oid, &oi, 0) < 0) + die(_("unable to get disk usage of %s"), + oid_to_hex(&obj->oid)); + } + + total += object_size; + } + } + + return total; +} diff --git a/pack-bitmap.h b/pack-bitmap.h index 25dfcf5615..c8070606b7 100644 --- a/pack-bitmap.h +++ b/pack-bitmap.h @@ -68,6 +68,8 @@ int bitmap_walk_contains(struct bitmap_index *, */ int bitmap_has_oid_in_uninteresting(struct bitmap_index *, const struct object_id *oid); +off_t get_disk_usage_from_bitmap(struct bitmap_index *); + void bitmap_writer_show_progress(int show); void bitmap_writer_set_checksum(unsigned char *sha1); void bitmap_writer_build_type_index(struct packing_data *to_pack, diff --git a/t/t6114-rev-list-du.sh b/t/t6114-rev-list-du.sh new file mode 100755 index 0000000000..1fadbcaded --- /dev/null +++ b/t/t6114-rev-list-du.sh @@ -0,0 +1,51 @@ +#!/bin/sh + +test_description='basic tests of rev-list --disk-usage' +. ./test-lib.sh + +# we want a mix of reachable and unreachable, as well as +# objects in the bitmapped pack and some outside of it +test_expect_success 'set up repository' ' + test_commit --no-tag one && + test_commit --no-tag two && + git repack -adb && + git reset --hard HEAD^ && + test_commit --no-tag three && + test_commit --no-tag four && + git reset --hard HEAD^ +' + +# We don't want to hardcode sizes, because they depend on the exact details of +# packing, zlib, etc. We'll assume that the regular rev-list and cat-file +# machinery works and compare the --disk-usage output to that. +disk_usage_slow () { + git rev-list --objects "$@" | + cut -d' ' -f1 | + git cat-file --batch-check="%(objectsize:disk)" | + perl -lne '$total += $_; END { print $total}' +} + +# check behavior with given rev-list options; note that +# whitespace is not preserved in args +check_du () { + args=$* + + test_expect_success "generate expected size ($args)" " + disk_usage_slow $args >expect + " + + test_expect_success "rev-list --disk-usage without bitmaps ($args)" " + git rev-list --disk-usage $args >actual && + test_cmp expect actual + " + + test_expect_success "rev-list --disk-usage with bitmaps ($args)" " + git rev-list --disk-usage --use-bitmap-index $args >actual && + test_cmp expect actual + " +} + +check_du HEAD +check_du HEAD^..HEAD + +test_done -- 2.30.0.758.g9692d13bf2 ^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: [PATCH 2/2] rev-list: add --disk-usage option for calculating disk usage 2021-01-27 22:17 ` [PATCH 2/2] rev-list: add --disk-usage option for calculating disk usage Jeff King @ 2021-01-27 22:57 ` Taylor Blau 2021-01-27 23:34 ` Jeff King 2021-01-27 23:01 ` Kyle Meyer 2021-01-27 23:07 ` Eric Sunshine 2 siblings, 1 reply; 30+ messages in thread From: Taylor Blau @ 2021-01-27 22:57 UTC (permalink / raw) To: Jeff King; +Cc: git, Taylor Blau On Wed, Jan 27, 2021 at 05:17:07PM -0500, Jeff King wrote: > It can sometimes be useful to see which refs are contributing to the > overall repository size (e.g., does some branch have a bunch of objects > not found elsewhere in history, which indicates that deleting it would > shrink the size of a clone). > > You can find that out by generating a list of objects, getting their > sizes from cat-file, and then summing them, like: > > git rev-list --objects main..branch > cut -d' ' -f1 | I suspect that this is from the original commit message that you wrote a half-decade ago. Not that it really means much, but you could shave one process off of this example by passing '--no-object-names' to 'git rev-list'. The whole point is that we can avoid having to do this, so I don't think it really matters, anyway. > [...] > then we're faster to generate the list of objects, but we still spend a > lot of time piping and looking things up. But if we do both together: > > [internal, bitmaps] > $ time git rev-list --disk-usage --all --use-bitmap-index > 1455691059 > real 0m0.235s > user 0m0.186s > sys 0m0.049s > > then we get the same answer much faster. Very nice. > This _could_ be made more flexible, but I didn't think it was worth the > complexity. Some obvious things one might want are: > > - not counting up all reachable objects (i.e., requiring --objects for > this output, and omitting it just counts up commits). This could be > handled in the bitmap case with some extra code (OR-ing with the > type bitmaps). > > But after 5 years of this patch, I've never wanted that once. The > disk usage of just some of the objects isn't really that useful (and > of course you can still get it by piping to cat-file). Yeah. I think it's trivial to support it, but I'm in favor of a simpler interface. That said, I worry about painting ourselves into a corner if the default implies --objects. If we wanted to change that, I'm pretty sure you'd have to write a rule that says "imply objects, unless --tags, --blobs or etc. are specified, and then only do that". Maybe we'll never have to address that, but it's worth thinking about before committing to implying '--objects'. > - an option to output the sizes of specific objects along with their > oids. But if you want to get to this level of flexibility, I think > you're better off just using cat-file (and if we are concerned about > the pipe costs, we should teach rev-list to understand cat-file's > custom formats). This I agree with completely. Any caller who wants that level of flexibility shouldn't mind the piping. I have no comments on the patch itself, which looks fine to me (and I have seen over and over again as it seems to regularly cause conflicts when merging new releases into GitHub's fork :-)). Thanks, Taylor ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 2/2] rev-list: add --disk-usage option for calculating disk usage 2021-01-27 22:57 ` Taylor Blau @ 2021-01-27 23:34 ` Jeff King 0 siblings, 0 replies; 30+ messages in thread From: Jeff King @ 2021-01-27 23:34 UTC (permalink / raw) To: Taylor Blau; +Cc: git On Wed, Jan 27, 2021 at 05:57:21PM -0500, Taylor Blau wrote: > > You can find that out by generating a list of objects, getting their > > sizes from cat-file, and then summing them, like: > > > > git rev-list --objects main..branch > > cut -d' ' -f1 | > > I suspect that this is from the original commit message that you wrote a > half-decade ago. Not that it really means much, but you could shave one > process off of this example by passing '--no-object-names' to 'git > rev-list'. That, plus my muscle memory to do the cut. We should probably model the better form here, and use it in the test, though (not worth a re-roll on its own, but it looks like there are a few other minor bits). > > - not counting up all reachable objects (i.e., requiring --objects for > > this output, and omitting it just counts up commits). This could be > > handled in the bitmap case with some extra code (OR-ing with the > > type bitmaps). > > > > But after 5 years of this patch, I've never wanted that once. The > > disk usage of just some of the objects isn't really that useful (and > > of course you can still get it by piping to cat-file). > > Yeah. I think it's trivial to support it, but I'm in favor of a simpler > interface. > > That said, I worry about painting ourselves into a corner if the default > implies --objects. If we wanted to change that, I'm pretty sure you'd > have to write a rule that says "imply objects, unless --tags, --blobs or > etc. are specified, and then only do that". > > Maybe we'll never have to address that, but it's worth thinking about > before committing to implying '--objects'. Yeah, the one thing that gives me pause is that it would be hard to undo later. I didn't write the code to handle it in the bitmap case, but I don't think it would be _too_ bad. It is slightly annoying for the all-objects case, because the existing code isn't set up well to iterate either a specific type, or all types. > I have no comments on the patch itself, which looks fine to me (and I > have seen over and over again as it seems to regularly cause conflicts > when merging new releases into GitHub's fork :-)). You are exposing my ulterior motive. :) -Peff ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 2/2] rev-list: add --disk-usage option for calculating disk usage 2021-01-27 22:17 ` [PATCH 2/2] rev-list: add --disk-usage option for calculating disk usage Jeff King 2021-01-27 22:57 ` Taylor Blau @ 2021-01-27 23:01 ` Kyle Meyer 2021-01-27 23:36 ` Jeff King 2021-01-27 23:07 ` Eric Sunshine 2 siblings, 1 reply; 30+ messages in thread From: Kyle Meyer @ 2021-01-27 23:01 UTC (permalink / raw) To: Jeff King; +Cc: Taylor Blau, git Jeff King writes: > diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt > index 002379056a..1e5826f26d 100644 > --- a/Documentation/rev-list-options.txt > +++ b/Documentation/rev-list-options.txt > @@ -222,6 +222,15 @@ ifdef::git-rev-list[] > test the exit status to see if a range of objects is fully > connected (or not). It is faster than redirecting stdout > to `/dev/null` as the output does not have to be formatted. > + > +--disk-usage:: > + Suppress normal output; instead, print the sum of the bytes used > + for on-disk storage by the selected objects. This is equivalent > + to piping the output of `rev-list --objects` into > + `git cat-file --batch-check='%(objectsize:disk)', except that it [ Just a drive-by typo comment from a reader not knowledgeable enough to review the code change :) ] The cat-file command is missing its closing quote. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 2/2] rev-list: add --disk-usage option for calculating disk usage 2021-01-27 23:01 ` Kyle Meyer @ 2021-01-27 23:36 ` Jeff King 0 siblings, 0 replies; 30+ messages in thread From: Jeff King @ 2021-01-27 23:36 UTC (permalink / raw) To: Kyle Meyer; +Cc: Taylor Blau, git On Wed, Jan 27, 2021 at 06:01:51PM -0500, Kyle Meyer wrote: > > +--disk-usage:: > > + Suppress normal output; instead, print the sum of the bytes used > > + for on-disk storage by the selected objects. This is equivalent > > + to piping the output of `rev-list --objects` into > > + `git cat-file --batch-check='%(objectsize:disk)', except that it > > [ Just a drive-by typo comment from a reader not knowledgeable enough to > review the code change :) ] > > The cat-file command is missing its closing quote. Thanks for catching that. I should have looked at the output of doc-diff, which does reveal it. -Peff ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 2/2] rev-list: add --disk-usage option for calculating disk usage 2021-01-27 22:17 ` [PATCH 2/2] rev-list: add --disk-usage option for calculating disk usage Jeff King 2021-01-27 22:57 ` Taylor Blau 2021-01-27 23:01 ` Kyle Meyer @ 2021-01-27 23:07 ` Eric Sunshine 2021-01-27 23:39 ` Jeff King 2 siblings, 1 reply; 30+ messages in thread From: Eric Sunshine @ 2021-01-27 23:07 UTC (permalink / raw) To: Jeff King; +Cc: Git List, Taylor Blau On Wed, Jan 27, 2021 at 5:20 PM Jeff King <peff@peff.net> wrote: > This patch implements a --disk-usage option which produces the same > answer in a fraction of the time. Here are some timings using a clone of > torvalds/linux: > > [rev-list piped to cat-file, no bitmaps] > $ time git rev-list --objects --all | > cut -d' ' -f1 | > git cat-file --buffer --batch-check='%(objectsize:disk)' | > perl -lne '$total += $_; END { print $total }' > 1455691059 > real 0m34.336s > user 0m46.533s > sys 0m2.953s This example shows the computed size (1455691059)... > But the real win is with bitmaps. If we use them without the new option: > > [rev-list piped to cat-file, bitmaps] > $ time git rev-list --objects --all --use-bitmap-index | > cut -d' ' -f1 | > git cat-file --batch-check='%(objectsize:disk)' | > perl -lne '$total += $_; END { print $total }' > real 0m9.954s > user 0m11.234s > sys 0m8.522s ...however, this example does not (but all the others do). Simple copy/paste error? Not worth a re-roll, of course. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 2/2] rev-list: add --disk-usage option for calculating disk usage 2021-01-27 23:07 ` Eric Sunshine @ 2021-01-27 23:39 ` Jeff King 0 siblings, 0 replies; 30+ messages in thread From: Jeff King @ 2021-01-27 23:39 UTC (permalink / raw) To: Eric Sunshine; +Cc: Git List, Taylor Blau On Wed, Jan 27, 2021 at 06:07:57PM -0500, Eric Sunshine wrote: > This example shows the computed size (1455691059)... > [...] > ...however, this example does not (but all the others do). Simple > copy/paste error? Yep, thanks for catching. (Of course I have since repacked my linux.git, so now it produces a different answer! It does match the current value of the other techniques, though). > Not worth a re-roll, of course. Agreed, but it looks like there are a few other minor bits, so I'll definitely fix it up at the same time. I'll give a little more time before re-rolling in case there are any other comments. -Peff ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 0/2] rev-list --disk-usage 2021-01-27 22:11 [PATCH 0/2] rev-list --disk-usage Jeff King 2021-01-27 22:12 ` [PATCH 1/2] t: add --no-tag option to test_commit Jeff King 2021-01-27 22:17 ` [PATCH 2/2] rev-list: add --disk-usage option for calculating disk usage Jeff King @ 2021-01-27 22:46 ` Taylor Blau 2021-02-09 10:52 ` [PATCH v2] " Jeff King 3 siblings, 0 replies; 30+ messages in thread From: Taylor Blau @ 2021-01-27 22:46 UTC (permalink / raw) To: Jeff King; +Cc: git, Taylor Blau On Wed, Jan 27, 2021 at 05:11:36PM -0500, Jeff King wrote: > The first patch is just a test-script enhancement to let test_commit > avoid creating tags. During some recent refactoring, we actually broke > the --disk-usage feature but the test script didn't catch it because the > tags were being picked up by "--all". Since this is at least the third > time I've run into that in our test suite, I thought I'd make it a > little more convenient to avoid. :) I appreciate the non-incriminating "we", but the person who caused the regression was most certainly me ;-). This happened while cherry-picking Junio's recent merge of tb/revindex-api, which obviously did not cause a merge conflict with this new caller. The remaining details are boring, but they definitely weren't Peff's fault :-). Thanks, Taylor ^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH v2] rev-list --disk-usage 2021-01-27 22:11 [PATCH 0/2] rev-list --disk-usage Jeff King ` (2 preceding siblings ...) 2021-01-27 22:46 ` [PATCH 0/2] rev-list --disk-usage Taylor Blau @ 2021-02-09 10:52 ` Jeff King 2021-02-09 10:52 ` [PATCH v2 1/2] t: add --no-tag option to test_commit Jeff King ` (3 more replies) 3 siblings, 4 replies; 30+ messages in thread From: Jeff King @ 2021-02-09 10:52 UTC (permalink / raw) To: git; +Cc: Kyle Meyer, Eric Sunshine, Taylor Blau Here's a re-roll of my series to add "rev-list --disk-usage", for counting up object storage used for various slices of history. This fixes the minor bits mentioned in review for v1, but the big change is that "--disk-usage" no longer implies "--objects". I think you generally would want to use it with that option, but it really seemed to violate the principle of least surprise for the user. That requires handling each object type independently, but the code for that turned out to be not too bad (and is modeled after the similar logic in traverse_bitmap_commit_list()). I was slightly concerned that it would slow things down to walk over the bitmap multiple times, but it doesn't seem to make much of a difference in practice. There's a range-diff below, but it's not really worth looking at. All of the interesting parts were rewritten completely, so you're better off to just read patch 2 again (and patch 1 did not change at all). [1/2]: t: add --no-tag option to test_commit [2/2]: rev-list: add --disk-usage option for calculating disk usage Documentation/rev-list-options.txt | 9 ++++ builtin/rev-list.c | 46 +++++++++++++++++ pack-bitmap.c | 81 ++++++++++++++++++++++++++++++ pack-bitmap.h | 2 + t/t4208-log-magic-pathspec.sh | 9 +--- t/t6114-rev-list-du.sh | 51 +++++++++++++++++++ t/test-lib-functions.sh | 9 +++- 7 files changed, 199 insertions(+), 8 deletions(-) create mode 100755 t/t6114-rev-list-du.sh 1: 20f8edeff1 = 1: 6365cd94bd t: add --no-tag option to test_commit 2: 64e28cb6c9 ! 2: 8a93583dee rev-list: add --disk-usage option for calculating disk usage @@ Commit message You can find that out by generating a list of objects, getting their sizes from cat-file, and then summing them, like: - git rev-list --objects main..branch - cut -d' ' -f1 | + git rev-list --objects --no-object-names main..branch git cat-file --batch-check='%(objectsize:disk)' | perl -lne '$total += $_; END { print $total }' @@ Commit message torvalds/linux: [rev-list piped to cat-file, no bitmaps] - $ time git rev-list --objects --all | - cut -d' ' -f1 | + $ time git rev-list --objects --no-object-names --all | git cat-file --buffer --batch-check='%(objectsize:disk)' | perl -lne '$total += $_; END { print $total }' - 1455691059 - real 0m34.336s - user 0m46.533s - sys 0m2.953s + 1459938510 + real 0m29.635s + user 0m38.003s + sys 0m1.093s [internal, no bitmaps] - $ time git rev-list --disk-usage --all - 1455691059 - real 0m32.662s - user 0m32.306s - sys 0m0.353s + $ time git rev-list --disk-usage --objects --all + 1459938510 + real 0m31.262s + user 0m30.885s + sys 0m0.376s - The wall-clock times aren't that different because of parallelism, but - notice the CPU savings between the two. We saved 35% of the CPU just by + Even though the wall-clock time is slightly worse due to parallelism, + notice the CPU savings between the two. We saved 21% of the CPU just by avoiding the pipes. But the real win is with bitmaps. If we use them without the new option: [rev-list piped to cat-file, bitmaps] - $ time git rev-list --objects --all --use-bitmap-index | - cut -d' ' -f1 | + $ time git rev-list --objects --no-object-names --all --use-bitmap-index | git cat-file --batch-check='%(objectsize:disk)' | perl -lne '$total += $_; END { print $total }' - real 0m9.954s - user 0m11.234s - sys 0m8.522s + 1459938510 + real 0m6.244s + user 0m8.452s + sys 0m0.311s then we're faster to generate the list of objects, but we still spend a lot of time piping and looking things up. But if we do both together: [internal, bitmaps] - $ time git rev-list --disk-usage --all --use-bitmap-index - 1455691059 - real 0m0.235s - user 0m0.186s + $ time git rev-list --disk-usage --objects --all --use-bitmap-index + 1459938510 + real 0m0.219s + user 0m0.169s sys 0m0.049s then we get the same answer much faster. @@ Commit message of course. But we're actually checking reachability here, so we're still fast when we ask for more interesting things: - $ time git rev-list --disk-usage --all --use-bitmap-index v5.0..v5.10 + $ time git rev-list --disk-usage --use-bitmap-index v5.0..v5.10 374798628 real 0m0.429s user 0m0.356s @@ Documentation/rev-list-options.txt: ifdef::git-rev-list[] + +--disk-usage:: + Suppress normal output; instead, print the sum of the bytes used -+ for on-disk storage by the selected objects. This is equivalent -+ to piping the output of `rev-list --objects` into -+ `git cat-file --batch-check='%(objectsize:disk)', except that it -+ runs much faster (especially with `--use-bitmap-index`). See the -+ `CAVEATS` section in linkgit:git-cat-file[1] for the limitations -+ of what "on-disk storage" means. ++ for on-disk storage by the selected commits or objects. This is ++ equivalent to piping the output into `git cat-file ++ --batch-check='%(objectsize:disk)'`, except that it runs much ++ faster (especially with `--use-bitmap-index`). See the `CAVEATS` ++ section in linkgit:git-cat-file[1] for the limitations of what ++ "on-disk storage" means. endif::git-rev-list[] --cherry-mark:: @@ builtin/rev-list.c: static int try_bitmap_traversal(struct rev_info *revs, + return -1; + + printf("%"PRIuMAX"\n", -+ (uintmax_t)get_disk_usage_from_bitmap(bitmap_git)); ++ (uintmax_t)get_disk_usage_from_bitmap(bitmap_git, revs)); + return 0; +} + @@ builtin/rev-list.c: int cmd_rev_list(int argc, const char **argv, const char *pr + if (!strcmp(arg, "--disk-usage")) { + show_disk_usage = 1; -+ revs.tag_objects = 1; -+ revs.tree_objects = 1; -+ revs.blob_objects = 1; + info.flags |= REV_LIST_QUIET; + continue; + } @@ pack-bitmap.c: int bitmap_has_oid_in_uninteresting(struct bitmap_index *bitmap_g bitmap_walk_contains(bitmap_git, bitmap_git->haves, oid); } + -+off_t get_disk_usage_from_bitmap(struct bitmap_index *bitmap_git) ++static off_t get_disk_usage_for_type(struct bitmap_index *bitmap_git, ++ enum object_type object_type) +{ + struct bitmap *result = bitmap_git->result; + struct packed_git *pack = bitmap_git->pack; -+ struct eindex *eindex = &bitmap_git->ext_index; -+ struct object_info oi = OBJECT_INFO_INIT; -+ off_t object_size; + off_t total = 0; ++ struct ewah_iterator it; ++ eword_t filter; + size_t i; + -+ oi.disk_sizep = &object_size; -+ -+ for (i = 0; i < result->word_alloc; i++) { -+ eword_t word = result->words[i]; ++ init_type_iterator(&it, bitmap_git, object_type); ++ for (i = 0; i < result->word_alloc && ++ ewah_iterator_next(&filter, &it); i++) { ++ eword_t word = result->words[i] & filter; + size_t base = (i * BITS_IN_EWORD); + unsigned offset; + ++ if (!word) ++ continue; ++ + for (offset = 0; offset < BITS_IN_EWORD; offset++) { + size_t pos; + @@ pack-bitmap.c: int bitmap_has_oid_in_uninteresting(struct bitmap_index *bitmap_g + + offset += ewah_bit_ctz64(word >> offset); + pos = base + offset; -+ -+ /* -+ * If it's in the pack, we can use the fast path -+ * and just check the revindex. Otherwise, we -+ * fall back to looking it up. -+ */ -+ if (pos < pack->num_objects) { -+ object_size = -+ pack_pos_to_offset(pack, pos + 1) - -+ pack_pos_to_offset(pack, pos); -+ } else { -+ struct object *obj; -+ obj = eindex->objects[pos - pack->num_objects]; -+ if (oid_object_info_extended(the_repository, &obj->oid, &oi, 0) < 0) -+ die(_("unable to get disk usage of %s"), -+ oid_to_hex(&obj->oid)); -+ } -+ -+ total += object_size; ++ total += pack_pos_to_offset(pack, pos + 1) - ++ pack_pos_to_offset(pack, pos); + } + } + + return total; ++} ++ ++static off_t get_disk_usage_for_extended(struct bitmap_index *bitmap_git) ++{ ++ struct bitmap *result = bitmap_git->result; ++ struct packed_git *pack = bitmap_git->pack; ++ struct eindex *eindex = &bitmap_git->ext_index; ++ off_t total = 0; ++ struct object_info oi = OBJECT_INFO_INIT; ++ off_t object_size; ++ size_t i; ++ ++ oi.disk_sizep = &object_size; ++ ++ for (i = 0; i < eindex->count; i++) { ++ struct object *obj = eindex->objects[i]; ++ ++ if (!bitmap_get(result, pack->num_objects + i)) ++ continue; ++ ++ if (oid_object_info_extended(the_repository, &obj->oid, &oi, 0) < 0) ++ die(_("unable to get disk usage of %s"), ++ oid_to_hex(&obj->oid)); ++ ++ total += object_size; ++ } ++ return total; ++} ++ ++off_t get_disk_usage_from_bitmap(struct bitmap_index *bitmap_git, ++ struct rev_info *revs) ++{ ++ off_t total = 0; ++ ++ total += get_disk_usage_for_type(bitmap_git, OBJ_COMMIT); ++ if (revs->tree_objects) ++ total += get_disk_usage_for_type(bitmap_git, OBJ_TREE); ++ if (revs->blob_objects) ++ total += get_disk_usage_for_type(bitmap_git, OBJ_BLOB); ++ if (revs->tag_objects) ++ total += get_disk_usage_for_type(bitmap_git, OBJ_TAG); ++ ++ total += get_disk_usage_for_extended(bitmap_git); ++ ++ return total; +} ## pack-bitmap.h ## @@ pack-bitmap.h: int bitmap_walk_contains(struct bitmap_index *, */ int bitmap_has_oid_in_uninteresting(struct bitmap_index *, const struct object_id *oid); -+off_t get_disk_usage_from_bitmap(struct bitmap_index *); ++off_t get_disk_usage_from_bitmap(struct bitmap_index *, struct rev_info *); + void bitmap_writer_show_progress(int show); void bitmap_writer_set_checksum(unsigned char *sha1); @@ t/t6114-rev-list-du.sh (new) +# packing, zlib, etc. We'll assume that the regular rev-list and cat-file +# machinery works and compare the --disk-usage output to that. +disk_usage_slow () { -+ git rev-list --objects "$@" | -+ cut -d' ' -f1 | ++ git rev-list --no-object-names "$@" | + git cat-file --batch-check="%(objectsize:disk)" | + perl -lne '$total += $_; END { print $total}' +} @@ t/t6114-rev-list-du.sh (new) +} + +check_du HEAD -+check_du HEAD^..HEAD ++check_du --objects HEAD ++check_du --objects HEAD^..HEAD + +test_done ^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH v2 1/2] t: add --no-tag option to test_commit 2021-02-09 10:52 ` [PATCH v2] " Jeff King @ 2021-02-09 10:52 ` Jeff King 2021-02-09 10:53 ` [PATCH v2 2/2] rev-list: add --disk-usage option for calculating disk usage Jeff King ` (2 subsequent siblings) 3 siblings, 0 replies; 30+ messages in thread From: Jeff King @ 2021-02-09 10:52 UTC (permalink / raw) To: git; +Cc: Kyle Meyer, Eric Sunshine, Taylor Blau One of the conveniences that test_commit offers is making a tag for each commit. This makes it easy to refer to the commits in subsequent commands. But it can also be a pain if you care about reachability, because those tags keep the commits reachable even if they are rewound from the branch they're made on. The alternative is that scripts have to call test_tick, git-add, and git-commit themselves. Let's add a --no-tag option to give them the one-liner convenience of using test_commit. This is in preparation for the next patch, which will add some more calls. But I cleaned up an existing site to show off the feature. There are probably more cleanups possible. Signed-off-by: Jeff King <peff@peff.net> --- t/t4208-log-magic-pathspec.sh | 9 ++------- t/test-lib-functions.sh | 9 ++++++++- 2 files changed, 10 insertions(+), 8 deletions(-) diff --git a/t/t4208-log-magic-pathspec.sh b/t/t4208-log-magic-pathspec.sh index 5e10136e9a..7f0c1dcc0f 100755 --- a/t/t4208-log-magic-pathspec.sh +++ b/t/t4208-log-magic-pathspec.sh @@ -31,13 +31,8 @@ test_expect_success '"git log :/a -- " should not be ambiguous' ' test_expect_success '"git log :/detached -- " should find a commit only in HEAD' ' test_when_finished "git checkout main" && git checkout --detach && - # Must manually call `test_tick` instead of using `test_commit`, - # because the latter additionally creates a tag, which would make - # the commit reachable not only via HEAD. - test_tick && - git commit --allow-empty -m detached && - test_tick && - git commit --allow-empty -m something-else && + test_commit --no-tag detached && + test_commit --no-tag something-else && git log :/detached -- ' diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh index 6bca002316..1587241ba0 100644 --- a/t/test-lib-functions.sh +++ b/t/test-lib-functions.sh @@ -202,6 +202,7 @@ test_commit () { author= && signoff= && indir= && + no_tag= && while test $# != 0 do case "$1" in @@ -222,6 +223,9 @@ test_commit () { indir="$2" shift ;; + --no-tag) + no_tag=yes + ;; *) break ;; @@ -244,7 +248,10 @@ test_commit () { git ${indir:+ -C "$indir"} commit \ ${author:+ --author "$author"} \ $signoff -m "$1" && - git ${indir:+ -C "$indir"} tag "${4:-$1}" + if test -z "$no_tag" + then + git ${indir:+ -C "$indir"} tag "${4:-$1}" + fi } # Call test_merge with the arguments "<message> <commit>", where <commit> -- 2.30.1.887.ge7d57fcab0 ^ permalink raw reply related [flat|nested] 30+ messages in thread
* [PATCH v2 2/2] rev-list: add --disk-usage option for calculating disk usage 2021-02-09 10:52 ` [PATCH v2] " Jeff King 2021-02-09 10:52 ` [PATCH v2 1/2] t: add --no-tag option to test_commit Jeff King @ 2021-02-09 10:53 ` Jeff King 2021-02-09 11:09 ` [PATCH v2] rev-list --disk-usage Jeff King 2021-02-10 0:44 ` Junio C Hamano 3 siblings, 0 replies; 30+ messages in thread From: Jeff King @ 2021-02-09 10:53 UTC (permalink / raw) To: git; +Cc: Kyle Meyer, Eric Sunshine, Taylor Blau It can sometimes be useful to see which refs are contributing to the overall repository size (e.g., does some branch have a bunch of objects not found elsewhere in history, which indicates that deleting it would shrink the size of a clone). You can find that out by generating a list of objects, getting their sizes from cat-file, and then summing them, like: git rev-list --objects --no-object-names main..branch git cat-file --batch-check='%(objectsize:disk)' | perl -lne '$total += $_; END { print $total }' Though note that the caveats from git-cat-file(1) apply here. We "blame" base objects more than their deltas, even though the relationship could easily be flipped. Still, it can be a useful rough measure. But one problem is that it's slow to run. Teaching rev-list to sum up the sizes can be much faster for two reasons: 1. It skips all of the piping of object names and sizes. 2. If bitmaps are in use, for objects that are in the bitmapped packfile we can skip the oid_object_info() lookup entirely, and just ask the revindex for the on-disk size. This patch implements a --disk-usage option which produces the same answer in a fraction of the time. Here are some timings using a clone of torvalds/linux: [rev-list piped to cat-file, no bitmaps] $ time git rev-list --objects --no-object-names --all | git cat-file --buffer --batch-check='%(objectsize:disk)' | perl -lne '$total += $_; END { print $total }' 1459938510 real 0m29.635s user 0m38.003s sys 0m1.093s [internal, no bitmaps] $ time git rev-list --disk-usage --objects --all 1459938510 real 0m31.262s user 0m30.885s sys 0m0.376s Even though the wall-clock time is slightly worse due to parallelism, notice the CPU savings between the two. We saved 21% of the CPU just by avoiding the pipes. But the real win is with bitmaps. If we use them without the new option: [rev-list piped to cat-file, bitmaps] $ time git rev-list --objects --no-object-names --all --use-bitmap-index | git cat-file --batch-check='%(objectsize:disk)' | perl -lne '$total += $_; END { print $total }' 1459938510 real 0m6.244s user 0m8.452s sys 0m0.311s then we're faster to generate the list of objects, but we still spend a lot of time piping and looking things up. But if we do both together: [internal, bitmaps] $ time git rev-list --disk-usage --objects --all --use-bitmap-index 1459938510 real 0m0.219s user 0m0.169s sys 0m0.049s then we get the same answer much faster. For "--all", that answer will correspond closely to "du objects/pack", of course. But we're actually checking reachability here, so we're still fast when we ask for more interesting things: $ time git rev-list --disk-usage --use-bitmap-index v5.0..v5.10 374798628 real 0m0.429s user 0m0.356s sys 0m0.072s Signed-off-by: Jeff King <peff@peff.net> --- Documentation/rev-list-options.txt | 9 ++++ builtin/rev-list.c | 46 +++++++++++++++++ pack-bitmap.c | 81 ++++++++++++++++++++++++++++++ pack-bitmap.h | 2 + t/t6114-rev-list-du.sh | 51 +++++++++++++++++++ 5 files changed, 189 insertions(+) create mode 100755 t/t6114-rev-list-du.sh diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt index 96cc89d157..1238bfd915 100644 --- a/Documentation/rev-list-options.txt +++ b/Documentation/rev-list-options.txt @@ -227,6 +227,15 @@ ifdef::git-rev-list[] test the exit status to see if a range of objects is fully connected (or not). It is faster than redirecting stdout to `/dev/null` as the output does not have to be formatted. + +--disk-usage:: + Suppress normal output; instead, print the sum of the bytes used + for on-disk storage by the selected commits or objects. This is + equivalent to piping the output into `git cat-file + --batch-check='%(objectsize:disk)'`, except that it runs much + faster (especially with `--use-bitmap-index`). See the `CAVEATS` + section in linkgit:git-cat-file[1] for the limitations of what + "on-disk storage" means. endif::git-rev-list[] --cherry-mark:: diff --git a/builtin/rev-list.c b/builtin/rev-list.c index 25c6c3b38d..b4d8ea0a35 100644 --- a/builtin/rev-list.c +++ b/builtin/rev-list.c @@ -80,6 +80,19 @@ static int arg_show_object_names = 1; #define DEFAULT_OIDSET_SIZE (16*1024) +static int show_disk_usage; +static off_t total_disk_usage; + +static off_t get_object_disk_usage(struct object *obj) +{ + off_t size; + struct object_info oi = OBJECT_INFO_INIT; + oi.disk_sizep = &size; + if (oid_object_info_extended(the_repository, &obj->oid, &oi, 0) < 0) + die(_("unable to get disk usage of %s"), oid_to_hex(&obj->oid)); + return size; +} + static void finish_commit(struct commit *commit); static void show_commit(struct commit *commit, void *data) { @@ -88,6 +101,9 @@ static void show_commit(struct commit *commit, void *data) display_progress(progress, ++progress_counter); + if (show_disk_usage) + total_disk_usage += get_object_disk_usage(&commit->object); + if (info->flags & REV_LIST_QUIET) { finish_commit(commit); return; @@ -258,6 +274,8 @@ static void show_object(struct object *obj, const char *name, void *cb_data) if (finish_object(obj, name, cb_data)) return; display_progress(progress, ++progress_counter); + if (show_disk_usage) + total_disk_usage += get_object_disk_usage(obj); if (info->flags & REV_LIST_QUIET) return; @@ -452,6 +470,23 @@ static int try_bitmap_traversal(struct rev_info *revs, return 0; } +static int try_bitmap_disk_usage(struct rev_info *revs, + struct list_objects_filter_options *filter) +{ + struct bitmap_index *bitmap_git; + + if (!show_disk_usage) + return -1; + + bitmap_git = prepare_bitmap_walk(revs, filter); + if (!bitmap_git) + return -1; + + printf("%"PRIuMAX"\n", + (uintmax_t)get_disk_usage_from_bitmap(bitmap_git, revs)); + return 0; +} + int cmd_rev_list(int argc, const char **argv, const char *prefix) { struct rev_info revs; @@ -584,6 +619,12 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix) continue; } + if (!strcmp(arg, "--disk-usage")) { + show_disk_usage = 1; + info.flags |= REV_LIST_QUIET; + continue; + } + usage(rev_list_usage); } @@ -626,6 +667,8 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix) if (use_bitmap_index) { if (!try_bitmap_count(&revs, &filter_options)) return 0; + if (!try_bitmap_disk_usage(&revs, &filter_options)) + return 0; if (!try_bitmap_traversal(&revs, &filter_options)) return 0; } @@ -690,5 +733,8 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix) printf("%d\n", revs.count_left + revs.count_right); } + if (show_disk_usage) + printf("%"PRIuMAX"\n", (uintmax_t)total_disk_usage); + return 0; } diff --git a/pack-bitmap.c b/pack-bitmap.c index 60fe20fb87..1f69b5fa85 100644 --- a/pack-bitmap.c +++ b/pack-bitmap.c @@ -1430,3 +1430,84 @@ int bitmap_has_oid_in_uninteresting(struct bitmap_index *bitmap_git, return bitmap_git && bitmap_walk_contains(bitmap_git, bitmap_git->haves, oid); } + +static off_t get_disk_usage_for_type(struct bitmap_index *bitmap_git, + enum object_type object_type) +{ + struct bitmap *result = bitmap_git->result; + struct packed_git *pack = bitmap_git->pack; + off_t total = 0; + struct ewah_iterator it; + eword_t filter; + size_t i; + + init_type_iterator(&it, bitmap_git, object_type); + for (i = 0; i < result->word_alloc && + ewah_iterator_next(&filter, &it); i++) { + eword_t word = result->words[i] & filter; + size_t base = (i * BITS_IN_EWORD); + unsigned offset; + + if (!word) + continue; + + for (offset = 0; offset < BITS_IN_EWORD; offset++) { + size_t pos; + + if ((word >> offset) == 0) + break; + + offset += ewah_bit_ctz64(word >> offset); + pos = base + offset; + total += pack_pos_to_offset(pack, pos + 1) - + pack_pos_to_offset(pack, pos); + } + } + + return total; +} + +static off_t get_disk_usage_for_extended(struct bitmap_index *bitmap_git) +{ + struct bitmap *result = bitmap_git->result; + struct packed_git *pack = bitmap_git->pack; + struct eindex *eindex = &bitmap_git->ext_index; + off_t total = 0; + struct object_info oi = OBJECT_INFO_INIT; + off_t object_size; + size_t i; + + oi.disk_sizep = &object_size; + + for (i = 0; i < eindex->count; i++) { + struct object *obj = eindex->objects[i]; + + if (!bitmap_get(result, pack->num_objects + i)) + continue; + + if (oid_object_info_extended(the_repository, &obj->oid, &oi, 0) < 0) + die(_("unable to get disk usage of %s"), + oid_to_hex(&obj->oid)); + + total += object_size; + } + return total; +} + +off_t get_disk_usage_from_bitmap(struct bitmap_index *bitmap_git, + struct rev_info *revs) +{ + off_t total = 0; + + total += get_disk_usage_for_type(bitmap_git, OBJ_COMMIT); + if (revs->tree_objects) + total += get_disk_usage_for_type(bitmap_git, OBJ_TREE); + if (revs->blob_objects) + total += get_disk_usage_for_type(bitmap_git, OBJ_BLOB); + if (revs->tag_objects) + total += get_disk_usage_for_type(bitmap_git, OBJ_TAG); + + total += get_disk_usage_for_extended(bitmap_git); + + return total; +} diff --git a/pack-bitmap.h b/pack-bitmap.h index 25dfcf5615..36d99930d8 100644 --- a/pack-bitmap.h +++ b/pack-bitmap.h @@ -68,6 +68,8 @@ int bitmap_walk_contains(struct bitmap_index *, */ int bitmap_has_oid_in_uninteresting(struct bitmap_index *, const struct object_id *oid); +off_t get_disk_usage_from_bitmap(struct bitmap_index *, struct rev_info *); + void bitmap_writer_show_progress(int show); void bitmap_writer_set_checksum(unsigned char *sha1); void bitmap_writer_build_type_index(struct packing_data *to_pack, diff --git a/t/t6114-rev-list-du.sh b/t/t6114-rev-list-du.sh new file mode 100755 index 0000000000..b4aef32b71 --- /dev/null +++ b/t/t6114-rev-list-du.sh @@ -0,0 +1,51 @@ +#!/bin/sh + +test_description='basic tests of rev-list --disk-usage' +. ./test-lib.sh + +# we want a mix of reachable and unreachable, as well as +# objects in the bitmapped pack and some outside of it +test_expect_success 'set up repository' ' + test_commit --no-tag one && + test_commit --no-tag two && + git repack -adb && + git reset --hard HEAD^ && + test_commit --no-tag three && + test_commit --no-tag four && + git reset --hard HEAD^ +' + +# We don't want to hardcode sizes, because they depend on the exact details of +# packing, zlib, etc. We'll assume that the regular rev-list and cat-file +# machinery works and compare the --disk-usage output to that. +disk_usage_slow () { + git rev-list --no-object-names "$@" | + git cat-file --batch-check="%(objectsize:disk)" | + perl -lne '$total += $_; END { print $total}' +} + +# check behavior with given rev-list options; note that +# whitespace is not preserved in args +check_du () { + args=$* + + test_expect_success "generate expected size ($args)" " + disk_usage_slow $args >expect + " + + test_expect_success "rev-list --disk-usage without bitmaps ($args)" " + git rev-list --disk-usage $args >actual && + test_cmp expect actual + " + + test_expect_success "rev-list --disk-usage with bitmaps ($args)" " + git rev-list --disk-usage --use-bitmap-index $args >actual && + test_cmp expect actual + " +} + +check_du HEAD +check_du --objects HEAD +check_du --objects HEAD^..HEAD + +test_done -- 2.30.1.887.ge7d57fcab0 ^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: [PATCH v2] rev-list --disk-usage 2021-02-09 10:52 ` [PATCH v2] " Jeff King 2021-02-09 10:52 ` [PATCH v2 1/2] t: add --no-tag option to test_commit Jeff King 2021-02-09 10:53 ` [PATCH v2 2/2] rev-list: add --disk-usage option for calculating disk usage Jeff King @ 2021-02-09 11:09 ` Jeff King 2021-02-09 21:14 ` Junio C Hamano 2021-02-10 0:44 ` Junio C Hamano 3 siblings, 1 reply; 30+ messages in thread From: Jeff King @ 2021-02-09 11:09 UTC (permalink / raw) To: git; +Cc: Kyle Meyer, Eric Sunshine, Taylor Blau On Tue, Feb 09, 2021 at 05:52:28AM -0500, Jeff King wrote: > This fixes the minor bits mentioned in review for v1, but the big change > is that "--disk-usage" no longer implies "--objects". I think you > generally would want to use it with that option, but it really seemed to > violate the principle of least surprise for the user. > > That requires handling each object type independently, but the code for > that turned out to be not too bad (and is modeled after the similar > logic in traverse_bitmap_commit_list()). I was slightly concerned that > it would slow things down to walk over the bitmap multiple times, but it > doesn't seem to make much of a difference in practice. You might reasonably ask whether we could just directly use traverse_bitmap_commit_list(), since after all it takes a callback. And indeed, doing so reduces the size of the code (see the patch below). But it's shockingly slower! It takes consistently 2-3x longer to produce the same answer on linux.git with bitmaps. The problem is that we give more information to the callback than the disk-usage computation needs. In particular, finding nth_packed_object_id() is a big killer. Which kind of makes sense. We memcpy() the oids out of the .idx file into a "struct object_id" on the stack. And linux.git has ~200MB of oids to copy (and I'm sure doing it 20 bytes at a time isn't quite optimal). That adds several hundred milliseconds. Not a lot in absolute terms, but we're able to do the whole computation in ~200ms to start with, so it's relatively a big change. This could be solved by having a more "bare" callback that just passes the pack position, and not the oid (and then the callback is responsible for looking it up if they care). But it gets pretty awkward when we have to complete the bitmap traversal with non-bitmap objects (for those we _do_ have an oid to pass, but no pack position). I think the implementation in my 2/2 isn't so bad in comparison (and we can always swap it out later; these are all just implementation details). I did find it a bit interesting, though. When we moved to "struct object_id" and started copying bits out with nth_packed_object_id(), rather than just pointing to the mmap'd .idx bytes, we wondered whether there would be any measurable difference. Likewise when we extended it to handle the oid size changing at runtime. At the time, I wasn't able to measure any impact for real operations, but I guess we just needed a case that highlighted it more. I don't know that it's really worth digging into that much, though it's quite possible there may be some easy wins by optimizing those memcpy calls. E.g., I'm not sure if the compiler ends up inlining them or not. If it doesn't realize that the_hash_algo->rawsz is only ever "20" or "32", we could perhaps help it along with specialized versions of hashcpy(). If somebody does want to play with it, this patch may make a good testbed. :) -- >8 -- builtin/pack-objects.c | 3 +- builtin/rev-list.c | 40 +++++++++++------------ pack-bitmap.c | 86 ++------------------------------------------------ pack-bitmap.h | 1 + reachable.c | 1 + 5 files changed, 25 insertions(+), 106 deletions(-) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 13cde5896a..33f7d19eb3 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -1388,7 +1388,8 @@ static int add_object_entry(const struct object_id *oid, enum object_type type, static int add_object_entry_from_bitmap(const struct object_id *oid, enum object_type type, int flags, uint32_t name_hash, - struct packed_git *pack, off_t offset) + struct packed_git *pack, + uint32_t pack_pos, off_t offset) { display_progress(progress_state, ++nr_seen); diff --git a/builtin/rev-list.c b/builtin/rev-list.c index b4d8ea0a35..cc96b4c854 100644 --- a/builtin/rev-list.c +++ b/builtin/rev-list.c @@ -83,13 +83,13 @@ static int arg_show_object_names = 1; static int show_disk_usage; static off_t total_disk_usage; -static off_t get_object_disk_usage(struct object *obj) +static off_t get_object_disk_usage(const struct object_id *oid) { off_t size; struct object_info oi = OBJECT_INFO_INIT; oi.disk_sizep = &size; - if (oid_object_info_extended(the_repository, &obj->oid, &oi, 0) < 0) - die(_("unable to get disk usage of %s"), oid_to_hex(&obj->oid)); + if (oid_object_info_extended(the_repository, oid, &oi, 0) < 0) + die(_("unable to get disk usage of %s"), oid_to_hex(oid)); return size; } @@ -102,7 +102,7 @@ static void show_commit(struct commit *commit, void *data) display_progress(progress, ++progress_counter); if (show_disk_usage) - total_disk_usage += get_object_disk_usage(&commit->object); + total_disk_usage += get_object_disk_usage(&commit->object.oid); if (info->flags & REV_LIST_QUIET) { finish_commit(commit); @@ -275,7 +275,7 @@ static void show_object(struct object *obj, const char *name, void *cb_data) return; display_progress(progress, ++progress_counter); if (show_disk_usage) - total_disk_usage += get_object_disk_usage(obj); + total_disk_usage += get_object_disk_usage(&obj->oid); if (info->flags & REV_LIST_QUIET) return; @@ -363,8 +363,19 @@ static int show_object_fast( int exclude, uint32_t name_hash, struct packed_git *found_pack, + uint32_t pack_pos, off_t found_offset) { + if (show_disk_usage) { + if (found_pack) { + total_disk_usage += + pack_pos_to_offset(found_pack, pack_pos + 1) - + found_offset; + } else { + total_disk_usage += get_object_disk_usage(oid); + } + return 1; + } fprintf(stdout, "%s\n", oid_to_hex(oid)); return 1; } @@ -467,23 +478,10 @@ static int try_bitmap_traversal(struct rev_info *revs, traverse_bitmap_commit_list(bitmap_git, revs, &show_object_fast); free_bitmap_index(bitmap_git); - return 0; -} - -static int try_bitmap_disk_usage(struct rev_info *revs, - struct list_objects_filter_options *filter) -{ - struct bitmap_index *bitmap_git; - if (!show_disk_usage) - return -1; - - bitmap_git = prepare_bitmap_walk(revs, filter); - if (!bitmap_git) - return -1; + if (show_disk_usage) + printf("%"PRIuMAX"\n", (uintmax_t)total_disk_usage); - printf("%"PRIuMAX"\n", - (uintmax_t)get_disk_usage_from_bitmap(bitmap_git, revs)); return 0; } @@ -667,8 +665,6 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix) if (use_bitmap_index) { if (!try_bitmap_count(&revs, &filter_options)) return 0; - if (!try_bitmap_disk_usage(&revs, &filter_options)) - return 0; if (!try_bitmap_traversal(&revs, &filter_options)) return 0; } diff --git a/pack-bitmap.c b/pack-bitmap.c index 1f69b5fa85..f118a365e1 100644 --- a/pack-bitmap.c +++ b/pack-bitmap.c @@ -655,7 +655,7 @@ static void show_extended_objects(struct bitmap_index *bitmap_git, (obj->type == OBJ_TAG && !revs->tag_objects)) continue; - show_reach(&obj->oid, obj->type, 0, eindex->hashes[i], NULL, 0); + show_reach(&obj->oid, obj->type, 0, eindex->hashes[i], NULL, 0, 0); } } @@ -726,7 +726,8 @@ static void show_objects_for_type( if (bitmap_git->hashes) hash = get_be32(bitmap_git->hashes + index_pos); - show_reach(&oid, object_type, 0, hash, bitmap_git->pack, ofs); + show_reach(&oid, object_type, 0, hash, + bitmap_git->pack, pos + offset, ofs); } } } @@ -1430,84 +1431,3 @@ int bitmap_has_oid_in_uninteresting(struct bitmap_index *bitmap_git, return bitmap_git && bitmap_walk_contains(bitmap_git, bitmap_git->haves, oid); } - -static off_t get_disk_usage_for_type(struct bitmap_index *bitmap_git, - enum object_type object_type) -{ - struct bitmap *result = bitmap_git->result; - struct packed_git *pack = bitmap_git->pack; - off_t total = 0; - struct ewah_iterator it; - eword_t filter; - size_t i; - - init_type_iterator(&it, bitmap_git, object_type); - for (i = 0; i < result->word_alloc && - ewah_iterator_next(&filter, &it); i++) { - eword_t word = result->words[i] & filter; - size_t base = (i * BITS_IN_EWORD); - unsigned offset; - - if (!word) - continue; - - for (offset = 0; offset < BITS_IN_EWORD; offset++) { - size_t pos; - - if ((word >> offset) == 0) - break; - - offset += ewah_bit_ctz64(word >> offset); - pos = base + offset; - total += pack_pos_to_offset(pack, pos + 1) - - pack_pos_to_offset(pack, pos); - } - } - - return total; -} - -static off_t get_disk_usage_for_extended(struct bitmap_index *bitmap_git) -{ - struct bitmap *result = bitmap_git->result; - struct packed_git *pack = bitmap_git->pack; - struct eindex *eindex = &bitmap_git->ext_index; - off_t total = 0; - struct object_info oi = OBJECT_INFO_INIT; - off_t object_size; - size_t i; - - oi.disk_sizep = &object_size; - - for (i = 0; i < eindex->count; i++) { - struct object *obj = eindex->objects[i]; - - if (!bitmap_get(result, pack->num_objects + i)) - continue; - - if (oid_object_info_extended(the_repository, &obj->oid, &oi, 0) < 0) - die(_("unable to get disk usage of %s"), - oid_to_hex(&obj->oid)); - - total += object_size; - } - return total; -} - -off_t get_disk_usage_from_bitmap(struct bitmap_index *bitmap_git, - struct rev_info *revs) -{ - off_t total = 0; - - total += get_disk_usage_for_type(bitmap_git, OBJ_COMMIT); - if (revs->tree_objects) - total += get_disk_usage_for_type(bitmap_git, OBJ_TREE); - if (revs->blob_objects) - total += get_disk_usage_for_type(bitmap_git, OBJ_BLOB); - if (revs->tag_objects) - total += get_disk_usage_for_type(bitmap_git, OBJ_TAG); - - total += get_disk_usage_for_extended(bitmap_git); - - return total; -} diff --git a/pack-bitmap.h b/pack-bitmap.h index 36d99930d8..ba71a9f5c6 100644 --- a/pack-bitmap.h +++ b/pack-bitmap.h @@ -38,6 +38,7 @@ typedef int (*show_reachable_fn)( int flags, uint32_t hash, struct packed_git *found_pack, + uint32_t pack_pos, off_t found_offset); struct bitmap_index; diff --git a/reachable.c b/reachable.c index 77a60c70a5..79ebe8f940 100644 --- a/reachable.c +++ b/reachable.c @@ -182,6 +182,7 @@ static int mark_object_seen(const struct object_id *oid, int exclude, uint32_t name_hash, struct packed_git *found_pack, + uint32_t pack_pos, off_t found_offset) { struct object *obj = lookup_object_by_type(the_repository, oid, type); ^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: [PATCH v2] rev-list --disk-usage 2021-02-09 11:09 ` [PATCH v2] rev-list --disk-usage Jeff King @ 2021-02-09 21:14 ` Junio C Hamano 2021-02-10 9:38 ` Jeff King 0 siblings, 1 reply; 30+ messages in thread From: Junio C Hamano @ 2021-02-09 21:14 UTC (permalink / raw) To: Jeff King; +Cc: git, Kyle Meyer, Eric Sunshine, Taylor Blau Jeff King <peff@peff.net> writes: > I don't know that it's really worth digging into that much, though it's > quite possible there may be some easy wins by optimizing those memcpy > calls. E.g., I'm not sure if the compiler ends up inlining them or not. > If it doesn't realize that the_hash_algo->rawsz is only ever "20" or > "32", we could perhaps help it along with specialized versions of > hashcpy(). If somebody does want to play with it, this patch may make a > good testbed. :) Yuck. That reminds me of the adventure Shawn he made in the Java land benchmarking which one among int[5], int a,b,c,d,e, char[40] is the most efficient way (both storage-wise and performance-wise) to store SHA-1 hash. I wish we didn't have to go there. It indeed is an interesting, despite a bit sad, observation that even with a good precomputed information, an overly heavy interface can kill potential performance benefit. Thanks. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v2] rev-list --disk-usage 2021-02-09 21:14 ` Junio C Hamano @ 2021-02-10 9:38 ` Jeff King 0 siblings, 0 replies; 30+ messages in thread From: Jeff King @ 2021-02-10 9:38 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, Kyle Meyer, Eric Sunshine, Taylor Blau On Tue, Feb 09, 2021 at 01:14:17PM -0800, Junio C Hamano wrote: > Jeff King <peff@peff.net> writes: > > > I don't know that it's really worth digging into that much, though it's > > quite possible there may be some easy wins by optimizing those memcpy > > calls. E.g., I'm not sure if the compiler ends up inlining them or not. > > If it doesn't realize that the_hash_algo->rawsz is only ever "20" or > > "32", we could perhaps help it along with specialized versions of > > hashcpy(). If somebody does want to play with it, this patch may make a > > good testbed. :) > > Yuck. That reminds me of the adventure Shawn he made in the Java > land benchmarking which one among int[5], int a,b,c,d,e, char[40] is > the most efficient way (both storage-wise and performance-wise) to > store SHA-1 hash. I wish we didn't have to go there. > > It indeed is an interesting, despite a bit sad, observation that > even with a good precomputed information, an overly heavy interface > can kill potential performance benefit. Agreed. But I'm hoping we can continue to mostly ignore it. I suspect this finding means we are wasting a few hundred milliseconds copying oids around during a clone of torvalds/linux. But overall that is a pretty heavy-weight operation, and I doubt anybody really notices. And for something as lightweight as --disk-usage, it was easy enough to optimize around it. It probably does have a more measurable impact in something like: git rev-list --use-bitmap-index --objects HEAD >/dev/null where we really do need those oids, and the extra copying might add up. I guess if somebody is interested in micro-optimizing, that is probably a good command to look at. -Peff ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v2] rev-list --disk-usage 2021-02-09 10:52 ` [PATCH v2] " Jeff King ` (2 preceding siblings ...) 2021-02-09 11:09 ` [PATCH v2] rev-list --disk-usage Jeff King @ 2021-02-10 0:44 ` Junio C Hamano 2021-02-10 1:49 ` Taylor Blau 2021-02-10 10:01 ` Jeff King 3 siblings, 2 replies; 30+ messages in thread From: Junio C Hamano @ 2021-02-10 0:44 UTC (permalink / raw) To: Jeff King; +Cc: git, Kyle Meyer, Eric Sunshine, Taylor Blau Jeff King <peff@peff.net> writes: > Here's a re-roll of my series to add "rev-list --disk-usage", for > counting up object storage used for various slices of history. > ... > t/t6114-rev-list-du.sh | 51 +++++++++++++++++++ > t/test-lib-functions.sh | 9 +++- > 7 files changed, 199 insertions(+), 8 deletions(-) > create mode 100755 t/t6114-rev-list-du.sh I relocated 6114 to 6115 to avoid tests sharing the same number. I am getting these numbers from random ranges I am interested in, but do they say what I think they mean? Was the development effort went into the v2.28 release almost half the size of v2.29, and have we already done about the same amont of work for this cycle? : gitster git.git/seen; rungit seen rev-list --disk-usage master..next 83105 : gitster git.git/seen; rungit seen rev-list --disk-usage v2.30.0..master 183463 : gitster git.git/seen; rungit seen rev-list --disk-usage v2.29.0..v2.30.0 231640 : gitster git.git/seen; rungit seen rev-list --disk-usage v2.28.0..v2.29.0 334355 : gitster git.git/seen; rungit seen rev-list --disk-usage v2.27.0..v2.28.0 182298 ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v2] rev-list --disk-usage 2021-02-10 0:44 ` Junio C Hamano @ 2021-02-10 1:49 ` Taylor Blau 2021-02-10 10:01 ` Jeff King 1 sibling, 0 replies; 30+ messages in thread From: Taylor Blau @ 2021-02-10 1:49 UTC (permalink / raw) To: Junio C Hamano; +Cc: Jeff King, git, Kyle Meyer, Eric Sunshine On Tue, Feb 09, 2021 at 04:44:27PM -0800, Junio C Hamano wrote: > Jeff King <peff@peff.net> writes: > > > Here's a re-roll of my series to add "rev-list --disk-usage", for > > counting up object storage used for various slices of history. > > ... > > t/t6114-rev-list-du.sh | 51 +++++++++++++++++++ > > t/test-lib-functions.sh | 9 +++- > > 7 files changed, 199 insertions(+), 8 deletions(-) > > create mode 100755 t/t6114-rev-list-du.sh > > I relocated 6114 to 6115 to avoid tests sharing the same number. Thanks. > I am getting these numbers from random ranges I am interested in, > but do they say what I think they mean? Was the development effort > went into the v2.28 release almost half the size of v2.29, and have > we already done about the same amont of work for this cycle? > > : gitster git.git/seen; rungit seen rev-list --disk-usage master..next > 83105 > : gitster git.git/seen; rungit seen rev-list --disk-usage v2.30.0..master > 183463 > : gitster git.git/seen; rungit seen rev-list --disk-usage v2.29.0..v2.30.0 > 231640 > : gitster git.git/seen; rungit seen rev-list --disk-usage v2.28.0..v2.29.0 > 334355 > : gitster git.git/seen; rungit seen rev-list --disk-usage v2.27.0..v2.28.0 > 182298 I think you are surprised by these numbers because you're only counting disk usage of commit objects in those ranges. v1 of this series implied --objects by default, but this changed in v2 due to my suggestion. Passing --objects to count the disk-usage of all objects in those ranges gives more reasonable numbers (and match my rough guesses, i.e., that 2.29 was busier than 2.30, and so on): $ for range in origin/master..origin/next v2.30.0..origin/master \ v2.29.0..v2.30.0 v2.28.0..v2.29.0 v2.27.0..v2.28.0 do printf "%s %d vs. %d\n" $range \ "$(git rev-list --objects --no-object-names $range | git cat-file --batch-check='%(objectsize:disk)' | paste -sd+ | bc)" \ "$(git.seen rev-list --objects --disk-usage $range)" done origin/master..origin/next 671380 vs. 671380 v2.30.0..origin/master 1618815 vs. 1618815 v2.29.0..v2.30.0 3308295 vs. 3308295 v2.28.0..v2.29.0 4080789 vs. 4080789 v2.27.0..v2.28.0 2846196 vs. 2846196 Thanks, Taylor ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v2] rev-list --disk-usage 2021-02-10 0:44 ` Junio C Hamano 2021-02-10 1:49 ` Taylor Blau @ 2021-02-10 10:01 ` Jeff King 2021-02-10 16:31 ` Junio C Hamano 1 sibling, 1 reply; 30+ messages in thread From: Jeff King @ 2021-02-10 10:01 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, Kyle Meyer, Eric Sunshine, Taylor Blau On Tue, Feb 09, 2021 at 04:44:27PM -0800, Junio C Hamano wrote: > Jeff King <peff@peff.net> writes: > > > Here's a re-roll of my series to add "rev-list --disk-usage", for > > counting up object storage used for various slices of history. > > ... > > t/t6114-rev-list-du.sh | 51 +++++++++++++++++++ > > t/test-lib-functions.sh | 9 +++- > > 7 files changed, 199 insertions(+), 8 deletions(-) > > create mode 100755 t/t6114-rev-list-du.sh > > I relocated 6114 to 6115 to avoid tests sharing the same number. Thanks. I wondered why I didn't notice, but it's because the other 6114 also just made it into "seen". :) > I am getting these numbers from random ranges I am interested in, > but do they say what I think they mean? Was the development effort > went into the v2.28 release almost half the size of v2.29, and have > we already done about the same amont of work for this cycle? > > : gitster git.git/seen; rungit seen rev-list --disk-usage master..next > 83105 > : gitster git.git/seen; rungit seen rev-list --disk-usage v2.30.0..master > 183463 > : gitster git.git/seen; rungit seen rev-list --disk-usage v2.29.0..v2.30.0 > 231640 > : gitster git.git/seen; rungit seen rev-list --disk-usage v2.28.0..v2.29.0 > 334355 > : gitster git.git/seen; rungit seen rev-list --disk-usage v2.27.0..v2.28.0 > 182298 As Taylor mentioned, this is only hitting the commits. So you might as well just be looking at commit counts as a measure of work, I'd think (and indeed v2.28 has about half as many commits as v2.29!). Adding --objects gets you a rougher estimate of "bytes changed", which helps accounts for commits of different sizes. But there I think you'd do just as well to look at the actual number of lines changed with "git diff --numstat". I'd expect the number of on-disk bytes to _roughly_ correspond to the size of the changes. But you are working against the heuristics of the delta chains there. It may well be that we would store a base object in the v2.28..v2.29 range, and a delta against it in v2.27..v2.28. And that would attribute most of the bytes to v2.29, even though they should be shared roughly with v2.28. I'm sure one could devise a scheme for "sharing" the bytes from a delta family across all of its objects. That might even be worth implementing on top (I don't even think it would be too expensive; you just have to collect the delta chains for any objects you're reporting, and then average the total size among a chain). But in practice, we've found this kind of naive --disk-usage useful for answering questions like: - do I need all of these objects? Comparing "rev-list --disk-usage --objects --all", "rev-list --disk-usage --objects --all --reflog", and "du objects/pack/*.pack" will tell you if a prune/repack might help, and whether expiring reflogs makes a difference. - the size of the shared alternates repo for a set of forks has jumped. Comparing "rev-list --disk-usage --objects --remotes=$base --not --remotes=$fork" will tell you what's reachable from a fork but not from the base (we use "refs/remotes/$id/*" to keep track of fork refs in our alternates repo). This can be junk like somebody forking git/git and then uploading a bunch of pirated video files. :) - likewise, the size of cloning a single repo may jump. Comparing "rev-list --disk-usage --objects HEAD..$branch" for each branch might show that one branch is an outlier (e.g., because somebody accidentally committed a bunch of build artifacts). In those kinds of cases, it's not usually "oh, this version is twice as big as this other one". It's more like "wow, this branch is 100x as big as the other branches", and little decisions like delta direction are just noise. I imagine that in those cases the uncompressed object sizes would probably produce similar patterns and answers. But it's actually faster to produce the on-disk sizes. :) -Peff ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v2] rev-list --disk-usage 2021-02-10 10:01 ` Jeff King @ 2021-02-10 16:31 ` Junio C Hamano 2021-02-10 20:38 ` Jeff King 0 siblings, 1 reply; 30+ messages in thread From: Junio C Hamano @ 2021-02-10 16:31 UTC (permalink / raw) To: Jeff King; +Cc: git, Kyle Meyer, Eric Sunshine, Taylor Blau Jeff King <peff@peff.net> writes: > But in practice, we've found this kind of naive --disk-usage useful for > answering questions like: > > - do I need all of these objects? Comparing "rev-list --disk-usage > --objects --all", "rev-list --disk-usage --objects --all --reflog", > and "du objects/pack/*.pack" will tell you if a prune/repack might > help, and whether expiring reflogs makes a difference. > > - the size of the shared alternates repo for a set of forks has > jumped. Comparing "rev-list --disk-usage --objects --remotes=$base > --not --remotes=$fork" will tell you what's reachable from a fork > but not from the base (we use "refs/remotes/$id/*" to keep track of > fork refs in our alternates repo). This can be junk like somebody > forking git/git and then uploading a bunch of pirated video files. > :) > > - likewise, the size of cloning a single repo may jump. Comparing > "rev-list --disk-usage --objects HEAD..$branch" for each branch > might show that one branch is an outlier (e.g., because somebody > accidentally committed a bunch of build artifacts). > > In those kinds of cases, it's not usually "oh, this version is twice as > big as this other one". It's more like "wow, this branch is 100x as big > as the other branches", and little decisions like delta direction are > just noise. I imagine that in those cases the uncompressed object sizes > would probably produce similar patterns and answers. But it's actually > faster to produce the on-disk sizes. :) Thanks. I kind of feel sad to have a nice write-up like this only in the list archive. Is there a section in our documentation set to keep collection of such a real-life use cases? Perhaps the examples section of manpages is the closest thing, but it looks a bit too narrowly scoped for the example section of "rev-list" manpage. THanks. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v2] rev-list --disk-usage 2021-02-10 16:31 ` Junio C Hamano @ 2021-02-10 20:38 ` Jeff King 2021-02-10 23:15 ` Taylor Blau 0 siblings, 1 reply; 30+ messages in thread From: Jeff King @ 2021-02-10 20:38 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, Kyle Meyer, Eric Sunshine, Taylor Blau On Wed, Feb 10, 2021 at 08:31:08AM -0800, Junio C Hamano wrote: > Jeff King <peff@peff.net> writes: > > > But in practice, we've found this kind of naive --disk-usage useful for > > answering questions like: > [...] > > I kind of feel sad to have a nice write-up like this only in the > list archive. Is there a section in our documentation set to keep > collection of such a real-life use cases? Perhaps the examples > section of manpages is the closest thing, but it looks a bit too > narrowly scoped for the example section of "rev-list" manpage. Agreed on both counts. If this gets put into a release, I suspect Taylor would cover it in a release blog post. That is not quite the same thing as having it in the documentation, but it may provide more search engine boost than the list archive. I dunno. -Peff ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v2] rev-list --disk-usage 2021-02-10 20:38 ` Jeff King @ 2021-02-10 23:15 ` Taylor Blau 2021-02-11 11:00 ` Jeff King 0 siblings, 1 reply; 30+ messages in thread From: Taylor Blau @ 2021-02-10 23:15 UTC (permalink / raw) To: Jeff King; +Cc: Junio C Hamano, git, Kyle Meyer, Eric Sunshine, Taylor Blau On Wed, Feb 10, 2021 at 03:38:58PM -0500, Jeff King wrote: > On Wed, Feb 10, 2021 at 08:31:08AM -0800, Junio C Hamano wrote: > > > Jeff King <peff@peff.net> writes: > > > > > But in practice, we've found this kind of naive --disk-usage useful for > > > answering questions like: > > [...] > > > > I kind of feel sad to have a nice write-up like this only in the > > list archive. Is there a section in our documentation set to keep > > collection of such a real-life use cases? Perhaps the examples > > section of manpages is the closest thing, but it looks a bit too > > narrowly scoped for the example section of "rev-list" manpage. > > Agreed on both counts. If this gets put into a release, I suspect Taylor > would cover it in a release blog post. That is not quite the same thing > as having it in the documentation, but it may provide more search engine > boost than the list archive. I dunno. Yeah, this is the perfect sort of thing for those blog posts. But it makes sense to include some of these examples in our own documentation here, too. git-rev-list(1) doesn't have an EXAMPLES section, but maybe it should. > -Peff Thanks, Taylor ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v2] rev-list --disk-usage 2021-02-10 23:15 ` Taylor Blau @ 2021-02-11 11:00 ` Jeff King 2021-02-11 12:04 ` Ævar Arnfjörð Bjarmason 0 siblings, 1 reply; 30+ messages in thread From: Jeff King @ 2021-02-11 11:00 UTC (permalink / raw) To: Taylor Blau; +Cc: Junio C Hamano, git, Kyle Meyer, Eric Sunshine On Wed, Feb 10, 2021 at 06:15:16PM -0500, Taylor Blau wrote: > > > I kind of feel sad to have a nice write-up like this only in the > > > list archive. Is there a section in our documentation set to keep > > > collection of such a real-life use cases? Perhaps the examples > > > section of manpages is the closest thing, but it looks a bit too > > > narrowly scoped for the example section of "rev-list" manpage. > > > > Agreed on both counts. If this gets put into a release, I suspect Taylor > > would cover it in a release blog post. That is not quite the same thing > > as having it in the documentation, but it may provide more search engine > > boost than the list archive. I dunno. > > Yeah, this is the perfect sort of thing for those blog posts. > > But it makes sense to include some of these examples in our own > documentation here, too. git-rev-list(1) doesn't have an EXAMPLES > section, but maybe it should. I think this is the "narrowly scoped" bit from Junio's response above. It would be a bit weird to have an examples section for rev-list that only mentions this rather obscure feature. -Peff ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v2] rev-list --disk-usage 2021-02-11 11:00 ` Jeff King @ 2021-02-11 12:04 ` Ævar Arnfjörð Bjarmason 2021-02-11 17:57 ` Junio C Hamano 2021-02-17 23:31 ` [PATCH 0/2] rev-list --disk-usage example docs Jeff King 0 siblings, 2 replies; 30+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-02-11 12:04 UTC (permalink / raw) To: Jeff King; +Cc: Taylor Blau, Junio C Hamano, git, Kyle Meyer, Eric Sunshine On Thu, Feb 11 2021, Jeff King wrote: > On Wed, Feb 10, 2021 at 06:15:16PM -0500, Taylor Blau wrote: > >> > > I kind of feel sad to have a nice write-up like this only in the >> > > list archive. Is there a section in our documentation set to keep >> > > collection of such a real-life use cases? Perhaps the examples >> > > section of manpages is the closest thing, but it looks a bit too >> > > narrowly scoped for the example section of "rev-list" manpage. >> > >> > Agreed on both counts. If this gets put into a release, I suspect Taylor >> > would cover it in a release blog post. That is not quite the same thing >> > as having it in the documentation, but it may provide more search engine >> > boost than the list archive. I dunno. >> >> Yeah, this is the perfect sort of thing for those blog posts. >> >> But it makes sense to include some of these examples in our own >> documentation here, too. git-rev-list(1) doesn't have an EXAMPLES >> section, but maybe it should. > > I think this is the "narrowly scoped" bit from Junio's response above. > It would be a bit weird to have an examples section for rev-list that > only mentions this rather obscure feature. I don't think the lack of an EXAMPLES section or the relative obscurity of the switch should preclude us from adding useful documentation. Yes it would feel a bit out of place, but we can always have a sub-section of EXAMPLES, and we've got to start somewhere. In this case I don't see why it couldn't be added to OPTIONS, we've got some very long discussion there already, and as long as there's a clear separation in prose from an initial brief discussion of the switch and further prose it won't be confusing for readers, they can just page past the details. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v2] rev-list --disk-usage 2021-02-11 12:04 ` Ævar Arnfjörð Bjarmason @ 2021-02-11 17:57 ` Junio C Hamano 2021-02-17 23:31 ` [PATCH 0/2] rev-list --disk-usage example docs Jeff King 1 sibling, 0 replies; 30+ messages in thread From: Junio C Hamano @ 2021-02-11 17:57 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason Cc: Jeff King, Taylor Blau, git, Kyle Meyer, Eric Sunshine Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: >> I think this is the "narrowly scoped" bit from Junio's response above. >> It would be a bit weird to have an examples section for rev-list that >> only mentions this rather obscure feature. > > I don't think the lack of an EXAMPLES section or the relative obscurity > of the switch should preclude us from adding useful documentation. > > Yes it would feel a bit out of place, but we can always have a > sub-section of EXAMPLES, and we've got to start somewhere. > > In this case I don't see why it couldn't be added to OPTIONS, we've got > some very long discussion there already, and as long as there's a clear > separation in prose from an initial brief discussion of the switch and > further prose it won't be confusing for readers, they can just page past > the details. OK. In any case, [v2] as we have it (with test number relocation) should be good as-is, so I'd start preparing to merge it down to 'next' soonish. Thanks. ^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH 0/2] rev-list --disk-usage example docs 2021-02-11 12:04 ` Ævar Arnfjörð Bjarmason 2021-02-11 17:57 ` Junio C Hamano @ 2021-02-17 23:31 ` Jeff King 2021-02-17 23:34 ` [PATCH 1/2] docs/rev-list: add an examples section Jeff King ` (2 more replies) 1 sibling, 3 replies; 30+ messages in thread From: Jeff King @ 2021-02-17 23:31 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason Cc: Taylor Blau, Junio C Hamano, git, Kyle Meyer, Eric Sunshine On Thu, Feb 11, 2021 at 01:04:26PM +0100, Ævar Arnfjörð Bjarmason wrote: > > I think this is the "narrowly scoped" bit from Junio's response above. > > It would be a bit weird to have an examples section for rev-list that > > only mentions this rather obscure feature. > > I don't think the lack of an EXAMPLES section or the relative obscurity > of the switch should preclude us from adding useful documentation. > > Yes it would feel a bit out of place, but we can always have a > sub-section of EXAMPLES, and we've got to start somewhere. Fair enough. Here are some patches (to go on top of jk/rev-list-disk-usage, though obviously the first one could be applied independently). > In this case I don't see why it couldn't be added to OPTIONS, we've got > some very long discussion there already, and as long as there's a clear > separation in prose from an initial brief discussion of the switch and > further prose it won't be confusing for readers, they can just page past > the details. It's already big and scary enough that I prefer starting an EXAMPLES section. :) By the way, there's one other finishing touch we might consider: enabling --use-bitmap-index automatically when bitmaps are present, for requests that produce the identical answer (so _not_ a regular traversal, because the output order and presence of pathnames are different there). I'd prefer to do that as a separate series, though, since there are multiple arguments that might benefit (like --count). [1/2]: docs/rev-list: add an examples section [2/2]: docs/rev-list: add some examples of --disk-usage Documentation/git-rev-list.txt | 93 ++++++++++++++++++++++++++++++++++ 1 file changed, 93 insertions(+) -Peff ^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH 1/2] docs/rev-list: add an examples section 2021-02-17 23:31 ` [PATCH 0/2] rev-list --disk-usage example docs Jeff King @ 2021-02-17 23:34 ` Jeff King 2021-02-17 23:35 ` [PATCH 2/2] docs/rev-list: add some examples of --disk-usage Jeff King 2021-02-17 23:44 ` [PATCH 0/2] rev-list --disk-usage example docs Taylor Blau 2 siblings, 0 replies; 30+ messages in thread From: Jeff King @ 2021-02-17 23:34 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason Cc: Taylor Blau, Junio C Hamano, git, Kyle Meyer, Eric Sunshine We currently don't show any examples of using git-rev-list at all. Let's add some pretty elementary examples. They likely seem obvious to anybody who has worked with the tool for a while, but my purpose here is two-fold: - they may be enlightening to people who haven't used the tool a lot to give a general flavor of how it is meant to be used - they can serve as a starting point for adding more interesting examples (we can do that without the basic ones, of course, but I think it makes sense to show off the building blocks) This set is far from exhaustive, but again, the purpose is to be a starting point for further additions. Signed-off-by: Jeff King <peff@peff.net> --- I'm open to feedback on these. But please, if you have suggestions for adding more, do it in the form of a patch on top. :) Documentation/git-rev-list.txt | 52 ++++++++++++++++++++++++++++++++++ 1 file changed, 52 insertions(+) diff --git a/Documentation/git-rev-list.txt b/Documentation/git-rev-list.txt index 5da66232dc..d7ff519b90 100644 --- a/Documentation/git-rev-list.txt +++ b/Documentation/git-rev-list.txt @@ -31,6 +31,58 @@ include::rev-list-options.txt[] include::pretty-formats.txt[] +EXAMPLES +-------- + +* Print the list of commits reachable from the current branch. ++ +---------- +git rev-list HEAD +---------- + +* Print the list of commits on this branch, but not present in the + upstream branch. ++ +---------- +git rev-list @{upstream}..HEAD +---------- + +* Format commits with their author and commit message (see also the + porcelain linkgit:git-log[1]). ++ +---------- +git rev-list --format=medium HEAD +---------- + +* Format commits along with their diffs (see also the porcelain + linkgit:git-log[1], which can do this in a single process). ++ +---------- +git rev-list HEAD | +git diff-tree --stdin --format=medium -p +---------- + +* Print the list of commits on the current branch that touched any + file in the `Documentation` directory. ++ +---------- +git rev-list HEAD -- Documentation/ +---------- + +* Print the list of commits authored by you in the past year, on + any branch, tag, or other ref. ++ +---------- +git rev-list --author=you@example.com --since=1.year.ago --all +---------- + +* Print the list of objects reachable from the current branch (i.e., all + commits and the blobs and trees they contain). ++ +---------- +git rev-list --objects HEAD +---------- + GIT --- Part of the linkgit:git[1] suite -- 2.30.1.989.g5e01c2f281 ^ permalink raw reply related [flat|nested] 30+ messages in thread
* [PATCH 2/2] docs/rev-list: add some examples of --disk-usage 2021-02-17 23:31 ` [PATCH 0/2] rev-list --disk-usage example docs Jeff King 2021-02-17 23:34 ` [PATCH 1/2] docs/rev-list: add an examples section Jeff King @ 2021-02-17 23:35 ` Jeff King 2021-02-17 23:44 ` [PATCH 0/2] rev-list --disk-usage example docs Taylor Blau 2 siblings, 0 replies; 30+ messages in thread From: Jeff King @ 2021-02-17 23:35 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason Cc: Taylor Blau, Junio C Hamano, git, Kyle Meyer, Eric Sunshine It's not immediately obvious why --disk-usage might be a useful thing. These examples show off a few of the real-world cases I've used it for. Signed-off-by: Jeff King <peff@peff.net> --- Documentation/git-rev-list.txt | 41 ++++++++++++++++++++++++++++++++++ 1 file changed, 41 insertions(+) diff --git a/Documentation/git-rev-list.txt b/Documentation/git-rev-list.txt index d7ff519b90..20bb8e8217 100644 --- a/Documentation/git-rev-list.txt +++ b/Documentation/git-rev-list.txt @@ -83,6 +83,47 @@ git rev-list --author=you@example.com --since=1.year.ago --all git rev-list --objects HEAD ---------- +* Compare the disk size of all reachable objects, versus those + reachable from reflogs, versus the total packed size. This can tell + you whether running `git repack -ad` might reduce the repository size + (by dropping unreachable objects), and whether expiring reflogs might + help. ++ +---------- +# reachable objects +git rev-list --disk-usage --objects --all +# plus reflogs +git rev-list --disk-usage --objects --all --reflog +# total disk size used +du -c .git/objects/pack/*.pack .git/objects/??/* +# alternative to du: add up "size" and "size-pack" fields +git count-objects -v +---------- + +* Report the disk size of each branch, not including objects used by the + current branch. This can find outliers that are contributing to a + bloated repository size (e.g., because somebody accidentally committed + large build artifacts). ++ +---------- +git for-each-ref --format='%(refname)' | +while read branch +do + size=$(git rev-list --disk-usage --objects HEAD..$branch) + echo "$size $branch" +done | +sort -n +---------- + +* Compare the on-disk size of branches in one group of refs, excluding + another. If you co-mingle objects from multiple remotes in a single + repository, this can show which remotes are contributing to the + repository size (taking the size of `origin` as a baseline). ++ +---------- +git rev-list --disk-usage --objects --remotes=$suspect --not --remotes=origin +---------- + GIT --- Part of the linkgit:git[1] suite -- 2.30.1.989.g5e01c2f281 ^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: [PATCH 0/2] rev-list --disk-usage example docs 2021-02-17 23:31 ` [PATCH 0/2] rev-list --disk-usage example docs Jeff King 2021-02-17 23:34 ` [PATCH 1/2] docs/rev-list: add an examples section Jeff King 2021-02-17 23:35 ` [PATCH 2/2] docs/rev-list: add some examples of --disk-usage Jeff King @ 2021-02-17 23:44 ` Taylor Blau 2 siblings, 0 replies; 30+ messages in thread From: Taylor Blau @ 2021-02-17 23:44 UTC (permalink / raw) To: Jeff King Cc: Ævar Arnfjörð Bjarmason, Taylor Blau, Junio C Hamano, git, Kyle Meyer, Eric Sunshine On Wed, Feb 17, 2021 at 06:31:07PM -0500, Jeff King wrote: > It's already big and scary enough that I prefer starting an EXAMPLES > section. :) The patches you sent below are great. I think that it's easy to nitpick and say "oh, you should have added this or that example, too", but I think you gave a great set of starting examples. I'd be happy to see this merged so that others can add more examples on top. > By the way, there's one other finishing touch we might consider: > enabling --use-bitmap-index automatically when bitmaps are present, for > requests that produce the identical answer (so _not_ a regular > traversal, because the output order and presence of pathnames are > different there). I'd prefer to do that as a separate series, though, > since there are multiple arguments that might benefit (like --count). This would be really neat. I look forward to it. Thanks, Taylor ^ permalink raw reply [flat|nested] 30+ messages in thread
end of thread, other threads:[~2021-02-17 23:45 UTC | newest] Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2021-01-27 22:11 [PATCH 0/2] rev-list --disk-usage Jeff King 2021-01-27 22:12 ` [PATCH 1/2] t: add --no-tag option to test_commit Jeff King 2021-01-27 22:48 ` Taylor Blau 2021-01-27 22:17 ` [PATCH 2/2] rev-list: add --disk-usage option for calculating disk usage Jeff King 2021-01-27 22:57 ` Taylor Blau 2021-01-27 23:34 ` Jeff King 2021-01-27 23:01 ` Kyle Meyer 2021-01-27 23:36 ` Jeff King 2021-01-27 23:07 ` Eric Sunshine 2021-01-27 23:39 ` Jeff King 2021-01-27 22:46 ` [PATCH 0/2] rev-list --disk-usage Taylor Blau 2021-02-09 10:52 ` [PATCH v2] " Jeff King 2021-02-09 10:52 ` [PATCH v2 1/2] t: add --no-tag option to test_commit Jeff King 2021-02-09 10:53 ` [PATCH v2 2/2] rev-list: add --disk-usage option for calculating disk usage Jeff King 2021-02-09 11:09 ` [PATCH v2] rev-list --disk-usage Jeff King 2021-02-09 21:14 ` Junio C Hamano 2021-02-10 9:38 ` Jeff King 2021-02-10 0:44 ` Junio C Hamano 2021-02-10 1:49 ` Taylor Blau 2021-02-10 10:01 ` Jeff King 2021-02-10 16:31 ` Junio C Hamano 2021-02-10 20:38 ` Jeff King 2021-02-10 23:15 ` Taylor Blau 2021-02-11 11:00 ` Jeff King 2021-02-11 12:04 ` Ævar Arnfjörð Bjarmason 2021-02-11 17:57 ` Junio C Hamano 2021-02-17 23:31 ` [PATCH 0/2] rev-list --disk-usage example docs Jeff King 2021-02-17 23:34 ` [PATCH 1/2] docs/rev-list: add an examples section Jeff King 2021-02-17 23:35 ` [PATCH 2/2] docs/rev-list: add some examples of --disk-usage Jeff King 2021-02-17 23:44 ` [PATCH 0/2] rev-list --disk-usage example docs Taylor Blau
Code repositories for project(s) associated with this public inbox https://80x24.org/mirrors/git.git This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).