git@vger.kernel.org list mirror (unofficial, one of many)
From: Jeff King <peff@peff.net>
To: git@vger.kernel.org
Subject: [PATCH 3/4] cat-file: add --batch-disk-sizes option
Date: Sun, 7 Jul 2013 06:09:49 -0400
Message-ID: <20130707100949.GC19143@sigill.intra.peff.net>
In-Reply-To: <20130707100133.GA18717@sigill.intra.peff.net>

This option is just like --batch-check, but shows the
on-disk size rather than the true object size. In other
words, it makes the "disk_size" query of sha1_object_info_extended
available via the command line.

This can be used for rough attribution of disk usage to
particular refs, though see the caveats in the
documentation.
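As a rough sketch of that attribution use, a consumer could total the size column of the output for the objects reachable from one ref. This assumes the `--batch-check` record shape `<sha1> SP <type> SP <size>`, with the size column holding the on-disk size under this option and unknown objects reported as `<name> missing`. The helper `sum_batch_sizes` is hypothetical, and the caveats in the documentation apply to any total it produces.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: total the size column of --batch-check style
 * output, one "<sha1> SP <type> SP <size>" record per line. Under
 * --batch-disk-sizes that column holds the on-disk size. Lines that
 * do not parse as three fields (e.g. "<name> missing") are counted in
 * *skipped; empty lines are ignored.
 */
static unsigned long long sum_batch_sizes(const char *buf, unsigned *skipped)
{
	unsigned long long total = 0;

	*skipped = 0;
	while (*buf) {
		const char *eol = strchr(buf, '\n');
		size_t len = eol ? (size_t)(eol - buf) : strlen(buf);
		char line[256], name[41], type[16];
		size_t n = len < sizeof(line) - 1 ? len : sizeof(line) - 1;
		unsigned long size;

		/* copy one line (truncated if very long) so sscanf stops at its end */
		memcpy(line, buf, n);
		line[n] = '\0';
		if (n) {
			if (sscanf(line, "%40s %15s %lu", name, type, &size) == 3)
				total += size;
			else
				(*skipped)++;
		}
		buf += len;	/* skip the line ... */
		if (eol)
			buf++;	/* ... and its newline, if any */
	}
	return total;
}
```

The input could come from something like `git rev-list --objects <ref> | cut -d' ' -f1 | git cat-file --batch-disk-sizes`; objects shared between refs would be counted once per ref.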

This patch does not include any tests, as the exact numbers
returned are volatile and subject to zlib and packing
decisions.

Signed-off-by: Jeff King <peff@peff.net>
---
I sort of tacked this onto the --batch-check format by replacing the
"real" object size with the on-disk size when this option is used. I'm
open to suggestions. Two other things I considered were:

  1. Having the option simply output an extra field with the on-disk
     size. But then you are paying for the true object size lookup, even
     if you don't necessarily care.

  2. Simply outputting the disk-size and object name. For my purposes, I
     do not care about the object type, and finding the type takes non-trivial
     resources (we have to walk delta chains to find the true type).
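Both alternatives turn on not paying for lookups the caller does not need. A minimal sketch of the request style that `struct object_info` follows in this patch (the caller sets a pointer such as `disk_sizep` only for the values it wants) might look like the following. `struct demo_object`, `struct demo_info_request` and `demo_object_info` are hypothetical stand-ins, not git's real definitions, and the counter only marks where the costs described above would be paid.

```c
#include <stddef.h>

/*
 * Hypothetical stand-ins, not git's real structures: each request
 * field is an optional out-pointer, and NULL means "not wanted".
 */
struct demo_object {
	const char *type;
	unsigned long size;		/* true (inflated) size */
	unsigned long disk_size;	/* bytes used in the object database */
};

struct demo_info_request {
	const char **typep;
	unsigned long *sizep;
	unsigned long *disk_sizep;
};

/* Counts the lookups described above as costly (type, true size). */
static unsigned costly_lookups;

static void demo_object_info(const struct demo_object *obj,
			     const struct demo_info_request *req)
{
	if (req->typep) {
		costly_lookups++;	/* e.g. walking a delta chain for the type */
		*req->typep = obj->type;
	}
	if (req->sizep) {
		costly_lookups++;	/* the "true object size lookup" */
		*req->sizep = obj->size;
	}
	if (req->disk_sizep)
		*req->disk_sizep = obj->disk_size;
}
```

In the patch itself only `oi.disk_sizep` is set, so the true size is not requested, although the type is still returned by `sha1_object_info_extended`.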

Perhaps we need

  git cat-file --batch-format="%(disk-size) %(object)"

or similar.
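Since `--batch-format` is only floated here, the following is a hypothetical sketch rather than anything this patch implements. It expands the atoms `%(object)` and `%(disk-size)` from the example above, plus `%(type)` and `%(size)`, which are assumed names. `format_needs` shows how a caller could scan the format first and request only the fields it mentions.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical per-object values; ideally only those the format uses are filled. */
struct atom_values {
	const char *object;	/* hex object name */
	const char *type;
	unsigned long size;
	unsigned long disk_size;
};

/* Return 1 if fmt contains the atom "%(name)". */
static int format_needs(const char *fmt, const char *name)
{
	char atom[64];

	snprintf(atom, sizeof(atom), "%%(%s)", name);
	return strstr(fmt, atom) != NULL;
}

/*
 * Expand %(object), %(type), %(size) and %(disk-size) in fmt into out,
 * copying other text literally. Returns 0 on success, or -1 on an
 * unknown or unterminated atom, or if out is too small.
 */
static int expand_format(char *out, size_t outsz, const char *fmt,
			 const struct atom_values *v)
{
	size_t used = 0;

	if (!outsz)
		return -1;
	while (*fmt) {
		char piece[64];
		const char *p = piece;
		size_t len;

		if (fmt[0] == '%' && fmt[1] == '(') {
			const char *end = strchr(fmt + 2, ')');
			size_t n;

			if (!end)
				return -1;
			n = (size_t)(end - (fmt + 2));
			if (n == 6 && !strncmp(fmt + 2, "object", n))
				p = v->object;
			else if (n == 4 && !strncmp(fmt + 2, "type", n))
				p = v->type;
			else if (n == 4 && !strncmp(fmt + 2, "size", n))
				snprintf(piece, sizeof(piece), "%lu", v->size);
			else if (n == 9 && !strncmp(fmt + 2, "disk-size", n))
				snprintf(piece, sizeof(piece), "%lu", v->disk_size);
			else
				return -1;
			fmt = end + 1;
		} else {
			piece[0] = *fmt++;	/* literal character */
			piece[1] = '\0';
		}
		len = strlen(p);
		if (used + len >= outsz)	/* keep room for the NUL */
			return -1;
		memcpy(out + used, p, len);
		used += len;
	}
	out[used] = '\0';
	return 0;
}
```

Scanning the format up front is what would let a caller skip the type lookup entirely for `"%(disk-size) %(object)"`.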

 Documentation/git-cat-file.txt | 16 ++++++++++++++++
 builtin/cat-file.c             |  9 +++++++++
 2 files changed, 25 insertions(+)

diff --git a/Documentation/git-cat-file.txt b/Documentation/git-cat-file.txt
index 30d585a..d4af1fc 100644
--- a/Documentation/git-cat-file.txt
+++ b/Documentation/git-cat-file.txt
@@ -65,6 +65,22 @@ OPTIONS
 	Print the SHA-1, type, and size of each object provided on stdin. May not
 	be combined with any other options or arguments.
 
+--batch-disk-sizes::
+	Like `--batch-check`, but print the on-disk size of each object
+	(including zlib and delta compression) rather than the object's
+	true size. May not be combined with any other options or
+	arguments.
++
+NOTE: The on-disk size reported is accurate, but care should be taken in
+drawing conclusions about which refs or objects are responsible for disk
+usage. The size of a packed non-delta object may be much larger than the
+size of objects which delta against it, but the choice of which object
+is the base and which is the delta is arbitrary and is subject to change
+during a repack. Note also that multiple copies of an object may be
+present in the object database; in this case, it is undefined which
+copy's size will be reported.
+
+
 OUTPUT
 ------
 If '-t' is specified, one of the <type>.
diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 045cee7..5112c64 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -15,6 +15,7 @@
 
 #define BATCH 1
 #define BATCH_CHECK 2
+#define BATCH_DISK_SIZES 3
 
 static int cat_one_file(int opt, const char *exp_type, const char *obj_name)
 {
@@ -135,6 +136,11 @@ static int batch_one_object(const char *obj_name, int print_contents)
 
 	if (print_contents == BATCH)
 		contents = read_sha1_file(sha1, &type, &size);
+	else if (print_contents == BATCH_DISK_SIZES) {
+		struct object_info oi = {0};
+		oi.disk_sizep = &size;
+		type = sha1_object_info_extended(sha1, &oi);
+	}
 	else
 		type = sha1_object_info(sha1, &size);
 
@@ -206,6 +212,9 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
 		OPT_SET_INT(0, "batch-check", &batch,
 			    N_("show info about objects fed from the standard input"),
 			    BATCH_CHECK),
+		OPT_SET_INT(0, "batch-disk-sizes", &batch,
+			    N_("show on-disk size of objects fed from standard input"),
+			    BATCH_DISK_SIZES),
 		OPT_END()
 	};
 
-- 
1.8.3.rc3.24.gec82cb9


Thread overview: 52+ messages
2013-07-07 10:01 [RFC/PATCH 0/4] cat-file --batch-disk-sizes Jeff King
2013-07-07 10:03 ` [PATCH 1/4] zero-initialize object_info structs Jeff King
2013-07-07 17:34   ` Junio C Hamano
2013-07-07 10:04 ` [PATCH 2/4] teach sha1_object_info_extended a "disk_size" query Jeff King
2013-07-07 10:09 ` Jeff King [this message]
2013-07-07 17:49   ` [PATCH 3/4] cat-file: add --batch-disk-sizes option Junio C Hamano
2013-07-07 18:19     ` Jeff King
2013-07-08 11:04     ` Duy Nguyen
2013-07-08 12:00       ` Ramkumar Ramachandra
2013-07-08 13:13         ` Duy Nguyen
2013-07-08 13:37           ` Ramkumar Ramachandra
2013-07-09  2:55             ` Duy Nguyen
2013-07-09 10:32               ` Ramkumar Ramachandra
2013-07-10 11:16             ` Jeff King
2013-07-08 16:40           ` Junio C Hamano
2013-07-10 11:04     ` Jeff King
2013-07-11 16:35       ` Junio C Hamano
2013-07-07 21:15   ` brian m. carlson
2013-07-10 10:57     ` Jeff King
2013-07-07 10:14 ` [PATCH 4/4] pack-revindex: radix-sort the revindex Jeff King
2013-07-07 23:52   ` Shawn Pearce
2013-07-08  7:57     ` Jeff King
2013-07-08 15:38       ` Shawn Pearce
2013-07-08 20:50   ` Brandon Casey
2013-07-08 21:35     ` Brandon Casey
2013-07-10 10:57       ` Jeff King
2013-07-10 10:52     ` Jeff King
2013-07-10 11:34 ` [PATCHv2 00/10] cat-file formats/on-disk sizes Jeff King
2013-07-10 11:35   ` [PATCH 01/10] zero-initialize object_info structs Jeff King
2013-07-10 11:35   ` [PATCH 02/10] teach sha1_object_info_extended a "disk_size" query Jeff King
2013-07-10 11:36   ` [PATCH 03/10] t1006: modernize output comparisons Jeff King
2013-07-10 11:38   ` [PATCH 04/10] cat-file: teach --batch to stream blob objects Jeff King
2013-07-10 11:38   ` [PATCH 05/10] cat-file: refactor --batch option parsing Jeff King
2013-07-10 11:45   ` [PATCH 06/10] cat-file: add --batch-check=<format> Jeff King
2013-07-10 11:57     ` Eric Sunshine
2013-07-10 14:51     ` Ramkumar Ramachandra
2013-07-11 11:24       ` Jeff King
2013-07-10 11:46   ` [PATCH 07/10] cat-file: add %(objectsize:disk) format atom Jeff King
2013-07-10 11:48   ` [PATCH 08/10] cat-file: split --batch input lines on whitespace Jeff King
2013-07-10 15:29     ` Ramkumar Ramachandra
2013-07-11 11:36       ` Jeff King
2013-07-11 17:42         ` Junio C Hamano
2013-07-11 20:45         ` [PATCHv3 " Jeff King
2013-07-10 11:50   ` [PATCH 09/10] pack-revindex: use unsigned to store number of objects Jeff King
2013-07-10 11:55   ` [PATCH 10/10] pack-revindex: radix-sort the revindex Jeff King
2013-07-10 12:00     ` Jeff King
2013-07-10 13:17     ` Ramkumar Ramachandra
2013-07-11 11:03       ` Jeff King
2013-07-10 17:10     ` Brandon Casey
2013-07-11 11:17       ` Jeff King
2013-07-11 12:16     ` [PATCHv3 " Jeff King
2013-07-11 21:12       ` Brandon Casey
