From: Johan Herland <johan@herland.net>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org,
	Linus Torvalds <torvalds@linux-foundation.org>,
	johan@herland.net
Subject: [RFC/PATCH 5/3] Alternative --dirstat implementation, based on diffstat analysis
Date: Tue, 12 Apr 2011 11:26:49 +0200
Message-ID: <201104121126.49881.johan@herland.net>
In-Reply-To: <201104121122.56870.johan@herland.net>
This patch adds an alternative implementation of show_dirstat(), called
show_dirstat_based_on_diffstat(), which uses the more expensive diffstat
analysis (as opposed to --dirstat's own (inexpensive) analysis) to derive
the numbers from which the --dirstat output is computed.
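
To illustrate the idea (a simplified sketch, not code from this patch): per-file damage is taken as added plus deleted lines from the diffstat data, and a directory's share is its damage relative to the total. The struct file_stat type and the helper names below are hypothetical stand-ins for git's internal structures, and matching directories by plain path prefix is an assumption made to keep the example short.

```c
#include <string.h>

/* Hypothetical stand-in for one entry of the diffstat data. */
struct file_stat {
	const char *name;      /* path relative to the repository root */
	unsigned long added;   /* lines added, as counted by diffstat */
	unsigned long deleted; /* lines deleted, as counted by diffstat */
};

/* Per-file damage: added plus deleted lines. */
static unsigned long file_damage(const struct file_stat *f)
{
	return f->added + f->deleted;
}

/* Total damage over all files. */
static unsigned long total_damage(const struct file_stat *files, int nr)
{
	unsigned long total = 0;
	int i;

	for (i = 0; i < nr; i++)
		total += file_damage(&files[i]);
	return total;
}

/* Damage of all files whose path starts with the prefix dir, e.g. "fs/". */
static unsigned long dir_damage(const struct file_stat *files, int nr,
				const char *dir)
{
	size_t len = strlen(dir);
	unsigned long sum = 0;
	int i;

	for (i = 0; i < nr; i++)
		if (!strncmp(files[i].name, dir, len))
			sum += file_damage(&files[i]);
	return sum;
}

/* Integer percentage of the total damage that falls under dir. */
static unsigned long dir_percent(const struct file_stat *files, int nr,
				 const char *dir)
{
	unsigned long total = total_damage(files, nr);

	if (!total)
		return 0; /* e.g. nothing but pure renames */
	return dir_damage(files, nr, dir) * 100 / total;
}
```

Calling dir_percent() once per directory prefix gives the kind of per-directory share that --dirstat reports; the real code instead sorts the file list and walks it recursively in gather_dirstat().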
The alternative implementation is controlled by a new config variable called
diff.dirstatBasedOnDiffstat.
In linux-2.6.git, running

  time git diff v2.6.20..v2.6.30 --dirstat=0 > /dev/null

with and without diff.dirstatBasedOnDiffstat enabled yields the following
average runtimes on my machine:

  - disabled: ~6.0 s
  - enabled:  ~9.7 s

So, as expected, there's a considerable performance hit (>60%) from going
through the full diffstat analysis. As such, the new option is probably
only useful if you really need the --dirstat numbers to be consistent with
the numbers returned from the other --*stat options.
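
The ">60%" figure follows directly from the two runtimes above. As a small worked example (the helper name is made up for illustration):

```c
/* Relative slowdown of the slow run over the fast run, in percent. */
static double overhead_percent(double fast_seconds, double slow_seconds)
{
	return (slow_seconds - fast_seconds) / fast_seconds * 100.0;
}
```

With the measured averages, overhead_percent(6.0, 9.7) works out to (9.7 - 6.0) / 6.0, i.e. a bit under 62%.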
In --dirstat-by-file mode, the diffstat analysis is obviously a waste of time,
so --dirstat-by-file automatically disables diff.dirstatBasedOnDiffstat.
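
The intended selection between the two analyses can be sketched as a pair of pure functions. This is only an illustration of the rules described above: the FMT_* flag values and the function names are hypothetical and do not match git's real DIFF_FORMAT_* constants or any function in diff.c.

```c
/* Hypothetical output-format flags, for illustration only. */
enum {
	FMT_DIFFSTAT  = 1 << 0,
	FMT_SHORTSTAT = 1 << 1,
	FMT_NUMSTAT   = 1 << 2,
	FMT_DIRSTAT   = 1 << 3
};

/* --dirstat-by-file always overrides the config variable. */
static int effective_setting(int by_file, int config_enabled)
{
	return by_file ? 0 : config_enabled;
}

/* Returns 1 if the (expensive) full diffstat analysis has to run. */
static int needs_diffstat(int output_format, int by_file, int config_enabled)
{
	if (output_format & (FMT_DIFFSTAT | FMT_SHORTSTAT | FMT_NUMSTAT))
		return 1;
	return (output_format & FMT_DIRSTAT) &&
		effective_setting(by_file, config_enabled);
}

/* Returns 1 if --dirstat output comes from the cheap built-in analysis. */
static int uses_cheap_dirstat(int output_format, int by_file, int config_enabled)
{
	return (output_format & FMT_DIRSTAT) &&
		!effective_setting(by_file, config_enabled);
}
```

For example, with only FMT_DIRSTAT requested and the config variable enabled, needs_diffstat() returns 1 unless by_file is set, in which case the cheap analysis is used instead.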
Signed-off-by: Johan Herland <johan@herland.net>
---
This might not be worth applying at all, but if it is, I can send a re-roll
with documentation and more user-friendliness.
Have fun! :)
...Johan
diff.c | 53 +++++++++++++++++++++++++++++++++++++++++++++++++++--
1 files changed, 51 insertions(+), 2 deletions(-)
diff --git a/diff.c b/diff.c
index 5376d01..a496ba6 100644
--- a/diff.c
+++ b/diff.c
@@ -31,6 +31,7 @@ static const char *external_diff_cmd_cfg;
 int diff_auto_refresh_index = 1;
 static int diff_mnemonic_prefix;
 static int diff_no_prefix;
+static int dirstat_based_on_diffstat;
 static struct diff_options default_diff_options;

 static char diff_colors[][COLOR_MAXLEN] = {
@@ -103,6 +104,10 @@ int git_diff_ui_config(const char *var, const char *value, void *cb)
 		diff_no_prefix = git_config_bool(var, value);
 		return 0;
 	}
+	if (!strcmp(var, "diff.dirstatbasedondiffstat")) {
+		dirstat_based_on_diffstat = git_config_bool(var, value);
+		return 0;
+	}
 	if (!strcmp(var, "diff.external"))
 		return git_config_string(&external_diff_cmd_cfg, var, value);
 	if (!strcmp(var, "diff.wordregex"))
@@ -1619,6 +1624,43 @@ found_damage:
 	gather_dirstat(options, &dir, changed, "", 0);
 }

+static void show_dirstat_based_on_diffstat(struct diffstat_t *data, struct diff_options *options)
+{
+	int i;
+	unsigned long changed;
+	struct dirstat_dir dir;
+
+	if (data->nr == 0)
+		return;
+
+	dir.files = NULL;
+	dir.alloc = 0;
+	dir.nr = 0;
+	dir.percent = options->dirstat_percent;
+	dir.cumulative = DIFF_OPT_TST(options, DIRSTAT_CUMULATIVE);
+
+	changed = 0;
+	for (i = 0; i < data->nr; i++) {
+		struct diffstat_file *file = data->files[i];
+		unsigned long damage;
+
+		damage = file->added + file->deleted;
+		ALLOC_GROW(dir.files, dir.nr + 1, dir.alloc);
+		dir.files[dir.nr].name = file->name;
+		dir.files[dir.nr].changed = damage;
+		changed += damage;
+		dir.nr++;
+	}
+
+	/* This can happen even with many files, if everything was renames */
+	if (!changed)
+		return;
+
+	/* Show all directories with more than x% of the changes */
+	qsort(dir.files, dir.nr, sizeof(dir.files[0]), dirstat_compare);
+	gather_dirstat(options, &dir, changed, "", 0);
+}
+
 static void free_diffstat_info(struct diffstat_t *diffstat)
 {
 	int i;
@@ -4012,7 +4054,12 @@ void diff_flush(struct diff_options *options)
 		separator++;
 	}

-	if (output_format & (DIFF_FORMAT_DIFFSTAT|DIFF_FORMAT_SHORTSTAT|DIFF_FORMAT_NUMSTAT)) {
+	/* --dirstat-by-file really doesn't need the full diffstat analysis */
+	if (DIFF_OPT_TST(options, DIRSTAT_BY_FILE))
+		dirstat_based_on_diffstat = 0;
+
+	if (output_format & (DIFF_FORMAT_DIFFSTAT|DIFF_FORMAT_SHORTSTAT|DIFF_FORMAT_NUMSTAT) ||
+	    ((output_format & DIFF_FORMAT_DIRSTAT) && dirstat_based_on_diffstat)) {
 		struct diffstat_t diffstat;

 		memset(&diffstat, 0, sizeof(struct diffstat_t));
@@ -4027,10 +4074,12 @@ void diff_flush(struct diff_options *options)
 			show_stats(&diffstat, options);
 		if (output_format & DIFF_FORMAT_SHORTSTAT)
 			show_shortstats(&diffstat, options);
+		if ((output_format & DIFF_FORMAT_DIRSTAT) && dirstat_based_on_diffstat)
+			show_dirstat_based_on_diffstat(&diffstat, options);
 		free_diffstat_info(&diffstat);
 		separator++;
 	}
-	if (output_format & DIFF_FORMAT_DIRSTAT)
+	if ((output_format & DIFF_FORMAT_DIRSTAT) && !dirstat_based_on_diffstat)
 		show_dirstat(options);

 	if (output_format & DIFF_FORMAT_SUMMARY && !is_summary_empty(q)) {
--
1.7.5.rc1.3.g4d7b