git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Johan Herland <johan@herland.net>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org,
	Linus Torvalds <torvalds@linux-foundation.org>,
	johan@herland.net
Subject: [PATCH 2/3] --dirstat-by-file: Make it faster and more correct
Date: Fri, 8 Apr 2011 16:50:49 +0200	[thread overview]
Message-ID: <201104081650.49254.johan@herland.net> (raw)
In-Reply-To: <201104081646.35750.johan@herland.net>

Currently, when using --dirstat-by-file, it first does the full --dirstat
analysis (using diffcore_count_changes()), and then resets 'damage' to 1,
if any damage was found by diffcore_count_changes().

But --dirstat-by-file is not interested in the file damage per se. It only
cares if the file changed at all. In that sense it only cares if the blob
SHA1 for a file has changed. Fortunately, determining which files have
changed is already done when we build the diff_queue, and by the time we
get to show_dirstat(), we know that each entry in the queue correspond to
a changed file. Therefore we can skip the entire --dirstat analysis and
simply set 'damage' to 1 for each entry in the diff queue.

This makes --dirstat-by-file faster, and also bypasses --dirstat's issues
with detecting changes that merely rearrange lines.

The patch also contains an added testcase verifying that --dirstat-by-file
now detects changes that only rearrange lines.

Signed-off-by: Johan Herland <johan@herland.net>
---

I hope someone with more intimate diffcore knowledge can verify that
the diff queue indeed never contains entries that should be considered
"unchanged" by --dirstat-by-file.


...Johan

 diff.c                                             |   21 +++++++++++++++----
 t/t4013-diff-various.sh                            |    2 +
 .../diff.diff_--dirstat-by-file_initial_rearrange  |    3 ++
 3 files changed, 21 insertions(+), 5 deletions(-)
 create mode 100644 t/t4013/diff.diff_--dirstat-by-file_initial_rearrange

diff --git a/diff.c b/diff.c
index 9fa8410..28d9293 100644
--- a/diff.c
+++ b/diff.c
@@ -1541,6 +1541,20 @@ static void show_dirstat(struct diff_options *options)
 
 		name = p->one->path ? p->one->path : p->two->path;
 
+		if (DIFF_OPT_TST(options, DIRSTAT_BY_FILE)) {
+			/*
+			 * In --dirstat-by-file mode, we're only interested in
+			 * whether the file changed _at_all_.
+			 * We don't need to look at the actual file contents.
+			 * Assuming that the diff queue does not contain
+			 * unchanged entries, we can unconditionally add this
+			 * file to the list of results (with each file
+			 * contributing equal damage).
+			 */
+			damage = 1;
+			goto found_damage;
+		}
+
 		if (DIFF_FILE_VALID(p->one) && DIFF_FILE_VALID(p->two)) {
 			diff_populate_filespec(p->one, 0);
 			diff_populate_filespec(p->two, 0);
@@ -1563,14 +1577,11 @@ static void show_dirstat(struct diff_options *options)
 		/*
 		 * Original minus copied is the removed material,
 		 * added is the new material.  They are both damages
-		 * made to the preimage. In --dirstat-by-file mode, count
-		 * damaged files, not damaged lines. This is done by
-		 * counting only a single damaged line per file.
+		 * made to the preimage.
 		 */
 		damage = (p->one->size - copied) + added;
-		if (DIFF_OPT_TST(options, DIRSTAT_BY_FILE) && damage > 0)
-			damage = 1;
 
+found_damage:
 		ALLOC_GROW(dir.files, dir.nr + 1, dir.alloc);
 		dir.files[dir.nr].name = name;
 		dir.files[dir.nr].changed = damage;
diff --git a/t/t4013-diff-various.sh b/t/t4013-diff-various.sh
index 8cc94ef..e8240f2 100755
--- a/t/t4013-diff-various.sh
+++ b/t/t4013-diff-various.sh
@@ -302,6 +302,8 @@ diff master master^ side
 diff --dirstat master~1 master~2
 # --dirstat does NOT pick up changes that simply rearrange existing lines
 diff --dirstat initial rearrange
+# ...but --dirstat-by-file DOES pick up rearranged lines
+diff --dirstat-by-file initial rearrange
 EOF
 
 test_expect_success 'log -S requires an argument' '
diff --git a/t/t4013/diff.diff_--dirstat-by-file_initial_rearrange b/t/t4013/diff.diff_--dirstat-by-file_initial_rearrange
new file mode 100644
index 0000000..e48e33f
--- /dev/null
+++ b/t/t4013/diff.diff_--dirstat-by-file_initial_rearrange
@@ -0,0 +1,3 @@
+$ git diff --dirstat-by-file initial rearrange
+ 100.0% dir/
+$
-- 
1.7.5.rc1

  parent reply	other threads:[~2011-04-08 14:52 UTC|newest]

Thread overview: 91+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-04-07 13:49 BUG? in --dirstat when rearranging lines in a file Johan Herland
2011-04-07 14:56 ` Linus Torvalds
2011-04-07 22:43   ` Junio C Hamano
2011-04-07 22:59     ` Linus Torvalds
2011-04-08 14:46   ` Johan Herland
2011-04-08 14:48     ` [PATCH 1/3] --dirstat: Document shortcomings compared to --stat or regular diff Johan Herland
2011-04-08 19:50       ` Junio C Hamano
2011-04-08 14:50     ` Johan Herland [this message]
2011-04-08 14:55     ` [RFC/PATCH 3/3] Teach --dirstat to not completely ignore rearranged lines Johan Herland
2011-04-08 15:04     ` BUG? in --dirstat when rearranging lines in a file Linus Torvalds
2011-04-08 19:56       ` Junio C Hamano
2011-04-10 22:48         ` [PATCHv2 0/3] --dirstat fixes Johan Herland
2011-04-10 22:48           ` [PATCHv2 1/3] --dirstat: Describe non-obvious differences relative to --stat or regular diff Johan Herland
2011-04-10 22:48           ` [PATCHv2 2/3] --dirstat-by-file: Make it faster and more correct Johan Herland
2011-04-11 18:14             ` Junio C Hamano
2011-04-10 22:48           ` [PATCHv2 3/3] Teach --dirstat to not completely ignore rearranged lines within a file Johan Herland
2011-04-11 21:38             ` Junio C Hamano
2011-04-11 21:56               ` Johan Herland
2011-04-11 22:08                 ` Junio C Hamano
2011-04-12  9:22                   ` Johan Herland
2011-04-12  9:24                     ` [PATCH 4/3] --dirstat: In case of renames, use target filename instead of source filename Johan Herland
2011-04-12 14:59                       ` Linus Torvalds
2011-04-12  9:26                     ` [RFC/PATCH 5/3] Alternative --dirstat implementation, based on diffstat analysis Johan Herland
2011-04-12 14:46                       ` Linus Torvalds
2011-04-12 15:08                         ` Linus Torvalds
2011-04-12 22:03                           ` Johan Herland
2011-04-12 22:12                             ` Linus Torvalds
2011-04-12 22:22                             ` Junio C Hamano
2011-04-26  0:01                         ` [PATCH 0/6] --dirstat fixes, part 2 Johan Herland
2011-04-26  0:01                           ` [PATCH 1/6] Add several testcases for --dirstat and friends Johan Herland
2011-04-26  0:01                           ` [PATCH 2/6] Make --dirstat=0 output directories that contribute < 0.1% of changes Johan Herland
2011-04-26  0:01                           ` [PATCH 3/6] Refactor --dirstat parsing; deprecate --cumulative and --dirstat-by-file Johan Herland
2011-04-26 16:36                             ` Junio C Hamano
2011-04-27  2:02                               ` Johan Herland
2011-04-27  4:53                                 ` Junio C Hamano
2011-04-27 20:51                                 ` Junio C Hamano
2011-04-27 21:01                                   ` Junio C Hamano
2011-04-26  0:01                           ` [PATCH 4/6] Add config variable for specifying default --dirstat behavior Johan Herland
2011-04-26 16:43                             ` Junio C Hamano
2011-04-27  2:02                               ` Johan Herland
2011-04-26  0:01                           ` [PATCH 5/6] Use floating point for --dirstat percentages Johan Herland
2011-04-26 16:52                             ` Junio C Hamano
2011-04-27  2:02                               ` Johan Herland
2011-04-27  4:42                                 ` Junio C Hamano
2011-04-27  4:53                                   ` Linus Torvalds
2011-04-27  5:20                                     ` Junio C Hamano
2011-04-26  0:01                           ` [PATCH 6/6] New --dirstat=lines mode, doing dirstat analysis based on diffstat Johan Herland
2011-04-26 16:59                             ` Junio C Hamano
2011-04-27  2:02                               ` Johan Herland
2011-04-26  0:15                           ` [PATCH 0/6] --dirstat fixes, part 2 Linus Torvalds
2011-04-27  2:12                           ` [PATCHv2 " Johan Herland
2011-04-27  2:12                             ` [PATCHv2 1/6] Add several testcases for --dirstat and friends Johan Herland
2011-04-27  2:12                             ` [PATCHv2 2/6] Make --dirstat=0 output directories that contribute < 0.1% of changes Johan Herland
2011-04-27  2:12                             ` [PATCHv2 3/6] Refactor --dirstat parsing; deprecate --cumulative and --dirstat-by-file Johan Herland
2011-04-27  2:12                             ` [PATCHv2 4/6] Add config variable for specifying default --dirstat behavior Johan Herland
2011-04-27  2:12                             ` [PATCHv2 5/6] Use floating point for --dirstat percentages Johan Herland
2011-04-27  2:45                               ` Linus Torvalds
2011-04-27  2:12                             ` [PATCHv2 6/6] New --dirstat=lines mode, doing dirstat analysis based on diffstat Johan Herland
2011-04-27  8:24                             ` [PATCHv3 0/6] --dirstat fixes, part 2 Johan Herland
2011-04-27  8:24                               ` [PATCHv3 1/6] Add several testcases for --dirstat and friends Johan Herland
2011-04-27  8:24                               ` [PATCHv3 2/6] Make --dirstat=0 output directories that contribute < 0.1% of changes Johan Herland
2011-04-27  8:24                               ` [PATCHv3 3/6] Refactor --dirstat parsing; deprecate --cumulative and --dirstat-by-file Johan Herland
2011-04-27  8:24                               ` [PATCHv3 4/6] Add config variable for specifying default --dirstat behavior Johan Herland
2011-04-27  8:24                               ` [PATCHv3 5/6] Allow specifying --dirstat cut-off percentage as a floating point number Johan Herland
2011-04-27  8:37                                 ` Linus Torvalds
2011-04-27 10:29                                   ` [PATCHv4 " Johan Herland
2011-04-27  8:24                               ` [PATCHv3 6/6] New --dirstat=lines mode, doing dirstat analysis based on diffstat Johan Herland
2011-04-28  1:17                               ` [PATCHv5 0/7] --dirstat fixes, part 2 Johan Herland
2011-04-28  1:17                                 ` [PATCHv5 1/7] Add several testcases for --dirstat and friends Johan Herland
2011-04-28  1:17                                 ` [PATCHv5 2/7] Make --dirstat=0 output directories that contribute < 0.1% of changes Johan Herland
2011-04-28  1:17                                 ` [PATCHv5 3/7] Refactor --dirstat parsing; deprecate --cumulative and --dirstat-by-file Johan Herland
2011-04-28  1:17                                 ` [PATCHv5 4/7] Add config variable for specifying default --dirstat behavior Johan Herland
2011-04-28  1:17                                 ` [PATCHv5 5/7] Allow specifying --dirstat cut-off percentage as a floating point number Johan Herland
2011-04-28  1:17                                 ` [PATCHv5 6/7] New --dirstat=lines mode, doing dirstat analysis based on diffstat Johan Herland
2011-04-28  1:17                                 ` [PATCHv5 7/7] Improve error handling when parsing dirstat parameters Johan Herland
2011-04-28 18:41                                   ` Junio C Hamano
2011-04-28 19:20                                     ` Junio C Hamano
2011-04-28 23:16                                       ` Johan Herland
2011-04-28 23:13                                     ` Johan Herland
2011-04-29  4:06                                       ` Junio C Hamano
2011-04-29  9:36                                         ` [PATCHv6 0/8] --dirstat fixes, part 2 Johan Herland
2011-04-29  9:36                                           ` [PATCHv6 1/8] Add several testcases for --dirstat and friends Johan Herland
2011-04-29  9:36                                           ` [PATCHv6 2/8] Make --dirstat=0 output directories that contribute < 0.1% of changes Johan Herland
2011-04-29  9:36                                           ` [PATCHv6 3/8] Refactor --dirstat parsing; deprecate --cumulative and --dirstat-by-file Johan Herland
2011-04-29  9:36                                           ` [PATCHv6 4/8] Add config variable for specifying default --dirstat behavior Johan Herland
2011-04-29  9:36                                           ` [PATCHv6 5/8] Allow specifying --dirstat cut-off percentage as a floating point number Johan Herland
2011-04-29  9:36                                           ` [PATCHv6 6/8] New --dirstat=lines mode, doing dirstat analysis based on diffstat Johan Herland
2011-04-29  9:36                                           ` [PATCHv6 7/8] Improve error handling when parsing dirstat parameters Johan Herland
2011-04-29  9:36                                           ` [PATCHv6 8/8] Mark dirstat error messages for translation Johan Herland
2011-04-12 18:34                       ` [RFC/PATCH 5/3] Alternative --dirstat implementation, based on diffstat analysis Junio C Hamano
2011-04-10 23:17           ` [PATCHv2 0/3] --dirstat fixes Linus Torvalds

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=201104081650.49254.johan@herland.net \
    --to=johan@herland.net \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).