git@vger.kernel.org mailing list mirror (one of many)
* BUG? in --dirstat when rearranging lines in a file
@ 2011-04-07 13:49 Johan Herland
  2011-04-07 14:56 ` Linus Torvalds
  0 siblings, 1 reply; 91+ messages in thread
From: Johan Herland @ 2011-04-07 13:49 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Linus Torvalds

Hi,

(CCed the two main authors of --dirstat and diffcore-delta.c)

Consider the following sequence of commands:

$ git init
$ mkdir dir
$ echo -e "foo\nbar" > dir/file
$ git add dir
$ git commit -m "first"
$ # Rearrange lines in dir/file
$ echo -e "bar\nfoo" > dir/file
$ git diff
diff --git a/dir/file b/dir/file
index 3bd1f0e..1289765 100644
--- a/dir/file
+++ b/dir/file
@@ -1,2 +1,2 @@
-foo
 bar
+foo
$ git diff --stat
 dir/file |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)
$ git diff --dirstat
$ # WTF!?

"git diff" and "git diff --stat" generate the expected output, but "git 
diff --dirstat" unexpectedly generates no output at all. I've traced 
this down through show_dirstat(), to diffcore_count_changes() which 
processes the pre-image and post-image to accumulate two counts:

- src_copied (#lines (or 64-byte chunks) copied from pre- to post-)

- literal_added (#lines/chunks added in post-).

When the diff consists only of rearranging lines (like the above 
example), the line-based hashing and subsequent sorting in 
diffcore-delta.c end up hiding the fact that lines have been moved 
around, and the resulting --dirstat reports fewer changes than expected.
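
To see why a pure reordering disappears, here is a minimal sketch of
order-insensitive counting. It is not git's diffcore_count_changes(): it
compares whole lines instead of hashed chunks and counts lines instead of
bytes, and count_changes(), struct counts, and damage() are hypothetical
names chosen for illustration. Only the damage formula, (original size -
copied) + added, is taken from show_dirstat().

```c
#include <stdlib.h>
#include <string.h>

/* Compare two lines through pointers, for qsort(). */
static int cmp_line(const void *a, const void *b)
{
	return strcmp(*(const char *const *)a, *(const char *const *)b);
}

struct counts {
	size_t src_copied;    /* pre-image lines also found in the post-image */
	size_t literal_added; /* post-image lines with no pre-image match */
};

/*
 * Hypothetical stand-in for the real counting code: sort both line
 * arrays in place, then walk them in parallel. Sorting throws away the
 * original order, so moved lines pair up as if they had never moved.
 */
static struct counts count_changes(const char **src, size_t nsrc,
				   const char **dst, size_t ndst)
{
	struct counts c = { 0, 0 };
	size_t i = 0, j = 0;

	qsort(src, nsrc, sizeof(*src), cmp_line);
	qsort(dst, ndst, sizeof(*dst), cmp_line);
	while (i < nsrc && j < ndst) {
		int cmp = strcmp(src[i], dst[j]);

		if (!cmp) {
			c.src_copied++;     /* same line on both sides */
			i++;
			j++;
		} else if (cmp < 0) {
			i++;                /* only in pre-image: removed */
		} else {
			c.literal_added++;  /* only in post-image: added */
			j++;
		}
	}
	c.literal_added += ndst - j;        /* leftover post-image lines */
	return c;
}

/* Removed material plus added material, counted in lines here. */
static size_t damage(size_t nsrc, struct counts c)
{
	return (nsrc - c.src_copied) + c.literal_added;
}
```

With pre-image lines {"foo", "bar"} and post-image lines {"bar", "foo"},
both arrays sort to the same sequence, so src_copied is 2, literal_added
is 0, and the damage is 0, which matches the empty --dirstat output above.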

Is this a bug or a feature? :)


(This issue was originally found by a colleague at $dayjob who wrote a 
script (using --dirstat) to produce a summary of the areas of the 
source tree touched by a given commit)


Have fun! :)

...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net


* Re: BUG? in --dirstat when rearranging lines in a file
  2011-04-07 13:49 BUG? in --dirstat when rearranging lines in a file Johan Herland
@ 2011-04-07 14:56 ` Linus Torvalds
  2011-04-07 22:43   ` Junio C Hamano
  2011-04-08 14:46   ` Johan Herland
  0 siblings, 2 replies; 91+ messages in thread
From: Linus Torvalds @ 2011-04-07 14:56 UTC (permalink / raw)
  To: Johan Herland; +Cc: git, Junio C Hamano

On Thu, Apr 7, 2011 at 6:49 AM, Johan Herland <johan@herland.net> wrote:
>
> Consider the following sequence of commands:
> [...]
> $ git diff --stat
>  dir/file |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> $ git diff --dirstat
> $ # WTF!?

So the "--dirstat" thing really is different - it has never done a
full patch, it really only does a line hash count and then estimates
the amount of deleted/new code from that.

> Is this a bug or a feature? :)

It's a "bueature" or a "featug". The dirstat code counts "damage" as
you noticed, and it does that because it's easy and often relevant. It
can be computed without actually generating the whole diff, the same
way we do rename-detection without actually generating the diff, by
just looking at "hash and count each line" information. In fact, it
uses the same "diffcore_count_changes()" function for it.

So the reason I wouldn't call it a bug is that it very much is on
purpose. Generating a real diff is much more expensive, and instead
using the line hashes gives us a quick and efficient O(n) way to
gather up differences. But because it does the difference by basically
just comparing hashes of the lines without taking _ordering_ into
account, it gives you a "how many lines do these files have in common"
rather than a real diff.

So git internally has *three* different "difference" engines:

 - the "delta" algorithm that we use for packing (and binary diffs)
 - the traditional line-based diff for normal diffs
 - the "rename/copy/dirstat damage detector" that doesn't take line
ordering into account, only some unordered "heap of data contents"
comparison.

You could think of the damage detection as a "rename detection within
a file". It's actually quite nice for "git diff -M --dirstat", where
it means that pure code movement - whether inside a file or by
renaming a file - doesn't count as damage.

(Of course, moving a piece of code _from_ one file to another still
counts as damage, so it's not really ignoring pure code movement).

NOTE! Speed isn't the only reason we do that "unordered heap of data
contents" comparison. For rename detection, we really don't want
moving a big function around in a file to be counted as "rewriting the
file". So there are actually other reasons to like these semantics.
That said, honestly, for dirstat, the big issue was that it made it
really really simple. Look at how small the dirstat patch was (commit
7df7c019c2a4), and realize that it's because it just re-used the
existing damage counting code.

                            Linus


* Re: BUG? in --dirstat when rearranging lines in a file
  2011-04-07 14:56 ` Linus Torvalds
@ 2011-04-07 22:43   ` Junio C Hamano
  2011-04-07 22:59     ` Linus Torvalds
  2011-04-08 14:46   ` Johan Herland
  1 sibling, 1 reply; 91+ messages in thread
From: Junio C Hamano @ 2011-04-07 22:43 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Johan Herland, git, tutufan

Linus Torvalds <torvalds@linux-foundation.org> writes:

> That said, honestly, for dirstat, the big issue was that it made it
> really really simple. Look at how small the dirstat patch was (commit
> 7df7c019c2a4), and realize that it's because it just re-used the
> existing damage counting code.

Yes, most of the logic added by the patch is to percolate the damage
point up the tree to either coalesce or filter the result into manageable
size.

Speaking of that logic, I've been wondering for about a year and a half if
this "if (permille)" exclusion was intentional:

	/*
	 * We don't report dirstat's for
	 *  - the top level
	 *  - or cases where everything came from a single directory
	 *    under this directory (sources == 1).
	 */
	if (baselen && sources != 1) {
		int permille = this_dir * 1000 / changed;
		if (permille) {
			int percent = permille / 10;
			if (percent >= dir->percent) {
				fprintf(opt->file, "%s%4d.%01d%% %.*s\n", line_prefix,
					percent, permille % 10, baselen, base);
				if (!dir->cumulative)
					return 0;
			}
		}
	}

If the user sets dir->percent to zero, with an expectation that it will
disable all filtering, shouldn't we show everything?
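
The effect of the inner check can be seen with a small sketch.
dirstat_line_shown() is a hypothetical helper that keeps only the
arithmetic from the snippet above; it leaves out the
"baselen && sources != 1" test and assumes that changed is nonzero.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: returns true when the quoted code would print a
 * line for a directory that holds this_dir out of changed damage units.
 */
static bool dirstat_line_shown(unsigned long this_dir, unsigned long changed,
			       int threshold_percent)
{
	/* Integer division truncates, so anything under 0.1% becomes 0. */
	int permille = this_dir * 1000 / changed;

	if (!permille)
		return false; /* dropped before the threshold is consulted */
	return permille / 10 >= threshold_percent;
}
```

With a threshold of 0, a directory holding 1 of 1000 damage units yields
permille 1 and is shown, while 1 of 2000 truncates to permille 0 and is
skipped, even though the user asked for no filtering.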


* Re: BUG? in --dirstat when rearranging lines in a file
  2011-04-07 22:43   ` Junio C Hamano
@ 2011-04-07 22:59     ` Linus Torvalds
  0 siblings, 0 replies; 91+ messages in thread
From: Linus Torvalds @ 2011-04-07 22:59 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johan Herland, git, tutufan

On Thu, Apr 7, 2011 at 3:43 PM, Junio C Hamano <gitster@pobox.com> wrote:
>
> Speaking of that logic, I've been wondering for about a year and a half if
> this "if (permille)" exclusion was intentional:
>
> If the user sets dir->percent to zero, with an expectation that it will
> disable all filtering, shouldn't we show everything?

Hmm. My gut feel is that you still don't want to see directories with
no changes. In fact, doesn't the whole "avoid even diffing identical
directories with the same SHA1" logic end up meaning that even if you
were to disable filtering, you'd _still_ not show the 0% case?

But hey, I dunno. If you want the semantics to be "not identical, but
not damaged enough to be even 0.1%, so show it", I don't think that
would be _wrong_ per se. I just don't think our current "ignore 0.0%
files" is wrong either ;)

                           Linus


* Re: BUG? in --dirstat when rearranging lines in a file
  2011-04-07 14:56 ` Linus Torvalds
  2011-04-07 22:43   ` Junio C Hamano
@ 2011-04-08 14:46   ` Johan Herland
  2011-04-08 14:48     ` [PATCH 1/3] --dirstat: Document shortcomings compared to --stat or regular diff Johan Herland
                       ` (3 more replies)
  1 sibling, 4 replies; 91+ messages in thread
From: Johan Herland @ 2011-04-08 14:46 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Linus Torvalds, johan

On Thursday 07 April 2011, Linus Torvalds wrote:
> On Thu, Apr 7, 2011 at 6:49 AM, Johan Herland wrote:
> > Consider the following sequence of commands:
> > [...]
> > $ git diff --stat
> >  dir/file |    2 +-
> >  1 files changed, 1 insertions(+), 1 deletions(-)
> > $ git diff --dirstat
> > $ # WTF!?
>
> So the "--dirstat" thing really is different - it has never done a
> full patch, it really only does a line hash count and then estimates
> the amount of deleted/new code from that.
>
> [...]

Ok, so here are 3 patches to somewhat improve the situation without 
making --dirstat too ugly or expensive.

#1: Simply document the current behavior.

#2: Improve --dirstat-by-file. It doesn't really care about the per-file 
analysis done by --dirstat, but only whether or not a file has changed 
at all. Since the diff queue does not contain unchanged files (<- this 
is an assumption that I hope someone with more diffcore knowledge can 
verify), we can unconditionally assign damage == 1 to each entry in the 
diff queue, and bypass the entire --dirstat per-file analysis.

#3. This is a quick/ugly hack that depends on the same assumption as #2: 
If an entry is in the diff queue, we know that it is not unchanged. So 
if the per-file analysis yields damage == 0, we know that it must have 
overlooked something (rearranged lines), so we set damage = 1 instead. 
The logic is that underrepresenting a change in --dirstat is better 
than ignoring it...


...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net


* [PATCH 1/3] --dirstat: Document shortcomings compared to --stat or regular diff
  2011-04-08 14:46   ` Johan Herland
@ 2011-04-08 14:48     ` Johan Herland
  2011-04-08 19:50       ` Junio C Hamano
  2011-04-08 14:50     ` [PATCH 2/3] --dirstat-by-file: Make it faster and more correct Johan Herland
                       ` (2 subsequent siblings)
  3 siblings, 1 reply; 91+ messages in thread
From: Johan Herland @ 2011-04-08 14:48 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Linus Torvalds, johan

Also add a testcase documenting the current behavior.

Signed-off-by: Johan Herland <johan@herland.net>
---
 Documentation/diff-options.txt                     |    5 +++
 t/t4013-diff-various.sh                            |   27 +++++++++++++++----
 t/t4013/diff.diff_--dirstat_initial_rearrange      |    2 +
 ...tch_--stdout_--cover-letter_-n_initial..master^ |    2 +-
 t/t4013/diff.log_--decorate=full_--all             |    6 ++++
 t/t4013/diff.log_--decorate_--all                  |    6 ++++
 6 files changed, 41 insertions(+), 7 deletions(-)
 create mode 100644 t/t4013/diff.diff_--dirstat_initial_rearrange

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index c93124b..25e48c4 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -72,6 +72,11 @@ endif::git-format-patch[]
 	a cut-off percent (3% by default) are not shown. The cut-off percent
 	can be set with `--dirstat=<limit>`. Changes in a child directory are not
 	counted for the parent directory, unless `--cumulative` is used.
++
+Note that `--dirstat` does not use the regular diff machinery to calculate
+the changes (rather it is based on the rename detection machinery). Therefore,
+`--dirstat` may skip some changes that `--stat` does not skip. For example,
+rearranging the lines in a file will not be detected by `--dirstat`.
 
 --dirstat-by-file[=<limit>]::
 	Same as `--dirstat`, but counts changed files instead of lines.
diff --git a/t/t4013-diff-various.sh b/t/t4013-diff-various.sh
index 5daa0f2..8cc94ef 100755
--- a/t/t4013-diff-various.sh
+++ b/t/t4013-diff-various.sh
@@ -80,18 +80,31 @@ test_expect_success setup '
 
 	git config log.showroot false &&
 	git commit --amend &&
+
+	GIT_AUTHOR_DATE="2006-06-26 00:06:00 +0000" &&
+	GIT_COMMITTER_DATE="2006-06-26 00:06:00 +0000" &&
+	export GIT_AUTHOR_DATE GIT_COMMITTER_DATE &&
+	git checkout -b rearrange initial &&
+	for i in B A; do echo $i; done >dir/sub &&
+	git add dir/sub &&
+	git commit -m "Rearranged lines in dir/sub" &&
+	git checkout master &&
+
 	git show-branch
 '
 
 : <<\EOF
 ! [initial] Initial
  * [master] Merge branch 'side'
-  ! [side] Side
----
- -  [master] Merge branch 'side'
- *+ [side] Side
- *  [master^] Second
-+*+ [initial] Initial
+  ! [rearrange] Rearranged lines in dir/sub
+   ! [side] Side
+----
+  +  [rearrange] Rearranged lines in dir/sub
+ -   [master] Merge branch 'side'
+ * + [side] Side
+ *   [master^] Third
+ *   [master~2] Second
++*++ [initial] Initial
 EOF
 
 V=`git version | sed -e 's/^git version //' -e 's/\./\\./g'`
@@ -287,6 +300,8 @@ diff --no-index --name-status -- dir2 dir
 diff --no-index dir dir3
 diff master master^ side
 diff --dirstat master~1 master~2
+# --dirstat does NOT pick up changes that simply rearrange existing lines
+diff --dirstat initial rearrange
 EOF
 
 test_expect_success 'log -S requires an argument' '
diff --git a/t/t4013/diff.diff_--dirstat_initial_rearrange b/t/t4013/diff.diff_--dirstat_initial_rearrange
new file mode 100644
index 0000000..fb2e17d
--- /dev/null
+++ b/t/t4013/diff.diff_--dirstat_initial_rearrange
@@ -0,0 +1,2 @@
+$ git diff --dirstat initial rearrange
+$
diff --git a/t/t4013/diff.format-patch_--stdout_--cover-letter_-n_initial..master^ b/t/t4013/diff.format-patch_--stdout_--cover-letter_-n_initial..master^
index 1f0f9ad..3b4e113 100644
--- a/t/t4013/diff.format-patch_--stdout_--cover-letter_-n_initial..master^
+++ b/t/t4013/diff.format-patch_--stdout_--cover-letter_-n_initial..master^
@@ -1,7 +1,7 @@
 $ git format-patch --stdout --cover-letter -n initial..master^
 From 9a6d4949b6b76956d9d5e26f2791ec2ceff5fdc0 Mon Sep 17 00:00:00 2001
 From: C O Mitter <committer@example.com>
-Date: Mon, 26 Jun 2006 00:05:00 +0000
+Date: Mon, 26 Jun 2006 00:06:00 +0000
 Subject: [DIFFERENT_PREFIX 0/2] *** SUBJECT HERE ***
 
 *** BLURB HERE ***
diff --git a/t/t4013/diff.log_--decorate=full_--all b/t/t4013/diff.log_--decorate=full_--all
index d155e0b..44d4525 100644
--- a/t/t4013/diff.log_--decorate=full_--all
+++ b/t/t4013/diff.log_--decorate=full_--all
@@ -1,4 +1,10 @@
 $ git log --decorate=full --all
+commit cd4e72fd96faed3f0ba949dc42967430374e2290 (refs/heads/rearrange)
+Author: A U Thor <author@example.com>
+Date:   Mon Jun 26 00:06:00 2006 +0000
+
+    Rearranged lines in dir/sub
+
 commit 59d314ad6f356dd08601a4cd5e530381da3e3c64 (HEAD, refs/heads/master)
 Merge: 9a6d494 c7a2ab9
 Author: A U Thor <author@example.com>
diff --git a/t/t4013/diff.log_--decorate_--all b/t/t4013/diff.log_--decorate_--all
index fd7c3e6..27d3eab 100644
--- a/t/t4013/diff.log_--decorate_--all
+++ b/t/t4013/diff.log_--decorate_--all
@@ -1,4 +1,10 @@
 $ git log --decorate --all
+commit cd4e72fd96faed3f0ba949dc42967430374e2290 (rearrange)
+Author: A U Thor <author@example.com>
+Date:   Mon Jun 26 00:06:00 2006 +0000
+
+    Rearranged lines in dir/sub
+
 commit 59d314ad6f356dd08601a4cd5e530381da3e3c64 (HEAD, master)
 Merge: 9a6d494 c7a2ab9
 Author: A U Thor <author@example.com>
-- 
1.7.5.rc1


* [PATCH 2/3] --dirstat-by-file: Make it faster and more correct
  2011-04-08 14:46   ` Johan Herland
  2011-04-08 14:48     ` [PATCH 1/3] --dirstat: Document shortcomings compared to --stat or regular diff Johan Herland
@ 2011-04-08 14:50     ` Johan Herland
  2011-04-08 14:55     ` [RFC/PATCH 3/3] Teach --dirstat to not completely ignore rearranged lines Johan Herland
  2011-04-08 15:04     ` BUG? in --dirstat when rearranging lines in a file Linus Torvalds
  3 siblings, 0 replies; 91+ messages in thread
From: Johan Herland @ 2011-04-08 14:50 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Linus Torvalds, johan

Currently, when using --dirstat-by-file, it first does the full --dirstat
analysis (using diffcore_count_changes()), and then resets 'damage' to 1,
if any damage was found by diffcore_count_changes().

But --dirstat-by-file is not interested in the file damage per se. It only
cares if the file changed at all. In that sense it only cares if the blob
SHA1 for a file has changed. Fortunately, determining which files have
changed is already done when we build the diff_queue, and by the time we
get to show_dirstat(), we know that each entry in the queue corresponds to
a changed file. Therefore we can skip the entire --dirstat analysis and
simply set 'damage' to 1 for each entry in the diff queue.

This makes --dirstat-by-file faster, and also bypasses --dirstat's issues
with detecting changes that merely rearrange lines.

The patch also contains an added testcase verifying that --dirstat-by-file
now detects changes that only rearrange lines.

Signed-off-by: Johan Herland <johan@herland.net>
---

I hope someone with more intimate diffcore knowledge can verify that
the diff queue indeed never contains entries that should be considered
"unchanged" by --dirstat-by-file.


...Johan

 diff.c                                             |   21 +++++++++++++++----
 t/t4013-diff-various.sh                            |    2 +
 .../diff.diff_--dirstat-by-file_initial_rearrange  |    3 ++
 3 files changed, 21 insertions(+), 5 deletions(-)
 create mode 100644 t/t4013/diff.diff_--dirstat-by-file_initial_rearrange

diff --git a/diff.c b/diff.c
index 9fa8410..28d9293 100644
--- a/diff.c
+++ b/diff.c
@@ -1541,6 +1541,20 @@ static void show_dirstat(struct diff_options *options)
 
 		name = p->one->path ? p->one->path : p->two->path;
 
+		if (DIFF_OPT_TST(options, DIRSTAT_BY_FILE)) {
+			/*
+			 * In --dirstat-by-file mode, we're only interested in
+			 * whether the file changed _at_all_.
+			 * We don't need to look at the actual file contents.
+			 * Assuming that the diff queue does not contain
+			 * unchanged entries, we can unconditionally add this
+			 * file to the list of results (with each file
+			 * contributing equal damage).
+			 */
+			damage = 1;
+			goto found_damage;
+		}
+
 		if (DIFF_FILE_VALID(p->one) && DIFF_FILE_VALID(p->two)) {
 			diff_populate_filespec(p->one, 0);
 			diff_populate_filespec(p->two, 0);
@@ -1563,14 +1577,11 @@ static void show_dirstat(struct diff_options *options)
 		/*
 		 * Original minus copied is the removed material,
 		 * added is the new material.  They are both damages
-		 * made to the preimage. In --dirstat-by-file mode, count
-		 * damaged files, not damaged lines. This is done by
-		 * counting only a single damaged line per file.
+		 * made to the preimage.
 		 */
 		damage = (p->one->size - copied) + added;
-		if (DIFF_OPT_TST(options, DIRSTAT_BY_FILE) && damage > 0)
-			damage = 1;
 
+found_damage:
 		ALLOC_GROW(dir.files, dir.nr + 1, dir.alloc);
 		dir.files[dir.nr].name = name;
 		dir.files[dir.nr].changed = damage;
diff --git a/t/t4013-diff-various.sh b/t/t4013-diff-various.sh
index 8cc94ef..e8240f2 100755
--- a/t/t4013-diff-various.sh
+++ b/t/t4013-diff-various.sh
@@ -302,6 +302,8 @@ diff master master^ side
 diff --dirstat master~1 master~2
 # --dirstat does NOT pick up changes that simply rearrange existing lines
 diff --dirstat initial rearrange
+# ...but --dirstat-by-file DOES pick up rearranged lines
+diff --dirstat-by-file initial rearrange
 EOF
 
 test_expect_success 'log -S requires an argument' '
diff --git a/t/t4013/diff.diff_--dirstat-by-file_initial_rearrange b/t/t4013/diff.diff_--dirstat-by-file_initial_rearrange
new file mode 100644
index 0000000..e48e33f
--- /dev/null
+++ b/t/t4013/diff.diff_--dirstat-by-file_initial_rearrange
@@ -0,0 +1,3 @@
+$ git diff --dirstat-by-file initial rearrange
+ 100.0% dir/
+$
-- 
1.7.5.rc1


* [RFC/PATCH 3/3] Teach --dirstat to not completely ignore rearranged lines
  2011-04-08 14:46   ` Johan Herland
  2011-04-08 14:48     ` [PATCH 1/3] --dirstat: Document shortcomings compared to --stat or regular diff Johan Herland
  2011-04-08 14:50     ` [PATCH 2/3] --dirstat-by-file: Make it faster and more correct Johan Herland
@ 2011-04-08 14:55     ` Johan Herland
  2011-04-08 15:04     ` BUG? in --dirstat when rearranging lines in a file Linus Torvalds
  3 siblings, 0 replies; 91+ messages in thread
From: Johan Herland @ 2011-04-08 14:55 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Linus Torvalds, johan

Currently, the --dirstat analysis fails to detect some kinds of changes.
For example, rearranging lines in a file causes the "damage" calculated
by show_dirstat() to be 0. However, when we process the diff queue in
show_dirstat(), we already know that there should be at least _some_
damage assigned to each entry, because truly _unchanged_ entries are
simply not present in the diff queue.

This patch teaches show_dirstat() to assign a minimum amount of damage
(== 1) to entries for which the analysis otherwise yields zero damage.
Obviously this is not a complete fix, but it's at least better to
underrepresent these changes, rather than simply pretending that they
don't exist.

Signed-off-by: Johan Herland <johan@herland.net>
---

This is a somewhat quick and ugly solution to make --dirstat at least
show _something_ for changes that consist solely of rearranging lines
in a file. Sure, those changes would be thoroughly underrepresented by
--dirstat (probably falling below the default 3% threshold in many
cases), but I figure it's better to underrepresent them rather than
ignoring them completely.

As with 2/3, this patch also relies on the assumption that the diff
queue never contains entries that should be considered "unchanged" by
--dirstat.

 Documentation/diff-options.txt                |    4 ++--
 diff.c                                        |    8 ++++++++
 t/t4013-diff-various.sh                       |    2 --
 t/t4013/diff.diff_--dirstat_initial_rearrange |    1 +
 4 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 25e48c4..61a8409 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -75,8 +75,8 @@ endif::git-format-patch[]
 +
 Note that `--dirstat` does not use the regular diff machinery to calculate
 the changes (rather it is based on the rename detection machinery). Therefore,
-`--dirstat` may skip some changes that `--stat` does not skip. For example,
-rearranging the lines in a file will not be detected by `--dirstat`.
+`--dirstat` will count some changes differently than `--stat`. For example,
+rearranged lines in a file will be underrepresented by `--dirstat`.
 
 --dirstat-by-file[=<limit>]::
 	Same as `--dirstat`, but counts changed files instead of lines.
diff --git a/diff.c b/diff.c
index 28d9293..0d82082 100644
--- a/diff.c
+++ b/diff.c
@@ -1578,8 +1578,16 @@ static void show_dirstat(struct diff_options *options)
 		 * Original minus copied is the removed material,
 		 * added is the new material.  They are both damages
 		 * made to the preimage.
+		 * If the resulting damage is zero, we know that
+		 * diffcore_count_changes() considers the two entries
+		 * to be identical, but since they are in the diff
+		 * queue at all, we know that there must have been
+		 * _some_ kind of change, so we force all entries to
+		 * have at least a minimum of damage.
 		 */
 		damage = (p->one->size - copied) + added;
+		if (!damage)
+			damage = 1;
 
 found_damage:
 		ALLOC_GROW(dir.files, dir.nr + 1, dir.alloc);
diff --git a/t/t4013-diff-various.sh b/t/t4013-diff-various.sh
index e8240f2..93a6f20 100755
--- a/t/t4013-diff-various.sh
+++ b/t/t4013-diff-various.sh
@@ -300,9 +300,7 @@ diff --no-index --name-status -- dir2 dir
 diff --no-index dir dir3
 diff master master^ side
 diff --dirstat master~1 master~2
-# --dirstat does NOT pick up changes that simply rearrange existing lines
 diff --dirstat initial rearrange
-# ...but --dirstat-by-file DOES pick up rearranged lines
 diff --dirstat-by-file initial rearrange
 EOF
 
diff --git a/t/t4013/diff.diff_--dirstat_initial_rearrange b/t/t4013/diff.diff_--dirstat_initial_rearrange
index fb2e17d..5fb02c1 100644
--- a/t/t4013/diff.diff_--dirstat_initial_rearrange
+++ b/t/t4013/diff.diff_--dirstat_initial_rearrange
@@ -1,2 +1,3 @@
 $ git diff --dirstat initial rearrange
+ 100.0% dir/
 $
-- 
1.7.5.rc1


* Re: BUG? in --dirstat when rearranging lines in a file
  2011-04-08 14:46   ` Johan Herland
                       ` (2 preceding siblings ...)
  2011-04-08 14:55     ` [RFC/PATCH 3/3] Teach --dirstat to not completely ignore rearranged lines Johan Herland
@ 2011-04-08 15:04     ` Linus Torvalds
  2011-04-08 19:56       ` Junio C Hamano
  3 siblings, 1 reply; 91+ messages in thread
From: Linus Torvalds @ 2011-04-08 15:04 UTC (permalink / raw)
  To: Johan Herland; +Cc: Junio C Hamano, git

On Fri, Apr 8, 2011 at 7:46 AM, Johan Herland <johan@herland.net> wrote:
>
> #2: Improve --dirstat-by-file. It doesn't really care about the per-file
> analysis done by --dirstat, but only whether or not a file has changed
> at all. Since the diff queue does not contain unchanged files (<- this
> is an assumption that I hope someone with more diffcore knowledge can
> verify),

Hmm.

I think that with renames, the diff queue _can_ contain unchanged
files (ie pure renames).

Also, I think -CC (aka --find-copies-harder), _every_ file ends up in
the diff queue because that's how it does the detection.

But I didn't actually check, and I may be full of sh*t.

                              Linus


* Re: [PATCH 1/3] --dirstat: Document shortcomings compared to --stat or regular diff
  2011-04-08 14:48     ` [PATCH 1/3] --dirstat: Document shortcomings compared to --stat or regular diff Johan Herland
@ 2011-04-08 19:50       ` Junio C Hamano
  0 siblings, 0 replies; 91+ messages in thread
From: Junio C Hamano @ 2011-04-08 19:50 UTC (permalink / raw)
  To: Johan Herland; +Cc: git, Linus Torvalds

Johan Herland <johan@herland.net> writes:

> diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
> index c93124b..25e48c4 100644
> --- a/Documentation/diff-options.txt
> +++ b/Documentation/diff-options.txt
> @@ -72,6 +72,11 @@ endif::git-format-patch[]
>  	a cut-off percent (3% by default) are not shown. The cut-off percent
>  	can be set with `--dirstat=<limit>`. Changes in a child directory are not
>  	counted for the parent directory, unless `--cumulative` is used.
> ++
> +Note that `--dirstat` does not use the regular diff machinery to calculate
> +the changes (rather it is based on the rename detection machinery). Therefore,
> +`--dirstat` may skip some changes that `--stat` does not skip. For example,
> +rearranging the lines in a file will not be detected by `--dirstat`.

Be positive: s/will not be detected/is not considered to be a change/,
perhaps.  Also "it is based on the rename detection machinery" is
describing an implementation detail without helping the end users.

Try to rephrase what Linus explained when he said "it is very much on
purpose".  Perhaps like this?

    Note that the `--dirstat` option computes the changes while ignoring
    pure code movements within a file.  In other words, rearranging lines
    in a file is not counted as a change.


* Re: BUG? in --dirstat when rearranging lines in a file
  2011-04-08 15:04     ` BUG? in --dirstat when rearranging lines in a file Linus Torvalds
@ 2011-04-08 19:56       ` Junio C Hamano
  2011-04-10 22:48         ` [PATCHv2 0/3] --dirstat fixes Johan Herland
  0 siblings, 1 reply; 91+ messages in thread
From: Junio C Hamano @ 2011-04-08 19:56 UTC (permalink / raw)
  To: Johan Herland; +Cc: Linus Torvalds, git

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Fri, Apr 8, 2011 at 7:46 AM, Johan Herland <johan@herland.net> wrote:
>>
>> #2: Improve --dirstat-by-file. It doesn't really care about the per-file
>> analysis done by --dirstat, but only whether or not a file has changed
>> at all. Since the diff queue does not contain unchanged files (<- this
>> is an assumption that I hope someone with more diffcore knowledge can
>> verify),
>
> Hmm.
>
> I think that with renames, the diff queue _can_ contain unchanged
> files (ie pure renames).
>
> Also, I think -CC (aka --find-copies-harder), _every_ file ends up in
> the diff queue because that's how it does the detection.

Both are correct, but the output phase happens after diffcore_std() cleans
up the unused and unchanged filepairs thrown into the queue for the
purpose of find-copies-harder, so you shouldn't have to worry about them.

When you rename a file without changing its contents, what do you want to
see in --dirstat-by-file output?  I assume that you do not want to show
anything, so it would be sufficient to compare the two object names.


* [PATCHv2 0/3] --dirstat fixes
  2011-04-08 19:56       ` Junio C Hamano
@ 2011-04-10 22:48         ` Johan Herland
  2011-04-10 22:48           ` [PATCHv2 1/3] --dirstat: Describe non-obvious differences relative to --stat or regular diff Johan Herland
                             ` (3 more replies)
  0 siblings, 4 replies; 91+ messages in thread
From: Johan Herland @ 2011-04-10 22:48 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Linus Torvalds, Johan Herland

Here's a reroll of the previous series. Changes since v1:

- Adopt Junio's phrasing of the differences between --dirstat and
  regular diff (--stat)

- Detect and ignore pure renames in the diff queue. This is done by
  comparing the SHA1s of each file pair, and if they are equal, we
  know the files are identical, and should not show up in --dirstat.

  As an extra bonus in this version, when the SHA1s do match, we can
  bypass the usual --dirstat analysis, because we know it would find
  no changes. Instead, we can directly set damage = 0 in that case.
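
The per-entry decision described in the list above could be sketched
roughly as follows. This is an illustration under assumptions, not the
code from the patches: struct file_side and dirstat_damage() are
hypothetical names, the order of the checks is a guess, copied and added
are passed in rather than computed from file contents, and added or
deleted files are not handled.

```c
#include <string.h>

#define HASH_LEN 20 /* a SHA-1 object name is 20 bytes */

/* Hypothetical, reduced view of one side of a diff queue entry. */
struct file_side {
	unsigned char object_name[HASH_LEN];
	unsigned long size;
};

/*
 * Damage assigned to one queue entry. copied and added stand in for
 * the results of the content analysis.
 */
static unsigned long dirstat_damage(const struct file_side *one,
				    const struct file_side *two,
				    int by_file,
				    unsigned long copied,
				    unsigned long added)
{
	unsigned long damage;

	if (!memcmp(one->object_name, two->object_name, HASH_LEN))
		return 0; /* identical content, e.g. a pure rename */
	if (by_file)
		return 1; /* every changed file counts equally */
	damage = (one->size - copied) + added;
	/* Content differs, so report at least a minimum of damage. */
	return damage ? damage : 1;
}
```

An entry whose two sides share an object name gets 0 in either mode. An
entry whose lines were only rearranged has different object names but a
content analysis result of 0, so it ends up with the minimum damage of 1.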

I've looked at the contents of the diff queue and resulting output in
a variety of cases:

- files with no changes, rearranged lines, and other changes
- files that are copied, moved, or not moved
- unstaged changes, staged changes, committed changes
- diff options: (none), --stat, --dirstat, and --dirstat-by-file
- diff options: (none), -M, and -C -C

(324 variations in total) and I'm fairly sure about the current patches
and how they interact with the diff queue.

A remaining question AFAICS is if there's a different (i.e. better) way
to (cheaply) estimate the damage contributed by code movements within a
file. The current "damage = 1" approach is somewhat crude, but IMHO
still better than ignoring code movements altogether.


Have fun! :)

...Johan


Johan Herland (3):
  --dirstat: Describe non-obvious differences relative to --stat or regular diff
  --dirstat-by-file: Make it faster and more correct
  Teach --dirstat to not completely ignore rearranged lines within a file

 Documentation/diff-options.txt                     |    4 ++
 diff.c                                             |   40 ++++++++++++++++++--
 t/t4013-diff-various.sh                            |   27 ++++++++++---
 .../diff.diff_--dirstat-by-file_initial_rearrange  |    3 +
 t/t4013/diff.diff_--dirstat_initial_rearrange      |    3 +
 ...tch_--stdout_--cover-letter_-n_initial..master^ |    2 +-
 t/t4013/diff.log_--decorate=full_--all             |    6 +++
 t/t4013/diff.log_--decorate_--all                  |    6 +++
 8 files changed, 80 insertions(+), 11 deletions(-)
 create mode 100644 t/t4013/diff.diff_--dirstat-by-file_initial_rearrange
 create mode 100644 t/t4013/diff.diff_--dirstat_initial_rearrange

-- 
1.7.5.rc1.3.g4d7b

^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCHv2 1/3] --dirstat: Describe non-obvious differences relative to --stat or regular diff
  2011-04-10 22:48         ` [PATCHv2 0/3] --dirstat fixes Johan Herland
@ 2011-04-10 22:48           ` Johan Herland
  2011-04-10 22:48           ` [PATCHv2 2/3] --dirstat-by-file: Make it faster and more correct Johan Herland
                             ` (2 subsequent siblings)
  3 siblings, 0 replies; 91+ messages in thread
From: Johan Herland @ 2011-04-10 22:48 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Linus Torvalds, Johan Herland

Also add a testcase documenting the current behavior.

Improved-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Johan Herland <johan@herland.net>
---
 Documentation/diff-options.txt                     |    4 +++
 t/t4013-diff-various.sh                            |   27 +++++++++++++++----
 t/t4013/diff.diff_--dirstat_initial_rearrange      |    2 +
 ...tch_--stdout_--cover-letter_-n_initial..master^ |    2 +-
 t/t4013/diff.log_--decorate=full_--all             |    6 ++++
 t/t4013/diff.log_--decorate_--all                  |    6 ++++
 6 files changed, 40 insertions(+), 7 deletions(-)
 create mode 100644 t/t4013/diff.diff_--dirstat_initial_rearrange

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index c93124b..23772d6 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -72,6 +72,10 @@ endif::git-format-patch[]
 	a cut-off percent (3% by default) are not shown. The cut-off percent
 	can be set with `--dirstat=<limit>`. Changes in a child directory are not
 	counted for the parent directory, unless `--cumulative` is used.
++
+Note that the `--dirstat` option computes the changes while ignoring
+pure code movements within a file.  In other words, rearranging lines
+in a file is not counted as a change.
 
 --dirstat-by-file[=<limit>]::
 	Same as `--dirstat`, but counts changed files instead of lines.
diff --git a/t/t4013-diff-various.sh b/t/t4013-diff-various.sh
index 5daa0f2..3b1b392 100755
--- a/t/t4013-diff-various.sh
+++ b/t/t4013-diff-various.sh
@@ -80,18 +80,31 @@ test_expect_success setup '
 
 	git config log.showroot false &&
 	git commit --amend &&
+
+	GIT_AUTHOR_DATE="2006-06-26 00:06:00 +0000" &&
+	GIT_COMMITTER_DATE="2006-06-26 00:06:00 +0000" &&
+	export GIT_AUTHOR_DATE GIT_COMMITTER_DATE &&
+	git checkout -b rearrange initial &&
+	for i in B A; do echo $i; done >dir/sub &&
+	git add dir/sub &&
+	git commit -m "Rearranged lines in dir/sub" &&
+	git checkout master &&
+
 	git show-branch
 '
 
 : <<\EOF
 ! [initial] Initial
  * [master] Merge branch 'side'
-  ! [side] Side
----
- -  [master] Merge branch 'side'
- *+ [side] Side
- *  [master^] Second
-+*+ [initial] Initial
+  ! [rearrange] Rearranged lines in dir/sub
+   ! [side] Side
+----
+  +  [rearrange] Rearranged lines in dir/sub
+ -   [master] Merge branch 'side'
+ * + [side] Side
+ *   [master^] Third
+ *   [master~2] Second
++*++ [initial] Initial
 EOF
 
 V=`git version | sed -e 's/^git version //' -e 's/\./\\./g'`
@@ -287,6 +300,8 @@ diff --no-index --name-status -- dir2 dir
 diff --no-index dir dir3
 diff master master^ side
 diff --dirstat master~1 master~2
+# --dirstat doesn't notice changes that simply rearrange existing lines
+diff --dirstat initial rearrange
 EOF
 
 test_expect_success 'log -S requires an argument' '
diff --git a/t/t4013/diff.diff_--dirstat_initial_rearrange b/t/t4013/diff.diff_--dirstat_initial_rearrange
new file mode 100644
index 0000000..fb2e17d
--- /dev/null
+++ b/t/t4013/diff.diff_--dirstat_initial_rearrange
@@ -0,0 +1,2 @@
+$ git diff --dirstat initial rearrange
+$
diff --git a/t/t4013/diff.format-patch_--stdout_--cover-letter_-n_initial..master^ b/t/t4013/diff.format-patch_--stdout_--cover-letter_-n_initial..master^
index 1f0f9ad..3b4e113 100644
--- a/t/t4013/diff.format-patch_--stdout_--cover-letter_-n_initial..master^
+++ b/t/t4013/diff.format-patch_--stdout_--cover-letter_-n_initial..master^
@@ -1,7 +1,7 @@
 $ git format-patch --stdout --cover-letter -n initial..master^
 From 9a6d4949b6b76956d9d5e26f2791ec2ceff5fdc0 Mon Sep 17 00:00:00 2001
 From: C O Mitter <committer@example.com>
-Date: Mon, 26 Jun 2006 00:05:00 +0000
+Date: Mon, 26 Jun 2006 00:06:00 +0000
 Subject: [DIFFERENT_PREFIX 0/2] *** SUBJECT HERE ***
 
 *** BLURB HERE ***
diff --git a/t/t4013/diff.log_--decorate=full_--all b/t/t4013/diff.log_--decorate=full_--all
index d155e0b..44d4525 100644
--- a/t/t4013/diff.log_--decorate=full_--all
+++ b/t/t4013/diff.log_--decorate=full_--all
@@ -1,4 +1,10 @@
 $ git log --decorate=full --all
+commit cd4e72fd96faed3f0ba949dc42967430374e2290 (refs/heads/rearrange)
+Author: A U Thor <author@example.com>
+Date:   Mon Jun 26 00:06:00 2006 +0000
+
+    Rearranged lines in dir/sub
+
 commit 59d314ad6f356dd08601a4cd5e530381da3e3c64 (HEAD, refs/heads/master)
 Merge: 9a6d494 c7a2ab9
 Author: A U Thor <author@example.com>
diff --git a/t/t4013/diff.log_--decorate_--all b/t/t4013/diff.log_--decorate_--all
index fd7c3e6..27d3eab 100644
--- a/t/t4013/diff.log_--decorate_--all
+++ b/t/t4013/diff.log_--decorate_--all
@@ -1,4 +1,10 @@
 $ git log --decorate --all
+commit cd4e72fd96faed3f0ba949dc42967430374e2290 (rearrange)
+Author: A U Thor <author@example.com>
+Date:   Mon Jun 26 00:06:00 2006 +0000
+
+    Rearranged lines in dir/sub
+
 commit 59d314ad6f356dd08601a4cd5e530381da3e3c64 (HEAD, master)
 Merge: 9a6d494 c7a2ab9
 Author: A U Thor <author@example.com>
-- 
1.7.5.rc1.3.g4d7b

^ permalink raw reply related	[flat|nested] 91+ messages in thread

* [PATCHv2 2/3] --dirstat-by-file: Make it faster and more correct
  2011-04-10 22:48         ` [PATCHv2 0/3] --dirstat fixes Johan Herland
  2011-04-10 22:48           ` [PATCHv2 1/3] --dirstat: Describe non-obvious differences relative to --stat or regular diff Johan Herland
@ 2011-04-10 22:48           ` Johan Herland
  2011-04-11 18:14             ` Junio C Hamano
  2011-04-10 22:48           ` [PATCHv2 3/3] Teach --dirstat to not completely ignore rearranged lines within a file Johan Herland
  2011-04-10 23:17           ` [PATCHv2 0/3] --dirstat fixes Linus Torvalds
  3 siblings, 1 reply; 91+ messages in thread
From: Johan Herland @ 2011-04-10 22:48 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Linus Torvalds, Johan Herland

Currently, when using --dirstat-by-file, it first does the full --dirstat
analysis (using diffcore_count_changes()), and then resets 'damage' to 1,
if any damage was found by diffcore_count_changes().

But --dirstat-by-file is not interested in the file damage per se. It only
cares if the file changed at all. In that sense it only cares if the blob
SHA1 for a file has changed. We therefore only need to compare the SHA1s
of each file pair in the diff queue. As a result, we can skip the entire
--dirstat analysis and simply set 'damage' to 1 for each entry where the
SHA1 has changed.

This makes --dirstat-by-file faster, and also bypasses --dirstat's practice
of ignoring rearranged lines within a file.

The patch also contains an added testcase verifying that --dirstat-by-file
now detects changes that only rearrange lines within a file.
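
The SHA1 check this relies on can be sketched in isolation as follows.
This is only an illustration under simplifying assumptions: struct side
is a hypothetical, trimmed-down stand-in for one half of a file pair,
and memcmp() is used in place of git's own hashcmp().

```c
#include <assert.h>
#include <string.h>

#define HASH_LEN 20	/* a SHA-1 is 20 raw bytes */

/* Hypothetical, trimmed-down stand-in for one side of a file pair. */
struct side {
	unsigned char sha1[HASH_LEN];
	int sha1_valid;	/* 0 when the hash has not been computed */
};

/*
 * Returns nonzero when the two sides must be treated as different.
 * If either hash is unknown we cannot prove equality, so assume a change.
 */
int content_changed(const struct side *one, const struct side *two)
{
	assert(one && two);
	if (one->sha1_valid && two->sha1_valid)
		return memcmp(one->sha1, two->sha1, HASH_LEN) != 0;
	return 1;
}
```

Treating an unknown hash as "changed" is the conservative choice: it can
only cause a file to be listed, never cause a real change to be dropped.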

Signed-off-by: Johan Herland <johan@herland.net>
---
 diff.c                                             |   25 ++++++++++++++++----
 t/t4013-diff-various.sh                            |    2 +
 .../diff.diff_--dirstat-by-file_initial_rearrange  |    3 ++
 3 files changed, 25 insertions(+), 5 deletions(-)
 create mode 100644 t/t4013/diff.diff_--dirstat-by-file_initial_rearrange

diff --git a/diff.c b/diff.c
index 9fa8410..a224048 100644
--- a/diff.c
+++ b/diff.c
@@ -1538,9 +1538,27 @@ static void show_dirstat(struct diff_options *options)
 		struct diff_filepair *p = q->queue[i];
 		const char *name;
 		unsigned long copied, added, damage;
+		int content_changed;
 
 		name = p->one->path ? p->one->path : p->two->path;
 
+		if (p->one->sha1_valid && p->two->sha1_valid)
+			content_changed = hashcmp(p->one->sha1, p->two->sha1);
+		else
+			content_changed = 1;
+
+		if (DIFF_OPT_TST(options, DIRSTAT_BY_FILE)) {
+			/*
+			 * In --dirstat-by-file mode, we don't really need to
+			 * look at the actual file contents at all.
+			 * The fact that the SHA1 changed is enough for us to
+			 * add this file to the list of results
+			 * (with each file contributing equal damage).
+			 */
+			damage = content_changed ? 1 : 0;
+			goto found_damage;
+		}
+
 		if (DIFF_FILE_VALID(p->one) && DIFF_FILE_VALID(p->two)) {
 			diff_populate_filespec(p->one, 0);
 			diff_populate_filespec(p->two, 0);
@@ -1563,14 +1581,11 @@ static void show_dirstat(struct diff_options *options)
 		/*
 		 * Original minus copied is the removed material,
 		 * added is the new material.  They are both damages
-		 * made to the preimage. In --dirstat-by-file mode, count
-		 * damaged files, not damaged lines. This is done by
-		 * counting only a single damaged line per file.
+		 * made to the preimage.
 		 */
 		damage = (p->one->size - copied) + added;
-		if (DIFF_OPT_TST(options, DIRSTAT_BY_FILE) && damage > 0)
-			damage = 1;
 
+found_damage:
 		ALLOC_GROW(dir.files, dir.nr + 1, dir.alloc);
 		dir.files[dir.nr].name = name;
 		dir.files[dir.nr].changed = damage;
diff --git a/t/t4013-diff-various.sh b/t/t4013-diff-various.sh
index 3b1b392..6428a90 100755
--- a/t/t4013-diff-various.sh
+++ b/t/t4013-diff-various.sh
@@ -302,6 +302,8 @@ diff master master^ side
 diff --dirstat master~1 master~2
 # --dirstat doesn't notice changes that simply rearrange existing lines
 diff --dirstat initial rearrange
+# ...but --dirstat-by-file does notice changes that only rearrange lines
+diff --dirstat-by-file initial rearrange
 EOF
 
 test_expect_success 'log -S requires an argument' '
diff --git a/t/t4013/diff.diff_--dirstat-by-file_initial_rearrange b/t/t4013/diff.diff_--dirstat-by-file_initial_rearrange
new file mode 100644
index 0000000..e48e33f
--- /dev/null
+++ b/t/t4013/diff.diff_--dirstat-by-file_initial_rearrange
@@ -0,0 +1,3 @@
+$ git diff --dirstat-by-file initial rearrange
+ 100.0% dir/
+$
-- 
1.7.5.rc1.3.g4d7b

^ permalink raw reply related	[flat|nested] 91+ messages in thread

* [PATCHv2 3/3] Teach --dirstat to not completely ignore rearranged lines within a file
  2011-04-10 22:48         ` [PATCHv2 0/3] --dirstat fixes Johan Herland
  2011-04-10 22:48           ` [PATCHv2 1/3] --dirstat: Describe non-obvious differences relative to --stat or regular diff Johan Herland
  2011-04-10 22:48           ` [PATCHv2 2/3] --dirstat-by-file: Make it faster and more correct Johan Herland
@ 2011-04-10 22:48           ` Johan Herland
  2011-04-11 21:38             ` Junio C Hamano
  2011-04-10 23:17           ` [PATCHv2 0/3] --dirstat fixes Linus Torvalds
  3 siblings, 1 reply; 91+ messages in thread
From: Johan Herland @ 2011-04-10 22:48 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Linus Torvalds, Johan Herland

Currently, the --dirstat analysis fails to detect when lines within a
file are rearranged, because the "damage" calculated by show_dirstat()
is 0. However, if the SHA1 sum has changed, we already know that there
should be at least some minimum amount of damage.

This patch teaches show_dirstat() to assign a minimum amount of damage
(== 1) to entries for which the analysis otherwise yields zero damage.
Obviously this is not a complete fix, but it's at least better to
underrepresent these changes, rather than simply pretending that they
don't exist.

Also, with the added SHA1 comparison, we can safely skip the --dirstat
analysis when the SHA1s do happen to match (e.g. for a pure file rename).

Signed-off-by: Johan Herland <johan@herland.net>
---
 Documentation/diff-options.txt                |    4 ++--
 diff.c                                        |   19 ++++++++++++++++++-
 t/t4013-diff-various.sh                       |    2 --
 t/t4013/diff.diff_--dirstat_initial_rearrange |    1 +
 4 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 23772d6..7e4bd42 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -74,8 +74,8 @@ endif::git-format-patch[]
 	counted for the parent directory, unless `--cumulative` is used.
 +
 Note that the `--dirstat` option computes the changes while ignoring
-pure code movements within a file.  In other words, rearranging lines
-in a file is not counted as a change.
+the amount of pure code movements within a file.  In other words,
+rearranging lines in a file is not counted as much as other changes.
 
 --dirstat-by-file[=<limit>]::
 	Same as `--dirstat`, but counts changed files instead of lines.
diff --git a/diff.c b/diff.c
index a224048..3e0bc1f 100644
--- a/diff.c
+++ b/diff.c
@@ -1547,6 +1547,16 @@ static void show_dirstat(struct diff_options *options)
 		else
 			content_changed = 1;
 
+		if (!content_changed) {
+			/*
+			 * The SHA1 has not changed, so pre-/post-content is
+			 * identical. We can therefore skip looking at the
+			 * file contents altogether.
+			 */
+			damage = 0;
+			goto found_damage;
+		}
+
 		if (DIFF_OPT_TST(options, DIRSTAT_BY_FILE)) {
 			/*
 			 * In --dirstat-by-file mode, we don't really need to
@@ -1555,7 +1565,7 @@ static void show_dirstat(struct diff_options *options)
 			 * add this file to the list of results
 			 * (with each file contributing equal damage).
 			 */
-			damage = content_changed ? 1 : 0;
+			damage = 1;
 			goto found_damage;
 		}
 
@@ -1582,8 +1592,15 @@ static void show_dirstat(struct diff_options *options)
 		 * Original minus copied is the removed material,
 		 * added is the new material.  They are both damages
 		 * made to the preimage.
+		 * If the resulting damage is zero, we know that
+		 * diffcore_count_changes() considers the two entries to
+		 * be identical, but since content_changed is true, we
+		 * know that there must have been _some_ kind of change,
+		 * so we force all entries to have damage > 0.
 		 */
 		damage = (p->one->size - copied) + added;
+		if (!damage)
+			damage = 1;
 
 found_damage:
 		ALLOC_GROW(dir.files, dir.nr + 1, dir.alloc);
diff --git a/t/t4013-diff-various.sh b/t/t4013-diff-various.sh
index 6428a90..93a6f20 100755
--- a/t/t4013-diff-various.sh
+++ b/t/t4013-diff-various.sh
@@ -300,9 +300,7 @@ diff --no-index --name-status -- dir2 dir
 diff --no-index dir dir3
 diff master master^ side
 diff --dirstat master~1 master~2
-# --dirstat doesn't notice changes that simply rearrange existing lines
 diff --dirstat initial rearrange
-# ...but --dirstat-by-file does notice changes that only rearrange lines
 diff --dirstat-by-file initial rearrange
 EOF
 
diff --git a/t/t4013/diff.diff_--dirstat_initial_rearrange b/t/t4013/diff.diff_--dirstat_initial_rearrange
index fb2e17d..5fb02c1 100644
--- a/t/t4013/diff.diff_--dirstat_initial_rearrange
+++ b/t/t4013/diff.diff_--dirstat_initial_rearrange
@@ -1,2 +1,3 @@
 $ git diff --dirstat initial rearrange
+ 100.0% dir/
 $
-- 
1.7.5.rc1.3.g4d7b

^ permalink raw reply related	[flat|nested] 91+ messages in thread

* Re: [PATCHv2 0/3] --dirstat fixes
  2011-04-10 22:48         ` [PATCHv2 0/3] --dirstat fixes Johan Herland
                             ` (2 preceding siblings ...)
  2011-04-10 22:48           ` [PATCHv2 3/3] Teach --dirstat to not completely ignore rearranged lines within a file Johan Herland
@ 2011-04-10 23:17           ` Linus Torvalds
  3 siblings, 0 replies; 91+ messages in thread
From: Linus Torvalds @ 2011-04-10 23:17 UTC (permalink / raw)
  To: Johan Herland; +Cc: Junio C Hamano, git

On Sun, Apr 10, 2011 at 3:48 PM, Johan Herland <johan@herland.net> wrote:
> Here's a reroll of the previous series. Changes since v1:

The series looks fine to me,

       Linus

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCHv2 2/3] --dirstat-by-file: Make it faster and more correct
  2011-04-10 22:48           ` [PATCHv2 2/3] --dirstat-by-file: Make it faster and more correct Johan Herland
@ 2011-04-11 18:14             ` Junio C Hamano
  0 siblings, 0 replies; 91+ messages in thread
From: Junio C Hamano @ 2011-04-11 18:14 UTC (permalink / raw)
  To: Johan Herland; +Cc: git, Linus Torvalds

Johan Herland <johan@herland.net> writes:

> Currently, when using --dirstat-by-file, it first does the full --dirstat
> analysis (using diffcore_count_changes()), and then resets 'damage' to 1,
> if any damage was found by diffcore_count_changes().
>
> But --dirstat-by-file is not interested in the file damage per se. It only
> cares if the file changed at all. In that sense it only cares if the blob
> SHA1 for a file has changed. We therefore only need to compare the SHA1s
> of each file pair in the diff queue. As a result, we can skip the entire
> --dirstat analysis and simply set 'damage' to 1 for each entry where the
> SHA1 has changed.

Very sensible.  Thanks.

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCHv2 3/3] Teach --dirstat to not completely ignore rearranged lines within a file
  2011-04-10 22:48           ` [PATCHv2 3/3] Teach --dirstat to not completely ignore rearranged lines within a file Johan Herland
@ 2011-04-11 21:38             ` Junio C Hamano
  2011-04-11 21:56               ` Johan Herland
  0 siblings, 1 reply; 91+ messages in thread
From: Junio C Hamano @ 2011-04-11 21:38 UTC (permalink / raw)
  To: Johan Herland; +Cc: git, Linus Torvalds

Johan Herland <johan@herland.net> writes:

> Currently, the --dirstat analysis fails to detect when lines within a
> file are rearranged, because the "damage" calculated by show_dirstat()
> is 0. However, if the SHA1 sum has changed, we already know that there
> should be at least some minimum amount of damage.

This logic is sensible, modulo that "fails to detect" is actually "ignores
mere line movements on purpose".

In any case, if the object names are different, we already know that there
is _some_ damage, and it is very unintuitive to claim that there is _no_
damage.

> This patch teaches show_dirstat() to assign a minimum amount of damage
> (== 1) to entries for which the analysis otherwise yields zero damage.

So it is perfectly in line with the above logic to give a minimum here.
Zero was simply just unintuitive, and this is a good fix to the problem.

> Obviously this is not a complete fix, but it's at least better to

I however do not understand what "a complete fix" means in this context.
You've fixed the unintuitiveness, and as far as the description in the
introductory paragraph of the problem goes, I think this already is a
complete fix.

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCHv2 3/3] Teach --dirstat to not completely ignore rearranged lines within a file
  2011-04-11 21:38             ` Junio C Hamano
@ 2011-04-11 21:56               ` Johan Herland
  2011-04-11 22:08                 ` Junio C Hamano
  0 siblings, 1 reply; 91+ messages in thread
From: Johan Herland @ 2011-04-11 21:56 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Linus Torvalds

On Monday 11 April 2011, Junio C Hamano wrote:
> Johan Herland <johan@herland.net> writes:
> > Currently, the --dirstat analysis fails to detect when lines within a
> > file are rearranged, because the "damage" calculated by show_dirstat()
> > is 0. However, if the SHA1 sum has changed, we already know that there
> > should be at least some minimum amount of damage.
> 
> This logic is sensible, modulo that "fails to detect" is actually
> "ignores mere line movements on purpose".

I apologize for my commit message not having caught up with discussion 
around this issue. I came into the discussion from the POV of "--dirstat 
does not pick up what --stat picks up; there must be a bug in --dirstat", 
and my original objective was therefore to "fix" --dirstat to be "more like 
--stat". Obviously, I now know exactly why --dirstat is different, and that 
we don't want to fundamentally change it. My commit message should have been 
rephrased in a more positive light as a result. Feel free to fix before 
applying.

> In any case, if the object names are different, we already know that
> there is _some_ damage, and it is very unintuitive to claim that there
> is _no_ damage.

Agreed.

> > This patch teaches show_dirstat() to assign a minimum amount of damage
> > (== 1) to entries for which the analysis otherwise yields zero damage.
> 
> So it is perfectly in line with the above logic to give a minimum here.
> Zero was simply just unintuitive, and this is a good fix to the problem.
> 
> > Obviously this is not a complete fix, but it's at least better to
> 
> I however do not understand what "a complete fix" means in this context.
> You've fixed the unintuitiveness, and as far as the description in the
> introductory paragraph of the problem goes, I think this already is a
> complete fix.

I still feel that a file with 1000 rearranged lines should somehow count 
"more" than a file with only 1 rearranged line, but it's hard to get there 
without futzing with diffcore_count_changes(), probably making the whole 
thing considerably more expensive... So in that sense, I agree that the 
current solution is probably as complete as we can get.


...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCHv2 3/3] Teach --dirstat to not completely ignore rearranged lines within a file
  2011-04-11 21:56               ` Johan Herland
@ 2011-04-11 22:08                 ` Junio C Hamano
  2011-04-12  9:22                   ` Johan Herland
  0 siblings, 1 reply; 91+ messages in thread
From: Junio C Hamano @ 2011-04-11 22:08 UTC (permalink / raw)
  To: Johan Herland; +Cc: git, Linus Torvalds

Johan Herland <johan@herland.net> writes:

> I still feel that a file with 1000 rearranged lines should somehow count 
> "more" than a file with only 1 rearranged line,...

I think that is just entirely a different mode of operation.  I do not
think it is wrong to have an alternative implementation of the dirstat
damage counter that is based on numstat code.

It may end up counting the damage slower than the current code, and more
importantly it will count a different kind of damage than the current code
does, so we may probably want to make it an optional feature.

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCHv2 3/3] Teach --dirstat to not completely ignore rearranged lines within a file
  2011-04-11 22:08                 ` Junio C Hamano
@ 2011-04-12  9:22                   ` Johan Herland
  2011-04-12  9:24                     ` [PATCH 4/3] --dirstat: In case of renames, use target filename instead of source filename Johan Herland
  2011-04-12  9:26                     ` [RFC/PATCH 5/3] Alternative --dirstat implementation, based on diffstat analysis Johan Herland
  0 siblings, 2 replies; 91+ messages in thread
From: Johan Herland @ 2011-04-12  9:22 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Linus Torvalds, johan

On Tuesday 12 April 2011, Junio C Hamano wrote:
> Johan Herland <johan@herland.net> writes:
> > I still feel that a file with 1000 rearranged lines should somehow
> > count "more" than a file with only 1 rearranged line,...
> 
> I think that is just entirely a different mode of operation.  I do not
> think it is wrong to have an alternative implementation of the dirstat
> damage counter that is based on numstat code.
> 
> It may end up counting the damage slower than the current code, and more
> importantly it will count a different kind of damage than the current
> code does, so we may probably want to make it an optional feature.

I wrote it up just for fun, and here's the patch. I'll leave it up to you
to decide if it's worth it.

First, though, I've got another patch to --dirstat, which - in the case
of renames, attributes the damage to the target filename instead of the
source filename. I found this more intuitive, especially in the case of
copies (-C -C) where damage would be attributed to the directory
containing the (unchanged) source file, instead of the directory
containing the (changed) target file.


Have fun! :)

...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net

^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH 4/3] --dirstat: In case of renames, use target filename instead of source filename
  2011-04-12  9:22                   ` Johan Herland
@ 2011-04-12  9:24                     ` Johan Herland
  2011-04-12 14:59                       ` Linus Torvalds
  2011-04-12  9:26                     ` [RFC/PATCH 5/3] Alternative --dirstat implementation, based on diffstat analysis Johan Herland
  1 sibling, 1 reply; 91+ messages in thread
From: Johan Herland @ 2011-04-12  9:24 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Linus Torvalds, johan

This changes --dirstat analysis to count "damage" toward the target filename,
rather than the source filename. For renames within a directory, this won't
matter to the final output, but when moving files between directories, the
output now lists the target directory rather than the source directory.
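
The selection rule can be sketched as a tiny helper. The function name
damage_path() is hypothetical and only mirrors the one-line change in
the patch; it is not a function in diff.c.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: pick the path whose directory receives the damage.
 * Prefer the post-image (target) path; fall back to the pre-image
 * (source) path when the target side has none.
 */
const char *damage_path(const char *src, const char *dst)
{
	assert(src || dst);	/* at least one side must name a path */
	return dst ? dst : src;
}
```

Under this rule a rename from old/a.c to new/a.c is attributed to new/.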

Signed-off-by: Johan Herland <johan@herland.net>
---
 diff.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/diff.c b/diff.c
index 3e0bc1f..5376d01 100644
--- a/diff.c
+++ b/diff.c
@@ -1540,7 +1540,7 @@ static void show_dirstat(struct diff_options *options)
 		unsigned long copied, added, damage;
 		int content_changed;
 
-		name = p->one->path ? p->one->path : p->two->path;
+		name = p->two->path ? p->two->path : p->one->path;
 
 		if (p->one->sha1_valid && p->two->sha1_valid)
 			content_changed = hashcmp(p->one->sha1, p->two->sha1);
-- 
1.7.5.rc1.3.g4d7b



-- 
Johan Herland, <johan@herland.net>
www.herland.net

^ permalink raw reply related	[flat|nested] 91+ messages in thread

* [RFC/PATCH 5/3] Alternative --dirstat implementation, based on diffstat analysis
  2011-04-12  9:22                   ` Johan Herland
  2011-04-12  9:24                     ` [PATCH 4/3] --dirstat: In case of renames, use target filename instead of source filename Johan Herland
@ 2011-04-12  9:26                     ` Johan Herland
  2011-04-12 14:46                       ` Linus Torvalds
  2011-04-12 18:34                       ` [RFC/PATCH 5/3] Alternative --dirstat implementation, based on diffstat analysis Junio C Hamano
  1 sibling, 2 replies; 91+ messages in thread
From: Johan Herland @ 2011-04-12  9:26 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Linus Torvalds, johan

This patch adds an alternative implementation of show_dirstat(), called
show_dirstat_based_on_diffstat(), which uses the more expensive diffstat
analysis (as opposed to --dirstat's own (inexpensive) analysis) to derive
the numbers from which the --dirstat output is computed.

The alternative implementation is controlled by a new config variable called
diff.dirstatBasedOnDiffstat.

In linux-2.6.git, running

  time git diff v2.6.20..v2.6.30 --dirstat=0 > /dev/null

with and without diff.dirstatBasedOnDiffstat enabled yields the following
average runtimes on my machine:

- disabled: ~6.0 s
- enabled:  ~9.7 s

So, as expected, there's a considerable performance hit (>60%) by going
through the full diffstat analysis. As such, the new option is probably
only useful if you really need the --dirstat numbers to be consistent with
the numbers returned from the other --*stat options.

In --dirstat-by-file mode, the diffstat analysis is obviously a waste of time,
so --dirstat-by-file automatically disables diff.dirstatBasedOnDiffstat.
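
To make the difference in counting concrete, here is a minimal sketch of
the diffstat-based damage figure. struct file_stat and diffstat_damage()
are hypothetical, reduced stand-ins used for illustration; they only
mirror the "added + deleted" sum used by the alternative implementation.

```c
#include <assert.h>

/* Hypothetical, reduced version of a per-file diffstat record. */
struct file_stat {
	unsigned long added;	/* lines added in the post-image */
	unsigned long deleted;	/* lines removed from the pre-image */
};

/* Damage as the diffstat-based variant counts it: every +/- line counts. */
unsigned long diffstat_damage(const struct file_stat *f)
{
	assert(f);
	return f->added + f->deleted;
}
```

For the two-line swap that started this thread, --stat reports one
insertion and one deletion, so this variant yields a damage of 2, where
the default analysis finds no damage beyond the minimum of 1.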

Signed-off-by: Johan Herland <johan@herland.net>
---

This might not be worth applying at all, but if it is, I can send a re-roll
with documentation and more user-friendliness.


Have fun! :)

...Johan

 diff.c |   53 +++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 files changed, 51 insertions(+), 2 deletions(-)

diff --git a/diff.c b/diff.c
index 5376d01..a496ba6 100644
--- a/diff.c
+++ b/diff.c
@@ -31,6 +31,7 @@ static const char *external_diff_cmd_cfg;
 int diff_auto_refresh_index = 1;
 static int diff_mnemonic_prefix;
 static int diff_no_prefix;
+static int dirstat_based_on_diffstat;
 static struct diff_options default_diff_options;
 
 static char diff_colors[][COLOR_MAXLEN] = {
@@ -103,6 +104,10 @@ int git_diff_ui_config(const char *var, const char *value, void *cb)
 		diff_no_prefix = git_config_bool(var, value);
 		return 0;
 	}
+	if (!strcmp(var, "diff.dirstatbasedondiffstat")) {
+		dirstat_based_on_diffstat = git_config_bool(var, value);
+		return 0;
+	}
 	if (!strcmp(var, "diff.external"))
 		return git_config_string(&external_diff_cmd_cfg, var, value);
 	if (!strcmp(var, "diff.wordregex"))
@@ -1619,6 +1624,43 @@ found_damage:
 	gather_dirstat(options, &dir, changed, "", 0);
 }
 
+static void show_dirstat_based_on_diffstat(struct diffstat_t *data, struct diff_options *options)
+{
+	int i;
+	unsigned long changed;
+	struct dirstat_dir dir;
+
+	if (data->nr == 0)
+		return;
+
+	dir.files = NULL;
+	dir.alloc = 0;
+	dir.nr = 0;
+	dir.percent = options->dirstat_percent;
+	dir.cumulative = DIFF_OPT_TST(options, DIRSTAT_CUMULATIVE);
+
+	changed = 0;
+	for (i = 0; i < data->nr; i++) {
+		struct diffstat_file *file = data->files[i];
+		unsigned long damage;
+
+		damage = file->added + file->deleted;
+		ALLOC_GROW(dir.files, dir.nr + 1, dir.alloc);
+		dir.files[dir.nr].name = file->name;
+		dir.files[dir.nr].changed = damage;
+		changed += damage;
+		dir.nr++;
+	}
+
+	/* This can happen even with many files, if everything was renames */
+	if (!changed)
+		return;
+
+	/* Show all directories with more than x% of the changes */
+	qsort(dir.files, dir.nr, sizeof(dir.files[0]), dirstat_compare);
+	gather_dirstat(options, &dir, changed, "", 0);
+}
+
 static void free_diffstat_info(struct diffstat_t *diffstat)
 {
 	int i;
@@ -4012,7 +4054,12 @@ void diff_flush(struct diff_options *options)
 		separator++;
 	}
 
-	if (output_format & (DIFF_FORMAT_DIFFSTAT|DIFF_FORMAT_SHORTSTAT|DIFF_FORMAT_NUMSTAT)) {
+	// --dirstat-by-file REALLY don't need the full diffstat analysis
+	if (DIFF_OPT_TST(options, DIRSTAT_BY_FILE))
+		dirstat_based_on_diffstat = 0;
+
+	if (output_format & (DIFF_FORMAT_DIFFSTAT|DIFF_FORMAT_SHORTSTAT|DIFF_FORMAT_NUMSTAT) ||
+	    ((output_format & DIFF_FORMAT_DIRSTAT) && dirstat_based_on_diffstat)) {
 		struct diffstat_t diffstat;
 
 		memset(&diffstat, 0, sizeof(struct diffstat_t));
@@ -4027,10 +4074,12 @@ void diff_flush(struct diff_options *options)
 			show_stats(&diffstat, options);
 		if (output_format & DIFF_FORMAT_SHORTSTAT)
 			show_shortstats(&diffstat, options);
+		if (output_format & DIFF_FORMAT_DIRSTAT)
+			show_dirstat_based_on_diffstat(&diffstat, options);
 		free_diffstat_info(&diffstat);
 		separator++;
 	}
-	if (output_format & DIFF_FORMAT_DIRSTAT)
+	if ((output_format & DIFF_FORMAT_DIRSTAT) && !dirstat_based_on_diffstat)
 		show_dirstat(options);
 
 	if (output_format & DIFF_FORMAT_SUMMARY && !is_summary_empty(q)) {
-- 
1.7.5.rc1.3.g4d7b

^ permalink raw reply related	[flat|nested] 91+ messages in thread

* Re: [RFC/PATCH 5/3] Alternative --dirstat implementation, based on diffstat analysis
  2011-04-12  9:26                     ` [RFC/PATCH 5/3] Alternative --dirstat implementation, based on diffstat analysis Johan Herland
@ 2011-04-12 14:46                       ` Linus Torvalds
  2011-04-12 15:08                         ` Linus Torvalds
  2011-04-26  0:01                         ` [PATCH 0/6] --dirstat fixes, part 2 Johan Herland
  2011-04-12 18:34                       ` [RFC/PATCH 5/3] Alternative --dirstat implementation, based on diffstat analysis Junio C Hamano
  1 sibling, 2 replies; 91+ messages in thread
From: Linus Torvalds @ 2011-04-12 14:46 UTC (permalink / raw)
  To: Johan Herland; +Cc: Junio C Hamano, git

On Tue, Apr 12, 2011 at 2:26 AM, Johan Herland <johan@herland.net> wrote:
> This patch adds an alternative implementation of show_dirstat(), called
> show_dirstat_based_on_diffstat(), which uses the more expensive diffstat
> analysis (as opposed to --dirstat's own (inexpensive) analysis) to derive
> the numbers from which the --dirstat output is computed.
>
> The alternative implementation is controlled by a new config variable called
> diff.dirstatBasedOnDiffstat.

So I don't hate the idea, but I do hate the "use a config option"
part. Or rather, I hate the fact that it's the _only_ way to do it
(and the particular config name you chose).

I'd much rather have a command line option for the two cases, and then
have the config file part be a way to perhaps set the default value.

Something like "--dirstat=exact", and then without the explicit
setting you might fall back on the config file.

(One reason I'd like that is that I think the "--cumulative" option
was a mistake. Again, it _should_ have been another option to
"--dirstat", rather than a stand-alone option that makes no sense on
its own)

So in a better world, I think we should be able to write

  --dirstat=[non]exact,[non]cumulative,1

to say exactly what kind of dirstat we actually want. And the config
options would also match, iow

  [dirstat]
     exact = true
     cumulative = true
     percentage = 1

rather than the cumbersome name you chose that is based on an
implementation issue rather than a user interface issue (I think
config options should talk about the user experience more than about
how it was implemented, so "diff.dirstatbasedondiffstat" is not
wonderful)

Wouldn't that be nicer? Can I sucker you into parsing something like that?

If you do this, another thing I've occasionally wanted to see was a
percentage that allows fractional percentages. We show the results in
permille, after all, it should be possible to ask for cut-offs at the
same precision, ie "1.5%".
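
A rough sketch of what parsing such a parameter string might look like. Everything here is an assumption for illustration: struct dirstat_opts, its field names and parse_dirstat_params() are made up, and the cut-off is stored in permille so that "1.5" becomes 15.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical option set; the field names are illustrative only. */
struct dirstat_opts {
	int exact;      /* 1 = line-accurate (diffstat-based) analysis */
	int cumulative; /* 1 = parent directories include their children */
	int permille;   /* cut-off in tenths of a percent, e.g. 15 for "1.5" */
};

/*
 * Parse a comma-separated string such as "exact,noncumulative,1.5".
 * Returns 0 on success, -1 on an unknown or malformed parameter.
 * Fields not mentioned in the string keep their current values.
 */
int parse_dirstat_params(struct dirstat_opts *o, const char *params)
{
	const char *p = params;

	assert(o && params);
	while (*p) {
		size_t len = strcspn(p, ",");

		if (len == 5 && !strncmp(p, "exact", 5)) {
			o->exact = 1;
		} else if (len == 8 && !strncmp(p, "nonexact", 8)) {
			o->exact = 0;
		} else if (len == 10 && !strncmp(p, "cumulative", 10)) {
			o->cumulative = 1;
		} else if (len == 13 && !strncmp(p, "noncumulative", 13)) {
			o->cumulative = 0;
		} else if (len && *p >= '0' && *p <= '9') {
			/* integer part, then at most one significant fractional digit */
			char *end;
			long whole = strtol(p, &end, 10);
			int tenths = 0;

			if (*end == '.' && end[1] >= '0' && end[1] <= '9') {
				tenths = end[1] - '0';
				end += 2;
				/* ignore digits beyond permille precision */
				while (*end >= '0' && *end <= '9')
					end++;
			}
			if (end != p + len || whole > 100)
				return -1;
			o->permille = (int)(whole * 10 + tenths);
		} else {
			return -1;
		}
		p += len;
		if (*p == ',')
			p++;
	}
	return 0;
}
```

Starting from defaults of { 0, 0, 30 }, the string "exact,cumulative,1.5" would leave exact and cumulative set and the cut-off at 15 permille. A config-file default could be fed through the same function before the command line is parsed.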

                 Linus "can I find a sucker to implement this" Torvalds


* Re: [PATCH 4/3] --dirstat: In case of renames, use target filename instead of source filename
  2011-04-12  9:24                     ` [PATCH 4/3] --dirstat: In case of renames, use target filename instead of source filename Johan Herland
@ 2011-04-12 14:59                       ` Linus Torvalds
  0 siblings, 0 replies; 91+ messages in thread
From: Linus Torvalds @ 2011-04-12 14:59 UTC (permalink / raw)
  To: Johan Herland; +Cc: Junio C Hamano, git

On Tue, Apr 12, 2011 at 2:24 AM, Johan Herland <johan@herland.net> wrote:
> This changes --dirstat analysis to count "damage" toward the target filename,
> rather than the source filename. For renames within a directory, this won't
> matter to the final output, but when moving files between directories, the
> output now lists the target directory rather than the source directory.

Ack. I think the use of the source filename was actually a bug.

The original dirstat code used the "struct diffstat_file_t *" pointer,
and took the name from the ->name field of that. And that actually
defaults to the target name (see diffstat_add()). But then commit
c04a7155a03e changed it to use "struct dirstat_file" and picked the
name for that from the source.

                        Linus


* Re: [RFC/PATCH 5/3] Alternative --dirstat implementation, based on diffstat analysis
  2011-04-12 14:46                       ` Linus Torvalds
@ 2011-04-12 15:08                         ` Linus Torvalds
  2011-04-12 22:03                           ` Johan Herland
  2011-04-26  0:01                         ` [PATCH 0/6] --dirstat fixes, part 2 Johan Herland
  1 sibling, 1 reply; 91+ messages in thread
From: Linus Torvalds @ 2011-04-12 15:08 UTC (permalink / raw)
  To: Johan Herland; +Cc: Junio C Hamano, git

On Tue, Apr 12, 2011 at 7:46 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> So I don't hate the idea, but I do hate the "use a config option"
> part. Or rather, I hate the fact that it's the _only_ way to do it
> (and the particular config name you chose).

Oh, and one thing strikes me: I think the fast dirstat gave reasonable
values when you had mixed text and binary (in the kernel tree, look
for the Documentation/logo.gif file, for example: it changed to the
Tasmanian devil in one release).

Have you checked what happens to that when you use the diffstat one?
Because binary files are done very differently (byte-based counts).

So check out

   git show --dirstat 3d4f16348b77efbf81b7fa186a18a0eb815b6b84

with and without your change. The old dirstat gives

  44.0% Documentation/
  55.9% drivers/video/logo/

which is at least not completely insane.

The reason I bring this up is because I think this was an issue at one
point, and one of the statistics things (--stat or --numstat or
--dirstat) gave absolutely horrid values (basically comparing "bytes
changed" for binaries with "lines changed" for text files). Resulting
in totally skewed statistics.

                       Linus


* Re: [RFC/PATCH 5/3] Alternative --dirstat implementation, based on diffstat analysis
  2011-04-12  9:26                     ` [RFC/PATCH 5/3] Alternative --dirstat implementation, based on diffstat analysis Johan Herland
  2011-04-12 14:46                       ` Linus Torvalds
@ 2011-04-12 18:34                       ` Junio C Hamano
  1 sibling, 0 replies; 91+ messages in thread
From: Junio C Hamano @ 2011-04-12 18:34 UTC (permalink / raw)
  To: Johan Herland; +Cc: git, Linus Torvalds

Johan Herland <johan@herland.net> writes:

> This patch adds an alternative implementation of show_dirstat(), called
> show_dirstat_based_on_diffstat(), which uses the more expensive diffstat
> analysis (as opposed to --dirstat's own (inexpensive) analysis) to derive
> the numbers from which the --dirstat output is computed.
> ...
> diff --git a/diff.c b/diff.c
> index 5376d01..a496ba6 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -31,6 +31,7 @@ static const char *external_diff_cmd_cfg;
>  int diff_auto_refresh_index = 1;
>  static int diff_mnemonic_prefix;
>  static int diff_no_prefix;
> +static int dirstat_based_on_diffstat;
>  static struct diff_options default_diff_options;
>  
>  static char diff_colors[][COLOR_MAXLEN] = {
> @@ -103,6 +104,10 @@ int git_diff_ui_config(const char *var, const char *value, void *cb)
>  		diff_no_prefix = git_config_bool(var, value);
>  		return 0;
>  	}
> +	if (!strcmp(var, "diff.dirstatbasedondiffstat")) {
> +		dirstat_based_on_diffstat = git_config_bool(var, value);
> +		return 0;
> +	}

People may think of other damage calculators, so the variable shouldn't be
a boolean that says "dirstat-based-on-diffstat" but rather an enum.

We would need a command line interface for this.  How about something like
"--dirstat=lines" vs "--dirstat=changes", and default "--dirstat" without
an explicit type to traditional "--dirstat=changes"?  


* Re: [RFC/PATCH 5/3] Alternative --dirstat implementation, based on diffstat analysis
  2011-04-12 15:08                         ` Linus Torvalds
@ 2011-04-12 22:03                           ` Johan Herland
  2011-04-12 22:12                             ` Linus Torvalds
  2011-04-12 22:22                             ` Junio C Hamano
  0 siblings, 2 replies; 91+ messages in thread
From: Johan Herland @ 2011-04-12 22:03 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git, Junio C Hamano

On Tuesday 12 April 2011, Linus Torvalds wrote:
> On Tue, Apr 12, 2011 at 7:46 AM, Linus Torvalds
> 
> <torvalds@linux-foundation.org> wrote:
> > So I don't hate the idea, but I do hate the "use a config option"
> > part. Or rather, I hate the fact that it's the _only_ way to do it
> > (and the particular config name you chose).
> 
> Oh, and one thing strikes me: I think the fast dirstat gave reasonable
> values when you had mixed text and binary (in the kernel tree, look
> for the Documentation/logo.gif file, for example: it changed to the
> Tasmanian devil in one release).
> 
> Have you checked what happens to that when you use the diffstat one?
> Because binary files are done very differently (byte-based counts).
> 
> So check out
> 
>    git show --dirstat 3d4f16348b77efbf81b7fa186a18a0eb815b6b84
> 
> with and without your change. The old dirstat gives
> 
>   44.0% Documentation/
>   55.9% drivers/video/logo/
> 
> which is at least not completely insane.

My change obviously makes a difference:

  68.7% Documentation/
  31.2% drivers/video/logo/

To make some more sense of the numbers, here they are with some extra
output from a debug printf:

$ ../git/git show --dirstat 3d4f163
  [...]

        Documentation/logo.gif: +16335 -0 => damage = 16335
        Documentation/logo.svg: +0 -310450 => damage = 310450
        Documentation/logo.txt: +562 -200 => damage = 762
        drivers/video/logo/logo_linux_clut224.ppm: +76628 -136093 => damage = 212721
        drivers/video/logo/logo_linux_vga16.ppm: +76837 -126084 => damage = 202921
  44.0% Documentation/
  55.9% drivers/video/logo/


$ ../git/git -c diff.dirstatbasedondiffstat=true show --dirstat 3d4f163
  [...]

        Documentation/logo.gif: +16335 -0 => damage = 16335
        Documentation/logo.svg: +0 -2911 => damage = 2911
        Documentation/logo.txt: +12 -3 => damage = 15
        drivers/video/logo/logo_linux_clut224.ppm: +1602 -2826 => damage = 4428
        drivers/video/logo/logo_linux_vga16.ppm: +1602 -2737 => damage = 4339
  68.7% Documentation/
  31.2% drivers/video/logo/

In the original dirstat numbers (computed by diffcore_count_changes())
all the numbers (both from text and binary files) are on a byte scale,
making the binary logo.gif changes proportional in scale to the rest.

In the diffstat analysis, however, binary changes are reported in bytes,
while text changes are reported in lines. This obviously makes binary
changes count disproportionately more than textual changes.
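
The percentages in both runs follow from the per-file damage figures by plain integer arithmetic. A minimal sketch, assuming the share is computed in permille and rounded down, which is consistent with the output above; the function names are illustrative and not taken from diff.c:

```c
#include <assert.h>
#include <stdio.h>

/* Share of 'part' in 'total', in tenths of a percent, rounded down. */
unsigned int permille(unsigned long long part, unsigned long long total)
{
	assert(total > 0 && part <= total);
	return (unsigned int)(part * 1000 / total);
}

/* Print one line in the "  44.0% Documentation/" layout used above. */
void print_share(const char *dir, unsigned long long part,
		 unsigned long long total)
{
	unsigned int pm = permille(part, total);

	printf("%4u.%01u%% %s\n", pm / 10, pm % 10, dir);
}
```

With the numbers above, Documentation/ accounts for 16335 + 310450 + 762 = 327547 out of 743189 in the first run (440 permille), and for 16335 + 2911 + 15 = 19261 out of 28028 in the second (687 permille). The logo.gif figure is identical in both runs, but its weight relative to the other files changes once they are counted in lines.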

> The reason I bring this up is because I think this was an issue at one
> point, and one of the statistics things (--stat or --numstat or
> --dirstat) gave absolutely horrid values (basically comparing "bytes
> changed" for binaries with "lines changed" for text files). Resulting
> in totally skewed statistics.

Indeed, that's exactly what's going on here. Looking at the other
--*stat options:

--stat has a special output mode for binary files:

        Documentation/logo.gif      | Bin 0 -> 16335 bytes

--numstat refuses to show any meaningful output for binary files:

        -       -       Documentation/logo.gif

--shortstat skips binary (and unmerged) files altogether.


So, how should we count binary files in the diffstat version of
--dirstat? Looking at the available data in struct diffstat_file,
there's not a lot of "source material" available. If I had easy
access to the file pre/post size, and the total number of lines,
I could calculate the average number of bytes per line, and then
multiply that with the diffstat numbers to get an approximate
byte count. A crude fallback would be to use 64 bytes per line...

A better solution might be to add a flag to struct diffstat_t
indicating that we want byte counts (as opposed to line counts) for
text files, and then use that flag from within diffstat_consume() to
add "len" instead of "1" to x->added/deleted.
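
The flag idea could look roughly like this. It is only a sketch: struct stat_counts and consume_line() are simplified, hypothetical stand-ins for the diffstat structures and diffstat_consume(), and count_bytes is the assumed new flag.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the per-file diffstat counters. */
struct stat_counts {
	unsigned long added, deleted;
	int count_bytes; /* assumed flag: 1 = add line length, 0 = add 1 */
};

/*
 * Account for one line of unified diff output. 'line' starts with '+'
 * or '-' for changed lines; 'len' is its length including that marker.
 */
void consume_line(struct stat_counts *x, const char *line, size_t len)
{
	unsigned long weight;

	assert(x && line && len > 0);
	/* "len" instead of "1", so text damage ends up on a byte scale */
	weight = x->count_bytes ? (unsigned long)len : 1;
	if (line[0] == '+')
		x->added += weight;
	else if (line[0] == '-')
		x->deleted += weight;
	/* context lines and hunk headers contribute nothing */
}
```

Note that 'len' here includes the leading '+' or '-', so each changed line is overcounted by one byte; whether that matters for a percentage-based summary is an open question.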


...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net


* Re: [RFC/PATCH 5/3] Alternative --dirstat implementation, based on diffstat analysis
  2011-04-12 22:03                           ` Johan Herland
@ 2011-04-12 22:12                             ` Linus Torvalds
  2011-04-12 22:22                             ` Junio C Hamano
  1 sibling, 0 replies; 91+ messages in thread
From: Linus Torvalds @ 2011-04-12 22:12 UTC (permalink / raw)
  To: Johan Herland; +Cc: git, Junio C Hamano

On Tue, Apr 12, 2011 at 3:03 PM, Johan Herland <johan@herland.net> wrote:
>
> --stat has a special output mode for binary files:
>
>        Documentation/logo.gif      | Bin 0 -> 16335 bytes

Yeah, I think that's the one we introduced exactly to not give crazy
results (ie really big bars of +++/---).

One option might be to just do something like

    if (binary)
       damage /= 52;

and just say that "52 bytes of binary diff counts as one line".

Which is obviously totally crazy and idiotic, but it actually is
roughly what happens when you print out the binary diff (that "52" is
made up, but I think it may be true. I forget what encoding rules we
use, it's in that kind of range).

So it's "true" in some insane made-up sense.

Feel free to pick any number you like instead of "52" that makes some sense.

Because I don't think we have the option to just dismiss the binary
changes entirely, and I don't like the idea of comparing bytes against
lines.

The other option would be to turn lines (of a non-binary diff) into
bytes, and not count lines at all.
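
As a sketch of the first option (scaling binary damage down to "lines"), under the stated assumption that 52 is a made-up constant and that damage_in_lines() is a hypothetical helper, not an existing function:

```c
#include <assert.h>
#include <limits.h>

/* Made-up figure: this many bytes of binary diff count as one line. */
#define BINARY_BYTES_PER_LINE 52

/* Hypothetical helper: express one file's damage in "line" units. */
unsigned long damage_in_lines(unsigned long added, unsigned long deleted,
			      int is_binary)
{
	unsigned long damage;

	assert(added <= ULONG_MAX - deleted); /* the sum must not wrap */
	damage = added + deleted;
	if (is_binary)
		damage /= BINARY_BYTES_PER_LINE;
	return damage;
}
```

For the logo.gif case quoted above, 16335 bytes would count as 314 lines, while text files keep their added plus deleted line counts unchanged.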

                      Linus


* Re: [RFC/PATCH 5/3] Alternative --dirstat implementation, based on diffstat analysis
  2011-04-12 22:03                           ` Johan Herland
  2011-04-12 22:12                             ` Linus Torvalds
@ 2011-04-12 22:22                             ` Junio C Hamano
  1 sibling, 0 replies; 91+ messages in thread
From: Junio C Hamano @ 2011-04-12 22:22 UTC (permalink / raw)
  To: Johan Herland; +Cc: Linus Torvalds, git, Junio C Hamano

Johan Herland <johan@herland.net> writes:

> So, how should we count binary files in the diffstat version of
> --dirstat?

IIRC, the reason Linus used the "change" (not "lines") damage computation
in dirstat was exactly for this reason.

Comparing and combining the damage as number of lines and changed bytes
simply does not make much sense, so my gut answer to this question is "we
shouldn't".  The --numstat mode punts exactly for this reason, to avoid
tempting people to add numbers up without thinking and getting nonsense
results.

I suspect that any heuristic is as good as your divide-by-64; you
probably could run count_lines(one->data, one->size) in the text diff
codepath in builtin_diffstat() to keep a running average of the line
lengths of the files involved, but I do not think it is worth it.


* [PATCH 0/6] --dirstat fixes, part 2
  2011-04-12 14:46                       ` Linus Torvalds
  2011-04-12 15:08                         ` Linus Torvalds
@ 2011-04-26  0:01                         ` Johan Herland
  2011-04-26  0:01                           ` [PATCH 1/6] Add several testcases for --dirstat and friends Johan Herland
                                             ` (7 more replies)
  1 sibling, 8 replies; 91+ messages in thread
From: Johan Herland @ 2011-04-26  0:01 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Linus Torvalds, Johan Herland

Hi,

I finally found the time to re-roll the remaining dirstat fixes,
incorporating feedback from Linus and Junio in the surrounding thread.

The first patch adds a number of testcases for --dirstat, guarding
against regressions.

The second patch fixes a small issue I found while playing around with
--dirstat=0.

The next three patches revamp the dirstat-related command-line options
and introduce a diff.dirstat config variable for controlling the
--dirstat defaults. The third patch here (accepting floating-point
percentage input) has some remaining questions mentioned in that email.

Finally, the last patch is a re-roll of the previous "RFC/PATCH 5/3"
that introduces a new dirstat mode, based on the diffstat analysis.


Have fun! :)

...Johan


Johan Herland (6):
  Add several testcases for --dirstat and friends
  Make --dirstat=0 output directories that contribute < 0.1% of changes
  Refactor --dirstat parsing; deprecate --cumulative and --dirstat-by-file
  Add config variable for specifying default --dirstat behavior
  Use floating point for --dirstat percentages
  New --dirstat=lines mode, doing dirstat analysis based on diffstat

 Documentation/config.txt       |   43 ++
 Documentation/diff-options.txt |   52 ++-
 diff.c                         |  182 ++++++++-
 diff.h                         |    3 +-
 t/t4046-diff-dirstat.sh        |  873 ++++++++++++++++++++++++++++++++++++++++
 5 files changed, 1121 insertions(+), 32 deletions(-)
 create mode 100755 t/t4046-diff-dirstat.sh

-- 
1.7.5.rc1.3.g4d7b


* [PATCH 1/6] Add several testcases for --dirstat and friends
  2011-04-26  0:01                         ` [PATCH 0/6] --dirstat fixes, part 2 Johan Herland
@ 2011-04-26  0:01                           ` Johan Herland
  2011-04-26  0:01                           ` [PATCH 2/6] Make --dirstat=0 output directories that contribute < 0.1% of changes Johan Herland
                                             ` (6 subsequent siblings)
  7 siblings, 0 replies; 91+ messages in thread
From: Johan Herland @ 2011-04-26  0:01 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Linus Torvalds, Johan Herland

Currently, t4013 is the only selftest that exercises the --dirstat machinery,
but it only does a superficial verification of --dirstat's output.

This patch adds a new selftest - t4046-diff-dirstat.sh - which prepares a
commit containing:
- unchanged files, changed files and files with rearranged lines
- copied files, moved files, and unmoved files

It then verifies the correct dirstat output for that commit in the following
dirstat modes:
- --dirstat
- --dirstat=0
- --cumulative
- --dirstat-by-file
- (plus combinations of the above)

Each of the above tests is also run with:
- no rename detection
- rename detection (-M)
- expensive copy detection (-C -C)

Signed-off-by: Johan Herland <johan@herland.net>
---
 t/t4046-diff-dirstat.sh |  562 +++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 562 insertions(+), 0 deletions(-)
 create mode 100755 t/t4046-diff-dirstat.sh

diff --git a/t/t4046-diff-dirstat.sh b/t/t4046-diff-dirstat.sh
new file mode 100755
index 0000000..1690468
--- /dev/null
+++ b/t/t4046-diff-dirstat.sh
@@ -0,0 +1,562 @@
+#!/bin/sh
+
+test_description='diff --dirstat tests'
+. ./test-lib.sh
+
+# set up two commits where the second commit has these files
+# (10 lines in each file):
+#
+#   unchanged/text           (unchanged from 1st commit)
+#   changed/text             (changed 1st line)
+#   rearranged/text          (swapped 1st and 2nd line)
+#   dst/copy/unchanged/text  (copied from src/copy/unchanged/text, unchanged)
+#   dst/copy/changed/text    (copied from src/copy/changed/text, changed)
+#   dst/copy/rearranged/text (copied from src/copy/rearranged/text, rearranged)
+#   dst/move/unchanged/text  (moved from src/move/unchanged/text, unchanged)
+#   dst/move/changed/text    (moved from src/move/changed/text, changed)
+#   dst/move/rearranged/text (moved from src/move/rearranged/text, rearranged)
+
+test_expect_success 'setup' '
+	mkdir unchanged &&
+	mkdir changed &&
+	mkdir rearranged &&
+	mkdir src &&
+	mkdir src/copy &&
+	mkdir src/copy/unchanged &&
+	mkdir src/copy/changed &&
+	mkdir src/copy/rearranged &&
+	mkdir src/move &&
+	mkdir src/move/unchanged &&
+	mkdir src/move/changed &&
+	mkdir src/move/rearranged &&
+	cat <<EOF >unchanged/text &&
+unchanged       line #0
+unchanged       line #1
+unchanged       line #2
+unchanged       line #3
+unchanged       line #4
+unchanged       line #5
+unchanged       line #6
+unchanged       line #7
+unchanged       line #8
+unchanged       line #9
+EOF
+	cat <<EOF >changed/text &&
+changed         line #0
+changed         line #1
+changed         line #2
+changed         line #3
+changed         line #4
+changed         line #5
+changed         line #6
+changed         line #7
+changed         line #8
+changed         line #9
+EOF
+	cat <<EOF >rearranged/text &&
+rearranged      line #0
+rearranged      line #1
+rearranged      line #2
+rearranged      line #3
+rearranged      line #4
+rearranged      line #5
+rearranged      line #6
+rearranged      line #7
+rearranged      line #8
+rearranged      line #9
+EOF
+	cat <<EOF >src/copy/unchanged/text &&
+copy  unchanged line #0
+copy  unchanged line #1
+copy  unchanged line #2
+copy  unchanged line #3
+copy  unchanged line #4
+copy  unchanged line #5
+copy  unchanged line #6
+copy  unchanged line #7
+copy  unchanged line #8
+copy  unchanged line #9
+EOF
+	cat <<EOF >src/copy/changed/text &&
+copy    changed line #0
+copy    changed line #1
+copy    changed line #2
+copy    changed line #3
+copy    changed line #4
+copy    changed line #5
+copy    changed line #6
+copy    changed line #7
+copy    changed line #8
+copy    changed line #9
+EOF
+	cat <<EOF >src/copy/rearranged/text &&
+copy rearranged line #0
+copy rearranged line #1
+copy rearranged line #2
+copy rearranged line #3
+copy rearranged line #4
+copy rearranged line #5
+copy rearranged line #6
+copy rearranged line #7
+copy rearranged line #8
+copy rearranged line #9
+EOF
+	cat <<EOF >src/move/unchanged/text &&
+move  unchanged line #0
+move  unchanged line #1
+move  unchanged line #2
+move  unchanged line #3
+move  unchanged line #4
+move  unchanged line #5
+move  unchanged line #6
+move  unchanged line #7
+move  unchanged line #8
+move  unchanged line #9
+EOF
+	cat <<EOF >src/move/changed/text &&
+move    changed line #0
+move    changed line #1
+move    changed line #2
+move    changed line #3
+move    changed line #4
+move    changed line #5
+move    changed line #6
+move    changed line #7
+move    changed line #8
+move    changed line #9
+EOF
+	cat <<EOF >src/move/rearranged/text &&
+move rearranged line #0
+move rearranged line #1
+move rearranged line #2
+move rearranged line #3
+move rearranged line #4
+move rearranged line #5
+move rearranged line #6
+move rearranged line #7
+move rearranged line #8
+move rearranged line #9
+EOF
+	git add . &&
+	git commit -m "initial" &&
+	mkdir dst &&
+	mkdir dst/copy &&
+	mkdir dst/copy/unchanged &&
+	mkdir dst/copy/changed &&
+	mkdir dst/copy/rearranged &&
+	mkdir dst/move &&
+	mkdir dst/move/unchanged &&
+	mkdir dst/move/changed &&
+	mkdir dst/move/rearranged &&
+	cat <<EOF >changed/text &&
+CHANGED XXXXXXX line #0
+changed         line #1
+changed         line #2
+changed         line #3
+changed         line #4
+changed         line #5
+changed         line #6
+changed         line #7
+changed         line #8
+changed         line #9
+EOF
+	cat <<EOF >rearranged/text &&
+rearranged      line #1
+rearranged      line #0
+rearranged      line #2
+rearranged      line #3
+rearranged      line #4
+rearranged      line #5
+rearranged      line #6
+rearranged      line #7
+rearranged      line #8
+rearranged      line #9
+EOF
+	cat <<EOF >dst/copy/unchanged/text &&
+copy  unchanged line #0
+copy  unchanged line #1
+copy  unchanged line #2
+copy  unchanged line #3
+copy  unchanged line #4
+copy  unchanged line #5
+copy  unchanged line #6
+copy  unchanged line #7
+copy  unchanged line #8
+copy  unchanged line #9
+EOF
+	cat <<EOF >dst/copy/changed/text &&
+copy XXXCHANGED line #0
+copy    changed line #1
+copy    changed line #2
+copy    changed line #3
+copy    changed line #4
+copy    changed line #5
+copy    changed line #6
+copy    changed line #7
+copy    changed line #8
+copy    changed line #9
+EOF
+	cat <<EOF >dst/copy/rearranged/text &&
+copy rearranged line #1
+copy rearranged line #0
+copy rearranged line #2
+copy rearranged line #3
+copy rearranged line #4
+copy rearranged line #5
+copy rearranged line #6
+copy rearranged line #7
+copy rearranged line #8
+copy rearranged line #9
+EOF
+	cat <<EOF >dst/move/unchanged/text &&
+move  unchanged line #0
+move  unchanged line #1
+move  unchanged line #2
+move  unchanged line #3
+move  unchanged line #4
+move  unchanged line #5
+move  unchanged line #6
+move  unchanged line #7
+move  unchanged line #8
+move  unchanged line #9
+EOF
+	cat <<EOF >dst/move/changed/text &&
+move XXXCHANGED line #0
+move    changed line #1
+move    changed line #2
+move    changed line #3
+move    changed line #4
+move    changed line #5
+move    changed line #6
+move    changed line #7
+move    changed line #8
+move    changed line #9
+EOF
+	cat <<EOF >dst/move/rearranged/text &&
+move rearranged line #1
+move rearranged line #0
+move rearranged line #2
+move rearranged line #3
+move rearranged line #4
+move rearranged line #5
+move rearranged line #6
+move rearranged line #7
+move rearranged line #8
+move rearranged line #9
+EOF
+	git add . &&
+	git rm -r src/move/unchanged &&
+	git rm -r src/move/changed &&
+	git rm -r src/move/rearranged &&
+	git commit -m "changes"
+'
+
+cat <<EOF >expect_diff_stat
+ changed/text             |    2 +-
+ dst/copy/changed/text    |   10 ++++++++++
+ dst/copy/rearranged/text |   10 ++++++++++
+ dst/copy/unchanged/text  |   10 ++++++++++
+ dst/move/changed/text    |   10 ++++++++++
+ dst/move/rearranged/text |   10 ++++++++++
+ dst/move/unchanged/text  |   10 ++++++++++
+ rearranged/text          |    2 +-
+ src/move/changed/text    |   10 ----------
+ src/move/rearranged/text |   10 ----------
+ src/move/unchanged/text  |   10 ----------
+ 11 files changed, 62 insertions(+), 32 deletions(-)
+EOF
+
+cat <<EOF >expect_diff_stat_M
+ changed/text                      |    2 +-
+ dst/copy/changed/text             |   10 ++++++++++
+ dst/copy/rearranged/text          |   10 ++++++++++
+ dst/copy/unchanged/text           |   10 ++++++++++
+ {src => dst}/move/changed/text    |    2 +-
+ {src => dst}/move/rearranged/text |    2 +-
+ {src => dst}/move/unchanged/text  |    0
+ rearranged/text                   |    2 +-
+ 8 files changed, 34 insertions(+), 4 deletions(-)
+EOF
+
+cat <<EOF >expect_diff_stat_CC
+ changed/text                      |    2 +-
+ {src => dst}/copy/changed/text    |    2 +-
+ {src => dst}/copy/rearranged/text |    2 +-
+ {src => dst}/copy/unchanged/text  |    0
+ {src => dst}/move/changed/text    |    2 +-
+ {src => dst}/move/rearranged/text |    2 +-
+ {src => dst}/move/unchanged/text  |    0
+ rearranged/text                   |    2 +-
+ 8 files changed, 6 insertions(+), 6 deletions(-)
+EOF
+
+test_expect_success 'sanity check setup (--stat)' '
+	git diff --stat HEAD^..HEAD >actual_diff_stat &&
+	test_cmp expect_diff_stat actual_diff_stat &&
+	git diff --stat -M HEAD^..HEAD >actual_diff_stat_M &&
+	test_cmp expect_diff_stat_M actual_diff_stat_M &&
+	git diff --stat -C -C HEAD^..HEAD >actual_diff_stat_CC &&
+	test_cmp expect_diff_stat_CC actual_diff_stat_CC
+'
+
+# changed/text and rearranged/text fall below default 3% threshold
+cat <<EOF >expect_diff_dirstat
+  10.8% dst/copy/changed/
+  10.8% dst/copy/rearranged/
+  10.8% dst/copy/unchanged/
+  10.8% dst/move/changed/
+  10.8% dst/move/rearranged/
+  10.8% dst/move/unchanged/
+  10.8% src/move/changed/
+  10.8% src/move/rearranged/
+  10.8% src/move/unchanged/
+EOF
+
+# rearranged/text falls below default 3% threshold
+cat <<EOF >expect_diff_dirstat_M
+   5.8% changed/
+  29.3% dst/copy/changed/
+  29.3% dst/copy/rearranged/
+  29.3% dst/copy/unchanged/
+   5.8% dst/move/changed/
+EOF
+
+# rearranged/text falls below default 3% threshold
+cat <<EOF >expect_diff_dirstat_CC
+  32.6% changed/
+  32.6% dst/copy/changed/
+  32.6% dst/move/changed/
+EOF
+
+test_expect_success 'vanilla --dirstat' '
+	git diff --dirstat HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff --dirstat -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff --dirstat -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
+# rearranged/text falls below 0% threshold (1 / (240 * 9 + 48 + 1) ~= 0.045 %)
+cat <<EOF >expect_diff_dirstat
+   2.1% changed/
+  10.8% dst/copy/changed/
+  10.8% dst/copy/rearranged/
+  10.8% dst/copy/unchanged/
+  10.8% dst/move/changed/
+  10.8% dst/move/rearranged/
+  10.8% dst/move/unchanged/
+  10.8% src/move/changed/
+  10.8% src/move/rearranged/
+  10.8% src/move/unchanged/
+EOF
+
+cat <<EOF >expect_diff_dirstat_M
+   5.8% changed/
+  29.3% dst/copy/changed/
+  29.3% dst/copy/rearranged/
+  29.3% dst/copy/unchanged/
+   5.8% dst/move/changed/
+   0.1% dst/move/rearranged/
+   0.1% rearranged/
+EOF
+
+cat <<EOF >expect_diff_dirstat_CC
+  32.6% changed/
+  32.6% dst/copy/changed/
+   0.6% dst/copy/rearranged/
+  32.6% dst/move/changed/
+   0.6% dst/move/rearranged/
+   0.6% rearranged/
+EOF
+
+test_expect_success '--dirstat=0' '
+	git diff --dirstat=0 HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff --dirstat=0 -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff --dirstat=0 -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
+# rearranged/text falls below 0% threshold (1 / (240 * 9 + 48 + 1) ~= 0.045 %)
+cat <<EOF >expect_diff_dirstat
+   2.1% changed/
+  10.8% dst/copy/changed/
+  10.8% dst/copy/rearranged/
+  10.8% dst/copy/unchanged/
+  32.5% dst/copy/
+  10.8% dst/move/changed/
+  10.8% dst/move/rearranged/
+  10.8% dst/move/unchanged/
+  32.5% dst/move/
+  65.1% dst/
+  10.8% src/move/changed/
+  10.8% src/move/rearranged/
+  10.8% src/move/unchanged/
+  32.5% src/move/
+EOF
+
+cat <<EOF >expect_diff_dirstat_M
+   5.8% changed/
+  29.3% dst/copy/changed/
+  29.3% dst/copy/rearranged/
+  29.3% dst/copy/unchanged/
+  88.0% dst/copy/
+   5.8% dst/move/changed/
+   0.1% dst/move/rearranged/
+   5.9% dst/move/
+  94.0% dst/
+   0.1% rearranged/
+EOF
+
+cat <<EOF >expect_diff_dirstat_CC
+  32.6% changed/
+  32.6% dst/copy/changed/
+   0.6% dst/copy/rearranged/
+  33.3% dst/copy/
+  32.6% dst/move/changed/
+   0.6% dst/move/rearranged/
+  33.3% dst/move/
+  66.6% dst/
+   0.6% rearranged/
+EOF
+
+test_expect_success '--dirstat=0 --cumulative' '
+	git diff --dirstat=0 --cumulative HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff --dirstat=0 --cumulative -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff --dirstat=0 --cumulative -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
+cat <<EOF >expect_diff_dirstat
+   9.0% changed/
+   9.0% dst/copy/changed/
+   9.0% dst/copy/rearranged/
+   9.0% dst/copy/unchanged/
+   9.0% dst/move/changed/
+   9.0% dst/move/rearranged/
+   9.0% dst/move/unchanged/
+   9.0% rearranged/
+   9.0% src/move/changed/
+   9.0% src/move/rearranged/
+   9.0% src/move/unchanged/
+EOF
+
+cat <<EOF >expect_diff_dirstat_M
+  14.2% changed/
+  14.2% dst/copy/changed/
+  14.2% dst/copy/rearranged/
+  14.2% dst/copy/unchanged/
+  14.2% dst/move/changed/
+  14.2% dst/move/rearranged/
+  14.2% rearranged/
+EOF
+
+cat <<EOF >expect_diff_dirstat_CC
+  16.6% changed/
+  16.6% dst/copy/changed/
+  16.6% dst/copy/rearranged/
+  16.6% dst/move/changed/
+  16.6% dst/move/rearranged/
+  16.6% rearranged/
+EOF
+
+test_expect_success '--dirstat-by-file' '
+	git diff --dirstat-by-file HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff --dirstat-by-file -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff --dirstat-by-file -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
+cat <<EOF >expect_diff_dirstat
+  27.2% dst/copy/
+  27.2% dst/move/
+  27.2% src/move/
+EOF
+
+cat <<EOF >expect_diff_dirstat_M
+  14.2% changed/
+  14.2% dst/copy/changed/
+  14.2% dst/copy/rearranged/
+  14.2% dst/copy/unchanged/
+  14.2% dst/move/changed/
+  14.2% dst/move/rearranged/
+  14.2% rearranged/
+EOF
+
+cat <<EOF >expect_diff_dirstat_CC
+  16.6% changed/
+  16.6% dst/copy/changed/
+  16.6% dst/copy/rearranged/
+  16.6% dst/move/changed/
+  16.6% dst/move/rearranged/
+  16.6% rearranged/
+EOF
+
+test_expect_success '--dirstat-by-file=10' '
+	git diff --dirstat-by-file=10 HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff --dirstat-by-file=10 -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff --dirstat-by-file=10 -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
+cat <<EOF >expect_diff_dirstat
+   9.0% changed/
+   9.0% dst/copy/changed/
+   9.0% dst/copy/rearranged/
+   9.0% dst/copy/unchanged/
+  27.2% dst/copy/
+   9.0% dst/move/changed/
+   9.0% dst/move/rearranged/
+   9.0% dst/move/unchanged/
+  27.2% dst/move/
+  54.5% dst/
+   9.0% rearranged/
+   9.0% src/move/changed/
+   9.0% src/move/rearranged/
+   9.0% src/move/unchanged/
+  27.2% src/move/
+EOF
+
+cat <<EOF >expect_diff_dirstat_M
+  14.2% changed/
+  14.2% dst/copy/changed/
+  14.2% dst/copy/rearranged/
+  14.2% dst/copy/unchanged/
+  42.8% dst/copy/
+  14.2% dst/move/changed/
+  14.2% dst/move/rearranged/
+  28.5% dst/move/
+  71.4% dst/
+  14.2% rearranged/
+EOF
+
+cat <<EOF >expect_diff_dirstat_CC
+  16.6% changed/
+  16.6% dst/copy/changed/
+  16.6% dst/copy/rearranged/
+  33.3% dst/copy/
+  16.6% dst/move/changed/
+  16.6% dst/move/rearranged/
+  33.3% dst/move/
+  66.6% dst/
+  16.6% rearranged/
+EOF
+
+test_expect_success '--dirstat-by-file --cumulative' '
+	git diff --dirstat-by-file --cumulative HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff --dirstat-by-file --cumulative -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff --dirstat-by-file --cumulative -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
+test_done
-- 
1.7.5.rc1.3.g4d7b


* [PATCH 2/6] Make --dirstat=0 output directories that contribute < 0.1% of changes
  2011-04-26  0:01                         ` [PATCH 0/6] --dirstat fixes, part 2 Johan Herland
  2011-04-26  0:01                           ` [PATCH 1/6] Add several testcases for --dirstat and friends Johan Herland
@ 2011-04-26  0:01                           ` Johan Herland
  2011-04-26  0:01                           ` [PATCH 3/6] Refactor --dirstat parsing; deprecate --cumulative and --dirstat-by-file Johan Herland
                                             ` (5 subsequent siblings)
  7 siblings, 0 replies; 91+ messages in thread
From: Johan Herland @ 2011-04-26  0:01 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Linus Torvalds, Johan Herland

The expected output from --dirstat=0 is to include any directory with
changes, even if those changes contribute a minuscule portion of the total
changes. However, currently, directories that contribute less than 0.1% are
not included, since their 'permille' value is 0, and there is an
'if (permille)' check in gather_dirstat() that causes them to be ignored.

This check is evidently intended to exclude directories that contribute no
changes whatsoever, but here it is too broad. The correct check is against
'this_dir', from which the permille is calculated. Only if that value is 0
does the directory truly contribute no changes, and only then should it be
omitted from the output.

This patch fixes the issue and updates the corresponding testcases to
expect the new behavior.

Signed-off-by: Johan Herland <johan@herland.net>
---
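For illustration only (not part of the patch): a minimal sketch of why the
old check drops small directories. The helper names are hypothetical; the
figures 1 and 2209 (= 240 * 9 + 48 + 1) come from the comment this patch
removes from t4046.

```c
/* Integer permille, truncated the same way as in gather_dirstat(). */
static int permille_of(unsigned long this_dir, unsigned long changed)
{
	return (int)(this_dir * 1000 / changed);
}

/* Old check: the directory is shown only if the truncated permille is non-zero. */
static int shown_by_old_check(unsigned long this_dir, unsigned long changed)
{
	return permille_of(this_dir, changed) != 0;
}

/* New check: the directory is shown whenever it has any changes at all. */
static int shown_by_new_check(unsigned long this_dir)
{
	return this_dir != 0;
}
```

With this_dir = 1 and changed = 2209, the permille truncates to 0, so the
old check hides the directory while the new check still shows it (as 0.0%).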
 diff.c                  |    4 ++--
 t/t4046-diff-dirstat.sh |    4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/diff.c b/diff.c
index abd9cd5..cfbfa92 100644
--- a/diff.c
+++ b/diff.c
@@ -1500,8 +1500,8 @@ static long gather_dirstat(struct diff_options *opt, struct dirstat_dir *dir,
 	 *    under this directory (sources == 1).
 	 */
 	if (baselen && sources != 1) {
-		int permille = this_dir * 1000 / changed;
-		if (permille) {
+		if (this_dir) {
+			int permille = this_dir * 1000 / changed;
 			int percent = permille / 10;
 			if (percent >= dir->percent) {
 				fprintf(opt->file, "%s%4d.%01d%% %.*s\n", line_prefix,
diff --git a/t/t4046-diff-dirstat.sh b/t/t4046-diff-dirstat.sh
index 1690468..694a950 100755
--- a/t/t4046-diff-dirstat.sh
+++ b/t/t4046-diff-dirstat.sh
@@ -337,7 +337,6 @@ test_expect_success 'vanilla --dirstat' '
 	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
 '
 
-# rearranged/text falls below 0% threshold (1 / (240 * 9 + 48 + 1) ~= 0.045 %)
 cat <<EOF >expect_diff_dirstat
    2.1% changed/
   10.8% dst/copy/changed/
@@ -346,6 +345,7 @@ cat <<EOF >expect_diff_dirstat
   10.8% dst/move/changed/
   10.8% dst/move/rearranged/
   10.8% dst/move/unchanged/
+   0.0% rearranged/
   10.8% src/move/changed/
   10.8% src/move/rearranged/
   10.8% src/move/unchanged/
@@ -379,7 +379,6 @@ test_expect_success '--dirstat=0' '
 	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
 '
 
-# rearranged/text falls below 0% threshold (1 / (240 * 9 + 48 + 1) ~= 0.045 %)
 cat <<EOF >expect_diff_dirstat
    2.1% changed/
   10.8% dst/copy/changed/
@@ -391,6 +390,7 @@ cat <<EOF >expect_diff_dirstat
   10.8% dst/move/unchanged/
   32.5% dst/move/
   65.1% dst/
+   0.0% rearranged/
   10.8% src/move/changed/
   10.8% src/move/rearranged/
   10.8% src/move/unchanged/
-- 
1.7.5.rc1.3.g4d7b


* [PATCH 3/6] Refactor --dirstat parsing; deprecate --cumulative and --dirstat-by-file
  2011-04-26  0:01                         ` [PATCH 0/6] --dirstat fixes, part 2 Johan Herland
  2011-04-26  0:01                           ` [PATCH 1/6] Add several testcases for --dirstat and friends Johan Herland
  2011-04-26  0:01                           ` [PATCH 2/6] Make --dirstat=0 output directories that contribute < 0.1% of changes Johan Herland
@ 2011-04-26  0:01                           ` Johan Herland
  2011-04-26 16:36                             ` Junio C Hamano
  2011-04-26  0:01                           ` [PATCH 4/6] Add config variable for specifying default --dirstat behavior Johan Herland
                                             ` (4 subsequent siblings)
  7 siblings, 1 reply; 91+ messages in thread
From: Johan Herland @ 2011-04-26  0:01 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Linus Torvalds, Johan Herland

Instead of having multiple interconnected dirstat-related options, teach
the --dirstat option itself to accept all behavior modifiers as arguments.

- Preserve the current --dirstat=<limit> (where <limit> is an integer
  specifying a cut-off percentage)
- Add --dirstat=cumulative, replacing --cumulative
- Add --dirstat=files, replacing --dirstat-by-file
- Also add --dirstat=changes and --dirstat=noncumulative for specifying the
  current default behavior. These allow the user to reset other --dirstat
  arguments (e.g. 'cumulative' and 'files') occurring earlier on the command
  line.

Allow multiple arguments to be separated by commas, e.g.:
  --dirstat=files,10,cumulative

Update the documentation accordingly, and add testcases verifying the
behavior of the new syntax.

Signed-off-by: Johan Herland <johan@herland.net>
---
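For illustration only (not part of the patch): a standalone sketch of how a
comma-separated argument list such as "files,10,cumulative" can be parsed so
that later arguments override earlier ones. The struct and function names
are hypothetical stand-ins for the relevant parts of struct diff_options,
and the sketch returns -1 where the real code calls die(). It checks for the
separator after each argument, which is assumed to be equivalent to the
patch's loop for well-formed input.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the dirstat-related state in struct diff_options. */
struct dirstat_settings {
	int by_file;           /* 1 for 'files', 0 for 'changes' */
	int cumulative;        /* 1 for 'cumulative', 0 for 'noncumulative' */
	unsigned long percent; /* cut-off percentage */
};

/* If *p starts with 'word', advance *p past it and return 1; else return 0. */
static int skip_word(const char **p, const char *word)
{
	size_t len = strlen(word);
	if (strncmp(*p, word, len))
		return 0;
	*p += len;
	return 1;
}

/* Apply each argument in turn; return 0 on success, -1 on a malformed list. */
static int parse_dirstat_args(struct dirstat_settings *s, const char *args)
{
	const char *p = args;
	while (*p) {
		if (skip_word(&p, "changes"))
			s->by_file = 0;
		else if (skip_word(&p, "files"))
			s->by_file = 1;
		else if (skip_word(&p, "noncumulative"))
			s->cumulative = 0;
		else if (skip_word(&p, "cumulative"))
			s->cumulative = 1;
		else if (isdigit((unsigned char)*p)) {
			char *end;
			s->percent = strtoul(p, &end, 10);
			p = end;
		} else
			return -1; /* unknown argument */

		if (*p) { /* more input: it must be a comma separator */
			if (*p != ',')
				return -1;
			p++;
		}
	}
	return 0;
}
```

Starting from the defaults (changes, noncumulative, 3), parsing
"files,10,cumulative" and then "changes,noncumulative,3" returns the
settings to their defaults, which is what the 'later options override
earlier options' testcase relies on.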
 Documentation/diff-options.txt |   45 +++++++++++++-----
 diff.c                         |   80 +++++++++++++++++++++++++++----
 t/t4046-diff-dirstat.sh        |  102 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 205 insertions(+), 22 deletions(-)

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 7e4bd42..b6b1448 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -66,19 +66,40 @@ endif::git-format-patch[]
 	number of modified files, as well as number of added and deleted
 	lines.
 
---dirstat[=<limit>]::
-	Output the distribution of relative amount of changes (number of lines added or
-	removed) for each sub-directory. Directories with changes below
-	a cut-off percent (3% by default) are not shown. The cut-off percent
-	can be set with `--dirstat=<limit>`. Changes in a child directory are not
-	counted for the parent directory, unless `--cumulative` is used.
+--dirstat[=<arg1,arg2,...>]::
+	Output the distribution of relative amount of changes for each
+	sub-directory. The behavior of `--dirstat` can be customized by
+	passing it a comma separated list of arguments. The defaults
+	are controlled by the `diff.dirstat` configuration variable (see
+	linkgit:git-config[1]). The following arguments are available:
 +
-Note that the `--dirstat` option computes the changes while ignoring
-the amount of pure code movements within a file.  In other words,
-rearranging lines in a file is not counted as much as other changes.
-
---dirstat-by-file[=<limit>]::
-	Same as `--dirstat`, but counts changed files instead of lines.
+--
+`changes`;;
+	Compute the dirstat numbers by counting the lines that have been
+	removed from the source, or added to the destination. This ignores
+	the amount of pure code movements within a file.  In other words,
+	rearranging lines in a file is not counted as much as other changes.
+	This is the default `--dirstat` behavior.
+`files`;;
+	Compute the dirstat numbers by counting the number of files changed.
+	Each changed file counts equally in the dirstat analysis. This is
+	the computationally cheapest `--dirstat` behavior, since it does
+	not look at the file contents at all.
+`cumulative`;;
+	Count changes in a child directory for the parent directory as well.
+	Note that when using `cumulative`, the sum of the percentages
+	reported may exceed 100%. The default (non-cumulative) behavior can
+	be specified with the `noncumulative` argument.
+<limit>;;
+	An integer argument specifies a cut-off percent (3% by default).
+	Directories contributing less than this percentage of the changes
+	are not shown in the output.
+--
++
+Example: The following will count changed files, while ignoring
+directories with less than 10% of the total amount of changed files,
+and accumulating child directory counts in the parent directories:
+`--dirstat=files,10,cumulative`.
 
 --summary::
 	Output a condensed summary of extended header information
diff --git a/diff.c b/diff.c
index cfbfa92..08aaa47 100644
--- a/diff.c
+++ b/diff.c
@@ -3144,6 +3144,72 @@ static int stat_opt(struct diff_options *options, const char **av)
 	return argcount;
 }
 
+static int dirstat_opt(struct diff_options *options, const char **av)
+{
+	const char *p, *arg = av[0];
+	char *mangled = NULL;
+	char sep = '=';
+
+	if (!strcmp(arg, "--cumulative")) /* deprecated */
+		/* handle '--cumulative' like '--dirstat=cumulative' */
+		p = "=cumulative";
+	else if (!strcmp(arg, "--dirstat-by-file") ||
+		 !prefixcmp(arg, "--dirstat-by-file=")) { /* deprecated */
+		/* handle '--dirstat-by-file=*' like '--dirstat=files,*' */
+		mangled = xstrdup(arg + 2);
+		memcpy(mangled, "--dirstat=files", 15);
+		if (mangled[15]) {
+			assert(mangled[15] == '=');
+			mangled[15] = ',';
+		}
+		arg = mangled;
+		p = mangled + 9;
+	}
+	else if (!prefixcmp(arg, "-X"))
+		p = arg + 2;
+	else if (!prefixcmp(arg, "--dirstat"))
+		p = arg + 9;
+	else
+		return 0;
+
+	options->output_format |= DIFF_FORMAT_DIRSTAT;
+
+	while (*p) {
+		if (*p != sep)
+			die("Missing argument separator ('%c'), at index %lu of '%s'",
+			    sep, p - arg, arg);
+		sep = ',';
+		++p;
+		if (!prefixcmp(p, "changes")) {
+			p += 7;
+			DIFF_OPT_CLR(options, DIRSTAT_BY_FILE);
+		}
+		else if (!prefixcmp(p, "files")) {
+			p += 5;
+			DIFF_OPT_SET(options, DIRSTAT_BY_FILE);
+		}
+		else if (!prefixcmp(p, "noncumulative")) {
+			p += 13;
+			DIFF_OPT_CLR(options, DIRSTAT_CUMULATIVE);
+		}
+		else if (!prefixcmp(p, "cumulative")) {
+			p += 10;
+			DIFF_OPT_SET(options, DIRSTAT_CUMULATIVE);
+		}
+		else if (isdigit(*p)) {
+			char *end;
+			options->dirstat_percent = strtoul(p, &end, 10);
+			assert(end > p);
+			p = end;
+		}
+		else
+			die("Unknown --dirstat argument '%s'", p);
+	}
+
+	free(mangled);
+	return 1;
+}
+
 int diff_opt_parse(struct diff_options *options, const char **av, int ac)
 {
 	const char *arg = av[0];
@@ -3163,16 +3229,10 @@ int diff_opt_parse(struct diff_options *options, const char **av, int ac)
 		options->output_format |= DIFF_FORMAT_NUMSTAT;
 	else if (!strcmp(arg, "--shortstat"))
 		options->output_format |= DIFF_FORMAT_SHORTSTAT;
-	else if (opt_arg(arg, 'X', "dirstat", &options->dirstat_percent))
-		options->output_format |= DIFF_FORMAT_DIRSTAT;
-	else if (!strcmp(arg, "--cumulative")) {
-		options->output_format |= DIFF_FORMAT_DIRSTAT;
-		DIFF_OPT_SET(options, DIRSTAT_CUMULATIVE);
-	} else if (opt_arg(arg, 0, "dirstat-by-file",
-			   &options->dirstat_percent)) {
-		options->output_format |= DIFF_FORMAT_DIRSTAT;
-		DIFF_OPT_SET(options, DIRSTAT_BY_FILE);
-	}
+	else if (!prefixcmp(arg, "-X") || !prefixcmp(arg, "--dirstat") ||
+		 !strcmp(arg, "--cumulative"))
+		/* -X, --dirstat[=<args>], --dirstat-by-file, or --cumulative */
+		return dirstat_opt(options, av);
 	else if (!strcmp(arg, "--check"))
 		options->output_format |= DIFF_FORMAT_CHECKDIFF;
 	else if (!strcmp(arg, "--summary"))
diff --git a/t/t4046-diff-dirstat.sh b/t/t4046-diff-dirstat.sh
index 694a950..bd1494c 100755
--- a/t/t4046-diff-dirstat.sh
+++ b/t/t4046-diff-dirstat.sh
@@ -337,6 +337,31 @@ test_expect_success 'vanilla --dirstat' '
 	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
 '
 
+test_expect_success 'explicit defaults: --dirstat=changes,noncumulative,3' '
+	git diff --dirstat=changes,noncumulative,3 HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff --dirstat=changes,noncumulative,3 -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff --dirstat=changes,noncumulative,3 -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
+
+test_expect_success 'later options override earlier options:' '
+	git diff --dirstat=files,10,cumulative,changes,noncumulative,3 HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff --dirstat=files,10,cumulative,changes,noncumulative,3 -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff --dirstat=files,10,cumulative,changes,noncumulative,3 -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC &&
+	git diff --dirstat=files --dirstat=10 --dirstat=cumulative --dirstat=changes --dirstat=noncumulative --dirstat=3 HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff --dirstat=files --dirstat=10 --dirstat=cumulative --dirstat=changes --dirstat=noncumulative --dirstat=3 -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff --dirstat=files --dirstat=10 --dirstat=cumulative --dirstat=changes --dirstat=noncumulative --dirstat=3 -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
 cat <<EOF >expect_diff_dirstat
    2.1% changed/
   10.8% dst/copy/changed/
@@ -431,6 +456,15 @@ test_expect_success '--dirstat=0 --cumulative' '
 	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
 '
 
+test_expect_success '--dirstat=0,cumulative' '
+	git diff --dirstat=0,cumulative HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff --dirstat=0,cumulative -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff --dirstat=0,cumulative -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
 cat <<EOF >expect_diff_dirstat
    9.0% changed/
    9.0% dst/copy/changed/
@@ -473,6 +507,15 @@ test_expect_success '--dirstat-by-file' '
 	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
 '
 
+test_expect_success '--dirstat=files' '
+	git diff --dirstat=files HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff --dirstat=files -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff --dirstat=files -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
 cat <<EOF >expect_diff_dirstat
   27.2% dst/copy/
   27.2% dst/move/
@@ -507,6 +550,15 @@ test_expect_success '--dirstat-by-file=10' '
 	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
 '
 
+test_expect_success '--dirstat=files,10' '
+	git diff --dirstat=files,10 HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff --dirstat=files,10 -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff --dirstat=files,10 -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
 cat <<EOF >expect_diff_dirstat
    9.0% changed/
    9.0% dst/copy/changed/
@@ -559,4 +611,54 @@ test_expect_success '--dirstat-by-file --cumulative' '
 	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
 '
 
+test_expect_success '--dirstat=files,cumulative' '
+	git diff --dirstat=files,cumulative HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff --dirstat=files,cumulative -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff --dirstat=files,cumulative -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
+cat <<EOF >expect_diff_dirstat
+  27.2% dst/copy/
+  27.2% dst/move/
+  54.5% dst/
+  27.2% src/move/
+EOF
+
+cat <<EOF >expect_diff_dirstat_M
+  14.2% changed/
+  14.2% dst/copy/changed/
+  14.2% dst/copy/rearranged/
+  14.2% dst/copy/unchanged/
+  42.8% dst/copy/
+  14.2% dst/move/changed/
+  14.2% dst/move/rearranged/
+  28.5% dst/move/
+  71.4% dst/
+  14.2% rearranged/
+EOF
+
+cat <<EOF >expect_diff_dirstat_CC
+  16.6% changed/
+  16.6% dst/copy/changed/
+  16.6% dst/copy/rearranged/
+  33.3% dst/copy/
+  16.6% dst/move/changed/
+  16.6% dst/move/rearranged/
+  33.3% dst/move/
+  66.6% dst/
+  16.6% rearranged/
+EOF
+
+test_expect_success '--dirstat=files,cumulative,10' '
+	git diff --dirstat=files,cumulative,10 HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff --dirstat=files,cumulative,10 -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff --dirstat=files,cumulative,10 -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
 test_done
-- 
1.7.5.rc1.3.g4d7b


* [PATCH 4/6] Add config variable for specifying default --dirstat behavior
  2011-04-26  0:01                         ` [PATCH 0/6] --dirstat fixes, part 2 Johan Herland
                                             ` (2 preceding siblings ...)
  2011-04-26  0:01                           ` [PATCH 3/6] Refactor --dirstat parsing; deprecate --cumulative and --dirstat-by-file Johan Herland
@ 2011-04-26  0:01                           ` Johan Herland
  2011-04-26 16:43                             ` Junio C Hamano
  2011-04-26  0:01                           ` [PATCH 5/6] Use floating point for --dirstat percentages Johan Herland
                                             ` (3 subsequent siblings)
  7 siblings, 1 reply; 91+ messages in thread
From: Johan Herland @ 2011-04-26  0:01 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Linus Torvalds, Johan Herland

The new diff.dirstat config variable takes the same arguments as
'--dirstat=<args>', and specifies the default arguments for --dirstat.
The config is obviously overridden by --dirstat arguments passed on the
command line.

When not specified, the --dirstat defaults are 'changes,noncumulative,3'.

The config variable is parsed by the new function dirstat_opt_args(),
which has been refactored out of dirstat_opt().

The patch also adds several tests verifying the interaction between the
diff.dirstat config variable, and the --dirstat command line option.

Signed-off-by: Johan Herland <johan@herland.net>
---
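For illustration only (not part of the patch): a minimal sketch of the
layering this patch sets up, with fallback defaults first, then config, then
the command line. All names are hypothetical simplifications of
default_diff_options, init_default_diff_options() and diff_setup(). The
one-time initialization is what keeps a later setup call from resetting a
value that the config callback has already stored.

```c
#include <string.h>

/* Hypothetical, reduced stand-in for struct diff_options. */
struct opts {
	int dirstat_percent;
};

static struct opts default_opts;

/* Set the fallback default exactly once, however often this is called. */
static void init_default_opts(void)
{
	static int initialized;
	if (initialized)
		return;
	default_opts.dirstat_percent = 3;
	initialized = 1;
}

/* Config callback: initialize first, then let the config value win. */
static void config_set_percent(int percent)
{
	init_default_opts();
	default_opts.dirstat_percent = percent;
}

/* Per-command setup: start from the (possibly config-adjusted) defaults. */
static void setup_opts(struct opts *o)
{
	init_default_opts();
	memcpy(o, &default_opts, sizeof(*o));
}
```

A command-line argument then modifies only the per-command copy, so it
overrides the config without changing the stored defaults.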
 Documentation/config.txt |   36 +++++++++++++++
 diff.c                   |  110 ++++++++++++++++++++++++++++-----------------
 t/t4046-diff-dirstat.sh  |   72 ++++++++++++++++++++++++++++++
 3 files changed, 176 insertions(+), 42 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 6babbc7..10fa89a 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -822,6 +822,42 @@ diff.autorefreshindex::
 	affects only 'git diff' Porcelain, and not lower level
 	'diff' commands such as 'git diff-files'.
 
+diff.dirstat::
+	A comma separated list of `--dirstat` arguments specifying the
+	default behavior of the `--dirstat` option to linkgit:git-diff[1]
+	and friends. The defaults can be overridden on the command line
+	(using `--dirstat=<arg1,arg2,...>`). The fallback defaults (when
+	not changed by `diff.dirstat`) are `changes,noncumulative,3`.
+	The following arguments are available:
++
+--
+`changes`;;
+	Compute the dirstat numbers by counting the lines that have been
+	removed from the source, or added to the destination. This ignores
+	the amount of pure code movements within a file.  In other words,
+	rearranging lines in a file is not counted as much as other changes.
+	This is the default `--dirstat` behavior.
+`files`;;
+	Compute the dirstat numbers by counting the number of files changed.
+	Each changed file counts equally in the dirstat analysis. This is
+	the computationally cheapest `--dirstat` behavior, since it does
+	not look at the file contents at all.
+`cumulative`;;
+	Count changes in a child directory for the parent directory as well.
+	Note that when using `cumulative`, the sum of the percentages
+	reported may exceed 100%. The default (non-cumulative) behavior can
+	be specified with the `noncumulative` argument.
+<limit>;;
+	An integer argument specifies a cut-off percent (3% by default).
+	Directories contributing less than this percentage of the changes
+	are not shown in the output.
+--
++
+Example: The following will count changed files, while ignoring
+directories with less than 10% of the total amount of changed files,
+and accumulating child directory counts in the parent directories:
+`files,10,cumulative`.
+
 diff.external::
 	If this config variable is set, diff generation is not
 	performed using the internal diff machinery, but using the
diff --git a/diff.c b/diff.c
index 08aaa47..20fe02c 100644
--- a/diff.c
+++ b/diff.c
@@ -45,6 +45,17 @@ static char diff_colors[][COLOR_MAXLEN] = {
 	GIT_COLOR_NORMAL,	/* FUNCINFO */
 };
 
+static void init_default_diff_options()
+{
+	static int initialized = 0;
+	if (initialized)
+		return;
+
+	default_diff_options.dirstat_percent = 3;
+
+	initialized = 1;
+}
+
 static int parse_diff_color_slot(const char *var, int ofs)
 {
 	if (!strcasecmp(var+ofs, "plain"))
@@ -114,6 +125,44 @@ int git_diff_ui_config(const char *var, const char *value, void *cb)
 	return git_diff_basic_config(var, value, cb);
 }
 
+static void dirstat_opt_args(struct diff_options *options, const char *args)
+{
+	const char *p = args;
+	while (*p) {
+		if (!prefixcmp(p, "changes")) {
+			p += 7;
+			DIFF_OPT_CLR(options, DIRSTAT_BY_FILE);
+		}
+		else if (!prefixcmp(p, "files")) {
+			p += 5;
+			DIFF_OPT_SET(options, DIRSTAT_BY_FILE);
+		}
+		else if (!prefixcmp(p, "noncumulative")) {
+			p += 13;
+			DIFF_OPT_CLR(options, DIRSTAT_CUMULATIVE);
+		}
+		else if (!prefixcmp(p, "cumulative")) {
+			p += 10;
+			DIFF_OPT_SET(options, DIRSTAT_CUMULATIVE);
+		}
+		else if (isdigit(*p)) {
+			char *end;
+			options->dirstat_percent = strtoul(p, &end, 10);
+			assert(end > p);
+			p = end;
+		}
+		else
+			die("Unknown --dirstat argument '%s'", p);
+
+		if (*p) { /* more arguments, swallow separator */
+			if (*p != ',')
+				die("Missing comma separator, at index %lu of '%s'",
+				    p - args, args);
+			++p;
+		}
+	}
+}
+
 int git_diff_basic_config(const char *var, const char *value, void *cb)
 {
 	if (!strcmp(var, "diff.renamelimit")) {
@@ -145,6 +194,12 @@ int git_diff_basic_config(const char *var, const char *value, void *cb)
 		return 0;
 	}
 
+	if (!strcmp(var, "diff.dirstat")) {
+		init_default_diff_options();
+		dirstat_opt_args(&default_diff_options, value);
+		return 0;
+	}
+
 	if (!prefixcmp(var, "submodule."))
 		return parse_submodule_config_option(var, value);
 
@@ -2879,6 +2934,8 @@ static void run_checkdiff(struct diff_filepair *p, struct diff_options *o)
 
 void diff_setup(struct diff_options *options)
 {
+	init_default_diff_options();
+
 	memcpy(options, &default_diff_options, sizeof(*options));
 
 	options->file = stdout;
@@ -2886,7 +2943,6 @@ void diff_setup(struct diff_options *options)
 	options->line_termination = '\n';
 	options->break_opt = -1;
 	options->rename_limit = -1;
-	options->dirstat_percent = 3;
 	options->context = 3;
 
 	options->change = diff_change;
@@ -3148,7 +3204,6 @@ static int dirstat_opt(struct diff_options *options, const char **av)
 {
 	const char *p, *arg = av[0];
 	char *mangled = NULL;
-	char sep = '=';
 
 	if (!strcmp(arg, "--cumulative")) /* deprecated */
 		/* handle '--cumulative' like '--dirstat=cumulative' */
@@ -3156,14 +3211,13 @@ static int dirstat_opt(struct diff_options *options, const char **av)
 	else if (!strcmp(arg, "--dirstat-by-file") ||
 		 !prefixcmp(arg, "--dirstat-by-file=")) { /* deprecated */
 		/* handle '--dirstat-by-file=*' like '--dirstat=files,*' */
-		mangled = xstrdup(arg + 2);
-		memcpy(mangled, "--dirstat=files", 15);
-		if (mangled[15]) {
-			assert(mangled[15] == '=');
-			mangled[15] = ',';
+		mangled = xstrdup(arg + 11);
+		memcpy(mangled, "=files", 6);
+		if (mangled[6]) {
+			assert(mangled[6] == '=');
+			mangled[6] = ',';
 		}
-		arg = mangled;
-		p = mangled + 9;
+		p = mangled;
 	}
 	else if (!prefixcmp(arg, "-X"))
 		p = arg + 2;
@@ -3172,40 +3226,12 @@ static int dirstat_opt(struct diff_options *options, const char **av)
 	else
 		return 0;
 
-	options->output_format |= DIFF_FORMAT_DIRSTAT;
-
-	while (*p) {
-		if (*p != sep)
-			die("Missing argument separator ('%c'), at index %lu of '%s'",
-			    sep, p - arg, arg);
-		sep = ',';
-		++p;
-		if (!prefixcmp(p, "changes")) {
-			p += 7;
-			DIFF_OPT_CLR(options, DIRSTAT_BY_FILE);
-		}
-		else if (!prefixcmp(p, "files")) {
-			p += 5;
-			DIFF_OPT_SET(options, DIRSTAT_BY_FILE);
-		}
-		else if (!prefixcmp(p, "noncumulative")) {
-			p += 13;
-			DIFF_OPT_CLR(options, DIRSTAT_CUMULATIVE);
-		}
-		else if (!prefixcmp(p, "cumulative")) {
-			p += 10;
-			DIFF_OPT_SET(options, DIRSTAT_CUMULATIVE);
-		}
-		else if (isdigit(*p)) {
-			char *end;
-			options->dirstat_percent = strtoul(p, &end, 10);
-			assert(end > p);
-			p = end;
-		}
-		else
-			die("Unknown --dirstat argument '%s'", p);
-	}
+	if (*p == '=')
+		dirstat_opt_args(options, ++p);
+	else if (*p)
+		return 0;
 
+	options->output_format |= DIFF_FORMAT_DIRSTAT;
 	free(mangled);
 	return 1;
 }
diff --git a/t/t4046-diff-dirstat.sh b/t/t4046-diff-dirstat.sh
index bd1494c..021c9c4 100755
--- a/t/t4046-diff-dirstat.sh
+++ b/t/t4046-diff-dirstat.sh
@@ -362,6 +362,15 @@ test_expect_success 'later options override earlier options:' '
 	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
 '
 
+test_expect_success 'non-defaults in config overridden by explicit defaults on command line' '
+	git -c diff.dirstat=files,cumulative,50 diff --dirstat=changes,noncumulative,3 HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git -c diff.dirstat=files,cumulative,50 diff --dirstat=changes,noncumulative,3 -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git -c diff.dirstat=files,cumulative,50 diff --dirstat=changes,noncumulative,3 -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
 cat <<EOF >expect_diff_dirstat
    2.1% changed/
   10.8% dst/copy/changed/
@@ -404,6 +413,15 @@ test_expect_success '--dirstat=0' '
 	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
 '
 
+test_expect_success 'diff.dirstat=0' '
+	git -c diff.dirstat=0 diff --dirstat HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git -c diff.dirstat=0 diff --dirstat -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git -c diff.dirstat=0 diff --dirstat -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
 cat <<EOF >expect_diff_dirstat
    2.1% changed/
   10.8% dst/copy/changed/
@@ -465,6 +483,24 @@ test_expect_success '--dirstat=0,cumulative' '
 	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
 '
 
+test_expect_success 'diff.dirstat=0,cumulative' '
+	git -c diff.dirstat=0,cumulative diff --dirstat HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git -c diff.dirstat=0,cumulative diff --dirstat -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git -c diff.dirstat=0,cumulative diff --dirstat -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
+test_expect_success 'diff.dirstat=0 & --dirstat=cumulative' '
+	git -c diff.dirstat=0 diff --dirstat=cumulative HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git -c diff.dirstat=0 diff --dirstat=cumulative -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git -c diff.dirstat=0 diff --dirstat=cumulative -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
 cat <<EOF >expect_diff_dirstat
    9.0% changed/
    9.0% dst/copy/changed/
@@ -516,6 +552,15 @@ test_expect_success '--dirstat=files' '
 	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
 '
 
+test_expect_success 'diff.dirstat=files' '
+	git -c diff.dirstat=files diff --dirstat HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git -c diff.dirstat=files diff --dirstat -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git -c diff.dirstat=files diff --dirstat -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
 cat <<EOF >expect_diff_dirstat
   27.2% dst/copy/
   27.2% dst/move/
@@ -559,6 +604,15 @@ test_expect_success '--dirstat=files,10' '
 	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
 '
 
+test_expect_success 'diff.dirstat=10,files' '
+	git -c diff.dirstat=10,files diff --dirstat HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git -c diff.dirstat=10,files diff --dirstat -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git -c diff.dirstat=10,files diff --dirstat -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
 cat <<EOF >expect_diff_dirstat
    9.0% changed/
    9.0% dst/copy/changed/
@@ -620,6 +674,15 @@ test_expect_success '--dirstat=files,cumulative' '
 	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
 '
 
+test_expect_success 'diff.dirstat=cumulative,files' '
+	git -c diff.dirstat=cumulative,files diff --dirstat HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git -c diff.dirstat=cumulative,files diff --dirstat -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git -c diff.dirstat=cumulative,files diff --dirstat -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
 cat <<EOF >expect_diff_dirstat
   27.2% dst/copy/
   27.2% dst/move/
@@ -661,4 +724,13 @@ test_expect_success '--dirstat=files,cumulative,10' '
 	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
 '
 
+test_expect_success 'diff.dirstat=10,cumulative,files' '
+	git -c diff.dirstat=10,cumulative,files diff --dirstat HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git -c diff.dirstat=10,cumulative,files diff --dirstat -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git -c diff.dirstat=10,cumulative,files diff --dirstat -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
 test_done
-- 
1.7.5.rc1.3.g4d7b


* [PATCH 5/6] Use floating point for --dirstat percentages
  2011-04-26  0:01                         ` [PATCH 0/6] --dirstat fixes, part 2 Johan Herland
                                             ` (3 preceding siblings ...)
  2011-04-26  0:01                           ` [PATCH 4/6] Add config variable for specifying default --dirstat behavior Johan Herland
@ 2011-04-26  0:01                           ` Johan Herland
  2011-04-26 16:52                             ` Junio C Hamano
  2011-04-26  0:01                           ` [PATCH 6/6] New --dirstat=lines mode, doing dirstat analysis based on diffstat Johan Herland
                                             ` (2 subsequent siblings)
  7 siblings, 1 reply; 91+ messages in thread
From: Johan Herland @ 2011-04-26  0:01 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Linus Torvalds, Johan Herland

Allow specifying --dirstat cut-off percentage as a floating point number.

When printing the dirstat output, floating point numbers are presented in
rounded form (as opposed to truncated). Therefore, this patch causes
significant churn in the expected output of the dirstat selftests.

A selftest verifying floating-point percentage input has been added.

Signed-off-by: Johan Herland <johan@herland.net>
---

Remaining questions:
- Locale issues with strtod(), e.g. decimal separator is a comma in certain
  locales.
- Is it really worth the extensive churn in the dirstat output?


Have fun! :)

...Johan

 diff.c                  |   22 ++--
 diff.h                  |    2 +-
 t/t4046-diff-dirstat.sh |  327 ++++++++++++++++++++++++++---------------------
 3 files changed, 195 insertions(+), 156 deletions(-)

diff --git a/diff.c b/diff.c
index 20fe02c..4da3b68 100644
--- a/diff.c
+++ b/diff.c
@@ -51,7 +51,7 @@ static void init_default_diff_options()
 	if (initialized)
 		return;
 
-	default_diff_options.dirstat_percent = 3;
+	default_diff_options.dirstat_percent = 3.0;
 
 	initialized = 1;
 }
@@ -146,10 +146,12 @@ static void dirstat_opt_args(struct diff_options *options, const char *args)
 			DIFF_OPT_SET(options, DIRSTAT_CUMULATIVE);
 		}
 		else if (isdigit(*p)) {
-			char *end;
-			options->dirstat_percent = strtoul(p, &end, 10);
-			assert(end > p);
-			p = end;
+			char *end = (char *) p;
+			options->dirstat_percent = strtod(p, &end);
+			if (end > p && (*end == ',' || !*end))
+				p = end;
+			else
+				die("Failed to parse percent threshold '%s'", p);
 		}
 		else
 			die("Unknown --dirstat argument '%s'", p);
@@ -1508,7 +1510,8 @@ struct dirstat_file {
 
 struct dirstat_dir {
 	struct dirstat_file *files;
-	int alloc, nr, percent, cumulative;
+	double percent;
+	int alloc, nr, cumulative;
 };
 
 static long gather_dirstat(struct diff_options *opt, struct dirstat_dir *dir,
@@ -1556,11 +1559,10 @@ static long gather_dirstat(struct diff_options *opt, struct dirstat_dir *dir,
 	 */
 	if (baselen && sources != 1) {
 		if (this_dir) {
-			int permille = this_dir * 1000 / changed;
-			int percent = permille / 10;
+			double percent = this_dir * 100.0 / changed;
 			if (percent >= dir->percent) {
-				fprintf(opt->file, "%s%4d.%01d%% %.*s\n", line_prefix,
-					percent, permille % 10, baselen, base);
+				fprintf(opt->file, "%s%6.1f%% %.*s\n", line_prefix,
+					percent, baselen, base);
 				if (!dir->cumulative)
 					return 0;
 			}
diff --git a/diff.h b/diff.h
index 0083d92..781c620 100644
--- a/diff.h
+++ b/diff.h
@@ -111,13 +111,13 @@ struct diff_options {
 	int rename_score;
 	int rename_limit;
 	int warn_on_too_large_rename;
-	int dirstat_percent;
 	int setup;
 	int abbrev;
 	const char *prefix;
 	int prefix_length;
 	const char *stat_sep;
 	long xdl_opts;
+	double dirstat_percent;
 
 	int stat_width;
 	int stat_name_width;
diff --git a/t/t4046-diff-dirstat.sh b/t/t4046-diff-dirstat.sh
index 021c9c4..da4484c 100755
--- a/t/t4046-diff-dirstat.sh
+++ b/t/t4046-diff-dirstat.sh
@@ -301,31 +301,31 @@ test_expect_success 'sanity check setup (--stat)' '
 
 # changed/text and rearranged/text falls below default 3% threshold
 cat <<EOF >expect_diff_dirstat
-  10.8% dst/copy/changed/
-  10.8% dst/copy/rearranged/
-  10.8% dst/copy/unchanged/
-  10.8% dst/move/changed/
-  10.8% dst/move/rearranged/
-  10.8% dst/move/unchanged/
-  10.8% src/move/changed/
-  10.8% src/move/rearranged/
-  10.8% src/move/unchanged/
+  10.9% dst/copy/changed/
+  10.9% dst/copy/rearranged/
+  10.9% dst/copy/unchanged/
+  10.9% dst/move/changed/
+  10.9% dst/move/rearranged/
+  10.9% dst/move/unchanged/
+  10.9% src/move/changed/
+  10.9% src/move/rearranged/
+  10.9% src/move/unchanged/
 EOF
 
 # rearranged/text falls below default 3% threshold
 cat <<EOF >expect_diff_dirstat_M
-   5.8% changed/
+   5.9% changed/
   29.3% dst/copy/changed/
   29.3% dst/copy/rearranged/
   29.3% dst/copy/unchanged/
-   5.8% dst/move/changed/
+   5.9% dst/move/changed/
 EOF
 
 # rearranged/text falls below default 3% threshold
 cat <<EOF >expect_diff_dirstat_CC
-  32.6% changed/
-  32.6% dst/copy/changed/
-  32.6% dst/move/changed/
+  32.7% changed/
+  32.7% dst/copy/changed/
+  32.7% dst/move/changed/
 EOF
 
 test_expect_success 'vanilla --dirstat' '
@@ -372,36 +372,36 @@ test_expect_success 'non-defaults in config overridden by explicit defaults on c
 '
 
 cat <<EOF >expect_diff_dirstat
-   2.1% changed/
-  10.8% dst/copy/changed/
-  10.8% dst/copy/rearranged/
-  10.8% dst/copy/unchanged/
-  10.8% dst/move/changed/
-  10.8% dst/move/rearranged/
-  10.8% dst/move/unchanged/
+   2.2% changed/
+  10.9% dst/copy/changed/
+  10.9% dst/copy/rearranged/
+  10.9% dst/copy/unchanged/
+  10.9% dst/move/changed/
+  10.9% dst/move/rearranged/
+  10.9% dst/move/unchanged/
    0.0% rearranged/
-  10.8% src/move/changed/
-  10.8% src/move/rearranged/
-  10.8% src/move/unchanged/
+  10.9% src/move/changed/
+  10.9% src/move/rearranged/
+  10.9% src/move/unchanged/
 EOF
 
 cat <<EOF >expect_diff_dirstat_M
-   5.8% changed/
+   5.9% changed/
   29.3% dst/copy/changed/
   29.3% dst/copy/rearranged/
   29.3% dst/copy/unchanged/
-   5.8% dst/move/changed/
+   5.9% dst/move/changed/
    0.1% dst/move/rearranged/
    0.1% rearranged/
 EOF
 
 cat <<EOF >expect_diff_dirstat_CC
-  32.6% changed/
-  32.6% dst/copy/changed/
-   0.6% dst/copy/rearranged/
-  32.6% dst/move/changed/
-   0.6% dst/move/rearranged/
-   0.6% rearranged/
+  32.7% changed/
+  32.7% dst/copy/changed/
+   0.7% dst/copy/rearranged/
+  32.7% dst/move/changed/
+   0.7% dst/move/rearranged/
+   0.7% rearranged/
 EOF
 
 test_expect_success '--dirstat=0' '
@@ -423,46 +423,46 @@ test_expect_success 'diff.dirstat=0' '
 '
 
 cat <<EOF >expect_diff_dirstat
-   2.1% changed/
-  10.8% dst/copy/changed/
-  10.8% dst/copy/rearranged/
-  10.8% dst/copy/unchanged/
-  32.5% dst/copy/
-  10.8% dst/move/changed/
-  10.8% dst/move/rearranged/
-  10.8% dst/move/unchanged/
-  32.5% dst/move/
-  65.1% dst/
+   2.2% changed/
+  10.9% dst/copy/changed/
+  10.9% dst/copy/rearranged/
+  10.9% dst/copy/unchanged/
+  32.6% dst/copy/
+  10.9% dst/move/changed/
+  10.9% dst/move/rearranged/
+  10.9% dst/move/unchanged/
+  32.6% dst/move/
+  65.2% dst/
    0.0% rearranged/
-  10.8% src/move/changed/
-  10.8% src/move/rearranged/
-  10.8% src/move/unchanged/
-  32.5% src/move/
+  10.9% src/move/changed/
+  10.9% src/move/rearranged/
+  10.9% src/move/unchanged/
+  32.6% src/move/
 EOF
 
 cat <<EOF >expect_diff_dirstat_M
-   5.8% changed/
+   5.9% changed/
   29.3% dst/copy/changed/
   29.3% dst/copy/rearranged/
   29.3% dst/copy/unchanged/
   88.0% dst/copy/
-   5.8% dst/move/changed/
+   5.9% dst/move/changed/
    0.1% dst/move/rearranged/
-   5.9% dst/move/
+   6.0% dst/move/
   94.0% dst/
    0.1% rearranged/
 EOF
 
 cat <<EOF >expect_diff_dirstat_CC
-  32.6% changed/
-  32.6% dst/copy/changed/
-   0.6% dst/copy/rearranged/
+  32.7% changed/
+  32.7% dst/copy/changed/
+   0.7% dst/copy/rearranged/
   33.3% dst/copy/
-  32.6% dst/move/changed/
-   0.6% dst/move/rearranged/
+  32.7% dst/move/changed/
+   0.7% dst/move/rearranged/
   33.3% dst/move/
-  66.6% dst/
-   0.6% rearranged/
+  66.7% dst/
+   0.7% rearranged/
 EOF
 
 test_expect_success '--dirstat=0 --cumulative' '
@@ -502,36 +502,36 @@ test_expect_success 'diff.dirstat=0 & --dirstat=cumulative' '
 '
 
 cat <<EOF >expect_diff_dirstat
-   9.0% changed/
-   9.0% dst/copy/changed/
-   9.0% dst/copy/rearranged/
-   9.0% dst/copy/unchanged/
-   9.0% dst/move/changed/
-   9.0% dst/move/rearranged/
-   9.0% dst/move/unchanged/
-   9.0% rearranged/
-   9.0% src/move/changed/
-   9.0% src/move/rearranged/
-   9.0% src/move/unchanged/
+   9.1% changed/
+   9.1% dst/copy/changed/
+   9.1% dst/copy/rearranged/
+   9.1% dst/copy/unchanged/
+   9.1% dst/move/changed/
+   9.1% dst/move/rearranged/
+   9.1% dst/move/unchanged/
+   9.1% rearranged/
+   9.1% src/move/changed/
+   9.1% src/move/rearranged/
+   9.1% src/move/unchanged/
 EOF
 
 cat <<EOF >expect_diff_dirstat_M
-  14.2% changed/
-  14.2% dst/copy/changed/
-  14.2% dst/copy/rearranged/
-  14.2% dst/copy/unchanged/
-  14.2% dst/move/changed/
-  14.2% dst/move/rearranged/
-  14.2% rearranged/
+  14.3% changed/
+  14.3% dst/copy/changed/
+  14.3% dst/copy/rearranged/
+  14.3% dst/copy/unchanged/
+  14.3% dst/move/changed/
+  14.3% dst/move/rearranged/
+  14.3% rearranged/
 EOF
 
 cat <<EOF >expect_diff_dirstat_CC
-  16.6% changed/
-  16.6% dst/copy/changed/
-  16.6% dst/copy/rearranged/
-  16.6% dst/move/changed/
-  16.6% dst/move/rearranged/
-  16.6% rearranged/
+  16.7% changed/
+  16.7% dst/copy/changed/
+  16.7% dst/copy/rearranged/
+  16.7% dst/move/changed/
+  16.7% dst/move/rearranged/
+  16.7% rearranged/
 EOF
 
 test_expect_success '--dirstat-by-file' '
@@ -562,28 +562,28 @@ test_expect_success 'diff.dirstat=files' '
 '
 
 cat <<EOF >expect_diff_dirstat
-  27.2% dst/copy/
-  27.2% dst/move/
-  27.2% src/move/
+  27.3% dst/copy/
+  27.3% dst/move/
+  27.3% src/move/
 EOF
 
 cat <<EOF >expect_diff_dirstat_M
-  14.2% changed/
-  14.2% dst/copy/changed/
-  14.2% dst/copy/rearranged/
-  14.2% dst/copy/unchanged/
-  14.2% dst/move/changed/
-  14.2% dst/move/rearranged/
-  14.2% rearranged/
+  14.3% changed/
+  14.3% dst/copy/changed/
+  14.3% dst/copy/rearranged/
+  14.3% dst/copy/unchanged/
+  14.3% dst/move/changed/
+  14.3% dst/move/rearranged/
+  14.3% rearranged/
 EOF
 
 cat <<EOF >expect_diff_dirstat_CC
-  16.6% changed/
-  16.6% dst/copy/changed/
-  16.6% dst/copy/rearranged/
-  16.6% dst/move/changed/
-  16.6% dst/move/rearranged/
-  16.6% rearranged/
+  16.7% changed/
+  16.7% dst/copy/changed/
+  16.7% dst/copy/rearranged/
+  16.7% dst/move/changed/
+  16.7% dst/move/rearranged/
+  16.7% rearranged/
 EOF
 
 test_expect_success '--dirstat-by-file=10' '
@@ -614,46 +614,46 @@ test_expect_success 'diff.dirstat=10,files' '
 '
 
 cat <<EOF >expect_diff_dirstat
-   9.0% changed/
-   9.0% dst/copy/changed/
-   9.0% dst/copy/rearranged/
-   9.0% dst/copy/unchanged/
-  27.2% dst/copy/
-   9.0% dst/move/changed/
-   9.0% dst/move/rearranged/
-   9.0% dst/move/unchanged/
-  27.2% dst/move/
+   9.1% changed/
+   9.1% dst/copy/changed/
+   9.1% dst/copy/rearranged/
+   9.1% dst/copy/unchanged/
+  27.3% dst/copy/
+   9.1% dst/move/changed/
+   9.1% dst/move/rearranged/
+   9.1% dst/move/unchanged/
+  27.3% dst/move/
   54.5% dst/
-   9.0% rearranged/
-   9.0% src/move/changed/
-   9.0% src/move/rearranged/
-   9.0% src/move/unchanged/
-  27.2% src/move/
+   9.1% rearranged/
+   9.1% src/move/changed/
+   9.1% src/move/rearranged/
+   9.1% src/move/unchanged/
+  27.3% src/move/
 EOF
 
 cat <<EOF >expect_diff_dirstat_M
-  14.2% changed/
-  14.2% dst/copy/changed/
-  14.2% dst/copy/rearranged/
-  14.2% dst/copy/unchanged/
-  42.8% dst/copy/
-  14.2% dst/move/changed/
-  14.2% dst/move/rearranged/
-  28.5% dst/move/
+  14.3% changed/
+  14.3% dst/copy/changed/
+  14.3% dst/copy/rearranged/
+  14.3% dst/copy/unchanged/
+  42.9% dst/copy/
+  14.3% dst/move/changed/
+  14.3% dst/move/rearranged/
+  28.6% dst/move/
   71.4% dst/
-  14.2% rearranged/
+  14.3% rearranged/
 EOF
 
 cat <<EOF >expect_diff_dirstat_CC
-  16.6% changed/
-  16.6% dst/copy/changed/
-  16.6% dst/copy/rearranged/
+  16.7% changed/
+  16.7% dst/copy/changed/
+  16.7% dst/copy/rearranged/
   33.3% dst/copy/
-  16.6% dst/move/changed/
-  16.6% dst/move/rearranged/
+  16.7% dst/move/changed/
+  16.7% dst/move/rearranged/
   33.3% dst/move/
-  66.6% dst/
-  16.6% rearranged/
+  66.7% dst/
+  16.7% rearranged/
 EOF
 
 test_expect_success '--dirstat-by-file --cumulative' '
@@ -684,35 +684,35 @@ test_expect_success 'diff.dirstat=cumulative,files' '
 '
 
 cat <<EOF >expect_diff_dirstat
-  27.2% dst/copy/
-  27.2% dst/move/
+  27.3% dst/copy/
+  27.3% dst/move/
   54.5% dst/
-  27.2% src/move/
+  27.3% src/move/
 EOF
 
 cat <<EOF >expect_diff_dirstat_M
-  14.2% changed/
-  14.2% dst/copy/changed/
-  14.2% dst/copy/rearranged/
-  14.2% dst/copy/unchanged/
-  42.8% dst/copy/
-  14.2% dst/move/changed/
-  14.2% dst/move/rearranged/
-  28.5% dst/move/
+  14.3% changed/
+  14.3% dst/copy/changed/
+  14.3% dst/copy/rearranged/
+  14.3% dst/copy/unchanged/
+  42.9% dst/copy/
+  14.3% dst/move/changed/
+  14.3% dst/move/rearranged/
+  28.6% dst/move/
   71.4% dst/
-  14.2% rearranged/
+  14.3% rearranged/
 EOF
 
 cat <<EOF >expect_diff_dirstat_CC
-  16.6% changed/
-  16.6% dst/copy/changed/
-  16.6% dst/copy/rearranged/
+  16.7% changed/
+  16.7% dst/copy/changed/
+  16.7% dst/copy/rearranged/
   33.3% dst/copy/
-  16.6% dst/move/changed/
-  16.6% dst/move/rearranged/
+  16.7% dst/move/changed/
+  16.7% dst/move/rearranged/
   33.3% dst/move/
-  66.6% dst/
-  16.6% rearranged/
+  66.7% dst/
+  16.7% rearranged/
 EOF
 
 test_expect_success '--dirstat=files,cumulative,10' '
@@ -733,4 +733,41 @@ test_expect_success 'diff.dirstat=10,cumulative,files' '
 	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
 '
 
+cat <<EOF >expect_diff_dirstat
+  27.3% dst/copy/
+  27.3% dst/move/
+  54.5% dst/
+  27.3% src/move/
+EOF
+
+cat <<EOF >expect_diff_dirstat_M
+  42.9% dst/copy/
+  28.6% dst/move/
+  71.4% dst/
+EOF
+
+cat <<EOF >expect_diff_dirstat_CC
+  33.3% dst/copy/
+  33.3% dst/move/
+  66.7% dst/
+EOF
+
+test_expect_success '--dirstat=files,cumulative,16.7' '
+	git diff --dirstat=files,cumulative,16.7 HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff --dirstat=files,cumulative,16.7 -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff --dirstat=files,cumulative,16.7 -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
+test_expect_success 'diff.dirstat=16.7,cumulative,files' '
+	git -c diff.dirstat=16.7,cumulative,files diff --dirstat HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git -c diff.dirstat=16.7,cumulative,files diff --dirstat -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git -c diff.dirstat=16.7,cumulative,files diff --dirstat -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
 test_done
-- 
1.7.5.rc1.3.g4d7b


* [PATCH 6/6] New --dirstat=lines mode, doing dirstat analysis based on diffstat
  2011-04-26  0:01                         ` [PATCH 0/6] --dirstat fixes, part 2 Johan Herland
                                             ` (4 preceding siblings ...)
  2011-04-26  0:01                           ` [PATCH 5/6] Use floating point for --dirstat percentages Johan Herland
@ 2011-04-26  0:01                           ` Johan Herland
  2011-04-26 16:59                             ` Junio C Hamano
  2011-04-26  0:15                           ` [PATCH 0/6] --dirstat fixes, part 2 Linus Torvalds
  2011-04-27  2:12                           ` [PATCHv2 " Johan Herland
  7 siblings, 1 reply; 91+ messages in thread
From: Johan Herland @ 2011-04-26  0:01 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Linus Torvalds, Johan Herland

This patch adds an alternative implementation of show_dirstat(), called
show_dirstat_by_line(), which uses the more expensive diffstat analysis
(as opposed to show_dirstat()'s own (relatively inexpensive) analysis)
to derive the numbers from which the --dirstat output is computed.

The alternative implementation is controlled by the new "lines" argument
to the --dirstat option (or the diff.dirstat config variable).

In linux-2.6.git, running the three different --dirstat modes:

  time git diff v2.6.20..v2.6.30 --dirstat=changes > /dev/null
vs.
  time git diff v2.6.20..v2.6.30 --dirstat=lines > /dev/null
vs.
  time git diff v2.6.20..v2.6.30 --dirstat=files > /dev/null

yields the following average runtimes on my machine:

- "changes" (default): ~6.0 s
- "lines":             ~9.6 s
- "files":             ~0.1 s

So, as expected, there's a considerable performance hit (~60%) from going
through the full diffstat analysis as compared to the default "changes"
analysis (obviously, "files" is much faster than both). As such, the
"lines" mode is probably only useful if you really need the --dirstat
numbers to be consistent with the numbers returned from the other
--*stat options.

The patch also includes documentation and tests for the new dirstat mode.

Signed-off-by: Johan Herland <johan@herland.net>
---
 Documentation/config.txt       |    7 +++
 Documentation/diff-options.txt |    7 +++
 diff.c                         |   60 +++++++++++++++++++++++-
 diff.h                         |    1 +
 t/t4046-diff-dirstat.sh        |  100 ++++++++++++++++++++++++++++++++++++++++
 5 files changed, 173 insertions(+), 2 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 10fa89a..47b2423 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -837,6 +837,13 @@ diff.dirstat::
 	the amount of pure code movements within a file.  In other words,
 	rearranging lines in a file is not counted as much as other changes.
 	This is the default `--dirstat` behavior.
+`lines`;;
+	Compute the dirstat numbers by doing the regular line-based diff
+	analysis, and summing the removed/added line counts. This is a more
+	expensive `--dirstat` behavior than the `changes` behavior, but it
+	does count rearranged lines within a file as much as other changes.
+	The resulting output is consistent with what you get from the other
+	`--*stat` options.
 `files`;;
 	Compute the dirstat numbers by counting the number of files changed.
 	Each changed file counts equally in the dirstat analysis. This is
diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index b6b1448..0b7417b 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -80,6 +80,13 @@ endif::git-format-patch[]
 	the amount of pure code movements within a file.  In other words,
 	rearranging lines in a file is not counted as much as other changes.
 	This is the default `--dirstat` behavior.
+`lines`;;
+	Compute the dirstat numbers by doing the regular line-based diff
+	analysis, and summing the removed/added line counts. This is a more
+	expensive `--dirstat` behavior than the `changes` behavior, but it
+	does count rearranged lines within a file as much as other changes.
+	The resulting output is consistent with what you get from the other
+	`--*stat` options.
 `files`;;
 	Compute the dirstat numbers by counting the number of files changed.
 	Each changed file counts equally in the dirstat analysis. This is
diff --git a/diff.c b/diff.c
index 4da3b68..c00984f 100644
--- a/diff.c
+++ b/diff.c
@@ -131,10 +131,17 @@ static void dirstat_opt_args(struct diff_options *options, const char *args)
 	while (*p) {
 		if (!prefixcmp(p, "changes")) {
 			p += 7;
+			DIFF_OPT_CLR(options, DIRSTAT_BY_LINE);
+			DIFF_OPT_CLR(options, DIRSTAT_BY_FILE);
+		}
+		else if (!prefixcmp(p, "lines")) {
+			p += 5;
+			DIFF_OPT_SET(options, DIRSTAT_BY_LINE);
 			DIFF_OPT_CLR(options, DIRSTAT_BY_FILE);
 		}
 		else if (!prefixcmp(p, "files")) {
 			p += 5;
+			DIFF_OPT_CLR(options, DIRSTAT_BY_LINE);
 			DIFF_OPT_SET(options, DIRSTAT_BY_FILE);
 		}
 		else if (!prefixcmp(p, "noncumulative")) {
@@ -1677,6 +1684,48 @@ found_damage:
 	gather_dirstat(options, &dir, changed, "", 0);
 }
 
+static void show_dirstat_by_line(struct diffstat_t *data, struct diff_options *options)
+{
+	int i;
+	unsigned long changed;
+	struct dirstat_dir dir;
+
+	if (data->nr == 0)
+		return;
+
+	dir.files = NULL;
+	dir.alloc = 0;
+	dir.nr = 0;
+	dir.percent = options->dirstat_percent;
+	dir.cumulative = DIFF_OPT_TST(options, DIRSTAT_CUMULATIVE);
+
+	changed = 0;
+	for (i = 0; i < data->nr; i++) {
+		struct diffstat_file *file = data->files[i];
+		unsigned long damage = file->added + file->deleted;
+		if (damage && file->is_binary)
+			/*
+			 * binary files counts bytes, not lines. Must find some
+			 * way to normalize binary bytes vs. textual lines.
+			 * The following heuristic is cheap, but beyond ugly...
+			 */
+			damage = damage < 52 ? 1 : damage / 52;
+		ALLOC_GROW(dir.files, dir.nr + 1, dir.alloc);
+		dir.files[dir.nr].name = file->name;
+		dir.files[dir.nr].changed = damage;
+		changed += damage;
+		dir.nr++;
+	}
+
+	/* This can happen even with many files, if everything was renames */
+	if (!changed)
+		return;
+
+	/* Show all directories with more than x% of the changes */
+	qsort(dir.files, dir.nr, sizeof(dir.files[0]), dirstat_compare);
+	gather_dirstat(options, &dir, changed, "", 0);
+}
+
 static void free_diffstat_info(struct diffstat_t *diffstat)
 {
 	int i;
@@ -4081,6 +4130,7 @@ void diff_flush(struct diff_options *options)
 	struct diff_queue_struct *q = &diff_queued_diff;
 	int i, output_format = options->output_format;
 	int separator = 0;
+	int dirstat_by_line = 0;
 
 	/*
 	 * Order: raw, stat, summary, patch
@@ -4101,7 +4151,11 @@ void diff_flush(struct diff_options *options)
 		separator++;
 	}
 
-	if (output_format & (DIFF_FORMAT_DIFFSTAT|DIFF_FORMAT_SHORTSTAT|DIFF_FORMAT_NUMSTAT)) {
+	if (output_format & DIFF_FORMAT_DIRSTAT && DIFF_OPT_TST(options, DIRSTAT_BY_LINE))
+		dirstat_by_line = 1;
+
+	if (output_format & (DIFF_FORMAT_DIFFSTAT|DIFF_FORMAT_SHORTSTAT|DIFF_FORMAT_NUMSTAT) ||
+	    dirstat_by_line) {
 		struct diffstat_t diffstat;
 
 		memset(&diffstat, 0, sizeof(struct diffstat_t));
@@ -4116,10 +4170,12 @@ void diff_flush(struct diff_options *options)
 			show_stats(&diffstat, options);
 		if (output_format & DIFF_FORMAT_SHORTSTAT)
 			show_shortstats(&diffstat, options);
+		if (output_format & DIFF_FORMAT_DIRSTAT)
+			show_dirstat_by_line(&diffstat, options);
 		free_diffstat_info(&diffstat);
 		separator++;
 	}
-	if (output_format & DIFF_FORMAT_DIRSTAT)
+	if ((output_format & DIFF_FORMAT_DIRSTAT) && !dirstat_by_line)
 		show_dirstat(options);
 
 	if (output_format & DIFF_FORMAT_SUMMARY && !is_summary_empty(q)) {
diff --git a/diff.h b/diff.h
index 781c620..5f12049 100644
--- a/diff.h
+++ b/diff.h
@@ -78,6 +78,7 @@ typedef struct strbuf *(*diff_prefix_fn_t)(struct diff_options *opt, void *data)
 #define DIFF_OPT_IGNORE_UNTRACKED_IN_SUBMODULES (1 << 25)
 #define DIFF_OPT_IGNORE_DIRTY_SUBMODULES (1 << 26)
 #define DIFF_OPT_OVERRIDE_SUBMODULE_CONFIG (1 << 27)
+#define DIFF_OPT_DIRSTAT_BY_LINE     (1 << 28)
 
 #define DIFF_OPT_TST(opts, flag)    ((opts)->flags & DIFF_OPT_##flag)
 #define DIFF_OPT_SET(opts, flag)    ((opts)->flags |= DIFF_OPT_##flag)
diff --git a/t/t4046-diff-dirstat.sh b/t/t4046-diff-dirstat.sh
index da4484c..ef9326a 100755
--- a/t/t4046-diff-dirstat.sh
+++ b/t/t4046-diff-dirstat.sh
@@ -770,4 +770,104 @@ test_expect_success 'diff.dirstat=16.7,cumulative,files' '
 	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
 '
 
+cat <<EOF >expect_diff_dirstat
+  10.6% dst/copy/changed/
+  10.6% dst/copy/rearranged/
+  10.6% dst/copy/unchanged/
+  10.6% dst/move/changed/
+  10.6% dst/move/rearranged/
+  10.6% dst/move/unchanged/
+  10.6% src/move/changed/
+  10.6% src/move/rearranged/
+  10.6% src/move/unchanged/
+EOF
+
+cat <<EOF >expect_diff_dirstat_M
+   5.3% changed/
+  26.3% dst/copy/changed/
+  26.3% dst/copy/rearranged/
+  26.3% dst/copy/unchanged/
+   5.3% dst/move/changed/
+   5.3% dst/move/rearranged/
+   5.3% rearranged/
+EOF
+
+cat <<EOF >expect_diff_dirstat_CC
+  16.7% changed/
+  16.7% dst/copy/changed/
+  16.7% dst/copy/rearranged/
+  16.7% dst/move/changed/
+  16.7% dst/move/rearranged/
+  16.7% rearranged/
+EOF
+
+test_expect_success '--dirstat=lines' '
+	git diff --dirstat=lines HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff --dirstat=lines -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff --dirstat=lines -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
+test_expect_success 'diff.dirstat=lines' '
+	git -c diff.dirstat=lines diff --dirstat HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git -c diff.dirstat=lines diff --dirstat -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git -c diff.dirstat=lines diff --dirstat -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
+cat <<EOF >expect_diff_dirstat
+   2.1% changed/
+  10.6% dst/copy/changed/
+  10.6% dst/copy/rearranged/
+  10.6% dst/copy/unchanged/
+  10.6% dst/move/changed/
+  10.6% dst/move/rearranged/
+  10.6% dst/move/unchanged/
+   2.1% rearranged/
+  10.6% src/move/changed/
+  10.6% src/move/rearranged/
+  10.6% src/move/unchanged/
+EOF
+
+cat <<EOF >expect_diff_dirstat_M
+   5.3% changed/
+  26.3% dst/copy/changed/
+  26.3% dst/copy/rearranged/
+  26.3% dst/copy/unchanged/
+   5.3% dst/move/changed/
+   5.3% dst/move/rearranged/
+   5.3% rearranged/
+EOF
+
+cat <<EOF >expect_diff_dirstat_CC
+  16.7% changed/
+  16.7% dst/copy/changed/
+  16.7% dst/copy/rearranged/
+  16.7% dst/move/changed/
+  16.7% dst/move/rearranged/
+  16.7% rearranged/
+EOF
+
+test_expect_success '--dirstat=lines,0' '
+	git diff --dirstat=lines,0 HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff --dirstat=lines,0 -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff --dirstat=lines,0 -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
+test_expect_success 'diff.dirstat=0,lines' '
+	git -c diff.dirstat=0,lines diff --dirstat HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git -c diff.dirstat=0,lines diff --dirstat -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git -c diff.dirstat=0,lines diff --dirstat -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
 test_done
-- 
1.7.5.rc1.3.g4d7b


* Re: [PATCH 0/6] --dirstat fixes, part 2
  2011-04-26  0:01                         ` [PATCH 0/6] --dirstat fixes, part 2 Johan Herland
                                             ` (5 preceding siblings ...)
  2011-04-26  0:01                           ` [PATCH 6/6] New --dirstat=lines mode, doing dirstat analysis based on diffstat Johan Herland
@ 2011-04-26  0:15                           ` Linus Torvalds
  2011-04-27  2:12                           ` [PATCHv2 " Johan Herland
  7 siblings, 0 replies; 91+ messages in thread
From: Linus Torvalds @ 2011-04-26  0:15 UTC (permalink / raw)
  To: Johan Herland; +Cc: git, Junio C Hamano

On Mon, Apr 25, 2011 at 5:01 PM, Johan Herland <johan@herland.net> wrote:
>
> I finally found the time to re-roll the remaining dirstat fixes,
> incorporating feedback from Linus and Junio in the surrounding thread.

After a _very_ superficial walk-through of the patches, I have no real
issues. Looks ok by me,

                 Linus


* Re: [PATCH 3/6] Refactor --dirstat parsing; deprecate --cumulative and --dirstat-by-file
  2011-04-26  0:01                           ` [PATCH 3/6] Refactor --dirstat parsing; deprecate --cumulative and --dirstat-by-file Johan Herland
@ 2011-04-26 16:36                             ` Junio C Hamano
  2011-04-27  2:02                               ` Johan Herland
  0 siblings, 1 reply; 91+ messages in thread
From: Junio C Hamano @ 2011-04-26 16:36 UTC (permalink / raw)
  To: Johan Herland; +Cc: git, Linus Torvalds

Johan Herland <johan@herland.net> writes:

> Instead of having multiple interconnected dirstat-related options, teach
> the --dirstat option itself to accept all behavior modifiers as arguments.
>
> - Preserve the current --dirstat=<limit> (where <limit> is an integer
>   specifying a cut-off percentage)
> - Add --dirstat=cumulative, replacing --cumulative
> - Add --dirstat=files, replacing --dirstat-by-file
> - Also add --dirstat=changes and --dirstat=noncumulative for specifying the
>   current default behavior. These allow the user to reset other --dirstat
>   arguments (e.g. 'cumulative' and 'files') occurring earlier on the command
>   line.
>
> Allow multiple arguments to be separated by commas, e.g.:
>   --dirstat=files,10,cumulative
>
> Update the documentation accordingly, and add testcases verifying the
> behavior of the new syntax.

The above description is unclear if the version of git will error out when
given --cumulative or --dirstat-by-file.  I can sort of guess by lack of
removed lines from the documentation, but please do not make readers guess.

Also a minuscule style nitpick: could you indent your bulleted list just
a bit (one space indent is just fine)?

> diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
> index 7e4bd42..b6b1448 100644
> --- a/Documentation/diff-options.txt
> +++ b/Documentation/diff-options.txt
> @@ -66,19 +66,40 @@ endif::git-format-patch[]
>  	number of modified files, as well as number of added and deleted
>  	lines.
>  
> ---dirstat[=<limit>]::
> -	Output the distribution of relative amount of changes (number of lines added or
> -	removed) for each sub-directory. Directories with changes below
> -	a cut-off percent (3% by default) are not shown. The cut-off percent
> -	can be set with `--dirstat=<limit>`. Changes in a child directory are not
> -	counted for the parent directory, unless `--cumulative` is used.
> +--dirstat[=<arg1,arg2,...>]::
> +	Output the distribution of relative amount of changes for each
> +	sub-directory. The behavior of `--dirstat` can be customized by
> +	passing it a comma separated list of arguments. The defaults
> +	are controlled by the `diff.dirstat` configuration variable (see
> +	linkgit:git-config[1]). The following arguments are available:

These "arguments" feel more like "options" (or "parameters"), no?  Your
code in diff.c also calls it "opt".  The second line of the proposed log
message has the same issue.

> +--
> +`changes`;;
> +	Compute the dirstat numbers by counting the lines that have been
> +	removed from the source, or added to the destination. This ignores
> +	the amount of pure code movements within a file.  In other words,
> +	rearranging lines in a file is not counted as much as other changes.
> +	This is the default `--dirstat` behavior.

"default behavior when no option is given"?

> +`files`;;
> +	Compute the dirstat numbers by counting the number of files changed.
> +	Each changed file counts equally in the dirstat analysis. This is
> +	the computationally cheapest `--dirstat` behavior, since it does
> +	not look at the file contents at all.

s/not look/not have to look/?

> +`cumulative`;;
> +	Count changes in a child directory for the parent directory as well.
> +	Note that when using `cumulative`, the sum of the percentages
> +	reported may exceed 100%. The default (non-cumulative) behavior can
> +	be specified with the `noncumulative` argument.

So the later one wins?  I.e. --dirstat=cumulative,noncumulative from the
command line (which seems silly), or more importantly with

    [alias]
    	dstat = diff --dirstat=cumulative

and you can say "git dstat --dirstat=noncumulative A..B"?

> diff --git a/diff.c b/diff.c
> index cfbfa92..08aaa47 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -3144,6 +3144,72 @@ static int stat_opt(struct diff_options *options, const char **av)
>  	return argcount;
>  }
>  

/*
 * Document what the return value from this function means here.
 */
> +static int dirstat_opt(struct diff_options *options, const char **av)

Do you have to pass "const char **av", or just "const char *arg"?

> +{
> +	const char *p, *arg = av[0];
> +	char *mangled = NULL;
> +	char sep = '=';
> +
> +	if (!strcmp(arg, "--cumulative")) /* deprecated */
> +		/* handle '--cumulative' like '--dirstat=cumulative' */
> +		p = "=cumulative";
> +	else if (!strcmp(arg, "--dirstat-by-file") ||
> +		 !prefixcmp(arg, "--dirstat-by-file=")) { /* deprecated */
> +		/* handle '--dirstat-by-file=*' like '--dirstat=files,*' */
> +		mangled = xstrdup(arg + 2);
> +		memcpy(mangled, "--dirstat=files", 15);
> +		if (mangled[15]) {
> +			assert(mangled[15] == '=');
> +			mangled[15] = ',';
> +		}
> +		arg = mangled;
> +		p = mangled + 9;

I understand you wanted to reuse the while() loop below, but I do not
think it is worth it.  Isn't it easier to read if you handled the above
cases in their if/else body and return?

	if (--cumulative) {
		options->output_format |= DIFF_FORMAT_DIRSTAT;
		DIFF_OPT_SET(options, DIRSTAT_CUMULATIVE);
		return 1;
	}
	if (--dirstat-by-file) {
		options->output_format |= DIFF_FORMAT_DIRSTAT;
		DIFF_OPT_SET(options, DIRSTAT_BY_FILE);
		return 1;
	}
	...

Even better, probably they can be left to diff_opt_parse() without calling
this function, as you are deprecating them and do not have to allow them
to take the opt1,opt2,... form of parameter.

> +	}
> +	else if (!prefixcmp(arg, "-X"))
> +		p = arg + 2;
> +	else if (!prefixcmp(arg, "--dirstat"))
> +		p = arg + 9;
> +	else
> +		return 0;
> +
> +	options->output_format |= DIFF_FORMAT_DIRSTAT;
> +
> +	while (*p) {
> +		if (*p != sep)

What happens to "diff -X3 A..B"?

> +			die("Missing argument separator ('%c'), at index %lu of '%s'",
> +			    sep, p - arg, arg);

Don't you need to cast (p-arg) for %lu from ptrdiff type here?  It
probably is more common to say s/index/char/;
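
To make the cast concrete, here is a minimal sketch. format_offset() is a
made-up helper for illustration, not something in diff.c: the pointer
difference has type ptrdiff_t, so it is converted explicitly before being
handed to %lu.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper, for illustration only: describe where in 'arg'
 * the parser stopped.  'p' must point into the string starting at 'arg'.
 */
static int format_offset(char *buf, size_t size, const char *arg, const char *p)
{
	ptrdiff_t off = p - arg;	/* a pointer difference is ptrdiff_t, not unsigned long */

	/* cast so that the argument type matches the %lu conversion */
	return snprintf(buf, size, "char %lu of '%s'", (unsigned long)off, arg);
}
```

C99 also has %td for ptrdiff_t, but whether that can be relied on depends
on which platforms have to be supported.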

> +		sep = ',';
> +		++p;

We tend to write postincrement when there is no strong reason to do
otherwise.

> +		if (!prefixcmp(p, "changes")) {
> +			p += 7;
> +			DIFF_OPT_CLR(options, DIRSTAT_BY_FILE);
> +		}
> +		else if (!prefixcmp(p, "files")) {
> +			p += 5;
> +			DIFF_OPT_SET(options, DIRSTAT_BY_FILE);
> +		}
> +		else if (!prefixcmp(p, "noncumulative")) {
> +			p += 13;
> +			DIFF_OPT_CLR(options, DIRSTAT_CUMULATIVE);
> +		}
> +		else if (!prefixcmp(p, "cumulative")) {
> +			p += 10;
> +			DIFF_OPT_SET(options, DIRSTAT_CUMULATIVE);
> +		}
> +		else if (isdigit(*p)) {
> +			char *end;
> +			options->dirstat_percent = strtoul(p, &end, 10);
> +			assert(end > p);
> +			p = end;
> +		}

That's a senseless assert(), isn't it?

You already know the first letter is a digit, so assert(p < end) will
always be true.  You may want to check that this particular option is all
digits by checking (*end == '\0' || *end == ',') but that is done at the
beginning of this loop anyway, so I don't think there is anything to check
here.
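
A small sketch of why the assert cannot fire, using a made-up
parse_percent() helper rather than the code in the patch: once isdigit()
has accepted the first character, strtoul() is bound to consume at least
that character, so "end" always ends up past "p".

```c
#include <ctype.h>
#include <stdlib.h>

/*
 * Hypothetical helper, for illustration only.  Returns the number of
 * characters consumed from 'p', or 0 if 'p' does not start with a digit
 * (in which case '*out' is left untouched).
 */
static size_t parse_percent(const char *p, unsigned long *out)
{
	char *end;

	if (!isdigit((unsigned char)*p))
		return 0;
	/* the first character is a digit, so strtoul() consumes at least one char */
	*out = strtoul(p, &end, 10);
	return (size_t)(end - p);
}
```

Whatever follows the digits (a ',' or the end of the string) is left for
the caller to check.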

> +		else
> +			die("Unknown --dirstat argument '%s'", p);

The function parses dirstat_OPT, but this says argument?

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 4/6] Add config variable for specifying default --dirstat behavior
  2011-04-26  0:01                           ` [PATCH 4/6] Add config variable for specifying default --dirstat behavior Johan Herland
@ 2011-04-26 16:43                             ` Junio C Hamano
  2011-04-27  2:02                               ` Johan Herland
  0 siblings, 1 reply; 91+ messages in thread
From: Junio C Hamano @ 2011-04-26 16:43 UTC (permalink / raw)
  To: Johan Herland; +Cc: git, Junio C Hamano, Linus Torvalds

Johan Herland <johan@herland.net> writes:

> diff --git a/diff.c b/diff.c
> index 08aaa47..20fe02c 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -45,6 +45,17 @@ static char diff_colors[][COLOR_MAXLEN] = {
>  	GIT_COLOR_NORMAL,	/* FUNCINFO */
>  };
>  
> +static void init_default_diff_options()
> +{
> +	static int initialized = 0;
> +	if (initialized)
> +		return;
> +
> +	default_diff_options.dirstat_percent = 3;
> +
> +	initialized = 1;
> +}

This smells fishy on two counts.

 . The rest of the diff machinery is designed to be callable multiple
   times by calling diff_setup(), and there should be no place for any
   call-once function like this one.

 . Why is dirstat-percent _so_ special that it is the only one that has to
   be initialized this way, when the function name implies that this is
   the central place to control the initialization of all diff related
   options?

> @@ -114,6 +125,44 @@ int git_diff_ui_config(const char *var, const char *value, void *cb)
>  	return git_diff_basic_config(var, value, cb);
>  }
>  
> +static void dirstat_opt_args(struct diff_options *options, const char *args)
> +{
> +	const char *p = args;
> +	while (*p) {
> +		if (!prefixcmp(p, "changes")) {
> + ...
> +		}
> +	}
> +}

Please move this part to the previous patch in your reroll.  This helper
is what the previous patch should have been written with.

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 5/6] Use floating point for --dirstat percentages
  2011-04-26  0:01                           ` [PATCH 5/6] Use floating point for --dirstat percentages Johan Herland
@ 2011-04-26 16:52                             ` Junio C Hamano
  2011-04-27  2:02                               ` Johan Herland
  0 siblings, 1 reply; 91+ messages in thread
From: Junio C Hamano @ 2011-04-26 16:52 UTC (permalink / raw)
  To: Johan Herland; +Cc: git, Linus Torvalds

Johan Herland <johan@herland.net> writes:

> Allow specifying --dirstat cut-off percentage as a floating point number.
>
> When printing the dirstat output, floating point numbers are presented in
> rounded form (as opposed to truncated).

Why isn't it sufficient to change

	permille = this_dir * 1000 / changed

to

	permille = (this_dir * 2000 + changed) / (changed * 2)

or something?  If rounding is the only issue that bothers you (I admit
that it does bother me, now that you brought it up), that is.
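
To spell out the difference, a minimal sketch with both forms side by side.
The function names are invented for illustration, "changed" is assumed to
be nonzero, and the counts are assumed to be small enough that multiplying
by 2000 does not overflow.

```c
/* Truncating form: any fractional permille is simply dropped. */
static unsigned long permille_truncated(unsigned long this_dir,
					unsigned long changed)
{
	return this_dir * 1000 / changed;
}

/*
 * Rounding form: doubling both sides and adding 'changed' to the
 * numerator amounts to adding one half before the division truncates.
 */
static unsigned long permille_rounded(unsigned long this_dir,
				      unsigned long changed)
{
	return (this_dir * 2000 + changed) / (changed * 2);
}
```

With this_dir = 1 and changed = 6, the first form gives 166 (16.6%) and the
second gives 167 (16.7%).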

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 6/6] New --dirstat=lines mode, doing dirstat analysis based on diffstat
  2011-04-26  0:01                           ` [PATCH 6/6] New --dirstat=lines mode, doing dirstat analysis based on diffstat Johan Herland
@ 2011-04-26 16:59                             ` Junio C Hamano
  2011-04-27  2:02                               ` Johan Herland
  0 siblings, 1 reply; 91+ messages in thread
From: Junio C Hamano @ 2011-04-26 16:59 UTC (permalink / raw)
  To: Johan Herland; +Cc: git, Linus Torvalds

Johan Herland <johan@herland.net> writes:

> This patch adds an alternative implementation of show_dirstat(), called
> show_dirstat_by_line(), which uses the more expensive diffstat analysis
> (as opposed to show_dirstat()'s own (relatively inexpensive) analysis)
> to derive the numbers from which the --dirstat output is computed.
>
> The alternative implementation is controlled by the new "lines" argument
> to the --dirstat option (or the diff.dirstat config variable).
>
> In linux-2.6.git, running the three different --dirstat modes:
>
>   time git diff v2.6.20..v2.6.30 --dirstat=changes > /dev/null
> vs.
>   time git diff v2.6.20..v2.6.30 --dirstat=lines > /dev/null
> vs.
>   time git diff v2.6.20..v2.6.30 --dirstat=files > /dev/null
>
> yields the following average runtimes on my machine:
>
> - "changes" (default): ~6.0 s
> - "lines":             ~9.6 s
> - "files":             ~0.1 s
>
> So, as expected, there's a considerable performance hit (~60%) by going
> through the full diffstat analysis as compared to the default "changes"
> analysis (obviously, "files" is much faster than both). As such, the
> "lines" mode is probably only useful if you really need the --dirstat
> numbers to be consistent with the numbers returned from the other
> --*stat options.
>
> The patch also includes documentation and tests for the new dirstat mode.

It needs to document and also mention in the proposed commit log message
how binary files are accounted for.

> @@ -1677,6 +1684,48 @@ found_damage:
>  	gather_dirstat(options, &dir, changed, "", 0);
>  }
>  
> +static void show_dirstat_by_line(struct diffstat_t *data, struct diff_options *options)
> +{
> +	int i;
> +	unsigned long changed;
> +	struct dirstat_dir dir;
> +
> +	if (data->nr == 0)
> +		return;
> +
> +	dir.files = NULL;
> +	dir.alloc = 0;
> +	dir.nr = 0;
> +	dir.percent = options->dirstat_percent;
> +	dir.cumulative = DIFF_OPT_TST(options, DIRSTAT_CUMULATIVE);
> +
> +	changed = 0;
> +	for (i = 0; i < data->nr; i++) {
> +		struct diffstat_file *file = data->files[i];
> +		unsigned long damage = file->added + file->deleted;
> +		if (damage && file->is_binary)
> +			/*
> +			 * binary files counts bytes, not lines. Must find some
> +			 * way to normalize binary bytes vs. textual lines.
> +			 * The following heuristic is cheap, but beyond ugly...
> +			 */
> +			damage = damage < 52 ? 1 : damage / 52;

If 52 is just as good as any number around 50-70 range, I would prefer to
see 64, just because I am superstitious and dividing by a power of two
feels nicer.
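
As a sketch of what that would look like: normalize_binary_damage() is a
name invented here (the patch does the arithmetic inline), and 64 is only
the value suggested above, not a measured constant.

```c
/*
 * Hypothetical helper, for illustration only: scale a byte count from a
 * binary file down to something comparable with a count of text lines.
 * Any non-zero change counts as at least one "line".
 */
static unsigned long normalize_binary_damage(unsigned long bytes)
{
	if (!bytes)
		return 0;
	return bytes < 64 ? 1 : bytes / 64;
}
```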

> +cat <<EOF >expect_diff_dirstat_CC
> +  16.7% changed/
> +  16.7% dst/copy/changed/
> +  16.7% dst/copy/rearranged/
> +  16.7% dst/move/changed/
> +  16.7% dst/move/rearranged/
> +  16.7% rearranged/
> +EOF

I really wish you could come up with a way to express expected results in
a much less strict way in the test vector (not limited to the test vectors
for this patch but for the entire series).  The underlying count-damages
(for the purpose of rename detection) implementation may improve over time
and the textual diff generation may too.  What we want to preserve here is
that these six entries show more-or-less the same amount of contribution,
not precisely 16.666666% each.

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 3/6] Refactor --dirstat parsing; deprecate --cumulative and --dirstat-by-file
  2011-04-26 16:36                             ` Junio C Hamano
@ 2011-04-27  2:02                               ` Johan Herland
  2011-04-27  4:53                                 ` Junio C Hamano
  2011-04-27 20:51                                 ` Junio C Hamano
  0 siblings, 2 replies; 91+ messages in thread
From: Johan Herland @ 2011-04-27  2:02 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Linus Torvalds

On Tuesday 26 April 2011, Junio C Hamano wrote:
> Johan Herland <johan@herland.net> writes:
> > +--dirstat[=<arg1,arg2,...>]::
> > +	Output the distribution of relative amount of changes for each
> > +	sub-directory. The behavior of `--dirstat` can be customized by
> > +	passing it a comma separated list of arguments. The defaults
> > +	are controlled by the `diff.dirstat` configuration variable (see
> > +	linkgit:git-config[1]). The following arguments are available:
>
> These "arguments" feel more like "options" (or "parameters"), no?  Your
> code in diff.c also calls it "opt".  The second line of the proposed log
> message has the same issue.

I have tried to consistently use "option" for referring to the entire
"--dirstat=whatever" entity, and then use "argument" for referring to
each comma-separated token following "--dirstat=". I based this on the
function naming in diff.c, which uses "diff_opt_parse()" to parse diff
options, "stat_opt()" to parse the '--stat*' options, and "opt_arg()"
to parse arguments to options (i.e. "--option=argument").

To me, "argument" and "parameter" are synonyms, but English is not my
first language. I'll replace "argument" with "parameter" in the re-roll.
I.e. "option" refers to the option name AND the option parameters, while
"parameters" refers to the option parameters only.

> > +--
> > +`changes`;;
> > +	Compute the dirstat numbers by counting the lines that have been
> > +	removed from the source, or added to the destination. This ignores
> > +	the amount of pure code movements within a file.  In other words,
> > +	rearranging lines in a file is not counted as much as other changes.
> > +	This is the default `--dirstat` behavior.
> 
> "default behavior when no option is given"?

"default behavior when no parameter is given"?

> > +`cumulative`;;
> > +	Count changes in a child directory for the parent directory as well.
> > +	Note that when using `cumulative`, the sum of the percentages
> > +	reported may exceed 100%. The default (non-cumulative) behavior can
> > +	be specified with the `noncumulative` argument.
> 
> So the later one wins?  I.e. --dirstat=cumulative,noncumulative from the
> command line (which seems silly), or more importantly with
> 
>     [alias]
>     	dstat = diff --dirstat=cumulative
> 
> and you can say "git dstat --dirstat=noncumulative A..B"?

Indeed. The intention is that dirstat parameters are parsed in order
(first from config, then from command line), and the later parameters
override earlier (conflicting) parameters.
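
Roughly, the idea is the following. This is a simplified sketch, not the
code from the patch: the struct and function names are made up, plain
strncmp() stands in for prefixcmp(), and the percentage parameter is left
out.

```c
#include <string.h>

/* Hypothetical stand-in for the two dirstat bits in struct diff_options. */
struct dirstat_flags {
	int by_file;
	int cumulative;
};

/*
 * Apply a comma-separated parameter list on top of whatever is already
 * in 'f'.  Parameters are handled left to right, so a later one simply
 * overwrites the effect of an earlier, conflicting one.
 * Returns 0 on success, -1 on an unknown (or empty) parameter.
 */
static int apply_dirstat_params(struct dirstat_flags *f, const char *params)
{
	const char *p = params;

	while (*p) {
		size_t len = strcspn(p, ",");	/* length of the current token */

		if (len == 7 && !strncmp(p, "changes", len))
			f->by_file = 0;
		else if (len == 5 && !strncmp(p, "files", len))
			f->by_file = 1;
		else if (len == 13 && !strncmp(p, "noncumulative", len))
			f->cumulative = 0;
		else if (len == 10 && !strncmp(p, "cumulative", len))
			f->cumulative = 1;
		else
			return -1;
		p += len;
		if (*p == ',')
			p++;
	}
	return 0;
}
```

Calling it first with the value from the config and then once per
--dirstat= option on the command line, always on the same struct, gives
the "later one wins" behavior in your alias example.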

> > diff --git a/diff.c b/diff.c
> > index cfbfa92..08aaa47 100644
> > --- a/diff.c
> > +++ b/diff.c
> > @@ -3144,6 +3144,72 @@ static int stat_opt(struct diff_options *options, const char **av)
> >  	return argcount;
> >  }
> >  
> 
> /*
>  * Document what the return value from this function means here.
>  */
> > +static int dirstat_opt(struct diff_options *options, const char **av)
> 
> Do you have to pass "const char **av", or just "const char *arg"?

dirstat_opt() was modeled on stat_opt(). dirstat_opt() obviously needs
just "const char *arg". Will fix.

> > +{
> > +	const char *p, *arg = av[0];
> > +	char *mangled = NULL;
> > +	char sep = '=';
> > +
> > +	if (!strcmp(arg, "--cumulative")) /* deprecated */
> > +		/* handle '--cumulative' like '--dirstat=cumulative' */
> > +		p = "=cumulative";
> > +	else if (!strcmp(arg, "--dirstat-by-file") ||
> > +		 !prefixcmp(arg, "--dirstat-by-file=")) { /* deprecated */
> > +		/* handle '--dirstat-by-file=*' like '--dirstat=files,*' */
> > +		mangled = xstrdup(arg + 2);
> > +		memcpy(mangled, "--dirstat=files", 15);
> > +		if (mangled[15]) {
> > +			assert(mangled[15] == '=');
> > +			mangled[15] = ',';
> > +		}
> > +		arg = mangled;
> > +		p = mangled + 9;
> 
> I understand you wanted to reuse the while() loop below, but I do not
> think it is worth it.  Isn't it easier to read if you handled the above
> cases in their if/else body and return?
> 
> 	if (--cumulative) {
> 		options->output_format |= DIFF_FORMAT_DIRSTAT;
> 		DIFF_OPT_SET(options, DIRSTAT_CUMULATIVE);
> 		return 1;
> 	}
> 	if (--dirstat-by-file) {
> 		options->output_format |= DIFF_FORMAT_DIRSTAT;
> 		DIFF_OPT_SET(options, DIRSTAT_BY_FILE);
> 		return 1;
> 	}
> 	...
> 
> Even better, probably they can be left to diff_opt_parse() without
> calling this function, as you are deprecating them and do not have to
> allow them to take the opt1,opt2,... form of parameter.

I understand, but politely disagree: Patch 6/6 complicates the logic
that DIFF_OPT_SET()/CLR() various bits in the diff options. I'd rather
keep that logic in one place, than duplicate it into diff_opt_parse().

> > +	}
> > +	else if (!prefixcmp(arg, "-X"))
> > +		p = arg + 2;
> > +	else if (!prefixcmp(arg, "--dirstat"))
> > +		p = arg + 9;
> > +	else
> > +		return 0;
> > +
> > +	options->output_format |= DIFF_FORMAT_DIRSTAT;
> > +
> > +	while (*p) {
> > +		if (*p != sep)
> 
> What happens to "diff -X3 A..B"?

Oops. Will fix, and add testcases verifying the fix.

> > +			die("Missing argument separator ('%c'), at index %lu of '%s'",
> > +			    sep, p - arg, arg);
> 
> Don't you need to cast (p-arg) for %lu from ptrdiff type here?

Copied PD_FMT from builtin/mktag.c instead.

> It probably is more common to say s/index/char/;

Indeed.

> > +		if (!prefixcmp(p, "changes")) {
> > +			p += 7;
> > +			DIFF_OPT_CLR(options, DIRSTAT_BY_FILE);
> > +		}
> > +		else if (!prefixcmp(p, "files")) {
> > +			p += 5;
> > +			DIFF_OPT_SET(options, DIRSTAT_BY_FILE);
> > +		}
> > +		else if (!prefixcmp(p, "noncumulative")) {
> > +			p += 13;
> > +			DIFF_OPT_CLR(options, DIRSTAT_CUMULATIVE);
> > +		}
> > +		else if (!prefixcmp(p, "cumulative")) {
> > +			p += 10;
> > +			DIFF_OPT_SET(options, DIRSTAT_CUMULATIVE);
> > +		}
> > +		else if (isdigit(*p)) {
> > +			char *end;
> > +			options->dirstat_percent = strtoul(p, &end, 10);
> > +			assert(end > p);
> > +			p = end;
> > +		}
> 
> That's a senseless assert(), isn't it?
> 
> You already know the first letter is a digit, so assert(p < end) will
> always be true.  You may want to check that this particular option is all
> digits by checking (*end == '\0' || *end == ',') but that is done at the
> beginning of this loop anyway, so I don't think there is anything to
> check here.

True. I guess I just wanted a sanity check that aborts, rather than
entering an infinite loop in case I got my logic wrong somewhere...
Removed in the re-roll.

> > +		else
> > +			die("Unknown --dirstat argument '%s'", p);
> 
> The function parses dirstat_OPT, but this says argument?

Again, the "option" refers to the option name ("--dirstat") AND its
s/arguments/parameters/ ("changes,noncumulative,3")


Your other comments (that I felt no need to comment on) will also be
incorporated in the re-roll.


Thanks for the feedback!

...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 4/6] Add config variable for specifying default --dirstat behavior
  2011-04-26 16:43                             ` Junio C Hamano
@ 2011-04-27  2:02                               ` Johan Herland
  0 siblings, 0 replies; 91+ messages in thread
From: Johan Herland @ 2011-04-27  2:02 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Linus Torvalds

On Tuesday 26 April 2011, Junio C Hamano wrote:
> Johan Herland <johan@herland.net> writes:
> > +static void init_default_diff_options()
> > +{
> > +	static int initialized = 0;
> > +	if (initialized)
> > +		return;
> > +
> > +	default_diff_options.dirstat_percent = 3;
> > +
> > +	initialized = 1;
> > +}
> 
> This smells fishy on two counts.
> 
>  . The rest of the diff machinery is designed to be callable multiple
>    times by calling diff_setup(), and there should be no place for any
>    call-once function like this one.

True. I needed it because the hardcoded "options->dirstat_percent = 3" in 
diff_setup() would overwrite the "diff.dirstat=10" stored in 
default_diff_options.dirstat_percent. Instead, I needed the fallback "3" to 
be stored in default_diff_options.dirstat_percent before "diff.dirstat" was 
parsed.

>  . Why is dirstat-percent _so_ special that it is the only one that has
>    to be initialized this way, when the function name implies that this
>    is the central place to control the initialization of all diff
>    related options?

Once I added init_default_diff_options(), I did in fact try to move the 
other hardcoded diff options from diff_setup(). However, I ended up with a 
lot of test failures, so I quickly gave up on that.

In the upcoming re-roll, I have solved the problem in a different way (using 
a static variable to store the default dirstat percentage).

> > +static void dirstat_opt_args(struct diff_options *options, const char *args)
> > +{
> > +	const char *p = args;
> > +	while (*p) {
> > +		if (!prefixcmp(p, "changes")) {
> > + ...
> > +		}
> > +	}
> > +}
> 
> Please move this part to the previous patch in your reroll.  This helper
> is what the previous patch should have been written with.

Will do.


...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 5/6] Use floating point for --dirstat percentages
  2011-04-26 16:52                             ` Junio C Hamano
@ 2011-04-27  2:02                               ` Johan Herland
  2011-04-27  4:42                                 ` Junio C Hamano
  0 siblings, 1 reply; 91+ messages in thread
From: Johan Herland @ 2011-04-27  2:02 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Linus Torvalds

On Tuesday 26 April 2011, Junio C Hamano wrote:
> Johan Herland <johan@herland.net> writes:
> > Allow specifying --dirstat cut-off percentage as a floating point
> > number.
> > 
> > When printing the dirstat output, floating point numbers are presented
> > in rounded form (as opposed to truncated).
> 
> Why isn't it sufficient to change
> 
> 	permille = this_dir * 1000 / changed
> 
> to
> 
> 	permille = (this_dir * 2000 + changed) / (changed * 2)
> 
> or something?  If rounding is the only issue that bothers you (I admit
> that it does bother me, now that you brought it up), that is.

Actually, rounding doesn't bother me at all (or rather, I don't really care 
if we round or truncate, as long as we're consistent).

It's just that once I s/strtoul/strtod/, and started propagating the 
"double"s through the code, I found that doing the final calculation and 
output with "double"s was more natural than the (somewhat hackish, IMHO) 
permille/percent thing. And that's when I finally came across the fact that 
"%6.1f" rounds whereas the earlier version truncated.

I thought about it for a second, and figured that rounding was probably what 
most users expected.


...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 6/6] New --dirstat=lines mode, doing dirstat analysis based on diffstat
  2011-04-26 16:59                             ` Junio C Hamano
@ 2011-04-27  2:02                               ` Johan Herland
  0 siblings, 0 replies; 91+ messages in thread
From: Johan Herland @ 2011-04-27  2:02 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Linus Torvalds

On Tuesday 26 April 2011, Junio C Hamano wrote:
> Johan Herland <johan@herland.net> writes:
>
> [...]
> It needs to document and also mention in the proposed commit log message
> how binary files are accounted for.

Will do.

> [...]
> > +		if (damage && file->is_binary)
> > +			/*
> > +			 * binary files counts bytes, not lines. Must find some
> > +			 * way to normalize binary bytes vs. textual lines.
> > +			 * The following heuristic is cheap, but beyond ugly...
> > +			 */
> > +			damage = damage < 52 ? 1 : damage / 52;
> 
> If 52 is just as good as any number around 50-70 range, I would prefer to
> see 64, just because I am superstitious and dividing by a power of two
> feels nicer.

Will do.

> > +cat <<EOF >expect_diff_dirstat_CC
> > +  16.7% changed/
> > +  16.7% dst/copy/changed/
> > +  16.7% dst/copy/rearranged/
> > +  16.7% dst/move/changed/
> > +  16.7% dst/move/rearranged/
> > +  16.7% rearranged/
> > +EOF
> 
> I really wish you could come up with a way to express expected results in
> a much less strict way in the test vector (not limited to the test vectors
> for this patch but for the entire series).  The underlying count-damages
> (for the purpose of rename detection) implementation may improve over
> time and the textual diff generation may too.  What we want to preserve
> here is that these six entries show more-or-less the same amount of
> contribution, not precisely 16.666666% each.

Yeah, that does make sense, although I haven't yet thought of a good way to 
do this without losing important details. I thought of assigning letters to 
each percentage value, so that instead of

	cat <<EOF >expect_diff_dirstat_M
	   5.3% changed/
	  26.3% dst/copy/changed/
	  26.3% dst/copy/rearranged/
	  26.3% dst/copy/unchanged/
	   5.3% dst/move/changed/
	   5.3% dst/move/rearranged/
	   5.3% rearranged/
	EOF

	cat <<EOF >expect_diff_dirstat_CC
	  16.7% changed/
	  16.7% dst/copy/changed/
	  16.7% dst/copy/rearranged/
	  16.7% dst/move/changed/
	  16.7% dst/move/rearranged/
	  16.7% rearranged/
	EOF

you'd have

	cat <<EOF >expect_diff_dirstat_M
	     A% changed/
	     B% dst/copy/changed/
	     B% dst/copy/rearranged/
	     B% dst/copy/unchanged/
	     A% dst/move/changed/
	     A% dst/move/rearranged/
	     A% rearranged/
	EOF

	cat <<EOF >expect_diff_dirstat_CC
	     A% changed/
	     A% dst/copy/changed/
	     A% dst/copy/rearranged/
	     A% dst/move/changed/
	     A% dst/move/rearranged/
	     A% rearranged/
	EOF

but that would lose too much detail in many cases...

Also, I partly like the fact that we test the output strictly. That way, 
when someone _does_ change the underlying details, they get an immediate 
test failure telling them exactly _how_ the numbers changed. They can then 
use that to verify that their changes produce the expected results, before 
finally updating the test in accordance with their new numbers.


...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net

^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCHv2 0/6] --dirstat fixes, part 2
  2011-04-26  0:01                         ` [PATCH 0/6] --dirstat fixes, part 2 Johan Herland
                                             ` (6 preceding siblings ...)
  2011-04-26  0:15                           ` [PATCH 0/6] --dirstat fixes, part 2 Linus Torvalds
@ 2011-04-27  2:12                           ` Johan Herland
  2011-04-27  2:12                             ` [PATCHv2 1/6] Add several testcases for --dirstat and friends Johan Herland
                                               ` (6 more replies)
  7 siblings, 7 replies; 91+ messages in thread
From: Johan Herland @ 2011-04-27  2:12 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Linus Torvalds, Johan Herland

Hi,

Here's version 2 with a lot of improvements suggested by Junio.
Especially patches #3, #4 and #6 have been updated. The other
patches are unchanged, or have only received minor/trivial updates.


Have fun! :)

...Johan

Johan Herland (6):
  Add several testcases for --dirstat and friends
  Make --dirstat=0 output directories that contribute < 0.1% of changes
  Refactor --dirstat parsing; deprecate --cumulative and --dirstat-by-file
  Add config variable for specifying default --dirstat behavior
  Use floating point for --dirstat percentages
  New --dirstat=lines mode, doing dirstat analysis based on diffstat

 Documentation/config.txt       |   44 ++
 Documentation/diff-options.txt |   54 ++-
 diff.c                         |  183 ++++++++-
 diff.h                         |    3 +-
 t/t4046-diff-dirstat.sh        |  908 ++++++++++++++++++++++++++++++++++++++++
 5 files changed, 1160 insertions(+), 32 deletions(-)
 create mode 100755 t/t4046-diff-dirstat.sh

-- 
1.7.5.rc1.3.g4d7b

^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCHv2 1/6] Add several testcases for --dirstat and friends
  2011-04-27  2:12                           ` [PATCHv2 " Johan Herland
@ 2011-04-27  2:12                             ` Johan Herland
  2011-04-27  2:12                             ` [PATCHv2 2/6] Make --dirstat=0 output directories that contribute < 0.1% of changes Johan Herland
                                               ` (5 subsequent siblings)
  6 siblings, 0 replies; 91+ messages in thread
From: Johan Herland @ 2011-04-27  2:12 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Linus Torvalds, Johan Herland

Currently, t4013 is the only selftest that exercises the --dirstat machinery,
but it only does a superficial verification of --dirstat's output.

This patch adds a new selftest - t4046-diff-dirstat.sh - which prepares a
commit containing:
 - unchanged files, changed files and files with rearranged lines
 - copied files, moved files, and unmoved files

It then verifies the correct dirstat output for that commit in the following
dirstat modes:
 - --dirstat
 - -X
 - --dirstat=0
 - -X0
 - --cumulative
 - --dirstat-by-file
 - (plus combinations of the above)

Each of the above tests is also run with:
 - no rename detection
 - rename detection (-M)
 - expensive copy detection (-C -C)

Signed-off-by: Johan Herland <johan@herland.net>
---
 t/t4046-diff-dirstat.sh |  580 +++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 580 insertions(+), 0 deletions(-)
 create mode 100755 t/t4046-diff-dirstat.sh

diff --git a/t/t4046-diff-dirstat.sh b/t/t4046-diff-dirstat.sh
new file mode 100755
index 0000000..eb6bf47
--- /dev/null
+++ b/t/t4046-diff-dirstat.sh
@@ -0,0 +1,580 @@
+#!/bin/sh
+
+test_description='diff --dirstat tests'
+. ./test-lib.sh
+
+# set up two commits where the second commit has these files
+# (10 lines in each file):
+#
+#   unchanged/text           (unchanged from 1st commit)
+#   changed/text             (changed 1st line)
+#   rearranged/text          (swapped 1st and 2nd line)
+#   dst/copy/unchanged/text  (copied from src/copy/unchanged/text, unchanged)
+#   dst/copy/changed/text    (copied from src/copy/changed/text, changed)
+#   dst/copy/rearranged/text (copied from src/copy/rearranged/text, rearranged)
+#   dst/move/unchanged/text  (moved from src/move/unchanged/text, unchanged)
+#   dst/move/changed/text    (moved from src/move/changed/text, changed)
+#   dst/move/rearranged/text (moved from src/move/rearranged/text, rearranged)
+
+test_expect_success 'setup' '
+	mkdir unchanged &&
+	mkdir changed &&
+	mkdir rearranged &&
+	mkdir src &&
+	mkdir src/copy &&
+	mkdir src/copy/unchanged &&
+	mkdir src/copy/changed &&
+	mkdir src/copy/rearranged &&
+	mkdir src/move &&
+	mkdir src/move/unchanged &&
+	mkdir src/move/changed &&
+	mkdir src/move/rearranged &&
+	cat <<EOF >unchanged/text &&
+unchanged       line #0
+unchanged       line #1
+unchanged       line #2
+unchanged       line #3
+unchanged       line #4
+unchanged       line #5
+unchanged       line #6
+unchanged       line #7
+unchanged       line #8
+unchanged       line #9
+EOF
+	cat <<EOF >changed/text &&
+changed         line #0
+changed         line #1
+changed         line #2
+changed         line #3
+changed         line #4
+changed         line #5
+changed         line #6
+changed         line #7
+changed         line #8
+changed         line #9
+EOF
+	cat <<EOF >rearranged/text &&
+rearranged      line #0
+rearranged      line #1
+rearranged      line #2
+rearranged      line #3
+rearranged      line #4
+rearranged      line #5
+rearranged      line #6
+rearranged      line #7
+rearranged      line #8
+rearranged      line #9
+EOF
+	cat <<EOF >src/copy/unchanged/text &&
+copy  unchanged line #0
+copy  unchanged line #1
+copy  unchanged line #2
+copy  unchanged line #3
+copy  unchanged line #4
+copy  unchanged line #5
+copy  unchanged line #6
+copy  unchanged line #7
+copy  unchanged line #8
+copy  unchanged line #9
+EOF
+	cat <<EOF >src/copy/changed/text &&
+copy    changed line #0
+copy    changed line #1
+copy    changed line #2
+copy    changed line #3
+copy    changed line #4
+copy    changed line #5
+copy    changed line #6
+copy    changed line #7
+copy    changed line #8
+copy    changed line #9
+EOF
+	cat <<EOF >src/copy/rearranged/text &&
+copy rearranged line #0
+copy rearranged line #1
+copy rearranged line #2
+copy rearranged line #3
+copy rearranged line #4
+copy rearranged line #5
+copy rearranged line #6
+copy rearranged line #7
+copy rearranged line #8
+copy rearranged line #9
+EOF
+	cat <<EOF >src/move/unchanged/text &&
+move  unchanged line #0
+move  unchanged line #1
+move  unchanged line #2
+move  unchanged line #3
+move  unchanged line #4
+move  unchanged line #5
+move  unchanged line #6
+move  unchanged line #7
+move  unchanged line #8
+move  unchanged line #9
+EOF
+	cat <<EOF >src/move/changed/text &&
+move    changed line #0
+move    changed line #1
+move    changed line #2
+move    changed line #3
+move    changed line #4
+move    changed line #5
+move    changed line #6
+move    changed line #7
+move    changed line #8
+move    changed line #9
+EOF
+	cat <<EOF >src/move/rearranged/text &&
+move rearranged line #0
+move rearranged line #1
+move rearranged line #2
+move rearranged line #3
+move rearranged line #4
+move rearranged line #5
+move rearranged line #6
+move rearranged line #7
+move rearranged line #8
+move rearranged line #9
+EOF
+	git add . &&
+	git commit -m "initial" &&
+	mkdir dst &&
+	mkdir dst/copy &&
+	mkdir dst/copy/unchanged &&
+	mkdir dst/copy/changed &&
+	mkdir dst/copy/rearranged &&
+	mkdir dst/move &&
+	mkdir dst/move/unchanged &&
+	mkdir dst/move/changed &&
+	mkdir dst/move/rearranged &&
+	cat <<EOF >changed/text &&
+CHANGED XXXXXXX line #0
+changed         line #1
+changed         line #2
+changed         line #3
+changed         line #4
+changed         line #5
+changed         line #6
+changed         line #7
+changed         line #8
+changed         line #9
+EOF
+	cat <<EOF >rearranged/text &&
+rearranged      line #1
+rearranged      line #0
+rearranged      line #2
+rearranged      line #3
+rearranged      line #4
+rearranged      line #5
+rearranged      line #6
+rearranged      line #7
+rearranged      line #8
+rearranged      line #9
+EOF
+	cat <<EOF >dst/copy/unchanged/text &&
+copy  unchanged line #0
+copy  unchanged line #1
+copy  unchanged line #2
+copy  unchanged line #3
+copy  unchanged line #4
+copy  unchanged line #5
+copy  unchanged line #6
+copy  unchanged line #7
+copy  unchanged line #8
+copy  unchanged line #9
+EOF
+	cat <<EOF >dst/copy/changed/text &&
+copy XXXCHANGED line #0
+copy    changed line #1
+copy    changed line #2
+copy    changed line #3
+copy    changed line #4
+copy    changed line #5
+copy    changed line #6
+copy    changed line #7
+copy    changed line #8
+copy    changed line #9
+EOF
+	cat <<EOF >dst/copy/rearranged/text &&
+copy rearranged line #1
+copy rearranged line #0
+copy rearranged line #2
+copy rearranged line #3
+copy rearranged line #4
+copy rearranged line #5
+copy rearranged line #6
+copy rearranged line #7
+copy rearranged line #8
+copy rearranged line #9
+EOF
+	cat <<EOF >dst/move/unchanged/text &&
+move  unchanged line #0
+move  unchanged line #1
+move  unchanged line #2
+move  unchanged line #3
+move  unchanged line #4
+move  unchanged line #5
+move  unchanged line #6
+move  unchanged line #7
+move  unchanged line #8
+move  unchanged line #9
+EOF
+	cat <<EOF >dst/move/changed/text &&
+move XXXCHANGED line #0
+move    changed line #1
+move    changed line #2
+move    changed line #3
+move    changed line #4
+move    changed line #5
+move    changed line #6
+move    changed line #7
+move    changed line #8
+move    changed line #9
+EOF
+	cat <<EOF >dst/move/rearranged/text &&
+move rearranged line #1
+move rearranged line #0
+move rearranged line #2
+move rearranged line #3
+move rearranged line #4
+move rearranged line #5
+move rearranged line #6
+move rearranged line #7
+move rearranged line #8
+move rearranged line #9
+EOF
+	git add . &&
+	git rm -r src/move/unchanged &&
+	git rm -r src/move/changed &&
+	git rm -r src/move/rearranged &&
+	git commit -m "changes"
+'
+
+cat <<EOF >expect_diff_stat
+ changed/text             |    2 +-
+ dst/copy/changed/text    |   10 ++++++++++
+ dst/copy/rearranged/text |   10 ++++++++++
+ dst/copy/unchanged/text  |   10 ++++++++++
+ dst/move/changed/text    |   10 ++++++++++
+ dst/move/rearranged/text |   10 ++++++++++
+ dst/move/unchanged/text  |   10 ++++++++++
+ rearranged/text          |    2 +-
+ src/move/changed/text    |   10 ----------
+ src/move/rearranged/text |   10 ----------
+ src/move/unchanged/text  |   10 ----------
+ 11 files changed, 62 insertions(+), 32 deletions(-)
+EOF
+
+cat <<EOF >expect_diff_stat_M
+ changed/text                      |    2 +-
+ dst/copy/changed/text             |   10 ++++++++++
+ dst/copy/rearranged/text          |   10 ++++++++++
+ dst/copy/unchanged/text           |   10 ++++++++++
+ {src => dst}/move/changed/text    |    2 +-
+ {src => dst}/move/rearranged/text |    2 +-
+ {src => dst}/move/unchanged/text  |    0
+ rearranged/text                   |    2 +-
+ 8 files changed, 34 insertions(+), 4 deletions(-)
+EOF
+
+cat <<EOF >expect_diff_stat_CC
+ changed/text                      |    2 +-
+ {src => dst}/copy/changed/text    |    2 +-
+ {src => dst}/copy/rearranged/text |    2 +-
+ {src => dst}/copy/unchanged/text  |    0
+ {src => dst}/move/changed/text    |    2 +-
+ {src => dst}/move/rearranged/text |    2 +-
+ {src => dst}/move/unchanged/text  |    0
+ rearranged/text                   |    2 +-
+ 8 files changed, 6 insertions(+), 6 deletions(-)
+EOF
+
+test_expect_success 'sanity check setup (--stat)' '
+	git diff --stat HEAD^..HEAD >actual_diff_stat &&
+	test_cmp expect_diff_stat actual_diff_stat &&
+	git diff --stat -M HEAD^..HEAD >actual_diff_stat_M &&
+	test_cmp expect_diff_stat_M actual_diff_stat_M &&
+	git diff --stat -C -C HEAD^..HEAD >actual_diff_stat_CC &&
+	test_cmp expect_diff_stat_CC actual_diff_stat_CC
+'
+
+# changed/text and rearranged/text fall below default 3% threshold
+cat <<EOF >expect_diff_dirstat
+  10.8% dst/copy/changed/
+  10.8% dst/copy/rearranged/
+  10.8% dst/copy/unchanged/
+  10.8% dst/move/changed/
+  10.8% dst/move/rearranged/
+  10.8% dst/move/unchanged/
+  10.8% src/move/changed/
+  10.8% src/move/rearranged/
+  10.8% src/move/unchanged/
+EOF
+
+# rearranged/text falls below default 3% threshold
+cat <<EOF >expect_diff_dirstat_M
+   5.8% changed/
+  29.3% dst/copy/changed/
+  29.3% dst/copy/rearranged/
+  29.3% dst/copy/unchanged/
+   5.8% dst/move/changed/
+EOF
+
+# rearranged/text falls below default 3% threshold
+cat <<EOF >expect_diff_dirstat_CC
+  32.6% changed/
+  32.6% dst/copy/changed/
+  32.6% dst/move/changed/
+EOF
+
+test_expect_success 'vanilla --dirstat' '
+	git diff --dirstat HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff --dirstat -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff --dirstat -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
+test_expect_success 'vanilla -X' '
+	git diff -X HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff -X -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff -X -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
+# rearranged/text falls below 0% threshold (1 / (240 * 9 + 48 + 1) ~= 0.045 %)
+cat <<EOF >expect_diff_dirstat
+   2.1% changed/
+  10.8% dst/copy/changed/
+  10.8% dst/copy/rearranged/
+  10.8% dst/copy/unchanged/
+  10.8% dst/move/changed/
+  10.8% dst/move/rearranged/
+  10.8% dst/move/unchanged/
+  10.8% src/move/changed/
+  10.8% src/move/rearranged/
+  10.8% src/move/unchanged/
+EOF
+
+cat <<EOF >expect_diff_dirstat_M
+   5.8% changed/
+  29.3% dst/copy/changed/
+  29.3% dst/copy/rearranged/
+  29.3% dst/copy/unchanged/
+   5.8% dst/move/changed/
+   0.1% dst/move/rearranged/
+   0.1% rearranged/
+EOF
+
+cat <<EOF >expect_diff_dirstat_CC
+  32.6% changed/
+  32.6% dst/copy/changed/
+   0.6% dst/copy/rearranged/
+  32.6% dst/move/changed/
+   0.6% dst/move/rearranged/
+   0.6% rearranged/
+EOF
+
+test_expect_success '--dirstat=0' '
+	git diff --dirstat=0 HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff --dirstat=0 -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff --dirstat=0 -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
+test_expect_success '-X0' '
+	git diff -X0 HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff -X0 -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff -X0 -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
+# rearranged/text falls below 0% threshold (1 / (240 * 9 + 48 + 1) ~= 0.045 %)
+cat <<EOF >expect_diff_dirstat
+   2.1% changed/
+  10.8% dst/copy/changed/
+  10.8% dst/copy/rearranged/
+  10.8% dst/copy/unchanged/
+  32.5% dst/copy/
+  10.8% dst/move/changed/
+  10.8% dst/move/rearranged/
+  10.8% dst/move/unchanged/
+  32.5% dst/move/
+  65.1% dst/
+  10.8% src/move/changed/
+  10.8% src/move/rearranged/
+  10.8% src/move/unchanged/
+  32.5% src/move/
+EOF
+
+cat <<EOF >expect_diff_dirstat_M
+   5.8% changed/
+  29.3% dst/copy/changed/
+  29.3% dst/copy/rearranged/
+  29.3% dst/copy/unchanged/
+  88.0% dst/copy/
+   5.8% dst/move/changed/
+   0.1% dst/move/rearranged/
+   5.9% dst/move/
+  94.0% dst/
+   0.1% rearranged/
+EOF
+
+cat <<EOF >expect_diff_dirstat_CC
+  32.6% changed/
+  32.6% dst/copy/changed/
+   0.6% dst/copy/rearranged/
+  33.3% dst/copy/
+  32.6% dst/move/changed/
+   0.6% dst/move/rearranged/
+  33.3% dst/move/
+  66.6% dst/
+   0.6% rearranged/
+EOF
+
+test_expect_success '--dirstat=0 --cumulative' '
+	git diff --dirstat=0 --cumulative HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff --dirstat=0 --cumulative -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff --dirstat=0 --cumulative -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
+cat <<EOF >expect_diff_dirstat
+   9.0% changed/
+   9.0% dst/copy/changed/
+   9.0% dst/copy/rearranged/
+   9.0% dst/copy/unchanged/
+   9.0% dst/move/changed/
+   9.0% dst/move/rearranged/
+   9.0% dst/move/unchanged/
+   9.0% rearranged/
+   9.0% src/move/changed/
+   9.0% src/move/rearranged/
+   9.0% src/move/unchanged/
+EOF
+
+cat <<EOF >expect_diff_dirstat_M
+  14.2% changed/
+  14.2% dst/copy/changed/
+  14.2% dst/copy/rearranged/
+  14.2% dst/copy/unchanged/
+  14.2% dst/move/changed/
+  14.2% dst/move/rearranged/
+  14.2% rearranged/
+EOF
+
+cat <<EOF >expect_diff_dirstat_CC
+  16.6% changed/
+  16.6% dst/copy/changed/
+  16.6% dst/copy/rearranged/
+  16.6% dst/move/changed/
+  16.6% dst/move/rearranged/
+  16.6% rearranged/
+EOF
+
+test_expect_success '--dirstat-by-file' '
+	git diff --dirstat-by-file HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff --dirstat-by-file -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff --dirstat-by-file -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
+cat <<EOF >expect_diff_dirstat
+  27.2% dst/copy/
+  27.2% dst/move/
+  27.2% src/move/
+EOF
+
+cat <<EOF >expect_diff_dirstat_M
+  14.2% changed/
+  14.2% dst/copy/changed/
+  14.2% dst/copy/rearranged/
+  14.2% dst/copy/unchanged/
+  14.2% dst/move/changed/
+  14.2% dst/move/rearranged/
+  14.2% rearranged/
+EOF
+
+cat <<EOF >expect_diff_dirstat_CC
+  16.6% changed/
+  16.6% dst/copy/changed/
+  16.6% dst/copy/rearranged/
+  16.6% dst/move/changed/
+  16.6% dst/move/rearranged/
+  16.6% rearranged/
+EOF
+
+test_expect_success '--dirstat-by-file=10' '
+	git diff --dirstat-by-file=10 HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff --dirstat-by-file=10 -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff --dirstat-by-file=10 -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
+cat <<EOF >expect_diff_dirstat
+   9.0% changed/
+   9.0% dst/copy/changed/
+   9.0% dst/copy/rearranged/
+   9.0% dst/copy/unchanged/
+  27.2% dst/copy/
+   9.0% dst/move/changed/
+   9.0% dst/move/rearranged/
+   9.0% dst/move/unchanged/
+  27.2% dst/move/
+  54.5% dst/
+   9.0% rearranged/
+   9.0% src/move/changed/
+   9.0% src/move/rearranged/
+   9.0% src/move/unchanged/
+  27.2% src/move/
+EOF
+
+cat <<EOF >expect_diff_dirstat_M
+  14.2% changed/
+  14.2% dst/copy/changed/
+  14.2% dst/copy/rearranged/
+  14.2% dst/copy/unchanged/
+  42.8% dst/copy/
+  14.2% dst/move/changed/
+  14.2% dst/move/rearranged/
+  28.5% dst/move/
+  71.4% dst/
+  14.2% rearranged/
+EOF
+
+cat <<EOF >expect_diff_dirstat_CC
+  16.6% changed/
+  16.6% dst/copy/changed/
+  16.6% dst/copy/rearranged/
+  33.3% dst/copy/
+  16.6% dst/move/changed/
+  16.6% dst/move/rearranged/
+  33.3% dst/move/
+  66.6% dst/
+  16.6% rearranged/
+EOF
+
+test_expect_success '--dirstat-by-file --cumulative' '
+	git diff --dirstat-by-file --cumulative HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff --dirstat-by-file --cumulative -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff --dirstat-by-file --cumulative -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
+test_done
-- 
1.7.5.rc1.3.g4d7b

^ permalink raw reply related	[flat|nested] 91+ messages in thread

* [PATCHv2 2/6] Make --dirstat=0 output directories that contribute < 0.1% of changes
  2011-04-27  2:12                           ` [PATCHv2 " Johan Herland
  2011-04-27  2:12                             ` [PATCHv2 1/6] Add several testcases for --dirstat and friends Johan Herland
@ 2011-04-27  2:12                             ` Johan Herland
  2011-04-27  2:12                             ` [PATCHv2 3/6] Refactor --dirstat parsing; deprecate --cumulative and --dirstat-by-file Johan Herland
                                               ` (4 subsequent siblings)
  6 siblings, 0 replies; 91+ messages in thread
From: Johan Herland @ 2011-04-27  2:12 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Linus Torvalds, Johan Herland

The expected output from --dirstat=0 is to include any directory with
changes, even if those changes contribute a minuscule portion of the total
changes. However, currently, directories that contribute less than 0.1% are
not included, since their 'permille' value is 0, and there is an
'if (permille)' check in gather_dirstat() that causes them to be ignored.

This test is obviously intended to exclude directories that contribute no
changes whatsoever, but in this case, it hits too broadly. The correct
check is against 'this_dir' from which the permille is calculated. Only if
this value is 0 does the directory truly contribute no changes, and should
be skipped from the output.

This patch fixes this issue and updates the corresponding testcases to
expect the new behavior.
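
To make the arithmetic concrete, here is a minimal standalone sketch of the
two checks. It is not the code in diff.c: the helper names
shown_before_fix() and shown_after_fix() are hypothetical, and the parameter
types are an assumption. The numbers come from the comment in t4046, where
rearranged/ contributes 1 out of 240 * 9 + 48 + 1 = 2209.

```c
#include <assert.h>

/* Hypothetical helpers for illustration only; not functions in diff.c. */

/* Old check: skip the directory when the truncated permille is 0. */
static int shown_before_fix(unsigned long this_dir, unsigned long changed,
			    int limit_percent)
{
	int permille;

	assert(changed > 0);
	permille = this_dir * 1000 / changed; /* integer division truncates */
	if (!permille)
		return 0;
	return permille / 10 >= limit_percent;
}

/* New check: skip the directory only when it has no changes at all. */
static int shown_after_fix(unsigned long this_dir, unsigned long changed,
			   int limit_percent)
{
	int permille;

	assert(changed > 0);
	if (!this_dir)
		return 0;
	permille = this_dir * 1000 / changed;
	return permille / 10 >= limit_percent;
}
```

With this_dir = 1, changed = 2209 and a limit of 0, the first helper returns
0 (1000 / 2209 truncates to 0) while the second returns 1, which is why the
updated testcases expect a "0.0% rearranged/" line.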

Signed-off-by: Johan Herland <johan@herland.net>
---
 diff.c                  |    4 ++--
 t/t4046-diff-dirstat.sh |    4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/diff.c b/diff.c
index abd9cd5..cfbfa92 100644
--- a/diff.c
+++ b/diff.c
@@ -1500,8 +1500,8 @@ static long gather_dirstat(struct diff_options *opt, struct dirstat_dir *dir,
 	 *    under this directory (sources == 1).
 	 */
 	if (baselen && sources != 1) {
-		int permille = this_dir * 1000 / changed;
-		if (permille) {
+		if (this_dir) {
+			int permille = this_dir * 1000 / changed;
 			int percent = permille / 10;
 			if (percent >= dir->percent) {
 				fprintf(opt->file, "%s%4d.%01d%% %.*s\n", line_prefix,
diff --git a/t/t4046-diff-dirstat.sh b/t/t4046-diff-dirstat.sh
index eb6bf47..6ff7f9f 100755
--- a/t/t4046-diff-dirstat.sh
+++ b/t/t4046-diff-dirstat.sh
@@ -346,7 +346,6 @@ test_expect_success 'vanilla -X' '
 	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
 '
 
-# rearranged/text falls below 0% threshold (1 / (240 * 9 + 48 + 1) ~= 0.045 %)
 cat <<EOF >expect_diff_dirstat
    2.1% changed/
   10.8% dst/copy/changed/
@@ -355,6 +354,7 @@ cat <<EOF >expect_diff_dirstat
   10.8% dst/move/changed/
   10.8% dst/move/rearranged/
   10.8% dst/move/unchanged/
+   0.0% rearranged/
   10.8% src/move/changed/
   10.8% src/move/rearranged/
   10.8% src/move/unchanged/
@@ -397,7 +397,6 @@ test_expect_success '-X0' '
 	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
 '
 
-# rearranged/text falls below 0% threshold (1 / (240 * 9 + 48 + 1) ~= 0.045 %)
 cat <<EOF >expect_diff_dirstat
    2.1% changed/
   10.8% dst/copy/changed/
@@ -409,6 +408,7 @@ cat <<EOF >expect_diff_dirstat
   10.8% dst/move/unchanged/
   32.5% dst/move/
   65.1% dst/
+   0.0% rearranged/
   10.8% src/move/changed/
   10.8% src/move/rearranged/
   10.8% src/move/unchanged/
-- 
1.7.5.rc1.3.g4d7b

^ permalink raw reply related	[flat|nested] 91+ messages in thread

* [PATCHv2 3/6] Refactor --dirstat parsing; deprecate --cumulative and --dirstat-by-file
  2011-04-27  2:12                           ` [PATCHv2 " Johan Herland
  2011-04-27  2:12                             ` [PATCHv2 1/6] Add several testcases for --dirstat and friends Johan Herland
  2011-04-27  2:12                             ` [PATCHv2 2/6] Make --dirstat=0 output directories that contribute < 0.1% of changes Johan Herland
@ 2011-04-27  2:12                             ` Johan Herland
  2011-04-27  2:12                             ` [PATCHv2 4/6] Add config variable for specifying default --dirstat behavior Johan Herland
                                               ` (3 subsequent siblings)
  6 siblings, 0 replies; 91+ messages in thread
From: Johan Herland @ 2011-04-27  2:12 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Linus Torvalds, Johan Herland

Instead of having multiple interconnected dirstat-related options, teach
the --dirstat option itself to accept all behavior modifiers as parameters.

 - Preserve the current --dirstat=<limit> (where <limit> is an integer
   specifying a cut-off percentage)
 - Add --dirstat=cumulative, replacing --cumulative
 - Add --dirstat=files, replacing --dirstat-by-file
 - Also add --dirstat=changes and --dirstat=noncumulative for specifying the
   current default behavior. These allow the user to reset other --dirstat
   parameters (e.g. 'cumulative' and 'files') occurring earlier on the
   command line.

The deprecated options (--cumulative and --dirstat-by-file) are still
functional, although they have been removed from the documentation.
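
As a rough illustration of that mapping, the sketch below translates a
deprecated spelling into the equivalent --dirstat=<params> string. The
helper translate_deprecated() is hypothetical and builds a new string with
snprintf(); the patch itself rewrites a copy of the argument in place inside
dirstat_opt().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write the new-style equivalent of a deprecated
 * dirstat option into 'out'. Returns 1 on success, or 0 if 'arg' is not a
 * deprecated dirstat option or the result does not fit in 'out'.
 */
static int translate_deprecated(const char *arg, char *out, size_t outlen)
{
	static const char by_file[] = "--dirstat-by-file";
	size_t n = strlen(by_file);
	int len;

	assert(arg && out);

	if (!strcmp(arg, "--cumulative"))
		len = snprintf(out, outlen, "--dirstat=cumulative");
	else if (!strcmp(arg, by_file))
		len = snprintf(out, outlen, "--dirstat=files");
	else if (!strncmp(arg, by_file, n) && arg[n] == '=')
		/* '--dirstat-by-file=<x>' becomes '--dirstat=files,<x>' */
		len = snprintf(out, outlen, "--dirstat=files,%s", arg + n + 1);
	else
		return 0;

	return len >= 0 && (size_t)len < outlen;
}
```

For example, "--dirstat-by-file=10" translates to "--dirstat=files,10".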

Allow multiple parameters to be separated by commas, e.g.:
  --dirstat=files,10,cumulative

Update the documentation accordingly, and add testcases verifying the
behavior of the new syntax.
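
The left-to-right semantics (a later parameter overrides an earlier one) can
be sketched outside of git as follows. This is a simplified stand-in for
parse_dirstat_params(), not a copy of it: struct dirstat_settings,
match_word() and parse_params() are hypothetical names, the sketch returns
-1 where the patch calls die(), and unlike the patch it requires each
keyword to end at a comma or at the end of the string.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the dirstat bits of struct diff_options. */
struct dirstat_settings {
	int by_file;    /* 0 = "changes", 1 = "files" */
	int cumulative; /* 0 = "noncumulative", 1 = "cumulative" */
	int percent;    /* cut-off percentage */
};

/* Returns 1 if 'p' starts with 'word' followed by ',' or end of string. */
static int match_word(const char *p, const char *word, size_t *len)
{
	*len = strlen(word);
	return !strncmp(p, word, *len) && (p[*len] == ',' || p[*len] == '\0');
}

/*
 * Apply a comma-separated parameter list to 's', left to right.
 * Returns 0 on success, -1 on an unknown or malformed parameter.
 */
static int parse_params(struct dirstat_settings *s, const char *params)
{
	const char *p = params;
	size_t len;

	assert(s && params);

	while (*p) {
		if (match_word(p, "changes", &len)) {
			s->by_file = 0;
			p += len;
		} else if (match_word(p, "files", &len)) {
			s->by_file = 1;
			p += len;
		} else if (match_word(p, "noncumulative", &len)) {
			s->cumulative = 0;
			p += len;
		} else if (match_word(p, "cumulative", &len)) {
			s->cumulative = 1;
			p += len;
		} else if (isdigit((unsigned char)*p)) {
			char *end;
			s->percent = (int)strtoul(p, &end, 10);
			if (*end != ',' && *end != '\0')
				return -1;
			p = end;
		} else {
			return -1;
		}
		if (*p == ',') /* swallow the separator */
			p++;
	}
	return 0;
}
```

Starting from the defaults {0, 0, 3}, parsing "files,10,cumulative" yields
by_file = 1, percent = 10 and cumulative = 1, while parsing
"files,10,cumulative,changes,noncumulative,3" ends up back at the defaults,
matching the 'later options override earlier options' testcase.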

Improved-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Johan Herland <johan@herland.net>
---
 Documentation/diff-options.txt |   44 +++++++++++----
 diff.c                         |   99 ++++++++++++++++++++++++++++++---
 t/t4046-diff-dirstat.sh        |  119 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 240 insertions(+), 22 deletions(-)

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 7e4bd42..6a3a9c1 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -66,19 +66,39 @@ endif::git-format-patch[]
 	number of modified files, as well as number of added and deleted
 	lines.
 
---dirstat[=<limit>]::
-	Output the distribution of relative amount of changes (number of lines added or
-	removed) for each sub-directory. Directories with changes below
-	a cut-off percent (3% by default) are not shown. The cut-off percent
-	can be set with `--dirstat=<limit>`. Changes in a child directory are not
-	counted for the parent directory, unless `--cumulative` is used.
+--dirstat[=<param1,param2,...>]::
+	Output the distribution of relative amount of changes for each
+	sub-directory. The behavior of `--dirstat` can be customized by
+	passing it a comma separated list of parameters.
+	The following parameters are available:
 +
-Note that the `--dirstat` option computes the changes while ignoring
-the amount of pure code movements within a file.  In other words,
-rearranging lines in a file is not counted as much as other changes.
-
---dirstat-by-file[=<limit>]::
-	Same as `--dirstat`, but counts changed files instead of lines.
+--
+`changes`;;
+	Compute the dirstat numbers by counting the lines that have been
+	removed from the source, or added to the destination. This ignores
+	the amount of pure code movements within a file.  In other words,
+	rearranging lines in a file is not counted as much as other changes.
+	This is the default behavior when no parameter is given.
+`files`;;
+	Compute the dirstat numbers by counting the number of files changed.
+	Each changed file counts equally in the dirstat analysis. This is
+	the computationally cheapest `--dirstat` behavior, since it does
+	not have to look at the file contents at all.
+`cumulative`;;
+	Count changes in a child directory for the parent directory as well.
+	Note that when using `cumulative`, the sum of the percentages
+	reported may exceed 100%. The default (non-cumulative) behavior can
+	be specified with the `noncumulative` parameter.
+<limit>;;
+	An integer parameter specifies a cut-off percent (3% by default).
+	Directories contributing less than this percentage of the changes
+	are not shown in the output.
+--
++
+Example: The following will count changed files, while ignoring
+directories with less than 10% of the total amount of changed files,
+and accumulating child directory counts in the parent directories:
+`--dirstat=files,10,cumulative`.
 
 --summary::
 	Output a condensed summary of extended header information
diff --git a/diff.c b/diff.c
index cfbfa92..0387e4f 100644
--- a/diff.c
+++ b/diff.c
@@ -66,6 +66,49 @@ static int parse_diff_color_slot(const char *var, int ofs)
 	return -1;
 }
 
+#ifdef NO_C99_FORMAT
+#define PD_FMT "%d"
+#else
+#define PD_FMT "%td"
+#endif
+
+static void parse_dirstat_params(struct diff_options *options, const char *params)
+{
+	const char *p = params;
+	while (*p) {
+		if (!prefixcmp(p, "changes")) {
+			p += 7;
+			DIFF_OPT_CLR(options, DIRSTAT_BY_FILE);
+		}
+		else if (!prefixcmp(p, "files")) {
+			p += 5;
+			DIFF_OPT_SET(options, DIRSTAT_BY_FILE);
+		}
+		else if (!prefixcmp(p, "noncumulative")) {
+			p += 13;
+			DIFF_OPT_CLR(options, DIRSTAT_CUMULATIVE);
+		}
+		else if (!prefixcmp(p, "cumulative")) {
+			p += 10;
+			DIFF_OPT_SET(options, DIRSTAT_CUMULATIVE);
+		}
+		else if (isdigit(*p)) {
+			char *end;
+			options->dirstat_percent = strtoul(p, &end, 10);
+			p = end;
+		}
+		else
+			die("Unknown --dirstat parameter '%s'", p);
+
+		if (*p) { /* more parameters, swallow separator */
+			if (*p != ',')
+				die("Missing comma separator at char " PD_FMT
+				    " of '%s'", p - params, params);
+			p++;
+		}
+	}
+}
+
 static int git_config_rename(const char *var, const char *value)
 {
 	if (!value)
@@ -3144,6 +3187,48 @@ static int stat_opt(struct diff_options *options, const char **av)
 	return argcount;
 }
 
+/*
+ * Parse dirstat-related options and any parameters given to those options.
+ * Returns 1 if the option in 'arg' is a recognized dirstat-related option;
+ * otherwise returns 0.
+ */
+static int dirstat_opt(struct diff_options *options, const char *arg)
+{
+	const char *p;
+	char *mangled = NULL;
+
+	if (!strcmp(arg, "--cumulative")) /* deprecated */
+		/* handle '--cumulative' like '--dirstat=cumulative' */
+		p = "cumulative";
+	else if (!strcmp(arg, "--dirstat-by-file") ||
+		 !prefixcmp(arg, "--dirstat-by-file=")) { /* deprecated */
+		/* handle '--dirstat-by-file=*' like '--dirstat=files,*' */
+		mangled = xstrdup(arg + 12);
+		memcpy(mangled, "files", 5);
+		if (mangled[5]) {
+			assert(mangled[5] == '=');
+			mangled[5] = ',';
+		}
+		p = mangled;
+	}
+	else if (!prefixcmp(arg, "-X"))
+		p = arg + 2;
+	else if (!prefixcmp(arg, "--dirstat"))
+		p = arg + 9;
+	else
+		return 0;
+
+	options->output_format |= DIFF_FORMAT_DIRSTAT;
+
+	if (*p == '=')
+		p++;
+	if (*p)
+		parse_dirstat_params(options, p);
+
+	free(mangled);
+	return 1;
+}
+
 int diff_opt_parse(struct diff_options *options, const char **av, int ac)
 {
 	const char *arg = av[0];
@@ -3163,16 +3248,10 @@ int diff_opt_parse(struct diff_options *options, const char **av, int ac)
 		options->output_format |= DIFF_FORMAT_NUMSTAT;
 	else if (!strcmp(arg, "--shortstat"))
 		options->output_format |= DIFF_FORMAT_SHORTSTAT;
-	else if (opt_arg(arg, 'X', "dirstat", &options->dirstat_percent))
-		options->output_format |= DIFF_FORMAT_DIRSTAT;
-	else if (!strcmp(arg, "--cumulative")) {
-		options->output_format |= DIFF_FORMAT_DIRSTAT;
-		DIFF_OPT_SET(options, DIRSTAT_CUMULATIVE);
-	} else if (opt_arg(arg, 0, "dirstat-by-file",
-			   &options->dirstat_percent)) {
-		options->output_format |= DIFF_FORMAT_DIRSTAT;
-		DIFF_OPT_SET(options, DIRSTAT_BY_FILE);
-	}
+	else if (!prefixcmp(arg, "-X") || !prefixcmp(arg, "--dirstat") ||
+		 !strcmp(arg, "--cumulative"))
+		/* -X, --dirstat[=<args>], --dirstat-by-file, or --cumulative */
+		return dirstat_opt(options, arg);
 	else if (!strcmp(arg, "--check"))
 		options->output_format |= DIFF_FORMAT_CHECKDIFF;
 	else if (!strcmp(arg, "--summary"))
diff --git a/t/t4046-diff-dirstat.sh b/t/t4046-diff-dirstat.sh
index 6ff7f9f..0ede619 100755
--- a/t/t4046-diff-dirstat.sh
+++ b/t/t4046-diff-dirstat.sh
@@ -346,6 +346,39 @@ test_expect_success 'vanilla -X' '
 	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
 '
 
+test_expect_success 'explicit defaults: --dirstat=changes,noncumulative,3' '
+	git diff --dirstat=changes,noncumulative,3 HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff --dirstat=changes,noncumulative,3 -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff --dirstat=changes,noncumulative,3 -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
+test_expect_success 'explicit defaults: -Xchanges,noncumulative,3' '
+	git diff -Xchanges,noncumulative,3 HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff -Xchanges,noncumulative,3 -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff -Xchanges,noncumulative,3 -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
+test_expect_success 'later options override earlier options:' '
+	git diff --dirstat=files,10,cumulative,changes,noncumulative,3 HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff --dirstat=files,10,cumulative,changes,noncumulative,3 -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff --dirstat=files,10,cumulative,changes,noncumulative,3 -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC &&
+	git diff --dirstat=files --dirstat=10 --dirstat=cumulative --dirstat=changes --dirstat=noncumulative -X3 HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff --dirstat=files --dirstat=10 --dirstat=cumulative --dirstat=changes --dirstat=noncumulative -X3 -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff --dirstat=files --dirstat=10 --dirstat=cumulative --dirstat=changes --dirstat=noncumulative -X3 -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
 cat <<EOF >expect_diff_dirstat
    2.1% changed/
   10.8% dst/copy/changed/
@@ -449,6 +482,24 @@ test_expect_success '--dirstat=0 --cumulative' '
 	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
 '
 
+test_expect_success '--dirstat=0,cumulative' '
+	git diff --dirstat=0,cumulative HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff --dirstat=0,cumulative -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff --dirstat=0,cumulative -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
+test_expect_success '-X0,cumulative' '
+	git diff -X0,cumulative HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff -X0,cumulative -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff -X0,cumulative -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
 cat <<EOF >expect_diff_dirstat
    9.0% changed/
    9.0% dst/copy/changed/
@@ -491,6 +542,15 @@ test_expect_success '--dirstat-by-file' '
 	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
 '
 
+test_expect_success '--dirstat=files' '
+	git diff --dirstat=files HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff --dirstat=files -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff --dirstat=files -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
 cat <<EOF >expect_diff_dirstat
   27.2% dst/copy/
   27.2% dst/move/
@@ -525,6 +585,15 @@ test_expect_success '--dirstat-by-file=10' '
 	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
 '
 
+test_expect_success '--dirstat=files,10' '
+	git diff --dirstat=files,10 HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff --dirstat=files,10 -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff --dirstat=files,10 -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
 cat <<EOF >expect_diff_dirstat
    9.0% changed/
    9.0% dst/copy/changed/
@@ -577,4 +646,54 @@ test_expect_success '--dirstat-by-file --cumulative' '
 	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
 '
 
+test_expect_success '--dirstat=files,cumulative' '
+	git diff --dirstat=files,cumulative HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff --dirstat=files,cumulative -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff --dirstat=files,cumulative -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
+cat <<EOF >expect_diff_dirstat
+  27.2% dst/copy/
+  27.2% dst/move/
+  54.5% dst/
+  27.2% src/move/
+EOF
+
+cat <<EOF >expect_diff_dirstat_M
+  14.2% changed/
+  14.2% dst/copy/changed/
+  14.2% dst/copy/rearranged/
+  14.2% dst/copy/unchanged/
+  42.8% dst/copy/
+  14.2% dst/move/changed/
+  14.2% dst/move/rearranged/
+  28.5% dst/move/
+  71.4% dst/
+  14.2% rearranged/
+EOF
+
+cat <<EOF >expect_diff_dirstat_CC
+  16.6% changed/
+  16.6% dst/copy/changed/
+  16.6% dst/copy/rearranged/
+  33.3% dst/copy/
+  16.6% dst/move/changed/
+  16.6% dst/move/rearranged/
+  33.3% dst/move/
+  66.6% dst/
+  16.6% rearranged/
+EOF
+
+test_expect_success '--dirstat=files,cumulative,10' '
+	git diff --dirstat=files,cumulative,10 HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff --dirstat=files,cumulative,10 -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff --dirstat=files,cumulative,10 -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
 test_done
-- 
1.7.5.rc1.3.g4d7b


* [PATCHv2 4/6] Add config variable for specifying default --dirstat behavior
  2011-04-27  2:12                           ` [PATCHv2 " Johan Herland
                                               ` (2 preceding siblings ...)
  2011-04-27  2:12                             ` [PATCHv2 3/6] Refactor --dirstat parsing; deprecate --cumulative and --dirstat-by-file Johan Herland
@ 2011-04-27  2:12                             ` Johan Herland
  2011-04-27  2:12                             ` [PATCHv2 5/6] Use floating point for --dirstat percentages Johan Herland
                                               ` (2 subsequent siblings)
  6 siblings, 0 replies; 91+ messages in thread
From: Johan Herland @ 2011-04-27  2:12 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Linus Torvalds, Johan Herland

The new diff.dirstat config variable takes the same arguments as
'--dirstat=<args>', and specifies the default arguments for --dirstat.
Any --dirstat arguments passed on the command line override the
configured defaults.

When not specified, the --dirstat defaults are 'changes,noncumulative,3'.
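
To illustrate the layering described above (built-in defaults, then
diff.dirstat, then the command line), here is a minimal sketch. It is not
the code in diff.c: every sketch_* name is made up for this example, and
it only handles the parameters that exist at this point in the series.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the dirstat-related fields of the options. */
enum sketch_mode { SKETCH_CHANGES, SKETCH_FILES };

struct sketch_dirstat {
	enum sketch_mode mode;
	int cumulative;
	int percent;
};

/* Return nonzero if s starts with prefix. */
static int sketch_starts_with(const char *s, const char *prefix)
{
	return strncmp(s, prefix, strlen(prefix)) == 0;
}

/*
 * Apply a comma separated parameter list on top of the current settings.
 * Parameters that are not mentioned keep their previous value.
 * Returns 0 on success, -1 on an unknown parameter.
 */
static int sketch_apply_params(struct sketch_dirstat *o, const char *params)
{
	const char *p = params;

	assert(o && params);
	while (*p) {
		if (sketch_starts_with(p, "changes")) {
			p += 7;
			o->mode = SKETCH_CHANGES;
		} else if (sketch_starts_with(p, "files")) {
			p += 5;
			o->mode = SKETCH_FILES;
		} else if (sketch_starts_with(p, "noncumulative")) {
			p += 13;
			o->cumulative = 0;
		} else if (sketch_starts_with(p, "cumulative")) {
			p += 10;
			o->cumulative = 1;
		} else if (isdigit((unsigned char)*p)) {
			char *end;
			o->percent = (int)strtoul(p, &end, 10);
			p = end;
		} else {
			return -1;
		}
		if (*p == ',')
			p++;
		else if (*p)
			return -1;
	}
	return 0;
}

/* Built-in defaults first, then the config value, then the command line. */
static int sketch_resolve(struct sketch_dirstat *o, const char *config,
			  const char *cmdline)
{
	o->mode = SKETCH_CHANGES;
	o->cumulative = 0;
	o->percent = 3;
	if (config && sketch_apply_params(o, config))
		return -1;
	if (cmdline && sketch_apply_params(o, cmdline))
		return -1;
	return 0;
}
```

With this sketch, a config of "0" combined with a command line of
"cumulative" ends up as changes,cumulative,0: the command line only
overrides the parameters it actually mentions.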

The patch also adds several tests verifying the interaction between the
diff.dirstat config variable and the --dirstat command line option.

Improved-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Johan Herland <johan@herland.net>
---
 Documentation/config.txt       |   36 ++++++++++++++++++++
 Documentation/diff-options.txt |    2 +
 diff.c                         |   10 +++++-
 t/t4046-diff-dirstat.sh        |   72 ++++++++++++++++++++++++++++++++++++++++
 4 files changed, 119 insertions(+), 1 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 6babbc7..c18dd5a 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -822,6 +822,42 @@ diff.autorefreshindex::
 	affects only 'git diff' Porcelain, and not lower level
 	'diff' commands such as 'git diff-files'.
 
+diff.dirstat::
+	A comma separated list of `--dirstat` parameters specifying the
+	default behavior of the `--dirstat` option to linkgit:git-diff[1]
+	and friends. The defaults can be overridden on the command line
+	(using `--dirstat=<param1,param2,...>`). The fallback defaults
+	(when not changed by `diff.dirstat`) are `changes,noncumulative,3`.
+	The following parameters are available:
++
+--
+`changes`;;
+	Compute the dirstat numbers by counting the lines that have been
+	removed from the source, or added to the destination. This ignores
+	the amount of pure code movements within a file.  In other words,
+	rearranging lines in a file is not counted as much as other changes.
+	This is the default behavior when no parameter is given.
+`files`;;
+	Compute the dirstat numbers by counting the number of files changed.
+	Each changed file counts equally in the dirstat analysis. This is
+	the computationally cheapest `--dirstat` behavior, since it does
+	not have to look at the file contents at all.
+`cumulative`;;
+	Count changes in a child directory for the parent directory as well.
+	Note that when using `cumulative`, the sum of the percentages
+	reported may exceed 100%. The default (non-cumulative) behavior can
+	be specified with the `noncumulative` parameter.
+<limit>;;
+	An integer parameter specifies a cut-off percent (3% by default).
+	Directories contributing less than this percentage of the changes
+	are not shown in the output.
+--
++
+Example: The following will count changed files, while ignoring
+directories with less than 10% of the total amount of changed files,
+and accumulating child directory counts in the parent directories:
+`files,10,cumulative`.
+
 diff.external::
 	If this config variable is set, diff generation is not
 	performed using the internal diff machinery, but using the
diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 6a3a9c1..4ad50b9 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -70,6 +70,8 @@ endif::git-format-patch[]
 	Output the distribution of relative amount of changes for each
 	sub-directory. The behavior of `--dirstat` can be customized by
 	passing it a comma separated list of parameters.
+	The defaults are controlled by the `diff.dirstat` configuration
+	variable (see linkgit:git-config[1]).
 	The following parameters are available:
 +
 --
diff --git a/diff.c b/diff.c
index 0387e4f..1b6e8c0 100644
--- a/diff.c
+++ b/diff.c
@@ -31,6 +31,7 @@ static const char *external_diff_cmd_cfg;
 int diff_auto_refresh_index = 1;
 static int diff_mnemonic_prefix;
 static int diff_no_prefix;
+static int diff_dirstat_percent_default = 3;
 static struct diff_options default_diff_options;
 
 static char diff_colors[][COLOR_MAXLEN] = {
@@ -188,6 +189,13 @@ int git_diff_basic_config(const char *var, const char *value, void *cb)
 		return 0;
 	}
 
+	if (!strcmp(var, "diff.dirstat")) {
+		default_diff_options.dirstat_percent = diff_dirstat_percent_default;
+		parse_dirstat_params(&default_diff_options, value);
+		diff_dirstat_percent_default = default_diff_options.dirstat_percent;
+		return 0;
+	}
+
 	if (!prefixcmp(var, "submodule."))
 		return parse_submodule_config_option(var, value);
 
@@ -2929,7 +2937,7 @@ void diff_setup(struct diff_options *options)
 	options->line_termination = '\n';
 	options->break_opt = -1;
 	options->rename_limit = -1;
-	options->dirstat_percent = 3;
+	options->dirstat_percent = diff_dirstat_percent_default;
 	options->context = 3;
 
 	options->change = diff_change;
diff --git a/t/t4046-diff-dirstat.sh b/t/t4046-diff-dirstat.sh
index 0ede619..fa1885c 100755
--- a/t/t4046-diff-dirstat.sh
+++ b/t/t4046-diff-dirstat.sh
@@ -379,6 +379,15 @@ test_expect_success 'later options override earlier options:' '
 	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
 '
 
+test_expect_success 'non-defaults in config overridden by explicit defaults on command line' '
+	git -c diff.dirstat=files,cumulative,50 diff --dirstat=changes,noncumulative,3 HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git -c diff.dirstat=files,cumulative,50 diff --dirstat=changes,noncumulative,3 -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git -c diff.dirstat=files,cumulative,50 diff --dirstat=changes,noncumulative,3 -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
 cat <<EOF >expect_diff_dirstat
    2.1% changed/
   10.8% dst/copy/changed/
@@ -430,6 +439,15 @@ test_expect_success '-X0' '
 	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
 '
 
+test_expect_success 'diff.dirstat=0' '
+	git -c diff.dirstat=0 diff --dirstat HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git -c diff.dirstat=0 diff --dirstat -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git -c diff.dirstat=0 diff --dirstat -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
 cat <<EOF >expect_diff_dirstat
    2.1% changed/
   10.8% dst/copy/changed/
@@ -500,6 +518,24 @@ test_expect_success '-X0,cumulative' '
 	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
 '
 
+test_expect_success 'diff.dirstat=0,cumulative' '
+	git -c diff.dirstat=0,cumulative diff --dirstat HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git -c diff.dirstat=0,cumulative diff --dirstat -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git -c diff.dirstat=0,cumulative diff --dirstat -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
+test_expect_success 'diff.dirstat=0 & --dirstat=cumulative' '
+	git -c diff.dirstat=0 diff --dirstat=cumulative HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git -c diff.dirstat=0 diff --dirstat=cumulative -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git -c diff.dirstat=0 diff --dirstat=cumulative -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
 cat <<EOF >expect_diff_dirstat
    9.0% changed/
    9.0% dst/copy/changed/
@@ -551,6 +587,15 @@ test_expect_success '--dirstat=files' '
 	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
 '
 
+test_expect_success 'diff.dirstat=files' '
+	git -c diff.dirstat=files diff --dirstat HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git -c diff.dirstat=files diff --dirstat -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git -c diff.dirstat=files diff --dirstat -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
 cat <<EOF >expect_diff_dirstat
   27.2% dst/copy/
   27.2% dst/move/
@@ -594,6 +639,15 @@ test_expect_success '--dirstat=files,10' '
 	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
 '
 
+test_expect_success 'diff.dirstat=10,files' '
+	git -c diff.dirstat=10,files diff --dirstat HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git -c diff.dirstat=10,files diff --dirstat -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git -c diff.dirstat=10,files diff --dirstat -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
 cat <<EOF >expect_diff_dirstat
    9.0% changed/
    9.0% dst/copy/changed/
@@ -655,6 +709,15 @@ test_expect_success '--dirstat=files,cumulative' '
 	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
 '
 
+test_expect_success 'diff.dirstat=cumulative,files' '
+	git -c diff.dirstat=cumulative,files diff --dirstat HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git -c diff.dirstat=cumulative,files diff --dirstat -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git -c diff.dirstat=cumulative,files diff --dirstat -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
 cat <<EOF >expect_diff_dirstat
   27.2% dst/copy/
   27.2% dst/move/
@@ -696,4 +759,13 @@ test_expect_success '--dirstat=files,cumulative,10' '
 	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
 '
 
+test_expect_success 'diff.dirstat=10,cumulative,files' '
+	git -c diff.dirstat=10,cumulative,files diff --dirstat HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git -c diff.dirstat=10,cumulative,files diff --dirstat -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git -c diff.dirstat=10,cumulative,files diff --dirstat -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
 test_done
-- 
1.7.5.rc1.3.g4d7b


* [PATCHv2 5/6] Use floating point for --dirstat percentages
  2011-04-27  2:12                           ` [PATCHv2 " Johan Herland
                                               ` (3 preceding siblings ...)
  2011-04-27  2:12                             ` [PATCHv2 4/6] Add config variable for specifying default --dirstat behavior Johan Herland
@ 2011-04-27  2:12                             ` Johan Herland
  2011-04-27  2:45                               ` Linus Torvalds
  2011-04-27  2:12                             ` [PATCHv2 6/6] New --dirstat=lines mode, doing dirstat analysis based on diffstat Johan Herland
  2011-04-27  8:24                             ` [PATCHv3 0/6] --dirstat fixes, part 2 Johan Herland
  6 siblings, 1 reply; 91+ messages in thread
From: Johan Herland @ 2011-04-27  2:12 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Linus Torvalds, Johan Herland

Allow specifying --dirstat cut-off percentage as a floating point number.

When printing the dirstat output, floating point numbers are presented in
rounded form (as opposed to truncated). Therefore, this patch causes
significant churn in the expected output of the dirstat selftests.
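
The difference between the truncated and the rounded form can be sketched
as follows. The two formatting expressions mirror the old and new code in
gather_dirstat(), but the sketch_* wrappers and the buffer handling are
made up for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Old style: integer permille arithmetic, which truncates the last digit. */
static int sketch_format_truncated(char *buf, size_t len, long part, long total)
{
	int permille, percent;

	assert(buf && total > 0);
	permille = (int)(part * 1000 / total);
	percent = permille / 10;
	return snprintf(buf, len, "%4d.%01d%%", percent, permille % 10);
}

/* New style: floating point value, rounded by the %.1f conversion. */
static int sketch_format_rounded(char *buf, size_t len, long part, long total)
{
	double percent;

	assert(buf && total > 0);
	percent = part * 100.0 / total;
	return snprintf(buf, len, "%6.1f%%", percent);
}

/* Return 1 if both styles print the same text for part/total. */
static int sketch_same_output(long part, long total)
{
	char a[32], b[32];

	sketch_format_truncated(a, sizeof(a), part, total);
	sketch_format_rounded(b, sizeof(b), part, total);
	return strcmp(a, b) == 0;
}
```

For 3 out of 11 files, the old style prints 27.2% (from 272 permille)
while the new style prints 27.3% (from 27.27...), which is the kind of
one-digit change seen throughout the updated expectations.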

A selftest verifying floating-point percentage input has been added.

Signed-off-by: Johan Herland <johan@herland.net>
---

Remaining questions:

 - Locale issues with strtod(), e.g. decimal separator is a comma in certain
   locales.
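
   To make the concern concrete, here is a minimal sketch of parsing the
   percentage with strtod(), plus one possible (not thread-safe) way to
   pin the numeric locale to "C" around the call. This is an illustration
   only, not what the patch does; the sketch_* names are made up.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parse a leading percentage from a parameter list such as
 * "16.7,cumulative". Stores the value in *out and returns a pointer to
 * the first unparsed character, or NULL if no number could be parsed.
 * strtod() honors the current LC_NUMERIC locale.
 */
static const char *sketch_parse_percent(const char *p, double *out)
{
	char *end;

	assert(p && out);
	*out = strtod(p, &end);
	if (end == p)
		return NULL;
	return end;
}

/*
 * Same as above, but temporarily switch LC_NUMERIC to "C" so that '.'
 * is always the decimal separator. Not thread-safe, since setlocale()
 * changes process-wide state.
 */
static const char *sketch_parse_percent_c_locale(const char *p, double *out)
{
	const char *cur = setlocale(LC_NUMERIC, NULL);
	char *saved = NULL;
	const char *rest;

	if (cur) {
		/* Copy the name: later setlocale() calls may overwrite it. */
		saved = malloc(strlen(cur) + 1);
		if (saved)
			strcpy(saved, cur);
	}
	setlocale(LC_NUMERIC, "C");
	rest = sketch_parse_percent(p, out);
	if (saved) {
		setlocale(LC_NUMERIC, saved);
		free(saved);
	}
	return rest;
}
```

   A program that never calls setlocale() runs in the "C" locale, where
   both functions behave the same; the difference would only show up once
   LC_NUMERIC has been switched to a locale with a comma separator.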

...Johan

 diff.c                  |   14 +-
 diff.h                  |    2 +-
 t/t4046-diff-dirstat.sh |  327 ++++++++++++++++++++++++++---------------------
 3 files changed, 190 insertions(+), 153 deletions(-)

diff --git a/diff.c b/diff.c
index 1b6e8c0..eb26104 100644
--- a/diff.c
+++ b/diff.c
@@ -31,7 +31,7 @@ static const char *external_diff_cmd_cfg;
 int diff_auto_refresh_index = 1;
 static int diff_mnemonic_prefix;
 static int diff_no_prefix;
-static int diff_dirstat_percent_default = 3;
+static double diff_dirstat_percent_default = 3.0;
 static struct diff_options default_diff_options;
 
 static char diff_colors[][COLOR_MAXLEN] = {
@@ -95,7 +95,7 @@ static void parse_dirstat_params(struct diff_options *options, const char *param
 		}
 		else if (isdigit(*p)) {
 			char *end;
-			options->dirstat_percent = strtoul(p, &end, 10);
+			options->dirstat_percent = strtod(p, &end);
 			p = end;
 		}
 		else
@@ -1504,7 +1504,8 @@ struct dirstat_file {
 
 struct dirstat_dir {
 	struct dirstat_file *files;
-	int alloc, nr, percent, cumulative;
+	double percent;
+	int alloc, nr, cumulative;
 };
 
 static long gather_dirstat(struct diff_options *opt, struct dirstat_dir *dir,
@@ -1552,11 +1553,10 @@ static long gather_dirstat(struct diff_options *opt, struct dirstat_dir *dir,
 	 */
 	if (baselen && sources != 1) {
 		if (this_dir) {
-			int permille = this_dir * 1000 / changed;
-			int percent = permille / 10;
+			double percent = this_dir * 100.0 / changed;
 			if (percent >= dir->percent) {
-				fprintf(opt->file, "%s%4d.%01d%% %.*s\n", line_prefix,
-					percent, permille % 10, baselen, base);
+				fprintf(opt->file, "%s%6.1f%% %.*s\n", line_prefix,
+					percent, baselen, base);
 				if (!dir->cumulative)
 					return 0;
 			}
diff --git a/diff.h b/diff.h
index 0083d92..781c620 100644
--- a/diff.h
+++ b/diff.h
@@ -111,13 +111,13 @@ struct diff_options {
 	int rename_score;
 	int rename_limit;
 	int warn_on_too_large_rename;
-	int dirstat_percent;
 	int setup;
 	int abbrev;
 	const char *prefix;
 	int prefix_length;
 	const char *stat_sep;
 	long xdl_opts;
+	double dirstat_percent;
 
 	int stat_width;
 	int stat_name_width;
diff --git a/t/t4046-diff-dirstat.sh b/t/t4046-diff-dirstat.sh
index fa1885c..3cafd0d 100755
--- a/t/t4046-diff-dirstat.sh
+++ b/t/t4046-diff-dirstat.sh
@@ -301,31 +301,31 @@ test_expect_success 'sanity check setup (--stat)' '
 
 # changed/text and rearranged/text falls below default 3% threshold
 cat <<EOF >expect_diff_dirstat
-  10.8% dst/copy/changed/
-  10.8% dst/copy/rearranged/
-  10.8% dst/copy/unchanged/
-  10.8% dst/move/changed/
-  10.8% dst/move/rearranged/
-  10.8% dst/move/unchanged/
-  10.8% src/move/changed/
-  10.8% src/move/rearranged/
-  10.8% src/move/unchanged/
+  10.9% dst/copy/changed/
+  10.9% dst/copy/rearranged/
+  10.9% dst/copy/unchanged/
+  10.9% dst/move/changed/
+  10.9% dst/move/rearranged/
+  10.9% dst/move/unchanged/
+  10.9% src/move/changed/
+  10.9% src/move/rearranged/
+  10.9% src/move/unchanged/
 EOF
 
 # rearranged/text falls below default 3% threshold
 cat <<EOF >expect_diff_dirstat_M
-   5.8% changed/
+   5.9% changed/
   29.3% dst/copy/changed/
   29.3% dst/copy/rearranged/
   29.3% dst/copy/unchanged/
-   5.8% dst/move/changed/
+   5.9% dst/move/changed/
 EOF
 
 # rearranged/text falls below default 3% threshold
 cat <<EOF >expect_diff_dirstat_CC
-  32.6% changed/
-  32.6% dst/copy/changed/
-  32.6% dst/move/changed/
+  32.7% changed/
+  32.7% dst/copy/changed/
+  32.7% dst/move/changed/
 EOF
 
 test_expect_success 'vanilla --dirstat' '
@@ -389,36 +389,36 @@ test_expect_success 'non-defaults in config overridden by explicit defaults on c
 '
 
 cat <<EOF >expect_diff_dirstat
-   2.1% changed/
-  10.8% dst/copy/changed/
-  10.8% dst/copy/rearranged/
-  10.8% dst/copy/unchanged/
-  10.8% dst/move/changed/
-  10.8% dst/move/rearranged/
-  10.8% dst/move/unchanged/
+   2.2% changed/
+  10.9% dst/copy/changed/
+  10.9% dst/copy/rearranged/
+  10.9% dst/copy/unchanged/
+  10.9% dst/move/changed/
+  10.9% dst/move/rearranged/
+  10.9% dst/move/unchanged/
    0.0% rearranged/
-  10.8% src/move/changed/
-  10.8% src/move/rearranged/
-  10.8% src/move/unchanged/
+  10.9% src/move/changed/
+  10.9% src/move/rearranged/
+  10.9% src/move/unchanged/
 EOF
 
 cat <<EOF >expect_diff_dirstat_M
-   5.8% changed/
+   5.9% changed/
   29.3% dst/copy/changed/
   29.3% dst/copy/rearranged/
   29.3% dst/copy/unchanged/
-   5.8% dst/move/changed/
+   5.9% dst/move/changed/
    0.1% dst/move/rearranged/
    0.1% rearranged/
 EOF
 
 cat <<EOF >expect_diff_dirstat_CC
-  32.6% changed/
-  32.6% dst/copy/changed/
-   0.6% dst/copy/rearranged/
-  32.6% dst/move/changed/
-   0.6% dst/move/rearranged/
-   0.6% rearranged/
+  32.7% changed/
+  32.7% dst/copy/changed/
+   0.7% dst/copy/rearranged/
+  32.7% dst/move/changed/
+   0.7% dst/move/rearranged/
+   0.7% rearranged/
 EOF
 
 test_expect_success '--dirstat=0' '
@@ -449,46 +449,46 @@ test_expect_success 'diff.dirstat=0' '
 '
 
 cat <<EOF >expect_diff_dirstat
-   2.1% changed/
-  10.8% dst/copy/changed/
-  10.8% dst/copy/rearranged/
-  10.8% dst/copy/unchanged/
-  32.5% dst/copy/
-  10.8% dst/move/changed/
-  10.8% dst/move/rearranged/
-  10.8% dst/move/unchanged/
-  32.5% dst/move/
-  65.1% dst/
+   2.2% changed/
+  10.9% dst/copy/changed/
+  10.9% dst/copy/rearranged/
+  10.9% dst/copy/unchanged/
+  32.6% dst/copy/
+  10.9% dst/move/changed/
+  10.9% dst/move/rearranged/
+  10.9% dst/move/unchanged/
+  32.6% dst/move/
+  65.2% dst/
    0.0% rearranged/
-  10.8% src/move/changed/
-  10.8% src/move/rearranged/
-  10.8% src/move/unchanged/
-  32.5% src/move/
+  10.9% src/move/changed/
+  10.9% src/move/rearranged/
+  10.9% src/move/unchanged/
+  32.6% src/move/
 EOF
 
 cat <<EOF >expect_diff_dirstat_M
-   5.8% changed/
+   5.9% changed/
   29.3% dst/copy/changed/
   29.3% dst/copy/rearranged/
   29.3% dst/copy/unchanged/
   88.0% dst/copy/
-   5.8% dst/move/changed/
+   5.9% dst/move/changed/
    0.1% dst/move/rearranged/
-   5.9% dst/move/
+   6.0% dst/move/
   94.0% dst/
    0.1% rearranged/
 EOF
 
 cat <<EOF >expect_diff_dirstat_CC
-  32.6% changed/
-  32.6% dst/copy/changed/
-   0.6% dst/copy/rearranged/
+  32.7% changed/
+  32.7% dst/copy/changed/
+   0.7% dst/copy/rearranged/
   33.3% dst/copy/
-  32.6% dst/move/changed/
-   0.6% dst/move/rearranged/
+  32.7% dst/move/changed/
+   0.7% dst/move/rearranged/
   33.3% dst/move/
-  66.6% dst/
-   0.6% rearranged/
+  66.7% dst/
+   0.7% rearranged/
 EOF
 
 test_expect_success '--dirstat=0 --cumulative' '
@@ -537,36 +537,36 @@ test_expect_success 'diff.dirstat=0 & --dirstat=cumulative' '
 '
 
 cat <<EOF >expect_diff_dirstat
-   9.0% changed/
-   9.0% dst/copy/changed/
-   9.0% dst/copy/rearranged/
-   9.0% dst/copy/unchanged/
-   9.0% dst/move/changed/
-   9.0% dst/move/rearranged/
-   9.0% dst/move/unchanged/
-   9.0% rearranged/
-   9.0% src/move/changed/
-   9.0% src/move/rearranged/
-   9.0% src/move/unchanged/
+   9.1% changed/
+   9.1% dst/copy/changed/
+   9.1% dst/copy/rearranged/
+   9.1% dst/copy/unchanged/
+   9.1% dst/move/changed/
+   9.1% dst/move/rearranged/
+   9.1% dst/move/unchanged/
+   9.1% rearranged/
+   9.1% src/move/changed/
+   9.1% src/move/rearranged/
+   9.1% src/move/unchanged/
 EOF
 
 cat <<EOF >expect_diff_dirstat_M
-  14.2% changed/
-  14.2% dst/copy/changed/
-  14.2% dst/copy/rearranged/
-  14.2% dst/copy/unchanged/
-  14.2% dst/move/changed/
-  14.2% dst/move/rearranged/
-  14.2% rearranged/
+  14.3% changed/
+  14.3% dst/copy/changed/
+  14.3% dst/copy/rearranged/
+  14.3% dst/copy/unchanged/
+  14.3% dst/move/changed/
+  14.3% dst/move/rearranged/
+  14.3% rearranged/
 EOF
 
 cat <<EOF >expect_diff_dirstat_CC
-  16.6% changed/
-  16.6% dst/copy/changed/
-  16.6% dst/copy/rearranged/
-  16.6% dst/move/changed/
-  16.6% dst/move/rearranged/
-  16.6% rearranged/
+  16.7% changed/
+  16.7% dst/copy/changed/
+  16.7% dst/copy/rearranged/
+  16.7% dst/move/changed/
+  16.7% dst/move/rearranged/
+  16.7% rearranged/
 EOF
 
 test_expect_success '--dirstat-by-file' '
@@ -597,28 +597,28 @@ test_expect_success 'diff.dirstat=files' '
 '
 
 cat <<EOF >expect_diff_dirstat
-  27.2% dst/copy/
-  27.2% dst/move/
-  27.2% src/move/
+  27.3% dst/copy/
+  27.3% dst/move/
+  27.3% src/move/
 EOF
 
 cat <<EOF >expect_diff_dirstat_M
-  14.2% changed/
-  14.2% dst/copy/changed/
-  14.2% dst/copy/rearranged/
-  14.2% dst/copy/unchanged/
-  14.2% dst/move/changed/
-  14.2% dst/move/rearranged/
-  14.2% rearranged/
+  14.3% changed/
+  14.3% dst/copy/changed/
+  14.3% dst/copy/rearranged/
+  14.3% dst/copy/unchanged/
+  14.3% dst/move/changed/
+  14.3% dst/move/rearranged/
+  14.3% rearranged/
 EOF
 
 cat <<EOF >expect_diff_dirstat_CC
-  16.6% changed/
-  16.6% dst/copy/changed/
-  16.6% dst/copy/rearranged/
-  16.6% dst/move/changed/
-  16.6% dst/move/rearranged/
-  16.6% rearranged/
+  16.7% changed/
+  16.7% dst/copy/changed/
+  16.7% dst/copy/rearranged/
+  16.7% dst/move/changed/
+  16.7% dst/move/rearranged/
+  16.7% rearranged/
 EOF
 
 test_expect_success '--dirstat-by-file=10' '
@@ -649,46 +649,46 @@ test_expect_success 'diff.dirstat=10,files' '
 '
 
 cat <<EOF >expect_diff_dirstat
-   9.0% changed/
-   9.0% dst/copy/changed/
-   9.0% dst/copy/rearranged/
-   9.0% dst/copy/unchanged/
-  27.2% dst/copy/
-   9.0% dst/move/changed/
-   9.0% dst/move/rearranged/
-   9.0% dst/move/unchanged/
-  27.2% dst/move/
+   9.1% changed/
+   9.1% dst/copy/changed/
+   9.1% dst/copy/rearranged/
+   9.1% dst/copy/unchanged/
+  27.3% dst/copy/
+   9.1% dst/move/changed/
+   9.1% dst/move/rearranged/
+   9.1% dst/move/unchanged/
+  27.3% dst/move/
   54.5% dst/
-   9.0% rearranged/
-   9.0% src/move/changed/
-   9.0% src/move/rearranged/
-   9.0% src/move/unchanged/
-  27.2% src/move/
+   9.1% rearranged/
+   9.1% src/move/changed/
+   9.1% src/move/rearranged/
+   9.1% src/move/unchanged/
+  27.3% src/move/
 EOF
 
 cat <<EOF >expect_diff_dirstat_M
-  14.2% changed/
-  14.2% dst/copy/changed/
-  14.2% dst/copy/rearranged/
-  14.2% dst/copy/unchanged/
-  42.8% dst/copy/
-  14.2% dst/move/changed/
-  14.2% dst/move/rearranged/
-  28.5% dst/move/
+  14.3% changed/
+  14.3% dst/copy/changed/
+  14.3% dst/copy/rearranged/
+  14.3% dst/copy/unchanged/
+  42.9% dst/copy/
+  14.3% dst/move/changed/
+  14.3% dst/move/rearranged/
+  28.6% dst/move/
   71.4% dst/
-  14.2% rearranged/
+  14.3% rearranged/
 EOF
 
 cat <<EOF >expect_diff_dirstat_CC
-  16.6% changed/
-  16.6% dst/copy/changed/
-  16.6% dst/copy/rearranged/
+  16.7% changed/
+  16.7% dst/copy/changed/
+  16.7% dst/copy/rearranged/
   33.3% dst/copy/
-  16.6% dst/move/changed/
-  16.6% dst/move/rearranged/
+  16.7% dst/move/changed/
+  16.7% dst/move/rearranged/
   33.3% dst/move/
-  66.6% dst/
-  16.6% rearranged/
+  66.7% dst/
+  16.7% rearranged/
 EOF
 
 test_expect_success '--dirstat-by-file --cumulative' '
@@ -719,35 +719,35 @@ test_expect_success 'diff.dirstat=cumulative,files' '
 '
 
 cat <<EOF >expect_diff_dirstat
-  27.2% dst/copy/
-  27.2% dst/move/
+  27.3% dst/copy/
+  27.3% dst/move/
   54.5% dst/
-  27.2% src/move/
+  27.3% src/move/
 EOF
 
 cat <<EOF >expect_diff_dirstat_M
-  14.2% changed/
-  14.2% dst/copy/changed/
-  14.2% dst/copy/rearranged/
-  14.2% dst/copy/unchanged/
-  42.8% dst/copy/
-  14.2% dst/move/changed/
-  14.2% dst/move/rearranged/
-  28.5% dst/move/
+  14.3% changed/
+  14.3% dst/copy/changed/
+  14.3% dst/copy/rearranged/
+  14.3% dst/copy/unchanged/
+  42.9% dst/copy/
+  14.3% dst/move/changed/
+  14.3% dst/move/rearranged/
+  28.6% dst/move/
   71.4% dst/
-  14.2% rearranged/
+  14.3% rearranged/
 EOF
 
 cat <<EOF >expect_diff_dirstat_CC
-  16.6% changed/
-  16.6% dst/copy/changed/
-  16.6% dst/copy/rearranged/
+  16.7% changed/
+  16.7% dst/copy/changed/
+  16.7% dst/copy/rearranged/
   33.3% dst/copy/
-  16.6% dst/move/changed/
-  16.6% dst/move/rearranged/
+  16.7% dst/move/changed/
+  16.7% dst/move/rearranged/
   33.3% dst/move/
-  66.6% dst/
-  16.6% rearranged/
+  66.7% dst/
+  16.7% rearranged/
 EOF
 
 test_expect_success '--dirstat=files,cumulative,10' '
@@ -768,4 +768,41 @@ test_expect_success 'diff.dirstat=10,cumulative,files' '
 	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
 '
 
+cat <<EOF >expect_diff_dirstat
+  27.3% dst/copy/
+  27.3% dst/move/
+  54.5% dst/
+  27.3% src/move/
+EOF
+
+cat <<EOF >expect_diff_dirstat_M
+  42.9% dst/copy/
+  28.6% dst/move/
+  71.4% dst/
+EOF
+
+cat <<EOF >expect_diff_dirstat_CC
+  33.3% dst/copy/
+  33.3% dst/move/
+  66.7% dst/
+EOF
+
+test_expect_success '--dirstat=files,cumulative,16.7' '
+	git diff --dirstat=files,cumulative,16.7 HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff --dirstat=files,cumulative,16.7 -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff --dirstat=files,cumulative,16.7 -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
+test_expect_success 'diff.dirstat=16.7,cumulative,files' '
+	git -c diff.dirstat=16.7,cumulative,files diff --dirstat HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git -c diff.dirstat=16.7,cumulative,files diff --dirstat -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git -c diff.dirstat=16.7,cumulative,files diff --dirstat -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
 test_done
-- 
1.7.5.rc1.3.g4d7b


* [PATCHv2 6/6] New --dirstat=lines mode, doing dirstat analysis based on diffstat
  2011-04-27  2:12                           ` [PATCHv2 " Johan Herland
                                               ` (4 preceding siblings ...)
  2011-04-27  2:12                             ` [PATCHv2 5/6] Use floating point for --dirstat percentages Johan Herland
@ 2011-04-27  2:12                             ` Johan Herland
  2011-04-27  8:24                             ` [PATCHv3 0/6] --dirstat fixes, part 2 Johan Herland
  6 siblings, 0 replies; 91+ messages in thread
From: Johan Herland @ 2011-04-27  2:12 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Linus Torvalds, Johan Herland

This patch adds an alternative implementation of show_dirstat(), called
show_dirstat_by_line(), which uses the more expensive diffstat analysis
(rather than show_dirstat()'s own, relatively inexpensive, analysis) to
derive the numbers from which the --dirstat output is computed.

The alternative implementation is controlled by the new "lines" parameter
to the --dirstat option (or the diff.dirstat config variable).

For binary files, the diffstat analysis counts bytes instead of lines,
so to prevent binary files from dominating the dirstat results, the
byte counts for binary files are divided by 64 before being compared to
their textual/line-based counterparts. This is a stupid and ugly - but
very cheap - heuristic.
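
As a rough illustration of that heuristic, here is a minimal sketch. The
rounding expression is the one used in the patch, but struct sketch_file
and the sketch_* helpers are simplified stand-ins made up for this
example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the per-file diffstat record. */
struct sketch_file {
	const char *name;
	unsigned long added;
	unsigned long deleted;
	int is_binary;
};

/* Damage for one file: lines for text, 64-byte chunks (rounded up) for binary. */
static unsigned long sketch_damage(const struct sketch_file *f)
{
	unsigned long damage;

	assert(f);
	damage = f->added + f->deleted;
	if (f->is_binary)
		damage = (damage + 63) / 64;
	return damage;
}

/* Sum the damage over all files; this is the denominator for the percentages. */
static unsigned long sketch_total_damage(const struct sketch_file *files, size_t nr)
{
	unsigned long total = 0;
	size_t i;

	for (i = 0; i < nr; i++)
		total += sketch_damage(&files[i]);
	return total;
}

/* Share of the total damage, in percent, contributed by one file. */
static double sketch_share(const struct sketch_file *f, unsigned long total)
{
	if (!total)
		return 0.0;
	return sketch_damage(f) * 100.0 / total;
}
```

With this normalization, a binary file with 1024 changed bytes weighs the
same as a text file with 16 changed lines, instead of outweighing it by a
factor of 64.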

In linux-2.6.git, running the three different --dirstat modes:

  time git diff v2.6.20..v2.6.30 --dirstat=changes > /dev/null
vs.
  time git diff v2.6.20..v2.6.30 --dirstat=lines > /dev/null
vs.
  time git diff v2.6.20..v2.6.30 --dirstat=files > /dev/null

yields the following average runtimes on my machine:

 - "changes" (default): ~6.0 s
 - "lines":             ~9.6 s
 - "files":             ~0.1 s

So, as expected, there's a considerable performance hit (~60%, i.e.
~9.6 s vs. ~6.0 s) from going through the full diffstat analysis as
compared to the default "changes" analysis (obviously, "files" is much
faster than both). As such, the "lines" mode is probably only useful if
you really need the --dirstat numbers to be consistent with the numbers
returned from the other --*stat options.

The patch also includes documentation and tests for the new dirstat mode.

Improved-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Johan Herland <johan@herland.net>
---
 Documentation/config.txt       |    8 +++
 Documentation/diff-options.txt |    8 +++
 diff.c                         |   62 ++++++++++++++++++++++++-
 diff.h                         |    1 +
 t/t4046-diff-dirstat.sh        |  100 ++++++++++++++++++++++++++++++++++++++++
 5 files changed, 177 insertions(+), 2 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index c18dd5a..0cad75c 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -837,6 +837,14 @@ diff.dirstat::
 	the amount of pure code movements within a file.  In other words,
 	rearranging lines in a file is not counted as much as other changes.
 	This is the default behavior when no parameter is given.
+`lines`;;
+	Compute the dirstat numbers by doing the regular line-based diff
+	analysis, and summing the removed/added line counts. (For binary
+	files, count 64-byte chunks instead, since binary files have no
+	natural concept of lines). This is a more expensive `--dirstat`
+	behavior than the `changes` behavior, but it does count rearranged
+	lines within a file as much as other changes. The resulting output
+	is consistent with what you get from the other `--*stat` options.
 `files`;;
 	Compute the dirstat numbers by counting the number of files changed.
 	Each changed file counts equally in the dirstat analysis. This is
diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 4ad50b9..327d10a 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -81,6 +81,14 @@ endif::git-format-patch[]
 	the amount of pure code movements within a file.  In other words,
 	rearranging lines in a file is not counted as much as other changes.
 	This is the default behavior when no parameter is given.
+`lines`;;
+	Compute the dirstat numbers by doing the regular line-based diff
+	analysis, and summing the removed/added line counts. (For binary
+	files, count 64-byte chunks instead, since binary files have no
+	natural concept of lines). This is a more expensive `--dirstat`
+	behavior than the `changes` behavior, but it does count rearranged
+	lines within a file as much as other changes. The resulting output
+	is consistent with what you get from the other `--*stat` options.
 `files`;;
 	Compute the dirstat numbers by counting the number of files changed.
 	Each changed file counts equally in the dirstat analysis. This is
diff --git a/diff.c b/diff.c
index eb26104..37c5fb5 100644
--- a/diff.c
+++ b/diff.c
@@ -79,10 +79,17 @@ static void parse_dirstat_params(struct diff_options *options, const char *param
 	while (*p) {
 		if (!prefixcmp(p, "changes")) {
 			p += 7;
+			DIFF_OPT_CLR(options, DIRSTAT_BY_LINE);
+			DIFF_OPT_CLR(options, DIRSTAT_BY_FILE);
+		}
+		else if (!prefixcmp(p, "lines")) {
+			p += 5;
+			DIFF_OPT_SET(options, DIRSTAT_BY_LINE);
 			DIFF_OPT_CLR(options, DIRSTAT_BY_FILE);
 		}
 		else if (!prefixcmp(p, "files")) {
 			p += 5;
+			DIFF_OPT_CLR(options, DIRSTAT_BY_LINE);
 			DIFF_OPT_SET(options, DIRSTAT_BY_FILE);
 		}
 		else if (!prefixcmp(p, "noncumulative")) {
@@ -1671,6 +1678,50 @@ found_damage:
 	gather_dirstat(options, &dir, changed, "", 0);
 }
 
+static void show_dirstat_by_line(struct diffstat_t *data, struct diff_options *options)
+{
+	int i;
+	unsigned long changed;
+	struct dirstat_dir dir;
+
+	if (data->nr == 0)
+		return;
+
+	dir.files = NULL;
+	dir.alloc = 0;
+	dir.nr = 0;
+	dir.percent = options->dirstat_percent;
+	dir.cumulative = DIFF_OPT_TST(options, DIRSTAT_CUMULATIVE);
+
+	changed = 0;
+	for (i = 0; i < data->nr; i++) {
+		struct diffstat_file *file = data->files[i];
+		unsigned long damage = file->added + file->deleted;
+		if (file->is_binary)
+			/*
+			 * binary files count bytes, not lines. Must find some
+			 * way to normalize binary bytes vs. textual lines.
+			 * The following heuristic assumes that there are 64
+			 * bytes per "line".
+			 * This is stupid and ugly, but very cheap...
+			 */
+			damage = (damage + 63) / 64;
+		ALLOC_GROW(dir.files, dir.nr + 1, dir.alloc);
+		dir.files[dir.nr].name = file->name;
+		dir.files[dir.nr].changed = damage;
+		changed += damage;
+		dir.nr++;
+	}
+
+	/* This can happen even with many files, if everything was renames */
+	if (!changed)
+		return;
+
+	/* Show all directories with more than x% of the changes */
+	qsort(dir.files, dir.nr, sizeof(dir.files[0]), dirstat_compare);
+	gather_dirstat(options, &dir, changed, "", 0);
+}
+
 static void free_diffstat_info(struct diffstat_t *diffstat)
 {
 	int i;
@@ -4080,6 +4131,7 @@ void diff_flush(struct diff_options *options)
 	struct diff_queue_struct *q = &diff_queued_diff;
 	int i, output_format = options->output_format;
 	int separator = 0;
+	int dirstat_by_line = 0;
 
 	/*
 	 * Order: raw, stat, summary, patch
@@ -4100,7 +4152,11 @@ void diff_flush(struct diff_options *options)
 		separator++;
 	}
 
-	if (output_format & (DIFF_FORMAT_DIFFSTAT|DIFF_FORMAT_SHORTSTAT|DIFF_FORMAT_NUMSTAT)) {
+	if (output_format & DIFF_FORMAT_DIRSTAT && DIFF_OPT_TST(options, DIRSTAT_BY_LINE))
+		dirstat_by_line = 1;
+
+	if (output_format & (DIFF_FORMAT_DIFFSTAT|DIFF_FORMAT_SHORTSTAT|DIFF_FORMAT_NUMSTAT) ||
+	    dirstat_by_line) {
 		struct diffstat_t diffstat;
 
 		memset(&diffstat, 0, sizeof(struct diffstat_t));
@@ -4115,10 +4171,12 @@ void diff_flush(struct diff_options *options)
 			show_stats(&diffstat, options);
 		if (output_format & DIFF_FORMAT_SHORTSTAT)
 			show_shortstats(&diffstat, options);
+		if (output_format & DIFF_FORMAT_DIRSTAT)
+			show_dirstat_by_line(&diffstat, options);
 		free_diffstat_info(&diffstat);
 		separator++;
 	}
-	if (output_format & DIFF_FORMAT_DIRSTAT)
+	if ((output_format & DIFF_FORMAT_DIRSTAT) && !dirstat_by_line)
 		show_dirstat(options);
 
 	if (output_format & DIFF_FORMAT_SUMMARY && !is_summary_empty(q)) {
diff --git a/diff.h b/diff.h
index 781c620..5f12049 100644
--- a/diff.h
+++ b/diff.h
@@ -78,6 +78,7 @@ typedef struct strbuf *(*diff_prefix_fn_t)(struct diff_options *opt, void *data)
 #define DIFF_OPT_IGNORE_UNTRACKED_IN_SUBMODULES (1 << 25)
 #define DIFF_OPT_IGNORE_DIRTY_SUBMODULES (1 << 26)
 #define DIFF_OPT_OVERRIDE_SUBMODULE_CONFIG (1 << 27)
+#define DIFF_OPT_DIRSTAT_BY_LINE     (1 << 28)
 
 #define DIFF_OPT_TST(opts, flag)    ((opts)->flags & DIFF_OPT_##flag)
 #define DIFF_OPT_SET(opts, flag)    ((opts)->flags |= DIFF_OPT_##flag)
diff --git a/t/t4046-diff-dirstat.sh b/t/t4046-diff-dirstat.sh
index 3cafd0d..f1946a1 100755
--- a/t/t4046-diff-dirstat.sh
+++ b/t/t4046-diff-dirstat.sh
@@ -805,4 +805,104 @@ test_expect_success 'diff.dirstat=16.7,cumulative,files' '
 	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
 '
 
+cat <<EOF >expect_diff_dirstat
+  10.6% dst/copy/changed/
+  10.6% dst/copy/rearranged/
+  10.6% dst/copy/unchanged/
+  10.6% dst/move/changed/
+  10.6% dst/move/rearranged/
+  10.6% dst/move/unchanged/
+  10.6% src/move/changed/
+  10.6% src/move/rearranged/
+  10.6% src/move/unchanged/
+EOF
+
+cat <<EOF >expect_diff_dirstat_M
+   5.3% changed/
+  26.3% dst/copy/changed/
+  26.3% dst/copy/rearranged/
+  26.3% dst/copy/unchanged/
+   5.3% dst/move/changed/
+   5.3% dst/move/rearranged/
+   5.3% rearranged/
+EOF
+
+cat <<EOF >expect_diff_dirstat_CC
+  16.7% changed/
+  16.7% dst/copy/changed/
+  16.7% dst/copy/rearranged/
+  16.7% dst/move/changed/
+  16.7% dst/move/rearranged/
+  16.7% rearranged/
+EOF
+
+test_expect_success '--dirstat=lines' '
+	git diff --dirstat=lines HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff --dirstat=lines -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff --dirstat=lines -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
+test_expect_success 'diff.dirstat=lines' '
+	git -c diff.dirstat=lines diff --dirstat HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git -c diff.dirstat=lines diff --dirstat -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git -c diff.dirstat=lines diff --dirstat -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
+cat <<EOF >expect_diff_dirstat
+   2.1% changed/
+  10.6% dst/copy/changed/
+  10.6% dst/copy/rearranged/
+  10.6% dst/copy/unchanged/
+  10.6% dst/move/changed/
+  10.6% dst/move/rearranged/
+  10.6% dst/move/unchanged/
+   2.1% rearranged/
+  10.6% src/move/changed/
+  10.6% src/move/rearranged/
+  10.6% src/move/unchanged/
+EOF
+
+cat <<EOF >expect_diff_dirstat_M
+   5.3% changed/
+  26.3% dst/copy/changed/
+  26.3% dst/copy/rearranged/
+  26.3% dst/copy/unchanged/
+   5.3% dst/move/changed/
+   5.3% dst/move/rearranged/
+   5.3% rearranged/
+EOF
+
+cat <<EOF >expect_diff_dirstat_CC
+  16.7% changed/
+  16.7% dst/copy/changed/
+  16.7% dst/copy/rearranged/
+  16.7% dst/move/changed/
+  16.7% dst/move/rearranged/
+  16.7% rearranged/
+EOF
+
+test_expect_success '--dirstat=lines,0' '
+	git diff --dirstat=lines,0 HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff --dirstat=lines,0 -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff --dirstat=lines,0 -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
+test_expect_success 'diff.dirstat=0,lines' '
+	git -c diff.dirstat=0,lines diff --dirstat HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git -c diff.dirstat=0,lines diff --dirstat -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git -c diff.dirstat=0,lines diff --dirstat -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
 test_done
-- 
1.7.5.rc1.3.g4d7b
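
For reference, the per-file "damage" that show_dirstat_by_line() accumulates
in the patch above can be sketched in isolation. This is only an
illustration: dirstat_line_damage() is a hypothetical name, not a function
in diff.c, and the overflow assert is an added assumption.

```c
#include <assert.h>
#include <limits.h>

/*
 * Per-file "damage" in the style of show_dirstat_by_line(): added plus
 * deleted counts. For binary files the counts are bytes, so they are
 * scaled to 64-byte "lines", rounding up.
 * Hypothetical helper, for illustration only.
 */
unsigned long dirstat_line_damage(unsigned long added, unsigned long deleted,
				  int is_binary)
{
	unsigned long damage;

	/* Assumption for this sketch: the sum must not wrap around. */
	assert(added <= ULONG_MAX - deleted);
	damage = added + deleted;

	if (is_binary)
		damage = (damage + 63) / 64;
	return damage;
}
```

In the t4046 test commit (text files only), the plain --stat summary is 62
insertions and 32 deletions, i.e. a total damage of 94. A directory whose
file contributes 10 lines therefore shows up as 10/94 (about 10.6%), and one
contributing 2 lines as 2/94 (about 2.1%), which is what the expected
--dirstat=lines output lists.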

* Re: [PATCHv2 5/6] Use floating point for --dirstat percentages
  2011-04-27  2:12                             ` [PATCHv2 5/6] Use floating point for --dirstat percentages Johan Herland
@ 2011-04-27  2:45                               ` Linus Torvalds
  0 siblings, 0 replies; 91+ messages in thread
From: Linus Torvalds @ 2011-04-27  2:45 UTC (permalink / raw)
  To: Johan Herland; +Cc: git, Junio C Hamano

On Tue, Apr 26, 2011 at 7:12 PM, Johan Herland <johan@herland.net> wrote:
> Allow specifying --dirstat cut-off percentage as a floating point number.
>
> When printing the dirstat output, floating point numbers are presented in
> rounded form (as opposed to truncated). Therefore, this patch includes a
> significant churn in the expected output of the dirstat selftests.

Hmm. So thinking more about this, I do think it's preferable that the
percentages never be rounded up.

Why? Having them add up to 99.9 due to rounding error makes a ton more
sense to me than having percentages add up to 100.1. But I guess I
don't care _that_ much, since most of the time I use "cumulative",
where you don't much have that issue anyway.

Also:

> Remaining questions:
>
>  - Locale issues with strtod(), e.g. decimal separator is a comma in certain
>   locales.

I think this is a serious issue. Of course, any sane user will have
the numeric locale be set to "C", but we know that any argument that
depends on sane users is likely broken.

So regardless of the rounding, I do think we should make sure that we have a

  setlocale(LC_NUMERIC, "C");

both for the parsing _and_ for the printout. And if somebody really
wants the locale to affect these kinds of things, we might have a
config option to allow it, but I think we should default to a nice
fixed format in this case.

(I don't know anybody who uses LC_NUMERIC that actually sets the
decimal point character to ',' though - it is bound to break a ton of
applications. And I say that as somebody who grew up with a "decimal
comma").

                   Linus
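
To illustrate the locale concern, here is a minimal sketch (not code from
this thread) of parsing with strtod() under a forced "C" numeric locale.
The message above suggests simply calling setlocale(LC_NUMERIC, "C"); this
variant additionally saves and restores the previous setting, which is an
assumption of the sketch. parse_double_c_locale() is a hypothetical name.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parse a number with strtod() under the "C" numeric locale, then restore
 * whatever LC_NUMERIC was in effect before the call.
 * Returns 0 and stores the value in *out on success, -1 on failure.
 * Hypothetical helper, for illustration only.
 */
int parse_double_c_locale(const char *s, double *out)
{
	const char *cur;
	char *saved = NULL;
	char *end;
	double val;

	assert(s != NULL && out != NULL);

	/* Query the current locale; copy it, later calls may overwrite it. */
	cur = setlocale(LC_NUMERIC, NULL);
	if (cur) {
		saved = malloc(strlen(cur) + 1);
		if (!saved)
			return -1;
		strcpy(saved, cur);
	}

	setlocale(LC_NUMERIC, "C");
	val = strtod(s, &end);

	if (saved) {
		setlocale(LC_NUMERIC, saved);
		free(saved);
	}

	/* Reject empty input and trailing garbage such as "1,5". */
	if (end == s || *end != '\0')
		return -1;
	*out = val;
	return 0;
}
```

With this, "1.5" parses to 1.5 whatever the user's LC_NUMERIC is, and "1,5"
is rejected instead of being silently read as 1. Note that setlocale()
changes process-wide state, so this is not safe in threaded code.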

* Re: [PATCH 5/6] Use floating point for --dirstat percentages
  2011-04-27  2:02                               ` Johan Herland
@ 2011-04-27  4:42                                 ` Junio C Hamano
  2011-04-27  4:53                                   ` Linus Torvalds
  0 siblings, 1 reply; 91+ messages in thread
From: Junio C Hamano @ 2011-04-27  4:42 UTC (permalink / raw)
  To: Johan Herland; +Cc: git, Linus Torvalds

Johan Herland <johan@herland.net> writes:

> On Tuesday 26 April 2011, Junio C Hamano wrote:
>> Johan Herland <johan@herland.net> writes:
>> > Allow specifying --dirstat cut-off percentage as a floating point
>> > number.
>> > 
>> > When printing the dirstat output, floating point numbers are presented
>> > in rounded form (as opposed to truncated).
>> 
>> Why isn't it sufficient to change
>> 
>> 	permille = this_dir * 1000 / changed
>> 
>> to
>> 
>> 	permille = (this_dir * 2000 + changed) / (changed * 2)
>> 
>> or something?  If rounding is the only issue that bothers you (I admit
>> that it does bother me, now that you brought it up), that is.
>
> Actually, rounding doesn't bother me at all (or rather, I don't really care 
> if we round or truncate, as long as we're consistent).

If that is the case, I would rather not see us use floating point for
this.

I have used "#define MAX_SCORE 60000.0" to sneakily run rename similarity
score in floating point, feeling ashamed of it, and have been meaning to
fix that for quite a long time, but other than that, I do not think we
have anything that uses float.
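
The two integer formulas quoted above can be compared side by side. This is
a sketch for illustration, not code from diff.c; the function names are
hypothetical and the asserts on "changed" are an added assumption.

```c
#include <assert.h>

/* Truncating permille: this_dir * 1000 / changed */
unsigned long permille_truncated(unsigned long this_dir, unsigned long changed)
{
	assert(changed > 0);
	return this_dir * 1000 / changed;
}

/* Round-to-nearest permille: (this_dir * 2000 + changed) / (changed * 2) */
unsigned long permille_rounded(unsigned long this_dir, unsigned long changed)
{
	assert(changed > 0);
	return (this_dir * 2000 + changed) / (changed * 2);
}
```

For one file out of six, truncation gives 166 (printed as 16.6%) while
rounding gives 167 (16.7%). Summed over six such directories that is 996
versus 1002 permille, i.e. totals of 99.6% and 100.2%, which is the
trade-off between the two approaches discussed in this thread.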

* Re: [PATCH 5/6] Use floating point for --dirstat percentages
  2011-04-27  4:42                                 ` Junio C Hamano
@ 2011-04-27  4:53                                   ` Linus Torvalds
  2011-04-27  5:20                                     ` Junio C Hamano
  0 siblings, 1 reply; 91+ messages in thread
From: Linus Torvalds @ 2011-04-27  4:53 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johan Herland, git

On Tue, Apr 26, 2011 at 9:42 PM, Junio C Hamano <gitster@pobox.com> wrote:
>
> If that is the case, I would rather not see us use floating point for
> this.

Considering that we still just output with a tenth of a percent
granularity, I'd suggest just continuing with using permille
internally - including for the limit.

So instead of actually using floating point, just parsing a single
digit worth of fractional percent would be beautiful. IOW, being able
to say

  --dirstat=1.5

to give a 1.5% cut-off point would be really nice - but then
internally just saying "15 permille" and using integers all the way?

Doing all the fake floating point by hand also obviously then avoids
the whole LC_NUMERIC locale issue.

                        Linus
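
The integer-only parsing suggested above could look roughly like the
following. This is a sketch, not the code from the later patch:
parse_permille() is a hypothetical name, and the -1 error convention and the
rejection of values above 100% are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Parse a non-negative percentage with at most one fractional digit
 * ("3", "1.5", "0.1") into permille, using integers only.
 * The decimal separator is always '.', independent of the locale.
 * Returns -1 on malformed input or values above 100%.
 * Hypothetical helper, for illustration only.
 */
int parse_permille(const char *s)
{
	int permille = 0;

	assert(s != NULL);

	/* At least one digit is required before an optional fraction. */
	if (*s < '0' || *s > '9')
		return -1;
	while (*s >= '0' && *s <= '9') {
		permille = permille * 10 + (*s - '0');
		if (permille > 100)	/* also keeps the value from overflowing */
			return -1;
		s++;
	}
	permille *= 10;		/* whole percent -> permille */

	if (*s == '.') {
		s++;
		if (*s < '0' || *s > '9')
			return -1;
		permille += *s - '0';	/* a single fractional digit */
		s++;
	}

	if (*s != '\0' || permille > 1000)
		return -1;
	return permille;
}
```

Here "1.5" becomes 15 permille and "3" becomes 30, while "1,5" and "1.55"
are rejected. Since no library number parsing is involved, LC_NUMERIC has no
effect on the result.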

* Re: [PATCH 3/6] Refactor --dirstat parsing; deprecate --cumulative and --dirstat-by-file
  2011-04-27  2:02                               ` Johan Herland
@ 2011-04-27  4:53                                 ` Junio C Hamano
  2011-04-27 20:51                                 ` Junio C Hamano
  1 sibling, 0 replies; 91+ messages in thread
From: Junio C Hamano @ 2011-04-27  4:53 UTC (permalink / raw)
  To: Johan Herland; +Cc: git, Linus Torvalds

Johan Herland <johan@herland.net> writes:

> I have tried to consistently use "option" for referring to the entire
> "--dirstat=whatever" entity, and then use "argument" for referring to
> each comma-separated token following "--dirstat=".

Ok.

Your terminology is pretty much consistent with how POSIX calls these
things (cf. *1*, *2*):

 * -X is an OPTION;
 * 3 in -X3 is an OPTION ARGUMENT; and
 * OPTION ARGUMENT is explained as "A parameter that follows certain options".

Between "--option=parameter" or "--option=option argument", the former is
easier to type and read, so it is slightly more preferable.


>> > +--
>> > +`changes`;;
>> > +	Compute the dirstat numbers by counting the lines that have been
>> > +	removed from the source, or added to the destination. This ignores
>> > +	the amount of pure code movements within a file.  In other words,
>> > +	rearranging lines in a file is not counted as much as other changes.
>> > +	This is the default `--dirstat` behavior.
>> 
>> "default behavior when no parameter is given"?

Right.  Thanks.

[References]

*1* 12.1 Utility Argument Syntax

http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap12.html#tag_12_01

*2* 3.256 Option, 3.257 Option-Argument

http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_256

* Re: [PATCH 5/6] Use floating point for --dirstat percentages
  2011-04-27  4:53                                   ` Linus Torvalds
@ 2011-04-27  5:20                                     ` Junio C Hamano
  0 siblings, 0 replies; 91+ messages in thread
From: Junio C Hamano @ 2011-04-27  5:20 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Johan Herland, git

Linus Torvalds <torvalds@linux-foundation.org> writes:

> So instead of actually using floating point, just parsing a single
> digit worth of fractional percent would be beautiful. IOW, being able
> to say
>
>   --dirstat=1.5
>
> to give a 1.5% cut-off point would be really nice - but then
> internally just saying "15 permille" and using integers all the way?

Makes perfect sense to me.

* [PATCHv3 0/6] --dirstat fixes, part 2
  2011-04-27  2:12                           ` [PATCHv2 " Johan Herland
                                               ` (5 preceding siblings ...)
  2011-04-27  2:12                             ` [PATCHv2 6/6] New --dirstat=lines mode, doing dirstat analysis based on diffstat Johan Herland
@ 2011-04-27  8:24                             ` Johan Herland
  2011-04-27  8:24                               ` [PATCHv3 1/6] Add several testcases for --dirstat and friends Johan Herland
                                                 ` (6 more replies)
  6 siblings, 7 replies; 91+ messages in thread
From: Johan Herland @ 2011-04-27  8:24 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Linus Torvalds, Johan Herland

Hi,

Here's version 3 with patch 5/6 changed to use integer permille calculation
instead of floating point percent calculations. Patch 6/6 also has minor
updates caused by the 5/6 changes (s/percent/permille/ plus updated test
vectors). The other patches are unchanged.


Have fun! :)

...Johan

Johan Herland (6):
  Add several testcases for --dirstat and friends
  Make --dirstat=0 output directories that contribute < 0.1% of changes
  Refactor --dirstat parsing; deprecate --cumulative and --dirstat-by-file
  Add config variable for specifying default --dirstat behavior
  Allow specifying --dirstat cut-off percentage as a floating point number
  New --dirstat=lines mode, doing dirstat analysis based on diffstat

 Documentation/config.txt       |   44 ++
 Documentation/diff-options.txt |   54 ++-
 diff.c                         |  191 ++++++++-
 diff.h                         |    3 +-
 t/t4046-diff-dirstat.sh        |  908 ++++++++++++++++++++++++++++++++++++++++
 5 files changed, 1167 insertions(+), 33 deletions(-)
 create mode 100755 t/t4046-diff-dirstat.sh

-- 
1.7.5.rc1.3.g4d7b

* [PATCHv3 1/6] Add several testcases for --dirstat and friends
  2011-04-27  8:24                             ` [PATCHv3 0/6] --dirstat fixes, part 2 Johan Herland
@ 2011-04-27  8:24                               ` Johan Herland
  2011-04-27  8:24                               ` [PATCHv3 2/6] Make --dirstat=0 output directories that contribute < 0.1% of changes Johan Herland
                                                 ` (5 subsequent siblings)
  6 siblings, 0 replies; 91+ messages in thread
From: Johan Herland @ 2011-04-27  8:24 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Linus Torvalds, Johan Herland

Currently, t4013 is the only selftest that exercises the --dirstat machinery,
but it only does a superficial verification of --dirstat's output.

This patch adds a new selftest - t4046-diff-dirstat.sh - which prepares a
commit containing:
 - unchanged files, changed files and files with rearranged lines
 - copied files, moved files, and unmoved files

It then verifies the correct dirstat output for that commit in the following
dirstat modes:
 - --dirstat
 - -X
 - --dirstat=0
 - -X0
 - --cumulative
 - --dirstat-by-file
 - (plus combinations of the above)

Each of the above tests is also run with:
 - no rename detection
 - rename detection (-M)
 - expensive copy detection (-C -C)

Signed-off-by: Johan Herland <johan@herland.net>
---
 t/t4046-diff-dirstat.sh |  580 +++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 580 insertions(+), 0 deletions(-)
 create mode 100755 t/t4046-diff-dirstat.sh

diff --git a/t/t4046-diff-dirstat.sh b/t/t4046-diff-dirstat.sh
new file mode 100755
index 0000000..eb6bf47
--- /dev/null
+++ b/t/t4046-diff-dirstat.sh
@@ -0,0 +1,580 @@
+#!/bin/sh
+
+test_description='diff --dirstat tests'
+. ./test-lib.sh
+
+# set up two commits where the second commit has these files
+# (10 lines in each file):
+#
+#   unchanged/text           (unchanged from 1st commit)
+#   changed/text             (changed 1st line)
+#   rearranged/text          (swapped 1st and 2nd line)
+#   dst/copy/unchanged/text  (copied from src/copy/unchanged/text, unchanged)
+#   dst/copy/changed/text    (copied from src/copy/changed/text, changed)
+#   dst/copy/rearranged/text (copied from src/copy/rearranged/text, rearranged)
+#   dst/move/unchanged/text  (moved from src/move/unchanged/text, unchanged)
+#   dst/move/changed/text    (moved from src/move/changed/text, changed)
+#   dst/move/rearranged/text (moved from src/move/rearranged/text, rearranged)
+
+test_expect_success 'setup' '
+	mkdir unchanged &&
+	mkdir changed &&
+	mkdir rearranged &&
+	mkdir src &&
+	mkdir src/copy &&
+	mkdir src/copy/unchanged &&
+	mkdir src/copy/changed &&
+	mkdir src/copy/rearranged &&
+	mkdir src/move &&
+	mkdir src/move/unchanged &&
+	mkdir src/move/changed &&
+	mkdir src/move/rearranged &&
+	cat <<EOF >unchanged/text &&
+unchanged       line #0
+unchanged       line #1
+unchanged       line #2
+unchanged       line #3
+unchanged       line #4
+unchanged       line #5
+unchanged       line #6
+unchanged       line #7
+unchanged       line #8
+unchanged       line #9
+EOF
+	cat <<EOF >changed/text &&
+changed         line #0
+changed         line #1
+changed         line #2
+changed         line #3
+changed         line #4
+changed         line #5
+changed         line #6
+changed         line #7
+changed         line #8
+changed         line #9
+EOF
+	cat <<EOF >rearranged/text &&
+rearranged      line #0
+rearranged      line #1
+rearranged      line #2
+rearranged      line #3
+rearranged      line #4
+rearranged      line #5
+rearranged      line #6
+rearranged      line #7
+rearranged      line #8
+rearranged      line #9
+EOF
+	cat <<EOF >src/copy/unchanged/text &&
+copy  unchanged line #0
+copy  unchanged line #1
+copy  unchanged line #2
+copy  unchanged line #3
+copy  unchanged line #4
+copy  unchanged line #5
+copy  unchanged line #6
+copy  unchanged line #7
+copy  unchanged line #8
+copy  unchanged line #9
+EOF
+	cat <<EOF >src/copy/changed/text &&
+copy    changed line #0
+copy    changed line #1
+copy    changed line #2
+copy    changed line #3
+copy    changed line #4
+copy    changed line #5
+copy    changed line #6
+copy    changed line #7
+copy    changed line #8
+copy    changed line #9
+EOF
+	cat <<EOF >src/copy/rearranged/text &&
+copy rearranged line #0
+copy rearranged line #1
+copy rearranged line #2
+copy rearranged line #3
+copy rearranged line #4
+copy rearranged line #5
+copy rearranged line #6
+copy rearranged line #7
+copy rearranged line #8
+copy rearranged line #9
+EOF
+	cat <<EOF >src/move/unchanged/text &&
+move  unchanged line #0
+move  unchanged line #1
+move  unchanged line #2
+move  unchanged line #3
+move  unchanged line #4
+move  unchanged line #5
+move  unchanged line #6
+move  unchanged line #7
+move  unchanged line #8
+move  unchanged line #9
+EOF
+	cat <<EOF >src/move/changed/text &&
+move    changed line #0
+move    changed line #1
+move    changed line #2
+move    changed line #3
+move    changed line #4
+move    changed line #5
+move    changed line #6
+move    changed line #7
+move    changed line #8
+move    changed line #9
+EOF
+	cat <<EOF >src/move/rearranged/text &&
+move rearranged line #0
+move rearranged line #1
+move rearranged line #2
+move rearranged line #3
+move rearranged line #4
+move rearranged line #5
+move rearranged line #6
+move rearranged line #7
+move rearranged line #8
+move rearranged line #9
+EOF
+	git add . &&
+	git commit -m "initial" &&
+	mkdir dst &&
+	mkdir dst/copy &&
+	mkdir dst/copy/unchanged &&
+	mkdir dst/copy/changed &&
+	mkdir dst/copy/rearranged &&
+	mkdir dst/move &&
+	mkdir dst/move/unchanged &&
+	mkdir dst/move/changed &&
+	mkdir dst/move/rearranged &&
+	cat <<EOF >changed/text &&
+CHANGED XXXXXXX line #0
+changed         line #1
+changed         line #2
+changed         line #3
+changed         line #4
+changed         line #5
+changed         line #6
+changed         line #7
+changed         line #8
+changed         line #9
+EOF
+	cat <<EOF >rearranged/text &&
+rearranged      line #1
+rearranged      line #0
+rearranged      line #2
+rearranged      line #3
+rearranged      line #4
+rearranged      line #5
+rearranged      line #6
+rearranged      line #7
+rearranged      line #8
+rearranged      line #9
+EOF
+	cat <<EOF >dst/copy/unchanged/text &&
+copy  unchanged line #0
+copy  unchanged line #1
+copy  unchanged line #2
+copy  unchanged line #3
+copy  unchanged line #4
+copy  unchanged line #5
+copy  unchanged line #6
+copy  unchanged line #7
+copy  unchanged line #8
+copy  unchanged line #9
+EOF
+	cat <<EOF >dst/copy/changed/text &&
+copy XXXCHANGED line #0
+copy    changed line #1
+copy    changed line #2
+copy    changed line #3
+copy    changed line #4
+copy    changed line #5
+copy    changed line #6
+copy    changed line #7
+copy    changed line #8
+copy    changed line #9
+EOF
+	cat <<EOF >dst/copy/rearranged/text &&
+copy rearranged line #1
+copy rearranged line #0
+copy rearranged line #2
+copy rearranged line #3
+copy rearranged line #4
+copy rearranged line #5
+copy rearranged line #6
+copy rearranged line #7
+copy rearranged line #8
+copy rearranged line #9
+EOF
+	cat <<EOF >dst/move/unchanged/text &&
+move  unchanged line #0
+move  unchanged line #1
+move  unchanged line #2
+move  unchanged line #3
+move  unchanged line #4
+move  unchanged line #5
+move  unchanged line #6
+move  unchanged line #7
+move  unchanged line #8
+move  unchanged line #9
+EOF
+	cat <<EOF >dst/move/changed/text &&
+move XXXCHANGED line #0
+move    changed line #1
+move    changed line #2
+move    changed line #3
+move    changed line #4
+move    changed line #5
+move    changed line #6
+move    changed line #7
+move    changed line #8
+move    changed line #9
+EOF
+	cat <<EOF >dst/move/rearranged/text &&
+move rearranged line #1
+move rearranged line #0
+move rearranged line #2
+move rearranged line #3
+move rearranged line #4
+move rearranged line #5
+move rearranged line #6
+move rearranged line #7
+move rearranged line #8
+move rearranged line #9
+EOF
+	git add . &&
+	git rm -r src/move/unchanged &&
+	git rm -r src/move/changed &&
+	git rm -r src/move/rearranged &&
+	git commit -m "changes"
+'
+
+cat <<EOF >expect_diff_stat
+ changed/text             |    2 +-
+ dst/copy/changed/text    |   10 ++++++++++
+ dst/copy/rearranged/text |   10 ++++++++++
+ dst/copy/unchanged/text  |   10 ++++++++++
+ dst/move/changed/text    |   10 ++++++++++
+ dst/move/rearranged/text |   10 ++++++++++
+ dst/move/unchanged/text  |   10 ++++++++++
+ rearranged/text          |    2 +-
+ src/move/changed/text    |   10 ----------
+ src/move/rearranged/text |   10 ----------
+ src/move/unchanged/text  |   10 ----------
+ 11 files changed, 62 insertions(+), 32 deletions(-)
+EOF
+
+cat <<EOF >expect_diff_stat_M
+ changed/text                      |    2 +-
+ dst/copy/changed/text             |   10 ++++++++++
+ dst/copy/rearranged/text          |   10 ++++++++++
+ dst/copy/unchanged/text           |   10 ++++++++++
+ {src => dst}/move/changed/text    |    2 +-
+ {src => dst}/move/rearranged/text |    2 +-
+ {src => dst}/move/unchanged/text  |    0
+ rearranged/text                   |    2 +-
+ 8 files changed, 34 insertions(+), 4 deletions(-)
+EOF
+
+cat <<EOF >expect_diff_stat_CC
+ changed/text                      |    2 +-
+ {src => dst}/copy/changed/text    |    2 +-
+ {src => dst}/copy/rearranged/text |    2 +-
+ {src => dst}/copy/unchanged/text  |    0
+ {src => dst}/move/changed/text    |    2 +-
+ {src => dst}/move/rearranged/text |    2 +-
+ {src => dst}/move/unchanged/text  |    0
+ rearranged/text                   |    2 +-
+ 8 files changed, 6 insertions(+), 6 deletions(-)
+EOF
+
+test_expect_success 'sanity check setup (--stat)' '
+	git diff --stat HEAD^..HEAD >actual_diff_stat &&
+	test_cmp expect_diff_stat actual_diff_stat &&
+	git diff --stat -M HEAD^..HEAD >actual_diff_stat_M &&
+	test_cmp expect_diff_stat_M actual_diff_stat_M &&
+	git diff --stat -C -C HEAD^..HEAD >actual_diff_stat_CC &&
+	test_cmp expect_diff_stat_CC actual_diff_stat_CC
+'
+
+# changed/text and rearranged/text fall below default 3% threshold
+cat <<EOF >expect_diff_dirstat
+  10.8% dst/copy/changed/
+  10.8% dst/copy/rearranged/
+  10.8% dst/copy/unchanged/
+  10.8% dst/move/changed/
+  10.8% dst/move/rearranged/
+  10.8% dst/move/unchanged/
+  10.8% src/move/changed/
+  10.8% src/move/rearranged/
+  10.8% src/move/unchanged/
+EOF
+
+# rearranged/text falls below default 3% threshold
+cat <<EOF >expect_diff_dirstat_M
+   5.8% changed/
+  29.3% dst/copy/changed/
+  29.3% dst/copy/rearranged/
+  29.3% dst/copy/unchanged/
+   5.8% dst/move/changed/
+EOF
+
+# rearranged/text falls below default 3% threshold
+cat <<EOF >expect_diff_dirstat_CC
+  32.6% changed/
+  32.6% dst/copy/changed/
+  32.6% dst/move/changed/
+EOF
+
+test_expect_success 'vanilla --dirstat' '
+	git diff --dirstat HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff --dirstat -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff --dirstat -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
+test_expect_success 'vanilla -X' '
+	git diff -X HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff -X -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff -X -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
+# rearranged/text falls below 0% threshold (1 / (240 * 9 + 48 + 1) ~= 0.045 %)
+cat <<EOF >expect_diff_dirstat
+   2.1% changed/
+  10.8% dst/copy/changed/
+  10.8% dst/copy/rearranged/
+  10.8% dst/copy/unchanged/
+  10.8% dst/move/changed/
+  10.8% dst/move/rearranged/
+  10.8% dst/move/unchanged/
+  10.8% src/move/changed/
+  10.8% src/move/rearranged/
+  10.8% src/move/unchanged/
+EOF
+
+cat <<EOF >expect_diff_dirstat_M
+   5.8% changed/
+  29.3% dst/copy/changed/
+  29.3% dst/copy/rearranged/
+  29.3% dst/copy/unchanged/
+   5.8% dst/move/changed/
+   0.1% dst/move/rearranged/
+   0.1% rearranged/
+EOF
+
+cat <<EOF >expect_diff_dirstat_CC
+  32.6% changed/
+  32.6% dst/copy/changed/
+   0.6% dst/copy/rearranged/
+  32.6% dst/move/changed/
+   0.6% dst/move/rearranged/
+   0.6% rearranged/
+EOF
+
+test_expect_success '--dirstat=0' '
+	git diff --dirstat=0 HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff --dirstat=0 -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff --dirstat=0 -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
+test_expect_success '-X0' '
+	git diff -X0 HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff -X0 -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff -X0 -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
+# rearranged/text falls below 0% threshold (1 / (240 * 9 + 48 + 1) ~= 0.045 %)
+cat <<EOF >expect_diff_dirstat
+   2.1% changed/
+  10.8% dst/copy/changed/
+  10.8% dst/copy/rearranged/
+  10.8% dst/copy/unchanged/
+  32.5% dst/copy/
+  10.8% dst/move/changed/
+  10.8% dst/move/rearranged/
+  10.8% dst/move/unchanged/
+  32.5% dst/move/
+  65.1% dst/
+  10.8% src/move/changed/
+  10.8% src/move/rearranged/
+  10.8% src/move/unchanged/
+  32.5% src/move/
+EOF
+
+cat <<EOF >expect_diff_dirstat_M
+   5.8% changed/
+  29.3% dst/copy/changed/
+  29.3% dst/copy/rearranged/
+  29.3% dst/copy/unchanged/
+  88.0% dst/copy/
+   5.8% dst/move/changed/
+   0.1% dst/move/rearranged/
+   5.9% dst/move/
+  94.0% dst/
+   0.1% rearranged/
+EOF
+
+cat <<EOF >expect_diff_dirstat_CC
+  32.6% changed/
+  32.6% dst/copy/changed/
+   0.6% dst/copy/rearranged/
+  33.3% dst/copy/
+  32.6% dst/move/changed/
+   0.6% dst/move/rearranged/
+  33.3% dst/move/
+  66.6% dst/
+   0.6% rearranged/
+EOF
+
+test_expect_success '--dirstat=0 --cumulative' '
+	git diff --dirstat=0 --cumulative HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff --dirstat=0 --cumulative -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff --dirstat=0 --cumulative -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
+cat <<EOF >expect_diff_dirstat
+   9.0% changed/
+   9.0% dst/copy/changed/
+   9.0% dst/copy/rearranged/
+   9.0% dst/copy/unchanged/
+   9.0% dst/move/changed/
+   9.0% dst/move/rearranged/
+   9.0% dst/move/unchanged/
+   9.0% rearranged/
+   9.0% src/move/changed/
+   9.0% src/move/rearranged/
+   9.0% src/move/unchanged/
+EOF
+
+cat <<EOF >expect_diff_dirstat_M
+  14.2% changed/
+  14.2% dst/copy/changed/
+  14.2% dst/copy/rearranged/
+  14.2% dst/copy/unchanged/
+  14.2% dst/move/changed/
+  14.2% dst/move/rearranged/
+  14.2% rearranged/
+EOF
+
+cat <<EOF >expect_diff_dirstat_CC
+  16.6% changed/
+  16.6% dst/copy/changed/
+  16.6% dst/copy/rearranged/
+  16.6% dst/move/changed/
+  16.6% dst/move/rearranged/
+  16.6% rearranged/
+EOF
+
+test_expect_success '--dirstat-by-file' '
+	git diff --dirstat-by-file HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff --dirstat-by-file -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff --dirstat-by-file -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
+cat <<EOF >expect_diff_dirstat
+  27.2% dst/copy/
+  27.2% dst/move/
+  27.2% src/move/
+EOF
+
+cat <<EOF >expect_diff_dirstat_M
+  14.2% changed/
+  14.2% dst/copy/changed/
+  14.2% dst/copy/rearranged/
+  14.2% dst/copy/unchanged/
+  14.2% dst/move/changed/
+  14.2% dst/move/rearranged/
+  14.2% rearranged/
+EOF
+
+cat <<EOF >expect_diff_dirstat_CC
+  16.6% changed/
+  16.6% dst/copy/changed/
+  16.6% dst/copy/rearranged/
+  16.6% dst/move/changed/
+  16.6% dst/move/rearranged/
+  16.6% rearranged/
+EOF
+
+test_expect_success '--dirstat-by-file=10' '
+	git diff --dirstat-by-file=10 HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff --dirstat-by-file=10 -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff --dirstat-by-file=10 -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
+cat <<EOF >expect_diff_dirstat
+   9.0% changed/
+   9.0% dst/copy/changed/
+   9.0% dst/copy/rearranged/
+   9.0% dst/copy/unchanged/
+  27.2% dst/copy/
+   9.0% dst/move/changed/
+   9.0% dst/move/rearranged/
+   9.0% dst/move/unchanged/
+  27.2% dst/move/
+  54.5% dst/
+   9.0% rearranged/
+   9.0% src/move/changed/
+   9.0% src/move/rearranged/
+   9.0% src/move/unchanged/
+  27.2% src/move/
+EOF
+
+cat <<EOF >expect_diff_dirstat_M
+  14.2% changed/
+  14.2% dst/copy/changed/
+  14.2% dst/copy/rearranged/
+  14.2% dst/copy/unchanged/
+  42.8% dst/copy/
+  14.2% dst/move/changed/
+  14.2% dst/move/rearranged/
+  28.5% dst/move/
+  71.4% dst/
+  14.2% rearranged/
+EOF
+
+cat <<EOF >expect_diff_dirstat_CC
+  16.6% changed/
+  16.6% dst/copy/changed/
+  16.6% dst/copy/rearranged/
+  33.3% dst/copy/
+  16.6% dst/move/changed/
+  16.6% dst/move/rearranged/
+  33.3% dst/move/
+  66.6% dst/
+  16.6% rearranged/
+EOF
+
+test_expect_success '--dirstat-by-file --cumulative' '
+	git diff --dirstat-by-file --cumulative HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff --dirstat-by-file --cumulative -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff --dirstat-by-file --cumulative -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
+test_done
-- 
1.7.5.rc1.3.g4d7b

^ permalink raw reply related	[flat|nested] 91+ messages in thread

* [PATCHv3 2/6] Make --dirstat=0 output directories that contribute < 0.1% of changes
  2011-04-27  8:24                             ` [PATCHv3 0/6] --dirstat fixes, part 2 Johan Herland
  2011-04-27  8:24                               ` [PATCHv3 1/6] Add several testcases for --dirstat and friends Johan Herland
@ 2011-04-27  8:24                               ` Johan Herland
  2011-04-27  8:24                               ` [PATCHv3 3/6] Refactor --dirstat parsing; deprecate --cumulative and --dirstat-by-file Johan Herland
                                                 ` (4 subsequent siblings)
  6 siblings, 0 replies; 91+ messages in thread
From: Johan Herland @ 2011-04-27  8:24 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Linus Torvalds, Johan Herland

The expected output from --dirstat=0 is to include any directory with
changes, even if those changes contribute a minuscule portion of the total
changes. However, currently, directories that contribute less than 0.1% are
not included, since their 'permille' value is 0, and there is an
'if (permille)' check in gather_dirstat() that causes them to be ignored.

This test is obviously intended to exclude directories that contribute no
changes whatsoever, but in this case, it hits too broadly. The correct
check is against 'this_dir' from which the permille is calculated. Only if
this value is 0 does the directory truly contribute no changes, and should
be skipped from the output.

This patch fixes this issue, and updates the corresponding testcases to
expect the new behavior.

Signed-off-by: Johan Herland <johan@herland.net>
---
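To make the truncation concrete, here is a minimal shell sketch (not part
of the patch) that redoes the integer arithmetic with the numbers from the
test comment this patch removes. The helper name `permille` is made up for
illustration, and the way the fraction digit is printed is an assumption
based on the "%4d.%01d%%" format string visible in the hunk below.

```shell
# Hypothetical helper mirroring the integer division in gather_dirstat().
# Arguments: <this_dir> <changed>
permille() {
	echo $(( $1 * 1000 / $2 ))
}

# rearranged/ contributes 1 unit out of 240 * 9 + 48 + 1 = 2209.
this_dir=1
changed=$(( 240 * 9 + 48 + 1 ))
p=$(permille "$this_dir" "$changed")

# Old check, 'if (permille)': p is 0, so the directory was skipped.
# New check, 'if (this_dir)': this_dir is 1, so the directory is shown.
if [ "$this_dir" -ne 0 ]
then
	# prints "   0.0% rearranged/"
	printf '%4d.%01d%% rearranged/\n' $(( p / 10 )) $(( p % 10 ))
fi
```

The true share is about 0.045%, which integer division rounds down to
0 permille; only a directory whose this_dir is 0 has no changes at all.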
 diff.c                  |    4 ++--
 t/t4046-diff-dirstat.sh |    4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/diff.c b/diff.c
index abd9cd5..cfbfa92 100644
--- a/diff.c
+++ b/diff.c
@@ -1500,8 +1500,8 @@ static long gather_dirstat(struct diff_options *opt, struct dirstat_dir *dir,
 	 *    under this directory (sources == 1).
 	 */
 	if (baselen && sources != 1) {
-		int permille = this_dir * 1000 / changed;
-		if (permille) {
+		if (this_dir) {
+			int permille = this_dir * 1000 / changed;
 			int percent = permille / 10;
 			if (percent >= dir->percent) {
 				fprintf(opt->file, "%s%4d.%01d%% %.*s\n", line_prefix,
diff --git a/t/t4046-diff-dirstat.sh b/t/t4046-diff-dirstat.sh
index eb6bf47..6ff7f9f 100755
--- a/t/t4046-diff-dirstat.sh
+++ b/t/t4046-diff-dirstat.sh
@@ -346,7 +346,6 @@ test_expect_success 'vanilla -X' '
 	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
 '
 
-# rearranged/text falls below 0% threshold (1 / (240 * 9 + 48 + 1) ~= 0.045 %)
 cat <<EOF >expect_diff_dirstat
    2.1% changed/
   10.8% dst/copy/changed/
@@ -355,6 +354,7 @@ cat <<EOF >expect_diff_dirstat
   10.8% dst/move/changed/
   10.8% dst/move/rearranged/
   10.8% dst/move/unchanged/
+   0.0% rearranged/
   10.8% src/move/changed/
   10.8% src/move/rearranged/
   10.8% src/move/unchanged/
@@ -397,7 +397,6 @@ test_expect_success '-X0' '
 	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
 '
 
-# rearranged/text falls below 0% threshold (1 / (240 * 9 + 48 + 1) ~= 0.045 %)
 cat <<EOF >expect_diff_dirstat
    2.1% changed/
   10.8% dst/copy/changed/
@@ -409,6 +408,7 @@ cat <<EOF >expect_diff_dirstat
   10.8% dst/move/unchanged/
   32.5% dst/move/
   65.1% dst/
+   0.0% rearranged/
   10.8% src/move/changed/
   10.8% src/move/rearranged/
   10.8% src/move/unchanged/
-- 
1.7.5.rc1.3.g4d7b

^ permalink raw reply related	[flat|nested] 91+ messages in thread

* [PATCHv3 3/6] Refactor --dirstat parsing; deprecate --cumulative and --dirstat-by-file
  2011-04-27  8:24                             ` [PATCHv3 0/6] --dirstat fixes, part 2 Johan Herland
  2011-04-27  8:24                               ` [PATCHv3 1/6] Add several testcases for --dirstat and friends Johan Herland
  2011-04-27  8:24                               ` [PATCHv3 2/6] Make --dirstat=0 output directories that contribute < 0.1% of changes Johan Herland
@ 2011-04-27  8:24                               ` Johan Herland
  2011-04-27  8:24                               ` [PATCHv3 4/6] Add config variable for specifying default --dirstat behavior Johan Herland
                                                 ` (3 subsequent siblings)
  6 siblings, 0 replies; 91+ messages in thread
From: Johan Herland @ 2011-04-27  8:24 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Linus Torvalds, Johan Herland

Instead of having multiple interconnected dirstat-related options, teach
the --dirstat option itself to accept all behavior modifiers as parameters.

 - Preserve the current --dirstat=<limit> (where <limit> is an integer
   specifying a cut-off percentage)
 - Add --dirstat=cumulative, replacing --cumulative
 - Add --dirstat=files, replacing --dirstat-by-file
 - Also add --dirstat=changes and --dirstat=noncumulative for specifying the
   current default behavior. These allow the user to reset other --dirstat
   parameters (e.g. 'cumulative' and 'files') occurring earlier on the
   command line.

The deprecated options (--cumulative and --dirstat-by-file) are still
functional, although they have been removed from the documentation.

Allow multiple parameters to be separated by commas, e.g.:
  --dirstat=files,10,cumulative

Update the documentation accordingly, and add testcases verifying the
behavior of the new syntax.

Improved-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Johan Herland <johan@herland.net>
---
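The "later parameters override earlier ones" rule can be sketched in shell
as follows. This is an illustrative restatement, not a port of
parse_dirstat_params(): the variable names by_file, cumulative and percent
are assumptions, and unlike the C code it splits on commas first rather
than matching prefixes.

```shell
# Hypothetical sketch: walk a comma-separated --dirstat parameter list
# from left to right, so that a later parameter wins over an earlier one.
parse_dirstat_params() {
	by_file=0 cumulative=0 percent=3	# changes,noncumulative,3
	rest=$1
	while [ -n "$rest" ]
	do
		param=${rest%%,*}		# text before the first comma
		case "$rest" in
		*,*) rest=${rest#*,} ;;		# keep what follows that comma
		*) rest= ;;			# that was the last parameter
		esac
		case "$param" in
		changes) by_file=0 ;;
		files) by_file=1 ;;
		noncumulative) cumulative=0 ;;
		cumulative) cumulative=1 ;;
		''|*[!0-9]*)
			echo "Unknown --dirstat parameter '$param'" >&2
			return 1 ;;
		*) percent=$param ;;
		esac
	done
	echo "by_file=$by_file cumulative=$cumulative percent=$percent"
}

# prints "by_file=1 cumulative=1 percent=10"
parse_dirstat_params files,10,cumulative
# prints "by_file=0 cumulative=0 percent=3"
parse_dirstat_params files,10,cumulative,changes,noncumulative,3
```

The second call corresponds to the 'later options override earlier
options' testcase below: it ends up back at the built-in defaults.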
 Documentation/diff-options.txt |   44 +++++++++++----
 diff.c                         |   99 ++++++++++++++++++++++++++++++---
 t/t4046-diff-dirstat.sh        |  119 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 240 insertions(+), 22 deletions(-)

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 7e4bd42..6a3a9c1 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -66,19 +66,39 @@ endif::git-format-patch[]
 	number of modified files, as well as number of added and deleted
 	lines.
 
---dirstat[=<limit>]::
-	Output the distribution of relative amount of changes (number of lines added or
-	removed) for each sub-directory. Directories with changes below
-	a cut-off percent (3% by default) are not shown. The cut-off percent
-	can be set with `--dirstat=<limit>`. Changes in a child directory are not
-	counted for the parent directory, unless `--cumulative` is used.
+--dirstat[=<param1,param2,...>]::
+	Output the distribution of relative amount of changes for each
+	sub-directory. The behavior of `--dirstat` can be customized by
+	passing it a comma separated list of parameters.
+	The following parameters are available:
 +
-Note that the `--dirstat` option computes the changes while ignoring
-the amount of pure code movements within a file.  In other words,
-rearranging lines in a file is not counted as much as other changes.
-
---dirstat-by-file[=<limit>]::
-	Same as `--dirstat`, but counts changed files instead of lines.
+--
+`changes`;;
+	Compute the dirstat numbers by counting the lines that have been
+	removed from the source, or added to the destination. This ignores
+	the amount of pure code movements within a file.  In other words,
+	rearranging lines in a file is not counted as much as other changes.
+	This is the default behavior when no parameter is given.
+`files`;;
+	Compute the dirstat numbers by counting the number of files changed.
+	Each changed file counts equally in the dirstat analysis. This is
+	the computationally cheapest `--dirstat` behavior, since it does
+	not have to look at the file contents at all.
+`cumulative`;;
+	Count changes in a child directory for the parent directory as well.
+	Note that when using `cumulative`, the sum of the percentages
+	reported may exceed 100%. The default (non-cumulative) behavior can
+	be specified with the `noncumulative` parameter.
+<limit>;;
+	An integer parameter specifies a cut-off percent (3% by default).
+	Directories contributing less than this percentage of the changes
+	are not shown in the output.
+--
++
+Example: The following will count changed files, while ignoring
+directories with less than 10% of the total amount of changed files,
+and accumulating child directory counts in the parent directories:
+`--dirstat=files,10,cumulative`.
 
 --summary::
 	Output a condensed summary of extended header information
diff --git a/diff.c b/diff.c
index cfbfa92..0387e4f 100644
--- a/diff.c
+++ b/diff.c
@@ -66,6 +66,49 @@ static int parse_diff_color_slot(const char *var, int ofs)
 	return -1;
 }
 
+#ifdef NO_C99_FORMAT
+#define PD_FMT "%d"
+#else
+#define PD_FMT "%td"
+#endif
+
+static void parse_dirstat_params(struct diff_options *options, const char *params)
+{
+	const char *p = params;
+	while (*p) {
+		if (!prefixcmp(p, "changes")) {
+			p += 7;
+			DIFF_OPT_CLR(options, DIRSTAT_BY_FILE);
+		}
+		else if (!prefixcmp(p, "files")) {
+			p += 5;
+			DIFF_OPT_SET(options, DIRSTAT_BY_FILE);
+		}
+		else if (!prefixcmp(p, "noncumulative")) {
+			p += 13;
+			DIFF_OPT_CLR(options, DIRSTAT_CUMULATIVE);
+		}
+		else if (!prefixcmp(p, "cumulative")) {
+			p += 10;
+			DIFF_OPT_SET(options, DIRSTAT_CUMULATIVE);
+		}
+		else if (isdigit(*p)) {
+			char *end;
+			options->dirstat_percent = strtoul(p, &end, 10);
+			p = end;
+		}
+		else
+			die("Unknown --dirstat parameter '%s'", p);
+
+		if (*p) { /* more parameters, swallow separator */
+			if (*p != ',')
+				die("Missing comma separator at char " PD_FMT
+				    " of '%s'", p - params, params);
+			p++;
+		}
+	}
+}
+
 static int git_config_rename(const char *var, const char *value)
 {
 	if (!value)
@@ -3144,6 +3187,48 @@ static int stat_opt(struct diff_options *options, const char **av)
 	return argcount;
 }
 
+/*
+ * Parse dirstat-related options and any parameters given to those options.
+ * Returns 1 if the option in 'arg' is a recognized dirstat-related option;
+ * otherwise returns 0.
+ */
+static int dirstat_opt(struct diff_options *options, const char *arg)
+{
+	const char *p;
+	char *mangled = NULL;
+
+	if (!strcmp(arg, "--cumulative")) /* deprecated */
+		/* handle '--cumulative' like '--dirstat=cumulative' */
+		p = "cumulative";
+	else if (!strcmp(arg, "--dirstat-by-file") ||
+		 !prefixcmp(arg, "--dirstat-by-file=")) { /* deprecated */
+		/* handle '--dirstat-by-file=*' like '--dirstat=files,*' */
+		mangled = xstrdup(arg + 12);
+		memcpy(mangled, "files", 5);
+		if (mangled[5]) {
+			assert(mangled[5] == '=');
+			mangled[5] = ',';
+		}
+		p = mangled;
+	}
+	else if (!prefixcmp(arg, "-X"))
+		p = arg + 2;
+	else if (!prefixcmp(arg, "--dirstat"))
+		p = arg + 9;
+	else
+		return 0;
+
+	options->output_format |= DIFF_FORMAT_DIRSTAT;
+
+	if (*p == '=')
+		p++;
+	if (*p)
+		parse_dirstat_params(options, p);
+
+	free(mangled);
+	return 1;
+}
+
 int diff_opt_parse(struct diff_options *options, const char **av, int ac)
 {
 	const char *arg = av[0];
@@ -3163,16 +3248,10 @@ int diff_opt_parse(struct diff_options *options, const char **av, int ac)
 		options->output_format |= DIFF_FORMAT_NUMSTAT;
 	else if (!strcmp(arg, "--shortstat"))
 		options->output_format |= DIFF_FORMAT_SHORTSTAT;
-	else if (opt_arg(arg, 'X', "dirstat", &options->dirstat_percent))
-		options->output_format |= DIFF_FORMAT_DIRSTAT;
-	else if (!strcmp(arg, "--cumulative")) {
-		options->output_format |= DIFF_FORMAT_DIRSTAT;
-		DIFF_OPT_SET(options, DIRSTAT_CUMULATIVE);
-	} else if (opt_arg(arg, 0, "dirstat-by-file",
-			   &options->dirstat_percent)) {
-		options->output_format |= DIFF_FORMAT_DIRSTAT;
-		DIFF_OPT_SET(options, DIRSTAT_BY_FILE);
-	}
+	else if (!prefixcmp(arg, "-X") || !prefixcmp(arg, "--dirstat") ||
+		 !strcmp(arg, "--cumulative"))
+		/* -X, --dirstat[=<args>], --dirstat-by-file, or --cumulative */
+		return dirstat_opt(options, arg);
 	else if (!strcmp(arg, "--check"))
 		options->output_format |= DIFF_FORMAT_CHECKDIFF;
 	else if (!strcmp(arg, "--summary"))
diff --git a/t/t4046-diff-dirstat.sh b/t/t4046-diff-dirstat.sh
index 6ff7f9f..0ede619 100755
--- a/t/t4046-diff-dirstat.sh
+++ b/t/t4046-diff-dirstat.sh
@@ -346,6 +346,39 @@ test_expect_success 'vanilla -X' '
 	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
 '
 
+test_expect_success 'explicit defaults: --dirstat=changes,noncumulative,3' '
+	git diff --dirstat=changes,noncumulative,3 HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff --dirstat=changes,noncumulative,3 -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff --dirstat=changes,noncumulative,3 -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
+test_expect_success 'explicit defaults: -Xchanges,noncumulative,3' '
+	git diff -Xchanges,noncumulative,3 HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff -Xchanges,noncumulative,3 -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff -Xchanges,noncumulative,3 -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
+test_expect_success 'later options override earlier options:' '
+	git diff --dirstat=files,10,cumulative,changes,noncumulative,3 HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff --dirstat=files,10,cumulative,changes,noncumulative,3 -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff --dirstat=files,10,cumulative,changes,noncumulative,3 -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC &&
+	git diff --dirstat=files --dirstat=10 --dirstat=cumulative --dirstat=changes --dirstat=noncumulative -X3 HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff --dirstat=files --dirstat=10 --dirstat=cumulative --dirstat=changes --dirstat=noncumulative -X3 -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff --dirstat=files --dirstat=10 --dirstat=cumulative --dirstat=changes --dirstat=noncumulative -X3 -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
 cat <<EOF >expect_diff_dirstat
    2.1% changed/
   10.8% dst/copy/changed/
@@ -449,6 +482,24 @@ test_expect_success '--dirstat=0 --cumulative' '
 	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
 '
 
+test_expect_success '--dirstat=0,cumulative' '
+	git diff --dirstat=0,cumulative HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff --dirstat=0,cumulative -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff --dirstat=0,cumulative -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
+test_expect_success '-X0,cumulative' '
+	git diff -X0,cumulative HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff -X0,cumulative -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff -X0,cumulative -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
 cat <<EOF >expect_diff_dirstat
    9.0% changed/
    9.0% dst/copy/changed/
@@ -491,6 +542,15 @@ test_expect_success '--dirstat-by-file' '
 	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
 '
 
+test_expect_success '--dirstat=files' '
+	git diff --dirstat=files HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff --dirstat=files -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff --dirstat=files -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
 cat <<EOF >expect_diff_dirstat
   27.2% dst/copy/
   27.2% dst/move/
@@ -525,6 +585,15 @@ test_expect_success '--dirstat-by-file=10' '
 	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
 '
 
+test_expect_success '--dirstat=files,10' '
+	git diff --dirstat=files,10 HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff --dirstat=files,10 -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff --dirstat=files,10 -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
 cat <<EOF >expect_diff_dirstat
    9.0% changed/
    9.0% dst/copy/changed/
@@ -577,4 +646,54 @@ test_expect_success '--dirstat-by-file --cumulative' '
 	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
 '
 
+test_expect_success '--dirstat=files,cumulative' '
+	git diff --dirstat=files,cumulative HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff --dirstat=files,cumulative -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff --dirstat=files,cumulative -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
+cat <<EOF >expect_diff_dirstat
+  27.2% dst/copy/
+  27.2% dst/move/
+  54.5% dst/
+  27.2% src/move/
+EOF
+
+cat <<EOF >expect_diff_dirstat_M
+  14.2% changed/
+  14.2% dst/copy/changed/
+  14.2% dst/copy/rearranged/
+  14.2% dst/copy/unchanged/
+  42.8% dst/copy/
+  14.2% dst/move/changed/
+  14.2% dst/move/rearranged/
+  28.5% dst/move/
+  71.4% dst/
+  14.2% rearranged/
+EOF
+
+cat <<EOF >expect_diff_dirstat_CC
+  16.6% changed/
+  16.6% dst/copy/changed/
+  16.6% dst/copy/rearranged/
+  33.3% dst/copy/
+  16.6% dst/move/changed/
+  16.6% dst/move/rearranged/
+  33.3% dst/move/
+  66.6% dst/
+  16.6% rearranged/
+EOF
+
+test_expect_success '--dirstat=files,cumulative,10' '
+	git diff --dirstat=files,cumulative,10 HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff --dirstat=files,cumulative,10 -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff --dirstat=files,cumulative,10 -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
 test_done
-- 
1.7.5.rc1.3.g4d7b

^ permalink raw reply related	[flat|nested] 91+ messages in thread

* [PATCHv3 4/6] Add config variable for specifying default --dirstat behavior
  2011-04-27  8:24                             ` [PATCHv3 0/6] --dirstat fixes, part 2 Johan Herland
                                                 ` (2 preceding siblings ...)
  2011-04-27  8:24                               ` [PATCHv3 3/6] Refactor --dirstat parsing; deprecate --cumulative and --dirstat-by-file Johan Herland
@ 2011-04-27  8:24                               ` Johan Herland
  2011-04-27  8:24                               ` [PATCHv3 5/6] Allow specifying --dirstat cut-off percentage as a floating point number Johan Herland
                                                 ` (2 subsequent siblings)
  6 siblings, 0 replies; 91+ messages in thread
From: Johan Herland @ 2011-04-27  8:24 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Linus Torvalds, Johan Herland

The new diff.dirstat config variable takes the same arguments as
'--dirstat=<args>', and specifies the default arguments for --dirstat.
The config is obviously overridden by --dirstat arguments passed on the
command line.

When not specified, the --dirstat defaults are 'changes,noncumulative,3'.

The patch also adds several tests verifying the interaction between the
diff.dirstat config variable, and the --dirstat command line option.

Improved-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Johan Herland <johan@herland.net>
---
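The intended precedence (built-in fallback, then diff.dirstat, then the
command line) can be sketched in shell as three layers applied in order.
This is only an illustration under assumed names: apply_params,
effective_dirstat and the three variables are hypothetical and do not
appear in diff.c.

```shell
# Hypothetical helper: apply one comma-separated list on top of the
# current values of by_file, cumulative and percent.
apply_params() {
	rest=$1
	while [ -n "$rest" ]
	do
		param=${rest%%,*}
		case "$rest" in
		*,*) rest=${rest#*,} ;;
		*) rest= ;;
		esac
		case "$param" in
		changes) by_file=0 ;;
		files) by_file=1 ;;
		noncumulative) cumulative=0 ;;
		cumulative) cumulative=1 ;;
		''|*[!0-9]*) return 1 ;;
		*) percent=$param ;;
		esac
	done
}

# Arguments: <diff.dirstat value> <--dirstat value>; either may be empty.
effective_dirstat() {
	# layer 1: fallback defaults, i.e. changes,noncumulative,3
	by_file=0 cumulative=0 percent=3
	# layer 2: the config value; layer 3: the command-line value
	apply_params "$1" || return 1
	apply_params "$2" || return 1
	echo "by_file=$by_file cumulative=$cumulative percent=$percent"
}

# prints "by_file=0 cumulative=1 percent=0"
effective_dirstat 0 cumulative
# prints "by_file=0 cumulative=0 percent=3"
effective_dirstat files,cumulative,50 changes,noncumulative,3
```

The first call matches the 'diff.dirstat=0 & --dirstat=cumulative'
testcase: a command-line parameter only overrides the settings it names,
so the cut-off of 0 from the config stays in effect.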
 Documentation/config.txt       |   36 ++++++++++++++++++++
 Documentation/diff-options.txt |    2 +
 diff.c                         |   10 +++++-
 t/t4046-diff-dirstat.sh        |   72 ++++++++++++++++++++++++++++++++++++++++
 4 files changed, 119 insertions(+), 1 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 6babbc7..c18dd5a 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -822,6 +822,42 @@ diff.autorefreshindex::
 	affects only 'git diff' Porcelain, and not lower level
 	'diff' commands such as 'git diff-files'.
 
+diff.dirstat::
+	A comma separated list of `--dirstat` parameters specifying the
+	default behavior of the `--dirstat` option to linkgit:git-diff[1]
+	and friends. The defaults can be overridden on the command line
+	(using `--dirstat=<param1,param2,...>`). The fallback defaults
+	(when not changed by `diff.dirstat`) are `changes,noncumulative,3`.
+	The following parameters are available:
++
+--
+`changes`;;
+	Compute the dirstat numbers by counting the lines that have been
+	removed from the source, or added to the destination. This ignores
+	the amount of pure code movements within a file.  In other words,
+	rearranging lines in a file is not counted as much as other changes.
+	This is the default behavior when no parameter is given.
+`files`;;
+	Compute the dirstat numbers by counting the number of files changed.
+	Each changed file counts equally in the dirstat analysis. This is
+	the computationally cheapest `--dirstat` behavior, since it does
+	not have to look at the file contents at all.
+`cumulative`;;
+	Count changes in a child directory for the parent directory as well.
+	Note that when using `cumulative`, the sum of the percentages
+	reported may exceed 100%. The default (non-cumulative) behavior can
+	be specified with the `noncumulative` parameter.
+<limit>;;
+	An integer parameter specifies a cut-off percent (3% by default).
+	Directories contributing less than this percentage of the changes
+	are not shown in the output.
+--
++
+Example: The following will count changed files, while ignoring
+directories with less than 10% of the total amount of changed files,
+and accumulating child directory counts in the parent directories:
+`files,10,cumulative`.
+
 diff.external::
 	If this config variable is set, diff generation is not
 	performed using the internal diff machinery, but using the
diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 6a3a9c1..4ad50b9 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -70,6 +70,8 @@ endif::git-format-patch[]
 	Output the distribution of relative amount of changes for each
 	sub-directory. The behavior of `--dirstat` can be customized by
 	passing it a comma separated list of parameters.
+	The defaults are controlled by the `diff.dirstat` configuration
+	variable (see linkgit:git-config[1]).
 	The following parameters are available:
 +
 --
diff --git a/diff.c b/diff.c
index 0387e4f..1b6e8c0 100644
--- a/diff.c
+++ b/diff.c
@@ -31,6 +31,7 @@ static const char *external_diff_cmd_cfg;
 int diff_auto_refresh_index = 1;
 static int diff_mnemonic_prefix;
 static int diff_no_prefix;
+static int diff_dirstat_percent_default = 3;
 static struct diff_options default_diff_options;
 
 static char diff_colors[][COLOR_MAXLEN] = {
@@ -188,6 +189,13 @@ int git_diff_basic_config(const char *var, const char *value, void *cb)
 		return 0;
 	}
 
+	if (!strcmp(var, "diff.dirstat")) {
+		default_diff_options.dirstat_percent = diff_dirstat_percent_default;
+		parse_dirstat_params(&default_diff_options, value);
+		diff_dirstat_percent_default = default_diff_options.dirstat_percent;
+		return 0;
+	}
+
 	if (!prefixcmp(var, "submodule."))
 		return parse_submodule_config_option(var, value);
 
@@ -2929,7 +2937,7 @@ void diff_setup(struct diff_options *options)
 	options->line_termination = '\n';
 	options->break_opt = -1;
 	options->rename_limit = -1;
-	options->dirstat_percent = 3;
+	options->dirstat_percent = diff_dirstat_percent_default;
 	options->context = 3;
 
 	options->change = diff_change;
diff --git a/t/t4046-diff-dirstat.sh b/t/t4046-diff-dirstat.sh
index 0ede619..fa1885c 100755
--- a/t/t4046-diff-dirstat.sh
+++ b/t/t4046-diff-dirstat.sh
@@ -379,6 +379,15 @@ test_expect_success 'later options override earlier options:' '
 	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
 '
 
+test_expect_success 'non-defaults in config overridden by explicit defaults on command line' '
+	git -c diff.dirstat=files,cumulative,50 diff --dirstat=changes,noncumulative,3 HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git -c diff.dirstat=files,cumulative,50 diff --dirstat=changes,noncumulative,3 -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git -c diff.dirstat=files,cumulative,50 diff --dirstat=changes,noncumulative,3 -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
 cat <<EOF >expect_diff_dirstat
    2.1% changed/
   10.8% dst/copy/changed/
@@ -430,6 +439,15 @@ test_expect_success '-X0' '
 	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
 '
 
+test_expect_success 'diff.dirstat=0' '
+	git -c diff.dirstat=0 diff --dirstat HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git -c diff.dirstat=0 diff --dirstat -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git -c diff.dirstat=0 diff --dirstat -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
 cat <<EOF >expect_diff_dirstat
    2.1% changed/
   10.8% dst/copy/changed/
@@ -500,6 +518,24 @@ test_expect_success '-X0,cumulative' '
 	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
 '
 
+test_expect_success 'diff.dirstat=0,cumulative' '
+	git -c diff.dirstat=0,cumulative diff --dirstat HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git -c diff.dirstat=0,cumulative diff --dirstat -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git -c diff.dirstat=0,cumulative diff --dirstat -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
+test_expect_success 'diff.dirstat=0 & --dirstat=cumulative' '
+	git -c diff.dirstat=0 diff --dirstat=cumulative HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git -c diff.dirstat=0 diff --dirstat=cumulative -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git -c diff.dirstat=0 diff --dirstat=cumulative -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
 cat <<EOF >expect_diff_dirstat
    9.0% changed/
    9.0% dst/copy/changed/
@@ -551,6 +587,15 @@ test_expect_success '--dirstat=files' '
 	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
 '
 
+test_expect_success 'diff.dirstat=files' '
+	git -c diff.dirstat=files diff --dirstat HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git -c diff.dirstat=files diff --dirstat -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git -c diff.dirstat=files diff --dirstat -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
 cat <<EOF >expect_diff_dirstat
   27.2% dst/copy/
   27.2% dst/move/
@@ -594,6 +639,15 @@ test_expect_success '--dirstat=files,10' '
 	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
 '
 
+test_expect_success 'diff.dirstat=10,files' '
+	git -c diff.dirstat=10,files diff --dirstat HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git -c diff.dirstat=10,files diff --dirstat -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git -c diff.dirstat=10,files diff --dirstat -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
 cat <<EOF >expect_diff_dirstat
    9.0% changed/
    9.0% dst/copy/changed/
@@ -655,6 +709,15 @@ test_expect_success '--dirstat=files,cumulative' '
 	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
 '
 
+test_expect_success 'diff.dirstat=cumulative,files' '
+	git -c diff.dirstat=cumulative,files diff --dirstat HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git -c diff.dirstat=cumulative,files diff --dirstat -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git -c diff.dirstat=cumulative,files diff --dirstat -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
 cat <<EOF >expect_diff_dirstat
   27.2% dst/copy/
   27.2% dst/move/
@@ -696,4 +759,13 @@ test_expect_success '--dirstat=files,cumulative,10' '
 	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
 '
 
+test_expect_success 'diff.dirstat=10,cumulative,files' '
+	git -c diff.dirstat=10,cumulative,files diff --dirstat HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git -c diff.dirstat=10,cumulative,files diff --dirstat -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git -c diff.dirstat=10,cumulative,files diff --dirstat -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
 test_done
-- 
1.7.5.rc1.3.g4d7b

* [PATCHv3 5/6] Allow specifying --dirstat cut-off percentage as a floating point number
  2011-04-27  8:24                             ` [PATCHv3 0/6] --dirstat fixes, part 2 Johan Herland
                                                 ` (3 preceding siblings ...)
  2011-04-27  8:24                               ` [PATCHv3 4/6] Add config variable for specifying default --dirstat behavior Johan Herland
@ 2011-04-27  8:24                               ` Johan Herland
  2011-04-27  8:37                                 ` Linus Torvalds
  2011-04-27  8:24                               ` [PATCHv3 6/6] New --dirstat=lines mode, doing dirstat analysis based on diffstat Johan Herland
  2011-04-28  1:17                               ` [PATCHv5 0/7] --dirstat fixes, part 2 Johan Herland
  6 siblings, 1 reply; 91+ messages in thread
From: Johan Herland @ 2011-04-27  8:24 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Linus Torvalds, Johan Herland

Only the first digit after the decimal point is kept, as the dirstat
calculations all happen in permille.

A selftest verifying floating-point percentage input has been added.

Improved-by: Junio C Hamano <gitster@pobox.com>
Improved-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Johan Herland <johan@herland.net>
---
 diff.c                  |   26 ++++++++++++++++----------
 diff.h                  |    2 +-
 t/t4046-diff-dirstat.sh |   37 +++++++++++++++++++++++++++++++++++++
 3 files changed, 54 insertions(+), 11 deletions(-)

diff --git a/diff.c b/diff.c
index 1b6e8c0..3ca129c 100644
--- a/diff.c
+++ b/diff.c
@@ -31,7 +31,7 @@ static const char *external_diff_cmd_cfg;
 int diff_auto_refresh_index = 1;
 static int diff_mnemonic_prefix;
 static int diff_no_prefix;
-static int diff_dirstat_percent_default = 3;
+static int diff_dirstat_permille_default = 30;
 static struct diff_options default_diff_options;
 
 static char diff_colors[][COLOR_MAXLEN] = {
@@ -95,8 +95,15 @@ static void parse_dirstat_params(struct diff_options *options, const char *param
 		}
 		else if (isdigit(*p)) {
 			char *end;
-			options->dirstat_percent = strtoul(p, &end, 10);
+			options->dirstat_permille = strtoul(p, &end, 10) * 10;
 			p = end;
+			if (*p == '.' && isdigit(*(++p))) {
+				int permille = strtoul(p, &end, 10);
+				p = end;
+				while (permille >= 10)
+					permille /= 10; /* only use first digit */
+				options->dirstat_permille += permille;
+			}
 		}
 		else
 			die("Unknown --dirstat parameter '%s'", p);
@@ -190,9 +197,9 @@ int git_diff_basic_config(const char *var, const char *value, void *cb)
 	}
 
 	if (!strcmp(var, "diff.dirstat")) {
-		default_diff_options.dirstat_percent = diff_dirstat_percent_default;
+		default_diff_options.dirstat_permille = diff_dirstat_permille_default;
 		parse_dirstat_params(&default_diff_options, value);
-		diff_dirstat_percent_default = default_diff_options.dirstat_percent;
+		diff_dirstat_permille_default = default_diff_options.dirstat_permille;
 		return 0;
 	}
 
@@ -1504,7 +1511,7 @@ struct dirstat_file {
 
 struct dirstat_dir {
 	struct dirstat_file *files;
-	int alloc, nr, percent, cumulative;
+	int alloc, nr, permille, cumulative;
 };
 
 static long gather_dirstat(struct diff_options *opt, struct dirstat_dir *dir,
@@ -1553,10 +1560,9 @@ static long gather_dirstat(struct diff_options *opt, struct dirstat_dir *dir,
 	if (baselen && sources != 1) {
 		if (this_dir) {
 			int permille = this_dir * 1000 / changed;
-			int percent = permille / 10;
-			if (percent >= dir->percent) {
+			if (permille >= dir->permille) {
 				fprintf(opt->file, "%s%4d.%01d%% %.*s\n", line_prefix,
-					percent, permille % 10, baselen, base);
+					permille / 10, permille % 10, baselen, base);
 				if (!dir->cumulative)
 					return 0;
 			}
@@ -1582,7 +1588,7 @@ static void show_dirstat(struct diff_options *options)
 	dir.files = NULL;
 	dir.alloc = 0;
 	dir.nr = 0;
-	dir.percent = options->dirstat_percent;
+	dir.permille = options->dirstat_permille;
 	dir.cumulative = DIFF_OPT_TST(options, DIRSTAT_CUMULATIVE);
 
 	changed = 0;
@@ -2937,7 +2943,7 @@ void diff_setup(struct diff_options *options)
 	options->line_termination = '\n';
 	options->break_opt = -1;
 	options->rename_limit = -1;
-	options->dirstat_percent = diff_dirstat_percent_default;
+	options->dirstat_permille = diff_dirstat_permille_default;
 	options->context = 3;
 
 	options->change = diff_change;
diff --git a/diff.h b/diff.h
index 0083d92..08b4fe0 100644
--- a/diff.h
+++ b/diff.h
@@ -111,7 +111,7 @@ struct diff_options {
 	int rename_score;
 	int rename_limit;
 	int warn_on_too_large_rename;
-	int dirstat_percent;
+	int dirstat_permille;
 	int setup;
 	int abbrev;
 	const char *prefix;
diff --git a/t/t4046-diff-dirstat.sh b/t/t4046-diff-dirstat.sh
index fa1885c..cf75a38 100755
--- a/t/t4046-diff-dirstat.sh
+++ b/t/t4046-diff-dirstat.sh
@@ -768,4 +768,41 @@ test_expect_success 'diff.dirstat=10,cumulative,files' '
 	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
 '
 
+cat <<EOF >expect_diff_dirstat
+  27.2% dst/copy/
+  27.2% dst/move/
+  54.5% dst/
+  27.2% src/move/
+EOF
+
+cat <<EOF >expect_diff_dirstat_M
+  42.8% dst/copy/
+  28.5% dst/move/
+  71.4% dst/
+EOF
+
+cat <<EOF >expect_diff_dirstat_CC
+  33.3% dst/copy/
+  33.3% dst/move/
+  66.6% dst/
+EOF
+
+test_expect_success '--dirstat=files,cumulative,16.7' '
+	git diff --dirstat=files,cumulative,16.7 HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff --dirstat=files,cumulative,16.7 -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff --dirstat=files,cumulative,16.7 -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
+test_expect_success 'diff.dirstat=16.7,cumulative,files' '
+	git -c diff.dirstat=16.7,cumulative,files diff --dirstat HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git -c diff.dirstat=16.7,cumulative,files diff --dirstat -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git -c diff.dirstat=16.7,cumulative,files diff --dirstat -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
 test_done
-- 
1.7.5.rc1.3.g4d7b

* [PATCHv3 6/6] New --dirstat=lines mode, doing dirstat analysis based on diffstat
  2011-04-27  8:24                             ` [PATCHv3 0/6] --dirstat fixes, part 2 Johan Herland
                                                 ` (4 preceding siblings ...)
  2011-04-27  8:24                               ` [PATCHv3 5/6] Allow specifying --dirstat cut-off percentage as a floating point number Johan Herland
@ 2011-04-27  8:24                               ` Johan Herland
  2011-04-28  1:17                               ` [PATCHv5 0/7] --dirstat fixes, part 2 Johan Herland
  6 siblings, 0 replies; 91+ messages in thread
From: Johan Herland @ 2011-04-27  8:24 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Linus Torvalds, Johan Herland

This patch adds an alternative implementation of show_dirstat(), called
show_dirstat_by_line(), which uses the more expensive diffstat analysis
(as opposed to show_dirstat()'s own (relatively inexpensive) analysis)
to derive the numbers from which the --dirstat output is computed.

The alternative implementation is controlled by the new "lines" parameter
to the --dirstat option (or the diff.dirstat config variable).

For binary files, the diffstat analysis counts bytes instead of lines,
so to prevent binary files from dominating the dirstat results, the
byte counts for binary files are divided by 64 before being compared to
their textual/line-based counterparts. This is a stupid and ugly - but
very cheap - heuristic.
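
As a rough illustration of the heuristic described above, the following
standalone sketch normalizes a binary file's byte count to 64-byte "lines"
(rounding up) and turns a directory's share of the total into permille. The
function names dirstat_damage() and dirstat_permille() are hypothetical and
exist only for this example; they are not functions in diff.c.

```c
#include <assert.h>

/*
 * Hypothetical helper, for illustration only: compute the "damage" of
 * one file from its diffstat counts. For binary files the counts are
 * bytes, so they are scaled down to 64-byte chunks, rounding up.
 */
unsigned long dirstat_damage(unsigned long added, unsigned long deleted,
			     int is_binary)
{
	unsigned long damage = added + deleted;

	if (is_binary)
		damage = (damage + 63) / 64;
	return damage;
}

/*
 * Hypothetical helper: share of the total damage in permille
 * (tenths of a percent), truncated toward zero.
 */
int dirstat_permille(unsigned long dir_damage, unsigned long total)
{
	assert(total > 0);
	return (int)(dir_damage * 1000 / total);
}
```

With these definitions, a binary file with 100 added and 28 deleted bytes
weighs the same as a text file with 2 changed lines, and any non-empty
binary change counts as at least 1.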

In linux-2.6.git, running the three different --dirstat modes:

  time git diff v2.6.20..v2.6.30 --dirstat=changes > /dev/null
vs.
  time git diff v2.6.20..v2.6.30 --dirstat=lines > /dev/null
vs.
  time git diff v2.6.20..v2.6.30 --dirstat=files > /dev/null

yields the following average runtimes on my machine:

 - "changes" (default): ~6.0 s
 - "lines":             ~9.6 s
 - "files":             ~0.1 s

So, as expected, there's a considerable performance hit (~60%) by going
through the full diffstat analysis as compared to the default "changes"
analysis (obviously, "files" is much faster than both). As such, the
"lines" mode is probably only useful if you really need the --dirstat
numbers to be consistent with the numbers returned from the other
--*stat options.

The patch also includes documentation and tests for the new dirstat mode.

Improved-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Johan Herland <johan@herland.net>
---
 Documentation/config.txt       |    8 +++
 Documentation/diff-options.txt |    8 +++
 diff.c                         |   62 ++++++++++++++++++++++++-
 diff.h                         |    1 +
 t/t4046-diff-dirstat.sh        |  100 ++++++++++++++++++++++++++++++++++++++++
 5 files changed, 177 insertions(+), 2 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index c18dd5a..0cad75c 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -837,6 +837,14 @@ diff.dirstat::
 	the amount of pure code movements within a file.  In other words,
 	rearranging lines in a file is not counted as much as other changes.
 	This is the default behavior when no parameter is given.
+`lines`;;
+	Compute the dirstat numbers by doing the regular line-based diff
+	analysis, and summing the removed/added line counts. (For binary
+	files, count 64-byte chunks instead, since binary files have no
+	natural concept of lines). This is a more expensive `--dirstat`
+	behavior than the `changes` behavior, but it does count rearranged
+	lines within a file as much as other changes. The resulting output
+	is consistent with what you get from the other `--*stat` options.
 `files`;;
 	Compute the dirstat numbers by counting the number of files changed.
 	Each changed file counts equally in the dirstat analysis. This is
diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 4ad50b9..327d10a 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -81,6 +81,14 @@ endif::git-format-patch[]
 	the amount of pure code movements within a file.  In other words,
 	rearranging lines in a file is not counted as much as other changes.
 	This is the default behavior when no parameter is given.
+`lines`;;
+	Compute the dirstat numbers by doing the regular line-based diff
+	analysis, and summing the removed/added line counts. (For binary
+	files, count 64-byte chunks instead, since binary files have no
+	natural concept of lines). This is a more expensive `--dirstat`
+	behavior than the `changes` behavior, but it does count rearranged
+	lines within a file as much as other changes. The resulting output
+	is consistent with what you get from the other `--*stat` options.
 `files`;;
 	Compute the dirstat numbers by counting the number of files changed.
 	Each changed file counts equally in the dirstat analysis. This is
diff --git a/diff.c b/diff.c
index 3ca129c..cefa145 100644
--- a/diff.c
+++ b/diff.c
@@ -79,10 +79,17 @@ static void parse_dirstat_params(struct diff_options *options, const char *param
 	while (*p) {
 		if (!prefixcmp(p, "changes")) {
 			p += 7;
+			DIFF_OPT_CLR(options, DIRSTAT_BY_LINE);
+			DIFF_OPT_CLR(options, DIRSTAT_BY_FILE);
+		}
+		else if (!prefixcmp(p, "lines")) {
+			p += 5;
+			DIFF_OPT_SET(options, DIRSTAT_BY_LINE);
 			DIFF_OPT_CLR(options, DIRSTAT_BY_FILE);
 		}
 		else if (!prefixcmp(p, "files")) {
 			p += 5;
+			DIFF_OPT_CLR(options, DIRSTAT_BY_LINE);
 			DIFF_OPT_SET(options, DIRSTAT_BY_FILE);
 		}
 		else if (!prefixcmp(p, "noncumulative")) {
@@ -1677,6 +1684,50 @@ found_damage:
 	gather_dirstat(options, &dir, changed, "", 0);
 }
 
+static void show_dirstat_by_line(struct diffstat_t *data, struct diff_options *options)
+{
+	int i;
+	unsigned long changed;
+	struct dirstat_dir dir;
+
+	if (data->nr == 0)
+		return;
+
+	dir.files = NULL;
+	dir.alloc = 0;
+	dir.nr = 0;
+	dir.permille = options->dirstat_permille;
+	dir.cumulative = DIFF_OPT_TST(options, DIRSTAT_CUMULATIVE);
+
+	changed = 0;
+	for (i = 0; i < data->nr; i++) {
+		struct diffstat_file *file = data->files[i];
+		unsigned long damage = file->added + file->deleted;
+		if (file->is_binary)
+			/*
+			 * binary files counts bytes, not lines. Must find some
+			 * way to normalize binary bytes vs. textual lines.
+			 * The following heuristic assumes that there are 64
+			 * bytes per "line".
+			 * This is stupid and ugly, but very cheap...
+			 */
+			damage = (damage + 63) / 64;
+		ALLOC_GROW(dir.files, dir.nr + 1, dir.alloc);
+		dir.files[dir.nr].name = file->name;
+		dir.files[dir.nr].changed = damage;
+		changed += damage;
+		dir.nr++;
+	}
+
+	/* This can happen even with many files, if everything was renames */
+	if (!changed)
+		return;
+
+	/* Show all directories with more than x% of the changes */
+	qsort(dir.files, dir.nr, sizeof(dir.files[0]), dirstat_compare);
+	gather_dirstat(options, &dir, changed, "", 0);
+}
+
 static void free_diffstat_info(struct diffstat_t *diffstat)
 {
 	int i;
@@ -4086,6 +4137,7 @@ void diff_flush(struct diff_options *options)
 	struct diff_queue_struct *q = &diff_queued_diff;
 	int i, output_format = options->output_format;
 	int separator = 0;
+	int dirstat_by_line = 0;
 
 	/*
 	 * Order: raw, stat, summary, patch
@@ -4106,7 +4158,11 @@ void diff_flush(struct diff_options *options)
 		separator++;
 	}
 
-	if (output_format & (DIFF_FORMAT_DIFFSTAT|DIFF_FORMAT_SHORTSTAT|DIFF_FORMAT_NUMSTAT)) {
+	if (output_format & DIFF_FORMAT_DIRSTAT && DIFF_OPT_TST(options, DIRSTAT_BY_LINE))
+		dirstat_by_line = 1;
+
+	if (output_format & (DIFF_FORMAT_DIFFSTAT|DIFF_FORMAT_SHORTSTAT|DIFF_FORMAT_NUMSTAT) ||
+	    dirstat_by_line) {
 		struct diffstat_t diffstat;
 
 		memset(&diffstat, 0, sizeof(struct diffstat_t));
@@ -4121,10 +4177,12 @@ void diff_flush(struct diff_options *options)
 			show_stats(&diffstat, options);
 		if (output_format & DIFF_FORMAT_SHORTSTAT)
 			show_shortstats(&diffstat, options);
+		if (output_format & DIFF_FORMAT_DIRSTAT)
+			show_dirstat_by_line(&diffstat, options);
 		free_diffstat_info(&diffstat);
 		separator++;
 	}
-	if (output_format & DIFF_FORMAT_DIRSTAT)
+	if ((output_format & DIFF_FORMAT_DIRSTAT) && !dirstat_by_line)
 		show_dirstat(options);
 
 	if (output_format & DIFF_FORMAT_SUMMARY && !is_summary_empty(q)) {
diff --git a/diff.h b/diff.h
index 08b4fe0..1a8b685 100644
--- a/diff.h
+++ b/diff.h
@@ -78,6 +78,7 @@ typedef struct strbuf *(*diff_prefix_fn_t)(struct diff_options *opt, void *data)
 #define DIFF_OPT_IGNORE_UNTRACKED_IN_SUBMODULES (1 << 25)
 #define DIFF_OPT_IGNORE_DIRTY_SUBMODULES (1 << 26)
 #define DIFF_OPT_OVERRIDE_SUBMODULE_CONFIG (1 << 27)
+#define DIFF_OPT_DIRSTAT_BY_LINE     (1 << 28)
 
 #define DIFF_OPT_TST(opts, flag)    ((opts)->flags & DIFF_OPT_##flag)
 #define DIFF_OPT_SET(opts, flag)    ((opts)->flags |= DIFF_OPT_##flag)
diff --git a/t/t4046-diff-dirstat.sh b/t/t4046-diff-dirstat.sh
index cf75a38..9f8ecf1 100755
--- a/t/t4046-diff-dirstat.sh
+++ b/t/t4046-diff-dirstat.sh
@@ -805,4 +805,104 @@ test_expect_success 'diff.dirstat=16.7,cumulative,files' '
 	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
 '
 
+cat <<EOF >expect_diff_dirstat
+  10.6% dst/copy/changed/
+  10.6% dst/copy/rearranged/
+  10.6% dst/copy/unchanged/
+  10.6% dst/move/changed/
+  10.6% dst/move/rearranged/
+  10.6% dst/move/unchanged/
+  10.6% src/move/changed/
+  10.6% src/move/rearranged/
+  10.6% src/move/unchanged/
+EOF
+
+cat <<EOF >expect_diff_dirstat_M
+   5.2% changed/
+  26.3% dst/copy/changed/
+  26.3% dst/copy/rearranged/
+  26.3% dst/copy/unchanged/
+   5.2% dst/move/changed/
+   5.2% dst/move/rearranged/
+   5.2% rearranged/
+EOF
+
+cat <<EOF >expect_diff_dirstat_CC
+  16.6% changed/
+  16.6% dst/copy/changed/
+  16.6% dst/copy/rearranged/
+  16.6% dst/move/changed/
+  16.6% dst/move/rearranged/
+  16.6% rearranged/
+EOF
+
+test_expect_success '--dirstat=lines' '
+	git diff --dirstat=lines HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff --dirstat=lines -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff --dirstat=lines -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
+test_expect_success 'diff.dirstat=lines' '
+	git -c diff.dirstat=lines diff --dirstat HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git -c diff.dirstat=lines diff --dirstat -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git -c diff.dirstat=lines diff --dirstat -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
+cat <<EOF >expect_diff_dirstat
+   2.1% changed/
+  10.6% dst/copy/changed/
+  10.6% dst/copy/rearranged/
+  10.6% dst/copy/unchanged/
+  10.6% dst/move/changed/
+  10.6% dst/move/rearranged/
+  10.6% dst/move/unchanged/
+   2.1% rearranged/
+  10.6% src/move/changed/
+  10.6% src/move/rearranged/
+  10.6% src/move/unchanged/
+EOF
+
+cat <<EOF >expect_diff_dirstat_M
+   5.2% changed/
+  26.3% dst/copy/changed/
+  26.3% dst/copy/rearranged/
+  26.3% dst/copy/unchanged/
+   5.2% dst/move/changed/
+   5.2% dst/move/rearranged/
+   5.2% rearranged/
+EOF
+
+cat <<EOF >expect_diff_dirstat_CC
+  16.6% changed/
+  16.6% dst/copy/changed/
+  16.6% dst/copy/rearranged/
+  16.6% dst/move/changed/
+  16.6% dst/move/rearranged/
+  16.6% rearranged/
+EOF
+
+test_expect_success '--dirstat=lines,0' '
+	git diff --dirstat=lines,0 HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff --dirstat=lines,0 -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff --dirstat=lines,0 -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
+test_expect_success 'diff.dirstat=0,lines' '
+	git -c diff.dirstat=0,lines diff --dirstat HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git -c diff.dirstat=0,lines diff --dirstat -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git -c diff.dirstat=0,lines diff --dirstat -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
 test_done
-- 
1.7.5.rc1.3.g4d7b

* Re: [PATCHv3 5/6] Allow specifying --dirstat cut-off percentage as a floating point number
  2011-04-27  8:24                               ` [PATCHv3 5/6] Allow specifying --dirstat cut-off percentage as a floating point number Johan Herland
@ 2011-04-27  8:37                                 ` Linus Torvalds
  2011-04-27 10:29                                   ` [PATCHv4 " Johan Herland
  0 siblings, 1 reply; 91+ messages in thread
From: Linus Torvalds @ 2011-04-27  8:37 UTC (permalink / raw)
  To: Johan Herland; +Cc: git, Junio C Hamano

On Wed, Apr 27, 2011 at 1:24 AM, Johan Herland <johan@herland.net> wrote:
> +                       options->dirstat_permille = strtoul(p, &end, 10) * 10;
>                        p = end;
> +                       if (*p == '.' && isdigit(*(++p))) {
> +                               int permille = strtoul(p, &end, 10);
> +                               p = end;
> +                               while (permille >= 10)
> +                                       permille /= 10; /* only use first digit */
> +                               options->dirstat_permille += permille;
> +                       }

Heh. That's both unnecessarily complicated, and doesn't work.

It gets the wrong answer for something like "0.0001", since 'permille'
in that case ends up starting out as '0001', ie just 1, so you never
actually do that whole while-loop.

So the right approach is just something like

  if (*p == '.' && isdigit(*++p)) {
    /* only use first digit */
    options->dirstat_permille += *p - '0';
    /* .. and ignore any further digits */
    while (isdigit(*++p))
      /* nothing */;
  }

(totally untested, of course)
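
To make the difference concrete, here is a small standalone sketch that puts
the two approaches side by side. The wrapper names permille_v3() and
permille_first_digit() are made up for this example (the real code lives
inline in parse_dirstat_params()), and standard <ctype.h> is assumed in place
of git's own isdigit().

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical wrapper around the v3 logic: parse the fraction with strtoul(). */
int permille_v3(const char *p)
{
	char *end;
	int result;

	assert(p != NULL);
	result = (int)strtoul(p, &end, 10) * 10;
	p = end;
	if (*p == '.' && isdigit((unsigned char)*(++p))) {
		int frac = (int)strtoul(p, &end, 10);
		/* leading zeros are already lost here: "0001" parses as 1 */
		while (frac >= 10)
			frac /= 10;
		result += frac;
	}
	return result;
}

/* Hypothetical wrapper around the suggested fix: take the first digit as a character. */
int permille_first_digit(const char *p)
{
	char *end;
	int result;

	assert(p != NULL);
	result = (int)strtoul(p, &end, 10) * 10;
	p = end;
	if (*p == '.' && isdigit((unsigned char)*++p)) {
		/* only use first digit */
		result += *p - '0';
		/* .. and ignore any further digits */
		while (isdigit((unsigned char)*++p))
			; /* nothing */
	}
	return result;
}
```

For "0.0001" the v3 logic yields 1 permille (0.1%) where 0 is wanted, and for
"27.09" it yields 279 instead of 270; the first-digit version returns 0 and
270 respectively. Both agree on inputs such as "16.7" and "16.70".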

                    Linus

* [PATCHv4 5/6] Allow specifying --dirstat cut-off percentage as a floating point number
  2011-04-27  8:37                                 ` Linus Torvalds
@ 2011-04-27 10:29                                   ` Johan Herland
  0 siblings, 0 replies; 91+ messages in thread
From: Johan Herland @ 2011-04-27 10:29 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git, Junio C Hamano

Only the first digit after the decimal point is kept, as the dirstat
calculations all happen in permille.

Selftests verifying floating-point percentage input have been added.

Improved-by: Junio C Hamano <gitster@pobox.com>
Improved-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Johan Herland <johan@herland.net>
---

On Wednesday 27 April 2011, Linus Torvalds wrote:
> On Wed, Apr 27, 2011 at 1:24 AM, Johan Herland <johan@herland.net> wrote:
> > +                       options->dirstat_permille = strtoul(p, &end, 10) * 10;
> > +                       p = end;
> > +                       if (*p == '.' && isdigit(*(++p))) {
> > +                               int permille = strtoul(p, &end, 10);
> > +                               p = end;
> > +                               while (permille >= 10)
> > +                                       permille /= 10; /* only use first digit */
> > +                               options->dirstat_permille += permille;
> > +                       }
>
> Heh. That's both unnecessarily complicated, and doesn't work.
>
> It gets the wrong answer for something like "0.0001", since
> 'permille' in that case ends up starting out as '0001', ie just 1, so
> you never actually do that whole while-loop.

*facepalm* Of course. That'll teach me not to code before breakfast...

> So the right approach is just something like
>
>   if (*p == '.' && isdigit(*++p)) {
>     /* only use first digit */
>     options->dirstat_permille += *p - '0';
>     /* .. and ignore any further digits */
>     while (isdigit(*++p))
>       /* nothing */;
>   }
>
> (totally untested, of course)

Here it is, tested and everything. :)


Thanks! :)

...Johan

 diff.c                  |   26 +++++++++++-------
 diff.h                  |    2 +-
 t/t4046-diff-dirstat.sh |   64 +++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 81 insertions(+), 11 deletions(-)

diff --git a/diff.c b/diff.c
index 1b6e8c0..2cd815c 100644
--- a/diff.c
+++ b/diff.c
@@ -31,7 +31,7 @@ static const char *external_diff_cmd_cfg;
 int diff_auto_refresh_index = 1;
 static int diff_mnemonic_prefix;
 static int diff_no_prefix;
-static int diff_dirstat_percent_default = 3;
+static int diff_dirstat_permille_default = 30;
 static struct diff_options default_diff_options;
 
 static char diff_colors[][COLOR_MAXLEN] = {
@@ -95,8 +95,15 @@ static void parse_dirstat_params(struct diff_options *options, const char *param
 		}
 		else if (isdigit(*p)) {
 			char *end;
-			options->dirstat_percent = strtoul(p, &end, 10);
+			options->dirstat_permille = strtoul(p, &end, 10) * 10;
 			p = end;
+			if (*p == '.' && isdigit(*++p)) {
+				/* only use first digit */
+				options->dirstat_permille += *p - '0';
+				/* .. and ignore any further digits */
+				while (isdigit(*++p))
+					/* nothing */;
+			}
 		}
 		else
 			die("Unknown --dirstat parameter '%s'", p);
@@ -190,9 +197,9 @@ int git_diff_basic_config(const char *var, const char *value, void *cb)
 	}
 
 	if (!strcmp(var, "diff.dirstat")) {
-		default_diff_options.dirstat_percent = diff_dirstat_percent_default;
+		default_diff_options.dirstat_permille = diff_dirstat_permille_default;
 		parse_dirstat_params(&default_diff_options, value);
-		diff_dirstat_percent_default = default_diff_options.dirstat_percent;
+		diff_dirstat_permille_default = default_diff_options.dirstat_permille;
 		return 0;
 	}
 
@@ -1504,7 +1511,7 @@ struct dirstat_file {
 
 struct dirstat_dir {
 	struct dirstat_file *files;
-	int alloc, nr, percent, cumulative;
+	int alloc, nr, permille, cumulative;
 };
 
 static long gather_dirstat(struct diff_options *opt, struct dirstat_dir *dir,
@@ -1553,10 +1560,9 @@ static long gather_dirstat(struct diff_options *opt, struct dirstat_dir *dir,
 	if (baselen && sources != 1) {
 		if (this_dir) {
 			int permille = this_dir * 1000 / changed;
-			int percent = permille / 10;
-			if (percent >= dir->percent) {
+			if (permille >= dir->permille) {
 				fprintf(opt->file, "%s%4d.%01d%% %.*s\n", line_prefix,
-					percent, permille % 10, baselen, base);
+					permille / 10, permille % 10, baselen, base);
 				if (!dir->cumulative)
 					return 0;
 			}
@@ -1582,7 +1588,7 @@ static void show_dirstat(struct diff_options *options)
 	dir.files = NULL;
 	dir.alloc = 0;
 	dir.nr = 0;
-	dir.percent = options->dirstat_percent;
+	dir.permille = options->dirstat_permille;
 	dir.cumulative = DIFF_OPT_TST(options, DIRSTAT_CUMULATIVE);
 
 	changed = 0;
@@ -2937,7 +2943,7 @@ void diff_setup(struct diff_options *options)
 	options->line_termination = '\n';
 	options->break_opt = -1;
 	options->rename_limit = -1;
-	options->dirstat_percent = diff_dirstat_percent_default;
+	options->dirstat_permille = diff_dirstat_permille_default;
 	options->context = 3;
 
 	options->change = diff_change;
diff --git a/diff.h b/diff.h
index 0083d92..08b4fe0 100644
--- a/diff.h
+++ b/diff.h
@@ -111,7 +111,7 @@ struct diff_options {
 	int rename_score;
 	int rename_limit;
 	int warn_on_too_large_rename;
-	int dirstat_percent;
+	int dirstat_permille;
 	int setup;
 	int abbrev;
 	const char *prefix;
diff --git a/t/t4046-diff-dirstat.sh b/t/t4046-diff-dirstat.sh
index fa1885c..b3062b4 100755
--- a/t/t4046-diff-dirstat.sh
+++ b/t/t4046-diff-dirstat.sh
@@ -768,4 +768,68 @@ test_expect_success 'diff.dirstat=10,cumulative,files' '
 	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
 '
 
+cat <<EOF >expect_diff_dirstat
+  27.2% dst/copy/
+  27.2% dst/move/
+  54.5% dst/
+  27.2% src/move/
+EOF
+
+cat <<EOF >expect_diff_dirstat_M
+  42.8% dst/copy/
+  28.5% dst/move/
+  71.4% dst/
+EOF
+
+cat <<EOF >expect_diff_dirstat_CC
+  33.3% dst/copy/
+  33.3% dst/move/
+  66.6% dst/
+EOF
+
+test_expect_success '--dirstat=files,cumulative,16.7' '
+	git diff --dirstat=files,cumulative,16.7 HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff --dirstat=files,cumulative,16.7 -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff --dirstat=files,cumulative,16.7 -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
+test_expect_success 'diff.dirstat=16.7,cumulative,files' '
+	git -c diff.dirstat=16.7,cumulative,files diff --dirstat HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git -c diff.dirstat=16.7,cumulative,files diff --dirstat -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git -c diff.dirstat=16.7,cumulative,files diff --dirstat -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
+test_expect_success 'diff.dirstat=16.70,cumulative,files' '
+	git -c diff.dirstat=16.70,cumulative,files diff --dirstat HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git -c diff.dirstat=16.70,cumulative,files diff --dirstat -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git -c diff.dirstat=16.70,cumulative,files diff --dirstat -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
+test_expect_success '--dirstat=files,cumulative,27.2' '
+	git diff --dirstat=files,cumulative,27.2 HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff --dirstat=files,cumulative,27.2 -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff --dirstat=files,cumulative,27.2 -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
+test_expect_success '--dirstat=files,cumulative,27.09' '
+	git diff --dirstat=files,cumulative,27.09 HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff --dirstat=files,cumulative,27.09 -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff --dirstat=files,cumulative,27.09 -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
 test_done
-- 
1.7.5.rc1

* Re: [PATCH 3/6] Refactor --dirstat parsing; deprecate --cumulative and --dirstat-by-file
  2011-04-27  2:02                               ` Johan Herland
  2011-04-27  4:53                                 ` Junio C Hamano
@ 2011-04-27 20:51                                 ` Junio C Hamano
  2011-04-27 21:01                                   ` Junio C Hamano
  1 sibling, 1 reply; 91+ messages in thread
From: Junio C Hamano @ 2011-04-27 20:51 UTC (permalink / raw)
  To: Johan Herland; +Cc: git, Linus Torvalds

Johan Herland <johan@herland.net> writes:

>> Even better, probably they can be left to diff_opt_parse() without
>> calling this function, as you are deprecating them and do not have to
>> allow them to take the opt1,opt2,... form of parameter.
>
> I understand, but politely disagree: Patch 6/6 complicates the logic
> that DIFF_OPT_SET()/CLR() various bits in the diff options. I'd rather
> keep that logic in one place, than duplicate it into diff_opt_parse().

I've given a brief look at the v3.  Looks better than the previous one;
not using double is especially a big and good thing.

There still are a few things I noticed, two of which I'd attempt to show
how to fix in the attached patch (on top of the whole 6 patch series, as I
don't have time to break it down and I know you are capable enough to do
so yourself).

 * We eradicated the use of C99_FORMAT at 28bd70d (unbreak and eliminate
   NO_C99_FORMAT, 2011-03-16) and "%td" at 31d713d (mktag: avoid %td in
   format string, 2011-03-16).

 * parse_dirstat_params() can die when it sees an input that it does not
   understand, instead of silently returning and indicating an error by
   its return value, even when it is called from the codepath to read the
   configuration files.  Dying upon an erroneous command line argument is
   fine and diagnosing it is preferred, but the configuration parser
   should ignore values that it does not understand (may want to warn) so
   that you can keep using older git (i.e. the version resulting from your
   patch) in a repository where you usually use a newer git that supports
   even more features with its --dirstat option.

 * Temporary memory allocation in your dirstat_opt() to handle commonly
   used shorthand stands out as a sore thumb.

 * The parsing implemented in dirstat_opt() is a bit too loose.  For
   example, we never accepted "-X=3" nor "--dirstat40" but I suspect your
   parser would.  Accepting the former might not be such a big deal, but
   not the latter.

The attached is not a complete fix-up, but addresses the last two issues,
and it also should be a good starting point for the second issue.
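
To illustrate the last point, here is a minimal sketch of strict option
matching. It is not the code in diff.c; dirstat_params() is a hypothetical
helper that only shows which spellings should be accepted, and it uses
plain strncmp() where git itself would use prefixcmp().

```c
#include <string.h>

/*
 * Hypothetical helper (not part of diff.c): return the parameter string
 * that follows a dirstat option, or NULL if 'arg' is not a dirstat option.
 */
static const char *dirstat_params(const char *arg)
{
	/* bare forms carry no parameters */
	if (!strcmp(arg, "-X") || !strcmp(arg, "--dirstat"))
		return "";
	/* the long form must be followed by '=' */
	if (!strncmp(arg, "--dirstat=", 10))
		return arg + 10;
	/* the short form takes its parameters directly, as in -X40 */
	if (!strncmp(arg, "-X", 2))
		return arg + 2;
	return NULL;
}
```

With this, "--dirstat40" is not recognized at all, "-X40" yields "40", and
"-X=3" yields "=3", which the parameter parser then has to reject.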

I tried not to fix style issues, but parse_dirstat_params() should follow

	if (...) {
            ... compound ...
	} else if (...) {
            ... compound ...
	} else if (...) {
	    ...

i.e. close brace just before the "else if" on the same line.

 diff.c |   94 +++++++++++++++++++++++++++++----------------------------------
 1 files changed, 43 insertions(+), 51 deletions(-)

diff --git a/diff.c b/diff.c
index 9008e88..7c6a8d1 100644
--- a/diff.c
+++ b/diff.c
@@ -73,9 +73,14 @@ static int parse_diff_color_slot(const char *var, int ofs)
 #define PD_FMT "%td"
 #endif
 
-static void parse_dirstat_params(struct diff_options *options, const char *params)
+static int parse_dirstat_params(struct diff_options *options, const char *params,
+				int die_on_error)
 {
 	const char *p = params;
+	const char *unknown_param_error = "Unknown --dirstat parameter '%s'";
+	const char *missing_comma_error = "Missing comma separator at char " PD_FMT
+		" of '%s'";
+
 	while (*p) {
 		if (!prefixcmp(p, "changes")) {
 			p += 7;
@@ -109,19 +114,27 @@ static void parse_dirstat_params(struct diff_options *options, const char *param
 				options->dirstat_permille += *p - '0';
 				/* .. and ignore any further digits */
 				while (isdigit(*++p))
-					/* nothing */;
+					; /* nothing */
 			}
+		} else if (die_on_error) {
+			die(unknown_param_error, p);
+		} else {
+			return error(unknown_param_error, p);
 		}
-		else
-			die("Unknown --dirstat parameter '%s'", p);
 
-		if (*p) { /* more parameters, swallow separator */
-			if (*p != ',')
-				die("Missing comma separator at char " PD_FMT
-				    " of '%s'", p - params, params);
-			p++;
+		if (*p) {
+			/* more parameters, swallow separator */
+			if (*p == ',') {
+				p++;
+				continue;
+			}
+			if (die_on_error)
+				die(missing_comma_error, p - params, params);
+			else
+				return error(missing_comma_error, p - params, params);
 		}
 	}
+	return 0;
 }
 
 static int git_config_rename(const char *var, const char *value)
@@ -205,7 +218,7 @@ int git_diff_basic_config(const char *var, const char *value, void *cb)
 
 	if (!strcmp(var, "diff.dirstat")) {
 		default_diff_options.dirstat_permille = diff_dirstat_permille_default;
-		parse_dirstat_params(&default_diff_options, value);
+		(void) parse_dirstat_params(&default_diff_options, value, 0);
 		diff_dirstat_permille_default = default_diff_options.dirstat_permille;
 		return 0;
 	}
@@ -3252,45 +3265,14 @@ static int stat_opt(struct diff_options *options, const char **av)
 	return argcount;
 }
 
-/*
- * Parse dirstat-related options and any parameters given to those options.
- * Returns 1 if the option in 'arg' is a recognized dirstat-related option;
- * otherwise returns 0.
- */
-static int dirstat_opt(struct diff_options *options, const char *arg)
+static int parse_dirstat_opt(struct diff_options *options, const char *params)
 {
-	const char *p;
-	char *mangled = NULL;
-
-	if (!strcmp(arg, "--cumulative")) /* deprecated */
-		/* handle '--cumulative' like '--dirstat=cumulative' */
-		p = "cumulative";
-	else if (!strcmp(arg, "--dirstat-by-file") ||
-		 !prefixcmp(arg, "--dirstat-by-file=")) { /* deprecated */
-		/* handle '--dirstat-by-file=*' like '--dirstat=files,*' */
-		mangled = xstrdup(arg + 12);
-		memcpy(mangled, "files", 5);
-		if (mangled[5]) {
-			assert(mangled[5] == '=');
-			mangled[5] = ',';
-		}
-		p = mangled;
-	}
-	else if (!prefixcmp(arg, "-X"))
-		p = arg + 2;
-	else if (!prefixcmp(arg, "--dirstat"))
-		p = arg + 9;
-	else
-		return 0;
-
+	parse_dirstat_params(options, params, 1);
+	/*
+	 * The caller knows a dirstat-related option is given from the command
+	 * line; allow it to say "return this_function();"
+	 */
 	options->output_format |= DIFF_FORMAT_DIRSTAT;
-
-	if (*p)
-		if (*p == '=')
-			p++;
-		parse_dirstat_params(options, p);
-
-	free(mangled);
 	return 1;
 }
 
@@ -3313,10 +3295,20 @@ int diff_opt_parse(struct diff_options *options, const char **av, int ac)
 		options->output_format |= DIFF_FORMAT_NUMSTAT;
 	else if (!strcmp(arg, "--shortstat"))
 		options->output_format |= DIFF_FORMAT_SHORTSTAT;
-	else if (!prefixcmp(arg, "-X") || !prefixcmp(arg, "--dirstat") ||
-		 !strcmp(arg, "--cumulative"))
-		/* -X, --dirstat[=<args>], --dirstat-by-file, or --cumulative */
-		return dirstat_opt(options, arg);
+	else if (!strcmp(arg, "-X") || !strcmp(arg, "--dirstat"))
+		return parse_dirstat_opt(options, "");
+	else if (!prefixcmp(arg, "-X"))
+		return parse_dirstat_opt(options, arg + 2);
+	else if (!prefixcmp(arg, "--dirstat="))
+		return parse_dirstat_opt(options, arg + 10);
+	else if (!strcmp(arg, "--cumulative"))
+		return parse_dirstat_opt(options, "cumulative");
+	else if (!strcmp(arg, "--dirstat-by-file"))
+		return parse_dirstat_opt(options, "files");
+	else if (!prefixcmp(arg, "--dirstat-by-file=")) {
+		parse_dirstat_opt(options, "files");
+		return parse_dirstat_opt(options, arg + 18);
+	}
 	else if (!strcmp(arg, "--check"))
 		options->output_format |= DIFF_FORMAT_CHECKDIFF;
 	else if (!strcmp(arg, "--summary"))


* Re: [PATCH 3/6] Refactor --dirstat parsing; deprecate --cumulative and --dirstat-by-file
  2011-04-27 20:51                                 ` Junio C Hamano
@ 2011-04-27 21:01                                   ` Junio C Hamano
  0 siblings, 0 replies; 91+ messages in thread
From: Junio C Hamano @ 2011-04-27 21:01 UTC (permalink / raw)
  To: Johan Herland; +Cc: git, Linus Torvalds

Junio C Hamano <gitster@pobox.com> writes:

> Johan Herland <johan@herland.net> writes:
>
>>> Even better, probably they can be left to diff_opt_parse() without
>>> calling this function, as you are deprecating them and do not have to
>>> allow them to take the opt1,opt2,... form of parameter.
>>
>> I understand, but politely disagree: Patch 6/6 complicates the logic
>> that DIFF_OPT_SET()/CLR() various bits in the diff options. I'd rather
>> keep that logic in one place, than duplicate it into diff_opt_parse().
>
> I've given a brief look at the v3.  Looks better than the previous one;
> not using double is especially a big and good thing.

Also, I had to rename the new test to t4047 to avoid clashing with an
existing test when merged to 'pu'.


* [PATCHv5 0/7]  --dirstat fixes, part 2
  2011-04-27  8:24                             ` [PATCHv3 0/6] --dirstat fixes, part 2 Johan Herland
                                                 ` (5 preceding siblings ...)
  2011-04-27  8:24                               ` [PATCHv3 6/6] New --dirstat=lines mode, doing dirstat analysis based on diffstat Johan Herland
@ 2011-04-28  1:17                               ` Johan Herland
  2011-04-28  1:17                                 ` [PATCHv5 1/7] Add several testcases for --dirstat and friends Johan Herland
                                                   ` (6 more replies)
  6 siblings, 7 replies; 91+ messages in thread
From: Johan Herland @ 2011-04-28  1:17 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Linus Torvalds, Johan Herland

Hi,

Here's version 5 incorporating Junio's feedback and patch from elsewhere
in this thread.

Have fun! :)

...Johan


Johan Herland (7):
  Add several testcases for --dirstat and friends
  Make --dirstat=0 output directories that contribute < 0.1% of changes
  Refactor --dirstat parsing; deprecate --cumulative and --dirstat-by-file
  Add config variable for specifying default --dirstat behavior
  Allow specifying --dirstat cut-off percentage as a floating point number
  New --dirstat=lines mode, doing dirstat analysis based on diffstat
  Improve error handling when parsing dirstat parameters

 Documentation/config.txt       |   44 ++
 Documentation/diff-options.txt |   54 ++-
 diff.c                         |  158 ++++++-
 diff.h                         |    3 +-
 t/t4047-diff-dirstat.sh        |  965 ++++++++++++++++++++++++++++++++++++++++
 5 files changed, 1192 insertions(+), 32 deletions(-)
 create mode 100755 t/t4047-diff-dirstat.sh

-- 
1.7.5.rc1.3.g4d7b


* [PATCHv5 1/7] Add several testcases for --dirstat and friends
  2011-04-28  1:17                               ` [PATCHv5 0/7] --dirstat fixes, part 2 Johan Herland
@ 2011-04-28  1:17                                 ` Johan Herland
  2011-04-28  1:17                                 ` [PATCHv5 2/7] Make --dirstat=0 output directories that contribute < 0.1% of changes Johan Herland
                                                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 91+ messages in thread
From: Johan Herland @ 2011-04-28  1:17 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Linus Torvalds, Johan Herland

Currently, t4013 is the only selftest that exercises the --dirstat machinery,
but it only does a superficial verification of --dirstat's output.

This patch adds a new selftest - t4047-diff-dirstat.sh - which prepares a
commit containing:
 - unchanged files, changed files and files with rearranged lines
 - copied files, moved files, and unmoved files

It then verifies the correct dirstat output for that commit in the following
dirstat modes:
 - --dirstat
 - -X
 - --dirstat=0
 - -X0
 - --cumulative
 - --dirstat-by-file
 - (plus combinations of the above)

Each of the above tests is also run with:
 - no rename detection
 - rename detection (-M)
 - expensive copy detection (-C -C)

Signed-off-by: Johan Herland <johan@herland.net>
---
 t/t4047-diff-dirstat.sh |  580 +++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 580 insertions(+), 0 deletions(-)
 create mode 100755 t/t4047-diff-dirstat.sh

diff --git a/t/t4047-diff-dirstat.sh b/t/t4047-diff-dirstat.sh
new file mode 100755
index 0000000..eb6bf47
--- /dev/null
+++ b/t/t4047-diff-dirstat.sh
@@ -0,0 +1,580 @@
+#!/bin/sh
+
+test_description='diff --dirstat tests'
+. ./test-lib.sh
+
+# set up two commits where the second commit has these files
+# (10 lines in each file):
+#
+#   unchanged/text           (unchanged from 1st commit)
+#   changed/text             (changed 1st line)
+#   rearranged/text          (swapped 1st and 2nd line)
+#   dst/copy/unchanged/text  (copied from src/copy/unchanged/text, unchanged)
+#   dst/copy/changed/text    (copied from src/copy/changed/text, changed)
+#   dst/copy/rearranged/text (copied from src/copy/rearranged/text, rearranged)
+#   dst/move/unchanged/text  (moved from src/move/unchanged/text, unchanged)
+#   dst/move/changed/text    (moved from src/move/changed/text, changed)
+#   dst/move/rearranged/text (moved from src/move/rearranged/text, rearranged)
+
+test_expect_success 'setup' '
+	mkdir unchanged &&
+	mkdir changed &&
+	mkdir rearranged &&
+	mkdir src &&
+	mkdir src/copy &&
+	mkdir src/copy/unchanged &&
+	mkdir src/copy/changed &&
+	mkdir src/copy/rearranged &&
+	mkdir src/move &&
+	mkdir src/move/unchanged &&
+	mkdir src/move/changed &&
+	mkdir src/move/rearranged &&
+	cat <<EOF >unchanged/text &&
+unchanged       line #0
+unchanged       line #1
+unchanged       line #2
+unchanged       line #3
+unchanged       line #4
+unchanged       line #5
+unchanged       line #6
+unchanged       line #7
+unchanged       line #8
+unchanged       line #9
+EOF
+	cat <<EOF >changed/text &&
+changed         line #0
+changed         line #1
+changed         line #2
+changed         line #3
+changed         line #4
+changed         line #5
+changed         line #6
+changed         line #7
+changed         line #8
+changed         line #9
+EOF
+	cat <<EOF >rearranged/text &&
+rearranged      line #0
+rearranged      line #1
+rearranged      line #2
+rearranged      line #3
+rearranged      line #4
+rearranged      line #5
+rearranged      line #6
+rearranged      line #7
+rearranged      line #8
+rearranged      line #9
+EOF
+	cat <<EOF >src/copy/unchanged/text &&
+copy  unchanged line #0
+copy  unchanged line #1
+copy  unchanged line #2
+copy  unchanged line #3
+copy  unchanged line #4
+copy  unchanged line #5
+copy  unchanged line #6
+copy  unchanged line #7
+copy  unchanged line #8
+copy  unchanged line #9
+EOF
+	cat <<EOF >src/copy/changed/text &&
+copy    changed line #0
+copy    changed line #1
+copy    changed line #2
+copy    changed line #3
+copy    changed line #4
+copy    changed line #5
+copy    changed line #6
+copy    changed line #7
+copy    changed line #8
+copy    changed line #9
+EOF
+	cat <<EOF >src/copy/rearranged/text &&
+copy rearranged line #0
+copy rearranged line #1
+copy rearranged line #2
+copy rearranged line #3
+copy rearranged line #4
+copy rearranged line #5
+copy rearranged line #6
+copy rearranged line #7
+copy rearranged line #8
+copy rearranged line #9
+EOF
+	cat <<EOF >src/move/unchanged/text &&
+move  unchanged line #0
+move  unchanged line #1
+move  unchanged line #2
+move  unchanged line #3
+move  unchanged line #4
+move  unchanged line #5
+move  unchanged line #6
+move  unchanged line #7
+move  unchanged line #8
+move  unchanged line #9
+EOF
+	cat <<EOF >src/move/changed/text &&
+move    changed line #0
+move    changed line #1
+move    changed line #2
+move    changed line #3
+move    changed line #4
+move    changed line #5
+move    changed line #6
+move    changed line #7
+move    changed line #8
+move    changed line #9
+EOF
+	cat <<EOF >src/move/rearranged/text &&
+move rearranged line #0
+move rearranged line #1
+move rearranged line #2
+move rearranged line #3
+move rearranged line #4
+move rearranged line #5
+move rearranged line #6
+move rearranged line #7
+move rearranged line #8
+move rearranged line #9
+EOF
+	git add . &&
+	git commit -m "initial" &&
+	mkdir dst &&
+	mkdir dst/copy &&
+	mkdir dst/copy/unchanged &&
+	mkdir dst/copy/changed &&
+	mkdir dst/copy/rearranged &&
+	mkdir dst/move &&
+	mkdir dst/move/unchanged &&
+	mkdir dst/move/changed &&
+	mkdir dst/move/rearranged &&
+	cat <<EOF >changed/text &&
+CHANGED XXXXXXX line #0
+changed         line #1
+changed         line #2
+changed         line #3
+changed         line #4
+changed         line #5
+changed         line #6
+changed         line #7
+changed         line #8
+changed         line #9
+EOF
+	cat <<EOF >rearranged/text &&
+rearranged      line #1
+rearranged      line #0
+rearranged      line #2
+rearranged      line #3
+rearranged      line #4
+rearranged      line #5
+rearranged      line #6
+rearranged      line #7
+rearranged      line #8
+rearranged      line #9
+EOF
+	cat <<EOF >dst/copy/unchanged/text &&
+copy  unchanged line #0
+copy  unchanged line #1
+copy  unchanged line #2
+copy  unchanged line #3
+copy  unchanged line #4
+copy  unchanged line #5
+copy  unchanged line #6
+copy  unchanged line #7
+copy  unchanged line #8
+copy  unchanged line #9
+EOF
+	cat <<EOF >dst/copy/changed/text &&
+copy XXXCHANGED line #0
+copy    changed line #1
+copy    changed line #2
+copy    changed line #3
+copy    changed line #4
+copy    changed line #5
+copy    changed line #6
+copy    changed line #7
+copy    changed line #8
+copy    changed line #9
+EOF
+	cat <<EOF >dst/copy/rearranged/text &&
+copy rearranged line #1
+copy rearranged line #0
+copy rearranged line #2
+copy rearranged line #3
+copy rearranged line #4
+copy rearranged line #5
+copy rearranged line #6
+copy rearranged line #7
+copy rearranged line #8
+copy rearranged line #9
+EOF
+	cat <<EOF >dst/move/unchanged/text &&
+move  unchanged line #0
+move  unchanged line #1
+move  unchanged line #2
+move  unchanged line #3
+move  unchanged line #4
+move  unchanged line #5
+move  unchanged line #6
+move  unchanged line #7
+move  unchanged line #8
+move  unchanged line #9
+EOF
+	cat <<EOF >dst/move/changed/text &&
+move XXXCHANGED line #0
+move    changed line #1
+move    changed line #2
+move    changed line #3
+move    changed line #4
+move    changed line #5
+move    changed line #6
+move    changed line #7
+move    changed line #8
+move    changed line #9
+EOF
+	cat <<EOF >dst/move/rearranged/text &&
+move rearranged line #1
+move rearranged line #0
+move rearranged line #2
+move rearranged line #3
+move rearranged line #4
+move rearranged line #5
+move rearranged line #6
+move rearranged line #7
+move rearranged line #8
+move rearranged line #9
+EOF
+	git add . &&
+	git rm -r src/move/unchanged &&
+	git rm -r src/move/changed &&
+	git rm -r src/move/rearranged &&
+	git commit -m "changes"
+'
+
+cat <<EOF >expect_diff_stat
+ changed/text             |    2 +-
+ dst/copy/changed/text    |   10 ++++++++++
+ dst/copy/rearranged/text |   10 ++++++++++
+ dst/copy/unchanged/text  |   10 ++++++++++
+ dst/move/changed/text    |   10 ++++++++++
+ dst/move/rearranged/text |   10 ++++++++++
+ dst/move/unchanged/text  |   10 ++++++++++
+ rearranged/text          |    2 +-
+ src/move/changed/text    |   10 ----------
+ src/move/rearranged/text |   10 ----------
+ src/move/unchanged/text  |   10 ----------
+ 11 files changed, 62 insertions(+), 32 deletions(-)
+EOF
+
+cat <<EOF >expect_diff_stat_M
+ changed/text                      |    2 +-
+ dst/copy/changed/text             |   10 ++++++++++
+ dst/copy/rearranged/text          |   10 ++++++++++
+ dst/copy/unchanged/text           |   10 ++++++++++
+ {src => dst}/move/changed/text    |    2 +-
+ {src => dst}/move/rearranged/text |    2 +-
+ {src => dst}/move/unchanged/text  |    0
+ rearranged/text                   |    2 +-
+ 8 files changed, 34 insertions(+), 4 deletions(-)
+EOF
+
+cat <<EOF >expect_diff_stat_CC
+ changed/text                      |    2 +-
+ {src => dst}/copy/changed/text    |    2 +-
+ {src => dst}/copy/rearranged/text |    2 +-
+ {src => dst}/copy/unchanged/text  |    0
+ {src => dst}/move/changed/text    |    2 +-
+ {src => dst}/move/rearranged/text |    2 +-
+ {src => dst}/move/unchanged/text  |    0
+ rearranged/text                   |    2 +-
+ 8 files changed, 6 insertions(+), 6 deletions(-)
+EOF
+
+test_expect_success 'sanity check setup (--stat)' '
+	git diff --stat HEAD^..HEAD >actual_diff_stat &&
+	test_cmp expect_diff_stat actual_diff_stat &&
+	git diff --stat -M HEAD^..HEAD >actual_diff_stat_M &&
+	test_cmp expect_diff_stat_M actual_diff_stat_M &&
+	git diff --stat -C -C HEAD^..HEAD >actual_diff_stat_CC &&
+	test_cmp expect_diff_stat_CC actual_diff_stat_CC
+'
+
+# changed/text and rearranged/text falls below default 3% threshold
+cat <<EOF >expect_diff_dirstat
+  10.8% dst/copy/changed/
+  10.8% dst/copy/rearranged/
+  10.8% dst/copy/unchanged/
+  10.8% dst/move/changed/
+  10.8% dst/move/rearranged/
+  10.8% dst/move/unchanged/
+  10.8% src/move/changed/
+  10.8% src/move/rearranged/
+  10.8% src/move/unchanged/
+EOF
+
+# rearranged/text falls below default 3% threshold
+cat <<EOF >expect_diff_dirstat_M
+   5.8% changed/
+  29.3% dst/copy/changed/
+  29.3% dst/copy/rearranged/
+  29.3% dst/copy/unchanged/
+   5.8% dst/move/changed/
+EOF
+
+# rearranged/text falls below default 3% threshold
+cat <<EOF >expect_diff_dirstat_CC
+  32.6% changed/
+  32.6% dst/copy/changed/
+  32.6% dst/move/changed/
+EOF
+
+test_expect_success 'vanilla --dirstat' '
+	git diff --dirstat HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff --dirstat -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff --dirstat -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
+test_expect_success 'vanilla -X' '
+	git diff -X HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff -X -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff -X -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
+# rearranged/text falls below 0% threshold (1 / (240 * 9 + 48 + 1) ~= 0.045 %)
+cat <<EOF >expect_diff_dirstat
+   2.1% changed/
+  10.8% dst/copy/changed/
+  10.8% dst/copy/rearranged/
+  10.8% dst/copy/unchanged/
+  10.8% dst/move/changed/
+  10.8% dst/move/rearranged/
+  10.8% dst/move/unchanged/
+  10.8% src/move/changed/
+  10.8% src/move/rearranged/
+  10.8% src/move/unchanged/
+EOF
+
+cat <<EOF >expect_diff_dirstat_M
+   5.8% changed/
+  29.3% dst/copy/changed/
+  29.3% dst/copy/rearranged/
+  29.3% dst/copy/unchanged/
+   5.8% dst/move/changed/
+   0.1% dst/move/rearranged/
+   0.1% rearranged/
+EOF
+
+cat <<EOF >expect_diff_dirstat_CC
+  32.6% changed/
+  32.6% dst/copy/changed/
+   0.6% dst/copy/rearranged/
+  32.6% dst/move/changed/
+   0.6% dst/move/rearranged/
+   0.6% rearranged/
+EOF
+
+test_expect_success '--dirstat=0' '
+	git diff --dirstat=0 HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff --dirstat=0 -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff --dirstat=0 -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
+test_expect_success '-X0' '
+	git diff -X0 HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff -X0 -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff -X0 -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
+# rearranged/text falls below 0% threshold (1 / (240 * 9 + 48 + 1) ~= 0.045 %)
+cat <<EOF >expect_diff_dirstat
+   2.1% changed/
+  10.8% dst/copy/changed/
+  10.8% dst/copy/rearranged/
+  10.8% dst/copy/unchanged/
+  32.5% dst/copy/
+  10.8% dst/move/changed/
+  10.8% dst/move/rearranged/
+  10.8% dst/move/unchanged/
+  32.5% dst/move/
+  65.1% dst/
+  10.8% src/move/changed/
+  10.8% src/move/rearranged/
+  10.8% src/move/unchanged/
+  32.5% src/move/
+EOF
+
+cat <<EOF >expect_diff_dirstat_M
+   5.8% changed/
+  29.3% dst/copy/changed/
+  29.3% dst/copy/rearranged/
+  29.3% dst/copy/unchanged/
+  88.0% dst/copy/
+   5.8% dst/move/changed/
+   0.1% dst/move/rearranged/
+   5.9% dst/move/
+  94.0% dst/
+   0.1% rearranged/
+EOF
+
+cat <<EOF >expect_diff_dirstat_CC
+  32.6% changed/
+  32.6% dst/copy/changed/
+   0.6% dst/copy/rearranged/
+  33.3% dst/copy/
+  32.6% dst/move/changed/
+   0.6% dst/move/rearranged/
+  33.3% dst/move/
+  66.6% dst/
+   0.6% rearranged/
+EOF
+
+test_expect_success '--dirstat=0 --cumulative' '
+	git diff --dirstat=0 --cumulative HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff --dirstat=0 --cumulative -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff --dirstat=0 --cumulative -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
+cat <<EOF >expect_diff_dirstat
+   9.0% changed/
+   9.0% dst/copy/changed/
+   9.0% dst/copy/rearranged/
+   9.0% dst/copy/unchanged/
+   9.0% dst/move/changed/
+   9.0% dst/move/rearranged/
+   9.0% dst/move/unchanged/
+   9.0% rearranged/
+   9.0% src/move/changed/
+   9.0% src/move/rearranged/
+   9.0% src/move/unchanged/
+EOF
+
+cat <<EOF >expect_diff_dirstat_M
+  14.2% changed/
+  14.2% dst/copy/changed/
+  14.2% dst/copy/rearranged/
+  14.2% dst/copy/unchanged/
+  14.2% dst/move/changed/
+  14.2% dst/move/rearranged/
+  14.2% rearranged/
+EOF
+
+cat <<EOF >expect_diff_dirstat_CC
+  16.6% changed/
+  16.6% dst/copy/changed/
+  16.6% dst/copy/rearranged/
+  16.6% dst/move/changed/
+  16.6% dst/move/rearranged/
+  16.6% rearranged/
+EOF
+
+test_expect_success '--dirstat-by-file' '
+	git diff --dirstat-by-file HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff --dirstat-by-file -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff --dirstat-by-file -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
+cat <<EOF >expect_diff_dirstat
+  27.2% dst/copy/
+  27.2% dst/move/
+  27.2% src/move/
+EOF
+
+cat <<EOF >expect_diff_dirstat_M
+  14.2% changed/
+  14.2% dst/copy/changed/
+  14.2% dst/copy/rearranged/
+  14.2% dst/copy/unchanged/
+  14.2% dst/move/changed/
+  14.2% dst/move/rearranged/
+  14.2% rearranged/
+EOF
+
+cat <<EOF >expect_diff_dirstat_CC
+  16.6% changed/
+  16.6% dst/copy/changed/
+  16.6% dst/copy/rearranged/
+  16.6% dst/move/changed/
+  16.6% dst/move/rearranged/
+  16.6% rearranged/
+EOF
+
+test_expect_success '--dirstat-by-file=10' '
+	git diff --dirstat-by-file=10 HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff --dirstat-by-file=10 -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff --dirstat-by-file=10 -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
+cat <<EOF >expect_diff_dirstat
+   9.0% changed/
+   9.0% dst/copy/changed/
+   9.0% dst/copy/rearranged/
+   9.0% dst/copy/unchanged/
+  27.2% dst/copy/
+   9.0% dst/move/changed/
+   9.0% dst/move/rearranged/
+   9.0% dst/move/unchanged/
+  27.2% dst/move/
+  54.5% dst/
+   9.0% rearranged/
+   9.0% src/move/changed/
+   9.0% src/move/rearranged/
+   9.0% src/move/unchanged/
+  27.2% src/move/
+EOF
+
+cat <<EOF >expect_diff_dirstat_M
+  14.2% changed/
+  14.2% dst/copy/changed/
+  14.2% dst/copy/rearranged/
+  14.2% dst/copy/unchanged/
+  42.8% dst/copy/
+  14.2% dst/move/changed/
+  14.2% dst/move/rearranged/
+  28.5% dst/move/
+  71.4% dst/
+  14.2% rearranged/
+EOF
+
+cat <<EOF >expect_diff_dirstat_CC
+  16.6% changed/
+  16.6% dst/copy/changed/
+  16.6% dst/copy/rearranged/
+  33.3% dst/copy/
+  16.6% dst/move/changed/
+  16.6% dst/move/rearranged/
+  33.3% dst/move/
+  66.6% dst/
+  16.6% rearranged/
+EOF
+
+test_expect_success '--dirstat-by-file --cumulative' '
+	git diff --dirstat-by-file --cumulative HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff --dirstat-by-file --cumulative -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff --dirstat-by-file --cumulative -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
+test_done
-- 
1.7.5.rc1.3.g4d7b


* [PATCHv5 2/7] Make --dirstat=0 output directories that contribute < 0.1% of changes
  2011-04-28  1:17                               ` [PATCHv5 0/7] --dirstat fixes, part 2 Johan Herland
  2011-04-28  1:17                                 ` [PATCHv5 1/7] Add several testcases for --dirstat and friends Johan Herland
@ 2011-04-28  1:17                                 ` Johan Herland
  2011-04-28  1:17                                 ` [PATCHv5 3/7] Refactor --dirstat parsing; deprecate --cumulative and --dirstat-by-file Johan Herland
                                                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 91+ messages in thread
From: Johan Herland @ 2011-04-28  1:17 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Linus Torvalds, Johan Herland

The expected output from --dirstat=0 is to include any directory with
changes, even if those changes contribute a minuscule portion of the total
changes. However, currently, directories that contribute less than 0.1% are
not included, since their 'permille' value is 0, and there is an
'if (permille)' check in gather_dirstat() that causes them to be ignored.

This test is obviously intended to exclude directories that contribute no
changes whatsoever, but in this case, it hits too broadly. The correct
check is against 'this_dir' from which the permille is calculated. Only if
this value is 0 does the directory truly contribute no changes, and should
be skipped from the output.

This patch fixes this issue, and updates the corresponding testcases to
expect the new behavior.
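
The difference between the two checks can be sketched in isolation. This is
only an illustration, not the code in gather_dirstat(): shown_old() and
shown_new() are hypothetical names, and the cut-off percentage is left out
(as with --dirstat=0).

```c
/* Old check: skip the directory when the truncated permille is zero. */
static int shown_old(long this_dir, long changed)
{
	int permille = this_dir * 1000 / changed;
	return permille != 0;
}

/* New check: skip the directory only when it has no changes at all. */
static int shown_new(long this_dir, long changed)
{
	(void)changed; /* the total no longer decides whether to skip */
	return this_dir != 0;
}
```

For a directory contributing 1 unit out of 2209 (240 * 9 + 48 + 1, as in
the testcase comment), the integer division truncates the permille to 0, so
the old check drops the directory while the new check keeps it.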

Signed-off-by: Johan Herland <johan@herland.net>
---
 diff.c                  |    4 ++--
 t/t4047-diff-dirstat.sh |    4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/diff.c b/diff.c
index abd9cd5..cfbfa92 100644
--- a/diff.c
+++ b/diff.c
@@ -1500,8 +1500,8 @@ static long gather_dirstat(struct diff_options *opt, struct dirstat_dir *dir,
 	 *    under this directory (sources == 1).
 	 */
 	if (baselen && sources != 1) {
-		int permille = this_dir * 1000 / changed;
-		if (permille) {
+		if (this_dir) {
+			int permille = this_dir * 1000 / changed;
 			int percent = permille / 10;
 			if (percent >= dir->percent) {
 				fprintf(opt->file, "%s%4d.%01d%% %.*s\n", line_prefix,
diff --git a/t/t4047-diff-dirstat.sh b/t/t4047-diff-dirstat.sh
index eb6bf47..6ff7f9f 100755
--- a/t/t4047-diff-dirstat.sh
+++ b/t/t4047-diff-dirstat.sh
@@ -346,7 +346,6 @@ test_expect_success 'vanilla -X' '
 	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
 '
 
-# rearranged/text falls below 0% threshold (1 / (240 * 9 + 48 + 1) ~= 0.045 %)
 cat <<EOF >expect_diff_dirstat
    2.1% changed/
   10.8% dst/copy/changed/
@@ -355,6 +354,7 @@ cat <<EOF >expect_diff_dirstat
   10.8% dst/move/changed/
   10.8% dst/move/rearranged/
   10.8% dst/move/unchanged/
+   0.0% rearranged/
   10.8% src/move/changed/
   10.8% src/move/rearranged/
   10.8% src/move/unchanged/
@@ -397,7 +397,6 @@ test_expect_success '-X0' '
 	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
 '
 
-# rearranged/text falls below 0% threshold (1 / (240 * 9 + 48 + 1) ~= 0.045 %)
 cat <<EOF >expect_diff_dirstat
    2.1% changed/
   10.8% dst/copy/changed/
@@ -409,6 +408,7 @@ cat <<EOF >expect_diff_dirstat
   10.8% dst/move/unchanged/
   32.5% dst/move/
   65.1% dst/
+   0.0% rearranged/
   10.8% src/move/changed/
   10.8% src/move/rearranged/
   10.8% src/move/unchanged/
-- 
1.7.5.rc1.3.g4d7b


* [PATCHv5 3/7] Refactor --dirstat parsing; deprecate --cumulative and --dirstat-by-file
  2011-04-28  1:17                               ` [PATCHv5 0/7] --dirstat fixes, part 2 Johan Herland
  2011-04-28  1:17                                 ` [PATCHv5 1/7] Add several testcases for --dirstat and friends Johan Herland
  2011-04-28  1:17                                 ` [PATCHv5 2/7] Make --dirstat=0 output directories that contribute < 0.1% of changes Johan Herland
@ 2011-04-28  1:17                                 ` Johan Herland
  2011-04-28  1:17                                 ` [PATCHv5 4/7] Add config variable for specifying default --dirstat behavior Johan Herland
                                                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 91+ messages in thread
From: Johan Herland @ 2011-04-28  1:17 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Linus Torvalds, Johan Herland

Instead of having multiple interconnected dirstat-related options, teach
the --dirstat option itself to accept all behavior modifiers as parameters.

 - Preserve the current --dirstat=<limit> (where <limit> is an integer
   specifying a cut-off percentage)
 - Add --dirstat=cumulative, replacing --cumulative
 - Add --dirstat=files, replacing --dirstat-by-file
 - Also add --dirstat=changes and --dirstat=noncumulative for specifying the
   current default behavior. These allow the user to reset other --dirstat
   parameters (e.g. 'cumulative' and 'files') occurring earlier on the
   command line.

The deprecated options (--cumulative and --dirstat-by-file) are still
functional, although they have been removed from the documentation.

Allow multiple parameters to be separated by commas, e.g.:
  --dirstat=files,10,cumulative

Update the documentation accordingly, and add testcases verifying the
behavior of the new syntax.

Improved-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Johan Herland <johan@herland.net>
---
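For readers who want to try the comma-separated syntax outside of git, here
is a minimal standalone sketch of the same parsing loop. It is not the code
from this patch: `struct dirstat_cfg`, `starts_with()` and `parse_params()`
are hypothetical names, and plain strncmp() stands in for git's prefixcmp().

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the dirstat-related bits of struct diff_options. */
struct dirstat_cfg {
	int by_file;    /* 1 after "files", 0 after "changes" */
	int cumulative; /* 1 after "cumulative", 0 after "noncumulative" */
	int percent;    /* integer cut-off percentage */
};

/* Nonzero if s begins with prefix (plays the role of !prefixcmp()). */
static int starts_with(const char *s, const char *prefix)
{
	return strncmp(s, prefix, strlen(prefix)) == 0;
}

/*
 * Parse a list such as "files,10,cumulative". Later parameters override
 * earlier ones. Returns 0 on success, -1 on an unknown parameter or a
 * missing comma separator.
 */
int parse_params(struct dirstat_cfg *cfg, const char *params)
{
	const char *p = params;

	while (*p) {
		if (starts_with(p, "changes")) {
			p += strlen("changes");
			cfg->by_file = 0;
		} else if (starts_with(p, "files")) {
			p += strlen("files");
			cfg->by_file = 1;
		} else if (starts_with(p, "noncumulative")) {
			/* must be tested before "cumulative" is not required
			 * here, but keeps the order used in the patch */
			p += strlen("noncumulative");
			cfg->cumulative = 0;
		} else if (starts_with(p, "cumulative")) {
			p += strlen("cumulative");
			cfg->cumulative = 1;
		} else if (isdigit((unsigned char)*p)) {
			char *end;
			cfg->percent = (int)strtoul(p, &end, 10);
			p = end;
		} else {
			return -1; /* unknown parameter */
		}

		if (*p) {
			if (*p != ',')
				return -1; /* missing comma separator */
			p++; /* swallow the separator */
		}
	}
	return 0;
}
```

With this sketch, "files,10,cumulative" sets all three fields, while
"files,10,cumulative,changes,noncumulative,3" ends up back at the defaults
because the later parameters win.
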
 Documentation/diff-options.txt |   44 +++++++++++----
 diff.c                         |   69 ++++++++++++++++++++---
 t/t4047-diff-dirstat.sh        |  119 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 211 insertions(+), 21 deletions(-)

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 7e4bd42..6a3a9c1 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -66,19 +66,39 @@ endif::git-format-patch[]
 	number of modified files, as well as number of added and deleted
 	lines.
 
---dirstat[=<limit>]::
-	Output the distribution of relative amount of changes (number of lines added or
-	removed) for each sub-directory. Directories with changes below
-	a cut-off percent (3% by default) are not shown. The cut-off percent
-	can be set with `--dirstat=<limit>`. Changes in a child directory are not
-	counted for the parent directory, unless `--cumulative` is used.
+--dirstat[=<param1,param2,...>]::
+	Output the distribution of relative amount of changes for each
+	sub-directory. The behavior of `--dirstat` can be customized by
+	passing it a comma separated list of parameters.
+	The following parameters are available:
 +
-Note that the `--dirstat` option computes the changes while ignoring
-the amount of pure code movements within a file.  In other words,
-rearranging lines in a file is not counted as much as other changes.
-
---dirstat-by-file[=<limit>]::
-	Same as `--dirstat`, but counts changed files instead of lines.
+--
+`changes`;;
+	Compute the dirstat numbers by counting the lines that have been
+	removed from the source, or added to the destination. This ignores
+	the amount of pure code movements within a file.  In other words,
+	rearranging lines in a file is not counted as much as other changes.
+	This is the default behavior when no parameter is given.
+`files`;;
+	Compute the dirstat numbers by counting the number of files changed.
+	Each changed file counts equally in the dirstat analysis. This is
+	the computationally cheapest `--dirstat` behavior, since it does
+	not have to look at the file contents at all.
+`cumulative`;;
+	Count changes in a child directory for the parent directory as well.
+	Note that when using `cumulative`, the sum of the percentages
+	reported may exceed 100%. The default (non-cumulative) behavior can
+	be specified with the `noncumulative` parameter.
+<limit>;;
+	An integer parameter specifies a cut-off percent (3% by default).
+	Directories contributing less than this percentage of the changes
+	are not shown in the output.
+--
++
+Example: The following will count changed files, while ignoring
+directories with less than 10% of the total amount of changed files,
+and accumulating child directory counts in the parent directories:
+`--dirstat=files,10,cumulative`.
 
 --summary::
 	Output a condensed summary of extended header information
diff --git a/diff.c b/diff.c
index cfbfa92..0e4a510 100644
--- a/diff.c
+++ b/diff.c
@@ -66,6 +66,41 @@ static int parse_diff_color_slot(const char *var, int ofs)
 	return -1;
 }
 
+static int parse_dirstat_params(struct diff_options *options, const char *params)
+{
+	const char *p = params;
+	while (*p) {
+		if (!prefixcmp(p, "changes")) {
+			p += 7;
+			DIFF_OPT_CLR(options, DIRSTAT_BY_FILE);
+		} else if (!prefixcmp(p, "files")) {
+			p += 5;
+			DIFF_OPT_SET(options, DIRSTAT_BY_FILE);
+		} else if (!prefixcmp(p, "noncumulative")) {
+			p += 13;
+			DIFF_OPT_CLR(options, DIRSTAT_CUMULATIVE);
+		} else if (!prefixcmp(p, "cumulative")) {
+			p += 10;
+			DIFF_OPT_SET(options, DIRSTAT_CUMULATIVE);
+		} else if (isdigit(*p)) {
+			char *end;
+			options->dirstat_percent = strtoul(p, &end, 10);
+			p = end;
+		} else
+			return error("Unknown --dirstat parameter '%s'", p);
+
+		if (*p) {
+			/* more parameters, swallow separator */
+			if (*p != ',')
+				return error("Missing comma separator at char "
+					"%"PRIuMAX" of '%s'",
+					(uintmax_t) (p - params), params);
+			p++;
+		}
+	}
+	return 0;
+}
+
 static int git_config_rename(const char *var, const char *value)
 {
 	if (!value)
@@ -3144,6 +3179,18 @@ static int stat_opt(struct diff_options *options, const char **av)
 	return argcount;
 }
 
+static int parse_dirstat_opt(struct diff_options *options, const char *params)
+{
+	if (parse_dirstat_params(options, params))
+		die("Failed to parse --dirstat/-X option parameter");
+	/*
+	 * The caller knows a dirstat-related option is given from the command
+	 * line; allow it to say "return this_function();"
+	 */
+	options->output_format |= DIFF_FORMAT_DIRSTAT;
+	return 1;
+}
+
 int diff_opt_parse(struct diff_options *options, const char **av, int ac)
 {
 	const char *arg = av[0];
@@ -3163,15 +3210,19 @@ int diff_opt_parse(struct diff_options *options, const char **av, int ac)
 		options->output_format |= DIFF_FORMAT_NUMSTAT;
 	else if (!strcmp(arg, "--shortstat"))
 		options->output_format |= DIFF_FORMAT_SHORTSTAT;
-	else if (opt_arg(arg, 'X', "dirstat", &options->dirstat_percent))
-		options->output_format |= DIFF_FORMAT_DIRSTAT;
-	else if (!strcmp(arg, "--cumulative")) {
-		options->output_format |= DIFF_FORMAT_DIRSTAT;
-		DIFF_OPT_SET(options, DIRSTAT_CUMULATIVE);
-	} else if (opt_arg(arg, 0, "dirstat-by-file",
-			   &options->dirstat_percent)) {
-		options->output_format |= DIFF_FORMAT_DIRSTAT;
-		DIFF_OPT_SET(options, DIRSTAT_BY_FILE);
+	else if (!strcmp(arg, "-X") || !strcmp(arg, "--dirstat"))
+		return parse_dirstat_opt(options, "");
+	else if (!prefixcmp(arg, "-X"))
+		return parse_dirstat_opt(options, arg + 2);
+	else if (!prefixcmp(arg, "--dirstat="))
+		return parse_dirstat_opt(options, arg + 10);
+	else if (!strcmp(arg, "--cumulative"))
+		return parse_dirstat_opt(options, "cumulative");
+	else if (!strcmp(arg, "--dirstat-by-file"))
+		return parse_dirstat_opt(options, "files");
+	else if (!prefixcmp(arg, "--dirstat-by-file=")) {
+		parse_dirstat_opt(options, "files");
+		return parse_dirstat_opt(options, arg + 18);
 	}
 	else if (!strcmp(arg, "--check"))
 		options->output_format |= DIFF_FORMAT_CHECKDIFF;
diff --git a/t/t4047-diff-dirstat.sh b/t/t4047-diff-dirstat.sh
index 6ff7f9f..0ede619 100755
--- a/t/t4047-diff-dirstat.sh
+++ b/t/t4047-diff-dirstat.sh
@@ -346,6 +346,39 @@ test_expect_success 'vanilla -X' '
 	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
 '
 
+test_expect_success 'explicit defaults: --dirstat=changes,noncumulative,3' '
+	git diff --dirstat=changes,noncumulative,3 HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff --dirstat=changes,noncumulative,3 -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff --dirstat=changes,noncumulative,3 -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
+test_expect_success 'explicit defaults: -Xchanges,noncumulative,3' '
+	git diff -Xchanges,noncumulative,3 HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff -Xchanges,noncumulative,3 -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff -Xchanges,noncumulative,3 -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
+test_expect_success 'later options override earlier options:' '
+	git diff --dirstat=files,10,cumulative,changes,noncumulative,3 HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff --dirstat=files,10,cumulative,changes,noncumulative,3 -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff --dirstat=files,10,cumulative,changes,noncumulative,3 -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC &&
+	git diff --dirstat=files --dirstat=10 --dirstat=cumulative --dirstat=changes --dirstat=noncumulative -X3 HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff --dirstat=files --dirstat=10 --dirstat=cumulative --dirstat=changes --dirstat=noncumulative -X3 -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff --dirstat=files --dirstat=10 --dirstat=cumulative --dirstat=changes --dirstat=noncumulative -X3 -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
 cat <<EOF >expect_diff_dirstat
    2.1% changed/
   10.8% dst/copy/changed/
@@ -449,6 +482,24 @@ test_expect_success '--dirstat=0 --cumulative' '
 	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
 '
 
+test_expect_success '--dirstat=0,cumulative' '
+	git diff --dirstat=0,cumulative HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff --dirstat=0,cumulative -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff --dirstat=0,cumulative -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
+test_expect_success '-X0,cumulative' '
+	git diff -X0,cumulative HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff -X0,cumulative -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff -X0,cumulative -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
 cat <<EOF >expect_diff_dirstat
    9.0% changed/
    9.0% dst/copy/changed/
@@ -491,6 +542,15 @@ test_expect_success '--dirstat-by-file' '
 	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
 '
 
+test_expect_success '--dirstat=files' '
+	git diff --dirstat=files HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff --dirstat=files -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff --dirstat=files -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
 cat <<EOF >expect_diff_dirstat
   27.2% dst/copy/
   27.2% dst/move/
@@ -525,6 +585,15 @@ test_expect_success '--dirstat-by-file=10' '
 	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
 '
 
+test_expect_success '--dirstat=files,10' '
+	git diff --dirstat=files,10 HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff --dirstat=files,10 -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff --dirstat=files,10 -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
 cat <<EOF >expect_diff_dirstat
    9.0% changed/
    9.0% dst/copy/changed/
@@ -577,4 +646,54 @@ test_expect_success '--dirstat-by-file --cumulative' '
 	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
 '
 
+test_expect_success '--dirstat=files,cumulative' '
+	git diff --dirstat=files,cumulative HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff --dirstat=files,cumulative -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff --dirstat=files,cumulative -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
+cat <<EOF >expect_diff_dirstat
+  27.2% dst/copy/
+  27.2% dst/move/
+  54.5% dst/
+  27.2% src/move/
+EOF
+
+cat <<EOF >expect_diff_dirstat_M
+  14.2% changed/
+  14.2% dst/copy/changed/
+  14.2% dst/copy/rearranged/
+  14.2% dst/copy/unchanged/
+  42.8% dst/copy/
+  14.2% dst/move/changed/
+  14.2% dst/move/rearranged/
+  28.5% dst/move/
+  71.4% dst/
+  14.2% rearranged/
+EOF
+
+cat <<EOF >expect_diff_dirstat_CC
+  16.6% changed/
+  16.6% dst/copy/changed/
+  16.6% dst/copy/rearranged/
+  33.3% dst/copy/
+  16.6% dst/move/changed/
+  16.6% dst/move/rearranged/
+  33.3% dst/move/
+  66.6% dst/
+  16.6% rearranged/
+EOF
+
+test_expect_success '--dirstat=files,cumulative,10' '
+	git diff --dirstat=files,cumulative,10 HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff --dirstat=files,cumulative,10 -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff --dirstat=files,cumulative,10 -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
 test_done
-- 
1.7.5.rc1.3.g4d7b


* [PATCHv5 4/7] Add config variable for specifying default --dirstat behavior
  2011-04-28  1:17                               ` [PATCHv5 0/7] --dirstat fixes, part 2 Johan Herland
                                                   ` (2 preceding siblings ...)
  2011-04-28  1:17                                 ` [PATCHv5 3/7] Refactor --dirstat parsing; deprecate --cumulative and --dirstat-by-file Johan Herland
@ 2011-04-28  1:17                                 ` Johan Herland
  2011-04-28  1:17                                 ` [PATCHv5 5/7] Allow specifying --dirstat cut-off percentage as a floating point number Johan Herland
                                                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 91+ messages in thread
From: Johan Herland @ 2011-04-28  1:17 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Linus Torvalds, Johan Herland

The new diff.dirstat config variable takes the same arguments as
'--dirstat=<args>', and specifies the default arguments for --dirstat.
Any --dirstat arguments passed on the command line override the values
from the config.

When not specified, the --dirstat defaults are 'changes,noncumulative,3'.

The patch also adds several tests verifying the interaction between the
diff.dirstat config variable, and the --dirstat command line option.

Improved-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Johan Herland <johan@herland.net>
---
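The precedence described above (built-in fallback, then diff.dirstat, then
the command line) can be illustrated with a small standalone sketch. This is
an assumption-laden model rather than the code in this patch: it tracks only
the cut-off percentage, and `struct sketch_opts`, `sketch_read_config()`,
`sketch_setup()` and `sketch_parse_cmdline()` are hypothetical names.

```c
#include <stdlib.h>

/* Built-in fallback; a config value, if present, replaces it. */
static int sketch_percent_default = 3;

/* Hypothetical stand-in for the relevant part of struct diff_options. */
struct sketch_opts {
	int percent;
};

/* Called once per config value, before any options struct is set up. */
void sketch_read_config(const char *value)
{
	sketch_percent_default = (int)strtol(value, NULL, 10);
}

/* Every freshly set up options struct starts from the current default. */
void sketch_setup(struct sketch_opts *o)
{
	o->percent = sketch_percent_default;
}

/* A command-line value only touches this one options struct. */
void sketch_parse_cmdline(struct sketch_opts *o, const char *value)
{
	o->percent = (int)strtol(value, NULL, 10);
}
```

The point of the split is that the config changes the starting value for
every later setup, whereas a command-line argument overrides it only for the
options struct being parsed.
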
 Documentation/config.txt       |   36 ++++++++++++++++++++
 Documentation/diff-options.txt |    2 +
 diff.c                         |   10 +++++-
 t/t4047-diff-dirstat.sh        |   72 ++++++++++++++++++++++++++++++++++++++++
 4 files changed, 119 insertions(+), 1 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 6babbc7..c18dd5a 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -822,6 +822,42 @@ diff.autorefreshindex::
 	affects only 'git diff' Porcelain, and not lower level
 	'diff' commands such as 'git diff-files'.
 
+diff.dirstat::
+	A comma separated list of `--dirstat` parameters specifying the
+	default behavior of the `--dirstat` option to linkgit:git-diff[1]
+	and friends. The defaults can be overridden on the command line
+	(using `--dirstat=<param1,param2,...>`). The fallback defaults
+	(when not changed by `diff.dirstat`) are `changes,noncumulative,3`.
+	The following parameters are available:
++
+--
+`changes`;;
+	Compute the dirstat numbers by counting the lines that have been
+	removed from the source, or added to the destination. This ignores
+	the amount of pure code movements within a file.  In other words,
+	rearranging lines in a file is not counted as much as other changes.
+	This is the default behavior when no parameter is given.
+`files`;;
+	Compute the dirstat numbers by counting the number of files changed.
+	Each changed file counts equally in the dirstat analysis. This is
+	the computationally cheapest `--dirstat` behavior, since it does
+	not have to look at the file contents at all.
+`cumulative`;;
+	Count changes in a child directory for the parent directory as well.
+	Note that when using `cumulative`, the sum of the percentages
+	reported may exceed 100%. The default (non-cumulative) behavior can
+	be specified with the `noncumulative` parameter.
+<limit>;;
+	An integer parameter specifies a cut-off percent (3% by default).
+	Directories contributing less than this percentage of the changes
+	are not shown in the output.
+--
++
+Example: The following will count changed files, while ignoring
+directories with less than 10% of the total amount of changed files,
+and accumulating child directory counts in the parent directories:
+`files,10,cumulative`.
+
 diff.external::
 	If this config variable is set, diff generation is not
 	performed using the internal diff machinery, but using the
diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 6a3a9c1..4ad50b9 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -70,6 +70,8 @@ endif::git-format-patch[]
 	Output the distribution of relative amount of changes for each
 	sub-directory. The behavior of `--dirstat` can be customized by
 	passing it a comma separated list of parameters.
+	The defaults are controlled by the `diff.dirstat` configuration
+	variable (see linkgit:git-config[1]).
 	The following parameters are available:
 +
 --
diff --git a/diff.c b/diff.c
index 0e4a510..92508b0 100644
--- a/diff.c
+++ b/diff.c
@@ -31,6 +31,7 @@ static const char *external_diff_cmd_cfg;
 int diff_auto_refresh_index = 1;
 static int diff_mnemonic_prefix;
 static int diff_no_prefix;
+static int diff_dirstat_percent_default = 3;
 static struct diff_options default_diff_options;
 
 static char diff_colors[][COLOR_MAXLEN] = {
@@ -180,6 +181,13 @@ int git_diff_basic_config(const char *var, const char *value, void *cb)
 		return 0;
 	}
 
+	if (!strcmp(var, "diff.dirstat")) {
+		default_diff_options.dirstat_percent = diff_dirstat_percent_default;
+		(void) parse_dirstat_params(&default_diff_options, value);
+		diff_dirstat_percent_default = default_diff_options.dirstat_percent;
+		return 0;
+	}
+
 	if (!prefixcmp(var, "submodule."))
 		return parse_submodule_config_option(var, value);
 
@@ -2921,7 +2929,7 @@ void diff_setup(struct diff_options *options)
 	options->line_termination = '\n';
 	options->break_opt = -1;
 	options->rename_limit = -1;
-	options->dirstat_percent = 3;
+	options->dirstat_percent = diff_dirstat_percent_default;
 	options->context = 3;
 
 	options->change = diff_change;
diff --git a/t/t4047-diff-dirstat.sh b/t/t4047-diff-dirstat.sh
index 0ede619..fa1885c 100755
--- a/t/t4047-diff-dirstat.sh
+++ b/t/t4047-diff-dirstat.sh
@@ -379,6 +379,15 @@ test_expect_success 'later options override earlier options:' '
 	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
 '
 
+test_expect_success 'non-defaults in config overridden by explicit defaults on command line' '
+	git -c diff.dirstat=files,cumulative,50 diff --dirstat=changes,noncumulative,3 HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git -c diff.dirstat=files,cumulative,50 diff --dirstat=changes,noncumulative,3 -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git -c diff.dirstat=files,cumulative,50 diff --dirstat=changes,noncumulative,3 -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
 cat <<EOF >expect_diff_dirstat
    2.1% changed/
   10.8% dst/copy/changed/
@@ -430,6 +439,15 @@ test_expect_success '-X0' '
 	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
 '
 
+test_expect_success 'diff.dirstat=0' '
+	git -c diff.dirstat=0 diff --dirstat HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git -c diff.dirstat=0 diff --dirstat -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git -c diff.dirstat=0 diff --dirstat -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
 cat <<EOF >expect_diff_dirstat
    2.1% changed/
   10.8% dst/copy/changed/
@@ -500,6 +518,24 @@ test_expect_success '-X0,cumulative' '
 	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
 '
 
+test_expect_success 'diff.dirstat=0,cumulative' '
+	git -c diff.dirstat=0,cumulative diff --dirstat HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git -c diff.dirstat=0,cumulative diff --dirstat -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git -c diff.dirstat=0,cumulative diff --dirstat -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
+test_expect_success 'diff.dirstat=0 & --dirstat=cumulative' '
+	git -c diff.dirstat=0 diff --dirstat=cumulative HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git -c diff.dirstat=0 diff --dirstat=cumulative -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git -c diff.dirstat=0 diff --dirstat=cumulative -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
 cat <<EOF >expect_diff_dirstat
    9.0% changed/
    9.0% dst/copy/changed/
@@ -551,6 +587,15 @@ test_expect_success '--dirstat=files' '
 	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
 '
 
+test_expect_success 'diff.dirstat=files' '
+	git -c diff.dirstat=files diff --dirstat HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git -c diff.dirstat=files diff --dirstat -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git -c diff.dirstat=files diff --dirstat -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
 cat <<EOF >expect_diff_dirstat
   27.2% dst/copy/
   27.2% dst/move/
@@ -594,6 +639,15 @@ test_expect_success '--dirstat=files,10' '
 	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
 '
 
+test_expect_success 'diff.dirstat=10,files' '
+	git -c diff.dirstat=10,files diff --dirstat HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git -c diff.dirstat=10,files diff --dirstat -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git -c diff.dirstat=10,files diff --dirstat -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
 cat <<EOF >expect_diff_dirstat
    9.0% changed/
    9.0% dst/copy/changed/
@@ -655,6 +709,15 @@ test_expect_success '--dirstat=files,cumulative' '
 	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
 '
 
+test_expect_success 'diff.dirstat=cumulative,files' '
+	git -c diff.dirstat=cumulative,files diff --dirstat HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git -c diff.dirstat=cumulative,files diff --dirstat -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git -c diff.dirstat=cumulative,files diff --dirstat -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
 cat <<EOF >expect_diff_dirstat
   27.2% dst/copy/
   27.2% dst/move/
@@ -696,4 +759,13 @@ test_expect_success '--dirstat=files,cumulative,10' '
 	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
 '
 
+test_expect_success 'diff.dirstat=10,cumulative,files' '
+	git -c diff.dirstat=10,cumulative,files diff --dirstat HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git -c diff.dirstat=10,cumulative,files diff --dirstat -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git -c diff.dirstat=10,cumulative,files diff --dirstat -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
 test_done
-- 
1.7.5.rc1.3.g4d7b


* [PATCHv5 5/7] Allow specifying --dirstat cut-off percentage as a floating point number
  2011-04-28  1:17                               ` [PATCHv5 0/7] --dirstat fixes, part 2 Johan Herland
                                                   ` (3 preceding siblings ...)
  2011-04-28  1:17                                 ` [PATCHv5 4/7] Add config variable for specifying default --dirstat behavior Johan Herland
@ 2011-04-28  1:17                                 ` Johan Herland
  2011-04-28  1:17                                 ` [PATCHv5 6/7] New --dirstat=lines mode, doing dirstat analysis based on diffstat Johan Herland
  2011-04-28  1:17                                 ` [PATCHv5 7/7] Improve error handling when parsing dirstat parameters Johan Herland
  6 siblings, 0 replies; 91+ messages in thread
From: Johan Herland @ 2011-04-28  1:17 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Linus Torvalds, Johan Herland

Only the first digit after the decimal point is kept, as the dirstat
calculations all happen in permille.

Selftests verifying floating-point percentage input have been added.

Improved-by: Junio C Hamano <gitster@pobox.com>
Improved-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Johan Herland <johan@herland.net>
---
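To make the truncation rule concrete, here is a minimal standalone sketch of
the percent-to-permille conversion and the cut-off comparison. It is not the
code from this patch: `parse_permille()` and `is_shown()` are hypothetical
helpers, and unlike the patch this sketch leaves a '.' that is not followed
by a digit unconsumed.

```c
#include <ctype.h>
#include <stdlib.h>

/*
 * Convert a decimal percentage such as "16.7" to permille (167).
 * Only the first digit after the decimal point is used; further digits
 * are skipped. *endp is set to the first character not consumed.
 */
int parse_permille(const char *s, const char **endp)
{
	char *end;
	int permille = (int)strtoul(s, &end, 10) * 10;
	const char *p = end;

	if (*p == '.' && isdigit((unsigned char)p[1])) {
		p++;                    /* skip the '.' */
		permille += *p - '0';   /* keep the first fractional digit */
		p++;
		while (isdigit((unsigned char)*p))
			p++;            /* ignore any further digits */
	}
	*endp = p;
	return permille;
}

/*
 * Nonzero if a directory holding this_dir out of changed units of change
 * reaches the cut-off (both sides are compared in integer permille).
 */
int is_shown(long this_dir, long changed, int cutoff_permille)
{
	int permille = (int)(this_dir * 1000 / changed);
	return permille >= cutoff_permille;
}
```

In this sketch "16.70" and "16.7" both give 167, and "27.09" gives 270, not
271, since nothing is rounded. Assuming a directory holds 1 of 6 changed
files (166 permille), it falls below a 16.7 cut-off, while 3 of 11 files
(272 permille) still passes a 27.2 cut-off.
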
 diff.c                  |   26 +++++++++++-------
 diff.h                  |    2 +-
 t/t4047-diff-dirstat.sh |   64 +++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 81 insertions(+), 11 deletions(-)

diff --git a/diff.c b/diff.c
index 92508b0..e0de4fa 100644
--- a/diff.c
+++ b/diff.c
@@ -31,7 +31,7 @@ static const char *external_diff_cmd_cfg;
 int diff_auto_refresh_index = 1;
 static int diff_mnemonic_prefix;
 static int diff_no_prefix;
-static int diff_dirstat_percent_default = 3;
+static int diff_dirstat_permille_default = 30;
 static struct diff_options default_diff_options;
 
 static char diff_colors[][COLOR_MAXLEN] = {
@@ -85,8 +85,15 @@ static int parse_dirstat_params(struct diff_options *options, const char *params
 			DIFF_OPT_SET(options, DIRSTAT_CUMULATIVE);
 		} else if (isdigit(*p)) {
 			char *end;
-			options->dirstat_percent = strtoul(p, &end, 10);
+			options->dirstat_permille = strtoul(p, &end, 10) * 10;
 			p = end;
+			if (*p == '.' && isdigit(*++p)) {
+				/* only use first digit */
+				options->dirstat_permille += *p - '0';
+				/* .. and ignore any further digits */
+				while (isdigit(*++p))
+					; /* nothing */
+			}
 		} else
 			return error("Unknown --dirstat parameter '%s'", p);
 
@@ -182,9 +189,9 @@ int git_diff_basic_config(const char *var, const char *value, void *cb)
 	}
 
 	if (!strcmp(var, "diff.dirstat")) {
-		default_diff_options.dirstat_percent = diff_dirstat_percent_default;
+		default_diff_options.dirstat_permille = diff_dirstat_permille_default;
 		(void) parse_dirstat_params(&default_diff_options, value);
-		diff_dirstat_percent_default = default_diff_options.dirstat_percent;
+		diff_dirstat_permille_default = default_diff_options.dirstat_permille;
 		return 0;
 	}
 
@@ -1496,7 +1503,7 @@ struct dirstat_file {
 
 struct dirstat_dir {
 	struct dirstat_file *files;
-	int alloc, nr, percent, cumulative;
+	int alloc, nr, permille, cumulative;
 };
 
 static long gather_dirstat(struct diff_options *opt, struct dirstat_dir *dir,
@@ -1545,10 +1552,9 @@ static long gather_dirstat(struct diff_options *opt, struct dirstat_dir *dir,
 	if (baselen && sources != 1) {
 		if (this_dir) {
 			int permille = this_dir * 1000 / changed;
-			int percent = permille / 10;
-			if (percent >= dir->percent) {
+			if (permille >= dir->permille) {
 				fprintf(opt->file, "%s%4d.%01d%% %.*s\n", line_prefix,
-					percent, permille % 10, baselen, base);
+					permille / 10, permille % 10, baselen, base);
 				if (!dir->cumulative)
 					return 0;
 			}
@@ -1574,7 +1580,7 @@ static void show_dirstat(struct diff_options *options)
 	dir.files = NULL;
 	dir.alloc = 0;
 	dir.nr = 0;
-	dir.percent = options->dirstat_percent;
+	dir.permille = options->dirstat_permille;
 	dir.cumulative = DIFF_OPT_TST(options, DIRSTAT_CUMULATIVE);
 
 	changed = 0;
@@ -2929,7 +2935,7 @@ void diff_setup(struct diff_options *options)
 	options->line_termination = '\n';
 	options->break_opt = -1;
 	options->rename_limit = -1;
-	options->dirstat_percent = diff_dirstat_percent_default;
+	options->dirstat_permille = diff_dirstat_permille_default;
 	options->context = 3;
 
 	options->change = diff_change;
diff --git a/diff.h b/diff.h
index 0083d92..08b4fe0 100644
--- a/diff.h
+++ b/diff.h
@@ -111,7 +111,7 @@ struct diff_options {
 	int rename_score;
 	int rename_limit;
 	int warn_on_too_large_rename;
-	int dirstat_percent;
+	int dirstat_permille;
 	int setup;
 	int abbrev;
 	const char *prefix;
diff --git a/t/t4047-diff-dirstat.sh b/t/t4047-diff-dirstat.sh
index fa1885c..b3062b4 100755
--- a/t/t4047-diff-dirstat.sh
+++ b/t/t4047-diff-dirstat.sh
@@ -768,4 +768,68 @@ test_expect_success 'diff.dirstat=10,cumulative,files' '
 	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
 '
 
+cat <<EOF >expect_diff_dirstat
+  27.2% dst/copy/
+  27.2% dst/move/
+  54.5% dst/
+  27.2% src/move/
+EOF
+
+cat <<EOF >expect_diff_dirstat_M
+  42.8% dst/copy/
+  28.5% dst/move/
+  71.4% dst/
+EOF
+
+cat <<EOF >expect_diff_dirstat_CC
+  33.3% dst/copy/
+  33.3% dst/move/
+  66.6% dst/
+EOF
+
+test_expect_success '--dirstat=files,cumulative,16.7' '
+	git diff --dirstat=files,cumulative,16.7 HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff --dirstat=files,cumulative,16.7 -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff --dirstat=files,cumulative,16.7 -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
+test_expect_success 'diff.dirstat=16.7,cumulative,files' '
+	git -c diff.dirstat=16.7,cumulative,files diff --dirstat HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git -c diff.dirstat=16.7,cumulative,files diff --dirstat -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git -c diff.dirstat=16.7,cumulative,files diff --dirstat -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
+test_expect_success 'diff.dirstat=16.70,cumulative,files' '
+	git -c diff.dirstat=16.70,cumulative,files diff --dirstat HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git -c diff.dirstat=16.70,cumulative,files diff --dirstat -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git -c diff.dirstat=16.70,cumulative,files diff --dirstat -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
+test_expect_success '--dirstat=files,cumulative,27.2' '
+	git diff --dirstat=files,cumulative,27.2 HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff --dirstat=files,cumulative,27.2 -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff --dirstat=files,cumulative,27.2 -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
+test_expect_success '--dirstat=files,cumulative,27.09' '
+	git diff --dirstat=files,cumulative,27.09 HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff --dirstat=files,cumulative,27.09 -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff --dirstat=files,cumulative,27.09 -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
 test_done
-- 
1.7.5.rc1.3.g4d7b


* [PATCHv5 6/7] New --dirstat=lines mode, doing dirstat analysis based on diffstat
  2011-04-28  1:17                               ` [PATCHv5 0/7] --dirstat fixes, part 2 Johan Herland
                                                   ` (4 preceding siblings ...)
  2011-04-28  1:17                                 ` [PATCHv5 5/7] Allow specifying --dirstat cut-off percentage as a floating point number Johan Herland
@ 2011-04-28  1:17                                 ` Johan Herland
  2011-04-28  1:17                                 ` [PATCHv5 7/7] Improve error handling when parsing dirstat parameters Johan Herland
  6 siblings, 0 replies; 91+ messages in thread
From: Johan Herland @ 2011-04-28  1:17 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Linus Torvalds, Johan Herland

This patch adds an alternative implementation of show_dirstat(), called
show_dirstat_by_line(), which uses the more expensive diffstat analysis
(as opposed to show_dirstat()'s own (relatively inexpensive) analysis)
to derive the numbers from which the --dirstat output is computed.

The alternative implementation is controlled by the new "lines" parameter
to the --dirstat option (or the diff.dirstat config variable).

For binary files, the diffstat analysis counts bytes instead of lines,
so to prevent binary files from dominating the dirstat results, the
byte counts for binary files are divided by 64 before being compared to
their textual/line-based counterparts. This is a stupid and ugly - but
very cheap - heuristic.
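
The normalization described above can be sketched in isolation. This is
only an illustration, not the patch itself: `struct file_stat` and
`file_damage()` are hypothetical names standing in for the per-file
diffstat record and for the loop body of show_dirstat_by_line().

```c
/* Hypothetical stand-in for the per-file counters collected by diffstat. */
struct file_stat {
	unsigned long added;   /* lines added (bytes, for binary files) */
	unsigned long deleted; /* lines deleted (bytes, for binary files) */
	int is_binary;
};

/*
 * Damage contributed by one file. Text files count lines directly.
 * Binary files count bytes, so scale them down by 64 (rounding up)
 * to make them roughly comparable to line counts.
 */
static unsigned long file_damage(const struct file_stat *f)
{
	unsigned long damage = f->added + f->deleted;

	if (f->is_binary)
		damage = (damage + 63) / 64;
	return damage;
}
```

Because the division rounds up, a binary file with a single changed byte
still counts as one unit, while 128 changed bytes count as two.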

In linux-2.6.git, running the three different --dirstat modes:

  time git diff v2.6.20..v2.6.30 --dirstat=changes > /dev/null
vs.
  time git diff v2.6.20..v2.6.30 --dirstat=lines > /dev/null
vs.
  time git diff v2.6.20..v2.6.30 --dirstat=files > /dev/null

yields the following average runtimes on my machine:

 - "changes" (default): ~6.0 s
 - "lines":             ~9.6 s
 - "files":             ~0.1 s

So, as expected, there's a considerable performance hit (~60%) from going
through the full diffstat analysis as compared to the default "changes"
analysis (obviously, "files" is much faster than both). As such, the
"lines" mode is probably only useful if you really need the --dirstat
numbers to be consistent with the numbers returned from the other
--*stat options.

The patch also includes documentation and tests for the new dirstat mode.

Improved-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Johan Herland <johan@herland.net>
---
 Documentation/config.txt       |    8 +++
 Documentation/diff-options.txt |    8 +++
 diff.c                         |   61 +++++++++++++++++++++++-
 diff.h                         |    1 +
 t/t4047-diff-dirstat.sh        |  100 ++++++++++++++++++++++++++++++++++++++++
 5 files changed, 176 insertions(+), 2 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index c18dd5a..0cad75c 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -837,6 +837,14 @@ diff.dirstat::
 	the amount of pure code movements within a file.  In other words,
 	rearranging lines in a file is not counted as much as other changes.
 	This is the default behavior when no parameter is given.
+`lines`;;
+	Compute the dirstat numbers by doing the regular line-based diff
+	analysis, and summing the removed/added line counts. (For binary
+	files, count 64-byte chunks instead, since binary files have no
+	natural concept of lines). This is a more expensive `--dirstat`
+	behavior than the `changes` behavior, but it does count rearranged
+	lines within a file as much as other changes. The resulting output
+	is consistent with what you get from the other `--*stat` options.
 `files`;;
 	Compute the dirstat numbers by counting the number of files changed.
 	Each changed file counts equally in the dirstat analysis. This is
diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 4ad50b9..327d10a 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -81,6 +81,14 @@ endif::git-format-patch[]
 	the amount of pure code movements within a file.  In other words,
 	rearranging lines in a file is not counted as much as other changes.
 	This is the default behavior when no parameter is given.
+`lines`;;
+	Compute the dirstat numbers by doing the regular line-based diff
+	analysis, and summing the removed/added line counts. (For binary
+	files, count 64-byte chunks instead, since binary files have no
+	natural concept of lines). This is a more expensive `--dirstat`
+	behavior than the `changes` behavior, but it does count rearranged
+	lines within a file as much as other changes. The resulting output
+	is consistent with what you get from the other `--*stat` options.
 `files`;;
 	Compute the dirstat numbers by counting the number of files changed.
 	Each changed file counts equally in the dirstat analysis. This is
diff --git a/diff.c b/diff.c
index e0de4fa..8703763 100644
--- a/diff.c
+++ b/diff.c
@@ -73,9 +73,15 @@ static int parse_dirstat_params(struct diff_options *options, const char *params
 	while (*p) {
 		if (!prefixcmp(p, "changes")) {
 			p += 7;
+			DIFF_OPT_CLR(options, DIRSTAT_BY_LINE);
+			DIFF_OPT_CLR(options, DIRSTAT_BY_FILE);
+		} else if (!prefixcmp(p, "lines")) {
+			p += 5;
+			DIFF_OPT_SET(options, DIRSTAT_BY_LINE);
 			DIFF_OPT_CLR(options, DIRSTAT_BY_FILE);
 		} else if (!prefixcmp(p, "files")) {
 			p += 5;
+			DIFF_OPT_CLR(options, DIRSTAT_BY_LINE);
 			DIFF_OPT_SET(options, DIRSTAT_BY_FILE);
 		} else if (!prefixcmp(p, "noncumulative")) {
 			p += 13;
@@ -1669,6 +1675,50 @@ found_damage:
 	gather_dirstat(options, &dir, changed, "", 0);
 }
 
+static void show_dirstat_by_line(struct diffstat_t *data, struct diff_options *options)
+{
+	int i;
+	unsigned long changed;
+	struct dirstat_dir dir;
+
+	if (data->nr == 0)
+		return;
+
+	dir.files = NULL;
+	dir.alloc = 0;
+	dir.nr = 0;
+	dir.permille = options->dirstat_permille;
+	dir.cumulative = DIFF_OPT_TST(options, DIRSTAT_CUMULATIVE);
+
+	changed = 0;
+	for (i = 0; i < data->nr; i++) {
+		struct diffstat_file *file = data->files[i];
+		unsigned long damage = file->added + file->deleted;
+		if (file->is_binary)
+			/*
+			 * binary files counts bytes, not lines. Must find some
+			 * way to normalize binary bytes vs. textual lines.
+			 * The following heuristic assumes that there are 64
+			 * bytes per "line".
+			 * This is stupid and ugly, but very cheap...
+			 */
+			damage = (damage + 63) / 64;
+		ALLOC_GROW(dir.files, dir.nr + 1, dir.alloc);
+		dir.files[dir.nr].name = file->name;
+		dir.files[dir.nr].changed = damage;
+		changed += damage;
+		dir.nr++;
+	}
+
+	/* This can happen even with many files, if everything was renames */
+	if (!changed)
+		return;
+
+	/* Show all directories with more than x% of the changes */
+	qsort(dir.files, dir.nr, sizeof(dir.files[0]), dirstat_compare);
+	gather_dirstat(options, &dir, changed, "", 0);
+}
+
 static void free_diffstat_info(struct diffstat_t *diffstat)
 {
 	int i;
@@ -4058,6 +4108,7 @@ void diff_flush(struct diff_options *options)
 	struct diff_queue_struct *q = &diff_queued_diff;
 	int i, output_format = options->output_format;
 	int separator = 0;
+	int dirstat_by_line = 0;
 
 	/*
 	 * Order: raw, stat, summary, patch
@@ -4078,7 +4129,11 @@ void diff_flush(struct diff_options *options)
 		separator++;
 	}
 
-	if (output_format & (DIFF_FORMAT_DIFFSTAT|DIFF_FORMAT_SHORTSTAT|DIFF_FORMAT_NUMSTAT)) {
+	if (output_format & DIFF_FORMAT_DIRSTAT && DIFF_OPT_TST(options, DIRSTAT_BY_LINE))
+		dirstat_by_line = 1;
+
+	if (output_format & (DIFF_FORMAT_DIFFSTAT|DIFF_FORMAT_SHORTSTAT|DIFF_FORMAT_NUMSTAT) ||
+	    dirstat_by_line) {
 		struct diffstat_t diffstat;
 
 		memset(&diffstat, 0, sizeof(struct diffstat_t));
@@ -4093,10 +4148,12 @@ void diff_flush(struct diff_options *options)
 			show_stats(&diffstat, options);
 		if (output_format & DIFF_FORMAT_SHORTSTAT)
 			show_shortstats(&diffstat, options);
+		if (output_format & DIFF_FORMAT_DIRSTAT)
+			show_dirstat_by_line(&diffstat, options);
 		free_diffstat_info(&diffstat);
 		separator++;
 	}
-	if (output_format & DIFF_FORMAT_DIRSTAT)
+	if ((output_format & DIFF_FORMAT_DIRSTAT) && !dirstat_by_line)
 		show_dirstat(options);
 
 	if (output_format & DIFF_FORMAT_SUMMARY && !is_summary_empty(q)) {
diff --git a/diff.h b/diff.h
index 08b4fe0..1a8b685 100644
--- a/diff.h
+++ b/diff.h
@@ -78,6 +78,7 @@ typedef struct strbuf *(*diff_prefix_fn_t)(struct diff_options *opt, void *data)
 #define DIFF_OPT_IGNORE_UNTRACKED_IN_SUBMODULES (1 << 25)
 #define DIFF_OPT_IGNORE_DIRTY_SUBMODULES (1 << 26)
 #define DIFF_OPT_OVERRIDE_SUBMODULE_CONFIG (1 << 27)
+#define DIFF_OPT_DIRSTAT_BY_LINE     (1 << 28)
 
 #define DIFF_OPT_TST(opts, flag)    ((opts)->flags & DIFF_OPT_##flag)
 #define DIFF_OPT_SET(opts, flag)    ((opts)->flags |= DIFF_OPT_##flag)
diff --git a/t/t4047-diff-dirstat.sh b/t/t4047-diff-dirstat.sh
index b3062b4..8ca1d58 100755
--- a/t/t4047-diff-dirstat.sh
+++ b/t/t4047-diff-dirstat.sh
@@ -832,4 +832,104 @@ test_expect_success '--dirstat=files,cumulative,27.09' '
 	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
 '
 
+cat <<EOF >expect_diff_dirstat
+  10.6% dst/copy/changed/
+  10.6% dst/copy/rearranged/
+  10.6% dst/copy/unchanged/
+  10.6% dst/move/changed/
+  10.6% dst/move/rearranged/
+  10.6% dst/move/unchanged/
+  10.6% src/move/changed/
+  10.6% src/move/rearranged/
+  10.6% src/move/unchanged/
+EOF
+
+cat <<EOF >expect_diff_dirstat_M
+   5.2% changed/
+  26.3% dst/copy/changed/
+  26.3% dst/copy/rearranged/
+  26.3% dst/copy/unchanged/
+   5.2% dst/move/changed/
+   5.2% dst/move/rearranged/
+   5.2% rearranged/
+EOF
+
+cat <<EOF >expect_diff_dirstat_CC
+  16.6% changed/
+  16.6% dst/copy/changed/
+  16.6% dst/copy/rearranged/
+  16.6% dst/move/changed/
+  16.6% dst/move/rearranged/
+  16.6% rearranged/
+EOF
+
+test_expect_success '--dirstat=lines' '
+	git diff --dirstat=lines HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff --dirstat=lines -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff --dirstat=lines -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
+test_expect_success 'diff.dirstat=lines' '
+	git -c diff.dirstat=lines diff --dirstat HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git -c diff.dirstat=lines diff --dirstat -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git -c diff.dirstat=lines diff --dirstat -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
+cat <<EOF >expect_diff_dirstat
+   2.1% changed/
+  10.6% dst/copy/changed/
+  10.6% dst/copy/rearranged/
+  10.6% dst/copy/unchanged/
+  10.6% dst/move/changed/
+  10.6% dst/move/rearranged/
+  10.6% dst/move/unchanged/
+   2.1% rearranged/
+  10.6% src/move/changed/
+  10.6% src/move/rearranged/
+  10.6% src/move/unchanged/
+EOF
+
+cat <<EOF >expect_diff_dirstat_M
+   5.2% changed/
+  26.3% dst/copy/changed/
+  26.3% dst/copy/rearranged/
+  26.3% dst/copy/unchanged/
+   5.2% dst/move/changed/
+   5.2% dst/move/rearranged/
+   5.2% rearranged/
+EOF
+
+cat <<EOF >expect_diff_dirstat_CC
+  16.6% changed/
+  16.6% dst/copy/changed/
+  16.6% dst/copy/rearranged/
+  16.6% dst/move/changed/
+  16.6% dst/move/rearranged/
+  16.6% rearranged/
+EOF
+
+test_expect_success '--dirstat=lines,0' '
+	git diff --dirstat=lines,0 HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff --dirstat=lines,0 -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff --dirstat=lines,0 -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
+test_expect_success 'diff.dirstat=0,lines' '
+	git -c diff.dirstat=0,lines diff --dirstat HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git -c diff.dirstat=0,lines diff --dirstat -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git -c diff.dirstat=0,lines diff --dirstat -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
 test_done
-- 
1.7.5.rc1.3.g4d7b


* [PATCHv5 7/7] Improve error handling when parsing dirstat parameters
  2011-04-28  1:17                               ` [PATCHv5 0/7] --dirstat fixes, part 2 Johan Herland
                                                   ` (5 preceding siblings ...)
  2011-04-28  1:17                                 ` [PATCHv5 6/7] New --dirstat=lines mode, doing dirstat analysis based on diffstat Johan Herland
@ 2011-04-28  1:17                                 ` Johan Herland
  2011-04-28 18:41                                   ` Junio C Hamano
  6 siblings, 1 reply; 91+ messages in thread
From: Johan Herland @ 2011-04-28  1:17 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Linus Torvalds, Johan Herland

When encountering errors or unknown tokens while parsing parameters to the
--dirstat option, it makes sense to die() with an error message informing
the user of which parameter did not make sense. However, when parsing the
diff.dirstat config variable, we cannot simply die(), but should instead
(after warning the user) ignore the erroneous or unrecognized parameter.
After all, future Git versions might add more dirstat parameters, and
using two different Git versions on the same repo should not cripple the
older Git version just because of a parameter that is only understood by
a more recent Git version.

This patch fixes the issue by refactoring the dirstat parameter parsing
so that parse_dirstat_params() keeps on parsing parameters, even if an
earlier parameter was not recognized. When parsing has finished, it returns
zero if all parameters were successfully parsed, and non-zero if one or
more parameters were not recognized.

The parse_dirstat_params() callers then decide (based on the return value
from parse_dirstat_params()) whether to warn and ignore (in case of
diff.dirstat), or to warn and die (in case of --dirstat).
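
The "keep parsing, report at the end" idea can be sketched on its own as
follows. This is a simplified illustration under stated assumptions, not
the code in this patch: `struct dirstat_settings`, `token_is()` and
`parse_params()` are hypothetical names, the numeric cut-off is left out,
and each token is compared together with its length so that only exact
keywords are accepted.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical settings record; the real code keeps these as option flags. */
struct dirstat_settings {
	int by_line;
	int by_file;
	int cumulative;
};

/* Return nonzero when the token of length len is exactly the keyword. */
static int token_is(const char *tok, size_t len, const char *keyword)
{
	return len == strlen(keyword) && !memcmp(tok, keyword, len);
}

/*
 * Parse a comma-separated parameter list. An unknown token is reported
 * and skipped, and parsing continues with the next token. Returns 0 if
 * every token was recognized, -1 otherwise.
 */
static int parse_params(struct dirstat_settings *s, const char *params)
{
	const char *p = params;
	int ret = 0;

	while (*p) {
		size_t len = strcspn(p, ","); /* up to next comma or end */

		if (token_is(p, len, "changes")) {
			s->by_line = 0;
			s->by_file = 0;
		} else if (token_is(p, len, "lines")) {
			s->by_line = 1;
			s->by_file = 0;
		} else if (token_is(p, len, "files")) {
			s->by_line = 0;
			s->by_file = 1;
		} else if (token_is(p, len, "noncumulative")) {
			s->cumulative = 0;
		} else if (token_is(p, len, "cumulative")) {
			s->cumulative = 1;
		} else {
			fprintf(stderr, "unknown dirstat parameter '%.*s'\n",
				(int)len, p);
			ret = -1;
		}

		p += len;
		if (*p)
			p++; /* more parameters, swallow separator */
	}
	return ret;
}
```

A caller handling a config variable would treat a non-zero return as a
reason to warn and carry on with whatever settings were recognized, while
a caller handling the command-line option would stop instead.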

The patch also adds a couple of tests verifying the correct behavior of
--dirstat and diff.dirstat in the face of unknown (possibly future) dirstat
parameters.

Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Johan Herland <johan@herland.net>
---
 diff.c                  |   52 ++++++++++++++++++++++------------------------
 t/t4047-diff-dirstat.sh |   30 +++++++++++++++++++++++++++
 2 files changed, 55 insertions(+), 27 deletions(-)

diff --git a/diff.c b/diff.c
index 8703763..1ce21f1 100644
--- a/diff.c
+++ b/diff.c
@@ -70,49 +70,46 @@ static int parse_diff_color_slot(const char *var, int ofs)
 static int parse_dirstat_params(struct diff_options *options, const char *params)
 {
 	const char *p = params;
+	int p_len, ret = 0;
+
 	while (*p) {
-		if (!prefixcmp(p, "changes")) {
-			p += 7;
+		p_len = strchrnul(p, ',') - p;
+		if (!memcmp(p, "changes", p_len)) {
 			DIFF_OPT_CLR(options, DIRSTAT_BY_LINE);
 			DIFF_OPT_CLR(options, DIRSTAT_BY_FILE);
-		} else if (!prefixcmp(p, "lines")) {
-			p += 5;
+		} else if (!memcmp(p, "lines", p_len)) {
 			DIFF_OPT_SET(options, DIRSTAT_BY_LINE);
 			DIFF_OPT_CLR(options, DIRSTAT_BY_FILE);
-		} else if (!prefixcmp(p, "files")) {
-			p += 5;
+		} else if (!memcmp(p, "files", p_len)) {
 			DIFF_OPT_CLR(options, DIRSTAT_BY_LINE);
 			DIFF_OPT_SET(options, DIRSTAT_BY_FILE);
-		} else if (!prefixcmp(p, "noncumulative")) {
-			p += 13;
+		} else if (!memcmp(p, "noncumulative", p_len)) {
 			DIFF_OPT_CLR(options, DIRSTAT_CUMULATIVE);
-		} else if (!prefixcmp(p, "cumulative")) {
-			p += 10;
+		} else if (!memcmp(p, "cumulative", p_len)) {
 			DIFF_OPT_SET(options, DIRSTAT_CUMULATIVE);
 		} else if (isdigit(*p)) {
 			char *end;
-			options->dirstat_permille = strtoul(p, &end, 10) * 10;
-			p = end;
-			if (*p == '.' && isdigit(*++p)) {
+			int permille = strtoul(p, &end, 10) * 10;
+			if (*end == '.' && isdigit(*++end)) {
 				/* only use first digit */
-				options->dirstat_permille += *p - '0';
+				permille += *end - '0';
 				/* .. and ignore any further digits */
-				while (isdigit(*++p))
+				while (isdigit(*++end))
 					; /* nothing */
 			}
+			if (end - p == p_len)
+				options->dirstat_permille = permille;
+			else
+				ret = error("Failed to parse dirstat cut-off percentage '%.*s'", p_len, p);
 		} else
-			return error("Unknown --dirstat parameter '%s'", p);
-
-		if (*p) {
-			/* more parameters, swallow separator */
-			if (*p != ',')
-				return error("Missing comma separator at char "
-					"%"PRIuMAX" of '%s'",
-					(uintmax_t) (p - params), params);
-			p++;
-		}
+			ret = error("Unknown dirstat parameter '%.*s'", p_len, p);
+
+		p += p_len;
+
+		if (*p)
+			p++; /* more parameters, swallow separator */
 	}
-	return 0;
+	return ret;
 }
 
 static int git_config_rename(const char *var, const char *value)
@@ -196,7 +193,8 @@ int git_diff_basic_config(const char *var, const char *value, void *cb)
 
 	if (!strcmp(var, "diff.dirstat")) {
 		default_diff_options.dirstat_permille = diff_dirstat_permille_default;
-		(void) parse_dirstat_params(&default_diff_options, value);
+		if (parse_dirstat_params(&default_diff_options, value))
+			warning("Found errors in 'diff.dirstat' config variable");
 		diff_dirstat_permille_default = default_diff_options.dirstat_permille;
 		return 0;
 	}
diff --git a/t/t4047-diff-dirstat.sh b/t/t4047-diff-dirstat.sh
index 8ca1d58..20a59ac 100755
--- a/t/t4047-diff-dirstat.sh
+++ b/t/t4047-diff-dirstat.sh
@@ -932,4 +932,34 @@ test_expect_success 'diff.dirstat=0,lines' '
 	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
 '
 
+test_expect_success '--dirstat=future_param,lines,0 should fail loudly' '
+	test_must_fail git diff --dirstat=future_param,lines,0 HEAD^..HEAD >actual_diff_dirstat 2>actual_error &&
+	test_cmp /dev/null actual_diff_dirstat &&
+	grep -q "future_param" actual_error &&
+	grep -q "\--dirstat" actual_error &&
+	test_must_fail git diff --dirstat=future_param,lines,0 -M HEAD^..HEAD >actual_diff_dirstat_M 2>actual_error &&
+	test_cmp /dev/null actual_diff_dirstat_M &&
+	grep -q "future_param" actual_error &&
+	grep -q "\--dirstat" actual_error &&
+	test_must_fail git diff --dirstat=future_param,lines,0 -C -C HEAD^..HEAD >actual_diff_dirstat_CC 2>actual_error &&
+	test_cmp /dev/null actual_diff_dirstat_CC &&
+	grep -q "future_param" actual_error &&
+	grep -q "\--dirstat" actual_error
+'
+
+test_expect_success 'diff.dirstat=future_param,0,lines should warn, but still work' '
+	git -c diff.dirstat=future_param,0,lines diff --dirstat HEAD^..HEAD >actual_diff_dirstat 2>actual_error &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	grep -q "future_param" actual_error &&
+	grep -q "diff.dirstat" actual_error &&
+	git -c diff.dirstat=future_param,0,lines diff --dirstat -M HEAD^..HEAD >actual_diff_dirstat_M 2>actual_error &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	grep -q "future_param" actual_error &&
+	grep -q "diff.dirstat" actual_error &&
+	git -c diff.dirstat=future_param,0,lines diff --dirstat -C -C HEAD^..HEAD >actual_diff_dirstat_CC 2>actual_error &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC &&
+	grep -q "future_param" actual_error &&
+	grep -q "diff.dirstat" actual_error
+'
+
 test_done
-- 
1.7.5.rc1.3.g4d7b


* Re: [PATCHv5 7/7] Improve error handling when parsing dirstat parameters
  2011-04-28  1:17                                 ` [PATCHv5 7/7] Improve error handling when parsing dirstat parameters Johan Herland
@ 2011-04-28 18:41                                   ` Junio C Hamano
  2011-04-28 19:20                                     ` Junio C Hamano
  2011-04-28 23:13                                     ` Johan Herland
  0 siblings, 2 replies; 91+ messages in thread
From: Junio C Hamano @ 2011-04-28 18:41 UTC (permalink / raw)
  To: Johan Herland; +Cc: git, Linus Torvalds

Johan Herland <johan@herland.net> writes:

> ...
> The patch also adds a couple of tests verifying the correct behavior of
> --dirstat and diff.dirstat in the face of unknown (possibly future) dirstat
> parameters.

Thanks.  Patches 1-6/7 look much better.

When writing a shiny new feature, people tend to test only the cases they
expect to work, leaving the cases that should error out unspecified,
leading to future confusion.  Negative tests to specify and guard error
behaviour are very important, and I like this 7/7 very much.

Having said that, you might want to add tests for parsing --dirstat/-X
options themselves for the same reason.  I think you had troubles with -X3
(the first round), --dirstat40 (the third round), and possibly -X=3; they
could have been avoided if you had such tests.  They probably should be
added to 1/7.

> diff --git a/t/t4047-diff-dirstat.sh b/t/t4047-diff-dirstat.sh
> index 8ca1d58..20a59ac 100755
> --- a/t/t4047-diff-dirstat.sh
> +++ b/t/t4047-diff-dirstat.sh
> @@ -932,4 +932,34 @@ test_expect_success 'diff.dirstat=0,lines' '
>  	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
>  '
>  
> +test_expect_success '--dirstat=future_param,lines,0 should fail loudly' '
> +	test_must_fail git diff --dirstat=future_param,lines,0 HEAD^..HEAD >actual_diff_dirstat 2>actual_error &&
> +	test_cmp /dev/null actual_diff_dirstat &&
> +	grep -q "future_param" actual_error &&
> +	grep -q "\--dirstat" actual_error &&
> +	test_must_fail git diff --dirstat=future_param,lines,0 -M HEAD^..HEAD >actual_diff_dirstat_M 2>actual_error &&
> +	test_cmp /dev/null actual_diff_dirstat_M &&
> +	grep -q "future_param" actual_error &&
> +	grep -q "\--dirstat" actual_error &&
> +	test_must_fail git diff --dirstat=future_param,lines,0 -C -C HEAD^..HEAD >actual_diff_dirstat_CC 2>actual_error &&
> +	test_cmp /dev/null actual_diff_dirstat_CC &&
> +	grep -q "future_param" actual_error &&
> +	grep -q "\--dirstat" actual_error
> +'

I am not sure if three combinations (vanilla, -M and -C -C) need to be
tested to produce an empty result.  If so, it would make it easier to read
if they are split into three tests, or at least have a blank line between
them, but I suspect that you would agree that it is not worth having
three separate test_expect_success for these.

I also wanted to see the error output.  How about adding:

	test_debug "cat actual_error" &&

immediately after invocation of "git diff"?

The error output shows "error:" followed by "warning:", which looked
somewhat questionable.  Perhaps allow a pointer to a structure to be passed
in to describe the nature of a breakage to parse_dirstat_params()?

Telling "grep" that the pattern string is not an option by quoting the
first dash (i.e. "\--dirstat") is clever, and it is more portable than
using an explicit "-e" to accommodate ancient implementations of grep.

	Side note: we seem to already use "grep -e" in some other tests
	(2200, 2204 and 5540).  We probably should get rid of -e from
	these places.

> +test_expect_success 'diff.dirstat=future_param,0,lines should warn, but still work' '
> +	git -c diff.dirstat=future_param,0,lines diff --dirstat HEAD^..HEAD >actual_diff_dirstat 2>actual_error &&
> +	test_cmp expect_diff_dirstat actual_diff_dirstat &&
> +	grep -q "future_param" actual_error &&
> +	grep -q "diff.dirstat" actual_error &&

This should avoid matching "." with anything, i.e.

	grep -q "diff\\.dirstat" actual_error &&
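
The difference the escaped dot makes can be checked with the POSIX regex
API, which uses the same basic regular expressions that grep does by
default. This is a small sketch for illustration only; `bre_matches()` is
a hypothetical helper and is not part of the test suite.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Return 1 if the POSIX basic regular expression `pattern` matches
 * somewhere in `text`, 0 if it does not, and -1 if the pattern fails
 * to compile.
 */
static int bre_matches(const char *pattern, const char *text)
{
	regex_t re;
	int rc;

	if (regcomp(&re, pattern, REG_NOSUB))
		return -1;
	rc = regexec(&re, text, 0, NULL, 0);
	regfree(&re);
	return rc == 0;
}
```

With the unescaped pattern "diff.dirstat", a string such as
"diffXdirstat" matches as well, because "." stands for any character.
With "diff\\.dirstat" (a C string holding a backslash followed by a dot),
only a literal "diff.dirstat" matches.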

 t/t4047-diff-dirstat.sh |   25 ++++++++-----------------
 1 files changed, 8 insertions(+), 17 deletions(-)

diff --git a/t/t4047-diff-dirstat.sh b/t/t4047-diff-dirstat.sh
index 20a59ac..0942bdb 100755
--- a/t/t4047-diff-dirstat.sh
+++ b/t/t4047-diff-dirstat.sh
@@ -934,32 +934,23 @@ test_expect_success 'diff.dirstat=0,lines' '
 
 test_expect_success '--dirstat=future_param,lines,0 should fail loudly' '
 	test_must_fail git diff --dirstat=future_param,lines,0 HEAD^..HEAD >actual_diff_dirstat 2>actual_error &&
+	test_debug "cat actual_error" &&
 	test_cmp /dev/null actual_diff_dirstat &&
 	grep -q "future_param" actual_error &&
-	grep -q "\--dirstat" actual_error &&
-	test_must_fail git diff --dirstat=future_param,lines,0 -M HEAD^..HEAD >actual_diff_dirstat_M 2>actual_error &&
-	test_cmp /dev/null actual_diff_dirstat_M &&
-	grep -q "future_param" actual_error &&
-	grep -q "\--dirstat" actual_error &&
-	test_must_fail git diff --dirstat=future_param,lines,0 -C -C HEAD^..HEAD >actual_diff_dirstat_CC 2>actual_error &&
-	test_cmp /dev/null actual_diff_dirstat_CC &&
-	grep -q "future_param" actual_error &&
 	grep -q "\--dirstat" actual_error
 '
 
 test_expect_success 'diff.dirstat=future_param,0,lines should warn, but still work' '
 	git -c diff.dirstat=future_param,0,lines diff --dirstat HEAD^..HEAD >actual_diff_dirstat 2>actual_error &&
+	test_debug "cat actual_error" &&
 	test_cmp expect_diff_dirstat actual_diff_dirstat &&
 	grep -q "future_param" actual_error &&
-	grep -q "diff.dirstat" actual_error &&
-	git -c diff.dirstat=future_param,0,lines diff --dirstat -M HEAD^..HEAD >actual_diff_dirstat_M 2>actual_error &&
-	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
-	grep -q "future_param" actual_error &&
-	grep -q "diff.dirstat" actual_error &&
-	git -c diff.dirstat=future_param,0,lines diff --dirstat -C -C HEAD^..HEAD >actual_diff_dirstat_CC 2>actual_error &&
-	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC &&
-	grep -q "future_param" actual_error &&
-	grep -q "diff.dirstat" actual_error
+	grep -q "diff\\.dirstat" actual_error
+'
+
+test_expect_success 'various ways to misspell --dirstat' '
+	test_must_fail git show --dirstat10,files &&
+	test_must_fail git show -X=20
 '
 
 test_done


* Re: [PATCHv5 7/7] Improve error handling when parsing dirstat parameters
  2011-04-28 18:41                                   ` Junio C Hamano
@ 2011-04-28 19:20                                     ` Junio C Hamano
  2011-04-28 23:16                                       ` Johan Herland
  2011-04-28 23:13                                     ` Johan Herland
  1 sibling, 1 reply; 91+ messages in thread
From: Junio C Hamano @ 2011-04-28 19:20 UTC (permalink / raw)
  To: Johan Herland; +Cc: git, Linus Torvalds

Junio C Hamano <gitster@pobox.com> writes:

> I am not sure if three combinations (vanilla, -M and -C -C) need to be
> tested to produce an empty result.  If so, it would make it easier to read
> if they are split into three tests, or at least have a blank line between
> them, but I suspect that you would agree that it is not worth having
> three separate test_expect_success for these.

I think it makes sense to cull these three cases into one for the case we
expect the command to stop without doing anything, but we would still want
to validate the output for three variants in the "config" case.

Also I forgot to say that the new "grep" invocations added to check the
error output might have to be test_i18ngrep.  Please check with

    make GETTEXT_POISON=YesPlease test

The configuration variable names and typo in user input should appear
somewhere in the output for any real locale, but I think gettext-poison
would throw these away.

By the way, should the following two entries make any difference, and if
so how?

	[diff]
		dirstat = unknown,0,lines
		dirstat = 0,lines,unknown


* Re: [PATCHv5 7/7] Improve error handling when parsing dirstat parameters
  2011-04-28 18:41                                   ` Junio C Hamano
  2011-04-28 19:20                                     ` Junio C Hamano
@ 2011-04-28 23:13                                     ` Johan Herland
  2011-04-29  4:06                                       ` Junio C Hamano
  1 sibling, 1 reply; 91+ messages in thread
From: Johan Herland @ 2011-04-28 23:13 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Linus Torvalds

(I've snipped the comments that I don't dispute or need to comment on)

On Thursday 28 April 2011, Junio C Hamano wrote:
> The error output shows "error:" followed by "warning:", which looked
> somewhat questionable.  Perhaps allow a pointer to a structure to be passed
> in to describe the nature of a breakage to parse_dirstat_params()?

Not sure what you mean here. You want the caller to supply a string_list, to 
which parse_dirstat_params() appends error messages, and then the caller 
determines how to display those error messages to the user after 
parse_dirstat_params() has returned?

I'd rather go the simpler way, and simply turn the first "error:" into a 
"warning:". In other words, parse_dirstat_params() should only output 
"warning:", and then it's up to the caller whether to follow up with another 
"warning:" (in the diff.dirstat case), or a "fatal:" (in the --dirstat 
case).


Otherwise, I agree with all your comments, and the next re-roll will be 
updated accordingly.


...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net


* Re: [PATCHv5 7/7] Improve error handling when parsing dirstat parameters
  2011-04-28 19:20                                     ` Junio C Hamano
@ 2011-04-28 23:16                                       ` Johan Herland
  0 siblings, 0 replies; 91+ messages in thread
From: Johan Herland @ 2011-04-28 23:16 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Linus Torvalds

On Thursday 28 April 2011, Junio C Hamano wrote:
> Junio C Hamano <gitster@pobox.com> writes:
> > I am not sure if three combinations (vanilla, -M and -C -C) need to be
> > tested to produce an empty result.  If so, it would make it easier to
> > read if they are split into three tests, or at least have a blank line
> > between them, but I suspect that you would agree that it is not worth
> > having three separate test_expect_success for these.
> 
> I think it makes sense to cull these three cases into one for the case we
> expect the command to stop without doing anything, but we would still
> want to validate the output for three variants in the "config" case.

Agreed.

> Also I forgot to say that the new "grep" invocations added to check the
> error output might have to be test_i18ngrep.  Please check with
> 
>     make GETTEXT_POISON=YesPlease test
> 
> The configuration variable names and typo in user input should appear
> somewhere in the output for any real locale, but I think gettext-poison
> would throw these away.

Hmm. Not exactly sure how this is supposed to work. I ran the above command 
(after a test merge with 'pu' to get GETTEXT_POISON in my working tree), and 
it succeeded. But then, I have not marked my added strings for translation 
with "_()". Should I? AFAICS no other strings in diff.c are marked for 
translation either...

> By the way, should the following two entries make any difference, and if
> so how?
> 
> 	[diff]
>           dirstat = unknown,0,lines
>           dirstat = 0,lines,unknown

No difference. The "rules" that apply in this case are:
- Tokens are separated by commas
- Unrecognized tokens are ignored

This is fundamentally what 7/7 tries to accomplish.
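
A minimal sketch of those two rules, for illustration only. This is not
the code in 7/7: 'struct dirstat_cfg', token_is() and parse_lenient() are
hypothetical names, and only the "lines" token and a numeric limit are
recognized here.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the dirstat bits of struct diff_options. */
struct dirstat_cfg {
	int by_lines;	/* set by the "lines" token */
	int percent;	/* cut-off percentage */
};

/* Return 1 if the token p[0..len) is exactly the given word. */
static int token_is(const char *p, size_t len, const char *word)
{
	return strlen(word) == len && !memcmp(p, word, len);
}

/* Split on commas; apply known tokens, silently skip everything else. */
static void parse_lenient(struct dirstat_cfg *cfg, const char *params)
{
	const char *p = params;

	assert(cfg && params);
	while (*p) {
		size_t len = strcspn(p, ",");	/* length of current token */

		if (token_is(p, len, "lines")) {
			cfg->by_lines = 1;
		} else if (isdigit((unsigned char)*p)) {
			char *end;
			long v = strtol(p, &end, 10);
			/* accept only if the whole token is numeric */
			if ((size_t)(end - p) == len)
				cfg->percent = (int)v;
		}
		/* any other token is ignored */

		p += len;
		if (*p == ',')
			p++;	/* swallow the separator */
	}
}
```

With this sketch, "unknown,0,lines" and "0,lines,unknown" leave cfg in the
same state, because each token is handled on its own and the unknown one
has no effect wherever it appears.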


...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net


* Re: [PATCHv5 7/7] Improve error handling when parsing dirstat parameters
  2011-04-28 23:13                                     ` Johan Herland
@ 2011-04-29  4:06                                       ` Junio C Hamano
  2011-04-29  9:36                                         ` [PATCHv6 0/8] --dirstat fixes, part 2 Johan Herland
  0 siblings, 1 reply; 91+ messages in thread
From: Junio C Hamano @ 2011-04-29  4:06 UTC (permalink / raw)
  To: Johan Herland; +Cc: git, Linus Torvalds

Johan Herland <johan@herland.net> writes:

> Not sure what you mean here. You want the caller to supply a
> string_list, to which parse_dirstat_params() appends error messages, and
> then the caller determines how to display those error messages to the
> user after parse_dirstat_params() has returned?

A rough outline of what I had in mind was:

	struct dirstat_param_error {
        	enum {
                	ERR_DIRSTAT_PERCENT = 1,
                        ERR_DIRSTAT_UNKNOWN
		} kind;
                struct strbuf msg;
	};

	static int parse_dirstat_params(struct diff_options *options,
		        	const char *params,
				struct dirstat_param_error *errinfo)
	{
		while (...) {
                        ...
                        else if (isdigit(*p)) {
                                ...
                                if (end - p == p_len)
                                        options->dirstat_permille = permille;
                                else {
                                        errinfo->kind = ERR_DIRSTAT_PERCENT;
                                        strbuf_add(&errinfo->msg, p, p_len);
                                        ret = -1;
                                }
                        } else {
                                errinfo->kind = ERR_DIRSTAT_UNKNOWN;
                                strbuf_add(&errinfo->msg, p, p_len);
                                ret = -1;
                        }
                        p += p_len;
                        if (*p)
                                p++;
                }
                return ret;
	}

and then the caller can extract the information to format.

But you produce more than one error message, so a single errinfo
approach would not work.  Instead, we should be able to pass in the
pointer to a single strbuf errmsg, and accumulate the errors in it by
calling strbuf_addf() for the same effect.  The format string given to
strbuf_addf() may probably need to be marked with _().

The caller can then check errmsg->len to see if there was an error.
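
A self-contained sketch of that accumulate-then-check pattern, under loud
assumptions: 'struct errbuf' and errbuf_addf() are hypothetical fixed-size
stand-ins for git's strbuf and strbuf_addf(), and check_params() knows
only the "files" and "cumulative" tokens.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical fixed-size stand-in for git's struct strbuf. */
struct errbuf {
	char buf[256];
	size_t len;
};

/* Append a formatted message; truncate if the buffer runs out. */
static void errbuf_addf(struct errbuf *eb, const char *fmt, ...)
{
	size_t room = sizeof(eb->buf) - eb->len;
	va_list ap;
	int n;

	va_start(ap, fmt);
	n = vsnprintf(eb->buf + eb->len, room, fmt, ap);
	va_end(ap);
	if (n < 0)
		return;
	eb->len += ((size_t)n < room) ? (size_t)n : room - 1;
}

/* Keep parsing after an error so that every bad token gets reported. */
static int check_params(const char *params, struct errbuf *errmsg)
{
	const char *p = params;
	int ret = 0;

	assert(params && errmsg);
	while (*p) {
		size_t len = strcspn(p, ",");

		if (!(len == 5 && !memcmp(p, "files", 5)) &&
		    !(len == 10 && !memcmp(p, "cumulative", 10))) {
			errbuf_addf(errmsg,
				    "  Unknown dirstat parameter '%.*s'\n",
				    (int)len, p);
			ret = -1;
		}
		p += len;
		if (*p == ',')
			p++;
	}
	return ret;
}
```

The caller decides how to present errmsg->buf (warning or fatal error),
and a zero errmsg->len means that nothing went wrong.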


* [PATCHv6 0/8] --dirstat fixes, part 2
  2011-04-29  4:06                                       ` Junio C Hamano
@ 2011-04-29  9:36                                         ` Johan Herland
  2011-04-29  9:36                                           ` [PATCHv6 1/8] Add several testcases for --dirstat and friends Johan Herland
                                                             ` (7 more replies)
  0 siblings, 8 replies; 91+ messages in thread
From: Johan Herland @ 2011-04-29  9:36 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Linus Torvalds, Johan Herland

Hi,

Here's version 6 incorporating Junio's feedback regarding tests,
error messages, and i18n. The following patches have changed:

 - 1/8: Added test: 'various ways to misspell --dirstat'

 - 3/8: Added a couple more misspellings to the above test

 - 7/8: Improved error message handling, and tests covering error messages

 - 8/8: New patch marking dirstat error messages for translation.

The new 8/8 patch WILL NOT COMPILE until the topic branch is merged with
an i18n/gettext-aware branch (i.e. where _() and test_i18ngrep are present).


Have fun! :)

...Johan

Johan Herland (8):
  Add several testcases for --dirstat and friends
  Make --dirstat=0 output directories that contribute < 0.1% of changes
  Refactor --dirstat parsing; deprecate --cumulative and --dirstat-by-file
  Add config variable for specifying default --dirstat behavior
  Allow specifying --dirstat cut-off percentage as a floating point number
  New --dirstat=lines mode, doing dirstat analysis based on diffstat
  Improve error handling when parsing dirstat parameters
  Mark dirstat error messages for translation

 Documentation/config.txt       |   44 ++
 Documentation/diff-options.txt |   54 ++-
 diff.c                         |  171 +++++++-
 diff.h                         |    3 +-
 t/t4047-diff-dirstat.sh        |  979 ++++++++++++++++++++++++++++++++++++++++
 5 files changed, 1219 insertions(+), 32 deletions(-)
 create mode 100755 t/t4047-diff-dirstat.sh

-- 
1.7.5.rc1.3.g4d7b


* [PATCHv6 1/8] Add several testcases for --dirstat and friends
  2011-04-29  9:36                                         ` [PATCHv6 0/8] --dirstat fixes, part 2 Johan Herland
@ 2011-04-29  9:36                                           ` Johan Herland
  2011-04-29  9:36                                           ` [PATCHv6 2/8] Make --dirstat=0 output directories that contribute < 0.1% of changes Johan Herland
                                                             ` (6 subsequent siblings)
  7 siblings, 0 replies; 91+ messages in thread
From: Johan Herland @ 2011-04-29  9:36 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Linus Torvalds, Johan Herland

Currently, t4013 is the only selftest that exercises the --dirstat machinery,
but it only does a superficial verification of --dirstat's output.

This patch adds a new selftest - t4047-diff-dirstat.sh - which prepares a
commit containing:
 - unchanged files, changed files and files with rearranged lines
 - copied files, moved files, and unmoved files

It then verifies the correct dirstat output for that commit in the following
dirstat modes:
 - --dirstat
 - -X
 - --dirstat=0
 - -X0
 - --cumulative
 - --dirstat-by-file
 - (plus combinations of the above)

Each of the above tests is also run with:
 - no rename detection
 - rename detection (-M)
 - expensive copy detection (-C -C)

Improved-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Johan Herland <johan@herland.net>
---
 t/t4047-diff-dirstat.sh |  585 +++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 585 insertions(+), 0 deletions(-)
 create mode 100755 t/t4047-diff-dirstat.sh

diff --git a/t/t4047-diff-dirstat.sh b/t/t4047-diff-dirstat.sh
new file mode 100755
index 0000000..ce7c403
--- /dev/null
+++ b/t/t4047-diff-dirstat.sh
@@ -0,0 +1,585 @@
+#!/bin/sh
+
+test_description='diff --dirstat tests'
+. ./test-lib.sh
+
+# set up two commits where the second commit has these files
+# (10 lines in each file):
+#
+#   unchanged/text           (unchanged from 1st commit)
+#   changed/text             (changed 1st line)
+#   rearranged/text          (swapped 1st and 2nd line)
+#   dst/copy/unchanged/text  (copied from src/copy/unchanged/text, unchanged)
+#   dst/copy/changed/text    (copied from src/copy/changed/text, changed)
+#   dst/copy/rearranged/text (copied from src/copy/rearranged/text, rearranged)
+#   dst/move/unchanged/text  (moved from src/move/unchanged/text, unchanged)
+#   dst/move/changed/text    (moved from src/move/changed/text, changed)
+#   dst/move/rearranged/text (moved from src/move/rearranged/text, rearranged)
+
+test_expect_success 'setup' '
+	mkdir unchanged &&
+	mkdir changed &&
+	mkdir rearranged &&
+	mkdir src &&
+	mkdir src/copy &&
+	mkdir src/copy/unchanged &&
+	mkdir src/copy/changed &&
+	mkdir src/copy/rearranged &&
+	mkdir src/move &&
+	mkdir src/move/unchanged &&
+	mkdir src/move/changed &&
+	mkdir src/move/rearranged &&
+	cat <<EOF >unchanged/text &&
+unchanged       line #0
+unchanged       line #1
+unchanged       line #2
+unchanged       line #3
+unchanged       line #4
+unchanged       line #5
+unchanged       line #6
+unchanged       line #7
+unchanged       line #8
+unchanged       line #9
+EOF
+	cat <<EOF >changed/text &&
+changed         line #0
+changed         line #1
+changed         line #2
+changed         line #3
+changed         line #4
+changed         line #5
+changed         line #6
+changed         line #7
+changed         line #8
+changed         line #9
+EOF
+	cat <<EOF >rearranged/text &&
+rearranged      line #0
+rearranged      line #1
+rearranged      line #2
+rearranged      line #3
+rearranged      line #4
+rearranged      line #5
+rearranged      line #6
+rearranged      line #7
+rearranged      line #8
+rearranged      line #9
+EOF
+	cat <<EOF >src/copy/unchanged/text &&
+copy  unchanged line #0
+copy  unchanged line #1
+copy  unchanged line #2
+copy  unchanged line #3
+copy  unchanged line #4
+copy  unchanged line #5
+copy  unchanged line #6
+copy  unchanged line #7
+copy  unchanged line #8
+copy  unchanged line #9
+EOF
+	cat <<EOF >src/copy/changed/text &&
+copy    changed line #0
+copy    changed line #1
+copy    changed line #2
+copy    changed line #3
+copy    changed line #4
+copy    changed line #5
+copy    changed line #6
+copy    changed line #7
+copy    changed line #8
+copy    changed line #9
+EOF
+	cat <<EOF >src/copy/rearranged/text &&
+copy rearranged line #0
+copy rearranged line #1
+copy rearranged line #2
+copy rearranged line #3
+copy rearranged line #4
+copy rearranged line #5
+copy rearranged line #6
+copy rearranged line #7
+copy rearranged line #8
+copy rearranged line #9
+EOF
+	cat <<EOF >src/move/unchanged/text &&
+move  unchanged line #0
+move  unchanged line #1
+move  unchanged line #2
+move  unchanged line #3
+move  unchanged line #4
+move  unchanged line #5
+move  unchanged line #6
+move  unchanged line #7
+move  unchanged line #8
+move  unchanged line #9
+EOF
+	cat <<EOF >src/move/changed/text &&
+move    changed line #0
+move    changed line #1
+move    changed line #2
+move    changed line #3
+move    changed line #4
+move    changed line #5
+move    changed line #6
+move    changed line #7
+move    changed line #8
+move    changed line #9
+EOF
+	cat <<EOF >src/move/rearranged/text &&
+move rearranged line #0
+move rearranged line #1
+move rearranged line #2
+move rearranged line #3
+move rearranged line #4
+move rearranged line #5
+move rearranged line #6
+move rearranged line #7
+move rearranged line #8
+move rearranged line #9
+EOF
+	git add . &&
+	git commit -m "initial" &&
+	mkdir dst &&
+	mkdir dst/copy &&
+	mkdir dst/copy/unchanged &&
+	mkdir dst/copy/changed &&
+	mkdir dst/copy/rearranged &&
+	mkdir dst/move &&
+	mkdir dst/move/unchanged &&
+	mkdir dst/move/changed &&
+	mkdir dst/move/rearranged &&
+	cat <<EOF >changed/text &&
+CHANGED XXXXXXX line #0
+changed         line #1
+changed         line #2
+changed         line #3
+changed         line #4
+changed         line #5
+changed         line #6
+changed         line #7
+changed         line #8
+changed         line #9
+EOF
+	cat <<EOF >rearranged/text &&
+rearranged      line #1
+rearranged      line #0
+rearranged      line #2
+rearranged      line #3
+rearranged      line #4
+rearranged      line #5
+rearranged      line #6
+rearranged      line #7
+rearranged      line #8
+rearranged      line #9
+EOF
+	cat <<EOF >dst/copy/unchanged/text &&
+copy  unchanged line #0
+copy  unchanged line #1
+copy  unchanged line #2
+copy  unchanged line #3
+copy  unchanged line #4
+copy  unchanged line #5
+copy  unchanged line #6
+copy  unchanged line #7
+copy  unchanged line #8
+copy  unchanged line #9
+EOF
+	cat <<EOF >dst/copy/changed/text &&
+copy XXXCHANGED line #0
+copy    changed line #1
+copy    changed line #2
+copy    changed line #3
+copy    changed line #4
+copy    changed line #5
+copy    changed line #6
+copy    changed line #7
+copy    changed line #8
+copy    changed line #9
+EOF
+	cat <<EOF >dst/copy/rearranged/text &&
+copy rearranged line #1
+copy rearranged line #0
+copy rearranged line #2
+copy rearranged line #3
+copy rearranged line #4
+copy rearranged line #5
+copy rearranged line #6
+copy rearranged line #7
+copy rearranged line #8
+copy rearranged line #9
+EOF
+	cat <<EOF >dst/move/unchanged/text &&
+move  unchanged line #0
+move  unchanged line #1
+move  unchanged line #2
+move  unchanged line #3
+move  unchanged line #4
+move  unchanged line #5
+move  unchanged line #6
+move  unchanged line #7
+move  unchanged line #8
+move  unchanged line #9
+EOF
+	cat <<EOF >dst/move/changed/text &&
+move XXXCHANGED line #0
+move    changed line #1
+move    changed line #2
+move    changed line #3
+move    changed line #4
+move    changed line #5
+move    changed line #6
+move    changed line #7
+move    changed line #8
+move    changed line #9
+EOF
+	cat <<EOF >dst/move/rearranged/text &&
+move rearranged line #1
+move rearranged line #0
+move rearranged line #2
+move rearranged line #3
+move rearranged line #4
+move rearranged line #5
+move rearranged line #6
+move rearranged line #7
+move rearranged line #8
+move rearranged line #9
+EOF
+	git add . &&
+	git rm -r src/move/unchanged &&
+	git rm -r src/move/changed &&
+	git rm -r src/move/rearranged &&
+	git commit -m "changes"
+'
+
+cat <<EOF >expect_diff_stat
+ changed/text             |    2 +-
+ dst/copy/changed/text    |   10 ++++++++++
+ dst/copy/rearranged/text |   10 ++++++++++
+ dst/copy/unchanged/text  |   10 ++++++++++
+ dst/move/changed/text    |   10 ++++++++++
+ dst/move/rearranged/text |   10 ++++++++++
+ dst/move/unchanged/text  |   10 ++++++++++
+ rearranged/text          |    2 +-
+ src/move/changed/text    |   10 ----------
+ src/move/rearranged/text |   10 ----------
+ src/move/unchanged/text  |   10 ----------
+ 11 files changed, 62 insertions(+), 32 deletions(-)
+EOF
+
+cat <<EOF >expect_diff_stat_M
+ changed/text                      |    2 +-
+ dst/copy/changed/text             |   10 ++++++++++
+ dst/copy/rearranged/text          |   10 ++++++++++
+ dst/copy/unchanged/text           |   10 ++++++++++
+ {src => dst}/move/changed/text    |    2 +-
+ {src => dst}/move/rearranged/text |    2 +-
+ {src => dst}/move/unchanged/text  |    0
+ rearranged/text                   |    2 +-
+ 8 files changed, 34 insertions(+), 4 deletions(-)
+EOF
+
+cat <<EOF >expect_diff_stat_CC
+ changed/text                      |    2 +-
+ {src => dst}/copy/changed/text    |    2 +-
+ {src => dst}/copy/rearranged/text |    2 +-
+ {src => dst}/copy/unchanged/text  |    0
+ {src => dst}/move/changed/text    |    2 +-
+ {src => dst}/move/rearranged/text |    2 +-
+ {src => dst}/move/unchanged/text  |    0
+ rearranged/text                   |    2 +-
+ 8 files changed, 6 insertions(+), 6 deletions(-)
+EOF
+
+test_expect_success 'sanity check setup (--stat)' '
+	git diff --stat HEAD^..HEAD >actual_diff_stat &&
+	test_cmp expect_diff_stat actual_diff_stat &&
+	git diff --stat -M HEAD^..HEAD >actual_diff_stat_M &&
+	test_cmp expect_diff_stat_M actual_diff_stat_M &&
+	git diff --stat -C -C HEAD^..HEAD >actual_diff_stat_CC &&
+	test_cmp expect_diff_stat_CC actual_diff_stat_CC
+'
+
+# changed/text and rearranged/text falls below default 3% threshold
+cat <<EOF >expect_diff_dirstat
+  10.8% dst/copy/changed/
+  10.8% dst/copy/rearranged/
+  10.8% dst/copy/unchanged/
+  10.8% dst/move/changed/
+  10.8% dst/move/rearranged/
+  10.8% dst/move/unchanged/
+  10.8% src/move/changed/
+  10.8% src/move/rearranged/
+  10.8% src/move/unchanged/
+EOF
+
+# rearranged/text falls below default 3% threshold
+cat <<EOF >expect_diff_dirstat_M
+   5.8% changed/
+  29.3% dst/copy/changed/
+  29.3% dst/copy/rearranged/
+  29.3% dst/copy/unchanged/
+   5.8% dst/move/changed/
+EOF
+
+# rearranged/text falls below default 3% threshold
+cat <<EOF >expect_diff_dirstat_CC
+  32.6% changed/
+  32.6% dst/copy/changed/
+  32.6% dst/move/changed/
+EOF
+
+test_expect_success 'various ways to misspell --dirstat' '
+	test_must_fail git show --dirstat10 &&
+	test_must_fail git show -X=20
+'
+
+test_expect_success 'vanilla --dirstat' '
+	git diff --dirstat HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff --dirstat -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff --dirstat -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
+test_expect_success 'vanilla -X' '
+	git diff -X HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff -X -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff -X -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
+# rearranged/text falls below 0% threshold (1 / (240 * 9 + 48 + 1) ~= 0.045 %)
+cat <<EOF >expect_diff_dirstat
+   2.1% changed/
+  10.8% dst/copy/changed/
+  10.8% dst/copy/rearranged/
+  10.8% dst/copy/unchanged/
+  10.8% dst/move/changed/
+  10.8% dst/move/rearranged/
+  10.8% dst/move/unchanged/
+  10.8% src/move/changed/
+  10.8% src/move/rearranged/
+  10.8% src/move/unchanged/
+EOF
+
+cat <<EOF >expect_diff_dirstat_M
+   5.8% changed/
+  29.3% dst/copy/changed/
+  29.3% dst/copy/rearranged/
+  29.3% dst/copy/unchanged/
+   5.8% dst/move/changed/
+   0.1% dst/move/rearranged/
+   0.1% rearranged/
+EOF
+
+cat <<EOF >expect_diff_dirstat_CC
+  32.6% changed/
+  32.6% dst/copy/changed/
+   0.6% dst/copy/rearranged/
+  32.6% dst/move/changed/
+   0.6% dst/move/rearranged/
+   0.6% rearranged/
+EOF
+
+test_expect_success '--dirstat=0' '
+	git diff --dirstat=0 HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff --dirstat=0 -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff --dirstat=0 -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
+test_expect_success '-X0' '
+	git diff -X0 HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff -X0 -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff -X0 -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
+# rearranged/text falls below 0% threshold (1 / (240 * 9 + 48 + 1) ~= 0.045 %)
+cat <<EOF >expect_diff_dirstat
+   2.1% changed/
+  10.8% dst/copy/changed/
+  10.8% dst/copy/rearranged/
+  10.8% dst/copy/unchanged/
+  32.5% dst/copy/
+  10.8% dst/move/changed/
+  10.8% dst/move/rearranged/
+  10.8% dst/move/unchanged/
+  32.5% dst/move/
+  65.1% dst/
+  10.8% src/move/changed/
+  10.8% src/move/rearranged/
+  10.8% src/move/unchanged/
+  32.5% src/move/
+EOF
+
+cat <<EOF >expect_diff_dirstat_M
+   5.8% changed/
+  29.3% dst/copy/changed/
+  29.3% dst/copy/rearranged/
+  29.3% dst/copy/unchanged/
+  88.0% dst/copy/
+   5.8% dst/move/changed/
+   0.1% dst/move/rearranged/
+   5.9% dst/move/
+  94.0% dst/
+   0.1% rearranged/
+EOF
+
+cat <<EOF >expect_diff_dirstat_CC
+  32.6% changed/
+  32.6% dst/copy/changed/
+   0.6% dst/copy/rearranged/
+  33.3% dst/copy/
+  32.6% dst/move/changed/
+   0.6% dst/move/rearranged/
+  33.3% dst/move/
+  66.6% dst/
+   0.6% rearranged/
+EOF
+
+test_expect_success '--dirstat=0 --cumulative' '
+	git diff --dirstat=0 --cumulative HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff --dirstat=0 --cumulative -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff --dirstat=0 --cumulative -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
+cat <<EOF >expect_diff_dirstat
+   9.0% changed/
+   9.0% dst/copy/changed/
+   9.0% dst/copy/rearranged/
+   9.0% dst/copy/unchanged/
+   9.0% dst/move/changed/
+   9.0% dst/move/rearranged/
+   9.0% dst/move/unchanged/
+   9.0% rearranged/
+   9.0% src/move/changed/
+   9.0% src/move/rearranged/
+   9.0% src/move/unchanged/
+EOF
+
+cat <<EOF >expect_diff_dirstat_M
+  14.2% changed/
+  14.2% dst/copy/changed/
+  14.2% dst/copy/rearranged/
+  14.2% dst/copy/unchanged/
+  14.2% dst/move/changed/
+  14.2% dst/move/rearranged/
+  14.2% rearranged/
+EOF
+
+cat <<EOF >expect_diff_dirstat_CC
+  16.6% changed/
+  16.6% dst/copy/changed/
+  16.6% dst/copy/rearranged/
+  16.6% dst/move/changed/
+  16.6% dst/move/rearranged/
+  16.6% rearranged/
+EOF
+
+test_expect_success '--dirstat-by-file' '
+	git diff --dirstat-by-file HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff --dirstat-by-file -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff --dirstat-by-file -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
+cat <<EOF >expect_diff_dirstat
+  27.2% dst/copy/
+  27.2% dst/move/
+  27.2% src/move/
+EOF
+
+cat <<EOF >expect_diff_dirstat_M
+  14.2% changed/
+  14.2% dst/copy/changed/
+  14.2% dst/copy/rearranged/
+  14.2% dst/copy/unchanged/
+  14.2% dst/move/changed/
+  14.2% dst/move/rearranged/
+  14.2% rearranged/
+EOF
+
+cat <<EOF >expect_diff_dirstat_CC
+  16.6% changed/
+  16.6% dst/copy/changed/
+  16.6% dst/copy/rearranged/
+  16.6% dst/move/changed/
+  16.6% dst/move/rearranged/
+  16.6% rearranged/
+EOF
+
+test_expect_success '--dirstat-by-file=10' '
+	git diff --dirstat-by-file=10 HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff --dirstat-by-file=10 -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff --dirstat-by-file=10 -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
+cat <<EOF >expect_diff_dirstat
+   9.0% changed/
+   9.0% dst/copy/changed/
+   9.0% dst/copy/rearranged/
+   9.0% dst/copy/unchanged/
+  27.2% dst/copy/
+   9.0% dst/move/changed/
+   9.0% dst/move/rearranged/
+   9.0% dst/move/unchanged/
+  27.2% dst/move/
+  54.5% dst/
+   9.0% rearranged/
+   9.0% src/move/changed/
+   9.0% src/move/rearranged/
+   9.0% src/move/unchanged/
+  27.2% src/move/
+EOF
+
+cat <<EOF >expect_diff_dirstat_M
+  14.2% changed/
+  14.2% dst/copy/changed/
+  14.2% dst/copy/rearranged/
+  14.2% dst/copy/unchanged/
+  42.8% dst/copy/
+  14.2% dst/move/changed/
+  14.2% dst/move/rearranged/
+  28.5% dst/move/
+  71.4% dst/
+  14.2% rearranged/
+EOF
+
+cat <<EOF >expect_diff_dirstat_CC
+  16.6% changed/
+  16.6% dst/copy/changed/
+  16.6% dst/copy/rearranged/
+  33.3% dst/copy/
+  16.6% dst/move/changed/
+  16.6% dst/move/rearranged/
+  33.3% dst/move/
+  66.6% dst/
+  16.6% rearranged/
+EOF
+
+test_expect_success '--dirstat-by-file --cumulative' '
+	git diff --dirstat-by-file --cumulative HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff --dirstat-by-file --cumulative -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff --dirstat-by-file --cumulative -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
+test_done
-- 
1.7.5.rc1.3.g4d7b


* [PATCHv6 2/8] Make --dirstat=0 output directories that contribute < 0.1% of changes
  2011-04-29  9:36                                         ` [PATCHv6 0/8] --dirstat fixes, part 2 Johan Herland
  2011-04-29  9:36                                           ` [PATCHv6 1/8] Add several testcases for --dirstat and friends Johan Herland
@ 2011-04-29  9:36                                           ` Johan Herland
  2011-04-29  9:36                                           ` [PATCHv6 3/8] Refactor --dirstat parsing; deprecate --cumulative and --dirstat-by-file Johan Herland
                                                             ` (5 subsequent siblings)
  7 siblings, 0 replies; 91+ messages in thread
From: Johan Herland @ 2011-04-29  9:36 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Linus Torvalds, Johan Herland

The expected output from --dirstat=0 is to include any directory with
changes, even if those changes contribute a minuscule portion of the total
changes. However, currently, directories that contribute less than 0.1% are
not included, since their 'permille' value is 0, and there is an
'if (permille)' check in gather_dirstat() that causes them to be ignored.

This test is obviously intended to exclude directories that contribute no
changes whatsoever, but in this case, it hits too broadly. The correct
check is against 'this_dir' from which the permille is calculated. Only if
this value is 0 does the directory truly contribute no changes, and should
be skipped from the output.

This patch fixes this issue, and updates the corresponding testcases to
expect the new behavior.
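
The truncation can be illustrated with a small stand-alone sketch.
shown_before() and shown_after() are hypothetical helpers that model only
the two checks, not the rest of gather_dirstat(). The figures 1 and 2209
come from the rearranged/ case in the testsuite (1 / (240 * 9 + 48 + 1)).

```c
#include <assert.h>

/* Old check: a directory is skipped when its truncated permille is 0. */
static int shown_before(unsigned long this_dir, unsigned long changed,
			int percent_limit)
{
	int permille;

	assert(changed > 0);
	permille = (int)(this_dir * 1000 / changed);
	if (!permille)
		return 0;
	return permille / 10 >= percent_limit;
}

/* New check: a directory is skipped only when it has no changes at all. */
static int shown_after(unsigned long this_dir, unsigned long changed,
		       int percent_limit)
{
	assert(changed > 0);
	if (!this_dir)
		return 0;
	return (int)(this_dir * 1000 / changed) / 10 >= percent_limit;
}
```

With this_dir = 1 and changed = 2209, the integer division 1000 / 2209
yields a permille of 0, so shown_before(1, 2209, 0) returns 0 while
shown_after(1, 2209, 0) returns 1. A directory with this_dir = 0 is still
skipped by both.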

Signed-off-by: Johan Herland <johan@herland.net>
---
 diff.c                  |    4 ++--
 t/t4047-diff-dirstat.sh |    4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/diff.c b/diff.c
index abd9cd5..cfbfa92 100644
--- a/diff.c
+++ b/diff.c
@@ -1500,8 +1500,8 @@ static long gather_dirstat(struct diff_options *opt, struct dirstat_dir *dir,
 	 *    under this directory (sources == 1).
 	 */
 	if (baselen && sources != 1) {
-		int permille = this_dir * 1000 / changed;
-		if (permille) {
+		if (this_dir) {
+			int permille = this_dir * 1000 / changed;
 			int percent = permille / 10;
 			if (percent >= dir->percent) {
 				fprintf(opt->file, "%s%4d.%01d%% %.*s\n", line_prefix,
diff --git a/t/t4047-diff-dirstat.sh b/t/t4047-diff-dirstat.sh
index ce7c403..1c5adad 100755
--- a/t/t4047-diff-dirstat.sh
+++ b/t/t4047-diff-dirstat.sh
@@ -351,7 +351,6 @@ test_expect_success 'vanilla -X' '
 	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
 '
 
-# rearranged/text falls below 0% threshold (1 / (240 * 9 + 48 + 1) ~= 0.045 %)
 cat <<EOF >expect_diff_dirstat
    2.1% changed/
   10.8% dst/copy/changed/
@@ -360,6 +359,7 @@ cat <<EOF >expect_diff_dirstat
   10.8% dst/move/changed/
   10.8% dst/move/rearranged/
   10.8% dst/move/unchanged/
+   0.0% rearranged/
   10.8% src/move/changed/
   10.8% src/move/rearranged/
   10.8% src/move/unchanged/
@@ -402,7 +402,6 @@ test_expect_success '-X0' '
 	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
 '
 
-# rearranged/text falls below 0% threshold (1 / (240 * 9 + 48 + 1) ~= 0.045 %)
 cat <<EOF >expect_diff_dirstat
    2.1% changed/
   10.8% dst/copy/changed/
@@ -414,6 +413,7 @@ cat <<EOF >expect_diff_dirstat
   10.8% dst/move/unchanged/
   32.5% dst/move/
   65.1% dst/
+   0.0% rearranged/
   10.8% src/move/changed/
   10.8% src/move/rearranged/
   10.8% src/move/unchanged/
-- 
1.7.5.rc1.3.g4d7b


* [PATCHv6 3/8] Refactor --dirstat parsing; deprecate --cumulative and --dirstat-by-file
  2011-04-29  9:36                                         ` [PATCHv6 0/8] --dirstat fixes, part 2 Johan Herland
  2011-04-29  9:36                                           ` [PATCHv6 1/8] Add several testcases for --dirstat and friends Johan Herland
  2011-04-29  9:36                                           ` [PATCHv6 2/8] Make --dirstat=0 output directories that contribute < 0.1% of changes Johan Herland
@ 2011-04-29  9:36                                           ` Johan Herland
  2011-04-29  9:36                                           ` [PATCHv6 4/8] Add config variable for specifying default --dirstat behavior Johan Herland
                                                             ` (4 subsequent siblings)
  7 siblings, 0 replies; 91+ messages in thread
From: Johan Herland @ 2011-04-29  9:36 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Linus Torvalds, Johan Herland

Instead of having multiple interconnected dirstat-related options, teach
the --dirstat option itself to accept all behavior modifiers as parameters.

 - Preserve the current --dirstat=<limit> (where <limit> is an integer
   specifying a cut-off percentage)
 - Add --dirstat=cumulative, replacing --cumulative
 - Add --dirstat=files, replacing --dirstat-by-file
 - Also add --dirstat=changes and --dirstat=noncumulative for specifying the
   current default behavior. These allow the user to reset other --dirstat
   parameters (e.g. 'cumulative' and 'files') occurring earlier on the
   command line.

The deprecated options (--cumulative and --dirstat-by-file) are still
functional, although they have been removed from the documentation.

Allow multiple parameters to be separated by commas, e.g.:
  --dirstat=files,10,cumulative

Update the documentation accordingly, and add testcases verifying the
behavior of the new syntax.
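
The "later parameter resets earlier one" semantics can be sketched as
follows. This is an illustration under assumptions, not the code added to
diff.c below: 'struct dirstat_mode', is_word() and parse_strict() are
hypothetical, and tokens are matched whole rather than by prefix.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the dirstat flags in struct diff_options. */
struct dirstat_mode {
	int by_file;
	int cumulative;
	int percent;
};

static int is_word(const char *p, size_t len, const char *w)
{
	return strlen(w) == len && !memcmp(p, w, len);
}

/* Return 0 on success, -1 on the first unrecognized token. */
static int parse_strict(struct dirstat_mode *m, const char *params)
{
	const char *p = params;

	assert(m && params);
	while (*p) {
		size_t len = strcspn(p, ",");

		if (is_word(p, len, "changes")) {
			m->by_file = 0;
		} else if (is_word(p, len, "files")) {
			m->by_file = 1;
		} else if (is_word(p, len, "noncumulative")) {
			m->cumulative = 0;
		} else if (is_word(p, len, "cumulative")) {
			m->cumulative = 1;
		} else if (isdigit((unsigned char)*p)) {
			char *end;
			unsigned long v = strtoul(p, &end, 10);
			if ((size_t)(end - p) != len)
				return -1;	/* trailing junk after number */
			m->percent = (int)v;
		} else {
			return -1;
		}
		p += len;
		if (*p == ',')
			p++;
	}
	return 0;
}
```

Parsing "files,10,cumulative" and then "changes,noncumulative" into the
same struct leaves by_file and cumulative cleared again while percent
stays at 10, which is what a later option on the command line is meant
to achieve.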

Improved-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Johan Herland <johan@herland.net>
---
 Documentation/diff-options.txt |   44 ++++++++++----
 diff.c                         |   69 +++++++++++++++++++---
 t/t4047-diff-dirstat.sh        |  123 +++++++++++++++++++++++++++++++++++++++-
 3 files changed, 214 insertions(+), 22 deletions(-)

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 7e4bd42..6a3a9c1 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -66,19 +66,39 @@ endif::git-format-patch[]
 	number of modified files, as well as number of added and deleted
 	lines.
 
---dirstat[=<limit>]::
-	Output the distribution of relative amount of changes (number of lines added or
-	removed) for each sub-directory. Directories with changes below
-	a cut-off percent (3% by default) are not shown. The cut-off percent
-	can be set with `--dirstat=<limit>`. Changes in a child directory are not
-	counted for the parent directory, unless `--cumulative` is used.
+--dirstat[=<param1,param2,...>]::
+	Output the distribution of relative amount of changes for each
+	sub-directory. The behavior of `--dirstat` can be customized by
+	passing it a comma separated list of parameters.
+	The following parameters are available:
 +
-Note that the `--dirstat` option computes the changes while ignoring
-the amount of pure code movements within a file.  In other words,
-rearranging lines in a file is not counted as much as other changes.
-
---dirstat-by-file[=<limit>]::
-	Same as `--dirstat`, but counts changed files instead of lines.
+--
+`changes`;;
+	Compute the dirstat numbers by counting the lines that have been
+	removed from the source, or added to the destination. This ignores
+	the amount of pure code movements within a file.  In other words,
+	rearranging lines in a file is not counted as much as other changes.
+	This is the default behavior when no parameter is given.
+`files`;;
+	Compute the dirstat numbers by counting the number of files changed.
+	Each changed file counts equally in the dirstat analysis. This is
+	the computationally cheapest `--dirstat` behavior, since it does
+	not have to look at the file contents at all.
+`cumulative`;;
+	Count changes in a child directory for the parent directory as well.
+	Note that when using `cumulative`, the sum of the percentages
+	reported may exceed 100%. The default (non-cumulative) behavior can
+	be specified with the `noncumulative` parameter.
+<limit>;;
+	An integer parameter specifies a cut-off percent (3% by default).
+	Directories contributing less than this percentage of the changes
+	are not shown in the output.
+--
++
+Example: The following will count changed files, while ignoring
+directories with less than 10% of the total amount of changed files,
+and accumulating child directory counts in the parent directories:
+`--dirstat=files,10,cumulative`.
 
 --summary::
 	Output a condensed summary of extended header information
diff --git a/diff.c b/diff.c
index cfbfa92..0e4a510 100644
--- a/diff.c
+++ b/diff.c
@@ -66,6 +66,41 @@ static int parse_diff_color_slot(const char *var, int ofs)
 	return -1;
 }
 
+static int parse_dirstat_params(struct diff_options *options, const char *params)
+{
+	const char *p = params;
+	while (*p) {
+		if (!prefixcmp(p, "changes")) {
+			p += 7;
+			DIFF_OPT_CLR(options, DIRSTAT_BY_FILE);
+		} else if (!prefixcmp(p, "files")) {
+			p += 5;
+			DIFF_OPT_SET(options, DIRSTAT_BY_FILE);
+		} else if (!prefixcmp(p, "noncumulative")) {
+			p += 13;
+			DIFF_OPT_CLR(options, DIRSTAT_CUMULATIVE);
+		} else if (!prefixcmp(p, "cumulative")) {
+			p += 10;
+			DIFF_OPT_SET(options, DIRSTAT_CUMULATIVE);
+		} else if (isdigit(*p)) {
+			char *end;
+			options->dirstat_percent = strtoul(p, &end, 10);
+			p = end;
+		} else
+			return error("Unknown --dirstat parameter '%s'", p);
+
+		if (*p) {
+			/* more parameters, swallow separator */
+			if (*p != ',')
+				return error("Missing comma separator at char "
+					"%"PRIuMAX" of '%s'",
+					(uintmax_t) (p - params), params);
+			p++;
+		}
+	}
+	return 0;
+}
+
 static int git_config_rename(const char *var, const char *value)
 {
 	if (!value)
@@ -3144,6 +3179,18 @@ static int stat_opt(struct diff_options *options, const char **av)
 	return argcount;
 }
 
+static int parse_dirstat_opt(struct diff_options *options, const char *params)
+{
+	if (parse_dirstat_params(options, params))
+		die("Failed to parse --dirstat/-X option parameter");
+	/*
+	 * The caller knows a dirstat-related option is given from the command
+	 * line; allow it to say "return this_function();"
+	 */
+	options->output_format |= DIFF_FORMAT_DIRSTAT;
+	return 1;
+}
+
 int diff_opt_parse(struct diff_options *options, const char **av, int ac)
 {
 	const char *arg = av[0];
@@ -3163,15 +3210,19 @@ int diff_opt_parse(struct diff_options *options, const char **av, int ac)
 		options->output_format |= DIFF_FORMAT_NUMSTAT;
 	else if (!strcmp(arg, "--shortstat"))
 		options->output_format |= DIFF_FORMAT_SHORTSTAT;
-	else if (opt_arg(arg, 'X', "dirstat", &options->dirstat_percent))
-		options->output_format |= DIFF_FORMAT_DIRSTAT;
-	else if (!strcmp(arg, "--cumulative")) {
-		options->output_format |= DIFF_FORMAT_DIRSTAT;
-		DIFF_OPT_SET(options, DIRSTAT_CUMULATIVE);
-	} else if (opt_arg(arg, 0, "dirstat-by-file",
-			   &options->dirstat_percent)) {
-		options->output_format |= DIFF_FORMAT_DIRSTAT;
-		DIFF_OPT_SET(options, DIRSTAT_BY_FILE);
+	else if (!strcmp(arg, "-X") || !strcmp(arg, "--dirstat"))
+		return parse_dirstat_opt(options, "");
+	else if (!prefixcmp(arg, "-X"))
+		return parse_dirstat_opt(options, arg + 2);
+	else if (!prefixcmp(arg, "--dirstat="))
+		return parse_dirstat_opt(options, arg + 10);
+	else if (!strcmp(arg, "--cumulative"))
+		return parse_dirstat_opt(options, "cumulative");
+	else if (!strcmp(arg, "--dirstat-by-file"))
+		return parse_dirstat_opt(options, "files");
+	else if (!prefixcmp(arg, "--dirstat-by-file=")) {
+		parse_dirstat_opt(options, "files");
+		return parse_dirstat_opt(options, arg + 18);
 	}
 	else if (!strcmp(arg, "--check"))
 		options->output_format |= DIFF_FORMAT_CHECKDIFF;
diff --git a/t/t4047-diff-dirstat.sh b/t/t4047-diff-dirstat.sh
index 1c5adad..d0ed62c 100755
--- a/t/t4047-diff-dirstat.sh
+++ b/t/t4047-diff-dirstat.sh
@@ -330,7 +330,9 @@ EOF
 
 test_expect_success 'various ways to misspell --dirstat' '
 	test_must_fail git show --dirstat10 &&
-	test_must_fail git show -X=20
+	test_must_fail git show --dirstat10,files &&
+	test_must_fail git show -X=20 &&
+	test_must_fail git show -X=20,cumulative
 '
 
 test_expect_success 'vanilla --dirstat' '
@@ -351,6 +353,39 @@ test_expect_success 'vanilla -X' '
 	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
 '
 
+test_expect_success 'explicit defaults: --dirstat=changes,noncumulative,3' '
+	git diff --dirstat=changes,noncumulative,3 HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff --dirstat=changes,noncumulative,3 -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff --dirstat=changes,noncumulative,3 -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
+test_expect_success 'explicit defaults: -Xchanges,noncumulative,3' '
+	git diff -Xchanges,noncumulative,3 HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff -Xchanges,noncumulative,3 -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff -Xchanges,noncumulative,3 -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
+test_expect_success 'later options override earlier options:' '
+	git diff --dirstat=files,10,cumulative,changes,noncumulative,3 HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff --dirstat=files,10,cumulative,changes,noncumulative,3 -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff --dirstat=files,10,cumulative,changes,noncumulative,3 -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC &&
+	git diff --dirstat=files --dirstat=10 --dirstat=cumulative --dirstat=changes --dirstat=noncumulative -X3 HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff --dirstat=files --dirstat=10 --dirstat=cumulative --dirstat=changes --dirstat=noncumulative -X3 -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff --dirstat=files --dirstat=10 --dirstat=cumulative --dirstat=changes --dirstat=noncumulative -X3 -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
 cat <<EOF >expect_diff_dirstat
    2.1% changed/
   10.8% dst/copy/changed/
@@ -454,6 +489,24 @@ test_expect_success '--dirstat=0 --cumulative' '
 	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
 '
 
+test_expect_success '--dirstat=0,cumulative' '
+	git diff --dirstat=0,cumulative HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff --dirstat=0,cumulative -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff --dirstat=0,cumulative -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
+test_expect_success '-X0,cumulative' '
+	git diff -X0,cumulative HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff -X0,cumulative -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff -X0,cumulative -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
 cat <<EOF >expect_diff_dirstat
    9.0% changed/
    9.0% dst/copy/changed/
@@ -496,6 +549,15 @@ test_expect_success '--dirstat-by-file' '
 	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
 '
 
+test_expect_success '--dirstat=files' '
+	git diff --dirstat=files HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff --dirstat=files -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff --dirstat=files -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
 cat <<EOF >expect_diff_dirstat
   27.2% dst/copy/
   27.2% dst/move/
@@ -530,6 +592,15 @@ test_expect_success '--dirstat-by-file=10' '
 	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
 '
 
+test_expect_success '--dirstat=files,10' '
+	git diff --dirstat=files,10 HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff --dirstat=files,10 -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff --dirstat=files,10 -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
 cat <<EOF >expect_diff_dirstat
    9.0% changed/
    9.0% dst/copy/changed/
@@ -582,4 +653,54 @@ test_expect_success '--dirstat-by-file --cumulative' '
 	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
 '
 
+test_expect_success '--dirstat=files,cumulative' '
+	git diff --dirstat=files,cumulative HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff --dirstat=files,cumulative -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff --dirstat=files,cumulative -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
+cat <<EOF >expect_diff_dirstat
+  27.2% dst/copy/
+  27.2% dst/move/
+  54.5% dst/
+  27.2% src/move/
+EOF
+
+cat <<EOF >expect_diff_dirstat_M
+  14.2% changed/
+  14.2% dst/copy/changed/
+  14.2% dst/copy/rearranged/
+  14.2% dst/copy/unchanged/
+  42.8% dst/copy/
+  14.2% dst/move/changed/
+  14.2% dst/move/rearranged/
+  28.5% dst/move/
+  71.4% dst/
+  14.2% rearranged/
+EOF
+
+cat <<EOF >expect_diff_dirstat_CC
+  16.6% changed/
+  16.6% dst/copy/changed/
+  16.6% dst/copy/rearranged/
+  33.3% dst/copy/
+  16.6% dst/move/changed/
+  16.6% dst/move/rearranged/
+  33.3% dst/move/
+  66.6% dst/
+  16.6% rearranged/
+EOF
+
+test_expect_success '--dirstat=files,cumulative,10' '
+	git diff --dirstat=files,cumulative,10 HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff --dirstat=files,cumulative,10 -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff --dirstat=files,cumulative,10 -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
 test_done
-- 
1.7.5.rc1.3.g4d7b


* [PATCHv6 4/8] Add config variable for specifying default --dirstat behavior
  2011-04-29  9:36                                         ` [PATCHv6 0/8] --dirstat fixes, part 2 Johan Herland
                                                             ` (2 preceding siblings ...)
  2011-04-29  9:36                                           ` [PATCHv6 3/8] Refactor --dirstat parsing; deprecate --cumulative and --dirstat-by-file Johan Herland
@ 2011-04-29  9:36                                           ` Johan Herland
  2011-04-29  9:36                                           ` [PATCHv6 5/8] Allow specifying --dirstat cut-off percentage as a floating point number Johan Herland
                                                             ` (3 subsequent siblings)
  7 siblings, 0 replies; 91+ messages in thread
From: Johan Herland @ 2011-04-29  9:36 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Linus Torvalds, Johan Herland

The new diff.dirstat config variable takes the same arguments as
'--dirstat=<args>', and specifies the default arguments for --dirstat.
Any --dirstat arguments passed on the command line override the config.

When not specified, the --dirstat defaults are 'changes,noncumulative,3'.

The patch also adds several tests verifying the interaction between the
diff.dirstat config variable, and the --dirstat command line option.
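
The layering described above (built-in defaults, then diff.dirstat,
then the command line) can be sketched roughly as follows. This is only
an illustration, not the code in diff.c: struct dirstat_settings,
apply_params() and resolve_settings() are hypothetical names, and unlike
the real config handling this sketch reports parse errors instead of
ignoring them.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the dirstat-related fields of struct diff_options. */
struct dirstat_settings {
	int by_file;     /* 1 for "files", 0 for "changes" */
	int cumulative;  /* 1 for "cumulative", 0 for "noncumulative" */
	int percent;     /* cut-off percentage */
};

/* Returns nonzero if s starts with prefix. */
static int starts_with(const char *s, const char *prefix)
{
	return strncmp(s, prefix, strlen(prefix)) == 0;
}

/*
 * Apply a comma-separated parameter list on top of *s.
 * Parameters that are not mentioned keep their current value.
 * Returns 0 on success, -1 on a parse error.
 */
static int apply_params(struct dirstat_settings *s, const char *params)
{
	const char *p = params;

	while (*p) {
		if (starts_with(p, "changes")) {
			p += 7;
			s->by_file = 0;
		} else if (starts_with(p, "files")) {
			p += 5;
			s->by_file = 1;
		} else if (starts_with(p, "noncumulative")) {
			p += 13;
			s->cumulative = 0;
		} else if (starts_with(p, "cumulative")) {
			p += 10;
			s->cumulative = 1;
		} else if (isdigit((unsigned char)*p)) {
			char *end;
			s->percent = (int)strtoul(p, &end, 10);
			p = end;
		} else {
			return -1;
		}
		if (*p) {
			/* more parameters follow, swallow the separator */
			if (*p != ',')
				return -1;
			p++;
		}
	}
	return 0;
}

/*
 * Resolve the effective settings: built-in defaults first, then the
 * config value (if any), then the command-line value (if any).
 */
static int resolve_settings(struct dirstat_settings *s,
			    const char *config_value,
			    const char *cmdline_value)
{
	s->by_file = 0;
	s->cumulative = 0;
	s->percent = 3;
	if (config_value && apply_params(s, config_value))
		return -1;
	if (cmdline_value && apply_params(s, cmdline_value))
		return -1;
	return 0;
}
```

With a config value of "0" and a command-line value of "cumulative",
this sketch ends up with percent 0 and cumulative set, which is the
combination the 'diff.dirstat=0 & --dirstat=cumulative' test exercises.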

Improved-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Johan Herland <johan@herland.net>
---
 Documentation/config.txt       |   36 ++++++++++++++++++++
 Documentation/diff-options.txt |    2 +
 diff.c                         |   10 +++++-
 t/t4047-diff-dirstat.sh        |   72 ++++++++++++++++++++++++++++++++++++++++
 4 files changed, 119 insertions(+), 1 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 6babbc7..c18dd5a 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -822,6 +822,42 @@ diff.autorefreshindex::
 	affects only 'git diff' Porcelain, and not lower level
 	'diff' commands such as 'git diff-files'.
 
+diff.dirstat::
+	A comma separated list of `--dirstat` parameters specifying the
+	default behavior of the `--dirstat` option to linkgit:git-diff[1]
+	and friends. The defaults can be overridden on the command line
+	(using `--dirstat=<param1,param2,...>`). The fallback defaults
+	(when not changed by `diff.dirstat`) are `changes,noncumulative,3`.
+	The following parameters are available:
++
+--
+`changes`;;
+	Compute the dirstat numbers by counting the lines that have been
+	removed from the source, or added to the destination. This ignores
+	the amount of pure code movements within a file.  In other words,
+	rearranging lines in a file is not counted as much as other changes.
+	This is the default behavior when no parameter is given.
+`files`;;
+	Compute the dirstat numbers by counting the number of files changed.
+	Each changed file counts equally in the dirstat analysis. This is
+	the computationally cheapest `--dirstat` behavior, since it does
+	not have to look at the file contents at all.
+`cumulative`;;
+	Count changes in a child directory for the parent directory as well.
+	Note that when using `cumulative`, the sum of the percentages
+	reported may exceed 100%. The default (non-cumulative) behavior can
+	be specified with the `noncumulative` parameter.
+<limit>;;
+	An integer parameter specifies a cut-off percent (3% by default).
+	Directories contributing less than this percentage of the changes
+	are not shown in the output.
+--
++
+Example: The following will count changed files, while ignoring
+directories with less than 10% of the total amount of changed files,
+and accumulating child directory counts in the parent directories:
+`files,10,cumulative`.
+
 diff.external::
 	If this config variable is set, diff generation is not
 	performed using the internal diff machinery, but using the
diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 6a3a9c1..4ad50b9 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -70,6 +70,8 @@ endif::git-format-patch[]
 	Output the distribution of relative amount of changes for each
 	sub-directory. The behavior of `--dirstat` can be customized by
 	passing it a comma separated list of parameters.
+	The defaults are controlled by the `diff.dirstat` configuration
+	variable (see linkgit:git-config[1]).
 	The following parameters are available:
 +
 --
diff --git a/diff.c b/diff.c
index 0e4a510..92508b0 100644
--- a/diff.c
+++ b/diff.c
@@ -31,6 +31,7 @@ static const char *external_diff_cmd_cfg;
 int diff_auto_refresh_index = 1;
 static int diff_mnemonic_prefix;
 static int diff_no_prefix;
+static int diff_dirstat_percent_default = 3;
 static struct diff_options default_diff_options;
 
 static char diff_colors[][COLOR_MAXLEN] = {
@@ -180,6 +181,13 @@ int git_diff_basic_config(const char *var, const char *value, void *cb)
 		return 0;
 	}
 
+	if (!strcmp(var, "diff.dirstat")) {
+		default_diff_options.dirstat_percent = diff_dirstat_percent_default;
+		(void) parse_dirstat_params(&default_diff_options, value);
+		diff_dirstat_percent_default = default_diff_options.dirstat_percent;
+		return 0;
+	}
+
 	if (!prefixcmp(var, "submodule."))
 		return parse_submodule_config_option(var, value);
 
@@ -2921,7 +2929,7 @@ void diff_setup(struct diff_options *options)
 	options->line_termination = '\n';
 	options->break_opt = -1;
 	options->rename_limit = -1;
-	options->dirstat_percent = 3;
+	options->dirstat_percent = diff_dirstat_percent_default;
 	options->context = 3;
 
 	options->change = diff_change;
diff --git a/t/t4047-diff-dirstat.sh b/t/t4047-diff-dirstat.sh
index d0ed62c..c1b3697 100755
--- a/t/t4047-diff-dirstat.sh
+++ b/t/t4047-diff-dirstat.sh
@@ -386,6 +386,15 @@ test_expect_success 'later options override earlier options:' '
 	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
 '
 
+test_expect_success 'non-defaults in config overridden by explicit defaults on command line' '
+	git -c diff.dirstat=files,cumulative,50 diff --dirstat=changes,noncumulative,3 HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git -c diff.dirstat=files,cumulative,50 diff --dirstat=changes,noncumulative,3 -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git -c diff.dirstat=files,cumulative,50 diff --dirstat=changes,noncumulative,3 -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
 cat <<EOF >expect_diff_dirstat
    2.1% changed/
   10.8% dst/copy/changed/
@@ -437,6 +446,15 @@ test_expect_success '-X0' '
 	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
 '
 
+test_expect_success 'diff.dirstat=0' '
+	git -c diff.dirstat=0 diff --dirstat HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git -c diff.dirstat=0 diff --dirstat -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git -c diff.dirstat=0 diff --dirstat -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
 cat <<EOF >expect_diff_dirstat
    2.1% changed/
   10.8% dst/copy/changed/
@@ -507,6 +525,24 @@ test_expect_success '-X0,cumulative' '
 	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
 '
 
+test_expect_success 'diff.dirstat=0,cumulative' '
+	git -c diff.dirstat=0,cumulative diff --dirstat HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git -c diff.dirstat=0,cumulative diff --dirstat -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git -c diff.dirstat=0,cumulative diff --dirstat -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
+test_expect_success 'diff.dirstat=0 & --dirstat=cumulative' '
+	git -c diff.dirstat=0 diff --dirstat=cumulative HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git -c diff.dirstat=0 diff --dirstat=cumulative -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git -c diff.dirstat=0 diff --dirstat=cumulative -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
 cat <<EOF >expect_diff_dirstat
    9.0% changed/
    9.0% dst/copy/changed/
@@ -558,6 +594,15 @@ test_expect_success '--dirstat=files' '
 	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
 '
 
+test_expect_success 'diff.dirstat=files' '
+	git -c diff.dirstat=files diff --dirstat HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git -c diff.dirstat=files diff --dirstat -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git -c diff.dirstat=files diff --dirstat -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
 cat <<EOF >expect_diff_dirstat
   27.2% dst/copy/
   27.2% dst/move/
@@ -601,6 +646,15 @@ test_expect_success '--dirstat=files,10' '
 	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
 '
 
+test_expect_success 'diff.dirstat=10,files' '
+	git -c diff.dirstat=10,files diff --dirstat HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git -c diff.dirstat=10,files diff --dirstat -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git -c diff.dirstat=10,files diff --dirstat -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
 cat <<EOF >expect_diff_dirstat
    9.0% changed/
    9.0% dst/copy/changed/
@@ -662,6 +716,15 @@ test_expect_success '--dirstat=files,cumulative' '
 	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
 '
 
+test_expect_success 'diff.dirstat=cumulative,files' '
+	git -c diff.dirstat=cumulative,files diff --dirstat HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git -c diff.dirstat=cumulative,files diff --dirstat -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git -c diff.dirstat=cumulative,files diff --dirstat -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
 cat <<EOF >expect_diff_dirstat
   27.2% dst/copy/
   27.2% dst/move/
@@ -703,4 +766,13 @@ test_expect_success '--dirstat=files,cumulative,10' '
 	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
 '
 
+test_expect_success 'diff.dirstat=10,cumulative,files' '
+	git -c diff.dirstat=10,cumulative,files diff --dirstat HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git -c diff.dirstat=10,cumulative,files diff --dirstat -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git -c diff.dirstat=10,cumulative,files diff --dirstat -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
 test_done
-- 
1.7.5.rc1.3.g4d7b


* [PATCHv6 5/8] Allow specifying --dirstat cut-off percentage as a floating point number
  2011-04-29  9:36                                         ` [PATCHv6 0/8] --dirstat fixes, part 2 Johan Herland
                                                             ` (3 preceding siblings ...)
  2011-04-29  9:36                                           ` [PATCHv6 4/8] Add config variable for specifying default --dirstat behavior Johan Herland
@ 2011-04-29  9:36                                           ` Johan Herland
  2011-04-29  9:36                                           ` [PATCHv6 6/8] New --dirstat=lines mode, doing dirstat analysis based on diffstat Johan Herland
                                                             ` (2 subsequent siblings)
  7 siblings, 0 replies; 91+ messages in thread
From: Johan Herland @ 2011-04-29  9:36 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Linus Torvalds, Johan Herland

Only the first digit after the decimal point is kept, as the dirstat
calculations all happen in permille.
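
As a standalone illustration of that conversion, here is a minimal
sketch that mirrors the parsing logic in the hunk below.
percent_to_permille() is a hypothetical helper name used only for this
example; in the patch itself the logic lives inline in
parse_dirstat_params().

```c
#include <ctype.h>
#include <stdlib.h>

/*
 * Convert a decimal percentage string such as "16.7" into permille (167).
 * Only the first digit after the decimal point is used; any further
 * digits are skipped (truncated, not rounded). *endp is set to the first
 * character after the number.
 */
static int percent_to_permille(const char *p, const char **endp)
{
	char *end;
	int permille = (int)strtoul(p, &end, 10) * 10;

	p = end;
	if (*p == '.' && isdigit((unsigned char)*++p)) {
		/* only use the first digit */
		permille += *p - '0';
		/* .. and ignore any further digits */
		while (isdigit((unsigned char)*++p))
			; /* nothing */
	}
	*endp = p;
	return permille;
}
```

Under this scheme "16.7" and "16.70" both give 167, while "27.09" is
truncated to 270, so a cut-off of 27.09 behaves like 27.0.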

Selftests verifying floating-point percentage input have been added.

Improved-by: Junio C Hamano <gitster@pobox.com>
Improved-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Johan Herland <johan@herland.net>
---
 diff.c                  |   26 +++++++++++-------
 diff.h                  |    2 +-
 t/t4047-diff-dirstat.sh |   64 +++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 81 insertions(+), 11 deletions(-)

diff --git a/diff.c b/diff.c
index 92508b0..e0de4fa 100644
--- a/diff.c
+++ b/diff.c
@@ -31,7 +31,7 @@ static const char *external_diff_cmd_cfg;
 int diff_auto_refresh_index = 1;
 static int diff_mnemonic_prefix;
 static int diff_no_prefix;
-static int diff_dirstat_percent_default = 3;
+static int diff_dirstat_permille_default = 30;
 static struct diff_options default_diff_options;
 
 static char diff_colors[][COLOR_MAXLEN] = {
@@ -85,8 +85,15 @@ static int parse_dirstat_params(struct diff_options *options, const char *params
 			DIFF_OPT_SET(options, DIRSTAT_CUMULATIVE);
 		} else if (isdigit(*p)) {
 			char *end;
-			options->dirstat_percent = strtoul(p, &end, 10);
+			options->dirstat_permille = strtoul(p, &end, 10) * 10;
 			p = end;
+			if (*p == '.' && isdigit(*++p)) {
+				/* only use first digit */
+				options->dirstat_permille += *p - '0';
+				/* .. and ignore any further digits */
+				while (isdigit(*++p))
+					; /* nothing */
+			}
 		} else
 			return error("Unknown --dirstat parameter '%s'", p);
 
@@ -182,9 +189,9 @@ int git_diff_basic_config(const char *var, const char *value, void *cb)
 	}
 
 	if (!strcmp(var, "diff.dirstat")) {
-		default_diff_options.dirstat_percent = diff_dirstat_percent_default;
+		default_diff_options.dirstat_permille = diff_dirstat_permille_default;
 		(void) parse_dirstat_params(&default_diff_options, value);
-		diff_dirstat_percent_default = default_diff_options.dirstat_percent;
+		diff_dirstat_permille_default = default_diff_options.dirstat_permille;
 		return 0;
 	}
 
@@ -1496,7 +1503,7 @@ struct dirstat_file {
 
 struct dirstat_dir {
 	struct dirstat_file *files;
-	int alloc, nr, percent, cumulative;
+	int alloc, nr, permille, cumulative;
 };
 
 static long gather_dirstat(struct diff_options *opt, struct dirstat_dir *dir,
@@ -1545,10 +1552,9 @@ static long gather_dirstat(struct diff_options *opt, struct dirstat_dir *dir,
 	if (baselen && sources != 1) {
 		if (this_dir) {
 			int permille = this_dir * 1000 / changed;
-			int percent = permille / 10;
-			if (percent >= dir->percent) {
+			if (permille >= dir->permille) {
 				fprintf(opt->file, "%s%4d.%01d%% %.*s\n", line_prefix,
-					percent, permille % 10, baselen, base);
+					permille / 10, permille % 10, baselen, base);
 				if (!dir->cumulative)
 					return 0;
 			}
@@ -1574,7 +1580,7 @@ static void show_dirstat(struct diff_options *options)
 	dir.files = NULL;
 	dir.alloc = 0;
 	dir.nr = 0;
-	dir.percent = options->dirstat_percent;
+	dir.permille = options->dirstat_permille;
 	dir.cumulative = DIFF_OPT_TST(options, DIRSTAT_CUMULATIVE);
 
 	changed = 0;
@@ -2929,7 +2935,7 @@ void diff_setup(struct diff_options *options)
 	options->line_termination = '\n';
 	options->break_opt = -1;
 	options->rename_limit = -1;
-	options->dirstat_percent = diff_dirstat_percent_default;
+	options->dirstat_permille = diff_dirstat_permille_default;
 	options->context = 3;
 
 	options->change = diff_change;
diff --git a/diff.h b/diff.h
index 0083d92..08b4fe0 100644
--- a/diff.h
+++ b/diff.h
@@ -111,7 +111,7 @@ struct diff_options {
 	int rename_score;
 	int rename_limit;
 	int warn_on_too_large_rename;
-	int dirstat_percent;
+	int dirstat_permille;
 	int setup;
 	int abbrev;
 	const char *prefix;
diff --git a/t/t4047-diff-dirstat.sh b/t/t4047-diff-dirstat.sh
index c1b3697..4b25e10 100755
--- a/t/t4047-diff-dirstat.sh
+++ b/t/t4047-diff-dirstat.sh
@@ -775,4 +775,68 @@ test_expect_success 'diff.dirstat=10,cumulative,files' '
 	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
 '
 
+cat <<EOF >expect_diff_dirstat
+  27.2% dst/copy/
+  27.2% dst/move/
+  54.5% dst/
+  27.2% src/move/
+EOF
+
+cat <<EOF >expect_diff_dirstat_M
+  42.8% dst/copy/
+  28.5% dst/move/
+  71.4% dst/
+EOF
+
+cat <<EOF >expect_diff_dirstat_CC
+  33.3% dst/copy/
+  33.3% dst/move/
+  66.6% dst/
+EOF
+
+test_expect_success '--dirstat=files,cumulative,16.7' '
+	git diff --dirstat=files,cumulative,16.7 HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff --dirstat=files,cumulative,16.7 -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff --dirstat=files,cumulative,16.7 -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
+test_expect_success 'diff.dirstat=16.7,cumulative,files' '
+	git -c diff.dirstat=16.7,cumulative,files diff --dirstat HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git -c diff.dirstat=16.7,cumulative,files diff --dirstat -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git -c diff.dirstat=16.7,cumulative,files diff --dirstat -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
+test_expect_success 'diff.dirstat=16.70,cumulative,files' '
+	git -c diff.dirstat=16.70,cumulative,files diff --dirstat HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git -c diff.dirstat=16.70,cumulative,files diff --dirstat -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git -c diff.dirstat=16.70,cumulative,files diff --dirstat -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
+test_expect_success '--dirstat=files,cumulative,27.2' '
+	git diff --dirstat=files,cumulative,27.2 HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff --dirstat=files,cumulative,27.2 -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff --dirstat=files,cumulative,27.2 -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
+test_expect_success '--dirstat=files,cumulative,27.09' '
+	git diff --dirstat=files,cumulative,27.09 HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff --dirstat=files,cumulative,27.09 -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff --dirstat=files,cumulative,27.09 -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
 test_done
-- 
1.7.5.rc1.3.g4d7b


* [PATCHv6 6/8] New --dirstat=lines mode, doing dirstat analysis based on diffstat
  2011-04-29  9:36                                         ` [PATCHv6 0/8] --dirstat fixes, part 2 Johan Herland
                                                             ` (4 preceding siblings ...)
  2011-04-29  9:36                                           ` [PATCHv6 5/8] Allow specifying --dirstat cut-off percentage as a floating point number Johan Herland
@ 2011-04-29  9:36                                           ` Johan Herland
  2011-04-29  9:36                                           ` [PATCHv6 7/8] Improve error handling when parsing dirstat parameters Johan Herland
  2011-04-29  9:36                                           ` [PATCHv6 8/8] Mark dirstat error messages for translation Johan Herland
  7 siblings, 0 replies; 91+ messages in thread
From: Johan Herland @ 2011-04-29  9:36 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Linus Torvalds, Johan Herland

This patch adds an alternative implementation of show_dirstat(), called
show_dirstat_by_line(), which uses the more expensive diffstat analysis
(as opposed to show_dirstat()'s own (relatively inexpensive) analysis)
to derive the numbers from which the --dirstat output is computed.

The alternative implementation is controlled by the new "lines" parameter
to the --dirstat option (or the diff.dirstat config variable).

For binary files, the diffstat analysis counts bytes instead of lines,
so to prevent binary files from dominating the dirstat results, the
byte counts for binary files are divided by 64 before being compared to
their textual/line-based counterparts. This is a stupid and ugly - but
very cheap - heuristic.
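
To make the heuristic concrete, a rough sketch of the per-file weight
could look like this. struct file_stat and file_damage() are
illustrative names rather than the ones used in diff.c, and the
truncating division is an assumption made for this example.

```c
/* Hypothetical per-file diffstat result; field names are illustrative. */
struct file_stat {
	unsigned long added;    /* lines added (bytes for binary files) */
	unsigned long deleted;  /* lines removed (bytes for binary files) */
	int is_binary;          /* nonzero if the counts are in bytes */
};

/* Weight of one file in the "lines" dirstat analysis. */
static unsigned long file_damage(const struct file_stat *f)
{
	unsigned long damage = f->added + f->deleted;

	/*
	 * Scale byte counts down so that 64 changed bytes in a binary
	 * file weigh about as much as one changed line of text.
	 * Truncating division is an assumption of this sketch.
	 */
	if (f->is_binary)
		damage /= 64;
	return damage;
}
```

A text file with 10 added and 5 removed lines gets a weight of 15 here,
while a binary file with 640 changed bytes gets a weight of 10.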

In linux-2.6.git, running the three different --dirstat modes:

  time git diff v2.6.20..v2.6.30 --dirstat=changes > /dev/null
vs.
  time git diff v2.6.20..v2.6.30 --dirstat=lines > /dev/null
vs.
  time git diff v2.6.20..v2.6.30 --dirstat=files > /dev/null

yields the following average runtimes on my machine:

 - "changes" (default): ~6.0 s
 - "lines":             ~9.6 s
 - "files":             ~0.1 s

So, as expected, there's a considerable performance hit (~60%) from going
through the full diffstat analysis as compared to the default "changes"
analysis (obviously, "files" is much faster than both). As such, the
"lines" mode is probably only useful if you really need the --dirstat
numbers to be consistent with the numbers returned from the other
--*stat options.

The patch also includes documentation and tests for the new dirstat mode.

Improved-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Johan Herland <johan@herland.net>
---
 Documentation/config.txt       |    8 +++
 Documentation/diff-options.txt |    8 +++
 diff.c                         |   61 +++++++++++++++++++++++-
 diff.h                         |    1 +
 t/t4047-diff-dirstat.sh        |  100 ++++++++++++++++++++++++++++++++++++++++
 5 files changed, 176 insertions(+), 2 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index c18dd5a..0cad75c 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -837,6 +837,14 @@ diff.dirstat::
 	the amount of pure code movements within a file.  In other words,
 	rearranging lines in a file is not counted as much as other changes.
 	This is the default behavior when no parameter is given.
+`lines`;;
+	Compute the dirstat numbers by doing the regular line-based diff
+	analysis, and summing the removed/added line counts. (For binary
+	files, count 64-byte chunks instead, since binary files have no
+	natural concept of lines). This is a more expensive `--dirstat`
+	behavior than the `changes` behavior, but it does count rearranged
+	lines within a file as much as other changes. The resulting output
+	is consistent with what you get from the other `--*stat` options.
 `files`;;
 	Compute the dirstat numbers by counting the number of files changed.
 	Each changed file counts equally in the dirstat analysis. This is
diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 4ad50b9..327d10a 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -81,6 +81,14 @@ endif::git-format-patch[]
 	the amount of pure code movements within a file.  In other words,
 	rearranging lines in a file is not counted as much as other changes.
 	This is the default behavior when no parameter is given.
+`lines`;;
+	Compute the dirstat numbers by doing the regular line-based diff
+	analysis, and summing the removed/added line counts. (For binary
+	files, count 64-byte chunks instead, since binary files have no
+	natural concept of lines). This is a more expensive `--dirstat`
+	behavior than the `changes` behavior, but it does count rearranged
+	lines within a file as much as other changes. The resulting output
+	is consistent with what you get from the other `--*stat` options.
 `files`;;
 	Compute the dirstat numbers by counting the number of files changed.
 	Each changed file counts equally in the dirstat analysis. This is
diff --git a/diff.c b/diff.c
index e0de4fa..8703763 100644
--- a/diff.c
+++ b/diff.c
@@ -73,9 +73,15 @@ static int parse_dirstat_params(struct diff_options *options, const char *params
 	while (*p) {
 		if (!prefixcmp(p, "changes")) {
 			p += 7;
+			DIFF_OPT_CLR(options, DIRSTAT_BY_LINE);
+			DIFF_OPT_CLR(options, DIRSTAT_BY_FILE);
+		} else if (!prefixcmp(p, "lines")) {
+			p += 5;
+			DIFF_OPT_SET(options, DIRSTAT_BY_LINE);
 			DIFF_OPT_CLR(options, DIRSTAT_BY_FILE);
 		} else if (!prefixcmp(p, "files")) {
 			p += 5;
+			DIFF_OPT_CLR(options, DIRSTAT_BY_LINE);
 			DIFF_OPT_SET(options, DIRSTAT_BY_FILE);
 		} else if (!prefixcmp(p, "noncumulative")) {
 			p += 13;
@@ -1669,6 +1675,50 @@ found_damage:
 	gather_dirstat(options, &dir, changed, "", 0);
 }
 
+static void show_dirstat_by_line(struct diffstat_t *data, struct diff_options *options)
+{
+	int i;
+	unsigned long changed;
+	struct dirstat_dir dir;
+
+	if (data->nr == 0)
+		return;
+
+	dir.files = NULL;
+	dir.alloc = 0;
+	dir.nr = 0;
+	dir.permille = options->dirstat_permille;
+	dir.cumulative = DIFF_OPT_TST(options, DIRSTAT_CUMULATIVE);
+
+	changed = 0;
+	for (i = 0; i < data->nr; i++) {
+		struct diffstat_file *file = data->files[i];
+		unsigned long damage = file->added + file->deleted;
+		if (file->is_binary)
+			/*
+			 * Binary files count bytes, not lines. Must find some
+			 * way to normalize binary bytes vs. textual lines.
+			 * The following heuristic assumes that there are 64
+			 * bytes per "line".
+			 * This is stupid and ugly, but very cheap...
+			 */
+			damage = (damage + 63) / 64;
+		ALLOC_GROW(dir.files, dir.nr + 1, dir.alloc);
+		dir.files[dir.nr].name = file->name;
+		dir.files[dir.nr].changed = damage;
+		changed += damage;
+		dir.nr++;
+	}
+
+	/* This can happen even with many files, if everything was renames */
+	if (!changed)
+		return;
+
+	/* Show all directories with more than x% of the changes */
+	qsort(dir.files, dir.nr, sizeof(dir.files[0]), dirstat_compare);
+	gather_dirstat(options, &dir, changed, "", 0);
+}
+
 static void free_diffstat_info(struct diffstat_t *diffstat)
 {
 	int i;
@@ -4058,6 +4108,7 @@ void diff_flush(struct diff_options *options)
 	struct diff_queue_struct *q = &diff_queued_diff;
 	int i, output_format = options->output_format;
 	int separator = 0;
+	int dirstat_by_line = 0;
 
 	/*
 	 * Order: raw, stat, summary, patch
@@ -4078,7 +4129,11 @@ void diff_flush(struct diff_options *options)
 		separator++;
 	}
 
-	if (output_format & (DIFF_FORMAT_DIFFSTAT|DIFF_FORMAT_SHORTSTAT|DIFF_FORMAT_NUMSTAT)) {
+	if (output_format & DIFF_FORMAT_DIRSTAT && DIFF_OPT_TST(options, DIRSTAT_BY_LINE))
+		dirstat_by_line = 1;
+
+	if (output_format & (DIFF_FORMAT_DIFFSTAT|DIFF_FORMAT_SHORTSTAT|DIFF_FORMAT_NUMSTAT) ||
+	    dirstat_by_line) {
 		struct diffstat_t diffstat;
 
 		memset(&diffstat, 0, sizeof(struct diffstat_t));
@@ -4093,10 +4148,12 @@ void diff_flush(struct diff_options *options)
 			show_stats(&diffstat, options);
 		if (output_format & DIFF_FORMAT_SHORTSTAT)
 			show_shortstats(&diffstat, options);
+		if (dirstat_by_line)
+			show_dirstat_by_line(&diffstat, options);
 		free_diffstat_info(&diffstat);
 		separator++;
 	}
-	if (output_format & DIFF_FORMAT_DIRSTAT)
+	if ((output_format & DIFF_FORMAT_DIRSTAT) && !dirstat_by_line)
 		show_dirstat(options);
 
 	if (output_format & DIFF_FORMAT_SUMMARY && !is_summary_empty(q)) {
diff --git a/diff.h b/diff.h
index 08b4fe0..1a8b685 100644
--- a/diff.h
+++ b/diff.h
@@ -78,6 +78,7 @@ typedef struct strbuf *(*diff_prefix_fn_t)(struct diff_options *opt, void *data)
 #define DIFF_OPT_IGNORE_UNTRACKED_IN_SUBMODULES (1 << 25)
 #define DIFF_OPT_IGNORE_DIRTY_SUBMODULES (1 << 26)
 #define DIFF_OPT_OVERRIDE_SUBMODULE_CONFIG (1 << 27)
+#define DIFF_OPT_DIRSTAT_BY_LINE     (1 << 28)
 
 #define DIFF_OPT_TST(opts, flag)    ((opts)->flags & DIFF_OPT_##flag)
 #define DIFF_OPT_SET(opts, flag)    ((opts)->flags |= DIFF_OPT_##flag)
diff --git a/t/t4047-diff-dirstat.sh b/t/t4047-diff-dirstat.sh
index 4b25e10..b8ad92a 100755
--- a/t/t4047-diff-dirstat.sh
+++ b/t/t4047-diff-dirstat.sh
@@ -839,4 +839,104 @@ test_expect_success '--dirstat=files,cumulative,27.09' '
 	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
 '
 
+cat <<EOF >expect_diff_dirstat
+  10.6% dst/copy/changed/
+  10.6% dst/copy/rearranged/
+  10.6% dst/copy/unchanged/
+  10.6% dst/move/changed/
+  10.6% dst/move/rearranged/
+  10.6% dst/move/unchanged/
+  10.6% src/move/changed/
+  10.6% src/move/rearranged/
+  10.6% src/move/unchanged/
+EOF
+
+cat <<EOF >expect_diff_dirstat_M
+   5.2% changed/
+  26.3% dst/copy/changed/
+  26.3% dst/copy/rearranged/
+  26.3% dst/copy/unchanged/
+   5.2% dst/move/changed/
+   5.2% dst/move/rearranged/
+   5.2% rearranged/
+EOF
+
+cat <<EOF >expect_diff_dirstat_CC
+  16.6% changed/
+  16.6% dst/copy/changed/
+  16.6% dst/copy/rearranged/
+  16.6% dst/move/changed/
+  16.6% dst/move/rearranged/
+  16.6% rearranged/
+EOF
+
+test_expect_success '--dirstat=lines' '
+	git diff --dirstat=lines HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff --dirstat=lines -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff --dirstat=lines -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
+test_expect_success 'diff.dirstat=lines' '
+	git -c diff.dirstat=lines diff --dirstat HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git -c diff.dirstat=lines diff --dirstat -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git -c diff.dirstat=lines diff --dirstat -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
+cat <<EOF >expect_diff_dirstat
+   2.1% changed/
+  10.6% dst/copy/changed/
+  10.6% dst/copy/rearranged/
+  10.6% dst/copy/unchanged/
+  10.6% dst/move/changed/
+  10.6% dst/move/rearranged/
+  10.6% dst/move/unchanged/
+   2.1% rearranged/
+  10.6% src/move/changed/
+  10.6% src/move/rearranged/
+  10.6% src/move/unchanged/
+EOF
+
+cat <<EOF >expect_diff_dirstat_M
+   5.2% changed/
+  26.3% dst/copy/changed/
+  26.3% dst/copy/rearranged/
+  26.3% dst/copy/unchanged/
+   5.2% dst/move/changed/
+   5.2% dst/move/rearranged/
+   5.2% rearranged/
+EOF
+
+cat <<EOF >expect_diff_dirstat_CC
+  16.6% changed/
+  16.6% dst/copy/changed/
+  16.6% dst/copy/rearranged/
+  16.6% dst/move/changed/
+  16.6% dst/move/rearranged/
+  16.6% rearranged/
+EOF
+
+test_expect_success '--dirstat=lines,0' '
+	git diff --dirstat=lines,0 HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git diff --dirstat=lines,0 -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git diff --dirstat=lines,0 -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
+test_expect_success 'diff.dirstat=0,lines' '
+	git -c diff.dirstat=0,lines diff --dirstat HEAD^..HEAD >actual_diff_dirstat &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	git -c diff.dirstat=0,lines diff --dirstat -M HEAD^..HEAD >actual_diff_dirstat_M &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	git -c diff.dirstat=0,lines diff --dirstat -C -C HEAD^..HEAD >actual_diff_dirstat_CC &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
+'
+
 test_done
-- 
1.7.5.rc1.3.g4d7b

* [PATCHv6 7/8] Improve error handling when parsing dirstat parameters
  2011-04-29  9:36                                         ` [PATCHv6 0/8] --dirstat fixes, part 2 Johan Herland
                                                             ` (5 preceding siblings ...)
  2011-04-29  9:36                                           ` [PATCHv6 6/8] New --dirstat=lines mode, doing dirstat analysis based on diffstat Johan Herland
@ 2011-04-29  9:36                                           ` Johan Herland
  2011-04-29  9:36                                           ` [PATCHv6 8/8] Mark dirstat error messages for translation Johan Herland
  7 siblings, 0 replies; 91+ messages in thread
From: Johan Herland @ 2011-04-29  9:36 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Linus Torvalds, Johan Herland

When encountering errors or unknown tokens while parsing parameters to the
--dirstat option, it makes sense to die() with an error message informing
the user of which parameter did not make sense. However, when parsing the
diff.dirstat config variable, we cannot simply die(), but should instead
(after warning the user) ignore the erroneous or unrecognized parameter.
After all, future Git versions might add more dirstat parameters, and
using two different Git versions on the same repo should not cripple the
older Git version just because of a parameter that is only understood by
a more recent Git version.

This patch fixes the issue by refactoring the dirstat parameter parsing
so that parse_dirstat_params() keeps on parsing parameters, even if an
earlier parameter was not recognized. When parsing has finished, it returns
zero if all parameters were successfully parsed, and non-zero if one or
more parameters were not recognized (with appropriate error messages
appended to the 'errmsg' argument).

The parse_dirstat_params() callers then decide (based on the return value
from parse_dirstat_params()) whether to warn and ignore (in case of
diff.dirstat), or to warn and die (in case of --dirstat).
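
The split between a tolerant parser and policy-deciding callers can be
sketched roughly as follows. This is a simplified standalone illustration,
not the code from the patch: the sketch_* names, the fixed-size error
buffer (instead of a strbuf) and the -1 return in place of die() are all
assumptions made to keep it self-contained, and the cut-off percentage is
left out.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical option set; the patch keeps these as flags in struct diff_options. */
enum sketch_mode { MODE_CHANGES, MODE_LINES, MODE_FILES };

struct sketch_opts {
	enum sketch_mode mode;
	int cumulative;
};

/* True when the token p[0..len) is exactly the word 'name'. */
static int token_is(const char *p, size_t len, const char *name)
{
	return len == strlen(name) && !memcmp(p, name, len);
}

/*
 * Parse a comma-separated parameter list. Unknown tokens are described
 * in 'errmsg' (truncated if the buffer fills up) and counted, but
 * parsing continues. Returns the number of unrecognized tokens.
 */
static int sketch_parse_params(struct sketch_opts *o, const char *params,
			       char *errmsg, size_t errsize)
{
	const char *p = params;
	size_t used = 0;
	int ret = 0;

	if (errsize)
		errmsg[0] = '\0';
	while (*p) {
		size_t len = strcspn(p, ",");

		if (token_is(p, len, "changes"))
			o->mode = MODE_CHANGES;
		else if (token_is(p, len, "lines"))
			o->mode = MODE_LINES;
		else if (token_is(p, len, "files"))
			o->mode = MODE_FILES;
		else if (token_is(p, len, "cumulative"))
			o->cumulative = 1;
		else if (token_is(p, len, "noncumulative"))
			o->cumulative = 0;
		else {
			if (used < errsize) {
				int n = snprintf(errmsg + used, errsize - used,
						 "  Unknown dirstat parameter '%.*s'\n",
						 (int)len, p);
				if (n > 0)
					used += (size_t)n;
			}
			ret++;
		}
		p += len;
		if (*p)
			p++; /* more parameters, swallow separator */
	}
	return ret;
}

/* Config-style caller: warn about bad tokens, keep the good ones, carry on. */
static int sketch_from_config(struct sketch_opts *o, const char *value)
{
	char errmsg[256];

	if (sketch_parse_params(o, value, errmsg, sizeof(errmsg)))
		fprintf(stderr, "warning: errors in config value:\n%s", errmsg);
	return 0;
}

/* Command-line-style caller: report and fail (the patch calls die() here). */
static int sketch_from_cmdline(struct sketch_opts *o, const char *value)
{
	char errmsg[256];

	if (sketch_parse_params(o, value, errmsg, sizeof(errmsg))) {
		fprintf(stderr, "fatal: failed to parse option parameter:\n%s", errmsg);
		return -1;
	}
	return 0;
}
```

The sketch compares token lengths before calling memcmp(), so an empty
token or a bare prefix such as "ch" is reported as unknown rather than
silently accepted.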

The patch also adds a couple of tests verifying the correct behavior of
--dirstat and diff.dirstat in the face of unknown (possibly future) dirstat
parameters.

Suggested-by: Junio C Hamano <gitster@pobox.com>
Improved-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Johan Herland <johan@herland.net>
---
 diff.c                  |   71 +++++++++++++++++++++++++++--------------------
 t/t4047-diff-dirstat.sh |   37 ++++++++++++++++++++++++
 2 files changed, 78 insertions(+), 30 deletions(-)

diff --git a/diff.c b/diff.c
index 8703763..b290543 100644
--- a/diff.c
+++ b/diff.c
@@ -67,52 +67,56 @@ static int parse_diff_color_slot(const char *var, int ofs)
 	return -1;
 }
 
-static int parse_dirstat_params(struct diff_options *options, const char *params)
+static int parse_dirstat_params(struct diff_options *options, const char *params,
+				struct strbuf *errmsg)
 {
 	const char *p = params;
+	int p_len, ret = 0;
+
 	while (*p) {
-		if (!prefixcmp(p, "changes")) {
-			p += 7;
+		p_len = strchrnul(p, ',') - p;
+		if (!memcmp(p, "changes", p_len)) {
 			DIFF_OPT_CLR(options, DIRSTAT_BY_LINE);
 			DIFF_OPT_CLR(options, DIRSTAT_BY_FILE);
-		} else if (!prefixcmp(p, "lines")) {
-			p += 5;
+		} else if (!memcmp(p, "lines", p_len)) {
 			DIFF_OPT_SET(options, DIRSTAT_BY_LINE);
 			DIFF_OPT_CLR(options, DIRSTAT_BY_FILE);
-		} else if (!prefixcmp(p, "files")) {
-			p += 5;
+		} else if (!memcmp(p, "files", p_len)) {
 			DIFF_OPT_CLR(options, DIRSTAT_BY_LINE);
 			DIFF_OPT_SET(options, DIRSTAT_BY_FILE);
-		} else if (!prefixcmp(p, "noncumulative")) {
-			p += 13;
+		} else if (!memcmp(p, "noncumulative", p_len)) {
 			DIFF_OPT_CLR(options, DIRSTAT_CUMULATIVE);
-		} else if (!prefixcmp(p, "cumulative")) {
-			p += 10;
+		} else if (!memcmp(p, "cumulative", p_len)) {
 			DIFF_OPT_SET(options, DIRSTAT_CUMULATIVE);
 		} else if (isdigit(*p)) {
 			char *end;
-			options->dirstat_permille = strtoul(p, &end, 10) * 10;
-			p = end;
-			if (*p == '.' && isdigit(*++p)) {
+			int permille = strtoul(p, &end, 10) * 10;
+			if (*end == '.' && isdigit(*++end)) {
 				/* only use first digit */
-				options->dirstat_permille += *p - '0';
+				permille += *end - '0';
 				/* .. and ignore any further digits */
-				while (isdigit(*++p))
+				while (isdigit(*++end))
 					; /* nothing */
 			}
-		} else
-			return error("Unknown --dirstat parameter '%s'", p);
-
-		if (*p) {
-			/* more parameters, swallow separator */
-			if (*p != ',')
-				return error("Missing comma separator at char "
-					"%"PRIuMAX" of '%s'",
-					(uintmax_t) (p - params), params);
-			p++;
+			if (end - p == p_len)
+				options->dirstat_permille = permille;
+			else {
+				strbuf_addf(errmsg, "  Failed to parse dirstat cut-off percentage '%.*s'\n",
+					    p_len, p);
+				ret++;
+			}
+		} else {
+			strbuf_addf(errmsg, "  Unknown dirstat parameter '%.*s'\n",
+				    p_len, p);
+			ret++;
 		}
+
+		p += p_len;
+
+		if (*p)
+			p++; /* more parameters, swallow separator */
 	}
-	return 0;
+	return ret;
 }
 
 static int git_config_rename(const char *var, const char *value)
@@ -195,8 +199,12 @@ int git_diff_basic_config(const char *var, const char *value, void *cb)
 	}
 
 	if (!strcmp(var, "diff.dirstat")) {
+		struct strbuf errmsg = STRBUF_INIT;
 		default_diff_options.dirstat_permille = diff_dirstat_permille_default;
-		(void) parse_dirstat_params(&default_diff_options, value);
+		if (parse_dirstat_params(&default_diff_options, value, &errmsg))
+			warning("Found errors in 'diff.dirstat' config variable:\n%s",
+				errmsg.buf);
+		strbuf_release(&errmsg);
 		diff_dirstat_permille_default = default_diff_options.dirstat_permille;
 		return 0;
 	}
@@ -3245,8 +3253,11 @@ static int stat_opt(struct diff_options *options, const char **av)
 
 static int parse_dirstat_opt(struct diff_options *options, const char *params)
 {
-	if (parse_dirstat_params(options, params))
-		die("Failed to parse --dirstat/-X option parameter");
+	struct strbuf errmsg = STRBUF_INIT;
+	if (parse_dirstat_params(options, params, &errmsg))
+		die("Failed to parse --dirstat/-X option parameter:\n%s",
+		    errmsg.buf);
+	strbuf_release(&errmsg);
 	/*
 	 * The caller knows a dirstat-related option is given from the command
 	 * line; allow it to say "return this_function();"
diff --git a/t/t4047-diff-dirstat.sh b/t/t4047-diff-dirstat.sh
index b8ad92a..cc947fd 100755
--- a/t/t4047-diff-dirstat.sh
+++ b/t/t4047-diff-dirstat.sh
@@ -939,4 +939,41 @@ test_expect_success 'diff.dirstat=0,lines' '
 	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC
 '
 
+test_expect_success '--dirstat=future_param,lines,0 should fail loudly' '
+	test_must_fail git diff --dirstat=future_param,lines,0 HEAD^..HEAD >actual_diff_dirstat 2>actual_error &&
+	test_debug "cat actual_error" &&
+	test_cmp /dev/null actual_diff_dirstat &&
+	grep -q "future_param" actual_error &&
+	grep -q "\--dirstat" actual_error
+'
+
+test_expect_success '--dirstat=dummy1,cumulative,2dummy should report both unrecognized parameters' '
+	test_must_fail git diff --dirstat=dummy1,cumulative,2dummy HEAD^..HEAD >actual_diff_dirstat 2>actual_error &&
+	test_debug "cat actual_error" &&
+	test_cmp /dev/null actual_diff_dirstat &&
+	grep -q "dummy1" actual_error &&
+	grep -q "2dummy" actual_error &&
+	grep -q "\--dirstat" actual_error
+'
+
+test_expect_success 'diff.dirstat=future_param,0,lines should warn, but still work' '
+	git -c diff.dirstat=future_param,0,lines diff --dirstat HEAD^..HEAD >actual_diff_dirstat 2>actual_error &&
+	test_debug "cat actual_error" &&
+	test_cmp expect_diff_dirstat actual_diff_dirstat &&
+	grep -q "future_param" actual_error &&
+	grep -q "diff\\.dirstat" actual_error &&
+
+	git -c diff.dirstat=future_param,0,lines diff --dirstat -M HEAD^..HEAD >actual_diff_dirstat_M 2>actual_error &&
+	test_debug "cat actual_error" &&
+	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
+	grep -q "future_param" actual_error &&
+	grep -q "diff\\.dirstat" actual_error &&
+
+	git -c diff.dirstat=future_param,0,lines diff --dirstat -C -C HEAD^..HEAD >actual_diff_dirstat_CC 2>actual_error &&
+	test_debug "cat actual_error" &&
+	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC &&
+	grep -q "future_param" actual_error &&
+	grep -q "diff\\.dirstat" actual_error
+'
+
 test_done
-- 
1.7.5.rc1.3.g4d7b

* [PATCHv6 8/8] Mark dirstat error messages for translation
  2011-04-29  9:36                                         ` [PATCHv6 0/8] --dirstat fixes, part 2 Johan Herland
                                                             ` (6 preceding siblings ...)
  2011-04-29  9:36                                           ` [PATCHv6 7/8] Improve error handling when parsing dirstat parameters Johan Herland
@ 2011-04-29  9:36                                           ` Johan Herland
  7 siblings, 0 replies; 91+ messages in thread
From: Johan Herland @ 2011-04-29  9:36 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Linus Torvalds, Johan Herland

Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Johan Herland <johan@herland.net>
---

This patch WILL NOT COMPILE until the topic branch is merged with an
i18n/gettext-aware branch (i.e. where _() and test_i18ngrep are present).

 diff.c                  |    8 ++++----
 t/t4047-diff-dirstat.sh |   22 +++++++++++-----------
 2 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/diff.c b/diff.c
index b290543..0f65413 100644
--- a/diff.c
+++ b/diff.c
@@ -101,12 +101,12 @@ static int parse_dirstat_params(struct diff_options *options, const char *params
 			if (end - p == p_len)
 				options->dirstat_permille = permille;
 			else {
-				strbuf_addf(errmsg, "  Failed to parse dirstat cut-off percentage '%.*s'\n",
+				strbuf_addf(errmsg, _("  Failed to parse dirstat cut-off percentage '%.*s'\n"),
 					    p_len, p);
 				ret++;
 			}
 		} else {
-			strbuf_addf(errmsg, "  Unknown dirstat parameter '%.*s'\n",
+			strbuf_addf(errmsg, _("  Unknown dirstat parameter '%.*s'\n"),
 				    p_len, p);
 			ret++;
 		}
@@ -202,7 +202,7 @@ int git_diff_basic_config(const char *var, const char *value, void *cb)
 		struct strbuf errmsg = STRBUF_INIT;
 		default_diff_options.dirstat_permille = diff_dirstat_permille_default;
 		if (parse_dirstat_params(&default_diff_options, value, &errmsg))
-			warning("Found errors in 'diff.dirstat' config variable:\n%s",
+			warning(_("Found errors in 'diff.dirstat' config variable:\n%s"),
 				errmsg.buf);
 		strbuf_release(&errmsg);
 		diff_dirstat_permille_default = default_diff_options.dirstat_permille;
@@ -3255,7 +3255,7 @@ static int parse_dirstat_opt(struct diff_options *options, const char *params)
 {
 	struct strbuf errmsg = STRBUF_INIT;
 	if (parse_dirstat_params(options, params, &errmsg))
-		die("Failed to parse --dirstat/-X option parameter:\n%s",
+		die(_("Failed to parse --dirstat/-X option parameter:\n%s"),
 		    errmsg.buf);
 	strbuf_release(&errmsg);
 	/*
diff --git a/t/t4047-diff-dirstat.sh b/t/t4047-diff-dirstat.sh
index cc947fd..29e80a5 100755
--- a/t/t4047-diff-dirstat.sh
+++ b/t/t4047-diff-dirstat.sh
@@ -943,37 +943,37 @@ test_expect_success '--dirstat=future_param,lines,0 should fail loudly' '
 	test_must_fail git diff --dirstat=future_param,lines,0 HEAD^..HEAD >actual_diff_dirstat 2>actual_error &&
 	test_debug "cat actual_error" &&
 	test_cmp /dev/null actual_diff_dirstat &&
-	grep -q "future_param" actual_error &&
-	grep -q "\--dirstat" actual_error
+	test_i18ngrep -q "future_param" actual_error &&
+	test_i18ngrep -q "\--dirstat" actual_error
 '
 
 test_expect_success '--dirstat=dummy1,cumulative,2dummy should report both unrecognized parameters' '
 	test_must_fail git diff --dirstat=dummy1,cumulative,2dummy HEAD^..HEAD >actual_diff_dirstat 2>actual_error &&
 	test_debug "cat actual_error" &&
 	test_cmp /dev/null actual_diff_dirstat &&
-	grep -q "dummy1" actual_error &&
-	grep -q "2dummy" actual_error &&
-	grep -q "\--dirstat" actual_error
+	test_i18ngrep -q "dummy1" actual_error &&
+	test_i18ngrep -q "2dummy" actual_error &&
+	test_i18ngrep -q "\--dirstat" actual_error
 '
 
 test_expect_success 'diff.dirstat=future_param,0,lines should warn, but still work' '
 	git -c diff.dirstat=future_param,0,lines diff --dirstat HEAD^..HEAD >actual_diff_dirstat 2>actual_error &&
 	test_debug "cat actual_error" &&
 	test_cmp expect_diff_dirstat actual_diff_dirstat &&
-	grep -q "future_param" actual_error &&
-	grep -q "diff\\.dirstat" actual_error &&
+	test_i18ngrep -q "future_param" actual_error &&
+	test_i18ngrep -q "diff\\.dirstat" actual_error &&
 
 	git -c diff.dirstat=future_param,0,lines diff --dirstat -M HEAD^..HEAD >actual_diff_dirstat_M 2>actual_error &&
 	test_debug "cat actual_error" &&
 	test_cmp expect_diff_dirstat_M actual_diff_dirstat_M &&
-	grep -q "future_param" actual_error &&
-	grep -q "diff\\.dirstat" actual_error &&
+	test_i18ngrep -q "future_param" actual_error &&
+	test_i18ngrep -q "diff\\.dirstat" actual_error &&
 
 	git -c diff.dirstat=future_param,0,lines diff --dirstat -C -C HEAD^..HEAD >actual_diff_dirstat_CC 2>actual_error &&
 	test_debug "cat actual_error" &&
 	test_cmp expect_diff_dirstat_CC actual_diff_dirstat_CC &&
-	grep -q "future_param" actual_error &&
-	grep -q "diff\\.dirstat" actual_error
+	test_i18ngrep -q "future_param" actual_error &&
+	test_i18ngrep -q "diff\\.dirstat" actual_error
 '
 
 test_done
-- 
1.7.5.rc1.3.g4d7b

end of thread, other threads:[~2011-04-29  9:37 UTC | newest]

Thread overview: 91+ messages
2011-04-07 13:49 BUG? in --dirstat when rearranging lines in a file Johan Herland
2011-04-07 14:56 ` Linus Torvalds
2011-04-07 22:43   ` Junio C Hamano
2011-04-07 22:59     ` Linus Torvalds
2011-04-08 14:46   ` Johan Herland
2011-04-08 14:48     ` [PATCH 1/3] --dirstat: Document shortcomings compared to --stat or regular diff Johan Herland
2011-04-08 19:50       ` Junio C Hamano
2011-04-08 14:50     ` [PATCH 2/3] --dirstat-by-file: Make it faster and more correct Johan Herland
2011-04-08 14:55     ` [RFC/PATCH 3/3] Teach --dirstat to not completely ignore rearranged lines Johan Herland
2011-04-08 15:04     ` BUG? in --dirstat when rearranging lines in a file Linus Torvalds
2011-04-08 19:56       ` Junio C Hamano
2011-04-10 22:48         ` [PATCHv2 0/3] --dirstat fixes Johan Herland
2011-04-10 22:48           ` [PATCHv2 1/3] --dirstat: Describe non-obvious differences relative to --stat or regular diff Johan Herland
2011-04-10 22:48           ` [PATCHv2 2/3] --dirstat-by-file: Make it faster and more correct Johan Herland
2011-04-11 18:14             ` Junio C Hamano
2011-04-10 22:48           ` [PATCHv2 3/3] Teach --dirstat to not completely ignore rearranged lines within a file Johan Herland
2011-04-11 21:38             ` Junio C Hamano
2011-04-11 21:56               ` Johan Herland
2011-04-11 22:08                 ` Junio C Hamano
2011-04-12  9:22                   ` Johan Herland
2011-04-12  9:24                     ` [PATCH 4/3] --dirstat: In case of renames, use target filename instead of source filename Johan Herland
2011-04-12 14:59                       ` Linus Torvalds
2011-04-12  9:26                     ` [RFC/PATCH 5/3] Alternative --dirstat implementation, based on diffstat analysis Johan Herland
2011-04-12 14:46                       ` Linus Torvalds
2011-04-12 15:08                         ` Linus Torvalds
2011-04-12 22:03                           ` Johan Herland
2011-04-12 22:12                             ` Linus Torvalds
2011-04-12 22:22                             ` Junio C Hamano
2011-04-26  0:01                         ` [PATCH 0/6] --dirstat fixes, part 2 Johan Herland
2011-04-26  0:01                           ` [PATCH 1/6] Add several testcases for --dirstat and friends Johan Herland
2011-04-26  0:01                           ` [PATCH 2/6] Make --dirstat=0 output directories that contribute < 0.1% of changes Johan Herland
2011-04-26  0:01                           ` [PATCH 3/6] Refactor --dirstat parsing; deprecate --cumulative and --dirstat-by-file Johan Herland
2011-04-26 16:36                             ` Junio C Hamano
2011-04-27  2:02                               ` Johan Herland
2011-04-27  4:53                                 ` Junio C Hamano
2011-04-27 20:51                                 ` Junio C Hamano
2011-04-27 21:01                                   ` Junio C Hamano
2011-04-26  0:01                           ` [PATCH 4/6] Add config variable for specifying default --dirstat behavior Johan Herland
2011-04-26 16:43                             ` Junio C Hamano
2011-04-27  2:02                               ` Johan Herland
2011-04-26  0:01                           ` [PATCH 5/6] Use floating point for --dirstat percentages Johan Herland
2011-04-26 16:52                             ` Junio C Hamano
2011-04-27  2:02                               ` Johan Herland
2011-04-27  4:42                                 ` Junio C Hamano
2011-04-27  4:53                                   ` Linus Torvalds
2011-04-27  5:20                                     ` Junio C Hamano
2011-04-26  0:01                           ` [PATCH 6/6] New --dirstat=lines mode, doing dirstat analysis based on diffstat Johan Herland
2011-04-26 16:59                             ` Junio C Hamano
2011-04-27  2:02                               ` Johan Herland
2011-04-26  0:15                           ` [PATCH 0/6] --dirstat fixes, part 2 Linus Torvalds
2011-04-27  2:12                           ` [PATCHv2 " Johan Herland
2011-04-27  2:12                             ` [PATCHv2 1/6] Add several testcases for --dirstat and friends Johan Herland
2011-04-27  2:12                             ` [PATCHv2 2/6] Make --dirstat=0 output directories that contribute < 0.1% of changes Johan Herland
2011-04-27  2:12                             ` [PATCHv2 3/6] Refactor --dirstat parsing; deprecate --cumulative and --dirstat-by-file Johan Herland
2011-04-27  2:12                             ` [PATCHv2 4/6] Add config variable for specifying default --dirstat behavior Johan Herland
2011-04-27  2:12                             ` [PATCHv2 5/6] Use floating point for --dirstat percentages Johan Herland
2011-04-27  2:45                               ` Linus Torvalds
2011-04-27  2:12                             ` [PATCHv2 6/6] New --dirstat=lines mode, doing dirstat analysis based on diffstat Johan Herland
2011-04-27  8:24                             ` [PATCHv3 0/6] --dirstat fixes, part 2 Johan Herland
2011-04-27  8:24                               ` [PATCHv3 1/6] Add several testcases for --dirstat and friends Johan Herland
2011-04-27  8:24                               ` [PATCHv3 2/6] Make --dirstat=0 output directories that contribute < 0.1% of changes Johan Herland
2011-04-27  8:24                               ` [PATCHv3 3/6] Refactor --dirstat parsing; deprecate --cumulative and --dirstat-by-file Johan Herland
2011-04-27  8:24                               ` [PATCHv3 4/6] Add config variable for specifying default --dirstat behavior Johan Herland
2011-04-27  8:24                               ` [PATCHv3 5/6] Allow specifying --dirstat cut-off percentage as a floating point number Johan Herland
2011-04-27  8:37                                 ` Linus Torvalds
2011-04-27 10:29                                   ` [PATCHv4 " Johan Herland
2011-04-27  8:24                               ` [PATCHv3 6/6] New --dirstat=lines mode, doing dirstat analysis based on diffstat Johan Herland
2011-04-28  1:17                               ` [PATCHv5 0/7] --dirstat fixes, part 2 Johan Herland
2011-04-28  1:17                                 ` [PATCHv5 1/7] Add several testcases for --dirstat and friends Johan Herland
2011-04-28  1:17                                 ` [PATCHv5 2/7] Make --dirstat=0 output directories that contribute < 0.1% of changes Johan Herland
2011-04-28  1:17                                 ` [PATCHv5 3/7] Refactor --dirstat parsing; deprecate --cumulative and --dirstat-by-file Johan Herland
2011-04-28  1:17                                 ` [PATCHv5 4/7] Add config variable for specifying default --dirstat behavior Johan Herland
2011-04-28  1:17                                 ` [PATCHv5 5/7] Allow specifying --dirstat cut-off percentage as a floating point number Johan Herland
2011-04-28  1:17                                 ` [PATCHv5 6/7] New --dirstat=lines mode, doing dirstat analysis based on diffstat Johan Herland
2011-04-28  1:17                                 ` [PATCHv5 7/7] Improve error handling when parsing dirstat parameters Johan Herland
2011-04-28 18:41                                   ` Junio C Hamano
2011-04-28 19:20                                     ` Junio C Hamano
2011-04-28 23:16                                       ` Johan Herland
2011-04-28 23:13                                     ` Johan Herland
2011-04-29  4:06                                       ` Junio C Hamano
2011-04-29  9:36                                         ` [PATCHv6 0/8] --dirstat fixes, part 2 Johan Herland
2011-04-29  9:36                                           ` [PATCHv6 1/8] Add several testcases for --dirstat and friends Johan Herland
2011-04-29  9:36                                           ` [PATCHv6 2/8] Make --dirstat=0 output directories that contribute < 0.1% of changes Johan Herland
2011-04-29  9:36                                           ` [PATCHv6 3/8] Refactor --dirstat parsing; deprecate --cumulative and --dirstat-by-file Johan Herland
2011-04-29  9:36                                           ` [PATCHv6 4/8] Add config variable for specifying default --dirstat behavior Johan Herland
2011-04-29  9:36                                           ` [PATCHv6 5/8] Allow specifying --dirstat cut-off percentage as a floating point number Johan Herland
2011-04-29  9:36                                           ` [PATCHv6 6/8] New --dirstat=lines mode, doing dirstat analysis based on diffstat Johan Herland
2011-04-29  9:36                                           ` [PATCHv6 7/8] Improve error handling when parsing dirstat parameters Johan Herland
2011-04-29  9:36                                           ` [PATCHv6 8/8] Mark dirstat error messages for translation Johan Herland
2011-04-12 18:34                       ` [RFC/PATCH 5/3] Alternative --dirstat implementation, based on diffstat analysis Junio C Hamano
2011-04-10 23:17           ` [PATCHv2 0/3] --dirstat fixes Linus Torvalds
