git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 2/2] diffcore-pickaxe doc: document -S and -G properly
  2013-05-17 12:23 [PATCH v2 0/2] Improve diffcore-pickaxe documentation Ramkumar Ramachandra
@ 2013-05-17 12:23 ` Ramkumar Ramachandra
  2013-05-18  1:24   ` Junio C Hamano
  0 siblings, 1 reply; 17+ messages in thread
From: Ramkumar Ramachandra @ 2013-05-17 12:23 UTC (permalink / raw)
  To: Git List; +Cc: Phil Hord, Junio C Hamano

The documentation of -S and -G is very sketchy.  Completely rewrite the
sections in Documentation/diff-options.txt and
Documentation/gitdiffcore.txt.

References:
52e9578 ([PATCH] Introducing software archaeologist's tool "pickaxe".)
f506b8e (git log/diff: add -G<regexp> that greps in the patch text)

Inputs-from: Phil Hord <phil.hord@gmail.com>
Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
---
 Documentation/diff-options.txt | 37 ++++++++++++++++++++++++++-------
 Documentation/gitdiffcore.txt  | 47 +++++++++++++++++++++++++-----------------
 2 files changed, 57 insertions(+), 27 deletions(-)

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 104579d..b61a666 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -383,14 +383,35 @@ ifndef::git-format-patch[]
 	that matches other criteria, nothing is selected.
 
 -S<string>::
-	Look for differences that introduce or remove an instance of
-	<string>. Note that this is different than the string simply
-	appearing in diff output; see the 'pickaxe' entry in
-	linkgit:gitdiffcore[7] for more details.
+	Look for commits that change the number of occurrences of the
+	specified string (i.e. addition/ deletion) in a file.
+	Intended for the scripter's use.
++
+It is especially useful when you're looking for an exact block of code
+(like a struct), and want to know the history of that block since it
+first came into being.
 
 -G<regex>::
-	Look for differences whose added or removed line matches
-	the given <regex>.
+	Grep through the patch text of commits for added/removed lines
+	that match <regex>.  `--pickaxe-regex` is implied in this
+	mode.
++
+To illustrate the difference between `-S<regex> --pickaxe-regex` and
+`-G<regex>`, consider a commit with the following diff in the same
+file:
++
+----
++    return !regexec(regexp, two->ptr, 1, &regmatch, 0);
+...
+-    hit = !regexec(regexp, mf2.ptr, 1, &regmatch, 0);
+----
++
+While `git log -G"regexec\(regexp"` will show this commit, `git log
+-S"regexec\(regexp" --pickaxe-regex` will not (because the number of
+occurrences of that string didn't change).
++
+See the 'pickaxe' entry in linkgit:gitdiffcore[7] for more
+information.
 
 --pickaxe-all::
 	When `-S` or `-G` finds a change, show all the changes in that
@@ -398,8 +419,8 @@ ifndef::git-format-patch[]
 	in <string>.
 
 --pickaxe-regex::
-	Make the <string> not a plain string but an extended POSIX
-	regex to match.
+	Treat the <string> not as a plain string, but an extended
+	POSIX regex to match.  It is implied when `-G` is used.
 endif::git-format-patch[]
 
 -O<orderfile>::
diff --git a/Documentation/gitdiffcore.txt b/Documentation/gitdiffcore.txt
index 568d757..d0f2b91 100644
--- a/Documentation/gitdiffcore.txt
+++ b/Documentation/gitdiffcore.txt
@@ -222,25 +222,34 @@ version prefixed with '+'.
 diffcore-pickaxe: For Detecting Addition/Deletion of Specified String
 ---------------------------------------------------------------------
 
-This transformation is used to find filepairs that represent
-changes that touch a specified string, and is controlled by the
--S option and the `--pickaxe-all` option to the 'git diff-*'
-commands.
-
-When diffcore-pickaxe is in use, it checks if there are
-filepairs whose "result" side and whose "origin" side have
-different number of specified string.  Such a filepair represents
-"the string appeared in this changeset".  It also checks for the
-opposite case that loses the specified string.
-
-When `--pickaxe-all` is not in effect, diffcore-pickaxe leaves
-only such filepairs that touch the specified string in its
-output.  When `--pickaxe-all` is used, diffcore-pickaxe leaves all
-filepairs intact if there is such a filepair, or makes the
-output empty otherwise.  The latter behaviour is designed to
-make reviewing of the changes in the context of the whole
-changeset easier.
-
+There are two kinds of pickaxe: the S kind (corresponding to 'git log
+-S') and the G kind (mnemonic: grep; corresponding to 'git log -G').
+
+The S kind detects filepairs whose "result" side and "origin" side
+have different number of occurrences of specified string.  By
+definition, it will not detect in-file moves.  Also, when a commit
+moves a file wholesale without affecting the string being looked at,
+rename detection kicks in as usual, and 'git log -S' omits the commit
+(since the number of occurrences of that string didn't change in that
+rename-detected filepair).  The implementation essentially runs a
+count, and is significantly cheaper than the G kind.
+
+The G kind detects filepairs whose patch text has an added or a
+deleted line that matches the given regexp.  This means that it can
+detect in-file (or what rename-detection considers the same file)
+moves.  The implementation of 'git log -G' runs diff twice and greps,
+and this can be quite expensive.
+
+When `--pickaxe-regex` is used with `-S`, treat the <string> not as a
+plain string, but an extended POSIX regex to match.  It is implied
+when `-G` is used.
+
+When `--pickaxe-all` is not in effect, diffcore-pickaxe leaves only
+the filepairs that touch the specified string in its output.  When in
+effect, diffcore-pickaxe leaves all filepairs intact if there is such
+a filepair, or makes the output empty otherwise.  The latter behavior
+is designed to make reviewing of the changes in the context of the
+whole changeset easier.
 
 diffcore-order: For Sorting the Output Based on Filenames
 ---------------------------------------------------------
-- 
1.8.1.2.432.g070c57d

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/2] diffcore-pickaxe doc: document -S and -G properly
  2013-05-17 12:23 ` [PATCH 2/2] diffcore-pickaxe doc: document -S and -G properly Ramkumar Ramachandra
@ 2013-05-18  1:24   ` Junio C Hamano
  2013-05-19  7:33     ` Junio C Hamano
  0 siblings, 1 reply; 17+ messages in thread
From: Junio C Hamano @ 2013-05-18  1:24 UTC (permalink / raw)
  To: Ramkumar Ramachandra; +Cc: Git List, Phil Hord

Ramkumar Ramachandra <artagnon@gmail.com> writes:

> diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
> index 104579d..b61a666 100644
> --- a/Documentation/diff-options.txt
> +++ b/Documentation/diff-options.txt
> @@ -383,14 +383,35 @@ ifndef::git-format-patch[]
>  	that matches other criteria, nothing is selected.
>  
>  -S<string>::
> -	Look for differences that introduce or remove an instance of
> -	<string>. Note that this is different than the string simply
> -	appearing in diff output; see the 'pickaxe' entry in
> -	linkgit:gitdiffcore[7] for more details.
> +	Look for commits that change the number of occurrences of the

The first part of the change is misguided.  First of all, this text
also appears in the documentation of "git diff" and -S limits the
output to those filepairs that match its criteria.

"git log A..B -- foo" looks for commits that has changes in paths
that matches 'foo'.  "git log" that selects commits is affected by
the outcome of what "diff" shows.  This "looks for *commits*" is a
characteristic of "log", not specific to the -S option.

The aspect of "diff" behaviour that is affected by -S is what is
considered as difference.  Usually "diff" says "preimage and
postimage are different" for any change, but -S changes that.  The
preimage and postimage has to have different number of specified
block of text to be considered different.

And that is why the original says "look for differences".

> +	specified string (i.e. addition/ deletion) in a file.
> +	Intended for the scripter's use.
> ++
> +It is especially useful when you're looking for an exact block of code
> +(like a struct), and want to know the history of that block since it
> +first came into being.

I am not sure this half-way description is a good idea.  If you want
to use it to discover what happend "since it first came into being",
you need to use this (and the feature is designed for such a use
pattern) iteratively, find the first commit that changes the block
of the text that appear in the latest version, find the corresponding
block of interest in the preimage and then feed that to -S and start
digging from that first-discovered commit.  If the text describes
that iteration fully, I think that is fine, but if you read the
above literally, it looks as if you feed one version of -S and it
would do the necessary adjustment on its own, which is misleading.

>  -G<regex>::
> -	Look for differences whose added or removed line matches
> -	the given <regex>.
> +	Grep through the patch text of commits for added/removed lines
> +	that match <regex>.  `--pickaxe-regex` is implied in this
> +	mode.

The same comment on differences vs commits apply to this.

If there were any behaviour change --pickaxe-regex introduces to -S
that also applies to -G "other than the pattern is used as a regular
expression", I would agree it is a good idea to say that other
behaviour is implied, but as far as I know, there is no such
implication.  Drop --pickaxe-regex, as it is not even implied in the
code:

	} else if ((argcount = short_opt('G', av, &optarg))) {
		options->pickaxe = optarg;
		options->pickaxe_opts |= DIFF_PICKAXE_KIND_G;
		return argcount;
	}

so even if there were some code added in the future that does

	if (options->pickaxe_opts & DIFF_PICKAXE_REGEX) {
		do this thing only when the user said
                --pickaxe-regex from the command line
	}

it will _not_ apply to users of -G.

"grep through", if the reader knows "grep", with "match <regex>", it
is crystal clear that this expects a regular expression.  And that
is the only thing that makes -G and --pickaxe-regex superficially
related.

> ++
> +To illustrate the difference between `-S<regex> --pickaxe-regex` and
> +`-G<regex>`, consider a commit with the following diff in the same
> +file:
> ++
> +----
> ++    return !regexec(regexp, two->ptr, 1, &regmatch, 0);
> +...
> +-    hit = !regexec(regexp, mf2.ptr, 1, &regmatch, 0);
> +----
> ++
> +While `git log -G"regexec\(regexp"` will show this commit, `git log
> +-S"regexec\(regexp" --pickaxe-regex` will not (because the number of
> +occurrences of that string didn't change).
> ++
> +See the 'pickaxe' entry in linkgit:gitdiffcore[7] for more
> +information.

This is a readable example (I think we tend not to use contraction,
so if I were writing it, I wouldn't have written "didn't change",
though).

>  --pickaxe-regex::
> -	Make the <string> not a plain string but an extended POSIX
> -	regex to match.
> +	Treat the <string> not as a plain string, but an extended
> +	POSIX regex to match.  It is implied when `-G` is used.

Ditto.  Rather, 

	Treat the <string> given to the -S option as an extended
	POSIX regular expression to match.

> diff --git a/Documentation/gitdiffcore.txt b/Documentation/gitdiffcore.txt
> index 568d757..d0f2b91 100644
> --- a/Documentation/gitdiffcore.txt
> +++ b/Documentation/gitdiffcore.txt
> @@ -222,25 +222,34 @@ version prefixed with '+'.
>  diffcore-pickaxe: For Detecting Addition/Deletion of Specified String
>  ---------------------------------------------------------------------
>  
> -This transformation is used to find filepairs that represent
> -changes that touch a specified string, and is controlled by the
> --S option and the `--pickaxe-all` option to the 'git diff-*'
> -commands.
> -
> -When diffcore-pickaxe is in use, it checks if there are
> -filepairs whose "result" side and whose "origin" side have
> -different number of specified string.  Such a filepair represents
> -"the string appeared in this changeset".  It also checks for the
> -opposite case that loses the specified string.
> -
> -When `--pickaxe-all` is not in effect, diffcore-pickaxe leaves
> -only such filepairs that touch the specified string in its
> -output.  When `--pickaxe-all` is used, diffcore-pickaxe leaves all
> -filepairs intact if there is such a filepair, or makes the
> -output empty otherwise.  The latter behaviour is designed to
> -make reviewing of the changes in the context of the whole
> -changeset easier.

This part is impossible to review on a bus, so I won't comment in
this message.

Why did you even have to touch the paragraph for --pickaxe-all?
That applies to both -S and -G.  I thought it would be just the
matter of slightly tweaking the introductory paragraph (which was
written back when there was only -S), keeping the second paragraph
for -S as-is, and insert an additional paragraph for -G before
--pickaxe-all.

> +There are two kinds of pickaxe: the S kind (corresponding to 'git log
> +-S') and the G kind (mnemonic: grep; corresponding to 'git log -G').
> +
> +The S kind detects filepairs whose "result" side and "origin" side
> +have different number of occurrences of specified string.  By
> +definition, it will not detect in-file moves.  Also, when a commit
> +moves a file wholesale without affecting the string being looked at,
> +rename detection kicks in as usual, and 'git log -S' omits the commit
> +(since the number of occurrences of that string didn't change in that
> +rename-detected filepair).  The implementation essentially runs a
> +count, and is significantly cheaper than the G kind.
> +
> +The G kind detects filepairs whose patch text has an added or a
> +deleted line that matches the given regexp.  This means that it can
> +detect in-file (or what rename-detection considers the same file)
> +moves.  The implementation of 'git log -G' runs diff twice and greps,
> +and this can be quite expensive.
> +
> +When `--pickaxe-regex` is used with `-S`, treat the <string> not as a
> +plain string, but an extended POSIX regex to match.  It is implied
> +when `-G` is used.
> +
> +When `--pickaxe-all` is not in effect, diffcore-pickaxe leaves only
> +the filepairs that touch the specified string in its output.  When in
> +effect, diffcore-pickaxe leaves all filepairs intact if there is such
> +a filepair, or makes the output empty otherwise.  The latter behavior
> +is designed to make reviewing of the changes in the context of the
> +whole changeset easier.
>  
>  diffcore-order: For Sorting the Output Based on Filenames
>  ---------------------------------------------------------

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/2] diffcore-pickaxe doc: document -S and -G properly
  2013-05-18  1:24   ` Junio C Hamano
@ 2013-05-19  7:33     ` Junio C Hamano
  2013-05-24  9:37       ` Ramkumar Ramachandra
  0 siblings, 1 reply; 17+ messages in thread
From: Junio C Hamano @ 2013-05-19  7:33 UTC (permalink / raw)
  To: Ramkumar Ramachandra; +Cc: Git List, Phil Hord

Junio C Hamano <gitster@pobox.com> writes:

> Ramkumar Ramachandra <artagnon@gmail.com> writes:
> ...
>>  -G<regex>::
>> -	Look for differences whose added or removed line matches
>> -	the given <regex>.
>> +	Grep through the patch text of commits for added/removed lines
>> +	that match <regex>.  `--pickaxe-regex` is implied in this
>> +	mode.
>
> The same comment on differences vs commits apply to this.
> ...
> it will _not_ apply to users of -G.

s/.$/ unless they say --pickaxe-regex./; so -G does not imply it at
all.

> "grep through", if the reader knows "grep", with "match <regex>", it
> is crystal clear that this expects a regular expression.  And that
> is the only thing that makes -G and --pickaxe-regex superficially
> related.

s/^The description begins with /;  Sorry, but I couldn't write
complete sentences on a bus ;-)

>> -This transformation is used to find filepairs that represent
>> -changes that touch a specified string, and is controlled by the
>> --S option and the `--pickaxe-all` option to the 'git diff-*'
>> -commands.
>> -
>> -When diffcore-pickaxe is in use, it checks if there are
>> -filepairs whose "result" side and whose "origin" side have
>> -different number of specified string.  Such a filepair represents
>> -"the string appeared in this changeset".  It also checks for the
>> -opposite case that loses the specified string.
>> -
>> -When `--pickaxe-all` is not in effect, diffcore-pickaxe leaves
>> -only such filepairs that touch the specified string in its
>> -output.  When `--pickaxe-all` is used, diffcore-pickaxe leaves all
>> -filepairs intact if there is such a filepair, or makes the
>> -output empty otherwise.  The latter behaviour is designed to
>> -make reviewing of the changes in the context of the whole
>> -changeset easier.
>
> This part is impossible to review on a bus, so I won't comment in
> this message.
>
> Why did you even have to touch the paragraph for --pickaxe-all?
> That applies to both -S and -G.  I thought it would be just the
> matter of slightly tweaking the introductory paragraph (which was
> written back when there was only -S), keeping the second paragraph
> for -S as-is, and insert an additional paragraph for -G before
> --pickaxe-all.

Now I see that the paragraph for --pickaxe-all needs to be touched;
the original talks about "touch the specified string", which only
applies to -S and needs to be adjusted.

So here is my attempt of clarifying it.

	This transformation is used to find filepairs that represent
	two kinds of changes, and is controlled by the -S, -G and
	--pickaxe-all options.

	The "-S<block of text>" option tells Git to consider that a
	filepair has differences only if the number of occurrences
	of the specified block of text is different between its
	preimage and its postimage, and treat other filepairs as if
	they did not have any change.  This is meant to be used with
	a block of text that is unique enough to occur only once (so
	expected the number of occurences is 1 vs 0 or 0 vs 1) to
	use with "git log" to find a commit that touched the block
	of text the last time.  When used with the "--pickaxe-regex"
	option, the <block of text> is used as a POSIX extended
	regular expression to match, instead of a literal string.

	The "-G<regular expression>" option tells Git to consider
	that a filepair has differences only if a textual diff
	between its preimage and postimage would indicate a line
	that matches the given regular expression is changed, and
	treat other filepairs as if they did not have any change.

	When -S or -G option is used without "--pickaxe-all" option,
	only filepairs that match their respective criterion are
	kept in the output.  When `--pickaxe-all` is used, all
	filepairs intact if there is such a filepair, or makes the
	output empty otherwise.  This behaviour is designed to make
	reviewing of the changes in the context of the whole
	changeset easier.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/2] diffcore-pickaxe doc: document -S and -G properly
  2013-05-19  7:33     ` Junio C Hamano
@ 2013-05-24  9:37       ` Ramkumar Ramachandra
  2013-05-24 14:58         ` Phil Hord
  0 siblings, 1 reply; 17+ messages in thread
From: Ramkumar Ramachandra @ 2013-05-24  9:37 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git List, Phil Hord

Junio C Hamano wrote:
> [...]

I agree with the other comments, and have made suitable changes.
Let's review your block now.

>         This transformation is used to find filepairs that represent
>         two kinds of changes, and is controlled by the -S, -G and
>         --pickaxe-all options.

Why do you call this a "transformation"?  Is git log --author="foo" a
transformation on the git-log output?  Then how is git log -Sfoo a
transformation?

Two kinds of changes controlled by three different options?  Isn't the
original much clearer?

The title says diffcore-pickaxe, and the first paragraph says:

There are two kinds of pickaxe: the S kind (corresponding to 'git log
-S') and the G kind (mnemonic: grep; corresponding to 'git log -G').

>         The "-S<block of text>" option tells Git to consider that a
>         filepair has differences only if the number of occurrences
>         of the specified block of text is different between its
>         preimage and its postimage, and treat other filepairs as if
>         they did not have any change.

I'll rewrite this without the trailing "and treat other filepairs as
if they did not have any change" (which I'm not fond of).

>         This is meant to be used with
>         a block of text that is unique enough to occur only once (so
>         expected the number of occurences is 1 vs 0 or 0 vs 1) to
>         use with "git log" to find a commit that touched the block
>         of text the last time.

You're saying how you think it's "meant" to be used, but in doing so
you've failed to describe its operation faithfully.  I've already
described how it's meant to be used in diff-options (digging a block
of text iteratively) and this is the place to explain what it is doing
faithfully.  Hence my previous writeup on changes in number of
occurrences and rename detection: I _had_ to read the code to
understand it properly, and your writeup is not helping by telling me
about idiomatic usage.

Also, you've dropped computational expense which was there in the original.

>         When used with the "--pickaxe-regex"
>         option, the <block of text> is used as a POSIX extended
>         regular expression to match, instead of a literal string.

Better.

>         The "-G<regular expression>" option tells Git to consider
>         that a filepair has differences only if a textual diff
>         between its preimage and postimage would indicate a line
>         that matches the given regular expression is changed, and
>         treat other filepairs as if they did not have any change.

"would indicate"?  Really?  I'll rewrite this without the trailing
"and treat other filepairs ..".

You've once again dropped what it means in the context of in-file
moves (rename detection), and computational expense from the original.

>         When -S or -G option is used without "--pickaxe-all" option,
>         only filepairs that match their respective criterion are
>         kept in the output.

Much better.

>         When `--pickaxe-all` is used, all
>         filepairs intact if there is such a filepair, or makes the
>         output empty otherwise.

-ENOPARSE.  I didn't particularly like the original, and this isn't
better.  I'll rewrite it.

>         This behaviour is designed to make
>         reviewing of the changes in the context of the whole
>         changeset easier.

Same as original.  Okay.

Thanks.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 2/2] diffcore-pickaxe doc: document -S and -G properly
  2013-05-24 10:33 [PATCH v3 0/2] Improve diffcore-pickaxe documentation Ramkumar Ramachandra
@ 2013-05-24 10:33 ` Ramkumar Ramachandra
  2013-05-24 17:31   ` Junio C Hamano
  0 siblings, 1 reply; 17+ messages in thread
From: Ramkumar Ramachandra @ 2013-05-24 10:33 UTC (permalink / raw)
  To: Git List; +Cc: Junio C Hamano

The documentation of -S and -G is very sketchy.  Completely rewrite the
sections in Documentation/diff-options.txt and
Documentation/gitdiffcore.txt.

References:
52e9578 ([PATCH] Introducing software archaeologist's tool "pickaxe".)
f506b8e (git log/diff: add -G<regexp> that greps in the patch text)

Inputs-from: Phil Hord <phil.hord@gmail.com>
Co-authored-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
---
 Documentation/diff-options.txt | 38 +++++++++++++++++++++++++++++--------
 Documentation/gitdiffcore.txt  | 43 ++++++++++++++++++++++++------------------
 2 files changed, 55 insertions(+), 26 deletions(-)

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 104579d..2835eef 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -383,14 +383,36 @@ ifndef::git-format-patch[]
 	that matches other criteria, nothing is selected.
 
 -S<string>::
-	Look for differences that introduce or remove an instance of
-	<string>. Note that this is different than the string simply
-	appearing in diff output; see the 'pickaxe' entry in
-	linkgit:gitdiffcore[7] for more details.
+	Look for differences that change the number of occurrences of
+	the specified string (i.e. addition/deletion) in a file.
+	Intended for the scripter's use.
++
+It is especially useful when you're looking for an exact block of code
+(like a struct), and want to know the history of that block since it
+first came into being: use the feature iteratively to feed the
+interesting block in the preimage back into `-S`, and keep going until
+you get the very first version of the block.
 
 -G<regex>::
-	Look for differences whose added or removed line matches
-	the given <regex>.
+	Look for differences whose patch text contains added/removed
+	lines that match <regex>.
++
+To illustrate the difference between `-S<regex> --pickaxe-regex` and
+`-G<regex>`, consider a commit with the following diff in the same
+file:
++
+----
++    return !regexec(regexp, two->ptr, 1, &regmatch, 0);
+...
+-    hit = !regexec(regexp, mf2.ptr, 1, &regmatch, 0);
+----
++
+While `git log -G"regexec\(regexp"` will show this commit, `git log
+-S"regexec\(regexp" --pickaxe-regex` will not (because the number of
+occurrences of that string did not change).
++
+See the 'pickaxe' entry in linkgit:gitdiffcore[7] for more
+information.
 
 --pickaxe-all::
 	When `-S` or `-G` finds a change, show all the changes in that
@@ -398,8 +420,8 @@ ifndef::git-format-patch[]
 	in <string>.
 
 --pickaxe-regex::
-	Make the <string> not a plain string but an extended POSIX
-	regex to match.
+	Treat the <string> given to `-S` as an extended POSIX regular
+	expression to match.
 endif::git-format-patch[]
 
 -O<orderfile>::
diff --git a/Documentation/gitdiffcore.txt b/Documentation/gitdiffcore.txt
index 568d757..ef4c04a 100644
--- a/Documentation/gitdiffcore.txt
+++ b/Documentation/gitdiffcore.txt
@@ -222,26 +222,33 @@ version prefixed with '+'.
 diffcore-pickaxe: For Detecting Addition/Deletion of Specified String
 ---------------------------------------------------------------------
 
-This transformation is used to find filepairs that represent
-changes that touch a specified string, and is controlled by the
--S option and the `--pickaxe-all` option to the 'git diff-*'
-commands.
-
-When diffcore-pickaxe is in use, it checks if there are
-filepairs whose "result" side and whose "origin" side have
-different number of specified string.  Such a filepair represents
-"the string appeared in this changeset".  It also checks for the
-opposite case that loses the specified string.
-
-When `--pickaxe-all` is not in effect, diffcore-pickaxe leaves
-only such filepairs that touch the specified string in its
-output.  When `--pickaxe-all` is used, diffcore-pickaxe leaves all
-filepairs intact if there is such a filepair, or makes the
-output empty otherwise.  The latter behaviour is designed to
-make reviewing of the changes in the context of the whole
+There are two kinds of pickaxe: the S kind (corresponding to 'git log
+-S') and the G kind (mnemonic: grep; corresponding to 'git log -G').
+
+"-S<block of text>" detects filepairs whose preimage and postimage
+have different number of occurrences of the specified block of text.
+By definition, it will not detect in-file moves.  Also, when a
+changeset moves a file wholesale without affecting the interesting
+string, rename detection kicks in as usual, and `-S` omits the
+filepair (since the number of occurrences of that string didn't change
+in that rename-detected filepair).  The implementation essentially
+runs a count, and is significantly cheaper than the G kind.  When used
+with `--pickaxe-regex`, treat the <block of text> as an extended POSIX
+regular expression to match, instead of a literal string.
+
+"-G<regular expression>" detects filepairs whose textual diff has an
+added or a deleted line that matches the given regular expression.
+This means that it can detect in-file (or what rename-detection
+considers the same file) moves.  The implementation runs diff twice
+and greps, and this can be quite expensive.
+
+When `-S` or `-G` are used without `--pickaxe-all`, only filepairs
+that match their respective criterion are kept in the output.  When
+`--pickaxe-all` is used, if even one filepair matches their respective
+criterion in a changeset, the entire changeset is kept.  This behavior
+is designed to make reviewing changes in the context of the whole
 changeset easier.
 
-
 diffcore-order: For Sorting the Output Based on Filenames
 ---------------------------------------------------------
 
-- 
1.8.3.rc3.17.gd95ec6c.dirty

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/2] diffcore-pickaxe doc: document -S and -G properly
  2013-05-24  9:37       ` Ramkumar Ramachandra
@ 2013-05-24 14:58         ` Phil Hord
  2013-05-24 16:01           ` Ramkumar Ramachandra
  2013-05-24 16:54           ` Junio C Hamano
  0 siblings, 2 replies; 17+ messages in thread
From: Phil Hord @ 2013-05-24 14:58 UTC (permalink / raw)
  To: Ramkumar Ramachandra; +Cc: Junio C Hamano, Git List

On Fri, May 24, 2013 at 5:37 AM, Ramkumar Ramachandra
<artagnon@gmail.com> wrote:
> Junio C Hamano wrote:
>> [...]
>
> I agree with the other comments, and have made suitable changes.
> Let's review your block now.
>
>>         This transformation is used to find filepairs that represent
>>         two kinds of changes, and is controlled by the -S, -G and
>>         --pickaxe-all options.
>
> Why do you call this a "transformation"?  Is git log --author="foo" a
> transformation on the git-log output?  Then how is git log -Sfoo a
> transformation?
>
> Two kinds of changes controlled by three different options?  Isn't the
> original much clearer?

They are all three filters.  They transform the output by limiting it
to commits which meet specific conditions.  Transformation is used in
the network-graphs sense of the word.  It fits the beginning of the
document where it says this:

  The diffcore mechanism is fed a list of such comparison results
  (each of which is called "filepair", although at this point each
  of them talks about a single file), and transforms such a list
  into another list.  There are currently 5 such transformations:

  - diffcore-break
  - diffcore-rename
  - diffcore-merge-broken
  - diffcore-pickaxe
  - diffcore-order

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/2] diffcore-pickaxe doc: document -S and -G properly
  2013-05-24 14:58         ` Phil Hord
@ 2013-05-24 16:01           ` Ramkumar Ramachandra
  2013-05-24 16:54           ` Junio C Hamano
  1 sibling, 0 replies; 17+ messages in thread
From: Ramkumar Ramachandra @ 2013-05-24 16:01 UTC (permalink / raw)
  To: Phil Hord; +Cc: Junio C Hamano, Git List

Phil Hord wrote:
> It fits the beginning of the
> document where it says this:

Ah, I missed that.  Either way, I'm quite happy with v3: we can change
the first paragraph to use the word "transformation" if we really
want.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/2] diffcore-pickaxe doc: document -S and -G properly
  2013-05-24 14:58         ` Phil Hord
  2013-05-24 16:01           ` Ramkumar Ramachandra
@ 2013-05-24 16:54           ` Junio C Hamano
  1 sibling, 0 replies; 17+ messages in thread
From: Junio C Hamano @ 2013-05-24 16:54 UTC (permalink / raw)
  To: Phil Hord; +Cc: Ramkumar Ramachandra, Git List

Phil Hord <phil.hord@gmail.com> writes:

> On Fri, May 24, 2013 at 5:37 AM, Ramkumar Ramachandra
> <artagnon@gmail.com> wrote:
>> Junio C Hamano wrote:
>>> [...]
>>
>> I agree with the other comments, and have made suitable changes.
>> Let's review your block now.
>>
>>>         This transformation is used to find filepairs that represent
>>>         two kinds of changes, and is controlled by the -S, -G and
>>>         --pickaxe-all options.
>>
>> Why do you call this a "transformation"?  Is git log --author="foo" a
>> transformation on the git-log output?  Then how is git log -Sfoo a
>> transformation?
>>
>> Two kinds of changes controlled by three different options?  Isn't the
>> original much clearer?
>
> They are all three filters.  They transform the output by limiting it
> to commits which meet specific conditions.  Transformation is used in
> the network-graphs sense of the word.  It fits the beginning of the
> document where it says this:
>
>   The diffcore mechanism is fed a list of such comparison results
>   (each of which is called "filepair", although at this point each
>   of them talks about a single file), and transforms such a list
>   into another list.  There are currently 5 such transformations:
>
>   - diffcore-break
>   - diffcore-rename
>   - diffcore-merge-broken
>   - diffcore-pickaxe
>   - diffcore-order

Thanks.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/2] diffcore-pickaxe doc: document -S and -G properly
  2013-05-24 10:33 ` [PATCH 2/2] diffcore-pickaxe doc: document -S and -G properly Ramkumar Ramachandra
@ 2013-05-24 17:31   ` Junio C Hamano
  2013-05-31 12:04     ` Ramkumar Ramachandra
  0 siblings, 1 reply; 17+ messages in thread
From: Junio C Hamano @ 2013-05-24 17:31 UTC (permalink / raw)
  To: Ramkumar Ramachandra; +Cc: Git List

Ramkumar Ramachandra <artagnon@gmail.com> writes:

> The documentation of -S and -G is very sketchy.  Completely rewrite the
> sections in Documentation/diff-options.txt and
> Documentation/gitdiffcore.txt.
>
> References:
> 52e9578 ([PATCH] Introducing software archaeologist's tool "pickaxe".)
> f506b8e (git log/diff: add -G<regexp> that greps in the patch text)
>
> Inputs-from: Phil Hord <phil.hord@gmail.com>
> Co-authored-by: Junio C Hamano <gitster@pobox.com>
> Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
> ---
>  Documentation/diff-options.txt | 38 +++++++++++++++++++++++++++++--------
>  Documentation/gitdiffcore.txt  | 43 ++++++++++++++++++++++++------------------
>  2 files changed, 55 insertions(+), 26 deletions(-)
>
> diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
> index 104579d..2835eef 100644
> --- a/Documentation/diff-options.txt
> +++ b/Documentation/diff-options.txt
> @@ -383,14 +383,36 @@ ifndef::git-format-patch[]
>  	that matches other criteria, nothing is selected.
>  
>  -S<string>::
> -	Look for differences that introduce or remove an instance of
> -	<string>. Note that this is different than the string simply
> -	appearing in diff output; see the 'pickaxe' entry in
> -	linkgit:gitdiffcore[7] for more details.
> +	Look for differences that change the number of occurrences of
> +	the specified string (i.e. addition/deletion) in a file.
> +	Intended for the scripter's use.
> ++
> +It is especially useful when you're looking for an exact block of code
> +(like a struct), and want to know the history of that block since it
> +first came into being: use the feature iteratively to feed the
> +interesting block in the preimage back into `-S`, and keep going until
> +you get the very first version of the block.

OK, even though I would not say "especially" nor "useful" if I were
writing it, as it is the only use case it was designed for.

>  -G<regex>::
> +	Look for differences whose patch text contains added/removed
> +	lines that match <regex>.
> ++
> +To illustrate the difference between `-S<regex> --pickaxe-regex` and
> +`-G<regex>`, consider a commit with the following diff in the same
> +file:
> ++
> +----
> ++    return !regexec(regexp, two->ptr, 1, &regmatch, 0);
> +...
> +-    hit = !regexec(regexp, mf2.ptr, 1, &regmatch, 0);
> +----
> ++
> +While `git log -G"regexec\(regexp"` will show this commit, `git log
> +-S"regexec\(regexp" --pickaxe-regex` will not (because the number of
> +occurrences of that string did not change).
> ++
> +See the 'pickaxe' entry in linkgit:gitdiffcore[7] for more
> +information.

OK.

>  --pickaxe-regex::
> -	Make the <string> not a plain string but an extended POSIX
> -	regex to match.
> +	Treat the <string> given to `-S` as an extended POSIX regular
> +	expression to match.

OK.

> diff --git a/Documentation/gitdiffcore.txt b/Documentation/gitdiffcore.txt
> index 568d757..ef4c04a 100644
> --- a/Documentation/gitdiffcore.txt
> +++ b/Documentation/gitdiffcore.txt
> @@ -222,26 +222,33 @@ version prefixed with '+'.
>  diffcore-pickaxe: For Detecting Addition/Deletion of Specified String
>  ---------------------------------------------------------------------
>  
> -This transformation is used to find filepairs that represent
> -changes that touch a specified string, and is controlled by the
> --S option and the `--pickaxe-all` option to the 'git diff-*'
> -commands.
> -
> -When diffcore-pickaxe is in use, it checks if there are
> -filepairs whose "result" side and whose "origin" side have
> -different number of specified string.  Such a filepair represents
> -"the string appeared in this changeset".  It also checks for the
> -opposite case that loses the specified string.
> -
> -When `--pickaxe-all` is not in effect, diffcore-pickaxe leaves
> -only such filepairs that touch the specified string in its
> -output.  When `--pickaxe-all` is used, diffcore-pickaxe leaves all
> -filepairs intact if there is such a filepair, or makes the
> -output empty otherwise.  The latter behaviour is designed to
> -make reviewing of the changes in the context of the whole


> +There are two kinds of pickaxe: the S kind (corresponding to 'git log
> +-S') and the G kind (mnemonic: grep; corresponding to 'git log -G').

This is good as the beginning of a second paragraph or the second
sentence of the first paragraph.  This patch loses the description
of the general purpose of this machinery that should come at the
very beginning of the section (the original had a very good ut valid
only back when we had only -S; my "how about this" text did not have
a good one).

For example, the "rename" is about taking one set of filepairs and
expressing (some of) them as renames or copies by merging a deletion
filepair and a creation filepair into a rename-modify filepair, or
turning a creation filepair into a copy-modify filepair by finding a
preimage.  What does this transformation do?

Again here is my attempt for that missing first paragraph:

	This transformation limits the set of filepairs to those
	that change specified strings between the preimage and the
	postimage in a certain way.

        -S<block of text> and -G<regex> options are used to specify
	different ways these strings are sought.  Without
	--pickaxe-all, only the filepairs matching the given
	criterion is left in the output; all filepairs are left in
	the output when --pickaxe-all is used and if at least one
	filepair matches the given criterion.

but I do not have enough time now to condense the above down to a
readable paragraph of reasonable length (I expect that the ideal
final form would be like 5-6 lines at most).

> +"-S<block of text>" detects filepairs whose preimage and postimage
> +have different number of occurrences of the specified block of text.
> +By definition, it will not detect in-file moves.  Also, when a
> +changeset moves a file wholesale without affecting the interesting
> +string, rename detection kicks in as usual, and `-S` omits the
> +filepair (since the number of occurrences of that string didn't change
> +in that rename-detected filepair).

I am not sure why it is necessary to say anything about what the
previous step (diffcore-rename) might have done.  The input of this
(or any other) step in the diffcore pipeline is a preimage-postimage
filepairs, and to this transformation the filename does not matter.
Whether a file was moved (either "wholesale", implying nothing
changed, or renamed with modification at the same time) without
touching the block of text, or a file did not get involved in any
renaming, the only thing that matters is what the preimage and the
postimage in a filepair has (or does not have).

> + The implementation essentially
> +runs a count, and is significantly cheaper than the G kind.  When used
> +with `--pickaxe-regex`, treat the <block of text> as an extended POSIX
> +regular expression to match, instead of a literal string.

Sure.  Is "essentially runs a count" needed, though?  The reader has
just read "number of occurrences of the specified block of text" so
it would be obvious that the implementation counts.  It may be true
that it is significantly cheaper, but because they serve different
purposes, I am not sure it is worth saying.  It is like saying that
a hammer is significantly faster to drive a nail into wood than a
screwdriver to drive a screw into wood, without saying "nail" and
"screw".  It only invites readers to use a hammer to drive a screw.

> +"-G<regular expression>" detects filepairs whose textual diff has an
> +added or a deleted line that matches the given regular expression.
> +This means that it can detect in-file (or what rename-detection
> +considers the same file) moves.

"it can" sounds as if it is always a merit, which is probably not
what you wanted to imply.

When you are trying to see how a particular line came into the
shape, you would want to know what the previous shape of it was, but
a literal move will also be shown, which is a noise for the purpose
of digging.

> +The implementation runs diff twice
> +and greps, and this can be quite expensive.

Unlike the "count" one above which was obvious, the "runs diff and
greps hence expensive" part is worth saying.

> +When `-S` or `-G` are used without `--pickaxe-all`, only filepairs
> +that match their respective criterion are kept in the output.  When
> +`--pickaxe-all` is used, if even one filepair matches their respective
> +criterion in a changeset, the entire changeset is kept.  This behavior
> +is designed to make reviewing changes in the context of the whole
>  changeset easier.

OK.

>  
> -
>  diffcore-order: For Sorting the Output Based on Filenames
>  ---------------------------------------------------------

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/2] diffcore-pickaxe doc: document -S and -G properly
  2013-05-24 17:31   ` Junio C Hamano
@ 2013-05-31 12:04     ` Ramkumar Ramachandra
  2013-06-02 19:56       ` Junio C Hamano
  0 siblings, 1 reply; 17+ messages in thread
From: Ramkumar Ramachandra @ 2013-05-31 12:04 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git List

Junio C Hamano wrote:
> [...]

I agree with everything else, and made changes accordingly.

>         This transformation limits the set of filepairs to those
>         that change specified strings between the preimage and the
>         postimage in a certain way.

Definitely good.

>         -S<block of text> and -G<regex> options are used to specify
>         different ways these strings are sought.

Definitely better than the "two kinds of pickaxe" thing.

>         Without
>         --pickaxe-all, only the filepairs matching the given
>         criterion is left in the output; all filepairs are left in
>         the output when --pickaxe-all is used and if at least one
>         filepair matches the given criterion.

Why do a poor-man's version of --pickaxe-all here, when the last
paragraph already does justice to this?

When `-S` or `-G` are used without `--pickaxe-all`, only filepairs
that match their respective criterion are kept in the output.  When
`--pickaxe-all` is used, if even one filepair matches their respective
criterion in a changeset, the entire changeset is kept.  This behavior
is designed to make reviewing changes in the context of the whole
changeset easier.

> I am not sure why it is necessary to say anything about what the
> previous step (diffcore-rename) might have done.  The input of this
> (or any other) step in the diffcore pipeline is a preimage-postimage
> filepairs, and to this transformation the filename does not matter.
> Whether a file was moved (either "wholesale", implying nothing
> changed, or renamed with modification at the same time) without
> touching the block of text, or a file did not get involved in any
> renaming, the only thing that matters is what the preimage and the
> postimage in a filepair has (or does not have).

While what you're saying is technically true, I think it is important
to explain the interaction between diffcore-pickaxe and
diffcore-rename as I have done.  Someone who wants to understand what
`git log -S` does will come to this page and read this section:
without reading diffcore-rename, she will have an incomplete picture;
what's the harm in explaining diffcore-rename in the context of
diffcore-pickaxe?

I did do a s/rename detection/diffcore-rename/ though, so the user
knows where to look for more on this rename thing.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v4 0/2] Improve diffcore-pickaxe documentation
@ 2013-05-31 12:12 Ramkumar Ramachandra
  2013-05-31 12:12 ` [PATCH 1/2] diffcore-pickaxe: make error messages more consistent Ramkumar Ramachandra
  2013-05-31 12:12 ` [PATCH 2/2] diffcore-pickaxe doc: document -S and -G properly Ramkumar Ramachandra
  0 siblings, 2 replies; 17+ messages in thread
From: Ramkumar Ramachandra @ 2013-05-31 12:12 UTC (permalink / raw)
  To: Git List; +Cc: Junio C Hamano

Junio had some suggestions in the previous round.  The inter-diff
follows.

Yeah, word-diff is a bit messy.  Which brings me to: is it possible to
turn on word-diff only where heuristically appropriate?  word-diff
applied on the rewrite of the first paragraph of gitdiffcore.txt is a
disaster, but okay everywhere else.

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index efb5dfe..a85288f 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -387,11 +387,11 @@ ifndef::git-format-patch[]
	the specified string (i.e. addition/deletion) in a file.
	Intended for the scripter's use.
+
It is[-especially-] useful when you're looking for an exact block of code (like a
struct), and want to know the history of that block since it first
came into being: use the feature iteratively to feed the interesting
block in the preimage back into `-S`, and keep going until you get the
very first version of the block.

-G<regex>::
	Look for differences whose patch text contains added/removed
diff --git a/Documentation/gitdiffcore.txt b/Documentation/gitdiffcore.txt
index ef4c04a..c8b3e51 100644
--- a/Documentation/gitdiffcore.txt
+++ b/Documentation/gitdiffcore.txt
@@ -222,25 +222,27 @@ version prefixed with '+'.
diffcore-pickaxe: For Detecting Addition/Deletion of Specified String
---------------------------------------------------------------------

[-There are two kinds of pickaxe:-]{+This transformation limits+} the [-S kind (corresponding-]{+set of filepairs+} to [-'git log-]
[--S')-]{+those that change+}
{+specified strings between the preimage+} and the [-G kind (mnemonic: grep; corresponding-]{+postimage in a certain+}
{+way.  -S<block of text> and -G<regular expression> options are used+} to
[-'git log -G').-]{+specify different ways these strings are sought.+}

"-S<block of text>" detects filepairs whose preimage and postimage
have different number of occurrences of the specified block of text.
By definition, it will not detect in-file moves.  Also, when a
changeset moves a file wholesale without affecting the interesting
string, [-rename detection-]{+diffcore-rename+} kicks in as usual, and `-S` omits the filepair
(since the number of occurrences of that string didn't change in that
rename-detected filepair).[-The implementation essentially-]
[-runs a count, and is significantly cheaper than the G kind.-]  When used with `--pickaxe-regex`, treat
the <block of text> as an extended POSIX regular expression to match,
instead of a literal string.

"-G<regular expression>" {+(mnemonic: grep)+} detects filepairs whose
textual diff has an added or a deleted line that matches the given
regular expression.  This means that it [-can-]{+will+} detect in-file (or what
rename-detection considers the same file) [-moves.-]{+moves, which is noise.+}  The
implementation runs diff twice and greps, and this can be quite
expensive.

When `-S` or `-G` are used without `--pickaxe-all`, only filepairs
that match their respective criterion are kept in the output.  When


Ramkumar Ramachandra (2):
  diffcore-pickaxe: make error messages more consistent
  diffcore-pickaxe doc: document -S and -G properly

 Documentation/diff-options.txt | 38 +++++++++++++++++++++++++++--------
 Documentation/gitdiffcore.txt  | 45 +++++++++++++++++++++++++-----------------
 diffcore-pickaxe.c             |  4 ++--
 3 files changed, 59 insertions(+), 28 deletions(-)

-- 
1.8.3.114.gcd03571

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 1/2] diffcore-pickaxe: make error messages more consistent
  2013-05-31 12:12 [PATCH v4 0/2] Improve diffcore-pickaxe documentation Ramkumar Ramachandra
@ 2013-05-31 12:12 ` Ramkumar Ramachandra
  2013-06-03 17:52   ` Junio C Hamano
  2013-05-31 12:12 ` [PATCH 2/2] diffcore-pickaxe doc: document -S and -G properly Ramkumar Ramachandra
  1 sibling, 1 reply; 17+ messages in thread
From: Ramkumar Ramachandra @ 2013-05-31 12:12 UTC (permalink / raw)
  To: Git List; +Cc: Junio C Hamano

Currently, diffcore-pickaxe reports two distinct errors for the same
user error:

    $ git log --pickaxe-regex -S'\1'
    fatal: invalid pickaxe regex: Invalid back reference

    $ git log -G'\1' # --pickaxe-regex is implied
    fatal: invalid log-grep regex: Invalid back reference

Since the error has nothing to do with "log-grep", change the -G and -S
error messages to say "invalid regex".

Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
---
 diffcore-pickaxe.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/diffcore-pickaxe.c b/diffcore-pickaxe.c
index 63722f8..c97ac9b 100644
--- a/diffcore-pickaxe.c
+++ b/diffcore-pickaxe.c
@@ -122,7 +122,7 @@ static void diffcore_pickaxe_grep(struct diff_options *o)
 		char errbuf[1024];
 		regerror(err, &regex, errbuf, 1024);
 		regfree(&regex);
-		die("invalid log-grep regex: %s", errbuf);
+		die("invalid regex: %s", errbuf);
 	}
 
 	pickaxe(&diff_queued_diff, o, &regex, NULL, diff_grep);
@@ -246,7 +246,7 @@ static void diffcore_pickaxe_count(struct diff_options *o)
 			char errbuf[1024];
 			regerror(err, &regex, errbuf, 1024);
 			regfree(&regex);
-			die("invalid pickaxe regex: %s", errbuf);
+			die("invalid regex: %s", errbuf);
 		}
 		regexp = &regex;
 	} else {
-- 
1.8.3.114.gcd03571

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 2/2] diffcore-pickaxe doc: document -S and -G properly
  2013-05-31 12:12 [PATCH v4 0/2] Improve diffcore-pickaxe documentation Ramkumar Ramachandra
  2013-05-31 12:12 ` [PATCH 1/2] diffcore-pickaxe: make error messages more consistent Ramkumar Ramachandra
@ 2013-05-31 12:12 ` Ramkumar Ramachandra
  2013-06-03 17:54   ` Junio C Hamano
  1 sibling, 1 reply; 17+ messages in thread
From: Ramkumar Ramachandra @ 2013-05-31 12:12 UTC (permalink / raw)
  To: Git List; +Cc: Junio C Hamano

The documentation of -S and -G is very sketchy.  Completely rewrite the
sections in Documentation/diff-options.txt and
Documentation/gitdiffcore.txt.

References:
52e9578 ([PATCH] Introducing software archaeologist's tool "pickaxe".)
f506b8e (git log/diff: add -G<regexp> that greps in the patch text)

Inputs-from: Phil Hord <phil.hord@gmail.com>
Co-authored-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
---
 Documentation/diff-options.txt | 38 +++++++++++++++++++++++++++--------
 Documentation/gitdiffcore.txt  | 45 +++++++++++++++++++++++++-----------------
 2 files changed, 57 insertions(+), 26 deletions(-)

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index b8a9b86..a85288f 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -383,14 +383,36 @@ ifndef::git-format-patch[]
 	that matches other criteria, nothing is selected.
 
 -S<string>::
-	Look for differences that introduce or remove an instance of
-	<string>. Note that this is different than the string simply
-	appearing in diff output; see the 'pickaxe' entry in
-	linkgit:gitdiffcore[7] for more details.
+	Look for differences that change the number of occurrences of
+	the specified string (i.e. addition/deletion) in a file.
+	Intended for the scripter's use.
++
+It is useful when you're looking for an exact block of code (like a
+struct), and want to know the history of that block since it first
+came into being: use the feature iteratively to feed the interesting
+block in the preimage back into `-S`, and keep going until you get the
+very first version of the block.
 
 -G<regex>::
-	Look for differences whose added or removed line matches
-	the given <regex>.
+	Look for differences whose patch text contains added/removed
+	lines that match <regex>.
++
+To illustrate the difference between `-S<regex> --pickaxe-regex` and
+`-G<regex>`, consider a commit with the following diff in the same
+file:
++
+----
++    return !regexec(regexp, two->ptr, 1, &regmatch, 0);
+...
+-    hit = !regexec(regexp, mf2.ptr, 1, &regmatch, 0);
+----
++
+While `git log -G"regexec\(regexp"` will show this commit, `git log
+-S"regexec\(regexp" --pickaxe-regex` will not (because the number of
+occurrences of that string did not change).
++
+See the 'pickaxe' entry in linkgit:gitdiffcore[7] for more
+information.
 
 --pickaxe-all::
 	When `-S` or `-G` finds a change, show all the changes in that
@@ -398,8 +420,8 @@ ifndef::git-format-patch[]
 	in <string>.
 
 --pickaxe-regex::
-	Make the <string> not a plain string but an extended POSIX
-	regex to match.
+	Treat the <string> given to `-S` as an extended POSIX regular
+	expression to match.
 endif::git-format-patch[]
 
 -O<orderfile>::
diff --git a/Documentation/gitdiffcore.txt b/Documentation/gitdiffcore.txt
index 568d757..c8b3e51 100644
--- a/Documentation/gitdiffcore.txt
+++ b/Documentation/gitdiffcore.txt
@@ -222,26 +222,35 @@ version prefixed with '+'.
 diffcore-pickaxe: For Detecting Addition/Deletion of Specified String
 ---------------------------------------------------------------------
 
-This transformation is used to find filepairs that represent
-changes that touch a specified string, and is controlled by the
--S option and the `--pickaxe-all` option to the 'git diff-*'
-commands.
-
-When diffcore-pickaxe is in use, it checks if there are
-filepairs whose "result" side and whose "origin" side have
-different number of specified string.  Such a filepair represents
-"the string appeared in this changeset".  It also checks for the
-opposite case that loses the specified string.
-
-When `--pickaxe-all` is not in effect, diffcore-pickaxe leaves
-only such filepairs that touch the specified string in its
-output.  When `--pickaxe-all` is used, diffcore-pickaxe leaves all
-filepairs intact if there is such a filepair, or makes the
-output empty otherwise.  The latter behaviour is designed to
-make reviewing of the changes in the context of the whole
+This transformation limits the set of filepairs to those that change
+specified strings between the preimage and the postimage in a certain
+way.  -S<block of text> and -G<regular expression> options are used to
+specify different ways these strings are sought.
+
+"-S<block of text>" detects filepairs whose preimage and postimage
+have different number of occurrences of the specified block of text.
+By definition, it will not detect in-file moves.  Also, when a
+changeset moves a file wholesale without affecting the interesting
+string, diffcore-rename kicks in as usual, and `-S` omits the filepair
+(since the number of occurrences of that string didn't change in that
+rename-detected filepair).  When used with `--pickaxe-regex`, treat
+the <block of text> as an extended POSIX regular expression to match,
+instead of a literal string.
+
+"-G<regular expression>" (mnemonic: grep) detects filepairs whose
+textual diff has an added or a deleted line that matches the given
+regular expression.  This means that it will detect in-file (or what
+rename-detection considers the same file) moves, which is noise.  The
+implementation runs diff twice and greps, and this can be quite
+expensive.
+
+When `-S` or `-G` are used without `--pickaxe-all`, only filepairs
+that match their respective criterion are kept in the output.  When
+`--pickaxe-all` is used, if even one filepair matches their respective
+criterion in a changeset, the entire changeset is kept.  This behavior
+is designed to make reviewing changes in the context of the whole
 changeset easier.
 
-
 diffcore-order: For Sorting the Output Based on Filenames
 ---------------------------------------------------------
 
-- 
1.8.3.114.gcd03571

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/2] diffcore-pickaxe doc: document -S and -G properly
  2013-05-31 12:04     ` Ramkumar Ramachandra
@ 2013-06-02 19:56       ` Junio C Hamano
  2013-06-02 20:28         ` Ramkumar Ramachandra
  0 siblings, 1 reply; 17+ messages in thread
From: Junio C Hamano @ 2013-06-02 19:56 UTC (permalink / raw)
  To: Ramkumar Ramachandra; +Cc: Git List

Ramkumar Ramachandra <artagnon@gmail.com> writes:

>>         Without
>>         --pickaxe-all, only the filepairs matching the given
>>         criterion is left in the output; all filepairs are left in
>>         the output when --pickaxe-all is used and if at least one
>>         filepair matches the given criterion.
>
> Why do a poor-man's version of --pickaxe-all here, when the last
> paragraph already does justice to this?

The point of the first paragraph is to serve to help both:

 (1) people who read about this rather technical part of the
     diffcore pipeline machinery, to prepare them by listing what
     they will learn about in the section; and

 (2) those who already have read and want to skim over, by giving a
     concise summary.

It may have been "poor" because it was merely "something like this"
patch, though.


> While what you're saying is technically true, I think it is important
> to explain the interaction between diffcore-pickaxe and
> diffcore-rename as I have done.  Someone who wants to understand what
> `git log -S` does will come to this page and read this section:
> without reading diffcore-rename, she will have an incomplete picture;
> what's the harm in explaining diffcore-rename in the context of
> diffcore-pickaxe?

That is a red-herring; that is exactly why we upfront say that the
diffcore machinery is a pipeline and we describe upstream processing
like rename before we talk about pickaxe in the same document.

This document is the most accurate _technical_ documentation of how
the pipeline works (and it is not in section 1 of the manual set).
If you want to improve end-user documentation by adding explanation
for interactions between pipeline stages and also pathspec, I am all
for it, but I think that belongs to the larger "git help diff", not
"git help diffcore".

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/2] diffcore-pickaxe doc: document -S and -G properly
  2013-06-02 19:56       ` Junio C Hamano
@ 2013-06-02 20:28         ` Ramkumar Ramachandra
  0 siblings, 0 replies; 17+ messages in thread
From: Ramkumar Ramachandra @ 2013-06-02 20:28 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git List

Junio C Hamano wrote:
>> Why do a poor-man's version of --pickaxe-all here, when the last
>> paragraph already does justice to this?
>
> The point of the first paragraph is to serve to help both:

My question pertains to whether or not the explanation of
--pickaxe-all can wait till the last paragraph.  You want to
explain all the options in the first paragraph, but not
--pickaxe-regex?

>> While what you're saying is technically true, I think it is important
>> to explain the interaction between diffcore-pickaxe and
>> diffcore-rename as I have done.  Someone who wants to understand what
>> `git log -S` does will come to this page and read this section:
>> without reading diffcore-rename, she will have an incomplete picture;
>> what's the harm in explaining diffcore-rename in the context of
>> diffcore-pickaxe?
>
> This document is the most accurate _technical_ documentation of how
> the pipeline works (and it is not in section 1 of the manual set).
> If you want to improve end-user documentation by adding explanation
> for interactions between pipeline stages and also pathspec, I am all
> for it, but I think that belongs to the larger "git help diff", not
> "git help diffcore".

It's impossible to explain in the diff-options, because I do not have
even have access to the word "filepair", and no diffcore-rename
machinery to point to.  I could attempt a vague approximation, but I
do not think it's worth it.  I'll remove it if you insist.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/2] diffcore-pickaxe: make error messages more consistent
  2013-05-31 12:12 ` [PATCH 1/2] diffcore-pickaxe: make error messages more consistent Ramkumar Ramachandra
@ 2013-06-03 17:52   ` Junio C Hamano
  0 siblings, 0 replies; 17+ messages in thread
From: Junio C Hamano @ 2013-06-03 17:52 UTC (permalink / raw)
  To: Ramkumar Ramachandra; +Cc: Git List

Ramkumar Ramachandra <artagnon@gmail.com> writes:

> Currently, diffcore-pickaxe reports two distinct errors for the same
> user error:
>
>     $ git log --pickaxe-regex -S'\1'
>     fatal: invalid pickaxe regex: Invalid back reference
>
>     $ git log -G'\1' # --pickaxe-regex is implied
>     fatal: invalid log-grep regex: Invalid back reference
>
> Since the error has nothing to do with "log-grep", change the -G and -S
> error messages to say "invalid regex".

I'll reword the above somewhat; as I repeatedly explained, -G does
*not* imply pickaxe-regex at all.

While -G was being developed, it was internally called log-grep (no
relation to "git log --grep=<pattern>"), and that seeped through to
the error message.  Removing that is a good idea.

Thanks, will queue.



>
> Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
> ---
>  diffcore-pickaxe.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/diffcore-pickaxe.c b/diffcore-pickaxe.c
> index 63722f8..c97ac9b 100644
> --- a/diffcore-pickaxe.c
> +++ b/diffcore-pickaxe.c
> @@ -122,7 +122,7 @@ static void diffcore_pickaxe_grep(struct diff_options *o)
>  		char errbuf[1024];
>  		regerror(err, &regex, errbuf, 1024);
>  		regfree(&regex);
> -		die("invalid log-grep regex: %s", errbuf);
> +		die("invalid regex: %s", errbuf);
>  	}
>  
>  	pickaxe(&diff_queued_diff, o, &regex, NULL, diff_grep);
> @@ -246,7 +246,7 @@ static void diffcore_pickaxe_count(struct diff_options *o)
>  			char errbuf[1024];
>  			regerror(err, &regex, errbuf, 1024);
>  			regfree(&regex);
> -			die("invalid pickaxe regex: %s", errbuf);
> +			die("invalid regex: %s", errbuf);
>  		}
>  		regexp = &regex;
>  	} else {

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/2] diffcore-pickaxe doc: document -S and -G properly
  2013-05-31 12:12 ` [PATCH 2/2] diffcore-pickaxe doc: document -S and -G properly Ramkumar Ramachandra
@ 2013-06-03 17:54   ` Junio C Hamano
  0 siblings, 0 replies; 17+ messages in thread
From: Junio C Hamano @ 2013-06-03 17:54 UTC (permalink / raw)
  To: Ramkumar Ramachandra; +Cc: Git List

Ramkumar Ramachandra <artagnon@gmail.com> writes:

> The documentation of -S and -G is very sketchy.  Completely rewrite the
> sections in Documentation/diff-options.txt and
> Documentation/gitdiffcore.txt.

Will queue; thanks.

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2013-06-03 17:54 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-05-31 12:12 [PATCH v4 0/2] Improve diffcore-pickaxe documentation Ramkumar Ramachandra
2013-05-31 12:12 ` [PATCH 1/2] diffcore-pickaxe: make error messages more consistent Ramkumar Ramachandra
2013-06-03 17:52   ` Junio C Hamano
2013-05-31 12:12 ` [PATCH 2/2] diffcore-pickaxe doc: document -S and -G properly Ramkumar Ramachandra
2013-06-03 17:54   ` Junio C Hamano
  -- strict thread matches above, loose matches on Subject: below --
2013-05-24 10:33 [PATCH v3 0/2] Improve diffcore-pickaxe documentation Ramkumar Ramachandra
2013-05-24 10:33 ` [PATCH 2/2] diffcore-pickaxe doc: document -S and -G properly Ramkumar Ramachandra
2013-05-24 17:31   ` Junio C Hamano
2013-05-31 12:04     ` Ramkumar Ramachandra
2013-06-02 19:56       ` Junio C Hamano
2013-06-02 20:28         ` Ramkumar Ramachandra
2013-05-17 12:23 [PATCH v2 0/2] Improve diffcore-pickaxe documentation Ramkumar Ramachandra
2013-05-17 12:23 ` [PATCH 2/2] diffcore-pickaxe doc: document -S and -G properly Ramkumar Ramachandra
2013-05-18  1:24   ` Junio C Hamano
2013-05-19  7:33     ` Junio C Hamano
2013-05-24  9:37       ` Ramkumar Ramachandra
2013-05-24 14:58         ` Phil Hord
2013-05-24 16:01           ` Ramkumar Ramachandra
2013-05-24 16:54           ` Junio C Hamano

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).