git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* Multi-line 'git log -G<regex>'?
@ 2019-04-24 10:26 Eugeniu Rosca
  2019-04-24 15:22 ` [PATCH 0/2] diffcore-pickaxe: implement --pickaxe-raw-diff Ævar Arnfjörð Bjarmason
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Eugeniu Rosca @ 2019-04-24 10:26 UTC (permalink / raw)
  To: Junio C Hamano, git; +Cc: Eugeniu Rosca, Eugeniu Rosca

Hello git community,

In the context of [1], I would like to find all Linux commits which
replaced:
	'devm_request_threaded_irq(* IRQF_SHARED *)'
by:
	'devm_request_threaded_irq(* IRQF_ONESHOT *)'

Actually, I would be happy with a much lower degree of precision, e.g.
finding commits which _removed_ IRQF_SHARED and _added_ IRQF_ONESHOT.

I am aware of the difference between `git log -G` and `git log -S`, but
these two options alone don't seem to help me in this particular
scenario. Is there any git built-in way to achieve my goal?

[1] https://patchwork.kernel.org/patch/10914083/
    ("[v4,1/2] thermal: rcar_gen3_thermal: fix interrupt type")

-- 
Best regards,
Eugeniu.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 0/2] diffcore-pickaxe: implement --pickaxe-raw-diff
  2019-04-24 10:26 Multi-line 'git log -G<regex>'? Eugeniu Rosca
@ 2019-04-24 15:22 ` Ævar Arnfjörð Bjarmason
  2019-04-24 15:22 ` [PATCH 1/2] diffcore-pickaxe: refactor !one or !two case in diff_grep Ævar Arnfjörð Bjarmason
  2019-04-24 15:22 ` [PATCH 2/2] diffcore-pickaxe: add --pickaxe-raw-diff for use with -G Ævar Arnfjörð Bjarmason
  2 siblings, 0 replies; 14+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-04-24 15:22 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Eugeniu Rosca, Eugeniu Rosca, Jeff King,
	Julia Lawall, Ævar Arnfjörð Bjarmason

This implements --pickaxe-raw-diff as explained in 2/2. I based this
on "next" because Duy's in-flight diff option refactoring would have
conflicted with it.

Ævar Arnfjörð Bjarmason (2):
  diffcore-pickaxe: refactor !one or !two case in diff_grep
  diffcore-pickaxe: add --pickaxe-raw-diff for use with -G

 Documentation/diff-options.txt | 17 ++++++++++++
 diff.c                         |  3 +++
 diff.h                         |  2 ++
 diffcore-pickaxe.c             | 48 +++++++++++++++++++++++++++-------
 t/t4013-diff-various.sh        |  1 +
 t/t4209-log-pickaxe.sh         | 45 +++++++++++++++++++++++++++++++
 6 files changed, 107 insertions(+), 9 deletions(-)

-- 
2.21.0.593.g511ec345e18


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/2] diffcore-pickaxe: refactor !one or !two case in diff_grep
  2019-04-24 10:26 Multi-line 'git log -G<regex>'? Eugeniu Rosca
  2019-04-24 15:22 ` [PATCH 0/2] diffcore-pickaxe: implement --pickaxe-raw-diff Ævar Arnfjörð Bjarmason
@ 2019-04-24 15:22 ` Ævar Arnfjörð Bjarmason
  2019-04-24 15:22 ` [PATCH 2/2] diffcore-pickaxe: add --pickaxe-raw-diff for use with -G Ævar Arnfjörð Bjarmason
  2 siblings, 0 replies; 14+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-04-24 15:22 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Eugeniu Rosca, Eugeniu Rosca, Jeff King,
	Julia Lawall, Ævar Arnfjörð Bjarmason

Refactor the code around processing an added (!one) or deleted (!two)
file in diff_grep, which is used by the -G option.

This makes a subsequent change where we'd like to munge the "one" or
"two" "ptr" smaller. While we're at it let's add an assert that "one"
and "two" can't both be false at the same time, which is always the
case.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 diffcore-pickaxe.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/diffcore-pickaxe.c b/diffcore-pickaxe.c
index a9c6d60df2..3c6416bfe2 100644
--- a/diffcore-pickaxe.c
+++ b/diffcore-pickaxe.c
@@ -45,12 +45,16 @@ static int diff_grep(mmfile_t *one, mmfile_t *two,
 	xpparam_t xpp;
 	xdemitconf_t xecfg;
 
-	if (!one)
-		return !regexec_buf(regexp, two->ptr, two->size,
-				    1, &regmatch, 0);
-	if (!two)
-		return !regexec_buf(regexp, one->ptr, one->size,
-				    1, &regmatch, 0);
+	if (!one || !two) {
+		mmfile_t *which = one ? one : two;
+		int ret;
+		char *string = which->ptr;
+		size_t size = which->size;
+		assert(!(!one && !two));
+		ret = !regexec_buf(regexp, string, size,
+				   1, &regmatch, 0);
+		return ret;
+	}
 
 	/*
 	 * We have both sides; need to run textual diff and see if
-- 
2.21.0.593.g511ec345e18


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 2/2] diffcore-pickaxe: add --pickaxe-raw-diff for use with -G
  2019-04-24 10:26 Multi-line 'git log -G<regex>'? Eugeniu Rosca
  2019-04-24 15:22 ` [PATCH 0/2] diffcore-pickaxe: implement --pickaxe-raw-diff Ævar Arnfjörð Bjarmason
  2019-04-24 15:22 ` [PATCH 1/2] diffcore-pickaxe: refactor !one or !two case in diff_grep Ævar Arnfjörð Bjarmason
@ 2019-04-24 15:22 ` Ævar Arnfjörð Bjarmason
  2019-04-24 15:37   ` Ævar Arnfjörð Bjarmason
  2 siblings, 1 reply; 14+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-04-24 15:22 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Eugeniu Rosca, Eugeniu Rosca, Jeff King,
	Julia Lawall, Ævar Arnfjörð Bjarmason

Add the ability for the -G<regex> pickaxe to search only through added
or removed lines in the diff, or even through an arbitrary amount of
context lines when combined with -U<n>.

This has been requested[1][2] a few times in the past, and isn't
currently possible. Instead users need to do -G<regex> and then write
their own post-parsing script to see if the <regex> matched added or
removed lines, or both. There was no way to match the adjacent context
lines other than running and grepping the equivalent of a "log -p -U<n>".

1. https://public-inbox.org/git/xmqqwoqrr8y2.fsf@gitster-ct.c.googlers.com/
2. https://public-inbox.org/git/20190424102609.GA19697@vmlxhi-102.adit-jv.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Documentation/diff-options.txt | 17 +++++++++++++
 diff.c                         |  3 +++
 diff.h                         |  2 ++
 diffcore-pickaxe.c             | 34 ++++++++++++++++++++++---
 t/t4013-diff-various.sh        |  1 +
 t/t4209-log-pickaxe.sh         | 45 ++++++++++++++++++++++++++++++++++
 6 files changed, 98 insertions(+), 4 deletions(-)

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 09faee3b44..f367b40362 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -579,6 +579,9 @@ occurrences of that string did not change).
 Unless `--text` is supplied patches of binary files without a textconv
 filter will be ignored.
 +
+When `--pickaxe-raw-diff` is supplied the whole diff is searched
+instead of just added/removed lines. See below.
++
 See the 'pickaxe' entry in linkgit:gitdiffcore[7] for more
 information.
 
@@ -600,6 +603,20 @@ The object can be a blob or a submodule commit. It implies the `-t` option in
 	Treat the <string> given to `-S` as an extended POSIX regular
 	expression to match.
 
+--pickaxe-raw-diff::
+	When `-G` looks for a change a diff will be generated, and
+	only the added/removed lines will be matched against with the
+	"+" or "-" stripped.
++
+Supplying this option skips that pre-processing. This makes it
+possible to match only lines that added or removed something matching
+a <regex> with "\^\+<regex>" and "^-<regex>", respectively.
++
+It also allows for finding something in the diff context. E.g. "\^
+<regex>" will match the context lines (see `-U<n>` above) around the
+added/removed lines, and doing an unanchored match will match any of
+the the added/removed lines & diff context.
+
 endif::git-format-patch[]
 
 -O<orderfile>::
diff --git a/diff.c b/diff.c
index 4d3cf83a27..4cdc000ee5 100644
--- a/diff.c
+++ b/diff.c
@@ -5503,6 +5503,9 @@ static void prep_parse_options(struct diff_options *options)
 		OPT_BIT_F(0, "pickaxe-regex", &options->pickaxe_opts,
 			  N_("treat <string> in -S as extended POSIX regular expression"),
 			  DIFF_PICKAXE_REGEX, PARSE_OPT_NONEG),
+		OPT_BIT_F(0, "pickaxe-raw-diff", &options->pickaxe_opts,
+			  N_("have <string> in -G match the raw diff output"),
+			  DIFF_PICKAXE_G_RAW_DIFF, PARSE_OPT_NONEG),
 		OPT_FILENAME('O', NULL, &options->orderfile,
 			     N_("control the order in which files appear in the output")),
 		OPT_CALLBACK_F(0, "find-object", options, N_("<object-id>"),
diff --git a/diff.h b/diff.h
index b20cbcc091..d431fbc602 100644
--- a/diff.h
+++ b/diff.h
@@ -370,6 +370,8 @@ int git_config_rename(const char *var, const char *value);
 				 DIFF_PICKAXE_KIND_OBJFIND)
 
 #define DIFF_PICKAXE_IGNORE_CASE	32
+#define DIFF_PICKAXE_G_RAW_DIFF		64
+
 
 void diffcore_std(struct diff_options *);
 void diffcore_fix_diff_index(void);
diff --git a/diffcore-pickaxe.c b/diffcore-pickaxe.c
index 3c6416bfe2..e23f04b4f0 100644
--- a/diffcore-pickaxe.c
+++ b/diffcore-pickaxe.c
@@ -17,14 +17,18 @@ typedef int (*pickaxe_fn)(mmfile_t *one, mmfile_t *two,
 struct diffgrep_cb {
 	regex_t *regexp;
 	int hit;
+	int raw_diff;
 };
 
 static void diffgrep_consume(void *priv, char *line, unsigned long len)
 {
 	struct diffgrep_cb *data = priv;
 	regmatch_t regmatch;
+	int raw_diff = data->raw_diff;
+	const char *string = raw_diff ? line : line + 1;
+	size_t size = raw_diff ? len : len - 1;
 
-	if (line[0] != '+' && line[0] != '-')
+	if (!raw_diff && line[0] != '+' && line[0] != '-')
 		return;
 	if (data->hit)
 		/*
@@ -32,7 +36,7 @@ static void diffgrep_consume(void *priv, char *line, unsigned long len)
 		 * caller early.
 		 */
 		return;
-	data->hit = !regexec_buf(data->regexp, line + 1, len - 1, 1,
+	data->hit = !regexec_buf(data->regexp, string, size, 1,
 				 &regmatch, 0);
 }
 
@@ -44,15 +48,36 @@ static int diff_grep(mmfile_t *one, mmfile_t *two,
 	struct diffgrep_cb ecbdata;
 	xpparam_t xpp;
 	xdemitconf_t xecfg;
+	int raw_diff = o->pickaxe_opts & DIFF_PICKAXE_G_RAW_DIFF;
+	struct strbuf sb = STRBUF_INIT;
+	char *string;
+	size_t size;
 
 	if (!one || !two) {
 		mmfile_t *which = one ? one : two;
 		int ret;
-		char *string = which->ptr;
-		size_t size = which->size;
 		assert(!(!one && !two));
+		if (raw_diff) {
+			/*
+			 * When we have created/deleted files with
+			 * --pickaxe-raw-diff we need to fake up the
+			 * "+" and "-" at the start of the lines, a
+			 * plain -G without --pickaxe-raw-diff didn't
+			 * care since it would indiscriminately search
+			 * through both added and removed lines.
+			 */
+			strbuf_add_lines(&sb, !one ? "+" : "-", which->ptr,
+					 which->size);
+			string = sb.buf;
+			size = sb.len;
+		} else {
+			string = which->ptr;
+			size = which->size;
+		}
 		ret = !regexec_buf(regexp, string, size,
 				   1, &regmatch, 0);
+		if (raw_diff)
+			strbuf_release(&sb);
 		return ret;
 	}
 
@@ -64,6 +89,7 @@ static int diff_grep(mmfile_t *one, mmfile_t *two,
 	memset(&xecfg, 0, sizeof(xecfg));
 	ecbdata.regexp = regexp;
 	ecbdata.hit = 0;
+	ecbdata.raw_diff = raw_diff;
 	xecfg.ctxlen = o->context;
 	xecfg.interhunkctxlen = o->interhunkcontext;
 	if (xdi_diff_outf(one, two, discard_hunk_line, diffgrep_consume,
diff --git a/t/t4013-diff-various.sh b/t/t4013-diff-various.sh
index 9f8f0e84ad..39a1f6c230 100755
--- a/t/t4013-diff-various.sh
+++ b/t/t4013-diff-various.sh
@@ -276,6 +276,7 @@ log -SF master --max-count=1
 log -SF master --max-count=2
 log -GF master
 log -GF -p master
+log -GF -p --pickaxe-raw-diff master
 log -GF -p --pickaxe-all master
 log --decorate --all
 log --decorate=full --all
diff --git a/t/t4209-log-pickaxe.sh b/t/t4209-log-pickaxe.sh
index 5d06f5f45e..2d98318d23 100755
--- a/t/t4209-log-pickaxe.sh
+++ b/t/t4209-log-pickaxe.sh
@@ -141,4 +141,49 @@ test_expect_success 'log -S looks into binary files' '
 	test_cmp log full-log
 '
 
+test_expect_success 'setup log -G --pickaxe-raw-diff' '
+	git checkout --orphan G-raw-diff &&
+	test_write_lines A B C D E F G >file &&
+	git add file &&
+	git commit --allow-empty-message file &&
+	sed -i -e "s/B/2/" file &&
+	git add file &&
+	git commit --allow-empty-message file &&
+	sed -i -e "s/D/4/" file &&
+	git add file &&
+	git commit --allow-empty-message file &&
+	git rm file &&
+	git commit --allow-empty-message &&
+	git log --oneline -1 HEAD~0 >file.fourth &&
+	git log --oneline -1 HEAD~1 >file.third &&
+	git log --oneline -1 HEAD~2 >file.second &&
+	git log --oneline -1 HEAD~3 >file.first
+'
+
+test_expect_success 'log -G --pickaxe-raw-diff skips header and range information' '
+	git log --pickaxe-raw-diff -p -G"(@@|file)" >log &&
+	test_must_be_empty log
+'
+
+test_expect_success 'log -G --pickaxe-raw-diff searching in context' '
+	git log --oneline --pickaxe-raw-diff -G"^ F" -U2 -s >log &&
+	test_cmp file.third log &&
+	git log --oneline --pickaxe-raw-diff -G"^ F" -U1 -s >log &&
+	test_must_be_empty log
+'
+
+test_expect_success 'log -G --pickaxe-raw-diff searching added / removed lines (skip create/delete)' '
+	git log --oneline --pickaxe-raw-diff -G"^-[D2]" -s HEAD~1 >log &&
+	test_cmp file.third log &&
+	git log --oneline --pickaxe-raw-diff -G"^\+[D2]" -s -1 >log &&
+	test_cmp file.second log
+'
+
+test_expect_success 'log -G --pickaxe-raw-diff searching created / deleted files' '
+	git log --oneline --pickaxe-raw-diff -G"^\+A" -s >log &&
+	test_cmp file.first log &&
+	git log --oneline --pickaxe-raw-diff -G"^\-A" -s >log &&
+	test_cmp file.fourth log
+'
+
 test_done
-- 
2.21.0.593.g511ec345e18


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] diffcore-pickaxe: add --pickaxe-raw-diff for use with -G
  2019-04-24 15:22 ` [PATCH 2/2] diffcore-pickaxe: add --pickaxe-raw-diff for use with -G Ævar Arnfjörð Bjarmason
@ 2019-04-24 15:37   ` Ævar Arnfjörð Bjarmason
  2019-04-24 22:46     ` Eugeniu Rosca
  0 siblings, 1 reply; 14+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-04-24 15:37 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Eugeniu Rosca, Eugeniu Rosca, Jeff King,
	Julia Lawall


On Wed, Apr 24 2019, Ævar Arnfjörð Bjarmason wrote:

> Add the ability for the -G<regex> pickaxe to search only through added
> or removed lines in the diff, or even through an arbitrary amount of
> context lines when combined with -U<n>.
>
> This has been requested[1][2] a few times in the past, and isn't
> currently possible. Instead users need to do -G<regex> and then write
> their own post-parsing script to see if the <regex> matched added or
> removed lines, or both. There was no way to match the adjacent context
> lines other than running and grepping the equivalent of a "log -p -U<n>".
>
> 1. https://public-inbox.org/git/xmqqwoqrr8y2.fsf@gitster-ct.c.googlers.com/
> 2. https://public-inbox.org/git/20190424102609.GA19697@vmlxhi-102.adit-jv.com/

I see now once I actually read Eugeniu Rosca's E-Mail upthread instead
of just knee-jerk sending out patches that this doesn't actually solve
his particular problem fully.

I.e. if you want some AND/OR matching support this --pickaxe-raw-diff
won't give you that, but it *does* make it much easier to script up such
an option. Run it twice with -G"\+<regex>" and -G"-<regex>", "sort |
uniq -c" the commit list, and see which things occur once or twice.

Of course that doesn't give you more complex nested and/or cases, but if
git-log grew support for that like git-grep has the -G option could use
that, although at that point we'd probably want to spend effort on
making the underlying machinery smarter to avoid duplicate work.

Furthermore, and quoting Eugeniu upthread:

    In the context of [1], I would like to find all Linux commits which
    replaced:
    	'devm_request_threaded_irq(* IRQF_SHARED *)'
    by:
    	'devm_request_threaded_irq(* IRQF_ONESHOT *)'

Such AND/OR machinery would give you what you wanted *most* of the time,
but it would also find removed/added pairs that were "unrelated" as well
as "related". Solving *that* problem is more complex, but something the
diff machinery could in principle expose.

But the "-G<regex> --pickaxe-raw-diff" feature I have as-is is very
useful, I've had at least two people off-list ask me about a problem
that would be solved by it just in the last 1/2 year (unrelated to them
having seen the WIP patch I sent last October).

It's more general than Junio's suggested --pickaxe-ignore-{add,del}
options[1], but those could be implemented in terms of this underlying
code if anyone cared to have those as aliases. You'd just take the
-G<regex> and prefix the <regex> with "^\+" or "^-" as appropriate and
turn on the DIFF_PICKAXE_G_RAW_DIFF flag.

1. https://public-inbox.org/git/xmqqwoqrr8y2.fsf@gitster-ct.c.googlers.com/

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] diffcore-pickaxe: add --pickaxe-raw-diff for use with -G
  2019-04-24 15:37   ` Ævar Arnfjörð Bjarmason
@ 2019-04-24 22:46     ` Eugeniu Rosca
  2019-04-24 23:24       ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 14+ messages in thread
From: Eugeniu Rosca @ 2019-04-24 22:46 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Jeff King, Julia Lawall, Eugeniu Rosca,
	Eugeniu Rosca

Hi Ævar,

Thanks for the amazingly fast reply and for the useful feature (yay!).

On Wed, Apr 24, 2019 at 05:37:10PM +0200, Ævar Arnfjörð Bjarmason wrote:
> 
> On Wed, Apr 24 2019, Ævar Arnfjörð Bjarmason wrote:
> 
> > Add the ability for the -G<regex> pickaxe to search only through added
> > or removed lines in the diff, or even through an arbitrary amount of
> > context lines when combined with -U<n>.
> >
> > This has been requested[1][2] a few times in the past, and isn't
> > currently possible. Instead users need to do -G<regex> and then write
> > their own post-parsing script to see if the <regex> matched added or
> > removed lines, or both. There was no way to match the adjacent context
> > lines other than running and grepping the equivalent of a "log -p -U<n>".
> >
> > 1. https://public-inbox.org/git/xmqqwoqrr8y2.fsf@gitster-ct.c.googlers.com/
> > 2. https://public-inbox.org/git/20190424102609.GA19697@vmlxhi-102.adit-jv.com/
> 
> I see now once I actually read Eugeniu Rosca's E-Mail upthread instead
> of just knee-jerk sending out patches that this doesn't actually solve
> his particular problem fully.
> 
> I.e. if you want some AND/OR matching support this --pickaxe-raw-diff
> won't give you that, but it *does* make it much easier to script up such
> an option. Run it twice with -G"\+<regex>" and -G"-<regex>", "sort |
> uniq -c" the commit list, and see which things occur once or twice.
> 
> Of course that doesn't give you more complex nested and/or cases, but if
> git-log grew support for that like git-grep has the -G option could use
> that, although at that point we'd probably want to spend effort on
> making the underlying machinery smarter to avoid duplicate work.

Purely from user's standpoint, I feel more comfortable with `git grep`
and `git log --grep` particularly b/c they support '--all-match' [2],
allowing more flexible multi-line searches. Based on your feedback, it
looks to me that `git log -G/-S` did not have a chance to develop their
features to the same level.

> 
> Furthermore, and quoting Eugeniu upthread:
> 
>     In the context of [1], I would like to find all Linux commits which
>     replaced:
>     	'devm_request_threaded_irq(* IRQF_SHARED *)'
>     by:
>     	'devm_request_threaded_irq(* IRQF_ONESHOT *)'
> 
> Such AND/OR machinery would give you what you wanted *most* of the time,
> but it would also find removed/added pairs that were "unrelated" as well
> as "related". Solving *that* problem is more complex, but something the
> diff machinery could in principle expose.

I expect some false positives, since git is agnostic on the language
used to write the versioned files (the latter sounds like a research
topic to me - I hope there is somebody willing to experiment with that
in future).

> 
> But the "-G<regex> --pickaxe-raw-diff" feature I have as-is is very
> useful, 

I agree. I am a bit bothered by the fact that
`git log --oneline -Ux -G<regex> --pickaxe-raw-diff` outputs the
contents/patch of a commit. My expectation is that we have the
`log -p` knob for that?

> I've had at least two people off-list ask me about a problem
> that would be solved by it just in the last 1/2 year (unrelated to them
> having seen the WIP patch I sent last October).
> 
> It's more general than Junio's suggested --pickaxe-ignore-{add,del}

As a user, I would be happier to freely grep in the raw commit contents
rather than learning a dozen of new options which provide small subsets
of the same functionality. So, I personally vote for the approach taken
by --pickaxe-raw-diff. This would also reduce the complexity of my
current git aliases and/or allow dropping some of them altogether.

Quite off topic, but I also needed to come up with a solution to get
the C functions modified/touched by a git commit [3]. It is my
understanding that --pickaxe-raw-diff can't help here and I still have
to rely on parsing the output of `git log -p`?

> options[1], but those could be implemented in terms of this underlying
> code if anyone cared to have those as aliases. You'd just take the
> -G<regex> and prefix the <regex> with "^\+" or "^-" as appropriate and
> turn on the DIFF_PICKAXE_G_RAW_DIFF flag.
> 
> 1. https://public-inbox.org/git/xmqqwoqrr8y2.fsf@gitster-ct.c.googlers.com/

Thanks!

[2] https://gitster.livejournal.com/30195.html
[3] https://stackoverflow.com/questions/50707171/how-to-get-all-c-functions-modified-by-a-git-commit

-- 
Best regards,
Eugeniu.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] diffcore-pickaxe: add --pickaxe-raw-diff for use with -G
  2019-04-24 22:46     ` Eugeniu Rosca
@ 2019-04-24 23:24       ` Ævar Arnfjörð Bjarmason
  2019-04-25  0:44         ` Junio C Hamano
  2019-04-25  0:54         ` Eugeniu Rosca
  0 siblings, 2 replies; 14+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-04-24 23:24 UTC (permalink / raw)
  To: Eugeniu Rosca; +Cc: git, Junio C Hamano, Jeff King, Julia Lawall, Eugeniu Rosca


On Thu, Apr 25 2019, Eugeniu Rosca wrote:

> Hi Ævar,
>
> Thanks for the amazingly fast reply and for the useful feature (yay!).
>
> On Wed, Apr 24, 2019 at 05:37:10PM +0200, Ævar Arnfjörð Bjarmason wrote:
>>
>> On Wed, Apr 24 2019, Ævar Arnfjörð Bjarmason wrote:
>>
>> > Add the ability for the -G<regex> pickaxe to search only through added
>> > or removed lines in the diff, or even through an arbitrary amount of
>> > context lines when combined with -U<n>.
>> >
>> > This has been requested[1][2] a few times in the past, and isn't
>> > currently possible. Instead users need to do -G<regex> and then write
>> > their own post-parsing script to see if the <regex> matched added or
>> > removed lines, or both. There was no way to match the adjacent context
>> > lines other than running and grepping the equivalent of a "log -p -U<n>".
>> >
>> > 1. https://public-inbox.org/git/xmqqwoqrr8y2.fsf@gitster-ct.c.googlers.com/
>> > 2. https://public-inbox.org/git/20190424102609.GA19697@vmlxhi-102.adit-jv.com/
>>
>> I see now once I actually read Eugeniu Rosca's E-Mail upthread instead
>> of just knee-jerk sending out patches that this doesn't actually solve
>> his particular problem fully.
>>
>> I.e. if you want some AND/OR matching support this --pickaxe-raw-diff
>> won't give you that, but it *does* make it much easier to script up such
>> an option. Run it twice with -G"\+<regex>" and -G"-<regex>", "sort |
>> uniq -c" the commit list, and see which things occur once or twice.
>>
>> Of course that doesn't give you more complex nested and/or cases, but if
>> git-log grew support for that like git-grep has the -G option could use
>> that, although at that point we'd probably want to spend effort on
>> making the underlying machinery smarter to avoid duplicate work.
>
> Purely from user's standpoint, I feel more comfortable with `git grep`
> and `git log --grep` particularly b/c they support '--all-match' [2],
> allowing more flexible multi-line searches. Based on your feedback, it
> looks to me that `git log -G/-S` did not have a chance to develop their
> features to the same level.
>
>>
>> Furthermore, and quoting Eugeniu upthread:
>>
>>     In the context of [1], I would like to find all Linux commits which
>>     replaced:
>>     	'devm_request_threaded_irq(* IRQF_SHARED *)'
>>     by:
>>     	'devm_request_threaded_irq(* IRQF_ONESHOT *)'
>>
>> Such AND/OR machinery would give you what you wanted *most* of the time,
>> but it would also find removed/added pairs that were "unrelated" as well
>> as "related". Solving *that* problem is more complex, but something the
>> diff machinery could in principle expose.
>
> I expect some false positives, since git is agnostic on the language
> used to write the versioned files (the latter sounds like a research
> topic to me - I hope there is somebody willing to experiment with that
> in future).

I was thinking of something where the added/removed could be filtered to
cases that occur in the same diff hunk.

>>
>> But the "-G<regex> --pickaxe-raw-diff" feature I have as-is is very
>> useful,
>
> I agree. I am a bit bothered by the fact that
> `git log --oneline -Ux -G<regex> --pickaxe-raw-diff` outputs the
> contents/patch of a commit. My expectation is that we have the
> `log -p` knob for that?

This is unrelated to --pickaxe-raw-diff, -U<n> just implies -p in
general. See e.g. "git log -U1".

>> I've had at least two people off-list ask me about a problem
>> that would be solved by it just in the last 1/2 year (unrelated to them
>> having seen the WIP patch I sent last October).
>>
>> It's more general than Junio's suggested --pickaxe-ignore-{add,del}
>
> As a user, I would be happier to freely grep in the raw commit contents
> rather than learning a dozen of new options which provide small subsets
> of the same functionality. So, I personally vote for the approach taken
> by --pickaxe-raw-diff. This would also reduce the complexity of my
> current git aliases and/or allow dropping some of them altogether.
>
> Quite off topic, but I also needed to come up with a solution to get
> the C functions modified/touched by a git commit [3]. It is my
> understanding that --pickaxe-raw-diff can't help here and I still have
> to rely on parsing the output of `git log -p`?

Yeah, it doesn't help with that. When it runs we haven't generated the
context line or the "@@" line yet, that's later. You can breakpoint on
xdl_format_hunk_hdr and diffgrep_consume to see it in action.

It's a waste of CPU to generate that for all possible hunks, most of
which we won't show at all.

But it's of course possible to do so by running the full diff machinery
over every commit and matching on the result, the current pickaxe is
just taking shortcuts and not doing that.

>> options[1], but those could be implemented in terms of this underlying
>> code if anyone cared to have those as aliases. You'd just take the
>> -G<regex> and prefix the <regex> with "^\+" or "^-" as appropriate and
>> turn on the DIFF_PICKAXE_G_RAW_DIFF flag.
>>
>> 1. https://public-inbox.org/git/xmqqwoqrr8y2.fsf@gitster-ct.c.googlers.com/
>
> Thanks!
>
> [2] https://gitster.livejournal.com/30195.html
> [3] https://stackoverflow.com/questions/50707171/how-to-get-all-c-functions-modified-by-a-git-commit

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] diffcore-pickaxe: add --pickaxe-raw-diff for use with -G
  2019-04-24 23:24       ` Ævar Arnfjörð Bjarmason
@ 2019-04-25  0:44         ` Junio C Hamano
  2019-04-25 12:25           ` Ævar Arnfjörð Bjarmason
  2019-05-03  8:37           ` Eugeniu Rosca
  2019-04-25  0:54         ` Eugeniu Rosca
  1 sibling, 2 replies; 14+ messages in thread
From: Junio C Hamano @ 2019-04-25  0:44 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Eugeniu Rosca, git, Jeff King, Julia Lawall, Eugeniu Rosca

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

>> I agree. I am a bit bothered by the fact that
>> `git log --oneline -Ux -G<regex> --pickaxe-raw-diff` outputs the
>> contents/patch of a commit. My expectation is that we have the
>> `log -p` knob for that?
>
> This is unrelated to --pickaxe-raw-diff, -U<n> just implies -p in
> general. See e.g. "git log -U1".

The reason why I found this exchange interesting is because I think
it shows a noteworthy gap between end-user expectations and what the
implementors know.

Stepping back (or sideways) a bit, pretend for a while that there
were no "pickaxe" feature in Git.  Instead there is the "patch-grep"
tool whose design is roughly:

   1. It reads "git log -p" output from its standard input, and
      splits the lines into records, each of which consists of the
      header part (i.e. starting at the "commit <object name>" line,
      to the first blank line before the title), the log message
      part, and the patch part.

   2. It takes command line arguments, which are, like "git grep",
      patterns to match and instructions on how to combine the match
      result.

   3. It applies the match criteria only to the patch part of each
      record.  A record without any match in the patch part is
      discarded.

   4. It uses the surviving record's "commit <object name>" lines
      to decide what commits to show.  It does the moral equivalent
      of invoking "git show" on each of them, and perhaps lets you
      affect how the commits are shown.

      Or perhaps it just lists the commit object names chosen for
      further processing by downstream tools that read from it.


So the user would be able to say something like

	git log -Ux --since=6.months |
	git patch-grep \
		--commit-names-only \
		--all-match \
		-e '+.*devm_request_threaded_irq(IRQF_SHARED)' \
                -e '-.*devm_request_threaded_irq(IRQF_ONESHOT)' |
	xargs git show --oneline -s

As an implementor, you know that is not how your -G<pattern> thing
works, but coming from the end-user side, I think it is a reasonable
mental model to expect a tool to work more like so.  And I think the
expectation from combining --oneline with -Ux was that the -U option
would apply to step 1, not step 4 (as --oneline is a clear
indication that the user wants a very concise final result).

Personally, I think the _best_ match for the original wish would be
to have that hypothetical "git patch-grep" read from "git log -L"
that is limited to the C function in the source the user is
interested in.

And until "git patch-grep" becomes reality, I would probably have
done

	git log -L<function of interest> -U<x> | less

and asked "less" to skip to a match with

	/(IRQF_SHARED|IRQF_ONESHOT)

and then kept hitting 'n' until I find what replaces them, as a
stop-gap measure.

By the way, I think your thing is interesting regardless, even if it
does not match the use case in the original thread (it actually may
match---I didn't think it through).

Because in the context of diff/log family, however, the word "raw"
has a specific connotation about the "--raw" format (as opposed to
"--patch"), I would not call this "grep the patch output itself,
instead of grepping the source (guided by the patch output to tell
what lines are near the lines that got replaced)" feature anything
"raw", by the way.



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] diffcore-pickaxe: add --pickaxe-raw-diff for use with -G
  2019-04-24 23:24       ` Ævar Arnfjörð Bjarmason
  2019-04-25  0:44         ` Junio C Hamano
@ 2019-04-25  0:54         ` Eugeniu Rosca
  2019-04-25 12:14           ` Ævar Arnfjörð Bjarmason
  2019-05-03  3:15           ` Jeff King
  1 sibling, 2 replies; 14+ messages in thread
From: Eugeniu Rosca @ 2019-04-25  0:54 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Jeff King, Julia Lawall, Eugeniu Rosca,
	Eugeniu Rosca

On Thu, Apr 25, 2019 at 01:24:56AM +0200, Ævar Arnfjörð Bjarmason wrote:
> 
> On Thu, Apr 25 2019, Eugeniu Rosca wrote:
> 
> > Hi Ævar,
> >
> > Thanks for the amazingly fast reply and for the useful feature (yay!).
> >
> > On Wed, Apr 24, 2019 at 05:37:10PM +0200, Ævar Arnfjörð Bjarmason wrote:
> >>
> >> On Wed, Apr 24 2019, Ævar Arnfjörð Bjarmason wrote:
> >>
> >> > Add the ability for the -G<regex> pickaxe to search only through added
> >> > or removed lines in the diff, or even through an arbitrary amount of
> >> > context lines when combined with -U<n>.
> >> >
> >> > This has been requested[1][2] a few times in the past, and isn't
> >> > currently possible. Instead users need to do -G<regex> and then write
> >> > their own post-parsing script to see if the <regex> matched added or
> >> > removed lines, or both. There was no way to match the adjacent context
> >> > lines other than running and grepping the equivalent of a "log -p -U<n>".
> >> >
> >> > 1. https://public-inbox.org/git/xmqqwoqrr8y2.fsf@gitster-ct.c.googlers.com/
> >> > 2. https://public-inbox.org/git/20190424102609.GA19697@vmlxhi-102.adit-jv.com/
> >>
> >> I see now once I actually read Eugeniu Rosca's E-Mail upthread instead
> >> of just knee-jerk sending out patches that this doesn't actually solve
> >> his particular problem fully.
> >>
> >> I.e. if you want some AND/OR matching support this --pickaxe-raw-diff
> >> won't give you that, but it *does* make it much easier to script up such
> >> an option. Run it twice with -G"\+<regex>" and -G"-<regex>", "sort |
> >> uniq -c" the commit list, and see which things occur once or twice.
> >>
> >> Of course that doesn't give you more complex nested and/or cases, but if
> >> git-log grew support for that like git-grep has the -G option could use
> >> that, although at that point we'd probably want to spend effort on
> >> making the underlying machinery smarter to avoid duplicate work.
> >
> > Purely from user's standpoint, I feel more comfortable with `git grep`
> > and `git log --grep` particularly b/c they support '--all-match' [2],
> > allowing more flexible multi-line searches. Based on your feedback, it
> > looks to me that `git log -G/-S` did not have a chance to develop their
> > features to the same level.
> >
> >>
> >> Furthermore, and quoting Eugeniu upthread:
> >>
> >>     In the context of [1], I would like to find all Linux commits which
> >>     replaced:
> >>     	'devm_request_threaded_irq(* IRQF_SHARED *)'
> >>     by:
> >>     	'devm_request_threaded_irq(* IRQF_ONESHOT *)'
> >>
> >> Such AND/OR machinery would give you what you wanted *most* of the time,
> >> but it would also find removed/added pairs that were "unrelated" as well
> >> as "related". Solving *that* problem is more complex, but something the
> >> diff machinery could in principle expose.
> >
> > I expect some false positives, since git is agnostic on the language
> > used to write the versioned files (the latter sounds like a research
> > topic to me - I hope there is somebody willing to experiment with that
> > in future).
> 
> I was thinking of something where the added/removed could be filtered to
> cases that occur in the same diff hunk.
> 
> >>
> >> But the "-G<regex> --pickaxe-raw-diff" feature I have as-is is very
> >> useful,
> >
> > I agree. I am a bit bothered by the fact that
> > `git log --oneline -Ux -G<regex> --pickaxe-raw-diff` outputs the
> > contents/patch of a commit. My expectation is that we have the
> > `log -p` knob for that?
> 
> This is unrelated to --pickaxe-raw-diff, -U<n> just implies -p in
> general. See e.g. "git log -U1".

Oops. Since I use `-U<n>` mostly with `git show`, I missed the
implication. You are right. Then, my question is how users are
going to (quote from commit description):

> >> > [..] search [..] through an arbitrary amount of
> >> > context lines when combined with -U<n>.

and achieve a `git log --oneline` report, given that -U<n> unfolds
the commits?

FTR, based on my quick experiments, --pickaxe-raw-diff does process
several lines of context by default (it appears to default to -U3).

> 
> >> I've had at least two people off-list ask me about a problem
> >> that would be solved by it just in the last 1/2 year (unrelated to them
> >> having seen the WIP patch I sent last October).
> >>
> >> It's more general than Junio's suggested --pickaxe-ignore-{add,del}
> >
> > As a user, I would be happier to freely grep in the raw commit contents
> > rather than learning a dozen of new options which provide small subsets
> > of the same functionality. So, I personally vote for the approach taken
> > by --pickaxe-raw-diff. This would also reduce the complexity of my
> > current git aliases and/or allow dropping some of them altogether.
> >
> > Quite off topic, but I also needed to come up with a solution to get
> > the C functions modified/touched by a git commit [3]. It is my
> > understanding that --pickaxe-raw-diff can't help here and I still have
> > to rely on parsing the output of `git log -p`?
> 
> Yeah, it doesn't help with that. When it runs we haven't generated the
> context line or the "@@" line yet, that's later. You can breakpoint on
> xdl_format_hunk_hdr and diffgrep_consume to see it in action.
> 
> It's a waste of CPU to generate that for all possible hunks, most of
> which we won't show at all.
> 
> But it's of course possible to do so by running the full diff machinery
> over every commit and matching on the result, the current pickaxe is
> just taking shortcuts and not doing that.
> 
> >> options[1], but those could be implemented in terms of this underlying
> >> code if anyone cared to have those as aliases. You'd just take the
> >> -G<regex> and prefix the <regex> with "^\+" or "^-" as appropriate and
> >> turn on the DIFF_PICKAXE_G_RAW_DIFF flag.
> >>
> >> 1. https://public-inbox.org/git/xmqqwoqrr8y2.fsf@gitster-ct.c.googlers.com/
> >
> > Thanks!
> >
> > [2] https://gitster.livejournal.com/30195.html
> > [3] https://stackoverflow.com/questions/50707171/how-to-get-all-c-functions-modified-by-a-git-commit

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] diffcore-pickaxe: add --pickaxe-raw-diff for use with -G
  2019-04-25  0:54         ` Eugeniu Rosca
@ 2019-04-25 12:14           ` Ævar Arnfjörð Bjarmason
  2019-05-03  3:15           ` Jeff King
  1 sibling, 0 replies; 14+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-04-25 12:14 UTC (permalink / raw)
  To: Eugeniu Rosca; +Cc: git, Junio C Hamano, Jeff King, Julia Lawall, Eugeniu Rosca


On Thu, Apr 25 2019, Eugeniu Rosca wrote:

> On Thu, Apr 25, 2019 at 01:24:56AM +0200, Ævar Arnfjörð Bjarmason wrote:
>>
>> On Thu, Apr 25 2019, Eugeniu Rosca wrote:
>>
>> > Hi Ævar,
>> >
>> > Thanks for the amazingly fast reply and for the useful feature (yay!).
>> >
>> > On Wed, Apr 24, 2019 at 05:37:10PM +0200, Ævar Arnfjörð Bjarmason wrote:
>> >>
>> >> On Wed, Apr 24 2019, Ævar Arnfjörð Bjarmason wrote:
>> >>
>> >> > Add the ability for the -G<regex> pickaxe to search only through added
>> >> > or removed lines in the diff, or even through an arbitrary amount of
>> >> > context lines when combined with -U<n>.
>> >> >
>> >> > This has been requested[1][2] a few times in the past, and isn't
>> >> > currently possible. Instead users need to do -G<regex> and then write
>> >> > their own post-parsing script to see if the <regex> matched added or
>> >> > removed lines, or both. There was no way to match the adjacent context
>> >> > lines other than running and grepping the equivalent of a "log -p -U<n>".
>> >> >
>> >> > 1. https://public-inbox.org/git/xmqqwoqrr8y2.fsf@gitster-ct.c.googlers.com/
>> >> > 2. https://public-inbox.org/git/20190424102609.GA19697@vmlxhi-102.adit-jv.com/
>> >>
>> >> I see now once I actually read Eugeniu Rosca's E-Mail upthread instead
>> >> of just knee-jerk sending out patches that this doesn't actually solve
>> >> his particular problem fully.
>> >>
>> >> I.e. if you want some AND/OR matching support this --pickaxe-raw-diff
>> >> won't give you that, but it *does* make it much easier to script up such
>> >> an option. Run it twice with -G"\+<regex>" and -G"-<regex>", "sort |
>> >> uniq -c" the commit list, and see which things occur once or twice.
>> >>
>> >> Of course that doesn't give you more complex nested and/or cases, but if
>> >> git-log grew support for that like git-grep has the -G option could use
>> >> that, although at that point we'd probably want to spend effort on
>> >> making the underlying machinery smarter to avoid duplicate work.
>> >
>> > Purely from user's standpoint, I feel more comfortable with `git grep`
>> > and `git log --grep` particularly b/c they support '--all-match' [2],
>> > allowing more flexible multi-line searches. Based on your feedback, it
>> > looks to me that `git log -G/-S` did not have a chance to develop their
>> > features to the same level.
>> >
>> >>
>> >> Furthermore, and quoting Eugeniu upthread:
>> >>
>> >>     In the context of [1], I would like to find all Linux commits which
>> >>     replaced:
>> >>     	'devm_request_threaded_irq(* IRQF_SHARED *)'
>> >>     by:
>> >>     	'devm_request_threaded_irq(* IRQF_ONESHOT *)'
>> >>
>> >> Such AND/OR machinery would give you what you wanted *most* of the time,
>> >> but it would also find removed/added pairs that were "unrelated" as well
>> >> as "related". Solving *that* problem is more complex, but something the
>> >> diff machinery could in principle expose.
>> >
>> > I expect some false positives, since git is agnostic on the language
>> > used to write the versioned files (the latter sounds like a research
>> > topic to me - I hope there is somebody willing to experiment with that
>> > in future).
>>
>> I was thinking of something where the added/removed could be filtered to
>> cases that occur in the same diff hunk.
>>
>> >>
>> >> But the "-G<regex> --pickaxe-raw-diff" feature I have as-is is very
>> >> useful,
>> >
>> > I agree. I am a bit bothered by the fact that
>> > `git log --oneline -Ux -G<regex> --pickaxe-raw-diff` outputs the
>> > contents/patch of a commit. My expectation is that we have the
>> > `log -p` knob for that?
>>
>> This is unrelated to --pickaxe-raw-diff, -U<n> just implies -p in
>> general. See e.g. "git log -U1".
>
> Oops. Since I use `-U<n>` mostly with `git show`, I missed the
> implication. You are right. Then, my question is how users are
> going to (quote from commit description):
>
>> >> > [..] search [..] through an arbitrary amount of
>> >> > context lines when combined with -U<n>.
>
> and achieve a `git log --oneline` report, given that -U<n> unfolds
> the commits?
>
> FTR, based on my quick experiments, --pickaxe-raw-diff does process
> several lines of context by default (it appears to default to -U3).

Yeah I should document this explicitly. We use the default diff context
so if you just -G'foo.*bar' you'll find things in the 6x lines of
context (3 before / 3 after), not just the "-" and "+" lines.

It's a "feature", but we should be really clear about it, i.e. you need
to anchor with "^[+-]" if you want the same thing that -G does for you
now.

I *do* find the default semantics really useful. Sometimes you can use
-L, but I've often done manual greps with -U<n> for "let's find code
changes anywhere in the project near places where we use some API",
maybe we should pick -U0 with --pickaxe-raw-diff by default to avoid
*that* particular surprise by default, but I think that would be even
more confusing...

>>
>> >> I've had at least two people off-list ask me about a problem
>> >> that would be solved by it just in the last 1/2 year (unrelated to them
>> >> having seen the WIP patch I sent last October).
>> >>
>> >> It's more general than Junio's suggested --pickaxe-ignore-{add,del}
>> >
>> > As a user, I would be happier to freely grep in the raw commit contents
>> > rather than learning a dozen of new options which provide small subsets
>> > of the same functionality. So, I personally vote for the approach taken
>> > by --pickaxe-raw-diff. This would also reduce the complexity of my
>> > current git aliases and/or allow dropping some of them altogether.
>> >
>> > Quite off topic, but I also needed to come up with a solution to get
>> > the C functions modified/touched by a git commit [3]. It is my
>> > understanding that --pickaxe-raw-diff can't help here and I still have
>> > to rely on parsing the output of `git log -p`?
>>
>> Yeah, it doesn't help with that. When it runs we haven't generated the
>> context line or the "@@" line yet, that's later. You can breakpoint on
>> xdl_format_hunk_hdr and diffgrep_consume to see it in action.
>>
>> It's a waste of CPU to generate that for all possible hunks, most of
>> which we won't show at all.
>>
>> But it's of course possible to do so by running the full diff machinery
>> over every commit and matching on the result, the current pickaxe is
>> just taking shortcuts and not doing that.
>>
>> >> options[1], but those could be implemented in terms of this underlying
>> >> code if anyone cared to have those as aliases. You'd just take the
>> >> -G<regex> and prefix the <regex> with "^\+" or "^-" as appropriate and
>> >> turn on the DIFF_PICKAXE_G_RAW_DIFF flag.
>> >>
>> >> 1. https://public-inbox.org/git/xmqqwoqrr8y2.fsf@gitster-ct.c.googlers.com/
>> >
>> > Thanks!
>> >
>> > [2] https://gitster.livejournal.com/30195.html
>> > [3] https://stackoverflow.com/questions/50707171/how-to-get-all-c-functions-modified-by-a-git-commit

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] diffcore-pickaxe: add --pickaxe-raw-diff for use with -G
  2019-04-25  0:44         ` Junio C Hamano
@ 2019-04-25 12:25           ` Ævar Arnfjörð Bjarmason
  2019-05-03  9:10             ` Eugeniu Rosca
  2019-05-03  8:37           ` Eugeniu Rosca
  1 sibling, 1 reply; 14+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-04-25 12:25 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Eugeniu Rosca, git, Jeff King, Julia Lawall, Eugeniu Rosca


On Thu, Apr 25 2019, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>
>>> I agree. I am a bit bothered by the fact that
>>> `git log --oneline -Ux -G<regex> --pickaxe-raw-diff` outputs the
>>> contents/patch of a commit. My expectation is that we have the
>>> `log -p` knob for that?
>>
>> This is unrelated to --pickaxe-raw-diff, -U<n> just implies -p in
>> general. See e.g. "git log -U1".
>
> The reason why I found this exchange interesting is because I think
> it shows a noteworthy gap between end-user expectations and what the
> implementors know.
>
> Stepping back (or sideways) a bit, pretend for a while that there
> were no "pickaxe" feature in Git.  Instead there is the "patch-grep"
> tool whose design is roughly:
>
>    1. It reads "git log -p" output from its standard input, and
>       splits the lines into records, each of which consists of the
>       header part (i.e. starting at the "commit <object name>" line,
>       to the first blank line before the title), the log message
>       part, and the patch part.
>
>    2. It takes command line arguments, which are, like "git grep",
>       patterns to match and instructions on how to combine the match
>       result.
>
>    3. It applies the match criteria only to the patch part of each
>       record.  A record without any match in the patch part is
>       discarded.
>
>    4. It uses the surviving record's "commit <object name>" lines
>       to decide what commits to show.  It does the moral equivalent
>       of invoking "git show" on each of them, and perhaps lets you
>       affect how the commits are shown.
>
>       Or perhaps it just lists the commit object names chosen for
>       further processing by downstream tools that read from it.
>
>
> So the user would be able to say something like
>
> 	git log -Ux --since=6.months |
> 	git patch-grep \
> 		--commit-names-only \
> 		--all-match \
> 		-e '+.*devm_request_threaded_irq(IRQF_SHARED)' \
>                 -e '-.*devm_request_threaded_irq(IRQF_ONESHOT)' |
> 	xargs git show --oneline -s
>
> As an implementor, you know that is not how your -G<pattern> thing
> works, but coming from the end-user side, I think it is a reasonable
> mental model to expect a tool to work more like so.  And I think the
> expectation from combining --oneline with -Ux was that the -U option
> would apply to step 1, not step 4 (as --oneline is a clear
> indication that the user wants a very concise final result).
>
> Personally, I think the _best_ match for the original wish would be
> to have that hypothetical "git patch-grep" read from "git log -L"
> that is limited to the C function in the source the user is
> interested in.
>
> And until "git patch-grep" becomes reality, I would probably have
> done
>
> 	git log -L<function of interest> -U<x> | less
>
> and asked "less" to skip to a match with
>
> 	/(IRQF_SHARED|IRQF_ONESHOT)
>
> and then kept hitting 'n' until I find what replaces them, as a
> stop-gap measure.
>
> By the way, I think your thing is interesting regardless, even if it
> does not match the use case in the original thread (it actually may
> match---I didn't think it through).

Yeah it's definitely a bit orthagonal, should have sent it in reply to
something else and actually read the E-Mail, but I think it's useful.

> Because in the context of diff/log family, however, the word "raw"
> has a specific connotation about the "--raw" format (as opposed to
> "--patch"), I would not call this "grep the patch output itself,
> instead of grepping the source (guided by the patch output to tell
> what lines are near the lines that got replaced)" feature anything
> "raw", by the way.

I agree, brainfarted on not thinking about "raw". Do you or anyone have
a suggestion for a better CLI option name?

Maybe --pickaxe-patch or --pickaxe-patch-format (to go with git-diff's
-u aka --patch (i.e. not --raw) default format)? Or
--pickaxe-G-with-context or --pickaxe-with-context or
--with-pickaxe-context or --pickaxe-context ? All of these suck, but I'm
coming up blank on a better one :)

Probably the least shitty of those shitty options is --pickaxe-patch,
since we have --patch which triggers the same format, and we can
document that the default is a -G search through --no-pickaxe-patch, and
you can just tweak the format.

It also leaves the door open (unlike having *-G-* in the option) to
support this for -S if anyone cared...

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] diffcore-pickaxe: add --pickaxe-raw-diff for use with -G
  2019-04-25  0:54         ` Eugeniu Rosca
  2019-04-25 12:14           ` Ævar Arnfjörð Bjarmason
@ 2019-05-03  3:15           ` Jeff King
  1 sibling, 0 replies; 14+ messages in thread
From: Jeff King @ 2019-05-03  3:15 UTC (permalink / raw)
  To: Eugeniu Rosca
  Cc: Ævar Arnfjörð Bjarmason, git, Junio C Hamano,
	Julia Lawall, Eugeniu Rosca

On Thu, Apr 25, 2019 at 02:54:48AM +0200, Eugeniu Rosca wrote:

> > This is unrelated to --pickaxe-raw-diff, -U<n> just implies -p in
> > general. See e.g. "git log -U1".
> 
> Oops. Since I use `-U<n>` mostly with `git show`, I missed the
> implication. You are right. Then, my question is how users are
> going to (quote from commit description):
> 
> > >> > [..] search [..] through an arbitrary amount of
> > >> > context lines when combined with -U<n>.
> 
> and achieve a `git log --oneline` report, given that -U<n> unfolds
> the commits?

You can use "-s" to suppress patch output; as long as it comes after -U
on the command-line, it will countermand the patch-format part.

(Of course it doesn't matter until we have a raw-diff grep, since
otherwise the context lines do not matter at all, and you should just
omit -U entirely).

-Peff

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] diffcore-pickaxe: add --pickaxe-raw-diff for use with -G
  2019-04-25  0:44         ` Junio C Hamano
  2019-04-25 12:25           ` Ævar Arnfjörð Bjarmason
@ 2019-05-03  8:37           ` Eugeniu Rosca
  1 sibling, 0 replies; 14+ messages in thread
From: Eugeniu Rosca @ 2019-05-03  8:37 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Ævar Arnfjörð Bjarmason, git, Jeff King,
	Julia Lawall, Eugeniu Rosca, Eugeniu Rosca

On Thu, Apr 25, 2019 at 09:44:40AM +0900, Junio C Hamano wrote:
[..]
> So the user would be able to say something like
> 
> 	git log -Ux --since=6.months |
> 	git patch-grep \
> 		--commit-names-only \
> 		--all-match \
> 		-e '+.*devm_request_threaded_irq(IRQF_SHARED)' \
>                 -e '-.*devm_request_threaded_irq(IRQF_ONESHOT)' |
> 	xargs git show --oneline -s
[..]

JFTR/FWIW, this looks quite user friendly to me, but I believe users
like me can (and likely do) already handcraft their own variants of
'git patch-grep', since the above model implies piping (as opposed to
in-git processing done by --pickaxe-raw-diff).

-- 
Best Regards,
Eugeniu.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] diffcore-pickaxe: add --pickaxe-raw-diff for use with -G
  2019-04-25 12:25           ` Ævar Arnfjörð Bjarmason
@ 2019-05-03  9:10             ` Eugeniu Rosca
  0 siblings, 0 replies; 14+ messages in thread
From: Eugeniu Rosca @ 2019-05-03  9:10 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Junio C Hamano, git, Jeff King, Julia Lawall, Eugeniu Rosca,
	Eugeniu Rosca

On Thu, Apr 25, 2019 at 02:25:13PM +0200, Ævar Arnfjörð Bjarmason wrote:
[..]
> Do you or anyone have a suggestion for a better CLI option name?
> 
> Maybe --pickaxe-patch or --pickaxe-patch-format (to go with git-diff's
> -u aka --patch (i.e. not --raw) default format)? Or
> --pickaxe-G-with-context or --pickaxe-with-context or
> --with-pickaxe-context or --pickaxe-context ? All of these suck, but I'm
> coming up blank on a better one :)

'--pickaxe-patch' is shorter than '--pickaxe-raw-diff', hence more
convenient to me. It looks like 'pickaxe-all' and 'pickaxe-regex' are
the only --pickaxe-* options currently implemented. Both of them are
two-worded only and easy to remember/type. I think '--pickaxe-patch'
is more user-friendly, but I leave git people to say the final word.

> 
> Probably the least shitty of those shitty options is --pickaxe-patch,
> since we have --patch which triggers the same format, and we can
> document that the default is a -G search through --no-pickaxe-patch, and
> you can just tweak the format.
> 
> It also leaves the door open (unlike having *-G-* in the option) to
> support this for -S if anyone cared...

-- 
Best Regards,
Eugeniu.

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2019-05-03  9:11 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-04-24 10:26 Multi-line 'git log -G<regex>'? Eugeniu Rosca
2019-04-24 15:22 ` [PATCH 0/2] diffcore-pickaxe: implement --pickaxe-raw-diff Ævar Arnfjörð Bjarmason
2019-04-24 15:22 ` [PATCH 1/2] diffcore-pickaxe: refactor !one or !two case in diff_grep Ævar Arnfjörð Bjarmason
2019-04-24 15:22 ` [PATCH 2/2] diffcore-pickaxe: add --pickaxe-raw-diff for use with -G Ævar Arnfjörð Bjarmason
2019-04-24 15:37   ` Ævar Arnfjörð Bjarmason
2019-04-24 22:46     ` Eugeniu Rosca
2019-04-24 23:24       ` Ævar Arnfjörð Bjarmason
2019-04-25  0:44         ` Junio C Hamano
2019-04-25 12:25           ` Ævar Arnfjörð Bjarmason
2019-05-03  9:10             ` Eugeniu Rosca
2019-05-03  8:37           ` Eugeniu Rosca
2019-04-25  0:54         ` Eugeniu Rosca
2019-04-25 12:14           ` Ævar Arnfjörð Bjarmason
2019-05-03  3:15           ` Jeff King

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).