git@vger.kernel.org mailing list mirror (one of many)
* RFH - git-log variant that _does_ search through diffs
@ 2009-06-30  0:08 Eric Raible
  2009-06-30  4:03 ` Jeff King
  0 siblings, 1 reply; 12+ messages in thread
From: Eric Raible @ 2009-06-30  0:08 UTC
  To: Git Mailing List

[Surely this has been addressed before, but I wasn't able to find it...]

The documentation for git-log -S includes:

"Look for differences that introduce or remove an instance of <string>.
 Note that this is different than the string simply appearing in diff output"

But I want to do that "different" thing (IOW I want to search the diff output).

So must I loop through git-rev-list, grepping git-diff output on each commit?

Or, if it _is_ possible to search the diff output directly, then it might be
useful to link to the relevant description instead of saying what -S doesn't do.

Thanks - Eric

* Re: RFH - git-log variant that _does_ search through diffs
  2009-06-30  0:08 RFH - git-log variant that _does_ search through diffs Eric Raible
@ 2009-06-30  4:03 ` Jeff King
  2009-06-30  8:09   ` Junio C Hamano
  2009-06-30 18:05   ` Eric Raible
  0 siblings, 2 replies; 12+ messages in thread
From: Jeff King @ 2009-06-30  4:03 UTC
  To: Eric Raible; +Cc: Git Mailing List

On Mon, Jun 29, 2009 at 05:08:47PM -0700, Eric Raible wrote:

> [Surely this has been addressed before, but I wasn't able to find it...]

There is some discussion here:

  http://article.gmane.org/gmane.comp.version-control.git/112077

> The documentation for git-log -S includes:
> 
> "Look for differences that introduce or remove an instance of <string>.
>  Note that this is different than the string simply appearing in diff output"
> 
> But I want to do that "different" thing (IOW I want to search the diff output).
> 
> So must I loop through git-rev-list, grepping git-diff output on each commit?

Currently, yes. There is no way to do it internally. A patch to
implement it would probably be accepted, though (see the thread I
mentioned above for more details).

You can at least combine rev-list and diff into one command, and grep
like this (for 'foo'):

  git log -z -p | perl -0ne 'print if /^[-+].*foo/m' | tr '\0' '\n'

-Peff

* Re: RFH - git-log variant that _does_ search through diffs
  2009-06-30  4:03 ` Jeff King
@ 2009-06-30  8:09   ` Junio C Hamano
  2009-06-30 18:06     ` Eric Raible
  2009-06-30 18:05   ` Eric Raible
  1 sibling, 1 reply; 12+ messages in thread
From: Junio C Hamano @ 2009-06-30  8:09 UTC
  To: Jeff King; +Cc: Eric Raible, Git Mailing List

Jeff King <peff@peff.net> writes:

> On Mon, Jun 29, 2009 at 05:08:47PM -0700, Eric Raible wrote:
>
>> [Surely this has been addressed before, but I wasn't able to find it...]
>
> There is some discussion here:
>
>   http://article.gmane.org/gmane.comp.version-control.git/112077
>
> Currently, yes. There is no way to do it internally. A patch to
> implement it would probably be accepted, though (see the thread I
> mentioned above for more details).

Specifically:

    http://thread.gmane.org/gmane.comp.version-control.git/112077/focus=112114

and its cousin

    http://article.gmane.org/gmane.comp.version-control.git/112141

* Re: RFH - git-log variant that _does_ search through diffs
  2009-06-30  4:03 ` Jeff King
  2009-06-30  8:09   ` Junio C Hamano
@ 2009-06-30 18:05   ` Eric Raible
  2009-06-30 19:31     ` Jeff King
  1 sibling, 1 reply; 12+ messages in thread
From: Eric Raible @ 2009-06-30 18:05 UTC
  To: Jeff King; +Cc: Git Mailing List

On Mon, Jun 29, 2009 at 9:03 PM, Jeff King<peff@peff.net> wrote:

> You can at least combine rev-list and diff into one command, and grep
> like this (for 'foo'):
>
>  git log -z -p | perl -0ne 'print if /^[-+].*foo/m' | tr '\0' '\n'
>
> -Peff

Thank you, that will do very nicely as a starting point.

What I _really_ want is the subset of all commits containing foo
whose oneline commit message doesn't match a given regexp.

So I've used something like this to extract the commits of interest:

git log -z -p | perl -0ne 'print if /^[-+].*foo/m' | tr '\0' '\n' |
grep "^commit [0-9a-f]" | awk '{print $2}' |
xargs -n1 git log --pretty=oneline -1 |
grep -v dont_want

In this specific case of wanting to ignore particular commits a loop
over git-rev-list might yield a better solution.  But the 'git-log | perl | tr'
snippet is a nice idiom for day-to-day use.

* Re: RFH - git-log variant that _does_ search through diffs
  2009-06-30  8:09   ` Junio C Hamano
@ 2009-06-30 18:06     ` Eric Raible
  0 siblings, 0 replies; 12+ messages in thread
From: Eric Raible @ 2009-06-30 18:06 UTC
  To: Junio C Hamano; +Cc: Jeff King, Git Mailing List

On Tue, Jun 30, 2009 at 1:09 AM, Junio C Hamano<gitster@pobox.com> wrote:
>
> Specifically:
>
>    http://thread.gmane.org/gmane.comp.version-control.git/112077/focus=112114
>
> and its cousin
>
>    http://article.gmane.org/gmane.comp.version-control.git/112141

Thanks - that's exactly what I was looking for but didn't find.

* Re: RFH - git-log variant that _does_ search through diffs
  2009-06-30 18:05   ` Eric Raible
@ 2009-06-30 19:31     ` Jeff King
  2009-06-30 21:22       ` Eric Raible
  0 siblings, 1 reply; 12+ messages in thread
From: Jeff King @ 2009-06-30 19:31 UTC
  To: Eric Raible; +Cc: Git Mailing List

On Tue, Jun 30, 2009 at 11:05:08AM -0700, Eric Raible wrote:

> What I _really_ want is the subset of all commits containing foo
> whose oneline commit message doesn't match a given regexp.
>
> So I've used something like this to extract the commits of interest:
> 
> git log -z -p | perl -0ne 'print if /^[-+].*foo/m' | tr '\0' '\n' |
> grep "^commit [0-9a-f]" | awk '{print $2}' |
> xargs -n1 git log --pretty=oneline -1 |
> grep -v dont_want

I think you can do this a little more simply and efficiently as:

  git log -z -p --format='GREP: %s' |
    perl -0ne 'print if /^[-+].*foo/m && !/^GREP:.*dont_want/' |
    tr '\0' '\n'

(though note that --format is new as of 1.6.3, I think; before that you
have to use "--pretty=format:"). Many fewer process invocations, and
less typing, though still easy to mess up. At one point I had considered
writing small wrapper scripts that understood the log output so you
could say:

  git log -z -p | filter-author $A | filter-diff $D | filter-subject $S

which is nicely readable and Unix-y, but is really _slow_ compared to
git doing it all in a single process. I think a "--grep-subject" and a
"--grep-diff" (aka "--search") are the only things that are missing now,
and those would both be pretty easy to implement.

-Peff

* Re: RFH - git-log variant that _does_ search through diffs
  2009-06-30 19:31     ` Jeff King
@ 2009-06-30 21:22       ` Eric Raible
  2009-06-30 21:34         ` Eric Raible
  2009-07-01  7:02         ` Jeff King
  0 siblings, 2 replies; 12+ messages in thread
From: Eric Raible @ 2009-06-30 21:22 UTC
  To: Jeff King; +Cc: Git Mailing List

On Tue, Jun 30, 2009 at 12:31 PM, Jeff King<peff@peff.net> wrote:
>
> I think you can do this a little more simply and efficiently as:
>
>  git log -z -p --format='GREP: %s' |
>    perl -0ne 'print if /^[-+].*foo/m && !/^GREP:.*dont_want/' |
>    tr '\0' '\n'
>
> (though note that --format is new as of 1.6.3, I think; before that you
> have to use "--pretty=format:"). Many fewer process invocations, and
> less typing, though still easy to mess up.

I agree that --format leads to a much prettier solution.
Unfortunately --format seems to turn off -z (at least in msysgit):

$ git --version
git version 1.6.3.2.1299.gee46c
$ git log -p > L1
$ git log -p -z > L2
$ diff L1 L2 | wc
   2415    4347   62889
$ git log -p --format=%s > L1
$ git log -p -z --format=%s > L2
$ diff L1 L2 | wc
      0       0       0

* Re: RFH - git-log variant that _does_ search through diffs
  2009-06-30 21:22       ` Eric Raible
@ 2009-06-30 21:34         ` Eric Raible
  2009-07-01  7:02         ` Jeff King
  1 sibling, 0 replies; 12+ messages in thread
From: Eric Raible @ 2009-06-30 21:34 UTC
  To: Jeff King; +Cc: Git Mailing List

On Tue, Jun 30, 2009 at 2:22 PM, Eric Raible<raible@gmail.com> wrote:
> On Tue, Jun 30, 2009 at 12:31 PM, Jeff King<peff@peff.net> wrote:
>>
>> I think you can do this a little more simply and efficiently as:
>>
>>  git log -z -p --format='GREP: %s' |
>>    perl -0ne 'print if /^[-+].*foo/m && !/^GREP:.*dont_want/' |
>>    tr '\0' '\n'
>>
>> (though note that --format is new as of 1.6.3, I think; before that you
>> have to use "--pretty=format:"). Many fewer process invocations, and
>> less typing, though still easy to mess up.
>
> I agree that --format leads to a much prettier solution.
> Unfortunately --format seems to turn off -z (at least in msysgit):

Sorry to self-reply, but one obvious workaround is to encode the NUL
explicitly:

git log -z -p --format='%x00GREP: %s' | ...

* Re: RFH - git-log variant that _does_ search through diffs
  2009-06-30 21:22       ` Eric Raible
  2009-06-30 21:34         ` Eric Raible
@ 2009-07-01  7:02         ` Jeff King
  2009-07-01  7:26           ` [PATCH 1/2] log-tree: fix confusing comment Jeff King
  2009-07-01  7:26           ` [RFC/PATCH 2/2] fix missing NUL termination with pretty=tformat Jeff King
  1 sibling, 2 replies; 12+ messages in thread
From: Jeff King @ 2009-07-01  7:02 UTC
  To: Eric Raible; +Cc: Junio C Hamano, Git Mailing List

On Tue, Jun 30, 2009 at 02:22:43PM -0700, Eric Raible wrote:

> Unfortunately --format seems to turn off -z (at least in msysgit):
> 
> $ git --version
> git version 1.6.3.2.1299.gee46c
> $ git log -p > L1
> $ git log -p -z > L2
> $ diff L1 L2 | wc
>    2415    4347   62889
> $ git log -p --format=%s > L1
> $ git log -p -z --format=%s > L2
> $ diff L1 L2 | wc
>       0       0       0

Ugh. I did some looking into this. It actually does work if you do:

  git log -p -z --pretty=format:%s

rather than

  git log -p -z --pretty=tformat:%s

And --format=%s behaves as if 'tformat' was given. And if you are not up
to date on the difference, "format" implies "separator semantics", where
the NUL is placed _between_ each record. "tformat" implies "terminator
semantics", where the NUL is placed at the end.

Placing the separator correctly is fairly easy; when we start a new
record, if we are not the first, we print the separator. Placing a
terminator is a little trickier because of the way the code is
structured. I'll post my attempt in a minute; see patch 2/2 for more
discussion.

-Peff
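
The difference between the two semantics is easy to see on toy data. The
following is a sketch only, not git's implementation: emit_separated and
emit_terminated are hypothetical helpers that place a delimiter the way
"format" and "tformat" are described above.

```shell
#!/bin/sh
# Separator semantics (--pretty=format:): the delimiter is written
# before every record except the first, so none follows the last one.
emit_separated() {
    delim="$1"; shift
    first=1
    for rec in "$@"; do
        [ "$first" -eq 1 ] || printf "$delim"
        printf '%s' "$rec"
        first=0
    done
}

# Terminator semantics (--pretty=tformat:): the delimiter is written
# after every record, including the last.
emit_terminated() {
    delim="$1"; shift
    for rec in "$@"; do
        printf '%s' "$rec"
        printf "$delim"
    done
}

# Use NUL as the delimiter and display it as "@" so its placement shows.
emit_separated  '\0' three two one | tr '\0' '@'; echo   # three@two@one
emit_terminated '\0' three two one | tr '\0' '@'; echo   # three@two@one@
```

The delimiter is passed as the printf escape '\0' rather than as a literal
byte, because a shell variable cannot hold a NUL.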

* [PATCH 1/2] log-tree: fix confusing comment
  2009-07-01  7:02         ` Jeff King
@ 2009-07-01  7:26           ` Jeff King
  2009-07-01  7:26           ` [RFC/PATCH 2/2] fix missing NUL termination with pretty=tformat Jeff King
  1 sibling, 0 replies; 12+ messages in thread
From: Jeff King @ 2009-07-01  7:26 UTC
  To: Junio C Hamano; +Cc: Eric Raible, Git Mailing List

This comment mentions the case where use_terminator is set,
but this case is not handled at all by this chunk of code.

Signed-off-by: Jeff King <peff@peff.net>
---
This comment confused me quite a bit while tracking down the issue in
2/2.

 log-tree.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/log-tree.c b/log-tree.c
index 59d63eb..6f73c17 100644
--- a/log-tree.c
+++ b/log-tree.c
@@ -321,7 +321,8 @@ void show_log(struct rev_info *opt)
 	}
 
 	/*
-	 * If use_terminator is set, add a newline at the end of the entry.
+	 * If use_terminator is set, we already handled any record termination
+	 * at the end of the last record.
 	 * Otherwise, add a diffopt.line_termination character before all
 	 * entries but the first.  (IOW, as a separator between entries)
 	 */
-- 
1.6.3.3.485.g0f5d4.dirty

* [RFC/PATCH 2/2] fix missing NUL termination with pretty=tformat
  2009-07-01  7:02         ` Jeff King
  2009-07-01  7:26           ` [PATCH 1/2] log-tree: fix confusing comment Jeff King
@ 2009-07-01  7:26           ` Jeff King
  2009-07-01  8:27             ` Junio C Hamano
  1 sibling, 1 reply; 12+ messages in thread
From: Jeff King @ 2009-07-01  7:26 UTC
  To: Eric Raible; +Cc: Junio C Hamano, Git Mailing List

When using "git log -z" with --pretty=format's "separator"
semantics, we correctly insert a NUL between each record.
However, with "--pretty=tformat", we output no NULs at all,
whereas we should output one after each commit.

We can't just put a conditional in the code for the
"separator" case; that code is triggered at a completely
different time: when _starting_ a new commit that is not
the first commit to be shown. Terminator semantics, by
contrast, require printing the terminator at the end of
each commit.

Adding to the trickiness is that we must handle two cases:
with and without diff output. In fact, there is already a
spot (log-tree.c, ll. 442-445) which adds a hard-coded
newline as a terminator after the commit message. But we
can't just modify that to use the specified line terminator,
because sometimes it is acting as a separator between commit
message and diff, and sometimes it is acting as the
terminator of the whole record.

Simply adding another terminator after each commit has been
shown will end up with doubled newlines for short user
formats (like '%s'). Instead, we add the record terminator
only if it is not a newline, in which case the output will
actually contain a newline followed by the terminator.

Signed-off-by: Jeff King <peff@peff.net>
---
This one is RFC. It is missing tests, but that is because I am not
completely sure what we want the output to look like. With this change,
you still get a newline at the end of a single-line user-formatted
string, like:

  $ git log -z --format=%s
  three
  ^@two
  ^@one
  ^@

As explained above, it is not correct to simply turn that putchar('\n')
into putchar(opt->diffopt.line_termination), since it may be followed by
the diff, in which case we _want_ the newline. But maybe it makes sense
to suppress it if we have an alternate line terminator and we are not
showing the diff.

The case with a diff looks much better:

  $ git log -z -p --format=%s
  three

  diff ...
  ...
  ^@two

  diff ...
  ...
  ^@one

  diff ...
  ...
  ^@

So I'm not sure if it is OK as-is, or if we should add that newline
suppression tweak.  People aren't going to be looking at "-z" output as
a whole, so it is really about whether somebody saying:

  git log -z --pretty=tformat:"XXX %s YYY" | perl -0ne ...

would be surprised to find a newline after "YYY" in each record.

 log-tree.c |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/log-tree.c b/log-tree.c
index 6f73c17..36930ff 100644
--- a/log-tree.c
+++ b/log-tree.c
@@ -557,6 +557,14 @@ int log_tree_commit(struct rev_info *opt, struct commit *commit)
 		show_log(opt);
 		shown = 1;
 	}
+	/*
+	 * If we have a terminator, we want to print it here. Unless
+	 * it is a newline, in which case we will already have just
+	 * printed one at the end of the commit message or at the end
+	 * of the diff, and we will end up doubling it.
+	 */
+	if (opt->use_terminator && opt->diffopt.line_termination != '\n')
+		putchar(opt->diffopt.line_termination);
 	opt->loginfo = NULL;
 	maybe_flush_or_die(stdout, "stdout");
 	return shown;
-- 
1.6.3.3.485.g0f5d4.dirty

* Re: [RFC/PATCH 2/2] fix missing NUL termination with pretty=tformat
  2009-07-01  7:26           ` [RFC/PATCH 2/2] fix missing NUL termination with pretty=tformat Jeff King
@ 2009-07-01  8:27             ` Junio C Hamano
  0 siblings, 0 replies; 12+ messages in thread
From: Junio C Hamano @ 2009-07-01  8:27 UTC
  To: Jeff King; +Cc: Eric Raible, Git Mailing List

Jeff King <peff@peff.net> writes:

> This one is RFC. It is missing tests, but that is because I am not
> completely sure what we want the output to look like. With this change,
> you still get a newline at the end of a single-line user-formatted
> string, like:
>
>   $ git log -z --format=%s
>   three
>   ^@two
>   ^@one
>   ^@

I haven't thought things through, but my knee-jerk reaction is that we
should treat "%s" itself as not having a final LF in itself.  So I would
actually expect the above to produce

    three^@two^@one^@

In other words, --pretty=format:%s and --pretty=format:X-%s-Y should
behave identically.  Strings produced by these format specifications will
not have terminating LF appended (because %s itself does not end with LF,
and in the second example you did not write any LF after your "Y").  For
consistency, I think %b should include the LF at the end on its own, but I
may be wrong on this last point.

"git log" (with or without -z) would begin with the string that comes out
of the log message part (i.e. with or without any custom format via
format/tformat) first.

When run with -p, the output would be followed by a LF (with -z, or
without), followed by the patch text (each line of which is terminated
with a LF, again with or without -z).  Without -p, this part is omitted.

After that, a LF/NUL (depending on the use of -z) would be appended if
there are more records or if we are using the terminator semantics.  Then
we would move on to the next record.

E.g.  When "log -p" shows a commit with a one-liner message, like this:

    commit e8a39af8d471d190a749c390a0cf614cb59ec8ee
    Author: Junio C Hamano <gitster@pobox.com>
    Date:   Wed Jul 1 00:50:48 2009 -0700

        second commit

    diff --git a/one b/one
    index e69de29..d00491f 100644
    --- a/one
    +++ b/one
    @@ -0,0 +1 @@
    +1

    Side note. Conceptually I would say that canned formats (--pretty,
    --pretty=short etc. but not --pretty=oneline) produce a string that
    ends with "second commit\n" in the above example, and the blank line
    between the log and beginning of "diff --git" is coming from the rule
    "log is always followed by a LF (with or without -z)".

"log -p --pretty=format:X-%s-Y" would begin with

    X-second commit-Y
    diff --git a/one b/one
    index e69de29..d00491f 100644

"log --pretty=format:X-%s-Y" would become

    X-second commit-Y
    X-initial commit-Y

and with -z, the latter would look like
    
    X-second commit-Y^@X-initial commit-Y

while "log --pretty=tformat:X-%s-Y -z" would be

    X-second commit-Y^@X-initial commit-Y^@

All of the above was written without recalling what the current code
actually does and without checking whether it is easy to implement
without tweaking the current code structure too much.  So it may not help
your RFC, but I at least think it is internally consistent.
