git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "René Scharfe" <rene.scharfe@lsrfire.ath.cx>
To: Junio C Hamano <gitster@pobox.com>
Cc: "Torne (Richard Coles)" <torne@google.com>, git@vger.kernel.org
Subject: Re: [PATCH 3/3] grep: support newline separated pattern list
Date: Mon, 21 May 2012 18:10:09 +0200	[thread overview]
Message-ID: <4FBA68E1.60908@lsrfire.ath.cx> (raw)
In-Reply-To: <7vobpia3l3.fsf@alter.siamese.dyndns.org>

Am 21.05.2012 00:29, schrieb Junio C Hamano:
> René Scharfe<rene.scharfe@lsrfire.ath.cx>  writes:
> 
>> Currently, patterns that contain newline characters don't match anything
>> when given to git grep.  Regular grep(1) interprets patterns as lists of
>> newline separated search strings instead.
>>
>> Implement this functionality by creating and inserting extra grep_pat
>> structures for patterns consisting of multiple lines when appending to
>> the pattern lists.  For simplicity, all pattern strings are duplicated.
>> The original pattern is truncated in place to make it contain only the
>> first line.
> 
> Thanks for a fix; I am assuming that the duplication for simplicity is not
> for simplified allocation but primarily for a simpler freeing?

When we split a search string into multiple parts, we could allocate only
the new parts and remember which strings were allocated, so that we could
later free them (and only them).  Or we could allocate and leak them, as
there aren't that many and we keep them during the whole run of the program
anyway.  Or we could do without any allocations if all match backends
respected length limited strings like kwset does.  Or only allocate
non-fixed pattern.

Allocating any and all pattern strings and freeing them at the end, thus
disregarding these considerations, is simpler, cleaner and still doesn't
cause excessive memory usage.

That reminds me that we can now have this bonus patch:

-- >8 --
Subject: [PATCH 4/3] grep: stop leaking line strings with -f

When reading patterns from a file, we pass the lines as allocated string
buffers to append_grep_pat() and never free them.  That's not a problem
because they are needed until the program ends anyway.

However, now that the function duplicates the pattern string, we can
reuse the strbuf after calling that function.  This simplifies the code
a bit and plugs a minor memory leak.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
---
 builtin/grep.c |    7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/builtin/grep.c b/builtin/grep.c
index 643938d..fe1726f 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -600,15 +600,12 @@ static int file_callback(const struct option *opt, const char *arg, int unset)
 	if (!patterns)
 		die_errno(_("cannot open '%s'"), arg);
 	while (strbuf_getline(&sb, patterns, '\n') == 0) {
-		char *s;
-		size_t len;
-
 		/* ignore empty line like grep does */
 		if (sb.len == 0)
 			continue;
 
-		s = strbuf_detach(&sb, &len);
-		append_grep_pat(grep_opt, s, len, arg, ++lno, GREP_PATTERN);
+		append_grep_pat(grep_opt, sb.buf, sb.len, arg, ++lno,
+				GREP_PATTERN);
 	}
 	if (!from_stdin)
 		fclose(patterns);
-- 
1.7.10.2

      reply	other threads:[~2012-05-21 16:10 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-05-18 11:00 git grep -F doesn't behave like grep -F? Torne (Richard Coles)
2012-05-18 12:37 ` René Scharfe
2012-05-18 12:41   ` Torne (Richard Coles)
2012-05-20 14:32     ` [PATCH 1/3] grep: factor out create_grep_pat() René Scharfe
2012-05-20 14:32     ` [PATCH 2/3] grep: factor out do_append_grep_pat() René Scharfe
2012-05-20 14:33     ` [PATCH 3/3] grep: support newline separated pattern list René Scharfe
2012-05-20 22:29       ` Junio C Hamano
2012-05-21 16:10         ` René Scharfe [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4FBA68E1.60908@lsrfire.ath.cx \
    --to=rene.scharfe@lsrfire.ath.cx \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=torne@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).