git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* speeding up cg-log -u
@ 2005-05-14  6:19 Zack Brown
  2005-05-14  8:13 ` Junio C Hamano
  2005-05-14  9:50 ` [PATCH] Add --author match to git-rev-list and git-rev-tree Junio C Hamano
  0 siblings, 2 replies; 14+ messages in thread
From: Zack Brown @ 2005-05-14  6:19 UTC (permalink / raw
  To: git

Hi folks,

I notice that cg-log takes a -u parameter, that lets you restrict the log
entries that are displayed, to match the argument you give it. So

cg-log -uKH

shows all the logs where Greg KH played a part.

Currently cg-log handles this by grabbing all the log data from git, and
then grepping through it.

Would it be faster to handle this on the git side, telling git to only
retrieve the logs that match the specified query? If feasible, this might
speed up various web interfaces into git repositories.

Be well,
Zack

-- 
Zack Brown

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: speeding up cg-log -u
  2005-05-14  6:19 speeding up cg-log -u Zack Brown
@ 2005-05-14  8:13 ` Junio C Hamano
  2005-05-14  9:50 ` [PATCH] Add --author match to git-rev-list and git-rev-tree Junio C Hamano
  1 sibling, 0 replies; 14+ messages in thread
From: Junio C Hamano @ 2005-05-14  8:13 UTC (permalink / raw
  To: Zack Brown; +Cc: git

>>>>> "ZB" == Zack Brown <zbrown@tumblerings.org> writes:

ZB> Would it be faster to handle this on the git side, telling git to only
ZB> retrieve the logs that match the specified query? If feasible, this might
ZB> speed up various web interfaces into git repositories.

Here are two places you can add a simple hook.  Implementation
of author_match() function is left as an exercise for you ;-).
Let us know if you get speed improvements, please.

Signed-off-by: Junio C Hamano <junkio@cox.net>
---
$ jit-diff
# - 9: (Anonymous snapshot)
# + (working tree)
--- a/cache.h
+++ b/cache.h
@@ -178,6 +178,7 @@ extern void *read_object_with_reference(
 const char *show_date(unsigned long time, int timezone);
 void parse_date(char *date, char *buf, int bufsize);
 void datestamp(char *buf, int bufsize);
+int author_match(struct commit *, const char *);
 
 static inline void *xmalloc(int size)
 {
--- a/rev-list.c
+++ b/rev-list.c
@@ -11,6 +11,7 @@ int main(int argc, char **argv)
 	unsigned long max_age = -1;
 	unsigned long min_age = -1;
 	int max_count = -1;
+	const char *author = NULL;
 
 	for (i = 1 ; i < argc; i++) {
 		char *arg = argv[i];
@@ -21,6 +22,8 @@ int main(int argc, char **argv)
 			max_age = atoi(arg + 10);
 		} else if (!strncmp(arg, "--min-age=", 10)) {
 			min_age = atoi(arg + 10);
+		} else if (!strncmp(arg, "--author=", 9)) {
+			author = arg + 9;
 		} else {
 			commit_arg = arg;
 		}
@@ -28,6 +31,7 @@ int main(int argc, char **argv)
 
 	if (!commit_arg || get_sha1(commit_arg, sha1))
 		usage("usage: rev-list [OPTION] commit-id\n"
+		      "  --author=author\n"
 		      "  --max-count=nr\n"
 		      "  --max-age=epoch\n"
 		      "  --min-age=epoch\n");
@@ -44,6 +48,8 @@ int main(int argc, char **argv)
 			continue;
 		if (max_age != -1 && (commit->date < max_age))
 			break;
+		if (!author_match(commit, author))
+			continue;
 		if (max_count != -1 && !max_count--)
 			break;
 		printf("%s\n", sha1_to_hex(commit->object.sha1));
--- a/rev-tree.c
+++ b/rev-tree.c
@@ -64,7 +64,7 @@ void process_commit(unsigned char *sha1)
 }
 
 /*
- * Usage: rev-tree [--edges] [--cache <cache-file>] <commit-id> [<commit-id2>]
+ * Usage: rev-tree [--edges] [--author <author>] [--cache <cache-file>] <commit-id> [<commit-id2>]
  *
  * The cache-file can be quite important for big trees. This is an
  * expensive operation if you have to walk the whole chain of
@@ -75,6 +75,7 @@ int main(int argc, char **argv)
 	int i;
 	int nr = 0;
 	unsigned char sha1[MAX_COMMITS][20];
+	const char *author = NULL; 
 
 	/*
 	 * First - pick up all the revisions we can (both from
@@ -83,6 +84,11 @@ int main(int argc, char **argv)
 	for (i = 1; i < argc ; i++) {
 		char *arg = argv[i];
 
+		if (!strcmp(arg, "--author")) {
+			author = argv[++i];
+			continue;
+		}
+
 		if (!strcmp(arg, "--cache")) {
 			read_cache_file(argv[++i]);
 			continue;
@@ -98,7 +104,7 @@ int main(int argc, char **argv)
 			basemask |= 1<<nr;
 		}
 		if (nr >= MAX_COMMITS || get_sha1(arg, sha1[nr]))
-			usage("rev-tree [--edges] [--cache <cache-file>] <commit-id> [<commit-id>]");
+			usage("rev-tree [--edges] [--author <author>] [--cache <cache-file>] <commit-id> [<commit-id>]");
 		process_commit(sha1[nr]);
 		nr++;
 	}
@@ -125,6 +131,9 @@ int main(int argc, char **argv)
 		if (!interesting(commit))
 			continue;
 
+		if (!author_match(commit, author))
+			continue;
+
 		printf("%lu %s:%d", commit->date, sha1_to_hex(obj->sha1), 
 		       obj->flags);
 		p = commit->parents;




^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH] Add --author match to git-rev-list and git-rev-tree.
  2005-05-14  6:19 speeding up cg-log -u Zack Brown
  2005-05-14  8:13 ` Junio C Hamano
@ 2005-05-14  9:50 ` Junio C Hamano
  2005-05-14 10:39   ` speeding up cg-log -u Petr Baudis
  2005-05-14 15:06   ` [PATCH] Add --author match to git-rev-list and git-rev-tree Zack Brown
  1 sibling, 2 replies; 14+ messages in thread
From: Junio C Hamano @ 2005-05-14  9:50 UTC (permalink / raw
  To: Zack Brown, pasky, torvalds; +Cc: git

Zack Brown wondered if handling author match at core GIT level
would make cg-log -u go faster (JIT also can use this in jit-log
--author).  Here is a patch to implement it.

I considered adding author and committer strings to commit
objects for general use as commit objects are parsed, but I was
unsure about the lifetime rules of the commit objects (nobody
seems to free them in the current code), so refrained from doing
so for the time being.  The code instead re-reads the commit
object when it needs to filter them by the author.

Signed-off-by: Junio C Hamano <junkio@cox.net>
---

commit.c   |   42 ++++++++++++++++++++++++++++++++++++++++++
commit.h   |    2 ++
rev-list.c |    6 ++++++
rev-tree.c |   13 +++++++++++--
4 files changed, 61 insertions(+), 2 deletions(-)

--- a/commit.c
+++ b/commit.c
@@ -152,3 +152,45 @@
 	}
 	return ret;
 }
+
+int author_match(const struct commit *item, const char *author)
+{
+	char type[20];
+	void *buffer;
+	char *author_line, *ep;
+	unsigned long size;
+	int ret;
+
+	buffer = read_sha1_file(item->object.sha1, type, &size);
+	if (!buffer)
+		return error("Could not read %s",
+			     sha1_to_hex(item->object.sha1));
+	if (strcmp(type, commit_type)) {
+		free(buffer);
+		error("Object %s not a commit",
+		      sha1_to_hex(item->object.sha1));
+		return 0;
+	}
+	/* we do not care what is in the message; we do not
+	 * want to overrun with strstr and strchr.
+	 */
+	ep = buffer + (size -1);
+	*ep = 0;
+	if ((author_line = strstr(buffer, "\nauthor ")) == 0) {
+		free(buffer);
+		error("Commit %s does not have author.",
+		      sha1_to_hex(item->object.sha1));
+		return 0;
+	}
+	author_line = strchr(author_line, ' ') + 1;
+	if ((ep = strchr(author_line, '>')) == 0) {
+		free(buffer);
+		error("Commit %s has a malformed author line.",
+		      sha1_to_hex(item->object.sha1));
+		return 0;
+	}
+	*++ep = 0;
+	ret = (strstr(author_line, author) != 0);
+	free(buffer);
+	return ret;
+}
--- a/commit.h
+++ b/commit.h
@@ -36,4 +36,6 @@
 struct commit *pop_most_recent_commit(struct commit_list **list, 
 				      unsigned int mark);
 
+int author_match(const struct commit *, const char *);
+
 #endif /* COMMIT_H */
--- a/rev-list.c
+++ b/rev-list.c
@@ -11,6 +11,7 @@
 	unsigned long max_age = -1;
 	unsigned long min_age = -1;
 	int max_count = -1;
+	const char *author = NULL;
 
 	for (i = 1 ; i < argc; i++) {
 		char *arg = argv[i];
@@ -21,6 +22,8 @@
 			max_age = atoi(arg + 10);
 		} else if (!strncmp(arg, "--min-age=", 10)) {
 			min_age = atoi(arg + 10);
+		} else if (!strncmp(arg, "--author=", 9)) {
+			author = arg + 9;
 		} else {
 			commit_arg = arg;
 		}
@@ -28,6 +31,7 @@
 
 	if (!commit_arg || get_sha1(commit_arg, sha1))
 		usage("usage: rev-list [OPTION] commit-id\n"
+		      "  --author=author\n"
 		      "  --max-count=nr\n"
 		      "  --max-age=epoch\n"
 		      "  --min-age=epoch\n");
@@ -44,6 +48,8 @@
 			continue;
 		if (max_age != -1 && (commit->date < max_age))
 			break;
+		if (author && !author_match(commit, author))
+			continue;
 		if (max_count != -1 && !max_count--)
 			break;
 		printf("%s\n", sha1_to_hex(commit->object.sha1));
--- a/rev-tree.c
+++ b/rev-tree.c
@@ -64,7 +64,7 @@
 }
 
 /*
- * Usage: rev-tree [--edges] [--cache <cache-file>] <commit-id> [<commit-id2>]
+ * Usage: rev-tree [--edges] [--author <author>] [--cache <cache-file>] <commit-id> [<commit-id2>]
  *
  * The cache-file can be quite important for big trees. This is an
  * expensive operation if you have to walk the whole chain of
@@ -75,6 +75,7 @@
 	int i;
 	int nr = 0;
 	unsigned char sha1[MAX_COMMITS][20];
+	const char *author = NULL; 
 
 	/*
 	 * First - pick up all the revisions we can (both from
@@ -83,6 +84,11 @@
 	for (i = 1; i < argc ; i++) {
 		char *arg = argv[i];
 
+		if (!strcmp(arg, "--author")) {
+			author = argv[++i];
+			continue;
+		}
+
 		if (!strcmp(arg, "--cache")) {
 			read_cache_file(argv[++i]);
 			continue;
@@ -98,7 +104,7 @@
 			basemask |= 1<<nr;
 		}
 		if (nr >= MAX_COMMITS || get_sha1(arg, sha1[nr]))
-			usage("rev-tree [--edges] [--cache <cache-file>] <commit-id> [<commit-id>]");
+			usage("rev-tree [--edges] [--author <author>] [--cache <cache-file>] <commit-id> [<commit-id>]");
 		process_commit(sha1[nr]);
 		nr++;
 	}
@@ -125,6 +131,9 @@
 		if (!interesting(commit))
 			continue;
 
+		if (author && !author_match(commit, author))
+			continue;
+
 		printf("%lu %s:%d", commit->date, sha1_to_hex(obj->sha1), 
 		       obj->flags);
 		p = commit->parents;
------------------------------------------------


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: speeding up cg-log -u
  2005-05-14  9:50 ` [PATCH] Add --author match to git-rev-list and git-rev-tree Junio C Hamano
@ 2005-05-14 10:39   ` Petr Baudis
  2005-05-14 11:14     ` Petr Baudis
  2005-05-14 11:17     ` Junio C Hamano
  2005-05-14 15:06   ` [PATCH] Add --author match to git-rev-list and git-rev-tree Zack Brown
  1 sibling, 2 replies; 14+ messages in thread
From: Petr Baudis @ 2005-05-14 10:39 UTC (permalink / raw
  To: Zack Brown, Junio C Hamano; +Cc: git, torvalds

Dear diary, on Sat, May 14, 2005 at 08:19:14AM CEST, I got a letter
where Zack Brown <zbrown@tumblerings.org> told me that...
> Currently cg-log handles this by grabbing all the log data from git, and
> then grepping through it.

Which is clearly suboptimal since cg-log now does git-cat-file up to
_three_ times per commit. It should instead process the commit only
once.

Dear diary, on Sat, May 14, 2005 at 11:50:24AM CEST, I got a letter
where Junio C Hamano <junkio@cox.net> told me that...
> +	buffer = read_sha1_file(item->object.sha1, type, &size);

If it do that, I wonder how much speedup would be using this instead.
But probably still significant one.

What I don't like is that it searches only the author field. I find the
behaviour that it searches in the committer field as well very useful,
since I can easily check whose patches on a file I checked it - that's
useful when porting own stuff to another branch, or frequently it's the
only thing I actually remember. :-)

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: speeding up cg-log -u
  2005-05-14 10:39   ` speeding up cg-log -u Petr Baudis
@ 2005-05-14 11:14     ` Petr Baudis
  2005-05-14 11:17     ` Junio C Hamano
  1 sibling, 0 replies; 14+ messages in thread
From: Petr Baudis @ 2005-05-14 11:14 UTC (permalink / raw
  To: Zack Brown, Junio C Hamano; +Cc: git, torvalds

Where was my English? Sorry.

> If it do that, I wonder how much speedup would be using this instead.
(does)
> But probably still significant one.
> 
> What I don't like is that it searches only the author field. I find the
> behaviour that it searches in the committer field as well very useful,
> since I can easily check whose patches on a file I checked it
(which patches of a file I checked in)

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: speeding up cg-log -u
  2005-05-14 10:39   ` speeding up cg-log -u Petr Baudis
  2005-05-14 11:14     ` Petr Baudis
@ 2005-05-14 11:17     ` Junio C Hamano
  2005-05-14 14:23       ` Zack Brown
  2005-05-15 16:09       ` Daniel Barkalow
  1 sibling, 2 replies; 14+ messages in thread
From: Junio C Hamano @ 2005-05-14 11:17 UTC (permalink / raw
  To: Petr Baudis; +Cc: Zack Brown, git, torvalds

>>>>> "PB" == Petr Baudis <pasky@ucw.cz> writes:

PB> Dear diary, on Sat, May 14, 2005 at 11:50:24AM CEST, I got a letter
PB> where Junio C Hamano <junkio@cox.net> told me that...
>> +	buffer = read_sha1_file(item->object.sha1, type, &size);

PB> If it do that, I wonder how much speedup would be using this instead.
PB> But probably still significant one.

I would imagine so (only benchmark would tell), because you
would not spawn git-cat-file for each individual commit object.

Whenever I work with those "struct object" derivatives, I get
very frustrated by the fact that they are designed to cater only
to the need of very narrow immediate users.  The first round of
tree objects did not even have names for each entry because the
only thing it cared about was connectivity checking, and for
that purpose callers would not care about what each blob or
subtree was referred as.  Now when I want to use commit objects
I find that it only records the commit date (other than
connectivity information).  It really appears that connectivity
is the primary thing and everything else is bolted on top.

Not wanting to keep the whole object because of their size is
understandable since the users of "struct object" derivatives
rarely if ever seem to free them once they get hold of them.
And not wanting to think ahead about what is worth keeping (like
names for tree entries back then, or commit author names) is
also understandable, but it still is frustrating.  Not that I
would want to solve this myself ...


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: speeding up cg-log -u
  2005-05-14 11:17     ` Junio C Hamano
@ 2005-05-14 14:23       ` Zack Brown
  2005-05-14 14:40         ` Petr Baudis
  2005-05-15 16:09       ` Daniel Barkalow
  1 sibling, 1 reply; 14+ messages in thread
From: Zack Brown @ 2005-05-14 14:23 UTC (permalink / raw
  To: Junio C Hamano; +Cc: Petr Baudis, git, torvalds

On Sat, May 14, 2005 at 04:17:44AM -0700, Junio C Hamano wrote:
> Not wanting to keep the whole object because of their size is
> understandable since the users of "struct object" derivatives
> rarely if ever seem to free them once they get hold of them.
> And not wanting to think ahead about what is worth keeping (like
> names for tree entries back then, or commit author names) is
> also understandable, but it still is frustrating.

So if this design is changed to suit -u, would a further redesign be needed
to support an option to filter on keywords in the body of the changelog
entry? Perhaps this will result in a net slowdown for the usual case of just
grabbing all log entries.

Be well,
Zack

>  Not that I
> would want to solve this myself ...
> 
> -
> To unsubscribe from this list: send the line "unsubscribe git" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
Zack Brown

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: speeding up cg-log -u
  2005-05-14 14:23       ` Zack Brown
@ 2005-05-14 14:40         ` Petr Baudis
  0 siblings, 0 replies; 14+ messages in thread
From: Petr Baudis @ 2005-05-14 14:40 UTC (permalink / raw
  To: Zack Brown; +Cc: Junio C Hamano, git, torvalds

Dear diary, on Sat, May 14, 2005 at 04:23:25PM CEST, I got a letter
where Zack Brown <zbrown@tumblerings.org> told me that...
> On Sat, May 14, 2005 at 04:17:44AM -0700, Junio C Hamano wrote:
> > Not wanting to keep the whole object because of their size is
> > understandable since the users of "struct object" derivatives
> > rarely if ever seem to free them once they get hold of them.
> > And not wanting to think ahead about what is worth keeping (like
> > names for tree entries back then, or commit author names) is
> > also understandable, but it still is frustrating.
> 
> So if this design is changed to suit -u, would a further redesign be needed
> to support an option to filter on keywords in the body of the changelog
> entry? Perhaps this will result in a net slowdown for the usual case of just
> grabbing all log entries.

I admit not looking at this code for a rather long time, but what about
just telling the commit parser what stuff are you interested in and it
would leave the rest of the structure fields NULL? That would mean only
slight memory usage increase and basically no time increase.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Add --author match to git-rev-list and git-rev-tree.
  2005-05-14  9:50 ` [PATCH] Add --author match to git-rev-list and git-rev-tree Junio C Hamano
  2005-05-14 10:39   ` speeding up cg-log -u Petr Baudis
@ 2005-05-14 15:06   ` Zack Brown
  2005-05-14 15:52     ` Petr Baudis
  1 sibling, 1 reply; 14+ messages in thread
From: Zack Brown @ 2005-05-14 15:06 UTC (permalink / raw
  To: Junio C Hamano; +Cc: pasky, torvalds, git

On Sat, May 14, 2005 at 02:50:24AM -0700, Junio C Hamano wrote:
> Zack Brown wondered if handling author match at core GIT level
> would make cg-log -u go faster (JIT also can use this in jit-log
> --author).  Here is a patch to implement it.
> 
> I considered adding author and committer strings to commit
> objects for general use as commit objects are parsed, but I was
> unsure about the lifetime rules of the commit objects (nobody
> seems to free them in the current code), so refrained from doing
> so for the time being.  The code instead re-reads the commit
> object when it needs to filter them by the author.

I applied this, but can't figure out the change to make to cg-log in order to
actually test it.

- git-cat-file commit $commit | grep -e '^author ' -e '^committer ' | grep -qi "$user" || continue
+ git-cat-file --author $user commit $commit || continue

sure didn't work.

Z

> 
> Signed-off-by: Junio C Hamano <junkio@cox.net>
> ---
> 
> commit.c   |   42 ++++++++++++++++++++++++++++++++++++++++++
> commit.h   |    2 ++
> rev-list.c |    6 ++++++
> rev-tree.c |   13 +++++++++++--
> 4 files changed, 61 insertions(+), 2 deletions(-)
> 
> --- a/commit.c
> +++ b/commit.c
> @@ -152,3 +152,45 @@
>  	}
>  	return ret;
>  }
> +
> +int author_match(const struct commit *item, const char *author)
> +{
> +	char type[20];
> +	void *buffer;
> +	char *author_line, *ep;
> +	unsigned long size;
> +	int ret;
> +
> +	buffer = read_sha1_file(item->object.sha1, type, &size);
> +	if (!buffer)
> +		return error("Could not read %s",
> +			     sha1_to_hex(item->object.sha1));
> +	if (strcmp(type, commit_type)) {
> +		free(buffer);
> +		error("Object %s not a commit",
> +		      sha1_to_hex(item->object.sha1));
> +		return 0;
> +	}
> +	/* we do not care what is in the message; we do not
> +	 * want to overrun with strstr and strchr.
> +	 */
> +	ep = buffer + (size -1);
> +	*ep = 0;
> +	if ((author_line = strstr(buffer, "\nauthor ")) == 0) {
> +		free(buffer);
> +		error("Commit %s does not have author.",
> +		      sha1_to_hex(item->object.sha1));
> +		return 0;
> +	}
> +	author_line = strchr(author_line, ' ') + 1;
> +	if ((ep = strchr(author_line, '>')) == 0) {
> +		free(buffer);
> +		error("Commit %s has a malformed author line.",
> +		      sha1_to_hex(item->object.sha1));
> +		return 0;
> +	}
> +	*++ep = 0;
> +	ret = (strstr(author_line, author) != 0);
> +	free(buffer);
> +	return ret;
> +}
> --- a/commit.h
> +++ b/commit.h
> @@ -36,4 +36,6 @@
>  struct commit *pop_most_recent_commit(struct commit_list **list, 
>  				      unsigned int mark);
>  
> +int author_match(const struct commit *, const char *);
> +
>  #endif /* COMMIT_H */
> --- a/rev-list.c
> +++ b/rev-list.c
> @@ -11,6 +11,7 @@
>  	unsigned long max_age = -1;
>  	unsigned long min_age = -1;
>  	int max_count = -1;
> +	const char *author = NULL;
>  
>  	for (i = 1 ; i < argc; i++) {
>  		char *arg = argv[i];
> @@ -21,6 +22,8 @@
>  			max_age = atoi(arg + 10);
>  		} else if (!strncmp(arg, "--min-age=", 10)) {
>  			min_age = atoi(arg + 10);
> +		} else if (!strncmp(arg, "--author=", 9)) {
> +			author = arg + 9;
>  		} else {
>  			commit_arg = arg;
>  		}
> @@ -28,6 +31,7 @@
>  
>  	if (!commit_arg || get_sha1(commit_arg, sha1))
>  		usage("usage: rev-list [OPTION] commit-id\n"
> +		      "  --author=author\n"
>  		      "  --max-count=nr\n"
>  		      "  --max-age=epoch\n"
>  		      "  --min-age=epoch\n");
> @@ -44,6 +48,8 @@
>  			continue;
>  		if (max_age != -1 && (commit->date < max_age))
>  			break;
> +		if (author && !author_match(commit, author))
> +			continue;
>  		if (max_count != -1 && !max_count--)
>  			break;
>  		printf("%s\n", sha1_to_hex(commit->object.sha1));
> --- a/rev-tree.c
> +++ b/rev-tree.c
> @@ -64,7 +64,7 @@
>  }
>  
>  /*
> - * Usage: rev-tree [--edges] [--cache <cache-file>] <commit-id> [<commit-id2>]
> + * Usage: rev-tree [--edges] [--author <author>] [--cache <cache-file>] <commit-id> [<commit-id2>]
>   *
>   * The cache-file can be quite important for big trees. This is an
>   * expensive operation if you have to walk the whole chain of
> @@ -75,6 +75,7 @@
>  	int i;
>  	int nr = 0;
>  	unsigned char sha1[MAX_COMMITS][20];
> +	const char *author = NULL; 
>  
>  	/*
>  	 * First - pick up all the revisions we can (both from
> @@ -83,6 +84,11 @@
>  	for (i = 1; i < argc ; i++) {
>  		char *arg = argv[i];
>  
> +		if (!strcmp(arg, "--author")) {
> +			author = argv[++i];
> +			continue;
> +		}
> +
>  		if (!strcmp(arg, "--cache")) {
>  			read_cache_file(argv[++i]);
>  			continue;
> @@ -98,7 +104,7 @@
>  			basemask |= 1<<nr;
>  		}
>  		if (nr >= MAX_COMMITS || get_sha1(arg, sha1[nr]))
> -			usage("rev-tree [--edges] [--cache <cache-file>] <commit-id> [<commit-id>]");
> +			usage("rev-tree [--edges] [--author <author>] [--cache <cache-file>] <commit-id> [<commit-id>]");
>  		process_commit(sha1[nr]);
>  		nr++;
>  	}
> @@ -125,6 +131,9 @@
>  		if (!interesting(commit))
>  			continue;
>  
> +		if (author && !author_match(commit, author))
> +			continue;
> +
>  		printf("%lu %s:%d", commit->date, sha1_to_hex(obj->sha1), 
>  		       obj->flags);
>  		p = commit->parents;
> ------------------------------------------------
> 
> -
> To unsubscribe from this list: send the line "unsubscribe git" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
Zack Brown

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Add --author match to git-rev-list and git-rev-tree.
  2005-05-14 15:06   ` [PATCH] Add --author match to git-rev-list and git-rev-tree Zack Brown
@ 2005-05-14 15:52     ` Petr Baudis
  2005-05-14 16:02       ` Zack Brown
  0 siblings, 1 reply; 14+ messages in thread
From: Petr Baudis @ 2005-05-14 15:52 UTC (permalink / raw
  To: Zack Brown; +Cc: Junio C Hamano, torvalds, git

Dear diary, on Sat, May 14, 2005 at 05:06:07PM CEST, I got a letter
where Zack Brown <zbrown@tumblerings.org> told me that...
> On Sat, May 14, 2005 at 02:50:24AM -0700, Junio C Hamano wrote:
> > Zack Brown wondered if handling author match at core GIT level
> > would make cg-log -u go faster (JIT also can use this in jit-log
> > --author).  Here is a patch to implement it.
> > 
> > I considered adding author and committer strings to commit
> > objects for general use as commit objects are parsed, but I was
> > unsure about the lifetime rules of the commit objects (nobody
> > seems to free them in the current code), so refrained from doing
> > so for the time being.  The code instead re-reads the commit
> > object when it needs to filter them by the author.
> 
> I applied this, but can't figure out the change to make to cg-log in order to
> actually test it.
> 
> - git-cat-file commit $commit | grep -e '^author ' -e '^committer ' | grep -qi "$user" || continue
> + git-cat-file --author $user commit $commit || continue
> 
> sure didn't work.

You throw that whole thing away and pass the --author parameter directly
to git-rev-(list|tree).

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Add --author match to git-rev-list and git-rev-tree.
  2005-05-14 15:52     ` Petr Baudis
@ 2005-05-14 16:02       ` Zack Brown
  0 siblings, 0 replies; 14+ messages in thread
From: Zack Brown @ 2005-05-14 16:02 UTC (permalink / raw
  To: Petr Baudis; +Cc: Junio C Hamano, torvalds, git

On Sat, May 14, 2005 at 05:52:57PM +0200, Petr Baudis wrote:
> Dear diary, on Sat, May 14, 2005 at 05:06:07PM CEST, I got a letter
> where Zack Brown <zbrown@tumblerings.org> told me that...
> > On Sat, May 14, 2005 at 02:50:24AM -0700, Junio C Hamano wrote:
> > > Zack Brown wondered if handling author match at core GIT level
> > > would make cg-log -u go faster (JIT also can use this in jit-log
> > > --author).  Here is a patch to implement it.
> > > 
> > > I considered adding author and committer strings to commit
> > > objects for general use as commit objects are parsed, but I was
> > > unsure about the lifetime rules of the commit objects (nobody
> > > seems to free them in the current code), so refrained from doing
> > > so for the time being.  The code instead re-reads the commit
> > > object when it needs to filter them by the author.
> > 
> > I applied this, but can't figure out the change to make to cg-log in order to
> > actually test it.
> > 
> > - git-cat-file commit $commit | grep -e '^author ' -e '^committer ' | grep -qi "$user" || continue
> > + git-cat-file --author $user commit $commit || continue
> > 
> > sure didn't work.
> 
> You throw that whole thing away and pass the --author parameter directly
> to git-rev-(list|tree).

You mean like this?

Be well,
Zack


Use git-rev-(list|tree) --author to cg-log

Signed-off-by: Zack Brown <zbrown@tumblerings.org>

--- de641904363cd3759f132ee7c0dfaf8a2ee58388/cg-log  (mode:100755)
+++ uncommitted/cg-log  (mode:100755)
@@ -112,12 +112,20 @@
 if [ "$log_end" ]; then
        id1="$(commit-id $log_start)" || exit 1
        id2="$(commit-id $log_end)" || exit 1
-       revls="git-rev-tree $id2 ^$id1"
+       if [ "$user" ]; then
+               revls="git-rev-tree --author $user $id2 ^$id1"
+       else
+               revls="git-rev-tree $id2 ^$id1"
+       fi
        revsort="sort -rn"
        revfmt="git-rev-tree"
 else
        id1="$(commit-id $log_start)" || exit 1
-       revls="git-rev-list $id1"
+       if [ "$user" ]; then
+               revls="git-rev-list --author $user $id1"
+       else
+               revls="git-rev-list $id1"
+       fi
        revsort="cat"
        revfmt="git-rev-list"
 fi
@@ -131,9 +139,6 @@
                parent=$(git-cat-file commit $commit | sed -n '2s/parent //p;2Q')
                [ "$parent" ] && [ "$(git-diff-tree -r $commit $parent "$@")" ] || continue
        fi
-       if [ "$user" ]; then
-               git-cat-file commit $commit | grep -e '^author ' -e '^committer ' | grep -qi "$user" || continue
-       fi
        echo $colheader""commit ${commit%:*} $coldefault;
        git-cat-file commit $commit | \
                while read key rest; do

> 
> -- 
> 				Petr "Pasky" Baudis
> Stuff: http://pasky.or.cz/
> C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor
> -
> To unsubscribe from this list: send the line "unsubscribe git" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
Zack Brown

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: speeding up cg-log -u
  2005-05-14 11:17     ` Junio C Hamano
  2005-05-14 14:23       ` Zack Brown
@ 2005-05-15 16:09       ` Daniel Barkalow
  2005-05-16  8:13         ` Junio C Hamano
  1 sibling, 1 reply; 14+ messages in thread
From: Daniel Barkalow @ 2005-05-15 16:09 UTC (permalink / raw
  To: Junio C Hamano; +Cc: Petr Baudis, Zack Brown, git, Linus Torvalds

On Sat, 14 May 2005, Junio C Hamano wrote:

> Whenever I work with those "struct object" derivatives, I get
> very frustrated by the fact that they are designed to cater only
> to the need of very narrow immediate users. 

They aren't designed for the immediate users; they're implemented for the
immediate users. Feel free to add more fields as you need them. The
current selection is based on only adding things when there's a user for
them.

> The first round of tree objects did not even have names for each entry
> because the only thing it cared about was connectivity checking, and for
> that purpose callers would not care about what each blob or
> subtree was referred as.  Now when I want to use commit objects
> I find that it only records the commit date (other than
> connectivity information).  It really appears that connectivity
> is the primary thing and everything else is bolted on top.

Existance is the primary thing, and everything else was added as
needed. (Pure connectivity is a bit special, because it's a property of
generic objects so that fsck-cache doesn't need to know about particular
types of objects unless there are particular things to check about them)

If you need more fields, let me know, and I'll figure out how to include
them.

	-Daniel
*This .sig left intentionally blank*


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: speeding up cg-log -u
  2005-05-15 16:09       ` Daniel Barkalow
@ 2005-05-16  8:13         ` Junio C Hamano
  2005-05-16 15:38           ` Daniel Barkalow
  0 siblings, 1 reply; 14+ messages in thread
From: Junio C Hamano @ 2005-05-16  8:13 UTC (permalink / raw
  To: Daniel Barkalow; +Cc: Petr Baudis, Zack Brown, git, Linus Torvalds

>>>>> "DB" == Daniel Barkalow <barkalow@iabervon.org> writes:

DB> Existance is the primary thing, and everything else was added as
DB> needed. (Pure connectivity is a bit special, because it's a property of
DB> generic objects so that fsck-cache doesn't need to know about particular
DB> types of objects unless there are particular things to check about them)

DB> If you need more fields, let me know, and I'll figure out how to include
DB> them.

Could you take a look at the latest round of the patch and see
what I did there makes sense?

    From: Junio C Hamano <junkio@cox.net>
    Date: Sun, 15 May 2005 14:18:36 -0700
    Message-ID: <7vy8agqjbn.fsf@assigned-by-dhcp.cox.net>
    Subject: [PATCH 1/4] Add --author and --committer match
             to git-rev-list and git-rev-tree.

Another thing I wanted to ask you about was lifetime rules.
When we "lookup" these objects (and then "parse" them, which
causes more memory to be used), who is responsible for freeing
them?  When my program thinks it is done with a commit, is it
allowed to free it?  Or does the lookup machinery own all of the
objects that have ever been looked up, and the program should
not worry about freeing them to begin with?


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: speeding up cg-log -u
  2005-05-16  8:13         ` Junio C Hamano
@ 2005-05-16 15:38           ` Daniel Barkalow
  0 siblings, 0 replies; 14+ messages in thread
From: Daniel Barkalow @ 2005-05-16 15:38 UTC (permalink / raw
  To: Junio C Hamano; +Cc: Petr Baudis, Zack Brown, git, Linus Torvalds

On Mon, 16 May 2005, Junio C Hamano wrote:

> >>>>> "DB" == Daniel Barkalow <barkalow@iabervon.org> writes:
> 
> DB> Existance is the primary thing, and everything else was added as
> DB> needed. (Pure connectivity is a bit special, because it's a property of
> DB> generic objects so that fsck-cache doesn't need to know about particular
> DB> types of objects unless there are particular things to check about them)
> 
> DB> If you need more fields, let me know, and I'll figure out how to include
> DB> them.
> 
> Could you take a look at the latest round of the patch and see
> what I did there makes sense?
> 
>     From: Junio C Hamano <junkio@cox.net>
>     Date: Sun, 15 May 2005 14:18:36 -0700
>     Message-ID: <7vy8agqjbn.fsf@assigned-by-dhcp.cox.net>
>     Subject: [PATCH 1/4] Add --author and --committer match
>              to git-rev-list and git-rev-tree.

It seems generally good to me.

I think it would fit stylistically a bit better if the "mark" field on the
person names were left for programs to use however they wanted, and the
"interesting" determination were done in the programs (or, since there are
two with the same characteristics, a new file they both link against).

Alternatively, put the used bit definitions in the header file and have a
mask for unused flags.

> Another thing I wanted to ask you about was lifetime rules.
> When we "lookup" these objects (and then "parse" them, which
> causes more memory to be used), who is responsible for freeing
> them?  When my program thinks it is done with a commit, is it
> allowed to free it?  Or does the lookup machinery own all of the
> objects that have ever been looked up, and the program should
> not worry about freeing them to begin with?

The lookup machinery owns all of the objects that have been looked up. The
thing is that the program can never effectively tell if it's really done
with a commit, because some other branch it's following could have
incorrect dates and suddenly turn out to be descended from a commit that
it freed, and things would likely misbehave if the object were looked up
again, since the flags would be reset.

We could have something that causes them to be reset to unparsed, if the
program thinks that, even if it references the same object again, it won't
need the contents.

	-Daniel
*This .sig left intentionally blank*


^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2005-05-16 15:45 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2005-05-14  6:19 speeding up cg-log -u Zack Brown
2005-05-14  8:13 ` Junio C Hamano
2005-05-14  9:50 ` [PATCH] Add --author match to git-rev-list and git-rev-tree Junio C Hamano
2005-05-14 10:39   ` speeding up cg-log -u Petr Baudis
2005-05-14 11:14     ` Petr Baudis
2005-05-14 11:17     ` Junio C Hamano
2005-05-14 14:23       ` Zack Brown
2005-05-14 14:40         ` Petr Baudis
2005-05-15 16:09       ` Daniel Barkalow
2005-05-16  8:13         ` Junio C Hamano
2005-05-16 15:38           ` Daniel Barkalow
2005-05-14 15:06   ` [PATCH] Add --author match to git-rev-list and git-rev-tree Zack Brown
2005-05-14 15:52     ` Petr Baudis
2005-05-14 16:02       ` Zack Brown

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).