* [PATCH/RFC 1/2] Move "--parent" parsing into generic revision.c library code
@ 2006-03-31 0:52 Linus Torvalds
2006-03-31 1:05 ` [PATCH/RFC 2/2] Make path-limiting be incremental when possible Linus Torvalds
0 siblings, 1 reply; 17+ messages in thread
From: Linus Torvalds @ 2006-03-31 0:52 UTC (permalink / raw)
To: Junio C Hamano, Git Mailing List
Move "--parent" parsing into generic revision.c library code
Not only do we do it in both rev-list.c and git.c, the revision walking
code will soon want to know whether we should rewrite parenthood
information or not.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
---
This is the trivial part. The next email will contain the real meat of the
change, which starts doing path limiting incrementally, and makes doing
things like
git log drivers/
on the kernel a hell of a lot more pleasant, because it starts outputting
the log immediately instead of pausing for about 20 seconds before it has
parsed all of the history and then outputting it all in one go.
There's a HUGE difference to the "feel" of doing pathname limiting
git-rev-list options. Not very well tested, but at least this initial
preparatory patch is "obviously safe".
diff --git a/git.c b/git.c
index 0b40e30..72039c6 100644
--- a/git.c
+++ b/git.c
@@ -283,7 +283,6 @@ static int cmd_log(int argc, const char
char *buf = xmalloc(LOGSIZE);
static enum cmit_fmt commit_format = CMIT_FMT_DEFAULT;
int abbrev = DEFAULT_ABBREV;
- int show_parents = 0;
const char *commit_prefix = "commit ";
argc = setup_revisions(argc, argv, &rev, "HEAD");
@@ -294,9 +293,6 @@ static int cmd_log(int argc, const char
if (commit_format == CMIT_FMT_ONELINE)
commit_prefix = "";
}
- else if (!strcmp(arg, "--parents")) {
- show_parents = 1;
- }
else if (!strcmp(arg, "--no-abbrev")) {
abbrev = 0;
}
@@ -317,7 +313,7 @@ static int cmd_log(int argc, const char
while ((commit = get_revision(&rev)) != NULL) {
printf("%s%s", commit_prefix,
sha1_to_hex(commit->object.sha1));
- if (show_parents) {
+ if (rev.parents) {
struct commit_list *parents = commit->parents;
while (parents) {
struct object *o = &(parents->item->object);
diff --git a/rev-list.c b/rev-list.c
index f3a989c..19a547a 100644
--- a/rev-list.c
+++ b/rev-list.c
@@ -39,7 +39,6 @@ struct rev_info revs;
static int bisect_list = 0;
static int verbose_header = 0;
static int abbrev = DEFAULT_ABBREV;
-static int show_parents = 0;
static int show_timestamp = 0;
static int hdr_termination = 0;
static const char *commit_prefix = "";
@@ -54,7 +53,7 @@ static void show_commit(struct commit *c
if (commit->object.flags & BOUNDARY)
putchar('-');
fputs(sha1_to_hex(commit->object.sha1), stdout);
- if (show_parents) {
+ if (revs.parents) {
struct commit_list *parents = commit->parents;
while (parents) {
struct object *o = &(parents->item->object);
@@ -336,10 +335,6 @@ int main(int argc, const char **argv)
commit_prefix = "";
else
commit_prefix = "commit ";
- continue;
- }
- if (!strcmp(arg, "--parents")) {
- show_parents = 1;
continue;
}
if (!strcmp(arg, "--timestamp")) {
diff --git a/revision.c b/revision.c
index abc8745..0330f9f 100644
--- a/revision.c
+++ b/revision.c
@@ -596,6 +596,10 @@ int setup_revisions(int argc, const char
revs->limited = 1;
continue;
}
+ if (!strcmp(arg, "--parents")) {
+ revs->parents = 1;
+ continue;
+ }
if (!strcmp(arg, "--dense")) {
revs->dense = 1;
continue;
diff --git a/revision.h b/revision.h
index 61e6bc9..0caeecf 100644
--- a/revision.h
+++ b/revision.h
@@ -34,7 +34,8 @@ struct rev_info {
edge_hint:1,
limited:1,
unpacked:1,
- boundary:1;
+ boundary:1,
+ parents:1;
/* special limits */
int max_count;
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH/RFC 2/2] Make path-limiting be incremental when possible.
2006-03-31 0:52 [PATCH/RFC 1/2] Move "--parent" parsing into generic revision.c library code Linus Torvalds
@ 2006-03-31 1:05 ` Linus Torvalds
2006-03-31 6:05 ` Linus Torvalds
` (2 more replies)
0 siblings, 3 replies; 17+ messages in thread
From: Linus Torvalds @ 2006-03-31 1:05 UTC (permalink / raw)
To: Junio C Hamano, Git Mailing List
This makes git-rev-list able to do path-limiting without having to parse
all of history before it starts showing the results.
This makes it things like "git log -- pathname" much more pleasant to use.
This is actually a pretty small patch, and the biggest part of it is
purely cleanups (turning the "goto next" statements into "continue"), but
it's conceptually a lot bigger than it looks.
What it does is that if you do a path-limited revision list, and you do
_not_ ask for pseudo-parenthood information, it won't do all the
path-limiting up-front, but instead do it incrementally in
"get_revision()".
This is an absolutely huge deal for anything like "git log -- <pathname>",
but also for some things that we don't do yet - like the "find where
things changed" logic I've described elsewhere, where we want to find the
previous revision that changed a file.
The reason I put "RFC" in the subject line is that while I've validated it
various ways, like doing
git-rev-list HEAD -- drivers/char/ | md5sum
before-and-after on the kernel archive, it's "git-rev-list" after all. In
other words, it's that really really subtle and complex central piece of
software. So while I think this is important and should go in asap, I also
think it should get lots of testing and eyeballs looking at the code.
Btw, don't even bother testing this with the git archive. git itself is so
small that parsing the whole revision history for it takes about a second
even with path limiting. The thing that _really_ shows this off is doing
git log drivers/
on the kernel archive, or even better, on the _historic_ kernel archive.
With this change, the response is instantaneous (although seeking to the
end of the result will obviously take as long as it ever did). Before this
change, the command would think about the result for tens of seconds - or
even minutes, in the case of the bigger old kernel archive - before
starting to output the results.
NOTE NOTE NOTE! Using path limiting with things like "gitk", which uses
the "--parents" flag to actually generate a pseudo-history of the
resulting commits won't actually see the improvement in interactivity,
since that forces git-rev-list to do the whole-history thing after all.
MAYBE we can fix that too at some point, but I won't promise anything.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
---
diff --git a/revision.c b/revision.c
index 0330f9f..0e3f074 100644
--- a/revision.c
+++ b/revision.c
@@ -702,7 +702,13 @@ int setup_revisions(int argc, const char
if (revs->prune_data) {
diff_tree_setup_paths(revs->prune_data);
revs->prune_fn = try_to_simplify_commit;
- revs->limited = 1;
+
+ /*
+ * If we fix up parent data, we currently cannot
+ * do that on-the-fly.
+ */
+ if (revs->parents)
+ revs->limited = 1;
}
return left;
@@ -763,23 +769,32 @@ struct commit *get_revision(struct rev_i
do {
struct commit *commit = revs->commits->item;
+
+ revs->commits = revs->commits->next;
+ /*
+ * If we haven't done the list limiting, we need to look at
+ * the parents here
+ */
+ if (!revs->limited)
+ add_parents_to_list(revs, commit, &revs->commits);
if (commit->object.flags & SHOWN)
- goto next;
+ continue;
if (!(commit->object.flags & BOUNDARY) &&
(commit->object.flags & UNINTERESTING))
- goto next;
+ continue;
if (revs->min_age != -1 && (commit->date > revs->min_age))
- goto next;
+ continue;
if (revs->max_age != -1 && (commit->date < revs->max_age))
return NULL;
if (revs->no_merges &&
commit->parents && commit->parents->next)
- goto next;
+ continue;
if (revs->prune_fn && revs->dense) {
if (!(commit->object.flags & TREECHANGE))
- goto next;
- rewrite_parents(commit);
+ continue;
+ if (revs->parents)
+ rewrite_parents(commit);
}
/* More to go? */
if (revs->max_count) {
@@ -792,13 +807,9 @@ struct commit *get_revision(struct rev_i
revs->commits = it->next;
free(it);
}
- else
- pop_most_recent_commit(&revs->commits, SEEN);
}
commit->object.flags |= SHOWN;
return commit;
-next:
- pop_most_recent_commit(&revs->commits, SEEN);
} while (revs->commits);
return NULL;
}
^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH/RFC 2/2] Make path-limiting be incremental when possible.
2006-03-31 1:05 ` [PATCH/RFC 2/2] Make path-limiting be incremental when possible Linus Torvalds
@ 2006-03-31 6:05 ` Linus Torvalds
2006-03-31 6:45 ` Junio C Hamano
2006-03-31 6:16 ` Junio C Hamano
2006-03-31 8:28 ` [PATCH/RFC 2/2] Make path-limiting be incremental when possible Junio C Hamano
2 siblings, 1 reply; 17+ messages in thread
From: Linus Torvalds @ 2006-03-31 6:05 UTC (permalink / raw)
To: Junio C Hamano, Git Mailing List
On Thu, 30 Mar 2006, Linus Torvalds wrote:
>
> This makes git-rev-list able to do path-limiting without having to parse
> all of history before it starts showing the results.
>
> This makes it things like "git log -- pathname" much more pleasant to use.
Sadly, it seems to react really badly to Junio's new --boundary logic for
some reason that I haven't quite figured out yet.
That reaction is independent of the actual pathname restriction, and seems
to be related to how the --boundary logic expected
pop_most_recent_commit() to work. In particular:
...
if (commit->object.flags & BOUNDARY) {
/* this is already uninteresting,
* so there is no point popping its
* parents into the list.
*/
that code is magic, and seems to depend on something subtle going on with
the list, and the incremental thing already popped the parent earlier and
screwed up whatever magic that the BOUNDARY code depends on.
Junio? I think you did some funky special case with BOUNDARY commits, and
I broke it for you, can you look at the patch and see if you can see it?
I'd really like to have the incremental path-limiter, because it really
makes a huge difference in the usability of "git log pathname".
Linus
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH/RFC 2/2] Make path-limiting be incremental when possible.
2006-03-31 1:05 ` [PATCH/RFC 2/2] Make path-limiting be incremental when possible Linus Torvalds
2006-03-31 6:05 ` Linus Torvalds
@ 2006-03-31 6:16 ` Junio C Hamano
2006-03-31 6:42 ` Linus Torvalds
2006-03-31 8:28 ` [PATCH/RFC 2/2] Make path-limiting be incremental when possible Junio C Hamano
2 siblings, 1 reply; 17+ messages in thread
From: Junio C Hamano @ 2006-03-31 6:16 UTC (permalink / raw)
To: Linus Torvalds; +Cc: git
Linus Torvalds <torvalds@osdl.org> writes:
> The reason I put "RFC" in the subject line is that while I've validated it
> various ways, like doing
>
> git-rev-list HEAD -- drivers/char/ | md5sum
>
> before-and-after on the kernel archive, it's "git-rev-list" after all. In
> other words, it's that really really subtle and complex central piece of
> software. So while I think this is important and should go in asap, I also
> think it should get lots of testing and eyeballs looking at the code.
Let me make sure I understand what is going on.
Currently, limit_list(), which is called when revs->limited is
set, serves two different purposes:
* traverse the ancestry chain and mark ancestors of
uninteresting commits also uninteresting;
* "simplify" each commit that is still interesting, by
comparing the tree object of it and its parents.
We used to call limit_list() when we are limiting the commits by
paths, because that was what the latter does as a side effect of
add_parents_to_list(). You made it streamable by moving the
call to get_revision() and not calling limit_list() when you do
not have other reasons to call it.
You currently do not do this streaming optimization when you
have to show simplified parents, because the streaming version
traverses ancestry chain one step at a time as it goes, and you
cannot obviously see who the final "fake" parent is until you
simplify the parents enouogh.
Now, I have some observations, not necessarily limited to this
patch but also to the code from the "master" version.
* get_commit_reference() sets "limited" when the user gives ^exclude
commit. no question about this --- we would need to see the
reachability in this case.
* when we are going to sort the result we are going to show,
we set "limited" -- we cannot sort without knowing the set we
are sorting.
* giving commit timestamp limits via --max-age, --min-age
etc. sets "limited". I have doubts about this. We currently
look at the commit timestamp in limit_list() and mark a
commit old enough to be "uninteresting" -- which makes
ancestor of such commit also uninteresting. Similarly
commits that are too young are not pushed into the result.
There is a filter in get_revision() to omit ineligible
commits by time range already, so I wonder if we can remove
that code from limit_list() and not set "limited" when these
timestamp ranges are given. This would let us stream even
more.
Admittably, ancestors of an old commit had better be even
older, so marking them uninteresting to stop the traversal
early is a good way to optimize the full-traversal case, but
not having to call limit_list() would help streaming usage.
Also I have doubts about correctness issue about the max-age
handling. Is it correct to mark a commit that is old enough
to be uninteresting, implying that its ancestors are all
uninteresting? With clock skew among people, it is possible
to make a merge commit that has parents one of which is "in
the future".
* As to handling of --unpacked, exactly the same comment as
"max-age" applies, but it might be even worse. Marking an
unpacked one "uninteresting" means if a packed commit refers
to loose commit, the loose one will be also marked
uninteresting, I think.
"--objects --unpacked" is an idiom to repack objects that are
still loose, but I suspect it would do interesting thing if the
commit is in a pack but its trees and blobs are still loose. Am
I confused, or have I just spotted a potential (but hard to
trigger) bug?
--
diff --git a/revision.c b/revision.c
index 0e3f074..a55c0d1 100644
--- a/revision.c
+++ b/revision.c
@@ -404,10 +404,6 @@ static void limit_list(struct rev_info *
list = list->next;
free(entry);
- if (revs->max_age != -1 && (commit->date < revs->max_age))
- obj->flags |= UNINTERESTING;
- if (revs->unpacked && has_sha1_pack(obj->sha1))
- obj->flags |= UNINTERESTING;
add_parents_to_list(revs, commit, &list);
if (obj->flags & UNINTERESTING) {
mark_parents_uninteresting(commit);
@@ -415,8 +411,6 @@ static void limit_list(struct rev_info *
break;
continue;
}
- if (revs->min_age != -1 && (commit->date > revs->min_age))
- continue;
p = &commit_list_insert(commit, p)->next;
}
if (revs->boundary) {
@@ -543,32 +537,26 @@ int setup_revisions(int argc, const char
}
if (!strncmp(arg, "--max-age=", 10)) {
revs->max_age = atoi(arg + 10);
- revs->limited = 1;
continue;
}
if (!strncmp(arg, "--min-age=", 10)) {
revs->min_age = atoi(arg + 10);
- revs->limited = 1;
continue;
}
if (!strncmp(arg, "--since=", 8)) {
revs->max_age = approxidate(arg + 8);
- revs->limited = 1;
continue;
}
if (!strncmp(arg, "--after=", 8)) {
revs->max_age = approxidate(arg + 8);
- revs->limited = 1;
continue;
}
if (!strncmp(arg, "--before=", 9)) {
revs->min_age = approxidate(arg + 9);
- revs->limited = 1;
continue;
}
if (!strncmp(arg, "--until=", 8)) {
revs->min_age = approxidate(arg + 8);
- revs->limited = 1;
continue;
}
if (!strcmp(arg, "--all")) {
@@ -635,7 +623,6 @@ int setup_revisions(int argc, const char
}
if (!strcmp(arg, "--unpacked")) {
revs->unpacked = 1;
- revs->limited = 1;
continue;
}
*unrecognized++ = arg;
@@ -786,7 +773,9 @@ struct commit *get_revision(struct rev_i
if (revs->min_age != -1 && (commit->date > revs->min_age))
continue;
if (revs->max_age != -1 && (commit->date < revs->max_age))
- return NULL;
+ continue;
+ if (revs->unpacked && has_sha1_pack(commit->object.sha1))
+ continue;
if (revs->no_merges &&
commit->parents && commit->parents->next)
continue;
^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH/RFC 2/2] Make path-limiting be incremental when possible.
2006-03-31 6:16 ` Junio C Hamano
@ 2006-03-31 6:42 ` Linus Torvalds
2006-03-31 7:02 ` Junio C Hamano
0 siblings, 1 reply; 17+ messages in thread
From: Linus Torvalds @ 2006-03-31 6:42 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
On Thu, 30 Mar 2006, Junio C Hamano wrote:
>
> Linus Torvalds <torvalds@osdl.org> writes:
>
> > The reason I put "RFC" in the subject line is that while I've validated it
> > various ways, like doing
> >
> > git-rev-list HEAD -- drivers/char/ | md5sum
> >
> > before-and-after on the kernel archive, it's "git-rev-list" after all. In
> > other words, it's that really really subtle and complex central piece of
> > software. So while I think this is important and should go in asap, I also
> > think it should get lots of testing and eyeballs looking at the code.
>
> Let me make sure I understand what is going on.
>
> Currently, limit_list(), which is called when revs->limited is
> set, serves two different purposes:
>
> * traverse the ancestry chain and mark ancestors of
> uninteresting commits also uninteresting;
Right.
We _also_ traverse the ancestry chain in the "walking" stage later in
get_revision(), but if we have a "limit" case, we'll pre-traverse
everything early in limit_list.
> * "simplify" each commit that is still interesting, by
> comparing the tree object of it and its parents.
Through "add_parent_to_list()", yes.
> We used to call limit_list() when we are limiting the commits by
> paths, because that was what the latter does as a side effect of
> add_parents_to_list(). You made it streamable by moving the
> call to get_revision() and not calling limit_list() when you do
> not have other reasons to call it.
Yes, and by using "add_parent_to_list()" instead of the
non-pathname-limit-aware "pop_most_recent_commit()".
So we don't call limit-list any more, because we do the same thing in
get_revision() that it used to do at limit time.
> You currently do not do this streaming optimization when you
> have to show simplified parents, because the streaming version
> traverses ancestry chain one step at a time as it goes, and you
> cannot obviously see who the final "fake" parent is until you
> simplify the parents enouogh.
Yes. For exactly the same reason some other things ause us to do
limit_list(): some ops simply need the fully connected end result in order
to be able to work correctly.
> * get_commit_reference() sets "limited" when the user gives ^exclude
> commit. no question about this --- we would need to see the
> reachability in this case.
Right.
> * when we are going to sort the result we are going to show,
> we set "limited" -- we cannot sort without knowing the set we
> are sorting.
Right.
> * giving commit timestamp limits via --max-age, --min-age
> etc. sets "limited". I have doubts about this.
I agree. I looked at it when I did the "rev-list.c" vs "revision.c" split,
but I wanted to do as few changes as possible, and for some reason we've
always considered "time" to cause us to limit things.
However, I think your patch is wrong. Even if you don't limit things when
we have a date specifier, you still want to stop doing traversal in
limit_list when you hit that date. Why? Because it might be limited for
some _other_ reason, eg due to --topo-order or or some other issue.
> * As to handling of --unpacked, exactly the same comment as
> "max-age" applies, but it might be even worse. Marking an
> unpacked one "uninteresting" means if a packed commit refers
> to loose commit, the loose one will be also marked
> uninteresting, I think.
>
> "--objects --unpacked" is an idiom to repack objects that are
> still loose, but I suspect it would do interesting thing if the
> commit is in a pack but its trees and blobs are still loose. Am
> I confused, or have I just spotted a potential (but hard to
> trigger) bug?
Regardless of where you do the "unpacked" test, it will always really end
up stopping when it hits a packed commit. So you won't ever get away from
that.
So same logic applies. Once you hit a packed commit, you should stop, both
in limit_list _and_ in get_revision(). Otherwise you'll do too much work
when trying to limit things, for no gain.
The fact is, "--unpacked" means that we traverse the part of the chain
that hasn't been packed yet. Anything that is packed automatically cuts
off parsing, whether there is anything else unpacked below it or not. It's
not a bug, it's a feature, and if you want to pack everything, you should
just do
git repack -a -d
and not use --unpacked.
Same goes for dates, btw. We've always stopped when we reached a certain
date, even though there _could_ be commits before it that are from a more
recent date (due to clocks being set wrong). Neither --unpacked nor
--since=xyzzy are meant to be any kind of "guarantees" that you get all
commits that match some pattern. They are just "stop early" rules.
Linus
So the following is wrong:
> diff --git a/revision.c b/revision.c
> index 0e3f074..a55c0d1 100644
> --- a/revision.c
> +++ b/revision.c
> @@ -404,10 +404,6 @@ static void limit_list(struct rev_info *
> list = list->next;
> free(entry);
>
> - if (revs->max_age != -1 && (commit->date < revs->max_age))
> - obj->flags |= UNINTERESTING;
> - if (revs->unpacked && has_sha1_pack(obj->sha1))
> - obj->flags |= UNINTERESTING;
> add_parents_to_list(revs, commit, &list);
> if (obj->flags & UNINTERESTING) {
> mark_parents_uninteresting(commit);
> @@ -415,8 +411,6 @@ static void limit_list(struct rev_info *
> break;
> continue;
> }
> - if (revs->min_age != -1 && (commit->date > revs->min_age))
> - continue;
> p = &commit_list_insert(commit, p)->next;
> }
> if (revs->boundary) {
..but the later part of the patch looks fine (modulo testing, of course):
> @@ -543,32 +537,26 @@ int setup_revisions(int argc, const char
> }
> if (!strncmp(arg, "--max-age=", 10)) {
> revs->max_age = atoi(arg + 10);
> - revs->limited = 1;
> continue;
> }
> if (!strncmp(arg, "--min-age=", 10)) {
> revs->min_age = atoi(arg + 10);
> - revs->limited = 1;
> continue;
> }
> if (!strncmp(arg, "--since=", 8)) {
> revs->max_age = approxidate(arg + 8);
> - revs->limited = 1;
> continue;
> }
> if (!strncmp(arg, "--after=", 8)) {
> revs->max_age = approxidate(arg + 8);
> - revs->limited = 1;
> continue;
> }
> if (!strncmp(arg, "--before=", 9)) {
> revs->min_age = approxidate(arg + 9);
> - revs->limited = 1;
> continue;
> }
> if (!strncmp(arg, "--until=", 8)) {
> revs->min_age = approxidate(arg + 8);
> - revs->limited = 1;
> continue;
> }
> if (!strcmp(arg, "--all")) {
> @@ -635,7 +623,6 @@ int setup_revisions(int argc, const char
> }
> if (!strcmp(arg, "--unpacked")) {
> revs->unpacked = 1;
> - revs->limited = 1;
> continue;
> }
> *unrecognized++ = arg;
> @@ -786,7 +773,9 @@ struct commit *get_revision(struct rev_i
> if (revs->min_age != -1 && (commit->date > revs->min_age))
> continue;
> if (revs->max_age != -1 && (commit->date < revs->max_age))
> - return NULL;
> + continue;
> + if (revs->unpacked && has_sha1_pack(commit->object.sha1))
> + continue;
> if (revs->no_merges &&
> commit->parents && commit->parents->next)
> continue;
>
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH/RFC 2/2] Make path-limiting be incremental when possible.
2006-03-31 6:05 ` Linus Torvalds
@ 2006-03-31 6:45 ` Junio C Hamano
2006-03-31 19:39 ` Linus Torvalds
0 siblings, 1 reply; 17+ messages in thread
From: Junio C Hamano @ 2006-03-31 6:45 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Git Mailing List
Linus Torvalds <torvalds@osdl.org> writes:
> Sadly, it seems to react really badly to Junio's new --boundary logic for
> some reason that I haven't quite figured out yet.
There already was a report that --boundary stuff is not quite
right, so what you are seeing might be that the new code exposes
its original breakage even more. I haven't looked into the
breakage of the original version yet either, so I cannot really
say how your change breaks it.
> That reaction is independent of the actual pathname restriction, and seems
> to be related to how the --boundary logic expected
> pop_most_recent_commit() to work. In particular:
>
> ...
> if (commit->object.flags & BOUNDARY) {
> /* this is already uninteresting,
> * so there is no point popping its
> * parents into the list.
> */
>
> that code is magic, and seems to depend on something subtle going on with
> the list, and the incremental thing already popped the parent earlier and
> screwed up whatever magic that the BOUNDARY code depends on.
This was not so magic, but the magic was actually in the code
added to limit_list(). Usually, "newlist" consists interesting
commits, and what are found interesting initially but marked as
uninteresting when a different ancestry chain coming from an
uninteresting head leading to it was later discovered. The
magic code looks at still-interesting commits, and re-injects
its parents that are uninteresting to the list (and I just
spotted a bug there -- it does not check if what is being "re-"
injected are already on the list -- should I check for SEEN flag
there perhaps?), while marking them as boundary. This was done
to make sure that all the open-circle (in gitk) commits are on
the resulting list.
The part of the code you quoted is just a short-cut for not
doing pop_most_recent_commit() -- we used to have
pop_most_recent_commit() there, which pushed the parents of the
commit being processed into the list. Because we are processing
a boundary commit, we know it is uninteresting -- and by
definition, its parents are uninteresting and that is why it
just advances the list without calling pop_most_recent_commit(),
bypassing its side effect to push its parents into the list.
Since the new code has already advanced the list immediately
after the beginning of do {} block, I think you can remove the
entire "if (revs->max_count) {}" block. As the code currently
stands, you are skipping what happens to be next to the boundary
commit on the result list.
> Junio? I think you did some funky special case with BOUNDARY commits, and
> I broke it for you, can you look at the patch and see if you can see it?
> I'd really like to have the incremental path-limiter, because it really
> makes a huge difference in the usability of "git log pathname".
Oh, there is no question about making it streamable in more
cases is a good thing.
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH/RFC 2/2] Make path-limiting be incremental when possible.
2006-03-31 6:42 ` Linus Torvalds
@ 2006-03-31 7:02 ` Junio C Hamano
2006-04-02 0:35 ` Linus Torvalds
0 siblings, 1 reply; 17+ messages in thread
From: Junio C Hamano @ 2006-03-31 7:02 UTC (permalink / raw)
To: Linus Torvalds; +Cc: git
Linus Torvalds <torvalds@osdl.org> writes:
> So the following is wrong:
>
>> diff --git a/revision.c b/revision.c
>> index 0e3f074..a55c0d1 100644
>> --- a/revision.c
>> +++ b/revision.c
>> @@ -404,10 +404,6 @@ static void limit_list(struct rev_info *
>> list = list->next;
>> free(entry);
>>
>> - if (revs->max_age != -1 && (commit->date < revs->max_age))
>> - obj->flags |= UNINTERESTING;
>> - if (revs->unpacked && has_sha1_pack(obj->sha1))
>> - obj->flags |= UNINTERESTING;
>> add_parents_to_list(revs, commit, &list);
>> if (obj->flags & UNINTERESTING) {
>> mark_parents_uninteresting(commit);
>...
> ..but the later part of the patch looks fine (modulo testing, of course):
>
>> @@ -786,7 +773,9 @@ struct commit *get_revision(struct rev_i
>> if (revs->min_age != -1 && (commit->date > revs->min_age))
>> continue;
>> if (revs->max_age != -1 && (commit->date < revs->max_age))
>> - return NULL;
>> + continue;
>> + if (revs->unpacked && has_sha1_pack(commit->object.sha1))
>> + continue;
OK, so let's say I agree with you that --unpacked and --since
are "stop early" rules. I fully agree with that from usability
and implementation point of view, but I now wonder if the
"output filter" in get_revision() and "stop early" in limit_list()
would do the same thing. That is, if we are _otherwise_
limited, limit_list() would use them for "stop early" rule, but
if we end up not calling limit_list() we would simply filter
them and keep going (which is what my patch did for both
timestamp and packedness), or we could also stop there.
I am not sure which one is correct, but I suspect whichever we
do in get_revision() we would get inconsistent results between a
case where we call limit_list() and we do not. That is, the set
of commits we are going to show when --(topo|date)-order is
given and not given can be different. Is this acceptable?
I would say, from the implementation point of view, it should be
tolerated, but...
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH/RFC 2/2] Make path-limiting be incremental when possible.
2006-03-31 1:05 ` [PATCH/RFC 2/2] Make path-limiting be incremental when possible Linus Torvalds
2006-03-31 6:05 ` Linus Torvalds
2006-03-31 6:16 ` Junio C Hamano
@ 2006-03-31 8:28 ` Junio C Hamano
2006-03-31 19:44 ` Linus Torvalds
2 siblings, 1 reply; 17+ messages in thread
From: Junio C Hamano @ 2006-03-31 8:28 UTC (permalink / raw)
To: Linus Torvalds; +Cc: git
Linus Torvalds <torvalds@osdl.org> writes:
> This is an absolutely huge deal for anything like "git log -- <pathname>",
> but also for some things that we don't do yet - like the "find where
> things changed" logic I've described elsewhere, where we want to find the
> previous revision that changed a file.
>...
> Btw, don't even bother testing this with the git archive. git itself is so
> small that parsing the whole revision history for it takes about a second
> even with path limiting.
By the way, I forgot to praise you ;-).
Even on a fast machine, the old one was not very useful, but
this one is _instantaneous_. Very good job.
$ PAGER=cat GIT_DIR=/pub/scm/linux/kernel/git/torvalds/linux-2.6.git/ \
/usr/bin/time git log -1 --pretty=short -- drivers/
commit ce362c009250340358a7221f3cdb7954cbf19c01
Merge: 064c94f... cd7a920...
Author: Linus Torvalds <torvalds@g5.osdl.org>
Merge git://git.kernel.org/pub/scm/linux/kernel/git/kyle/parisc-2.6
15.44user 0.19system 0:25.11elapsed 62%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+18050minor)pagefaults 0swaps
$ PAGER=cat GIT_DIR=/pub/scm/linux/kernel/git/torvalds/linux-2.6.git/ \
/usr/bin/time ./git.pu log -1 --pretty=short -- drivers/
commit ce362c009250340358a7221f3cdb7954cbf19c01
Merge: 064c94f... cd7a920...
Author: Linus Torvalds <torvalds@g5.osdl.org>
Merge git://git.kernel.org/pub/scm/linux/kernel/git/kyle/parisc-2.6
0.00user 0.00system 0:00.00elapsed 50%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+388minor)pagefaults 0swaps
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH/RFC 2/2] Make path-limiting be incremental when possible.
2006-03-31 6:45 ` Junio C Hamano
@ 2006-03-31 19:39 ` Linus Torvalds
2006-03-31 20:35 ` Junio C Hamano
0 siblings, 1 reply; 17+ messages in thread
From: Linus Torvalds @ 2006-03-31 19:39 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Git Mailing List
On Thu, 30 Mar 2006, Junio C Hamano wrote:
>
> There already was a report that --boundary stuff is not quite
> right, so what you are seeing might be that the new code exposes
> its original breakage even more. I haven't looked into the
> breakage of the original version yet either, so I cannot really
> say how your change breaks it.
[ Awake and thinking about this again ]
No, I think the new breakage was just because I was being a stupid ass.
When I converted get_revision() to do the parent parsing up at the top
instead of at the bottom, I just didn't think correctly about your new
BOUNDARY code, and my conversion for that was just wrong. Part of the "do
the parents early" code was also removing the current commit early, so
suddenly your BOUNDARY case for doing that thing was just removing some
other random commit instead, which was obviously wrong.
The fix is trivial - with the new get_revision() organization, the
BOUNDARY case special-case actually goes away entirely, and this trivial
patch (on top of my 2/2 patch) should just fix it.
At least it passes my tests again now, and looking at the code everything
seems sane.
Linus
---
diff --git a/revision.c b/revision.c
index 0e3f074..753633e 100644
--- a/revision.c
+++ b/revision.c
@@ -796,18 +796,6 @@ struct commit *get_revision(struct rev_i
if (revs->parents)
rewrite_parents(commit);
}
- /* More to go? */
- if (revs->max_count) {
- if (commit->object.flags & BOUNDARY) {
- /* this is already uninteresting,
- * so there is no point popping its
- * parents into the list.
- */
- struct commit_list *it = revs->commits;
- revs->commits = it->next;
- free(it);
- }
- }
commit->object.flags |= SHOWN;
return commit;
} while (revs->commits);
^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH/RFC 2/2] Make path-limiting be incremental when possible.
2006-03-31 8:28 ` [PATCH/RFC 2/2] Make path-limiting be incremental when possible Junio C Hamano
@ 2006-03-31 19:44 ` Linus Torvalds
0 siblings, 0 replies; 17+ messages in thread
From: Linus Torvalds @ 2006-03-31 19:44 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
On Fri, 31 Mar 2006, Junio C Hamano wrote:
> Linus Torvalds <torvalds@osdl.org> writes:
>
> > This is an absolutely huge deal for anything like "git log -- <pathname>",
> > but also for some things that we don't do yet - like the "find where
> > things changed" logic I've described elsewhere, where we want to find the
> > previous revision that changed a file.
> >...
> > Btw, don't even bother testing this with the git archive. git itself is so
> > small that parsing the whole revision history for it takes about a second
> > even with path limiting.
>
> By the way, I forgot to praise you ;-).
>
> Even on a fast machine, the old one was not very useful, but
> this one is _instantaneous_. Very good job.
Indeed. It's why I'd really like this to be merged before 1.3.0 - it moves
a certain class of problems from "it works" to "it's actually usable".
Now, the _real_ usage I foresee (which just wasn't practical before) is
the interactive annotation thing - this won't help a _full_file_ annotate
(which usually needs to go back to the very first version of a file
anyway), but it should make it possible to play with an incremental one
(the "graphical git-whatchanged" kind).
But even just the "git log" difference makes it worth it.
Linus
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH/RFC 2/2] Make path-limiting be incremental when possible.
2006-03-31 19:39 ` Linus Torvalds
@ 2006-03-31 20:35 ` Junio C Hamano
2006-03-31 20:50 ` Linus Torvalds
0 siblings, 1 reply; 17+ messages in thread
From: Junio C Hamano @ 2006-03-31 20:35 UTC (permalink / raw)
To: Linus Torvalds; +Cc: git
Linus Torvalds <torvalds@osdl.org> writes:
> The fix is trivial - with the new get_revision() organization, the
> BOUNDARY case special-case actually goes away entirely, and this trivial
> patch (on top of my 2/2 patch) should just fix it.
Yes, that is exactly the fix I have in "pu" -- I suspect you
replied before getting to my response last night.
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH/RFC 2/2] Make path-limiting be incremental when possible.
2006-03-31 20:35 ` Junio C Hamano
@ 2006-03-31 20:50 ` Linus Torvalds
0 siblings, 0 replies; 17+ messages in thread
From: Linus Torvalds @ 2006-03-31 20:50 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
On Fri, 31 Mar 2006, Junio C Hamano wrote:
>
> Yes, that is exactly the fix I have in "pu" -- I suspect you
> replied before getting to my response last night.
No, I just get too much email, and hadn't read your response carefully
enough, so I just missed the fact that you had already found the problem.
Linus
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH/RFC 2/2] Make path-limiting be incremental when possible.
2006-03-31 7:02 ` Junio C Hamano
@ 2006-04-02 0:35 ` Linus Torvalds
2006-04-02 3:11 ` Junio C Hamano
` (2 more replies)
0 siblings, 3 replies; 17+ messages in thread
From: Linus Torvalds @ 2006-04-02 0:35 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
On Thu, 30 Mar 2006, Junio C Hamano wrote:
>
> OK, so let's say I agree with you that --unpacked and --since
> are "stop early" rules. I fully agree with that from usability
> and implementation point of view, but I now wonder if the
> "output filter" in get_revision() and "stop early" in limit_list()
> would do the same thing.
They don't.
What ends up not working very well at all is the combination of
"--topo-order" and the output filter in get_revision. It will return NULL
when we see the first commit out of date-order, even if we have other
commits coming.
So we really should do the "past the date order" thing in get_revision()
only if we have _not_ done it already in limit_list().
Something like this.
The easiest way to test this is with just
gitk --since=3.days.ago
on the kernel tree. Without this patch, it tends to be pretty obviously
broken.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Linus
---
diff --git a/revision.c b/revision.c
index a8a54b6..558ed01 100644
--- a/revision.c
+++ b/revision.c
@@ -783,10 +783,14 @@ struct commit *get_revision(struct rev_i
/*
* If we haven't done the list limiting, we need to look at
- * the parents here
+ * the parents here. We also need to do the date-based limiting
+ * that we'd otherwise have done in limit_list().
*/
- if (!revs->limited)
+ if (!revs->limited) {
+ if (revs->max_age != -1 && (commit->date < revs->max_age))
+ continue;
add_parents_to_list(revs, commit, &revs->commits);
+ }
if (commit->object.flags & SHOWN)
continue;
if (!(commit->object.flags & BOUNDARY) &&
@@ -794,8 +798,6 @@ struct commit *get_revision(struct rev_i
continue;
if (revs->min_age != -1 && (commit->date > revs->min_age))
continue;
- if (revs->max_age != -1 && (commit->date < revs->max_age))
- return NULL;
if (revs->no_merges &&
commit->parents && commit->parents->next)
continue;
^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH/RFC 2/2] Make path-limiting be incremental when possible.
2006-04-02 0:35 ` Linus Torvalds
@ 2006-04-02 3:11 ` Junio C Hamano
2006-04-02 3:17 ` [PATCH] revision: simplify argument parsing Junio C Hamano
2006-04-02 3:17 ` [PATCH] revision: --max-age alone does not need limit_list() anymore Junio C Hamano
2 siblings, 0 replies; 17+ messages in thread
From: Junio C Hamano @ 2006-04-02 3:11 UTC (permalink / raw)
To: Linus Torvalds; +Cc: git
Linus Torvalds <torvalds@osdl.org> writes:
> What ends up not working very well at all is the combination of
> "--topo-order" and the output filter in get_revision. It will return NULL
> when we see the first commit out of date-order, even if we have other
> commits coming.
>
> So we really should do the "past the date order" thing in get_revision()
> only if we have _not_ done it already in limit_list().
Right now --max-age causes "limited" so there should not be a
difference, if I am not mistaken. But I've been meaning to
change that, so the patch makes sense. Similarly,...
-- >8 --
[PATCH] revision: --topo-order and --unpacked
Now, using --unpacked without limit_list() does not make much
sense, but this is parallel to the earlier --max-age fix.
Signed-off-by: Junio C Hamano <junkio@cox.net>
---
revision.c | 5 ++++-
1 files changed, 4 insertions(+), 1 deletions(-)
22c31bf183bff576c7807f9d67abfc11ee8e1fa4
diff --git a/revision.c b/revision.c
index 558ed01..07cc86f 100644
--- a/revision.c
+++ b/revision.c
@@ -787,7 +787,10 @@ struct commit *get_revision(struct rev_i
* that we'd otherwise have done in limit_list().
*/
if (!revs->limited) {
- if (revs->max_age != -1 && (commit->date < revs->max_age))
+ if ((revs->unpacked &&
+ has_sha1_pack(commit->object.sha1)) ||
+ (revs->max_age != -1 &&
+ (commit->date < revs->max_age)))
continue;
add_parents_to_list(revs, commit, &revs->commits);
}
--
1.3.0.rc1.gf385
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH] revision: simplify argument parsing.
2006-04-02 0:35 ` Linus Torvalds
2006-04-02 3:11 ` Junio C Hamano
@ 2006-04-02 3:17 ` Junio C Hamano
[not found] ` <443063E2.1040904@lsrfire.ath.cx>
2006-04-02 3:17 ` [PATCH] revision: --max-age alone does not need limit_list() anymore Junio C Hamano
2 siblings, 1 reply; 17+ messages in thread
From: Junio C Hamano @ 2006-04-02 3:17 UTC (permalink / raw)
To: Linus Torvalds; +Cc: git
This just moves code around to consolidate the part that sets
revs->limited to one place based on various flags.
Signed-off-by: Junio C Hamano <junkio@cox.net>
---
* Just the preparation for the next step...
revision.c | 20 +++++++-------------
1 files changed, 7 insertions(+), 13 deletions(-)
53069686601d156dea3787a100ffc4e35c78040f
diff --git a/revision.c b/revision.c
index 07cc86f..cdd29c5 100644
--- a/revision.c
+++ b/revision.c
@@ -552,32 +552,26 @@ int setup_revisions(int argc, const char
}
if (!strncmp(arg, "--max-age=", 10)) {
revs->max_age = atoi(arg + 10);
- revs->limited = 1;
continue;
}
- if (!strncmp(arg, "--min-age=", 10)) {
- revs->min_age = atoi(arg + 10);
- revs->limited = 1;
- continue;
- }
if (!strncmp(arg, "--since=", 8)) {
revs->max_age = approxidate(arg + 8);
- revs->limited = 1;
continue;
}
if (!strncmp(arg, "--after=", 8)) {
revs->max_age = approxidate(arg + 8);
- revs->limited = 1;
continue;
}
+ if (!strncmp(arg, "--min-age=", 10)) {
+ revs->min_age = atoi(arg + 10);
+ continue;
+ }
if (!strncmp(arg, "--before=", 9)) {
revs->min_age = approxidate(arg + 9);
- revs->limited = 1;
continue;
}
if (!strncmp(arg, "--until=", 8)) {
revs->min_age = approxidate(arg + 8);
- revs->limited = 1;
continue;
}
if (!strcmp(arg, "--all")) {
@@ -596,13 +590,11 @@ int setup_revisions(int argc, const char
}
if (!strcmp(arg, "--topo-order")) {
revs->topo_order = 1;
- revs->limited = 1;
continue;
}
if (!strcmp(arg, "--date-order")) {
revs->lifo = 0;
revs->topo_order = 1;
- revs->limited = 1;
continue;
}
if (!strcmp(arg, "--parents")) {
@@ -644,7 +636,6 @@ int setup_revisions(int argc, const char
}
if (!strcmp(arg, "--unpacked")) {
revs->unpacked = 1;
- revs->limited = 1;
continue;
}
*unrecognized++ = arg;
@@ -707,6 +698,9 @@ int setup_revisions(int argc, const char
commit = get_commit_reference(revs, def, sha1, 0);
add_one_commit(commit, revs);
}
+
+ if ((revs->max_age != -1) || revs->topo_order || revs->unpacked)
+ revs->limited = 1;
if (revs->prune_data) {
diff_tree_setup_paths(revs->prune_data);
--
1.3.0.rc1.gf385
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH] revision: --max-age alone does not need limit_list() anymore.
2006-04-02 0:35 ` Linus Torvalds
2006-04-02 3:11 ` Junio C Hamano
2006-04-02 3:17 ` [PATCH] revision: simplify argument parsing Junio C Hamano
@ 2006-04-02 3:17 ` Junio C Hamano
2 siblings, 0 replies; 17+ messages in thread
From: Junio C Hamano @ 2006-04-02 3:17 UTC (permalink / raw)
To: Linus Torvalds; +Cc: git
This makes git log --since=7.days to be streamable.
Signed-off-by: Junio C Hamano <junkio@cox.net>
---
revision.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
bbbc8c3a8d455e1f5d15c3764eba70250b5479e9
diff --git a/revision.c b/revision.c
index cdd29c5..728b6d1 100644
--- a/revision.c
+++ b/revision.c
@@ -699,7 +699,7 @@ int setup_revisions(int argc, const char
add_one_commit(commit, revs);
}
- if ((revs->max_age != -1) || revs->topo_order || revs->unpacked)
+ if (revs->topo_order || revs->unpacked)
revs->limited = 1;
if (revs->prune_data) {
--
1.3.0.rc1.gf385
^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH] revision: simplify argument parsing.
[not found] ` <443063E2.1040904@lsrfire.ath.cx>
@ 2006-04-03 4:22 ` Junio C Hamano
0 siblings, 0 replies; 17+ messages in thread
From: Junio C Hamano @ 2006-04-03 4:22 UTC (permalink / raw)
To: Rene Scharfe; +Cc: Linus Torvalds, git
Rene Scharfe <rene.scharfe@lsrfire.ath.cx> writes:
> The comment is misleading because the patch not only moves stuff around,
> it also removes setting of revs->limited when revs->min_age has been
> set.
You are correct that the patch does more than it claims to.
That is an independent fix mentioned in my earlier message
<7vr74jxhp3.fsf@assigned-by-dhcp.cox.net>. We did not have to
pay the overhead of having to call limit_list() when only
min_age was specified because we did not do traversal filtering
using UNINTERESTING logic for min_age, but we called it anyway.
Unlike --max-age that marks an old one _and_ its parents
uninteresting, it needs to filter a young one without making its
parents uninteresting.
^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2006-04-03 4:22 UTC | newest]
Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-03-31 0:52 [PATCH/RFC 1/2] Move "--parent" parsing into generic revision.c library code Linus Torvalds
2006-03-31 1:05 ` [PATCH/RFC 2/2] Make path-limiting be incremental when possible Linus Torvalds
2006-03-31 6:05 ` Linus Torvalds
2006-03-31 6:45 ` Junio C Hamano
2006-03-31 19:39 ` Linus Torvalds
2006-03-31 20:35 ` Junio C Hamano
2006-03-31 20:50 ` Linus Torvalds
2006-03-31 6:16 ` Junio C Hamano
2006-03-31 6:42 ` Linus Torvalds
2006-03-31 7:02 ` Junio C Hamano
2006-04-02 0:35 ` Linus Torvalds
2006-04-02 3:11 ` Junio C Hamano
2006-04-02 3:17 ` [PATCH] revision: simplify argument parsing Junio C Hamano
[not found] ` <443063E2.1040904@lsrfire.ath.cx>
2006-04-03 4:22 ` Junio C Hamano
2006-04-02 3:17 ` [PATCH] revision: --max-age alone does not need limit_list() anymore Junio C Hamano
2006-03-31 8:28 ` [PATCH/RFC 2/2] Make path-limiting be incremental when possible Junio C Hamano
2006-03-31 19:44 ` Linus Torvalds
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).