* Question about get_cached_commit_buffer() @ 2018-02-20 22:12 Derrick Stolee 2018-02-20 22:57 ` Jeff King 0 siblings, 1 reply; 17+ messages in thread From: Derrick Stolee @ 2018-02-20 22:12 UTC (permalink / raw) To: git@vger.kernel.org; +Cc: Jeff King, Derrick Stolee, Junio C Hamano While working on my commit-graph patch [1] and using a local build in my usual workflows, I found a bug in my branch. Essentially, when calling `git rev-list --header`, the header information is actually missing for many commits except the first. This was not caught in my testing since t6000-rev-list-misc.sh creates a test repo with only one commit. The root cause is that the serialized commit graph gets its speedup by not loading buffers for every commit. For many use-cases (merge bases, --topo-order, etc.) we do not need the buffer for most of the commits. In the rev-list example, the buffer is loaded due to a side-effect of being referenced by HEAD or a branch. A similar effect happened in `git log`, hence the following change in my patch: diff --git a/log-tree.c b/log-tree.c index 580b3a9..14735d4 100644 --- a/log-tree.c +++ b/log-tree.c @@ -647,8 +647,7 @@ void show_log(struct rev_info *opt) show_mergetag(opt, commit); } - if (!get_cached_commit_buffer(commit, NULL)) - return; + get_commit_buffer(commit, NULL); if (opt->show_notes) { int raw; In rev-list, the "--header" option outputs a value and expects the buffer to be cached. It outputs the header info only if get_cached_commit_buffer() returns a non-null buffer, giving incorrect output. If it called get_commit_buffer() instead, it would immediately call get_cached_commit_buffer() and on failure actually load the buffer. This has not been a problem before, since the buffer was always loaded at some point for each commit (and saved because of the save_commit_buffer global). I propose to make get_cached_commit_buffer() static to commit.c and convert all callers to get_commit_buffer(). Is there any reason to _not_ do this? It seems that there is no functional or performance change. After the serialized commit graph exists and is used, the only change is that we delay loading the buffer until we need to read the commit metadata that is not stored in the graph (message, author/committer). Thanks, -Stolee [1] https://public-inbox.org/git/4d1ee202-7d79-d73c-6e05-d0fc85db943c@gmail.com/T/#m381bfd3f2eafbd254e35a5147cd198bc35055e92 [Patch v4 00/14] Serialized Git Commit Graph [2] https://public-inbox.org/git/20140610214039.GJ19147@sigill.intra.peff.net/ [Patch 10/17] provide helpers to access the commit buffer ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: Question about get_cached_commit_buffer() 2018-02-20 22:12 Question about get_cached_commit_buffer() Derrick Stolee @ 2018-02-20 22:57 ` Jeff King 2018-02-21 14:13 ` Derrick Stolee 0 siblings, 1 reply; 17+ messages in thread From: Jeff King @ 2018-02-20 22:57 UTC (permalink / raw) To: Derrick Stolee; +Cc: git@vger.kernel.org, Derrick Stolee, Junio C Hamano On Tue, Feb 20, 2018 at 05:12:50PM -0500, Derrick Stolee wrote: > In rev-list, the "--header" option outputs a value and expects the buffer to > be cached. It outputs the header info only if get_cached_commit_buffer() > returns a non-null buffer, giving incorrect output. If it called > get_commit_buffer() instead, it would immediately call > get_cached_commit_buffer() and on failure actually load the buffer. > > This has not been a problem before, since the buffer was always loaded at > some point for each commit (and saved because of the save_commit_buffer > global). > > I propose to make get_cached_commit_buffer() static to commit.c and convert > all callers to get_commit_buffer(). Is there any reason to _not_ do this? It > seems that there is no functional or performance change. That helper was added in 152ff1cceb (provide helpers to access the commit buffer, 2014-06-10). I think interesting part is the final paragraph: Note that we also need to add a "get_cached" variant which returns NULL when we do not have a cached buffer. At first glance this seems to defeat the purpose of "get", which is to always provide a return value. However, some log code paths actually use the NULL-ness of commit->buffer as a boolean flag to decide whether to try printing the commit. At least for now, we want to continue supporting that use. So I think a conversion to get_commit_buffer() would break the callers that use the boolean nature for something useful. Unfortunately the author did not enumerate exactly what those uses are, so we'll have to dig. :) My guess is that it has to do with showing some commits twice, since we would normally have the buffer available the first time we hit the commit, and then free it in finish_commit(). If we blame that rev-list line (and then walk back over a bunch of uninteresting commits via parent re-blaming), it comes from 3131b71301 (Add "--show-all" revision walker flag for debugging, 2008-02-09). So there it is. It does show commits multiple times, but suppresses the verbose header after the first showing. If we do something like this: git rev-list --show-all --pretty --boundary c93150cfb0^- you'll see some boundary commits that _don't_ have their pretty headers shown. And with your proposed patch, we'd show them again. To keep the same behavior we need to store that "we've already seen this" boolean somewhere else (e.g., in an object flag; possibly SEEN, but that might run afoul of other logic). It looks like the call in log-tree comes from the same commit, and serves the same purpose. Aside from storing the boolean "did we show it" in another way, the other option is to simply change the behavior and accept that we might pretty-print the commit twice. This is a backwards-incompatible change, but I'm not sure if anybody would care. According to that commit, --show-all was added explicitly for debugging, and it has never been documented. I couldn't find any reference to people actually using it on the list (a grep of the whole archive turns up 32 messages, most of which are just it appearing in context; the only person mentioning its actual use was Linus in 2008. -Peff ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Question about get_cached_commit_buffer() 2018-02-20 22:57 ` Jeff King @ 2018-02-21 14:13 ` Derrick Stolee 2018-02-21 18:48 ` Jeff King 0 siblings, 1 reply; 17+ messages in thread From: Derrick Stolee @ 2018-02-21 14:13 UTC (permalink / raw) To: Jeff King; +Cc: git@vger.kernel.org, Derrick Stolee, Junio C Hamano On 2/20/2018 5:57 PM, Jeff King wrote: > On Tue, Feb 20, 2018 at 05:12:50PM -0500, Derrick Stolee wrote: > >> In rev-list, the "--header" option outputs a value and expects the buffer to >> be cached. It outputs the header info only if get_cached_commit_buffer() >> returns a non-null buffer, giving incorrect output. If it called >> get_commit_buffer() instead, it would immediately call >> get_cached_commit_buffer() and on failure actually load the buffer. >> >> This has not been a problem before, since the buffer was always loaded at >> some point for each commit (and saved because of the save_commit_buffer >> global). >> >> I propose to make get_cached_commit_buffer() static to commit.c and convert >> all callers to get_commit_buffer(). Is there any reason to _not_ do this? It >> seems that there is no functional or performance change. > That helper was added in 152ff1cceb (provide helpers to access the > commit buffer, 2014-06-10). I think interesting part is the final > paragraph: > > Note that we also need to add a "get_cached" variant which > returns NULL when we do not have a cached buffer. At first > glance this seems to defeat the purpose of "get", which is > to always provide a return value. However, some log code > paths actually use the NULL-ness of commit->buffer as a > boolean flag to decide whether to try printing the > commit. At least for now, we want to continue supporting > that use. > > So I think a conversion to get_commit_buffer() would break the callers > that use the boolean nature for something useful. Unfortunately the > author did not enumerate exactly what those uses are, so we'll have to > dig. :) > > My guess is that it has to do with showing some commits twice, since we > would normally have the buffer available the first time we hit the > commit, and then free it in finish_commit(). > > If we blame that rev-list line (and then walk back over a bunch of > uninteresting commits via parent re-blaming), it comes from 3131b71301 > (Add "--show-all" revision walker flag for debugging, 2008-02-09). Thanks for doing this digging. I appreciate the breadcrumbs, too, so I can do a better job of digging next time. > So there it is. It does show commits multiple times, but suppresses the > verbose header after the first showing. If we do something like this: > > git rev-list --show-all --pretty --boundary c93150cfb0^- > > you'll see some boundary commits that _don't_ have their pretty headers > shown. And with your proposed patch, we'd show them again. To keep the > same behavior we need to store that "we've already seen this" boolean > somewhere else (e.g., in an object flag; possibly SEEN, but that might > run afoul of other logic). What confuses me about this behavior is that the OID is still shown on the repeat (and in the case of `git log --oneline` will not actually have a line break between two short-OIDs). I don't believe this behavior is something to preserve. You are right that we definitely don't want to show the full content twice. > It looks like the call in log-tree comes from the same commit, and > serves the same purpose. > > Aside from storing the boolean "did we show it" in another way, the > other option is to simply change the behavior and accept that we might > pretty-print the commit twice. This is a backwards-incompatible change, > but I'm not sure if anybody would care. According to that commit, > --show-all was added explicitly for debugging, and it has never been > documented. I couldn't find any reference to people actually using it > on the list (a grep of the whole archive turns up 32 messages, most of > which are just it appearing in context; the only person mentioning its > actual use was Linus in 2008. Unless I am misunderstanding, the current behavior on a repeated commit is already incorrect: some amount of output occurs before checking the buffer, so the output includes repeated records but with formatting that violates the expectation. By doing the simple change of swapping get_cached_commit_buffer() with get_commit_buffer(), we correct that format violation but have duplicate copies. The most-correct thing to do (in my opinion) is to put the requirement of "no repeats" into the revision walk logic and stop having the formatting methods expect them. Then, however we change this boolean setting of "we have seen this before" it will not require the formatting methods to change. I can start working on a patch to move the duplicate-removal logic into revision.c instead of these three callers: builtin/rev-list.c: if (revs->verbose_header && get_cached_commit_buffer(commit, NULL)) { log-tree.c: if (!get_cached_commit_buffer(commit, NULL)) object.c: if (!get_cached_commit_buffer(commit, NULL)) { But this caller seems pretty important in pretty.c: /* * Otherwise, we still want to munge the encoding header in the * result, which will be done by modifying the buffer. If we * are using a fresh copy, we can reuse it. But if we are using * the cached copy from get_commit_buffer, we need to duplicate it * to avoid munging the cached copy. */ if (msg == get_cached_commit_buffer(commit, NULL)) out = xstrdup(msg); else out = (char *)msg Thanks, -Stolee ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Question about get_cached_commit_buffer() 2018-02-21 14:13 ` Derrick Stolee @ 2018-02-21 18:48 ` Jeff King 2018-02-21 18:52 ` Jeff King 2018-02-21 22:22 ` Question about get_cached_commit_buffer() Junio C Hamano 0 siblings, 2 replies; 17+ messages in thread From: Jeff King @ 2018-02-21 18:48 UTC (permalink / raw) To: Derrick Stolee; +Cc: git@vger.kernel.org, Derrick Stolee, Junio C Hamano On Wed, Feb 21, 2018 at 09:13:22AM -0500, Derrick Stolee wrote: > > So there it is. It does show commits multiple times, but suppresses the > > verbose header after the first showing. If we do something like this: > > > > git rev-list --show-all --pretty --boundary c93150cfb0^- > > > > you'll see some boundary commits that _don't_ have their pretty headers > > shown. And with your proposed patch, we'd show them again. To keep the > > same behavior we need to store that "we've already seen this" boolean > > somewhere else (e.g., in an object flag; possibly SEEN, but that might > > run afoul of other logic). > > What confuses me about this behavior is that the OID is still shown on the > repeat (and in the case of `git log --oneline` will not actually have a line > break between two short-OIDs). I don't believe this behavior is something to > preserve. I think that repeating the oid is intentional; the point is to dump how the traversal code is hitting the endpoints, even if we do so multiple times. The --oneline behavior just looks like a bug. I think --format is broken with --show-all, too (it does not show anything!). > Unless I am misunderstanding, the current behavior on a repeated commit is > already incorrect: some amount of output occurs before checking the buffer, > so the output includes repeated records but with formatting that violates > the expectation. By doing the simple change of swapping > get_cached_commit_buffer() with get_commit_buffer(), we correct that format > violation but have duplicate copies. Yeah, I'd agree with that assessment. > The most-correct thing to do (in my opinion) is to put the requirement of > "no repeats" into the revision walk logic and stop having the formatting > methods expect them. Then, however we change this boolean setting of "we > have seen this before" it will not require the formatting methods to change. But then you wouldn't show repeats at all. If I'm understanding you correctly. TBH, I do not think it is worth spending a lot of effort on this --show-all feature. It seems mostly like forgotten debugging cruft to me. That's why I'd be OK with showing the whole header as the simplest fix (i.e., just removing those calls entirely, not even converting them to get_commit_buffer). > I can start working on a patch to move the duplicate-removal logic into > revision.c instead of these three callers: > > builtin/rev-list.c: if (revs->verbose_header && > get_cached_commit_buffer(commit, NULL)) { > log-tree.c: if (!get_cached_commit_buffer(commit, NULL)) > object.c: if (!get_cached_commit_buffer(commit, NULL)) Those first two are duplicate detection. The third one in object.c should stay, though. We've been fed a commit buffer to parse, and we want to know whether we should attach it as the cached buffer for that commit. But if we already have a cached buffer, there's no point in doing so. And that's what we're checking there. Though I think it would be equally correct to have set_commit_buffer() just throw away the existing cache entry and replace it with this one. I don't think there's a real reason to prefer the old to the new. And that might be worth doing if it would let us drop get_cached_commit_buffer() as a public function. But... > But this caller seems pretty important in pretty.c: > > /* > * Otherwise, we still want to munge the encoding header in the > * result, which will be done by modifying the buffer. If we > * are using a fresh copy, we can reuse it. But if we are using > * the cached copy from get_commit_buffer, we need to duplicate it > * to avoid munging the cached copy. > */ > if (msg == get_cached_commit_buffer(commit, NULL)) > out = xstrdup(msg); > else > out = (char *)msg Like the one in object.c, this really does want to know about the cached entry. And it should be unaffected by your patch, since we will have called get_commit_buffer() at the top of that function. If we wanted to write this one without get_cached_commit_buffer(), we'd really need a function to ask "did this pointer come from the cache, or was it freshly allocated?". That's the same thing we do for unuse_commit_buffer(). So in theory we could have a boolean function that would check that, and that would let us make get_cached_commit_buffer() private. In my opinion it's not really worth trying to make it private. The confusion you're fixing in the first two calls is not due to a bad API, but due to some subtly confusing logic in that code's use of the API. ;) So I'd probably do this: diff --git a/builtin/rev-list.c b/builtin/rev-list.c index d94062bc84..3af56921c8 100644 --- a/builtin/rev-list.c +++ b/builtin/rev-list.c @@ -150,7 +150,7 @@ static void show_commit(struct commit *commit, void *data) else putchar('\n'); - if (revs->verbose_header && get_cached_commit_buffer(commit, NULL)) { + if (revs->verbose_header) { struct strbuf buf = STRBUF_INIT; struct pretty_print_context ctx = {0}; ctx.abbrev = revs->abbrev; diff --git a/log-tree.c b/log-tree.c index cab9353f45..cb2dab8a1c 100644 --- a/log-tree.c +++ b/log-tree.c @@ -690,9 +690,6 @@ void show_log(struct rev_info *opt) show_mergetag(opt, commit); } - if (!get_cached_commit_buffer(commit, NULL)) - return; - if (opt->show_notes) { int raw; struct strbuf notebuf = STRBUF_INIT; with the rationale that: 1. Nobody really cares about this verbose-output suppression anyway. 2. The code is confusing and fragile, since it uses the cached commit buffer as an implicit boolean for "did we show the commit already". 3. It's broken for --oneline and user-formats, and this fixes it. -Peff ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: Question about get_cached_commit_buffer() 2018-02-21 18:48 ` Jeff King @ 2018-02-21 18:52 ` Jeff King 2018-02-21 19:17 ` [PATCH] commit: drop uses of get_cached_commit_buffer() Derrick Stolee 2018-02-21 22:22 ` Question about get_cached_commit_buffer() Junio C Hamano 1 sibling, 1 reply; 17+ messages in thread From: Jeff King @ 2018-02-21 18:52 UTC (permalink / raw) To: Derrick Stolee; +Cc: git@vger.kernel.org, Derrick Stolee, Junio C Hamano On Wed, Feb 21, 2018 at 01:48:11PM -0500, Jeff King wrote: > > What confuses me about this behavior is that the OID is still shown on the > > repeat (and in the case of `git log --oneline` will not actually have a line > > break between two short-OIDs). I don't believe this behavior is something to > > preserve. > > I think that repeating the oid is intentional; the point is to dump how > the traversal code is hitting the endpoints, even if we do so multiple > times. > > The --oneline behavior just looks like a bug. I think --format is broken > with --show-all, too (it does not show anything!). I poked at one of the examples a little more closely. I actually think these are not repeats, but simply UNINTERESTING parents that we never needed to look at in our traversal (because we hit a point where everything was UNINTERESTING). So we are relying not on finish_commit() to have freed the buffer, but on the traversal code to have never parsed those commits in the first place. Which is doubly subtle. I think the rest of my email stands, though: we should just show the full headers for those commits. -Peff ^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH] commit: drop uses of get_cached_commit_buffer() 2018-02-21 18:52 ` Jeff King @ 2018-02-21 19:17 ` Derrick Stolee 2018-02-21 19:19 ` Derrick Stolee 2018-02-21 23:13 ` Jeff King 0 siblings, 2 replies; 17+ messages in thread From: Derrick Stolee @ 2018-02-21 19:17 UTC (permalink / raw) To: git; +Cc: peff, Derrick Stolee The get_cached_commit_buffer() method provides access to the buffer loaded for a struct commit, if it was ever loadead and was not freed. Two places use this to inform how to output information about commits. log-tree.c uses this method to short-circuit the output of commit information when the buffer is not cached. However, this leads to incorrect output in 'git log --oneline' where the short-OID is written but then the rest of the commit information is dropped and the next commit is written on the same line. rev-list uses this method for two reasons: - First, if the revision walk visits a commit twice, the buffer was freed by rev-list in the first write. The output then does not match the format expectations, since the OID is written without the rest of the content. - Second, if the revision walk visits a commit that was marked UNINTERESTING, the walk may never load a buffer and hence rev-list will not output the verbose information. These behaviors are undocumented, untested, and unlikely to be expected by users or other software attempting to parse this output. Helped-by: Jeff King <peff@peff.net> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> --- builtin/rev-list.c | 2 +- log-tree.c | 3 --- 2 files changed, 1 insertion(+), 4 deletions(-) diff --git a/builtin/rev-list.c b/builtin/rev-list.c index 48300d9..d320b6f 100644 --- a/builtin/rev-list.c +++ b/builtin/rev-list.c @@ -134,7 +134,7 @@ static void show_commit(struct commit *commit, void *data) else putchar('\n'); - if (revs->verbose_header && get_cached_commit_buffer(commit, NULL)) { + if (revs->verbose_header) { struct strbuf buf = STRBUF_INIT; struct pretty_print_context ctx = {0}; ctx.abbrev = revs->abbrev; diff --git a/log-tree.c b/log-tree.c index fc0cc0d..22b2fb6 100644 --- a/log-tree.c +++ b/log-tree.c @@ -659,9 +659,6 @@ void show_log(struct rev_info *opt) show_mergetag(opt, commit); } - if (!get_cached_commit_buffer(commit, NULL)) - return; - if (opt->show_notes) { int raw; struct strbuf notebuf = STRBUF_INIT; -- 2.7.4 ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH] commit: drop uses of get_cached_commit_buffer() 2018-02-21 19:17 ` [PATCH] commit: drop uses of get_cached_commit_buffer() Derrick Stolee @ 2018-02-21 19:19 ` Derrick Stolee 2018-02-21 23:34 ` Jeff King 2018-02-21 23:13 ` Jeff King 1 sibling, 1 reply; 17+ messages in thread From: Derrick Stolee @ 2018-02-21 19:19 UTC (permalink / raw) To: Derrick Stolee, git; +Cc: peff On 2/21/2018 2:17 PM, Derrick Stolee wrote: > The get_cached_commit_buffer() method provides access to the buffer > loaded for a struct commit, if it was ever loadead and was not freed. > > Two places use this to inform how to output information about commits. > > log-tree.c uses this method to short-circuit the output of commit > information when the buffer is not cached. However, this leads to > incorrect output in 'git log --oneline' where the short-OID is written > but then the rest of the commit information is dropped and the next > commit is written on the same line. > > rev-list uses this method for two reasons: > > - First, if the revision walk visits a commit twice, the buffer was > freed by rev-list in the first write. The output then does not > match the format expectations, since the OID is written without the > rest of the content. > > - Second, if the revision walk visits a commit that was marked > UNINTERESTING, the walk may never load a buffer and hence rev-list > will not output the verbose information. > > These behaviors are undocumented, untested, and unlikely to be > expected by users or other software attempting to parse this output. > > Helped-by: Jeff King <peff@peff.net> > Signed-off-by: Derrick Stolee <dstolee@microsoft.com> This would be a good time to allow multiple authors, or to just change the author, since this is exactly the diff you (Peff) provided in an earlier email. The commit message hopefully summarizes our discussion, but I welcome edits. > --- > builtin/rev-list.c | 2 +- > log-tree.c | 3 --- > 2 files changed, 1 insertion(+), 4 deletions(-) > > diff --git a/builtin/rev-list.c b/builtin/rev-list.c > index 48300d9..d320b6f 100644 > --- a/builtin/rev-list.c > +++ b/builtin/rev-list.c > @@ -134,7 +134,7 @@ static void show_commit(struct commit *commit, void *data) > else > putchar('\n'); > > - if (revs->verbose_header && get_cached_commit_buffer(commit, NULL)) { > + if (revs->verbose_header) { > struct strbuf buf = STRBUF_INIT; > struct pretty_print_context ctx = {0}; > ctx.abbrev = revs->abbrev; > diff --git a/log-tree.c b/log-tree.c > index fc0cc0d..22b2fb6 100644 > --- a/log-tree.c > +++ b/log-tree.c > @@ -659,9 +659,6 @@ void show_log(struct rev_info *opt) > show_mergetag(opt, commit); > } > > - if (!get_cached_commit_buffer(commit, NULL)) > - return; > - > if (opt->show_notes) { > int raw; > struct strbuf notebuf = STRBUF_INIT; ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] commit: drop uses of get_cached_commit_buffer() 2018-02-21 19:19 ` Derrick Stolee @ 2018-02-21 23:34 ` Jeff King 0 siblings, 0 replies; 17+ messages in thread From: Jeff King @ 2018-02-21 23:34 UTC (permalink / raw) To: Derrick Stolee; +Cc: Derrick Stolee, git On Wed, Feb 21, 2018 at 02:19:17PM -0500, Derrick Stolee wrote: > > These behaviors are undocumented, untested, and unlikely to be > > expected by users or other software attempting to parse this output. > > > > Helped-by: Jeff King <peff@peff.net> > > Signed-off-by: Derrick Stolee <dstolee@microsoft.com> > > This would be a good time to allow multiple authors, or to just change the > author, since this is exactly the diff you (Peff) provided in an earlier > email. The commit message hopefully summarizes our discussion, but I welcome > edits. The point is moot if we take the revision I just sent (though in retrospect I really ought to have put in a Reported-by: for you there). But some communities are settling on Co-authored-by as a trailer for this case. And GitHub has started parsing and showing that along with author information: https://github.com/blog/2496-commit-together-with-co-authors -Peff ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] commit: drop uses of get_cached_commit_buffer() 2018-02-21 19:17 ` [PATCH] commit: drop uses of get_cached_commit_buffer() Derrick Stolee 2018-02-21 19:19 ` Derrick Stolee @ 2018-02-21 23:13 ` Jeff King 2018-02-21 23:22 ` Stefan Beller 2018-02-22 1:52 ` Derrick Stolee 1 sibling, 2 replies; 17+ messages in thread From: Jeff King @ 2018-02-21 23:13 UTC (permalink / raw) To: Derrick Stolee; +Cc: git, Junio C Hamano On Wed, Feb 21, 2018 at 02:17:11PM -0500, Derrick Stolee wrote: > The get_cached_commit_buffer() method provides access to the buffer > loaded for a struct commit, if it was ever loadead and was not freed. > > Two places use this to inform how to output information about commits. > > log-tree.c uses this method to short-circuit the output of commit > information when the buffer is not cached. However, this leads to > incorrect output in 'git log --oneline' where the short-OID is written > but then the rest of the commit information is dropped and the next > commit is written on the same line. > > rev-list uses this method for two reasons: > > - First, if the revision walk visits a commit twice, the buffer was > freed by rev-list in the first write. The output then does not > match the format expectations, since the OID is written without the > rest of the content. I'm not sure after my earlier digging if there is even a way to trigger this (and if so, it is probably accidental, since those lines were added explicitly for --show-all). And actually after re-reading the commit message for 3131b7130 again, I think the current behavior is definitely not something that was carefully planned. So I'd propose a commit message like below. -- >8 -- Subject: [PATCH] commit: drop uses of get_cached_commit_buffer() The "--show-all" revision option shows UNINTERESTING commits. Some of these commits may be unparsed when we try to show them (since we may or may not need to walk their parents to fulfill the request). Commit 3131b71301 (Add "--show-all" revision walker flag for debugging, 2008-02-09) resolved this by just skipping pretty-printing for commits without their object contents cached, saying: Because we now end up listing commits we may not even have been parsed at all "show_log" and "show_commit" need to protect against commits that don't have a commit buffer entry. That was the easy fix to avoid the pretty-printer segfaulting, but: 1. It doesn't work for all formats. E.g., --oneline prints the oid for each such commit but not a trailing newline, leading to jumbled output. 2. It only affects some commits, depending on whether we happened to parse them or not (so if they were at the tip of an UNINTERESTING starting point, or if we happened to traverse over them, you'd see more data). 3. It unncessarily ties the decision to show the verbose header to whether the commit buffer was cached. That makes it harder to change the logic around caching (e.g., if we could traverse without actually loading the full commit objects). These days it's safe to feed such a commit to the pretty-print code. Since be5c9fb904 (logmsg_reencode: lazily load missing commit buffers, 2013-01-26), we'll load it on demand in such a case. So let's just always show the verbose headers. This does change the behavior of plumbing, but: a. The --show-all option was explicitly introduced as a debugging aid, and was never documented (and has rarely even been mentioned on the list by git devs). b. Avoiding the commits was already not deterministic due to (2) above. So the caller might have seen full headers for these commits anyway, and would need to be prepared for it. Signed-off-by: Jeff King <peff@peff.net> --- builtin/rev-list.c | 2 +- log-tree.c | 3 --- 2 files changed, 1 insertion(+), 4 deletions(-) diff --git a/builtin/rev-list.c b/builtin/rev-list.c index 48300d9e11..d320b6f1e3 100644 --- a/builtin/rev-list.c +++ b/builtin/rev-list.c @@ -134,7 +134,7 @@ static void show_commit(struct commit *commit, void *data) else putchar('\n'); - if (revs->verbose_header && get_cached_commit_buffer(commit, NULL)) { + if (revs->verbose_header) { struct strbuf buf = STRBUF_INIT; struct pretty_print_context ctx = {0}; ctx.abbrev = revs->abbrev; diff --git a/log-tree.c b/log-tree.c index fc0cc0d6d1..22b2fb6c58 100644 --- a/log-tree.c +++ b/log-tree.c @@ -659,9 +659,6 @@ void show_log(struct rev_info *opt) show_mergetag(opt, commit); } - if (!get_cached_commit_buffer(commit, NULL)) - return; - if (opt->show_notes) { int raw; struct strbuf notebuf = STRBUF_INIT; -- 2.16.2.555.g885a024879 ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH] commit: drop uses of get_cached_commit_buffer() 2018-02-21 23:13 ` Jeff King @ 2018-02-21 23:22 ` Stefan Beller 2018-02-21 23:29 ` Jeff King 2018-02-22 1:52 ` Derrick Stolee 1 sibling, 1 reply; 17+ messages in thread From: Stefan Beller @ 2018-02-21 23:22 UTC (permalink / raw) To: Jeff King; +Cc: Derrick Stolee, git, Junio C Hamano > Subject: [PATCH] commit: drop uses of get_cached_commit_buffer() > --- > builtin/rev-list.c | 2 +- > log-tree.c | 3 --- > 2 files changed, 1 insertion(+), 4 deletions(-) Now if we'd get around to rewrite pretty.c as well, we could make it static, giving a stronger reason of not using that function. But it looks a bit complicated to me, who is not familiar in that area of the code. Thanks for making less use of this suboptimal API, Stefan ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] commit: drop uses of get_cached_commit_buffer() 2018-02-21 23:22 ` Stefan Beller @ 2018-02-21 23:29 ` Jeff King 0 siblings, 0 replies; 17+ messages in thread From: Jeff King @ 2018-02-21 23:29 UTC (permalink / raw) To: Stefan Beller; +Cc: Derrick Stolee, git, Junio C Hamano On Wed, Feb 21, 2018 at 03:22:02PM -0800, Stefan Beller wrote: > > Subject: [PATCH] commit: drop uses of get_cached_commit_buffer() > > --- > > builtin/rev-list.c | 2 +- > > log-tree.c | 3 --- > > 2 files changed, 1 insertion(+), 4 deletions(-) > > Now if we'd get around to rewrite pretty.c as well, we could make it static, > giving a stronger reason of not using that function. But it looks a bit > complicated to me, who is not familiar in that area of the code. > > Thanks for making less use of this suboptimal API, I'm not sure the API is suboptimal. It's not wrong to ask "do you have a cached copy of this?". It was just being used poorly here. :) See the discussion in https://public-inbox.org/git/20180221184811.GD4333@sigill.intra.peff.net/ -Peff ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] commit: drop uses of get_cached_commit_buffer() 2018-02-21 23:13 ` Jeff King 2018-02-21 23:22 ` Stefan Beller @ 2018-02-22 1:52 ` Derrick Stolee 1 sibling, 0 replies; 17+ messages in thread From: Derrick Stolee @ 2018-02-22 1:52 UTC (permalink / raw) To: Jeff King, Derrick Stolee; +Cc: git, Junio C Hamano On 2/21/2018 6:13 PM, Jeff King wrote: > On Wed, Feb 21, 2018 at 02:17:11PM -0500, Derrick Stolee wrote: > >> The get_cached_commit_buffer() method provides access to the buffer >> loaded for a struct commit, if it was ever loadead and was not freed. >> >> Two places use this to inform how to output information about commits. >> >> log-tree.c uses this method to short-circuit the output of commit >> information when the buffer is not cached. However, this leads to >> incorrect output in 'git log --oneline' where the short-OID is written >> but then the rest of the commit information is dropped and the next >> commit is written on the same line. >> >> rev-list uses this method for two reasons: >> >> - First, if the revision walk visits a commit twice, the buffer was >> freed by rev-list in the first write. The output then does not >> match the format expectations, since the OID is written without the >> rest of the content. > I'm not sure after my earlier digging if there is even a way to trigger > this (and if so, it is probably accidental, since those lines were added > explicitly for --show-all). > > And actually after re-reading the commit message for 3131b7130 again, I > think the current behavior is definitely not something that was > carefully planned. So I'd propose a commit message like below. I only submitted my patch to avoid making you do the work of writing the commit message. My messages still don't have quite the right amount of detail (or the correct details, in this case). Junio: please add Reported-by: Derrick Stolee <dstolee@microsoft.com> Thanks, -Stolee > > -- >8 -- > Subject: [PATCH] commit: drop uses of get_cached_commit_buffer() > > The "--show-all" revision option shows UNINTERESTING > commits. Some of these commits may be unparsed when we try > to show them (since we may or may not need to walk their > parents to fulfill the request). > > Commit 3131b71301 (Add "--show-all" revision walker flag for > debugging, 2008-02-09) resolved this by just skipping > pretty-printing for commits without their object contents > cached, saying: > > Because we now end up listing commits we may not even have been parsed > at all "show_log" and "show_commit" need to protect against commits > that don't have a commit buffer entry. > > That was the easy fix to avoid the pretty-printer segfaulting, > but: > > 1. It doesn't work for all formats. E.g., --oneline > prints the oid for each such commit but not a trailing > newline, leading to jumbled output. > > 2. It only affects some commits, depending on whether we > happened to parse them or not (so if they were at the > tip of an UNINTERESTING starting point, or if we > happened to traverse over them, you'd see more data). > > 3. It unncessarily ties the decision to show the verbose > header to whether the commit buffer was cached. That > makes it harder to change the logic around caching > (e.g., if we could traverse without actually loading > the full commit objects). > > These days it's safe to feed such a commit to the > pretty-print code. Since be5c9fb904 (logmsg_reencode: lazily > load missing commit buffers, 2013-01-26), we'll load it on > demand in such a case. So let's just always show the verbose > headers. > > This does change the behavior of plumbing, but: > > a. The --show-all option was explicitly introduced as a > debugging aid, and was never documented (and has rarely > even been mentioned on the list by git devs). > > b. Avoiding the commits was already not deterministic due > to (2) above. So the caller might have seen full > headers for these commits anyway, and would need to be > prepared for it. > > Signed-off-by: Jeff King <peff@peff.net> > --- > builtin/rev-list.c | 2 +- > log-tree.c | 3 --- > 2 files changed, 1 insertion(+), 4 deletions(-) > > diff --git a/builtin/rev-list.c b/builtin/rev-list.c > index 48300d9e11..d320b6f1e3 100644 > --- a/builtin/rev-list.c > +++ b/builtin/rev-list.c > @@ -134,7 +134,7 @@ static void show_commit(struct commit *commit, void *data) > else > putchar('\n'); > > - if (revs->verbose_header && get_cached_commit_buffer(commit, NULL)) { > + if (revs->verbose_header) { > struct strbuf buf = STRBUF_INIT; > struct pretty_print_context ctx = {0}; > ctx.abbrev = revs->abbrev; > diff --git a/log-tree.c b/log-tree.c > index fc0cc0d6d1..22b2fb6c58 100644 > --- a/log-tree.c > +++ b/log-tree.c > @@ -659,9 +659,6 @@ void show_log(struct rev_info *opt) > show_mergetag(opt, commit); > } > > - if (!get_cached_commit_buffer(commit, NULL)) > - return; > - > if (opt->show_notes) { > int raw; > struct strbuf notebuf = STRBUF_INIT; ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Question about get_cached_commit_buffer() 2018-02-21 18:48 ` Jeff King 2018-02-21 18:52 ` Jeff King @ 2018-02-21 22:22 ` Junio C Hamano 2018-02-21 22:50 ` Jeff King 1 sibling, 1 reply; 17+ messages in thread From: Junio C Hamano @ 2018-02-21 22:22 UTC (permalink / raw) To: Jeff King; +Cc: Derrick Stolee, git@vger.kernel.org, Derrick Stolee Jeff King <peff@peff.net> writes: > I think that repeating the oid is intentional; the point is to dump how > the traversal code is hitting the endpoints, even if we do so multiple > times. > > The --oneline behavior just looks like a bug. I think --format is broken > with --show-all, too (it does not show anything!). I do not know about the --format thing, but the part about --oneline being a bug is correct. I've known about the oneline that does not show anything other than the oid (not even end-of-line) for unparsed commits for a long time---I just didn't bother looking into fixing it exactly because this is only a debugging aid ;-) > Though I think it would be equally correct to have set_commit_buffer() > just throw away the existing cache entry and replace it with this one. I > don't think there's a real reason to prefer the old to the new. And that > might be worth doing if it would let us drop get_cached_commit_buffer() > as a public function. But... > ... > In my opinion it's not really worth trying to make it private. The > confusion you're fixing in the first two calls is not due to a bad API, > but due to some subtly confusing logic in that code's use of the API. ;) Yup. > So I'd probably do this: > ... Makes sense to me. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Question about get_cached_commit_buffer() 2018-02-21 22:22 ` Question about get_cached_commit_buffer() Junio C Hamano @ 2018-02-21 22:50 ` Jeff King 2018-02-21 23:03 ` Junio C Hamano 0 siblings, 1 reply; 17+ messages in thread From: Jeff King @ 2018-02-21 22:50 UTC (permalink / raw) To: Junio C Hamano; +Cc: Derrick Stolee, git@vger.kernel.org, Derrick Stolee On Wed, Feb 21, 2018 at 02:22:05PM -0800, Junio C Hamano wrote: > > I think that repeating the oid is intentional; the point is to dump how > > the traversal code is hitting the endpoints, even if we do so multiple > > times. > > > > The --oneline behavior just looks like a bug. I think --format is broken > > with --show-all, too (it does not show anything!). > > I do not know about the --format thing,[...] Hmm, maybe it is fine. I thought before that I got funny output out of "git log --show-all --format", but I can't seem to reproduce it now. > being a bug is correct. I've known about the oneline that does not > show anything other than the oid (not even end-of-line) for unparsed > commits for a long time---I just didn't bother looking into fixing > it exactly because this is only a debugging aid ;-) Out of curiosity, do you actually use --show-all for anything? I didn't even know it existed until this thread, and AFAICT nobody but Linus has ever recommended its use. And it does not even seem that useful as a general debugging aid. So what I'm wondering is whether we should consider just ripping it out (but I'm OK with keeping it, as once the commit-buffer stuff is fixed, it's probably not hurting anybody). -Peff ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Question about get_cached_commit_buffer() 2018-02-21 22:50 ` Jeff King @ 2018-02-21 23:03 ` Junio C Hamano 2018-02-21 23:27 ` [PATCH] revision: drop --show-all option Jeff King 0 siblings, 1 reply; 17+ messages in thread From: Junio C Hamano @ 2018-02-21 23:03 UTC (permalink / raw) To: Jeff King; +Cc: Derrick Stolee, git@vger.kernel.org, Derrick Stolee Jeff King <peff@peff.net> writes: > Out of curiosity, do you actually use --show-all for anything? Absolutely not. I'd actually love it if I could say "not anymore" instead, but I haven't had an opportunity to debug the revision traversal code for quite some time so I do not even remember when was the last time I used it, which disqualifies me from saying even that. > So what I'm wondering is whether we should consider just ripping it out > (but I'm OK with keeping it, as once the commit-buffer stuff is fixed, > it's probably not hurting anybody). I see no problem in removing it. With more "interesting" features relying on post-processing (like 'simplify-merges'), show_all whose primary focus was how limit_list() behaves soft of outlived its usefulness, I would think. ^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH] revision: drop --show-all option 2018-02-21 23:03 ` Junio C Hamano @ 2018-02-21 23:27 ` Jeff King 2018-02-21 23:45 ` Linus Torvalds 0 siblings, 1 reply; 17+ messages in thread From: Jeff King @ 2018-02-21 23:27 UTC (permalink / raw) To: Junio C Hamano Cc: Linus Torvalds, Derrick Stolee, git@vger.kernel.org, Derrick Stolee On Wed, Feb 21, 2018 at 03:03:18PM -0800, Junio C Hamano wrote: > > So what I'm wondering is whether we should consider just ripping it out > > (but I'm OK with keeping it, as once the commit-buffer stuff is fixed, > > it's probably not hurting anybody). > > I see no problem in removing it. With more "interesting" features > relying on post-processing (like 'simplify-merges'), show_all whose > primary focus was how limit_list() behaves soft of outlived its > usefulness, I would think. So here's a patch to do that (textually independent but conceptually on top of the earlier one to fix the commit-buffer thing). It actually doesn't remove that much code, so I could go either way on it. +cc Linus in case he's secretly in love with this feature. -- >8 -- Subject: [PATCH] revision: drop --show-all option This was an undocumented debugging aid that does not seem to have come in handy in the past decade, judging from its lack of mentions on the mailing list. Let's drop it in the name of simplicity. This is morally a revert of 3131b71301 (Add "--show-all" revision walker flag for debugging, 2008-02-09), but note that I did leave in the mapping of UNINTERESTING to "^" in get_revision_mark(). I don't think this would be possible to trigger with the current code, but it's the only sensible marker. We'll skip the usual deprecation period because this was explicitly a debugging aid that was never documented. Signed-off-by: Jeff King <peff@peff.net> --- revision.c | 10 --------- revision.h | 1 - t/t6015-rev-list-show-all-parents.sh | 31 ---------------------------- 3 files changed, 42 deletions(-) delete mode 100755 t/t6015-rev-list-show-all-parents.sh diff --git a/revision.c b/revision.c index 5ce9b93baa..5c1cb7277c 100644 --- a/revision.c +++ b/revision.c @@ -1065,14 +1065,9 @@ static int limit_list(struct rev_info *revs) return -1; if (obj->flags & UNINTERESTING) { mark_parents_uninteresting(commit); - if (revs->show_all) - p = &commit_list_insert(commit, p)->next; slop = still_interesting(list, date, slop, &interesting_cache); if (slop) continue; - /* If showing all, add the whole pending list to the end */ - if (revs->show_all) - *p = list; break; } if (revs->min_age != -1 && (commit->date > revs->min_age)) @@ -1864,8 +1859,6 @@ static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg revs->dense = 1; } else if (!strcmp(arg, "--sparse")) { revs->dense = 0; - } else if (!strcmp(arg, "--show-all")) { - revs->show_all = 1; } else if (!strcmp(arg, "--in-commit-order")) { revs->tree_blobs_in_commit_order = 1; } else if (!strcmp(arg, "--remove-empty")) { @@ -3094,8 +3087,6 @@ enum commit_action get_commit_action(struct rev_info *revs, struct commit *commi return commit_ignore; if (revs->unpacked && has_sha1_pack(commit->object.oid.hash)) return commit_ignore; - if (revs->show_all) - return commit_show; if (commit->object.flags & UNINTERESTING) return commit_ignore; if (revs->min_age != -1 && @@ -3194,7 +3185,6 @@ enum commit_action simplify_commit(struct rev_info *revs, struct commit *commit) enum commit_action action = get_commit_action(revs, commit); if (action == commit_show && - !revs->show_all && revs->prune && revs->dense && want_ancestry(revs)) { /* * --full-diff on simplified parents is no good: it diff --git a/revision.h b/revision.h index 3dee97bfb9..b8c47b98e2 100644 --- a/revision.h +++ b/revision.h @@ -90,7 +90,6 @@ struct rev_info { unsigned int dense:1, prune:1, no_walk:2, - show_all:1, remove_empty_trees:1, simplify_history:1, topo_order:1, diff --git a/t/t6015-rev-list-show-all-parents.sh b/t/t6015-rev-list-show-all-parents.sh deleted file mode 100755 index 3c73c93ba6..0000000000 --- a/t/t6015-rev-list-show-all-parents.sh +++ /dev/null @@ -1,31 +0,0 @@ -#!/bin/sh - -test_description='--show-all --parents does not rewrite TREESAME commits' - -. ./test-lib.sh - -test_expect_success 'set up --show-all --parents test' ' - test_commit one foo.txt && - commit1=$(git rev-list -1 HEAD) && - test_commit two bar.txt && - commit2=$(git rev-list -1 HEAD) && - test_commit three foo.txt && - commit3=$(git rev-list -1 HEAD) - ' - -test_expect_success '--parents rewrites TREESAME parents correctly' ' - echo $commit3 $commit1 > expected && - echo $commit1 >> expected && - git rev-list --parents HEAD -- foo.txt > actual && - test_cmp expected actual - ' - -test_expect_success '--parents --show-all does not rewrites TREESAME parents' ' - echo $commit3 $commit2 > expected && - echo $commit2 $commit1 >> expected && - echo $commit1 >> expected && - git rev-list --parents --show-all HEAD -- foo.txt > actual && - test_cmp expected actual - ' - -test_done -- 2.16.2.555.g885a024879 ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH] revision: drop --show-all option 2018-02-21 23:27 ` [PATCH] revision: drop --show-all option Jeff King @ 2018-02-21 23:45 ` Linus Torvalds 0 siblings, 0 replies; 17+ messages in thread From: Linus Torvalds @ 2018-02-21 23:45 UTC (permalink / raw) To: Jeff King Cc: Junio C Hamano, Derrick Stolee, git@vger.kernel.org, Derrick Stolee On Wed, Feb 21, 2018 at 3:27 PM, Jeff King <peff@peff.net> wrote: > > We'll skip the usual deprecation period because this was > explicitly a debugging aid that was never documented. Ack. I don't think I've used it since, and probably nobody else ever used it. Linus ^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2018-02-22 1:52 UTC | newest] Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2018-02-20 22:12 Question about get_cached_commit_buffer() Derrick Stolee 2018-02-20 22:57 ` Jeff King 2018-02-21 14:13 ` Derrick Stolee 2018-02-21 18:48 ` Jeff King 2018-02-21 18:52 ` Jeff King 2018-02-21 19:17 ` [PATCH] commit: drop uses of get_cached_commit_buffer() Derrick Stolee 2018-02-21 19:19 ` Derrick Stolee 2018-02-21 23:34 ` Jeff King 2018-02-21 23:13 ` Jeff King 2018-02-21 23:22 ` Stefan Beller 2018-02-21 23:29 ` Jeff King 2018-02-22 1:52 ` Derrick Stolee 2018-02-21 22:22 ` Question about get_cached_commit_buffer() Junio C Hamano 2018-02-21 22:50 ` Jeff King 2018-02-21 23:03 ` Junio C Hamano 2018-02-21 23:27 ` [PATCH] revision: drop --show-all option Jeff King 2018-02-21 23:45 ` Linus Torvalds
Code repositories for project(s) associated with this public inbox https://80x24.org/mirrors/git.git This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).