* [PATCH] name-rev: stop including taggerdate in naming of commits
From: Elijah Newren via GitGitGadget @ 2023-01-21 4:28 UTC
  To: git; +Cc: Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Commit 7550424804 ("name-rev: include taggerdate in considering the
best name", 2016-04-22) introduced the idea of using taggerdate in the
criteria for selecting the best name. At the time, a certain commit in
linux.git -- namely, aed06b9cfcab -- was being named by name-rev as
    v4.6-rc1~9^2~792
which, while correct, felt very suboptimal. Some investigation found
that tweaking the MERGE_TRAVERSAL_WEIGHT to lower it could give
alternate answers such as
    v3.13-rc7~9^2~14^2~42
or
    v3.13~5^2~4^2~2^2~1^2~42
A manual solution involving looking at tagger dates came up with
    v3.13-rc1~65^2^2~42
which was then implemented in name-rev.

It turns out that this taggerdate heuristic isn't needed due to a
subsequent change to fix the naming logic in 3656f84278 ("name-rev:
prefer shorter names over following merges", 2021-12-04). Simply
removing the taggerdate heuristic from the calculation nowadays
still causes us to get the optimal answer on that particular commit
of interest in linux.git, namely:
    v3.13-rc1~65^2^2~42

Further, the taggerdate heuristic is causing bugs of its own. I was
pointed to a case in a private repository where name-rev reports a name
of the form
    v2022.10.02~86
when users expected to see one of the form
    v2022.10.01~2
(I've modified the names and numbers a bit from the real testcase.) As
you can probably guess, v2022.10.01 was created after v2022.10.02 (by a
few hours), even though it pointed to an older commit. While the
condition is unusual even in the repository in question, it is not the
only problematic set of tags in that repository.
The taggerdate logic
was a workaround that is no longer needed, and is now causing suboptimal
results in other cases.

As such, remove the taggerdate in the comparison. However, note that
"taggerdate" is actually also used to store commit dates since
ef1e74065c ("name-rev: favor describing with tags and use committer date
to tiebreak", 2017-03-29), where it is used as a fallback tiebreaker
when distances are equal. We do not want to remove that fallback
tiebreaker, we are only removing the use of actual taggerdates as a
primary criteria overriding effective distance calculations.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
    name-rev: stop including taggerdate in naming of commits

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1468%2Fnewren%2Ffix-name-rev-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1468/newren/fix-name-rev-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1468

 builtin/name-rev.c  | 4 +---
 t/t6120-describe.sh | 6 ++++++
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/builtin/name-rev.c b/builtin/name-rev.c
index 15535e914a6..df50abcdeb9 100644
--- a/builtin/name-rev.c
+++ b/builtin/name-rev.c
@@ -113,9 +113,7 @@ static int is_better_name(struct rev_name *name,
 	 * based on the older tag, even if it is farther away.
 	 */
 	if (from_tag && name->from_tag)
-		return (name->taggerdate > taggerdate ||
-			(name->taggerdate == taggerdate &&
-			 name_distance > new_distance));
+		return name_distance > new_distance;
 
 	/*
 	 * We know that at least one of them is a non-tag at this point.
diff --git a/t/t6120-describe.sh b/t/t6120-describe.sh
index 9a35e783a75..c9afcef2018 100755
--- a/t/t6120-describe.sh
+++ b/t/t6120-describe.sh
@@ -657,4 +657,10 @@ test_expect_success 'setup: describe commits with disjoint bases 2' '
 
 check_describe -C disjoint2 "B-3-gHASH" HEAD
 
+test_expect_success 'setup misleading taggerdates' '
+	GIT_COMMITTER_DATE="2006-12-12 12:31" git tag -a -m "another tag" newer-tag-older-commit unique-file~1
+'
+
+check_describe newer-tag-older-commit~1 --contains unique-file~2
+
 test_done

base-commit: 221222b278e713054e65cbbbcb2b1ac85483ea89
-- 
gitgitgadget
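The heart of the patch is a one-line change to how two tag-based candidate names are compared. The sketch below is written in C, like the patch. It replays the v2022.10.02~86 vs v2022.10.01~2 case from the commit message under both the old rule and the new rule. Everything in it is hypothetical and stands in for name-rev's real code: `struct tag_name_sketch`, `old_rule()`, `new_rule()` and `pick_name()` are illustrative only. The `effective_distance()` formula and the value of `MERGE_TRAVERSAL_WEIGHT` are assumptions, and the timestamps are invented. Both names end in `~N`, so any per-name merge weight cancels out in this particular comparison.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value; the real constant lives in builtin/name-rev.c. */
#define MERGE_TRAVERSAL_WEIGHT 65535

/* Hypothetical, simplified view of one candidate name that starts at a tag. */
struct tag_name_sketch {
	const char *label;   /* e.g. "v2022.10.02~86" */
	uint64_t taggerdate; /* when the tag was created (invented seconds) */
	int generation;      /* first-parent hops after the last merge hop */
	int distance;        /* accumulated traversal cost */
};

/* Assumed shape of the helper: a trailing "~N" costs one extra weight. */
static int effective_distance(int distance, int generation)
{
	return distance + (generation > 0 ? MERGE_TRAVERSAL_WEIGHT : 0);
}

/*
 * Pre-patch rule for two tag-based names: the older tag wins outright and
 * distance only matters when the tagger dates tie. Nonzero means "replace
 * the current name with the candidate".
 */
static int old_rule(const struct tag_name_sketch *cur,
		    const struct tag_name_sketch *cand)
{
	int name_distance = effective_distance(cur->distance, cur->generation);
	int new_distance = effective_distance(cand->distance, cand->generation);

	return cur->taggerdate > cand->taggerdate ||
	       (cur->taggerdate == cand->taggerdate &&
		name_distance > new_distance);
}

/* Post-patch rule: the nearer name wins and tagger dates are ignored. */
static int new_rule(const struct tag_name_sketch *cur,
		    const struct tag_name_sketch *cand)
{
	int name_distance = effective_distance(cur->distance, cur->generation);
	int new_distance = effective_distance(cand->distance, cand->generation);

	return name_distance > new_distance;
}

/* Return whichever name survives one comparison under the given rule. */
static const struct tag_name_sketch *
pick_name(const struct tag_name_sketch *cur,
	  const struct tag_name_sketch *cand,
	  int (*rule)(const struct tag_name_sketch *,
		      const struct tag_name_sketch *))
{
	assert(cur && cand && rule);
	return rule(cur, cand) ? cand : cur;
}
```

Give v2022.10.01 a tagger date a few hours later than v2022.10.02. The old rule then keeps v2022.10.02~86 regardless of which name is seen first. The new rule settles on v2022.10.01~2 in both orders. This mirrors the newer-tag-older-commit case that the patch adds to t6120.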
* [PATCH v2] name-rev: fix names by dropping taggerdate workaround
From: Elijah Newren via GitGitGadget @ 2023-02-07 6:32 UTC
  To: git; +Cc: Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Commit 7550424804 ("name-rev: include taggerdate in considering the
best name", 2016-04-22) introduced the idea of using taggerdate in the
criteria for selecting the best name. At the time, a certain commit in
linux.git -- namely, aed06b9cfcab -- was being named by name-rev as
    v4.6-rc1~9^2~792
which, while correct, was very suboptimal. Some investigation found
that tweaking the MERGE_TRAVERSAL_WEIGHT to lower it could give
alternate answers such as
    v3.13-rc7~9^2~14^2~42
or
    v3.13~5^2~4^2~2^2~1^2~42
A manual solution involving looking at tagger dates came up with
    v3.13-rc1~65^2^2~42
which is much nicer. That workaround was then implemented in name-rev.

Unfortunately, the taggerdate heuristic is causing bugs. I was pointed
to a case in a private repository where name-rev reports a name of the
form
    v2022.10.02~86
when users expected to see one of the form
    v2022.10.01~2
(I've modified the names and numbers a bit from the real testcase.) As
you can probably guess, v2022.10.01 was created after v2022.10.02 (by a
few hours), even though it pointed to an older commit. While the
condition is unusual even in the repository in question, it is not the
only problematic set of tags in that repository. The taggerdate logic
is causing problems.

Further, it turns out that this taggerdate heuristic isn't even helping
anymore.
Due to the fix to naming logic in 3656f84278 ("name-rev:
prefer shorter names over following merges", 2021-12-04), we get
improved names without the taggerdate heuristic. For the original
commit of interest in linux.git, a modern git without the taggerdate
heuristic still provides the same optimal answer of interest, namely:
    v3.13-rc1~65^2^2~42

So, the taggerdate is no longer providing benefit, and it is causing
problems. Simply get rid of it.

However, note that "taggerdate" as a variable is used to store things
besides a taggerdate these days. Ever since commit ef1e74065c
("name-rev: favor describing with tags and use committer date to
tiebreak", 2017-03-29), this has been used to store committer dates and
there it is used as a fallback tiebreaker (as opposed to a primary
criteria overriding effective distance calculations). We do not want to
remove that fallback tiebreaker, so not all instances of "taggerdate"
are removed in this change.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
    name-rev: fix names by dropping taggerdate workaround

    Changes since v1:

    Slight tweaks to the commit message

    v1 was never picked up or commented on, so this is mostly just a
    resubmission, with a rewording to make it clear this is a bugfix.

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1468%2Fnewren%2Ffix-name-rev-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1468/newren/fix-name-rev-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/1468

Range-diff vs v1:

 1:  78bbfb3286b ! 1:  206726fc954 name-rev: stop including taggerdate in naming of commits
    @@ Metadata
      Author: Elijah Newren <newren@gmail.com>

      ## Commit message ##
    -    name-rev: stop including taggerdate in naming of commits
    +    name-rev: fix names by dropping taggerdate workaround

         Commit 7550424804 ("name-rev: include taggerdate in considering the
         best name", 2016-04-22) introduced the idea of using taggerdate in the
         criteria for selecting the best name.
         At the time, a certain commit in linux.git -- namely, aed06b9cfcab --
         was being named by name-rev as
             v4.6-rc1~9^2~792
    -    which, while correct, felt very suboptimal. Some investigation found
    +    which, while correct, was very suboptimal. Some investigation found
         that tweaking the MERGE_TRAVERSAL_WEIGHT to lower it could give
         alternate answers such as
             v3.13-rc7~9^2~14^2~42
    @@ Commit message
             v3.13~5^2~4^2~2^2~1^2~42
         A manual solution involving looking at tagger dates came up with
             v3.13-rc1~65^2^2~42
    -    which was then implemented in name-rev.
    +    which is much nicer. That workaround was then implemented in name-rev.

    -    It turns out that this taggerdate heuristic isn't needed due to a
    -    subsequent change to fix the naming logic in 3656f84278 ("name-rev:
    -    prefer shorter names over following merges", 2021-12-04). Simply
    -    removing the taggerdate heuristic from the calculation nowadays
    -    still causes us to get the optimal answer on that particular commit
    -    of interest in linux.git, namely:
    -        v3.13-rc1~65^2^2~42
    -
    -    Further, the taggerdate heuristic is causing bugs of its own. I was
    -    pointed to a case in a private repository where name-rev reports a name
    -    of the form
    +    Unfortunately, the taggerdate heuristic is causing bugs. I was pointed
    +    to a case in a private repository where name-rev reports a name of the
    +    form
             v2022.10.02~86
         when users expected to see one of the form
             v2022.10.01~2
    @@ Commit message
         few hours), even though it pointed to an older commit. While the
         condition is unusual even in the repository in question, it is not the
         only problematic set of tags in that repository. The taggerdate logic
    -    was a workaround that is no longer needed, and is now causing suboptimal
    -    results in other cases.
    +    is causing problems.
    +
    +    Further, it turns out that this taggerdate heuristic isn't even helping
    +    anymore.
    +    Due to the fix to naming logic in 3656f84278 ("name-rev:
    +    prefer shorter names over following merges", 2021-12-04), we get
    +    improved names without the taggerdate heuristic. For the original
    +    commit of interest in linux.git, a modern git without the taggerdate
    +    heuristic still provides the same optimal answer of interest, namely:
    +        v3.13-rc1~65^2^2~42
    +
    +    So, the taggerdate is no longer providing benefit, and it is causing
    +    problems. Simply get rid of it.

    -    As such, remove the taggerdate in the comparison. However, note that
    -    "taggerdate" is actually also used to store commit dates since
    -    ef1e74065c ("name-rev: favor describing with tags and use committer date
    -    to tiebreak", 2017-03-29), where it is used as a fallback tiebreaker
    -    when distances are equal. We do not want to remove that fallback
    -    tiebreaker, we are only removing the use of actual taggerdates as a
    -    primary criteria overridding effective distance calculations.
    +    However, note that "taggerdate" as a variable is used to store things
    +    besides a taggerdate these days. Ever since commit ef1e74065c
    +    ("name-rev: favor describing with tags and use committer date to
    +    tiebreak", 2017-03-29), this has been used to store committer dates and
    +    there it is used as a fallback tiebreaker (as opposed to a primary
    +    criteria overriding effective distance calculations). We do not want to
    +    remove that fallback tiebreaker, so not all instances of "taggerdate"
    +    are removed in this change.

         Signed-off-by: Elijah Newren <newren@gmail.com>


 builtin/name-rev.c  | 4 +---
 t/t6120-describe.sh | 6 ++++++
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/builtin/name-rev.c b/builtin/name-rev.c
index 15535e914a6..df50abcdeb9 100644
--- a/builtin/name-rev.c
+++ b/builtin/name-rev.c
@@ -113,9 +113,7 @@ static int is_better_name(struct rev_name *name,
 	 * based on the older tag, even if it is farther away.
 	 */
 	if (from_tag && name->from_tag)
-		return (name->taggerdate > taggerdate ||
-			(name->taggerdate == taggerdate &&
-			 name_distance > new_distance));
+		return name_distance > new_distance;
 
 	/*
 	 * We know that at least one of them is a non-tag at this point.
diff --git a/t/t6120-describe.sh b/t/t6120-describe.sh
index 9a35e783a75..c9afcef2018 100755
--- a/t/t6120-describe.sh
+++ b/t/t6120-describe.sh
@@ -657,4 +657,10 @@ test_expect_success 'setup: describe commits with disjoint bases 2' '
 
 check_describe -C disjoint2 "B-3-gHASH" HEAD
 
+test_expect_success 'setup misleading taggerdates' '
+	GIT_COMMITTER_DATE="2006-12-12 12:31" git tag -a -m "another tag" newer-tag-older-commit unique-file~1
+'
+
+check_describe newer-tag-older-commit~1 --contains unique-file~2
+
 test_done

base-commit: 221222b278e713054e65cbbbcb2b1ac85483ea89
-- 
gitgitgadget
* Re: [PATCH v2] name-rev: fix names by dropping taggerdate workaround
From: Calvin Wan @ 2023-02-07 19:34 UTC
  To: Elijah Newren via GitGitGadget; +Cc: Calvin Wan, git, Elijah Newren

Are there any cases where a taggerdate heuristic would be useful now?
I'm having a hard time coming up with an example of such, so this
change looks very reasonable to me. Even if there existed such a case,
I would imagine it would be better solved using other heuristics rather
than checking the taggerdate since that was a very loose heuristic to
begin with.

> diff --git a/builtin/name-rev.c b/builtin/name-rev.c
> index 15535e914a6..df50abcdeb9 100644
> --- a/builtin/name-rev.c
> +++ b/builtin/name-rev.c
> @@ -113,9 +113,7 @@ static int is_better_name(struct rev_name *name,
>  	 * based on the older tag, even if it is farther away.
>  	 */
>  	if (from_tag && name->from_tag)
> -		return (name->taggerdate > taggerdate ||
> -			(name->taggerdate == taggerdate &&
> -			 name_distance > new_distance));
> +		return name_distance > new_distance;

Comment above this block should be updated to match the new logic.

-Calvin
* Re: [PATCH v2] name-rev: fix names by dropping taggerdate workaround
From: Elijah Newren @ 2023-02-08 3:33 UTC
  To: Calvin Wan; +Cc: Elijah Newren via GitGitGadget, git

On Tue, Feb 7, 2023 at 11:34 AM Calvin Wan <calvinwan@google.com> wrote:
>
> Are there any cases where a taggerdate heuristic would be useful now?
> I'm having a hard time coming up with an example of such, so this
> change looks very reasonable to me. Even if there existed such a case,
> I would imagine it would be better solved using other heuristics rather
> than checking the taggerdate since that was a very loose heuristic to
> begin with.

I'm currently only aware of cases where the heuristic hurts and none
where it helps. I know it historically helped, but that was just a
workaround to the algorithm being suboptimal. Since the algorithm has
been fixed, I think the workaround can be shelved.

> > diff --git a/builtin/name-rev.c b/builtin/name-rev.c
> > index 15535e914a6..df50abcdeb9 100644
> > --- a/builtin/name-rev.c
> > +++ b/builtin/name-rev.c
> > @@ -113,9 +113,7 @@ static int is_better_name(struct rev_name *name,
> >  	 * based on the older tag, even if it is farther away.
> >  	 */
> >  	if (from_tag && name->from_tag)
> > -		return (name->taggerdate > taggerdate ||
> > -			(name->taggerdate == taggerdate &&
> > -			 name_distance > new_distance));
> > +		return name_distance > new_distance;
>
> Comment above this block should be updated to match the new logic.

Good catch; will fix.
* [PATCH v3] name-rev: fix names by dropping taggerdate workaround
From: Elijah Newren via GitGitGadget @ 2023-02-09 9:11 UTC
  To: git; +Cc: Calvin Wan, Elijah Newren, Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Commit 7550424804 ("name-rev: include taggerdate in considering the
best name", 2016-04-22) introduced the idea of using taggerdate in the
criteria for selecting the best name. At the time, a certain commit in
linux.git -- namely, aed06b9cfcab -- was being named by name-rev as
    v4.6-rc1~9^2~792
which, while correct, was very suboptimal. Some investigation found
that tweaking the MERGE_TRAVERSAL_WEIGHT to lower it could give
alternate answers such as
    v3.13-rc7~9^2~14^2~42
or
    v3.13~5^2~4^2~2^2~1^2~42
A manual solution involving looking at tagger dates came up with
    v3.13-rc1~65^2^2~42
which is much nicer. That workaround was then implemented in name-rev.

Unfortunately, the taggerdate heuristic is causing bugs. I was pointed
to a case in a private repository where name-rev reports a name of the
form
    v2022.10.02~86
when users expected to see one of the form
    v2022.10.01~2
(I've modified the names and numbers a bit from the real testcase.) As
you can probably guess, v2022.10.01 was created after v2022.10.02 (by a
few hours), even though it pointed to an older commit. While the
condition is unusual even in the repository in question, it is not the
only problematic set of tags in that repository. The taggerdate logic
is causing problems.

Further, it turns out that this taggerdate heuristic isn't even helping
anymore.
Due to the fix to naming logic in 3656f84278 ("name-rev:
prefer shorter names over following merges", 2021-12-04), we get
improved names without the taggerdate heuristic. For the original
commit of interest in linux.git, a modern git without the taggerdate
heuristic still provides the same optimal answer of interest, namely:
    v3.13-rc1~65^2^2~42

So, the taggerdate is no longer providing benefit, and it is causing
problems. Simply get rid of it.

However, note that "taggerdate" as a variable is used to store things
besides a taggerdate these days. Ever since commit ef1e74065c
("name-rev: favor describing with tags and use committer date to
tiebreak", 2017-03-29), this has been used to store committer dates and
there it is used as a fallback tiebreaker (as opposed to a primary
criteria overriding effective distance calculations). We do not want to
remove that fallback tiebreaker, so not all instances of "taggerdate"
are removed in this change.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
    name-rev: fix names by dropping taggerdate workaround

    Changes since v2:

    Fixed nearby comments based on code changes

    Changes since v1:

    Slight tweaks to the commit message

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1468%2Fnewren%2Ffix-name-rev-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1468/newren/fix-name-rev-v3
Pull-Request: https://github.com/gitgitgadget/git/pull/1468

Range-diff vs v2:

 1:  206726fc954 ! 1:  ff57b1583b1 name-rev: fix names by dropping taggerdate workaround
    @@ Commit message

      ## builtin/name-rev.c ##
     @@ builtin/name-rev.c: static int is_better_name(struct rev_name *name,
    - 	 * based on the older tag, even if it is farther away.
    - 	 */
    + 	int name_distance = effective_distance(name->distance, name->generation);
    + 	int new_distance = effective_distance(distance, generation);
    + 
    +-	/*
    +-	 * When comparing names based on tags, prefer names
    +-	 * based on the older tag, even if it is farther away.
    +-	 */
    ++	/* If both are tags, we prefer the nearer one. */
      	if (from_tag && name->from_tag)
     -		return (name->taggerdate > taggerdate ||
     -			(name->taggerdate == taggerdate &&
     -			 name_distance > new_distance));
     +		return name_distance > new_distance;
     
    - 	/*
    - 	 * We know that at least one of them is a non-tag at this point.
    +-	/*
    +-	 * We know that at least one of them is a non-tag at this point.
    +-	 * favor a tag over a non-tag.
    +-	 */
    ++	/* Favor a tag over a non-tag. */
    + 	if (name->from_tag != from_tag)
    + 		return from_tag;
    + 

      ## t/t6120-describe.sh ##
     @@ t/t6120-describe.sh: test_expect_success 'setup: describe commits with disjoint bases 2' '


 builtin/name-rev.c  | 14 +++-----------
 t/t6120-describe.sh |  6 ++++++
 2 files changed, 9 insertions(+), 11 deletions(-)

diff --git a/builtin/name-rev.c b/builtin/name-rev.c
index 15535e914a6..0ebf06fad5a 100644
--- a/builtin/name-rev.c
+++ b/builtin/name-rev.c
@@ -108,19 +108,11 @@ static int is_better_name(struct rev_name *name,
 	int name_distance = effective_distance(name->distance, name->generation);
 	int new_distance = effective_distance(distance, generation);
 
-	/*
-	 * When comparing names based on tags, prefer names
-	 * based on the older tag, even if it is farther away.
-	 */
+	/* If both are tags, we prefer the nearer one. */
 	if (from_tag && name->from_tag)
-		return (name->taggerdate > taggerdate ||
-			(name->taggerdate == taggerdate &&
-			 name_distance > new_distance));
+		return name_distance > new_distance;
 
-	/*
-	 * We know that at least one of them is a non-tag at this point.
-	 * favor a tag over a non-tag.
-	 */
+	/* Favor a tag over a non-tag. */
 	if (name->from_tag != from_tag)
 		return from_tag;
 
diff --git a/t/t6120-describe.sh b/t/t6120-describe.sh
index 9a35e783a75..c9afcef2018 100755
--- a/t/t6120-describe.sh
+++ b/t/t6120-describe.sh
@@ -657,4 +657,10 @@ test_expect_success 'setup: describe commits with disjoint bases 2' '
 
 check_describe -C disjoint2 "B-3-gHASH" HEAD
 
+test_expect_success 'setup misleading taggerdates' '
+	GIT_COMMITTER_DATE="2006-12-12 12:31" git tag -a -m "another tag" newer-tag-older-commit unique-file~1
+'
+
+check_describe newer-tag-older-commit~1 --contains unique-file~2
+
 test_done

base-commit: 221222b278e713054e65cbbbcb2b1ac85483ea89
-- 
gitgitgadget
* Re: [PATCH v3] name-rev: fix names by dropping taggerdate workaround
From: Junio C Hamano @ 2023-02-09 17:10 UTC
  To: Elijah Newren via GitGitGadget; +Cc: git, Calvin Wan, Elijah Newren

"Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:

> -	/*
> -	 * When comparing names based on tags, prefer names
> -	 * based on the older tag, even if it is farther away.
> -	 */
> +	/* If both are tags, we prefer the nearer one. */
>  	if (from_tag && name->from_tag)
> -		return (name->taggerdate > taggerdate ||
> -			(name->taggerdate == taggerdate &&
> -			 name_distance > new_distance));
> +		return name_distance > new_distance;

OK.

> -	/*
> -	 * We know that at least one of them is a non-tag at this point.
> -	 * favor a tag over a non-tag.
> -	 */
> +	/* Favor a tag over a non-tag. */
>  	if (name->from_tag != from_tag)
>  		return from_tag;

The removed sentence is not something whose validity has changed due to
the code change. We still know at this point one of from_tag or
name->from_tag is false, thanks to the previous check, whose condition
did not change (only what is returned when the condition holds
changed).

But it may be obvious to readers, so, ... OK.
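Junio's point is easier to see when the whole post-patch decision order sits in one place. The sketch below is a simplified model and does not reproduce the real function. The v3 diff shows only the two tag-related checks. The tail that handles two non-tag names is our assumption, built from the commit message's description of the ef1e74065c fallback tiebreaker: nearer name first, then (assumed) the older committer date, then keep the current name. `struct name_sketch`, its fields and `is_better_name_sketch()` are hypothetical. The real `is_better_name()` compares a `struct rev_name` against separate arguments. `MERGE_TRAVERSAL_WEIGHT` and `effective_distance()` are likewise assumed shapes.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value; the real constant lives in builtin/name-rev.c. */
#define MERGE_TRAVERSAL_WEIGHT 65535

/* Hypothetical candidate name; "date" is a tagger date for tags and a
 * committer date otherwise (the dual use described for ef1e74065c). */
struct name_sketch {
	uint64_t date;
	int generation;
	int distance;
	int from_tag;
};

/* Assumed shape of the helper: a trailing "~N" costs one extra weight. */
static int effective_distance(int distance, int generation)
{
	return distance + (generation > 0 ? MERGE_TRAVERSAL_WEIGHT : 0);
}

/* Nonzero means the candidate should replace the current name. */
static int is_better_name_sketch(const struct name_sketch *cur,
				 const struct name_sketch *cand)
{
	int name_distance = effective_distance(cur->distance, cur->generation);
	int new_distance = effective_distance(cand->distance, cand->generation);

	/* If both are tags, we prefer the nearer one. */
	if (cand->from_tag && cur->from_tag)
		return name_distance > new_distance;

	/* The invariant from the review: at most one of them is a tag here. */
	assert(!(cand->from_tag && cur->from_tag));

	/* Favor a tag over a non-tag. */
	if (cur->from_tag != cand->from_tag)
		return cand->from_tag;

	/* Two non-tags (assumed order): nearer wins, then the older date. */
	if (name_distance != new_distance)
		return name_distance > new_distance;
	if (cur->date != cand->date)
		return cur->date > cand->date;

	/* Keep the current name if nothing distinguishes the two. */
	return 0;
}
```

The first check returns whenever both names are tags, so the assert can never fire. That is why the dropped sentence ("at least one of them is a non-tag") still holds after v3, even though the code no longer says it.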
Thread overview: 6+ messages
2023-01-21  4:28 [PATCH] name-rev: stop including taggerdate in naming of commits Elijah Newren via GitGitGadget
2023-02-07  6:32 ` [PATCH v2] name-rev: fix names by dropping taggerdate workaround Elijah Newren via GitGitGadget
2023-02-07 19:34   ` Calvin Wan
2023-02-08  3:33     ` Elijah Newren
2023-02-09  9:11   ` [PATCH v3] " Elijah Newren via GitGitGadget
2023-02-09 17:10     ` Junio C Hamano