* rev-list with multiple commits sharing same patch-id
@ 2021-01-09 16:24 Arnaud Morin
2021-01-11 7:59 ` Arnaud Morin
` (2 more replies)
0 siblings, 3 replies; 15+ messages in thread
From: Arnaud Morin @ 2021-01-09 16:24 UTC (permalink / raw)
To: git
Hey all,
I am struggling with a rev-list command, hope someone can give me a
clue on what is going on.
I have 2 branches:
$ git branch
B
* master
Currently on master, there is not diff against B:
$ git diff
(empty)
The commits in B are cherry-picked from master.
Here is the graph:
$ git log --graph --oneline --all
* ae2e3c4 (origin/B, B) remove line2 and add line4 (bis)
* a7a0339 remove line4
* caa4aad restore line2
* d7dc596 remove line2 add line4
* 44bcfd4 add line3
* e372641 b
| * dbf86d8 (HEAD -> master, origin/master) remove line2 and add line4 (bis)
| * 4017282 remove line4
| * 0f2a449 restore line2
| * 8969d3f remove line2 add line4
| * e73b420 add line3
| * fe5a75a b
|/
* 6192505 a
* b4089e1 init
However, when using git rev-list to perform a symmetric difference, git
is giving me a commit ID:
$ git rev-list --left-right --cherry-pick B...master
>dbf86d8aafc897a25a3093139b4237a62395041e
Note that this commit is not empty
$ git show dbf86d8aafc897a25a3093139b4237a62395041e --stat
commit dbf86d8aafc897a25a3093139b4237a62395041e (origin/master, master)
Author: Arnaud Morin <hidden@mail>
Date: Sat Jan 9 10:30:10 2021 +0100
remove line2 and add line4 (bis)
a | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
So, from git rev-list perspective, there is a diff.
After digging a little bit, the thing is that this commit is having the
following patch-id:
$ git show dbf86d8aafc897a25a3093139b4237a62395041e | git patch-id
20f4ace68e80a751b07d78a27c94e83d6c5314bc dbf86d8aafc897a25a3093139b4237a62395041e
Which is also already existing in an other commit:
$ for c in $(git rev-list HEAD) ; do git show $c |git patch-id |grep 20f4ace68e80a751b07d78a27c94e83d6c5314bc; done
20f4ace68e80a751b07d78a27c94e83d6c5314bc dbf86d8aafc897a25a3093139b4237a62395041e
20f4ace68e80a751b07d78a27c94e83d6c5314bc 8969d3fa9159730fd3b23199873bfb26e3d20027
So, is it normal that rev-list is not able to figure out that a commit
is existing in both branch when 2 commits share the same patch-id?
Is there any way to prevent rev-list from showing this commit?
Thanks for your help.
PS. I have uploaded my test repo here:
https://gitlab.com/arnaudmorin/git-rev-list
--
Arnaud Morin
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: rev-list with multiple commits sharing same patch-id 2021-01-09 16:24 rev-list with multiple commits sharing same patch-id Arnaud Morin @ 2021-01-11 7:59 ` Arnaud Morin 2021-01-11 9:54 ` Christian Couder 2021-01-12 14:17 ` Jeff King 2 siblings, 0 replies; 15+ messages in thread From: Arnaud Morin @ 2021-01-11 7:59 UTC (permalink / raw) To: git Sorry, forgot to mention 2 things: - I am using latest stable git version (2.30) - I my previous mail, my git diff was against branch B: $ git diff B (empty) Thanks -- Arnaud Morin On 09.01.21 - 16:24, Arnaud Morin wrote: > Hey all, > > I am struggling with a rev-list command, hope someone can give me a > clue on what is going on. > > I have 2 branches: > $ git branch > B > * master > > Currently on master, there is not diff against B: > $ git diff > (empty) > > The commits in B are cherry-picked from master. > Here is the graph: > $ git log --graph --oneline --all > * ae2e3c4 (origin/B, B) remove line2 and add line4 (bis) > * a7a0339 remove line4 > * caa4aad restore line2 > * d7dc596 remove line2 add line4 > * 44bcfd4 add line3 > * e372641 b > | * dbf86d8 (HEAD -> master, origin/master) remove line2 and add line4 (bis) > | * 4017282 remove line4 > | * 0f2a449 restore line2 > | * 8969d3f remove line2 add line4 > | * e73b420 add line3 > | * fe5a75a b > |/ > * 6192505 a > * b4089e1 init > > > However, when using git rev-list to perform a symmetric difference, git > is giving me a commit ID: > $ git rev-list --left-right --cherry-pick B...master > >dbf86d8aafc897a25a3093139b4237a62395041e > > Note that this commit is not empty > $ git show dbf86d8aafc897a25a3093139b4237a62395041e --stat > commit dbf86d8aafc897a25a3093139b4237a62395041e (origin/master, master) > Author: Arnaud Morin <hidden@mail> > Date: Sat Jan 9 10:30:10 2021 +0100 > > remove line2 and add line4 (bis) > > a | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > So, from git rev-list perspective, there is a diff. > > After digging a little bit, the thing is that this commit is having the > following patch-id: > $ git show dbf86d8aafc897a25a3093139b4237a62395041e | git patch-id > 20f4ace68e80a751b07d78a27c94e83d6c5314bc dbf86d8aafc897a25a3093139b4237a62395041e > > Which is also already existing in an other commit: > $ for c in $(git rev-list HEAD) ; do git show $c |git patch-id |grep 20f4ace68e80a751b07d78a27c94e83d6c5314bc; done > 20f4ace68e80a751b07d78a27c94e83d6c5314bc dbf86d8aafc897a25a3093139b4237a62395041e > 20f4ace68e80a751b07d78a27c94e83d6c5314bc 8969d3fa9159730fd3b23199873bfb26e3d20027 > > So, is it normal that rev-list is not able to figure out that a commit > is existing in both branch when 2 commits share the same patch-id? > > Is there any way to prevent rev-list from showing this commit? > > Thanks for your help. > > > PS. I have uploaded my test repo here: > https://gitlab.com/arnaudmorin/git-rev-list > > > -- > Arnaud Morin > ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: rev-list with multiple commits sharing same patch-id 2021-01-09 16:24 rev-list with multiple commits sharing same patch-id Arnaud Morin 2021-01-11 7:59 ` Arnaud Morin @ 2021-01-11 9:54 ` Christian Couder 2021-01-11 18:25 ` Arnaud Morin 2021-01-12 14:17 ` Jeff King 2 siblings, 1 reply; 15+ messages in thread From: Christian Couder @ 2021-01-11 9:54 UTC (permalink / raw) To: Arnaud Morin; +Cc: git Hi, On Sat, Jan 9, 2021 at 5:29 PM Arnaud Morin <arnaud.morin@gmail.com> wrote: > I am struggling with a rev-list command, hope someone can give me a > clue on what is going on. [...] > However, when using git rev-list to perform a symmetric difference, git > is giving me a commit ID: > $ git rev-list --left-right --cherry-pick B...master > >dbf86d8aafc897a25a3093139b4237a62395041e It looks like a bug. You might want to check the following test file and let us know if a similar case is tested in it, or not: t6007-rev-list-cherry-pick-file.sh If not, it might be a good idea to add one using 'test_expect_failure' instead of 'test_expect_success' to document the failure. Thanks, Christian. > Note that this commit is not empty > $ git show dbf86d8aafc897a25a3093139b4237a62395041e --stat > commit dbf86d8aafc897a25a3093139b4237a62395041e (origin/master, master) > Author: Arnaud Morin <hidden@mail> > Date: Sat Jan 9 10:30:10 2021 +0100 > > remove line2 and add line4 (bis) > > a | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > So, from git rev-list perspective, there is a diff. > > After digging a little bit, the thing is that this commit is having the > following patch-id: > $ git show dbf86d8aafc897a25a3093139b4237a62395041e | git patch-id > 20f4ace68e80a751b07d78a27c94e83d6c5314bc dbf86d8aafc897a25a3093139b4237a62395041e > > Which is also already existing in an other commit: > $ for c in $(git rev-list HEAD) ; do git show $c |git patch-id |grep 20f4ace68e80a751b07d78a27c94e83d6c5314bc; done > 20f4ace68e80a751b07d78a27c94e83d6c5314bc dbf86d8aafc897a25a3093139b4237a62395041e > 20f4ace68e80a751b07d78a27c94e83d6c5314bc 8969d3fa9159730fd3b23199873bfb26e3d20027 > > So, is it normal that rev-list is not able to figure out that a commit > is existing in both branch when 2 commits share the same patch-id? > > Is there any way to prevent rev-list from showing this commit? > > Thanks for your help. > > > PS. I have uploaded my test repo here: > https://gitlab.com/arnaudmorin/git-rev-list > > > -- > Arnaud Morin > ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: rev-list with multiple commits sharing same patch-id 2021-01-11 9:54 ` Christian Couder @ 2021-01-11 18:25 ` Arnaud Morin 0 siblings, 0 replies; 15+ messages in thread From: Arnaud Morin @ 2021-01-11 18:25 UTC (permalink / raw) To: Christian Couder; +Cc: git Hi, Thanks for the link to the test file. I patched the test file to add this new case (where the same patch-id is duplicated in the tree). This patch can be find in [1] I had to patch two existing tests to make them work again. I also added a new test with this new test-case (two identical patch-id in the same tree.) Note that I still have one test (outside my new one) which is failing: not ok 10 - name-rev --exclude excludes matched patterns I dont know why yet. Moreover, I am willing to help fixing the code. I have the feeling that a patch needs to be done in revision.c, maybe in static void cherry_pick_list? If you have any clue to help me? Thanks. [1] https://github.com/arnaudmorin/git/pull/1 -- Arnaud Morin On 11.01.21 - 10:54, Christian Couder wrote: > Hi, > > On Sat, Jan 9, 2021 at 5:29 PM Arnaud Morin <arnaud.morin@gmail.com> wrote: > > > I am struggling with a rev-list command, hope someone can give me a > > clue on what is going on. > > [...] > > > However, when using git rev-list to perform a symmetric difference, git > > is giving me a commit ID: > > $ git rev-list --left-right --cherry-pick B...master > > >dbf86d8aafc897a25a3093139b4237a62395041e > > It looks like a bug. You might want to check the following test file > and let us know if a similar case is tested in it, or not: > > t6007-rev-list-cherry-pick-file.sh > > If not, it might be a good idea to add one using 'test_expect_failure' > instead of 'test_expect_success' to document the failure. > > Thanks, > Christian. > > > > Note that this commit is not empty > > $ git show dbf86d8aafc897a25a3093139b4237a62395041e --stat > > commit dbf86d8aafc897a25a3093139b4237a62395041e (origin/master, master) > > Author: Arnaud Morin <hidden@mail> > > Date: Sat Jan 9 10:30:10 2021 +0100 > > > > remove line2 and add line4 (bis) > > > > a | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > > > So, from git rev-list perspective, there is a diff. > > > > After digging a little bit, the thing is that this commit is having the > > following patch-id: > > $ git show dbf86d8aafc897a25a3093139b4237a62395041e | git patch-id > > 20f4ace68e80a751b07d78a27c94e83d6c5314bc dbf86d8aafc897a25a3093139b4237a62395041e > > > > Which is also already existing in an other commit: > > $ for c in $(git rev-list HEAD) ; do git show $c |git patch-id |grep 20f4ace68e80a751b07d78a27c94e83d6c5314bc; done > > 20f4ace68e80a751b07d78a27c94e83d6c5314bc dbf86d8aafc897a25a3093139b4237a62395041e > > 20f4ace68e80a751b07d78a27c94e83d6c5314bc 8969d3fa9159730fd3b23199873bfb26e3d20027 > > > > So, is it normal that rev-list is not able to figure out that a commit > > is existing in both branch when 2 commits share the same patch-id? > > > > Is there any way to prevent rev-list from showing this commit? > > > > Thanks for your help. > > > > > > PS. I have uploaded my test repo here: > > https://gitlab.com/arnaudmorin/git-rev-list > > > > > > -- > > Arnaud Morin > > ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: rev-list with multiple commits sharing same patch-id 2021-01-09 16:24 rev-list with multiple commits sharing same patch-id Arnaud Morin 2021-01-11 7:59 ` Arnaud Morin 2021-01-11 9:54 ` Christian Couder @ 2021-01-12 14:17 ` Jeff King 2021-01-12 15:11 ` Jeff King 2 siblings, 1 reply; 15+ messages in thread From: Jeff King @ 2021-01-12 14:17 UTC (permalink / raw) To: Arnaud Morin; +Cc: git, Kevin Willford On Sat, Jan 09, 2021 at 04:24:40PM +0000, Arnaud Morin wrote: > After digging a little bit, the thing is that this commit is having the > following patch-id: > $ git show dbf86d8aafc897a25a3093139b4237a62395041e | git patch-id > 20f4ace68e80a751b07d78a27c94e83d6c5314bc dbf86d8aafc897a25a3093139b4237a62395041e > > Which is also already existing in an other commit: > $ for c in $(git rev-list HEAD) ; do git show $c |git patch-id |grep 20f4ace68e80a751b07d78a27c94e83d6c5314bc; done > 20f4ace68e80a751b07d78a27c94e83d6c5314bc dbf86d8aafc897a25a3093139b4237a62395041e > 20f4ace68e80a751b07d78a27c94e83d6c5314bc 8969d3fa9159730fd3b23199873bfb26e3d20027 A slight nit first. Those two commits have the same patch-id, but in your graph they are on the same branch ("master", not "B"): > The commits in B are cherry-picked from master. > Here is the graph: > $ git log --graph --oneline --all > * ae2e3c4 (origin/B, B) remove line2 and add line4 (bis) > * a7a0339 remove line4 > * caa4aad restore line2 > * d7dc596 remove line2 add line4 > * 44bcfd4 add line3 > * e372641 b > | * dbf86d8 (HEAD -> master, origin/master) remove line2 and add line4 (bis) > | * 4017282 remove line4 > | * 0f2a449 restore line2 > | * 8969d3f remove line2 add line4 > | * e73b420 add line3 > | * fe5a75a b > |/ > * 6192505 a > * b4089e1 init And --cherry-pick is only looking for matches between the two sides of the symmetric difference. However, I think that patch-id does exist on the other side (it appears again twice in fact, which is not surprising): $ git rev-list --all | git diff-tree --stdin -p | git patch-id | grep 20f4ace68e80 20f4ace68e80a751b07d78a27c94e83d6c5314bc ae2e3c4754f53440cc4612d35f80d795a695c862 20f4ace68e80a751b07d78a27c94e83d6c5314bc dbf86d8aafc897a25a3093139b4237a62395041e 20f4ace68e80a751b07d78a27c94e83d6c5314bc d7dc596fcc34662cba35363febc846bfcab1e4be 20f4ace68e80a751b07d78a27c94e83d6c5314bc 8969d3fa9159730fd3b23199873bfb26e3d20027 So the entries should be suppressed from both sides. It looks like this was broken in v2.10.0, via dfb7a1b4d0 (patch-ids: stop using a hand-rolled hashmap implementation, 2016-07-29). I think the issue is that it is allowing duplicate entries in the hashmap. The algorithm is something like: - iterate over left-hand commits, inserting patch-id for each into hashmap - iterate over right-hand commits, seeing if any are present in hashmap. If so, we exclude the commit _and_ mark the patch-id as "seen" - iterate again over left-hand commits, omitting any whose patch-ids are marked as "seen" So if two commits on the left-hand side have the same patch-id, if we insert two entries into the hashmap, then only one of them is going to get its "seen" flag marked in the middle step. -Peff ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: rev-list with multiple commits sharing same patch-id 2021-01-12 14:17 ` Jeff King @ 2021-01-12 15:11 ` Jeff King 2021-01-12 15:34 ` Arnaud Morin 0 siblings, 1 reply; 15+ messages in thread From: Jeff King @ 2021-01-12 15:11 UTC (permalink / raw) To: Arnaud Morin; +Cc: git, Kevin Willford On Tue, Jan 12, 2021 at 09:17:37AM -0500, Jeff King wrote: > It looks like this was broken in v2.10.0, via dfb7a1b4d0 (patch-ids: > stop using a hand-rolled hashmap implementation, 2016-07-29). > > I think the issue is that it is allowing duplicate entries in the > hashmap. The algorithm is something like: > > - iterate over left-hand commits, inserting patch-id for each into > hashmap > > - iterate over right-hand commits, seeing if any are present in > hashmap. If so, we exclude the commit _and_ mark the patch-id as > "seen" > > - iterate again over left-hand commits, omitting any whose patch-ids > are marked as "seen" > > So if two commits on the left-hand side have the same patch-id, if we > insert two entries into the hashmap, then only one of them is going to > get its "seen" flag marked in the middle step. Yeah, that's definitely it. Here's what the fix would look like directly on top of dfb7a1b4d0: diff --git a/patch-ids.c b/patch-ids.c index db31fa647a..a8ed4f0872 100644 --- a/patch-ids.c +++ b/patch-ids.c @@ -66,12 +66,24 @@ struct patch_id *add_commit_patch_id(struct commit *commit, struct patch_ids *ids) { struct patch_id *key = xcalloc(1, sizeof(*key)); + struct patch_id *old; if (init_patch_id_entry(key, commit, ids)) { free(key); return NULL; } + /* + * Don't allow duplicates. Note that we can't use hashmap_put here; we + * must retain the _old_ struct, because some commit->util is pointing + * to it. + */ + old = hashmap_get(&ids->patches, key, ids); + if (old) { + free(key); + return old; + } + hashmap_add(&ids->patches, key); return key; } diff --git a/t/t6007-rev-list-cherry-pick-file.sh b/t/t6007-rev-list-cherry-pick-file.sh index 28d4f6b259..0c5f1dcc54 100755 --- a/t/t6007-rev-list-cherry-pick-file.sh +++ b/t/t6007-rev-list-cherry-pick-file.sh @@ -207,4 +207,16 @@ test_expect_success '--count --left-right' ' test_cmp expect actual ' +test_expect_success '--cherry-pick with duplicates on each side' ' + git checkout -b dup-orig && + test_commit dup-base && + git revert dup-base && + git cherry-pick dup-base && + git checkout -b dup-side HEAD~3 && + test_tick && + git cherry-pick -3 dup-orig && + git rev-list --cherry-pick dup-orig...dup-side >actual && + test_must_be_empty actual +' + test_done Unfortunately we can't do the same thing now. The "seen" flag went away in 683f17ec44 (patch-ids: replace the seen indicator with a commit pointer, 2016-07-29), in favor of having each patch_id struct point to a "struct commit". We could revert that, or turn that pointer into a list, but it gets worse. Later, b3dfeebb92 (rebase: avoid computing unnecessary patch IDs, 2016-07-29) built on that by lazily computing the patch-ids. So each patch_id struct must point to exactly one commit! We'll have to either: - split them apart into two structs at the time of the lazy computation or - accept having duplicate patch_ids, but on lookup make sure we find _all_ matches, not just the first one. Hmm, that last one might not be too bad, since we have hashmap_get_next() (it does mean iterating over a chained bucket linearly, but that's already the worst-case for a hash lookup). -Peff ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: rev-list with multiple commits sharing same patch-id 2021-01-12 15:11 ` Jeff King @ 2021-01-12 15:34 ` Arnaud Morin 2021-01-12 15:52 ` [PATCH] patch-ids: handle duplicate hashmap entries Jeff King 0 siblings, 1 reply; 15+ messages in thread From: Arnaud Morin @ 2021-01-12 15:34 UTC (permalink / raw) To: Jeff King; +Cc: git, Kevin Willford So instead of: id = has_commit_patch_id(commit, &ids); We should use some sort of iterator, in order to call: id->commit->object.flags |= cherry_flag; for all commits which are related to this patch? -- Arnaud Morin On 12.01.21 - 10:11, Jeff King wrote: > On Tue, Jan 12, 2021 at 09:17:37AM -0500, Jeff King wrote: > > > It looks like this was broken in v2.10.0, via dfb7a1b4d0 (patch-ids: > > stop using a hand-rolled hashmap implementation, 2016-07-29). > > > > I think the issue is that it is allowing duplicate entries in the > > hashmap. The algorithm is something like: > > > > - iterate over left-hand commits, inserting patch-id for each into > > hashmap > > > > - iterate over right-hand commits, seeing if any are present in > > hashmap. If so, we exclude the commit _and_ mark the patch-id as > > "seen" > > > > - iterate again over left-hand commits, omitting any whose patch-ids > > are marked as "seen" > > > > So if two commits on the left-hand side have the same patch-id, if we > > insert two entries into the hashmap, then only one of them is going to > > get its "seen" flag marked in the middle step. > > Yeah, that's definitely it. Here's what the fix would look like directly > on top of dfb7a1b4d0: > > diff --git a/patch-ids.c b/patch-ids.c > index db31fa647a..a8ed4f0872 100644 > --- a/patch-ids.c > +++ b/patch-ids.c > @@ -66,12 +66,24 @@ struct patch_id *add_commit_patch_id(struct commit *commit, > struct patch_ids *ids) > { > struct patch_id *key = xcalloc(1, sizeof(*key)); > + struct patch_id *old; > > if (init_patch_id_entry(key, commit, ids)) { > free(key); > return NULL; > } > > + /* > + * Don't allow duplicates. Note that we can't use hashmap_put here; we > + * must retain the _old_ struct, because some commit->util is pointing > + * to it. > + */ > + old = hashmap_get(&ids->patches, key, ids); > + if (old) { > + free(key); > + return old; > + } > + > hashmap_add(&ids->patches, key); > return key; > } > diff --git a/t/t6007-rev-list-cherry-pick-file.sh b/t/t6007-rev-list-cherry-pick-file.sh > index 28d4f6b259..0c5f1dcc54 100755 > --- a/t/t6007-rev-list-cherry-pick-file.sh > +++ b/t/t6007-rev-list-cherry-pick-file.sh > @@ -207,4 +207,16 @@ test_expect_success '--count --left-right' ' > test_cmp expect actual > ' > > +test_expect_success '--cherry-pick with duplicates on each side' ' > + git checkout -b dup-orig && > + test_commit dup-base && > + git revert dup-base && > + git cherry-pick dup-base && > + git checkout -b dup-side HEAD~3 && > + test_tick && > + git cherry-pick -3 dup-orig && > + git rev-list --cherry-pick dup-orig...dup-side >actual && > + test_must_be_empty actual > +' > + > test_done > > > Unfortunately we can't do the same thing now. The "seen" flag went away > in 683f17ec44 (patch-ids: replace the seen indicator with a commit > pointer, 2016-07-29), in favor of having each patch_id struct point to a > "struct commit". We could revert that, or turn that pointer into a list, > but it gets worse. Later, b3dfeebb92 (rebase: avoid computing > unnecessary patch IDs, 2016-07-29) built on that by lazily computing the > patch-ids. So each patch_id struct must point to exactly one commit! > > We'll have to either: > > - split them apart into two structs at the time of the lazy computation > > or > > - accept having duplicate patch_ids, but on lookup make sure we find > _all_ matches, not just the first one. > > Hmm, that last one might not be too bad, since we have > hashmap_get_next() (it does mean iterating over a chained bucket > linearly, but that's already the worst-case for a hash lookup). > > -Peff ^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH] patch-ids: handle duplicate hashmap entries 2021-01-12 15:34 ` Arnaud Morin @ 2021-01-12 15:52 ` Jeff King 2021-01-12 16:24 ` Arnaud Morin 2021-01-12 19:13 ` Junio C Hamano 0 siblings, 2 replies; 15+ messages in thread From: Jeff King @ 2021-01-12 15:52 UTC (permalink / raw) To: Arnaud Morin; +Cc: git, Kevin Willford On Tue, Jan 12, 2021 at 03:34:38PM +0000, Arnaud Morin wrote: > So instead of: > id = has_commit_patch_id(commit, &ids); > > We should use some sort of iterator, in order to call: > id->commit->object.flags |= cherry_flag; > > for all commits which are related to this patch? Yep, exactly. Here it is as a patch. :) -- >8 -- Subject: [PATCH] patch-ids: handle duplicate hashmap entries This fixes a bug introduced in dfb7a1b4d0 (patch-ids: stop using a hand-rolled hashmap implementation, 2016-07-29) in which git rev-list --cherry-pick A...B will fail to suppress commits reachable from A even if a commit with matching patch-id appears in B. Around the time of that commit, the algorithm for "--cherry-pick" looked something like this: 0. Traverse all of the commits, marking them as being on the left or right side of the symmetric difference. 1. Iterate over the left-hand commits, inserting a patch-id struct for each into a hashmap, and pointing commit->util to the patch-id struct. 2. Iterate over the right-hand commits, checking which are present in the hashmap. If so, we exclude the commit from the output _and_ we mark the patch-id as "seen". 3. Iterate again over the left-hand commits, checking whether commit->util->seen is set; if so, exclude them from the output. At the end, we'll have eliminated commits from both sides that have a matching patch-id on the other side. But there's a subtle assumption here: for any given patch-id, we must have exactly one struct representing it. If two commits from A both have the same patch-id and we allow duplicates in the hashmap, then we run into a problem: a. In step 1, we insert two patch-id structs into the hashmap. b. In step 2, our lookups will find only one of these structs, so only one "seen" flag is marked. c. In step 3, one of the commits in A will have its commit->util->seen set, but the other will not. We'll erroneously output the latter. Prior to dfb7a1b4d0, our hashmap did not allow duplicates. Afterwards, it used hashmap_add(), which explicitly does allow duplicates. At that point, the solution would have been easy: when we are about to add a duplicate, skip doing so and return the existing entry which matches. But it gets more complicated. In 683f17ec44 (patch-ids: replace the seen indicator with a commit pointer, 2016-07-29), our step 3 goes away entirely. Instead, in step 2, when the right-hand side finds a matching patch_id from the left-hand side, we can directly mark the left-hand patch_id->commit to be omitted. Solving that would be easy, too; there's a one-to-many relationship of patch-ids to commits, so we just need to keep a list. But there's more. Commit b3dfeebb92 (rebase: avoid computing unnecessary patch IDs, 2016-07-29) built on that by lazily computing the full patch-ids. So we don't even know when adding to the hashmap whether two commits truly have the same id. We'd have to tentatively assign them a list, and then possibly split them apart (possibly into N new structs) at the moment we compute the real patch-ids. This could work, but it's complicated and error-prone. Instead, let's accept that we may store duplicates, and teach the lookup side to be more clever. Rather than asking for a single matching patch-id, it will need to iterate over all matching patch-ids. This does mean examining every entry in a single hash bucket, but the worst-case for a hash lookup was already doing that. We'll keep the hashmap details out of the caller by providing a simple iteration interface. We can retain the simple has_commit_patch_id() interface for the other callers, but we'll simplify its return value into an integer, rather than returning the patch_id struct. That way they won't be tempted to look at the "commit" field of the return value without iterating. Reported-by: Arnaud Morin <arnaud.morin@gmail.com> Signed-off-by: Jeff King <peff@peff.net> --- patch-ids.c | 14 +++++++++++++- patch-ids.h | 20 +++++++++++++++++++- revision.c | 6 ++++-- t/t6007-rev-list-cherry-pick-file.sh | 12 ++++++++++++ 4 files changed, 48 insertions(+), 4 deletions(-) diff --git a/patch-ids.c b/patch-ids.c index 21973e4933..f51021a0cb 100644 --- a/patch-ids.c +++ b/patch-ids.c @@ -89,7 +89,7 @@ static int init_patch_id_entry(struct patch_id *patch, return 0; } -struct patch_id *has_commit_patch_id(struct commit *commit, +struct patch_id *patch_id_iter_first(struct commit *commit, struct patch_ids *ids) { struct patch_id patch; @@ -104,6 +104,18 @@ struct patch_id *has_commit_patch_id(struct commit *commit, return hashmap_get_entry(&ids->patches, &patch, ent, NULL); } +struct patch_id *patch_id_iter_next(struct patch_id *cur, + struct patch_ids *ids) +{ + return hashmap_get_next_entry(&ids->patches, cur, ent); +} + +int has_commit_patch_id(struct commit *commit, + struct patch_ids *ids) +{ + return !!patch_id_iter_first(commit, ids); +} + struct patch_id *add_commit_patch_id(struct commit *commit, struct patch_ids *ids) { diff --git a/patch-ids.h b/patch-ids.h index 03bb04e707..ab6c6a6804 100644 --- a/patch-ids.h +++ b/patch-ids.h @@ -23,7 +23,25 @@ int commit_patch_id(struct commit *commit, struct diff_options *options, struct object_id *oid, int, int); int init_patch_ids(struct repository *, struct patch_ids *); int free_patch_ids(struct patch_ids *); + +/* Add a patch_id for a single commit to the set. */ struct patch_id *add_commit_patch_id(struct commit *, struct patch_ids *); -struct patch_id *has_commit_patch_id(struct commit *, struct patch_ids *); + +/* Returns true if the patch-id of "commit" is present in the set. */ +int has_commit_patch_id(struct commit *commit, struct patch_ids *); + +/* + * Iterate over all commits in the set whose patch id matches that of + * "commit", like: + * + * struct patch_id *cur; + * for (cur = patch_id_iter_first(commit, ids); + * cur; + * cur = patch_id_iter_next(cur, ids) { + * ... look at cur->commit + * } + */ +struct patch_id *patch_id_iter_first(struct commit *commit, struct patch_ids *); +struct patch_id *patch_id_iter_next(struct patch_id *cur, struct patch_ids *); #endif /* PATCH_IDS_H */ diff --git a/revision.c b/revision.c index 9dff845bed..7ec9b634e3 100644 --- a/revision.c +++ b/revision.c @@ -1241,12 +1241,14 @@ static void cherry_pick_list(struct commit_list *list, struct rev_info *revs) /* * Have we seen the same patch id? */ - id = has_commit_patch_id(commit, &ids); + id = patch_id_iter_first(commit, &ids); if (!id) continue; commit->object.flags |= cherry_flag; - id->commit->object.flags |= cherry_flag; + do { + id->commit->object.flags |= cherry_flag; + } while ((id = patch_id_iter_next(id, &ids))); } free_patch_ids(&ids); diff --git a/t/t6007-rev-list-cherry-pick-file.sh b/t/t6007-rev-list-cherry-pick-file.sh index f0268372d2..8bf5ae23c2 100755 --- a/t/t6007-rev-list-cherry-pick-file.sh +++ b/t/t6007-rev-list-cherry-pick-file.sh @@ -245,6 +245,18 @@ test_expect_success '--count --left-right' ' test_cmp expect actual ' +test_expect_success '--cherry-pick with duplicates on each side' ' + git checkout -b dup-orig && + test_commit dup-base && + git revert dup-base && + git cherry-pick dup-base && + git checkout -b dup-side HEAD~3 && + test_tick && + git cherry-pick -3 dup-orig && + git rev-list --cherry-pick dup-orig...dup-side >actual && + test_must_be_empty actual +' + # Corrupt the object store deliberately to make sure # the object is not even checked for its existence. remove_loose_object () { -- 2.30.0.455.gdab6b73766 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH] patch-ids: handle duplicate hashmap entries 2021-01-12 15:52 ` [PATCH] patch-ids: handle duplicate hashmap entries Jeff King @ 2021-01-12 16:24 ` Arnaud Morin 2021-01-12 19:13 ` Junio C Hamano 1 sibling, 0 replies; 15+ messages in thread From: Arnaud Morin @ 2021-01-12 16:24 UTC (permalink / raw) To: Jeff King; +Cc: git, Kevin Willford YAY! Thanks a lot :) -- Arnaud Morin On 12.01.21 - 10:52, Jeff King wrote: > On Tue, Jan 12, 2021 at 03:34:38PM +0000, Arnaud Morin wrote: > > > So instead of: > > id = has_commit_patch_id(commit, &ids); > > > > We should use some sort of iterator, in order to call: > > id->commit->object.flags |= cherry_flag; > > > > for all commits which are related to this patch? > > Yep, exactly. > > Here it is as a patch. :) > > -- >8 -- > Subject: [PATCH] patch-ids: handle duplicate hashmap entries > > This fixes a bug introduced in dfb7a1b4d0 (patch-ids: stop using a > hand-rolled hashmap implementation, 2016-07-29) in which > > git rev-list --cherry-pick A...B > > will fail to suppress commits reachable from A even if a commit with > matching patch-id appears in B. > > Around the time of that commit, the algorithm for "--cherry-pick" looked > something like this: > > 0. Traverse all of the commits, marking them as being on the left or > right side of the symmetric difference. > > 1. Iterate over the left-hand commits, inserting a patch-id struct for > each into a hashmap, and pointing commit->util to the patch-id > struct. > > 2. Iterate over the right-hand commits, checking which are present in > the hashmap. If so, we exclude the commit from the output _and_ we > mark the patch-id as "seen". > > 3. Iterate again over the left-hand commits, checking whether > commit->util->seen is set; if so, exclude them from the output. > > At the end, we'll have eliminated commits from both sides that have a > matching patch-id on the other side. But there's a subtle assumption > here: for any given patch-id, we must have exactly one struct > representing it. If two commits from A both have the same patch-id and > we allow duplicates in the hashmap, then we run into a problem: > > a. In step 1, we insert two patch-id structs into the hashmap. > > b. In step 2, our lookups will find only one of these structs, so only > one "seen" flag is marked. > > c. In step 3, one of the commits in A will have its commit->util->seen > set, but the other will not. We'll erroneously output the latter. > > Prior to dfb7a1b4d0, our hashmap did not allow duplicates. Afterwards, > it used hashmap_add(), which explicitly does allow duplicates. > > At that point, the solution would have been easy: when we are about to > add a duplicate, skip doing so and return the existing entry which > matches. But it gets more complicated. > > In 683f17ec44 (patch-ids: replace the seen indicator with a commit > pointer, 2016-07-29), our step 3 goes away entirely. Instead, in step 2, > when the right-hand side finds a matching patch_id from the left-hand > side, we can directly mark the left-hand patch_id->commit to be omitted. > Solving that would be easy, too; there's a one-to-many relationship of > patch-ids to commits, so we just need to keep a list. > > But there's more. Commit b3dfeebb92 (rebase: avoid computing unnecessary > patch IDs, 2016-07-29) built on that by lazily computing the full > patch-ids. So we don't even know when adding to the hashmap whether two > commits truly have the same id. We'd have to tentatively assign them a > list, and then possibly split them apart (possibly into N new structs) > at the moment we compute the real patch-ids. This could work, but it's > complicated and error-prone. > > Instead, let's accept that we may store duplicates, and teach the lookup > side to be more clever. Rather than asking for a single matching > patch-id, it will need to iterate over all matching patch-ids. This does > mean examining every entry in a single hash bucket, but the worst-case > for a hash lookup was already doing that. > > We'll keep the hashmap details out of the caller by providing a simple > iteration interface. We can retain the simple has_commit_patch_id() > interface for the other callers, but we'll simplify its return value > into an integer, rather than returning the patch_id struct. That way > they won't be tempted to look at the "commit" field of the return value > without iterating. > > Reported-by: Arnaud Morin <arnaud.morin@gmail.com> > Signed-off-by: Jeff King <peff@peff.net> > --- > patch-ids.c | 14 +++++++++++++- > patch-ids.h | 20 +++++++++++++++++++- > revision.c | 6 ++++-- > t/t6007-rev-list-cherry-pick-file.sh | 12 ++++++++++++ > 4 files changed, 48 insertions(+), 4 deletions(-) > > diff --git a/patch-ids.c b/patch-ids.c > index 21973e4933..f51021a0cb 100644 > --- a/patch-ids.c > +++ b/patch-ids.c > @@ -89,7 +89,7 @@ static int init_patch_id_entry(struct patch_id *patch, > return 0; > } > > -struct patch_id *has_commit_patch_id(struct commit *commit, > +struct patch_id *patch_id_iter_first(struct commit *commit, > struct patch_ids *ids) > { > struct patch_id patch; > @@ -104,6 +104,18 @@ struct patch_id *has_commit_patch_id(struct commit *commit, > return hashmap_get_entry(&ids->patches, &patch, ent, NULL); > } > > +struct patch_id *patch_id_iter_next(struct patch_id *cur, > + struct patch_ids *ids) > +{ > + return hashmap_get_next_entry(&ids->patches, cur, ent); > +} > + > +int has_commit_patch_id(struct commit *commit, > + struct patch_ids *ids) > +{ > + return !!patch_id_iter_first(commit, ids); > +} > + > struct patch_id *add_commit_patch_id(struct commit *commit, > struct patch_ids *ids) > { > diff --git a/patch-ids.h b/patch-ids.h > index 03bb04e707..ab6c6a6804 100644 > --- a/patch-ids.h > +++ b/patch-ids.h > @@ -23,7 +23,25 @@ int commit_patch_id(struct commit *commit, struct diff_options *options, > struct object_id *oid, int, int); > int init_patch_ids(struct repository *, struct patch_ids *); > int free_patch_ids(struct patch_ids *); > + > +/* Add a patch_id for a single commit to the set. */ > struct patch_id *add_commit_patch_id(struct commit *, struct patch_ids *); > -struct patch_id *has_commit_patch_id(struct commit *, struct patch_ids *); > + > +/* Returns true if the patch-id of "commit" is present in the set. */ > +int has_commit_patch_id(struct commit *commit, struct patch_ids *); > + > +/* > + * Iterate over all commits in the set whose patch id matches that of > + * "commit", like: > + * > + * struct patch_id *cur; > + * for (cur = patch_id_iter_first(commit, ids); > + * cur; > + * cur = patch_id_iter_next(cur, ids) { > + * ... look at cur->commit > + * } > + */ > +struct patch_id *patch_id_iter_first(struct commit *commit, struct patch_ids *); > +struct patch_id *patch_id_iter_next(struct patch_id *cur, struct patch_ids *); > > #endif /* PATCH_IDS_H */ > diff --git a/revision.c b/revision.c > index 9dff845bed..7ec9b634e3 100644 > --- a/revision.c > +++ b/revision.c > @@ -1241,12 +1241,14 @@ static void cherry_pick_list(struct commit_list *list, struct rev_info *revs) > /* > * Have we seen the same patch id? > */ > - id = has_commit_patch_id(commit, &ids); > + id = patch_id_iter_first(commit, &ids); > if (!id) > continue; > > commit->object.flags |= cherry_flag; > - id->commit->object.flags |= cherry_flag; > + do { > + id->commit->object.flags |= cherry_flag; > + } while ((id = patch_id_iter_next(id, &ids))); > } > > free_patch_ids(&ids); > diff --git a/t/t6007-rev-list-cherry-pick-file.sh b/t/t6007-rev-list-cherry-pick-file.sh > index f0268372d2..8bf5ae23c2 100755 > --- a/t/t6007-rev-list-cherry-pick-file.sh > +++ b/t/t6007-rev-list-cherry-pick-file.sh > @@ -245,6 +245,18 @@ test_expect_success '--count --left-right' ' > test_cmp expect actual > ' > > +test_expect_success '--cherry-pick with duplicates on each side' ' > + git checkout -b dup-orig && > + test_commit dup-base && > + git revert dup-base && > + git cherry-pick dup-base && > + git checkout -b dup-side HEAD~3 && > + test_tick && > + git cherry-pick -3 dup-orig && > + git rev-list --cherry-pick dup-orig...dup-side >actual && > + test_must_be_empty actual > +' > + > # Corrupt the object store deliberately to make sure > # the object is not even checked for its existence. > remove_loose_object () { > -- > 2.30.0.455.gdab6b73766 > ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] patch-ids: handle duplicate hashmap entries 2021-01-12 15:52 ` [PATCH] patch-ids: handle duplicate hashmap entries Jeff King 2021-01-12 16:24 ` Arnaud Morin @ 2021-01-12 19:13 ` Junio C Hamano 2021-01-13 9:24 ` Arnaud Morin 1 sibling, 1 reply; 15+ messages in thread From: Junio C Hamano @ 2021-01-12 19:13 UTC (permalink / raw) To: Jeff King; +Cc: Arnaud Morin, git, Kevin Willford Jeff King <peff@peff.net> writes: > At the end, we'll have eliminated commits from both sides that have a > matching patch-id on the other side. But there's a subtle assumption > here: for any given patch-id, we must have exactly one struct > representing it. If two commits from A both have the same patch-id and > we allow duplicates in the hashmap, then we run into a problem: In practical terms, for one side of the history to have two identical patches, the most likely scenerio for that to happen is to have a patch, its reversion, and its reapplication, intermixed in its history with other commits, e.g. ---o---o---M---o---o---W---o---o---M---o--- ... \ o---o---o---M---o--- ... where "M" are commits that does an identical change, and "W" (upside-down "M") is its reversion. On the top history, "M" was introduced, the bottom history cherry-picked, the top history found problems in it and reverted with "W", and later (perhaps with the help of preparatory patches on top of "W"), the top history now considers that "M" is safe, so reapplies it. And a "--cherry-pick" output that excludes "identical patches" that appear on both sides on such a history would hide all "M"'s, leaving a history like ---o---o-------o---o---W---o---o-------o--- ... \ o---o---o-------o--- ... But is this result what the user really wanted to see, I have to wonder. I do not see any problem in the patch itself. We used to hide only one "M" from the history on the top half in the picture, leaving one "M" and "W", while hiding the sole "M" from the bottom half. Now if we want to no longer show any "M", the updated code would correctly hide all of them. It just feels to me that the resulting view of the history look weird, leaving only the reversion of a patch that has never been applied in the first place. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] patch-ids: handle duplicate hashmap entries 2021-01-12 19:13 ` Junio C Hamano @ 2021-01-13 9:24 ` Arnaud Morin 2021-01-13 12:59 ` Jeff King 2021-01-13 19:28 ` Junio C Hamano 0 siblings, 2 replies; 15+ messages in thread From: Arnaud Morin @ 2021-01-13 9:24 UTC (permalink / raw) To: Junio C Hamano; +Cc: Jeff King, git, Kevin Willford Without this patch, that's even worst, consistency is broken. Let me explain. With your history example: ---o---o---M---o---o---W---o---o---M---o--- branch \ o---o---o---M---o--- master # WITHOUT PATCH If we imagine that master is having more commits count than branch. The result of rev-list will be like you described: $ git rev-list --left-right --cherry-pick branch...master <M <W In other words, it's showing both W and M. BUT, if we imagine now that master is having less commits count than branch. $ git rev-list --left-right --cherry-pick branch...master <W It's only showing W! So the result is not consistent, depending on which branch is having more commits. # WITH PATCH With the patch, everything is consistent, and only W is kept in ouptut, no matter the size of history: $ git.p rev-list --left-right --cherry-pick branch...master <W Cheers, -- Arnaud Morin On 12.01.21 - 11:13, Junio C Hamano wrote: > Jeff King <peff@peff.net> writes: > > > At the end, we'll have eliminated commits from both sides that have a > > matching patch-id on the other side. But there's a subtle assumption > > here: for any given patch-id, we must have exactly one struct > > representing it. If two commits from A both have the same patch-id and > > we allow duplicates in the hashmap, then we run into a problem: > > In practical terms, for one side of the history to have two > identical patches, the most likely scenerio for that to happen is to > have a patch, its reversion, and its reapplication, intermixed in > its history with other commits, e.g. > > ---o---o---M---o---o---W---o---o---M---o--- ... > \ > o---o---o---M---o--- ... > > where "M" are commits that does an identical change, and "W" > (upside-down "M") is its reversion. On the top history, "M" was > introduced, the bottom history cherry-picked, the top history found > problems in it and reverted with "W", and later (perhaps with the > help of preparatory patches on top of "W"), the top history now > considers that "M" is safe, so reapplies it. > > And a "--cherry-pick" output that excludes "identical patches" that > appear on both sides on such a history would hide all "M"'s, leaving > a history like > > ---o---o-------o---o---W---o---o-------o--- ... > \ > o---o---o-------o--- ... > > But is this result what the user really wanted to see, I have to > wonder. > > I do not see any problem in the patch itself. We used to hide only > one "M" from the history on the top half in the picture, leaving one > "M" and "W", while hiding the sole "M" from the bottom half. Now if > we want to no longer show any "M", the updated code would correctly > hide all of them. > > It just feels to me that the resulting view of the history look > weird, leaving only the reversion of a patch that has never been > applied in the first place. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] patch-ids: handle duplicate hashmap entries 2021-01-13 9:24 ` Arnaud Morin @ 2021-01-13 12:59 ` Jeff King 2021-01-13 20:21 ` Junio C Hamano 2021-01-13 19:28 ` Junio C Hamano 1 sibling, 1 reply; 15+ messages in thread From: Jeff King @ 2021-01-13 12:59 UTC (permalink / raw) To: Arnaud Morin; +Cc: Junio C Hamano, git, Kevin Willford On Wed, Jan 13, 2021 at 09:24:48AM +0000, Arnaud Morin wrote: > Without this patch, that's even worst, consistency is broken. > Let me explain. > > With your history example: > > ---o---o---M---o---o---W---o---o---M---o--- branch > \ > o---o---o---M---o--- master > > # WITHOUT PATCH > If we imagine that master is having more commits count than branch. > The result of rev-list will be like you described: > $ git rev-list --left-right --cherry-pick branch...master > <M > <W > > In other words, it's showing both W and M. > > BUT, if we imagine now that master is having less commits count than branch. > $ git rev-list --left-right --cherry-pick branch...master > <W > > It's only showing W! > > So the result is not consistent, depending on which branch is having > more commits. Right. There's definitely a question of whether --cherry-pick is the most useful thing in such a situation, but the current behavior was simply broken. It did not behave as advertised, and it treated one side of the history different from the other. So whether we want to do anything else to help that case, I think this at least makes --cherry-pick sane. :) Here are two related histories to think about. This is the same history, but the revert and re-do happens on both sides (and this is actually the setup in the new test): ---o---o---M---o---o---W---o---o---M---o--- branch \ o---o---M---o---o---W---o---o---M---o--- master There we really _do_ want to clear out both M's (really, all four), because we're doing so _from both sides_. And that is ultimately the point of the cherry-pick option: show commits that were picked from one side to the other. Another interesting case is when the first "M" is actually bundled into another commit, like: ---o---o---M+X---o---o---W---o---o---M---o--- branch \ o---o---o--M---o--- master where "M+X" has the changes from "M" plus some other changes. It will get a different patch-id, and --cherry-pick would show both M+X and W, but not M for either side. This is a limitation of the patch-id concept: we are looking at whole commits, not overall changes. I suspect for most operations we care less about "remove all cherry-picks from both sides", but rather "give me one side, omitting commits that are already on the other side" (because you want to rebase or cherry-pick some subset). -Peff ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] patch-ids: handle duplicate hashmap entries 2021-01-13 12:59 ` Jeff King @ 2021-01-13 20:21 ` Junio C Hamano 2021-01-13 20:33 ` Jeff King 0 siblings, 1 reply; 15+ messages in thread From: Junio C Hamano @ 2021-01-13 20:21 UTC (permalink / raw) To: Jeff King; +Cc: Arnaud Morin, git, Kevin Willford Jeff King <peff@peff.net> writes: > Right. There's definitely a question of whether --cherry-pick is the > most useful thing in such a situation, but the current behavior was > simply broken. It did not behave as advertised, and it treated one side > of the history different from the other. So whether we want to do > anything else to help that case, I think this at least makes > --cherry-pick sane. :) Yes; I think "sane" does not always equal to "useful", though ;-) > Here are two related histories to think about. > ... > I suspect for most operations we care less about "remove all > cherry-picks from both sides", but rather "give me one side, omitting > commits that are already on the other side" (because you want to rebase > or cherry-pick some subset). Yes again. It means "--cherry-pick" inherently is not symmetric. One side is the reference side (i.e. mainline), and the goal is to remove from the other side what is in the reference side. What we are seeing in this discussion is that "--cherry-pick" and symmetric difference do not conceptually play well together. But the behaviour with the patch makes the most sense when cherry-pick is applied to a symmetric difference (provided that doing so makes sense in the first place, that is). ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] patch-ids: handle duplicate hashmap entries 2021-01-13 20:21 ` Junio C Hamano @ 2021-01-13 20:33 ` Jeff King 0 siblings, 0 replies; 15+ messages in thread From: Jeff King @ 2021-01-13 20:33 UTC (permalink / raw) To: Junio C Hamano; +Cc: Arnaud Morin, git, Kevin Willford On Wed, Jan 13, 2021 at 12:21:30PM -0800, Junio C Hamano wrote: > > I suspect for most operations we care less about "remove all > > cherry-picks from both sides", but rather "give me one side, omitting > > commits that are already on the other side" (because you want to rebase > > or cherry-pick some subset). > > Yes again. It means "--cherry-pick" inherently is not symmetric. > One side is the reference side (i.e. mainline), and the goal is to > remove from the other side what is in the reference side. > > What we are seeing in this discussion is that "--cherry-pick" and > symmetric difference do not conceptually play well together. > > But the behaviour with the patch makes the most sense when > cherry-pick is applied to a symmetric difference (provided that > doing so makes sense in the first place, that is). I didn't realize --cherry-pick would work without a symmetric difference. The documentation says: Omit any commit that introduces the same change as another commit on the “other side” when the set of commits are limited with symmetric difference. And indeed, I think it silently does nothing with: git rev-list --cherry-pick A..B (because there is nothing on the "other" side to match). So you do need some symmetric difference in order to define the two sets, but you might only want to see one of the sides. And that is basically what --cherry does. But having looked at the implementation of cherry_pick_list(), it is quite happy to swap the sides internally. I guess if we were going to make the output unsymmetric, the first thing would be to stop doing that swap. :) For the specifics of reverts and replays, though, I think the weirdness has nothing to do with left/right, or showing one side. It's the mismatched count. So if we were to make any change, it would be to keep a count of how many times each commit appears on the other side, and cancel them one for one. I.e., this: o--M--W--M--o-- \ o--M--o--- might plausibly want to show "M" once on the upper branch. But: o--M--W--M--o-- \ o--M--W--M--o-- should never show M, I wouldn't think. I'm not sure if that would be violating what existing callers expect from "--cherry-pick", though (i.e., if it would need another name, or an extra option to trigger). -Peff ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] patch-ids: handle duplicate hashmap entries 2021-01-13 9:24 ` Arnaud Morin 2021-01-13 12:59 ` Jeff King @ 2021-01-13 19:28 ` Junio C Hamano 1 sibling, 0 replies; 15+ messages in thread From: Junio C Hamano @ 2021-01-13 19:28 UTC (permalink / raw) To: Arnaud Morin; +Cc: Jeff King, git, Kevin Willford Arnaud Morin <arnaud.morin@gmail.com> writes: > Without this patch, that's even worst, consistency is broken. > Let me explain. > > With your history example: > > ---o---o---M---o---o---W---o---o---M---o--- branch > \ > o---o---o---M---o--- master > > # WITHOUT PATCH > If we imagine that master is having more commits count than branch. > The result of rev-list will be like you described: > $ git rev-list --left-right --cherry-pick branch...master > <M > <W > > In other words, it's showing both W and M. So, at least they cancel out and the reader can tell that the net effect was none --- that is "sort of understandable" result. > BUT, if we imagine now that master is having less commits count than branch. > $ git rev-list --left-right --cherry-pick branch...master > <W > > It's only showing W! Which is what I felt misleading. > # WITH PATCH > With the patch, everything is consistent, and only W is kept in ouptut, > no matter the size of history: > $ git.p rev-list --left-right --cherry-pick branch...master > <W So with the patch, the former case is made the same as the latter misleading result, which would mean that they are consistently misleading? I guess that it is better to be misleading all the time than being "sort of understandable" half the time and misleading the rest of time. At least that is consistent. Thanks. ^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2021-01-13 20:35 UTC | newest] Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2021-01-09 16:24 rev-list with multiple commits sharing same patch-id Arnaud Morin 2021-01-11 7:59 ` Arnaud Morin 2021-01-11 9:54 ` Christian Couder 2021-01-11 18:25 ` Arnaud Morin 2021-01-12 14:17 ` Jeff King 2021-01-12 15:11 ` Jeff King 2021-01-12 15:34 ` Arnaud Morin 2021-01-12 15:52 ` [PATCH] patch-ids: handle duplicate hashmap entries Jeff King 2021-01-12 16:24 ` Arnaud Morin 2021-01-12 19:13 ` Junio C Hamano 2021-01-13 9:24 ` Arnaud Morin 2021-01-13 12:59 ` Jeff King 2021-01-13 20:21 ` Junio C Hamano 2021-01-13 20:33 ` Jeff King 2021-01-13 19:28 ` Junio C Hamano
Code repositories for project(s) associated with this public inbox https://80x24.org/mirrors/git.git This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).