From: Arnaud Morin <arnaud.morin@gmail.com>
To: Jeff King <peff@peff.net>
Cc: git@vger.kernel.org, Kevin Willford <kewillf@microsoft.com>
Subject: Re: [PATCH] patch-ids: handle duplicate hashmap entries
Date: Tue, 12 Jan 2021 16:24:25 +0000 [thread overview]
Message-ID: <20210112162425.GD32482@sync> (raw)
In-Reply-To: <X/3FwNPHqZqY+hv0@coredump.intra.peff.net>
YAY!
Thanks a lot :)
--
Arnaud Morin
On 12.01.21 - 10:52, Jeff King wrote:
> On Tue, Jan 12, 2021 at 03:34:38PM +0000, Arnaud Morin wrote:
>
> > So instead of:
> > id = has_commit_patch_id(commit, &ids);
> >
> > We should use some sort of iterator, in order to call:
> > id->commit->object.flags |= cherry_flag;
> >
> > for all commits which are related to this patch?
>
> Yep, exactly.
>
> Here it is as a patch. :)
>
> -- >8 --
> Subject: [PATCH] patch-ids: handle duplicate hashmap entries
>
> This fixes a bug introduced in dfb7a1b4d0 (patch-ids: stop using a
> hand-rolled hashmap implementation, 2016-07-29) in which
>
> git rev-list --cherry-pick A...B
>
> will fail to suppress commits reachable from A even if a commit with
> matching patch-id appears in B.
>
> Around the time of that commit, the algorithm for "--cherry-pick" looked
> something like this:
>
> 0. Traverse all of the commits, marking them as being on the left or
> right side of the symmetric difference.
>
> 1. Iterate over the left-hand commits, inserting a patch-id struct for
> each into a hashmap, and pointing commit->util to the patch-id
> struct.
>
> 2. Iterate over the right-hand commits, checking which are present in
> the hashmap. If so, we exclude the commit from the output _and_ we
> mark the patch-id as "seen".
>
> 3. Iterate again over the left-hand commits, checking whether
> commit->util->seen is set; if so, exclude them from the output.
>
> At the end, we'll have eliminated commits from both sides that have a
> matching patch-id on the other side. But there's a subtle assumption
> here: for any given patch-id, we must have exactly one struct
> representing it. If two commits from A both have the same patch-id and
> we allow duplicates in the hashmap, then we run into a problem:
>
> a. In step 1, we insert two patch-id structs into the hashmap.
>
> b. In step 2, our lookups will find only one of these structs, so only
> one "seen" flag is marked.
>
> c. In step 3, one of the commits in A will have its commit->util->seen
> set, but the other will not. We'll erroneously output the latter.
>
> Prior to dfb7a1b4d0, our hashmap did not allow duplicates. Afterwards,
> it used hashmap_add(), which explicitly does allow duplicates.
>
> At that point, the solution would have been easy: when we are about to
> add a duplicate, skip doing so and return the existing entry which
> matches. But it gets more complicated.
>
> In 683f17ec44 (patch-ids: replace the seen indicator with a commit
> pointer, 2016-07-29), our step 3 goes away entirely. Instead, in step 2,
> when the right-hand side finds a matching patch_id from the left-hand
> side, we can directly mark the left-hand patch_id->commit to be omitted.
> Solving that would be easy, too; there's a one-to-many relationship of
> patch-ids to commits, so we just need to keep a list.
>
> But there's more. Commit b3dfeebb92 (rebase: avoid computing unnecessary
> patch IDs, 2016-07-29) built on that by lazily computing the full
> patch-ids. So we don't even know when adding to the hashmap whether two
> commits truly have the same id. We'd have to tentatively assign them a
> list, and then possibly split them apart (possibly into N new structs)
> at the moment we compute the real patch-ids. This could work, but it's
> complicated and error-prone.
>
> Instead, let's accept that we may store duplicates, and teach the lookup
> side to be more clever. Rather than asking for a single matching
> patch-id, it will need to iterate over all matching patch-ids. This does
> mean examining every entry in a single hash bucket, but the worst-case
> for a hash lookup was already doing that.
>
> We'll keep the hashmap details out of the caller by providing a simple
> iteration interface. We can retain the simple has_commit_patch_id()
> interface for the other callers, but we'll simplify its return value
> into an integer, rather than returning the patch_id struct. That way
> they won't be tempted to look at the "commit" field of the return value
> without iterating.
>
> Reported-by: Arnaud Morin <arnaud.morin@gmail.com>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> patch-ids.c | 14 +++++++++++++-
> patch-ids.h | 20 +++++++++++++++++++-
> revision.c | 6 ++++--
> t/t6007-rev-list-cherry-pick-file.sh | 12 ++++++++++++
> 4 files changed, 48 insertions(+), 4 deletions(-)
>
> diff --git a/patch-ids.c b/patch-ids.c
> index 21973e4933..f51021a0cb 100644
> --- a/patch-ids.c
> +++ b/patch-ids.c
> @@ -89,7 +89,7 @@ static int init_patch_id_entry(struct patch_id *patch,
> return 0;
> }
>
> -struct patch_id *has_commit_patch_id(struct commit *commit,
> +struct patch_id *patch_id_iter_first(struct commit *commit,
> struct patch_ids *ids)
> {
> struct patch_id patch;
> @@ -104,6 +104,18 @@ struct patch_id *has_commit_patch_id(struct commit *commit,
> return hashmap_get_entry(&ids->patches, &patch, ent, NULL);
> }
>
> +struct patch_id *patch_id_iter_next(struct patch_id *cur,
> + struct patch_ids *ids)
> +{
> + return hashmap_get_next_entry(&ids->patches, cur, ent);
> +}
> +
> +int has_commit_patch_id(struct commit *commit,
> + struct patch_ids *ids)
> +{
> + return !!patch_id_iter_first(commit, ids);
> +}
> +
> struct patch_id *add_commit_patch_id(struct commit *commit,
> struct patch_ids *ids)
> {
> diff --git a/patch-ids.h b/patch-ids.h
> index 03bb04e707..ab6c6a6804 100644
> --- a/patch-ids.h
> +++ b/patch-ids.h
> @@ -23,7 +23,25 @@ int commit_patch_id(struct commit *commit, struct diff_options *options,
> struct object_id *oid, int, int);
> int init_patch_ids(struct repository *, struct patch_ids *);
> int free_patch_ids(struct patch_ids *);
> +
> +/* Add a patch_id for a single commit to the set. */
> struct patch_id *add_commit_patch_id(struct commit *, struct patch_ids *);
> -struct patch_id *has_commit_patch_id(struct commit *, struct patch_ids *);
> +
> +/* Returns true if the patch-id of "commit" is present in the set. */
> +int has_commit_patch_id(struct commit *commit, struct patch_ids *);
> +
> +/*
> + * Iterate over all commits in the set whose patch id matches that of
> + * "commit", like:
> + *
> + * struct patch_id *cur;
> + * for (cur = patch_id_iter_first(commit, ids);
> + * cur;
> + * cur = patch_id_iter_next(cur, ids) {
> + * ... look at cur->commit
> + * }
> + */
> +struct patch_id *patch_id_iter_first(struct commit *commit, struct patch_ids *);
> +struct patch_id *patch_id_iter_next(struct patch_id *cur, struct patch_ids *);
>
> #endif /* PATCH_IDS_H */
> diff --git a/revision.c b/revision.c
> index 9dff845bed..7ec9b634e3 100644
> --- a/revision.c
> +++ b/revision.c
> @@ -1241,12 +1241,14 @@ static void cherry_pick_list(struct commit_list *list, struct rev_info *revs)
> /*
> * Have we seen the same patch id?
> */
> - id = has_commit_patch_id(commit, &ids);
> + id = patch_id_iter_first(commit, &ids);
> if (!id)
> continue;
>
> commit->object.flags |= cherry_flag;
> - id->commit->object.flags |= cherry_flag;
> + do {
> + id->commit->object.flags |= cherry_flag;
> + } while ((id = patch_id_iter_next(id, &ids)));
> }
>
> free_patch_ids(&ids);
> diff --git a/t/t6007-rev-list-cherry-pick-file.sh b/t/t6007-rev-list-cherry-pick-file.sh
> index f0268372d2..8bf5ae23c2 100755
> --- a/t/t6007-rev-list-cherry-pick-file.sh
> +++ b/t/t6007-rev-list-cherry-pick-file.sh
> @@ -245,6 +245,18 @@ test_expect_success '--count --left-right' '
> test_cmp expect actual
> '
>
> +test_expect_success '--cherry-pick with duplicates on each side' '
> + git checkout -b dup-orig &&
> + test_commit dup-base &&
> + git revert dup-base &&
> + git cherry-pick dup-base &&
> + git checkout -b dup-side HEAD~3 &&
> + test_tick &&
> + git cherry-pick -3 dup-orig &&
> + git rev-list --cherry-pick dup-orig...dup-side >actual &&
> + test_must_be_empty actual
> +'
> +
> # Corrupt the object store deliberately to make sure
> # the object is not even checked for its existence.
> remove_loose_object () {
> --
> 2.30.0.455.gdab6b73766
>
next prev parent reply other threads:[~2021-01-12 16:26 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-01-09 16:24 rev-list with multiple commits sharing same patch-id Arnaud Morin
2021-01-11 7:59 ` Arnaud Morin
2021-01-11 9:54 ` Christian Couder
2021-01-11 18:25 ` Arnaud Morin
2021-01-12 14:17 ` Jeff King
2021-01-12 15:11 ` Jeff King
2021-01-12 15:34 ` Arnaud Morin
2021-01-12 15:52 ` [PATCH] patch-ids: handle duplicate hashmap entries Jeff King
2021-01-12 16:24 ` Arnaud Morin [this message]
2021-01-12 19:13 ` Junio C Hamano
2021-01-13 9:24 ` Arnaud Morin
2021-01-13 12:59 ` Jeff King
2021-01-13 20:21 ` Junio C Hamano
2021-01-13 20:33 ` Jeff King
2021-01-13 19:28 ` Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20210112162425.GD32482@sync \
--to=arnaud.morin@gmail.com \
--cc=git@vger.kernel.org \
--cc=kewillf@microsoft.com \
--cc=peff@peff.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).