git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Derrick Stolee <stolee@gmail.com>
To: Jonathan Tan <jonathantanmy@google.com>, git@vger.kernel.org
Cc: congdanhqx@gmail.com, newren@gmail.com, gitster@pobox.com
Subject: Re: [PATCH v3] rebase --merge: optionally skip upstreamed commits
Date: Mon, 30 Mar 2020 08:13:32 -0400	[thread overview]
Message-ID: <e917f451-c00a-c819-7f78-888ba6e8f5ea@gmail.com> (raw)
In-Reply-To: <20200330040621.13701-1-jonathantanmy@google.com>

On 3/30/2020 12:06 AM, Jonathan Tan wrote:
> When rebasing against an upstream that has had many commits since the
> original branch was created:
> 
>  O -- O -- ... -- O -- O (upstream)
>   \
>    -- O (my-dev-branch)
> 
> it must read the contents of every novel upstream commit, in addition to
> the tip of the upstream and the merge base, because "git rebase"
> attempts to exclude commits that are duplicates of upstream ones. This
> can be a significant performance hit, especially in a partial clone,
> wherein a read of an object may end up being a fetch.
> 
> Add a flag to "git rebase" to allow suppression of this feature. This
> flag only works when using the "merge" backend.

So this is the behavior that already exists, and you are providing a way
to suppress it. However, you also change the default in this patch, which
may surprise users expecting this behavior to continue.

> This flag changes the behavior of sequencer_make_script(), called from
> do_interactive_rebase() <- run_rebase_interactive() <-
> run_specific_rebase() <- cmd_rebase(). With this flag, limit_list()
> (indirectly called from sequencer_make_script() through
> prepare_revision_walk()) will no longer call cherry_pick_list(), and
> thus PATCHSAME is no longer set. Refraining from setting PATCHSAME both
> means that the intermediate commits in upstream are no longer read (as
> shown by the test) and means that no PATCHSAME-caused skipping of
> commits is done by sequencer_make_script(), either directly or through
> make_script_with_merges().
> 
> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
> This commit contains Junio's sign-off because I based it on
> jt/rebase-allow-duplicate.
> 
> This does not include the fix by Đoàn Trần Công Danh. If we want all
> commits to pass all tests (whether run by Busybox or not) it seems like
> we should squash that patch instead of having it as a separate commit.
> If we do squash, maybe include a "Helped-by" with Đoàn Trần Công Danh's
> name.
> 
> Junio wrote [1]:
> 
>> Sounds much better to me.  I do not mind --[no-]keep-cherry-pick,
>> either, by the way.  I know I raised the possibility of having to
>> make it non-bool later, but since then I haven't thought of a good
>> third option myself anyway, so...
> 
> In that case, I think it's better to stick to bool. This also means that
> the change from the version in jt/rebase-allow-duplicate is very small,
> hopefully aiding reviewers - mostly a replacement of
> --skip-cherry-pick-detection with --keep-cherry-pick (which mean the
> same thing).
> 
> [1] https://lore.kernel.org/git/xmqq4kuakjcn.fsf@gitster.c.googlers.com/
> ---
>  Documentation/git-rebase.txt | 21 +++++++++-
>  builtin/rebase.c             |  7 ++++
>  sequencer.c                  |  3 +-
>  sequencer.h                  |  2 +-
>  t/t3402-rebase-merge.sh      | 77 ++++++++++++++++++++++++++++++++++++
>  5 files changed, 107 insertions(+), 3 deletions(-)
> 
> diff --git a/Documentation/git-rebase.txt b/Documentation/git-rebase.txt
> index 0c4f038dd6..f4f8afeb9a 100644
> --- a/Documentation/git-rebase.txt
> +++ b/Documentation/git-rebase.txt
> @@ -318,6 +318,21 @@ See also INCOMPATIBLE OPTIONS below.
>  +
>  See also INCOMPATIBLE OPTIONS below.
>  
> +--keep-cherry-pick::
> +--no-keep-cherry-pick::

I noticed that this _could_ have been simplified to

	--[no-]keep-cherry-pick::

but I also see several uses of either in our documentation. Do we
have a preference? By inspecting the lines before a "no-" string,
I see that some have these two lines, some use the [no-] pattern,
and others highlight the --no-<option> flag completely separately.

> +	Control rebase's behavior towards commits in the working
> +	branch that are already present upstream, i.e. cherry-picks.

I think the "already present upstream" is misleading. We don't rebase
things that are _reachable_ already, but this is probably better as

	Specify if rebase should include commits in the working branch
	that have diffs equivalent to other commits upstream. For example,
	a cherry-picked commit has an equivalent diff.

> ++
> +By default, these commits will be dropped. Because this necessitates
> +reading all upstream commits, this can be expensive in repos with a
> +large number of upstream commits that need to be read.

Now I'm confused. Are they dropped by default? Which option does what?
--keep-cherry-pick makes me think that cherry-picked commits will come
along for the rebase, so we will not check for them. But you have documented
that --no-keep-cherry-pick is the default.

(Also, I keep writing "--[no-]keep-cherry-picks" (plural) because that
seems more natural to me. Then I go back and fix it when I notice.)

> ++
> +If `--keep-cherry-pick is given`, all commits (including these) will be

Bad tick marks: "`--keep-cherry-pick` is given"

> +re-applied. This allows rebase to forgo reading all upstream commits,
> +potentially improving performance.

This reasoning is good. Could you also introduce a config option to make
--keep-cherry-pick the default? I would like to enable that option by
default in Scalar, but could also see partial clones wanting to enable that
by default, too.

> ++
> +See also INCOMPATIBLE OPTIONS below.
> +

Could we just say that his only applies with the --merge option? Why require
the jump to the end of the options section? (After writing this, I go look
at the rest of the doc file and see this is a common pattern.)

>  --rerere-autoupdate::
>  --no-rerere-autoupdate::
>  	Allow the rerere mechanism to update the index with the
> @@ -568,6 +583,9 @@ In addition, the following pairs of options are incompatible:
>   * --keep-base and --onto
>   * --keep-base and --root
>  
> +Also, the --keep-cherry-pick option requires the use of the merge backend
> +(e.g., through --merge).
> +

Will the command _fail_ if someone says --keep-cherry-pick without the merge
backend, or just have no effect? Also, specify the option with ticks and

	`--[no-]keep-cherry-pick`

It seems that none of the options in this section are back-ticked, which I think
is a doc bug.

>  BEHAVIORAL DIFFERENCES
>  -----------------------
>  
> @@ -866,7 +884,8 @@ Only works if the changes (patch IDs based on the diff contents) on
>  'subsystem' did.
>  
>  In that case, the fix is easy because 'git rebase' knows to skip
> -changes that are already present in the new upstream.  So if you say
> +changes that are already present in the new upstream (unless
> +`--keep-cherry-pick` is given). So if you say
>  (assuming you're on 'topic')
>  ------------
>      $ git rebase subsystem
> diff --git a/builtin/rebase.c b/builtin/rebase.c
> index 8081741f8a..626549b0b2 100644
> --- a/builtin/rebase.c
> +++ b/builtin/rebase.c
> @@ -88,6 +88,7 @@ struct rebase_options {
>  	struct strbuf git_format_patch_opt;
>  	int reschedule_failed_exec;
>  	int use_legacy_rebase;
> +	int keep_cherry_pick;
>  };
>  
>  #define REBASE_OPTIONS_INIT {			  	\
> @@ -381,6 +382,7 @@ static int run_rebase_interactive(struct rebase_options *opts,
>  	flags |= opts->rebase_cousins > 0 ? TODO_LIST_REBASE_COUSINS : 0;
>  	flags |= opts->root_with_onto ? TODO_LIST_ROOT_WITH_ONTO : 0;
>  	flags |= command == ACTION_SHORTEN_OIDS ? TODO_LIST_SHORTEN_IDS : 0;
> +	flags |= opts->keep_cherry_pick ? TODO_LIST_KEEP_CHERRY_PICK : 0;

Since opts->keep_cherry_pick is initialized as zero, did you change the default
behavior? Do we not have a test that verifies this behavior when using the merge
backend an no "--keep-cherry-pick" option?

If you initialize it to -1, then you can tell if the --no-keep-cherry-pick option
is specified, which is relevant to my concern below.

>  
>  	switch (command) {
>  	case ACTION_NONE: {
> @@ -1515,6 +1517,8 @@ int cmd_rebase(int argc, const char **argv, const char *prefix)
>  		OPT_BOOL(0, "reschedule-failed-exec",
>  			 &reschedule_failed_exec,
>  			 N_("automatically re-schedule any `exec` that fails")),
> +		OPT_BOOL(0, "keep-cherry-pick", &options.keep_cherry_pick,
> +			 N_("apply all changes, even those already present upstream")),
>  		OPT_END(),
>  	};
>  	int i;
> @@ -1848,6 +1852,9 @@ int cmd_rebase(int argc, const char **argv, const char *prefix)
>  			      "interactive or merge options"));
>  	}
>  
> +	if (options.keep_cherry_pick && !is_interactive(&options))
> +		die(_("--keep-cherry-pick does not work with the 'apply' backend"));
> +

I see you are failing here. Is this the right decision?

The apply backend will "keep" cherry-picks because it will not look for them upstream.
If anything, shouldn't it be that "--no-keep-cherry-pick" is incompatible?

>  	if (options.signoff) {
>  		if (options.type == REBASE_PRESERVE_MERGES)
>  			die("cannot combine '--signoff' with "
> diff --git a/sequencer.c b/sequencer.c
> index b9dbf1adb0..7bbb63f444 100644
> --- a/sequencer.c
> +++ b/sequencer.c
> @@ -4800,12 +4800,13 @@ int sequencer_make_script(struct repository *r, struct strbuf *out, int argc,
>  	int keep_empty = flags & TODO_LIST_KEEP_EMPTY;
>  	const char *insn = flags & TODO_LIST_ABBREVIATE_CMDS ? "p" : "pick";
>  	int rebase_merges = flags & TODO_LIST_REBASE_MERGES;
> +	int keep_cherry_pick = flags & TODO_LIST_KEEP_CHERRY_PICK;
>  
>  	repo_init_revisions(r, &revs, NULL);
>  	revs.verbose_header = 1;
>  	if (!rebase_merges)
>  		revs.max_parents = 1;
> -	revs.cherry_mark = 1;
> +	revs.cherry_mark = !keep_cherry_pick;
>  	revs.limited = 1;
>  	revs.reverse = 1;
>  	revs.right_only = 1;
> diff --git a/sequencer.h b/sequencer.h
> index 9f9ae291e3..298b7de1c8 100644
> --- a/sequencer.h
> +++ b/sequencer.h
> @@ -148,7 +148,7 @@ int sequencer_remove_state(struct replay_opts *opts);
>   * `--onto`, we do not want to re-generate the root commits.
>   */
>  #define TODO_LIST_ROOT_WITH_ONTO (1U << 6)
> -
> +#define TODO_LIST_KEEP_CHERRY_PICK (1U << 7)
>  
>  int sequencer_make_script(struct repository *r, struct strbuf *out, int argc,
>  			  const char **argv, unsigned flags);
> diff --git a/t/t3402-rebase-merge.sh b/t/t3402-rebase-merge.sh
> index a1ec501a87..64200c5f20 100755
> --- a/t/t3402-rebase-merge.sh
> +++ b/t/t3402-rebase-merge.sh
> @@ -162,4 +162,81 @@ test_expect_success 'rebase --skip works with two conflicts in a row' '
>  	git rebase --skip
>  '
>  
> +test_expect_success '--keep-cherry-pick' '
> +	git init repo &&
> +
> +	# O(1-10) -- O(1-11) -- O(0-10) master
> +	#        \
> +	#         -- O(1-11) -- O(1-12) otherbranch
> +
> +	printf "Line %d\n" $(test_seq 1 10) >repo/file.txt &&
> +	git -C repo add file.txt &&
> +	git -C repo commit -m "base commit" &&
> +
> +	printf "Line %d\n" $(test_seq 1 11) >repo/file.txt &&
> +	git -C repo commit -a -m "add 11" &&
> +
> +	printf "Line %d\n" $(test_seq 0 10) >repo/file.txt &&
> +	git -C repo commit -a -m "add 0 delete 11" &&
> +
> +	git -C repo checkout -b otherbranch HEAD^^ &&
> +	printf "Line %d\n" $(test_seq 1 11) >repo/file.txt &&
> +	git -C repo commit -a -m "add 11 in another branch" &&
> +
> +	printf "Line %d\n" $(test_seq 1 12) >repo/file.txt &&
> +	git -C repo commit -a -m "add 12 in another branch" &&
> +
> +	# Regular rebase fails, because the 1-11 commit is deduplicated
> +	test_must_fail git -C repo rebase --merge master 2> err &&
> +	test_i18ngrep "error: could not apply.*add 12 in another branch" err &&
> +	git -C repo rebase --abort &&

OK. So here you are demonstrating that the --no-keep-cherry-pick is the
new default. Just trying to be sure that this was intended.

> +
> +	# With --keep-cherry-pick, it works
> +	git -C repo rebase --merge --keep-cherry-pick master
> +'
> +
> +test_expect_success '--keep-cherry-pick refrains from reading unneeded blobs' '
> +	git init server &&
> +
> +	# O(1-10) -- O(1-11) -- O(1-12) master
> +	#        \
> +	#         -- O(0-10) otherbranch
> +
> +	printf "Line %d\n" $(test_seq 1 10) >server/file.txt &&
> +	git -C server add file.txt &&
> +	git -C server commit -m "merge base" &&
> +
> +	printf "Line %d\n" $(test_seq 1 11) >server/file.txt &&
> +	git -C server commit -a -m "add 11" &&
> +
> +	printf "Line %d\n" $(test_seq 1 12) >server/file.txt &&
> +	git -C server commit -a -m "add 12" &&
> +
> +	git -C server checkout -b otherbranch HEAD^^ &&
> +	printf "Line %d\n" $(test_seq 0 10) >server/file.txt &&
> +	git -C server commit -a -m "add 0" &&
> +
> +	test_config -C server uploadpack.allowfilter 1 &&
> +	test_config -C server uploadpack.allowanysha1inwant 1 &&
> +
> +	git clone --filter=blob:none "file://$(pwd)/server" client &&
> +	git -C client checkout origin/master &&
> +	git -C client checkout origin/otherbranch &&
> +
> +	# Sanity check to ensure that the blobs from the merge base and "add
> +	# 11" are missing
> +	git -C client rev-list --objects --all --missing=print >missing_list &&
> +	MERGE_BASE_BLOB=$(git -C server rev-parse master^^:file.txt) &&
> +	ADD_11_BLOB=$(git -C server rev-parse master^:file.txt) &&
> +	grep "\\?$MERGE_BASE_BLOB" missing_list &&
> +	grep "\\?$ADD_11_BLOB" missing_list &&
> +
> +	git -C client rebase --merge --keep-cherry-pick origin/master &&
> +
> +	# The blob from the merge base had to be fetched, but not "add 11"
> +	git -C client rev-list --objects --all --missing=print >missing_list &&
> +	! grep "\\?$MERGE_BASE_BLOB" missing_list &&
> +	grep "\\?$ADD_11_BLOB" missing_list
> +'

I appreciate this test showing that this is accomplishing the goal in
a partial clone. 

Thanks,
-Stolee



  parent reply	other threads:[~2020-03-30 12:13 UTC|newest]

Thread overview: 78+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-09 20:55 [PATCH] rebase --merge: optionally skip upstreamed commits Jonathan Tan
2020-03-10  2:10 ` Taylor Blau
2020-03-10 15:51   ` Jonathan Tan
2020-03-10 12:17 ` Johannes Schindelin
2020-03-10 16:00   ` Jonathan Tan
2020-03-10 18:56 ` Elijah Newren
2020-03-10 22:56   ` Jonathan Tan
2020-03-12 18:04     ` Jonathan Tan
2020-03-12 22:40       ` Elijah Newren
2020-03-14  8:04         ` Elijah Newren
2020-03-17  3:03           ` Jonathan Tan
2020-03-18 17:30 ` [PATCH v2] " Jonathan Tan
2020-03-18 18:47   ` Junio C Hamano
2020-03-18 19:28     ` Jonathan Tan
2020-03-18 19:55       ` Junio C Hamano
2020-03-18 20:41         ` Elijah Newren
2020-03-18 23:39           ` Junio C Hamano
2020-03-19  0:17             ` Elijah Newren
2020-03-18 20:20   ` Junio C Hamano
2020-03-26 17:50   ` Jonathan Tan
2020-03-26 19:17     ` Elijah Newren
2020-03-26 19:27     ` Junio C Hamano
2020-03-29 10:12   ` [PATCH v2 4/4] t3402: use POSIX compliant regex(7) Đoàn Trần Công Danh
2020-03-30  4:06 ` [PATCH v3] rebase --merge: optionally skip upstreamed commits Jonathan Tan
2020-03-30  5:09   ` Junio C Hamano
2020-03-30  5:22   ` Danh Doan
2020-03-30 12:13   ` Derrick Stolee [this message]
2020-03-30 16:49     ` Junio C Hamano
2020-03-30 16:57     ` Jonathan Tan
2020-03-31 11:55       ` Derrick Stolee
2020-03-31 16:27   ` Elijah Newren
2020-03-31 18:34     ` Junio C Hamano
2020-03-31 18:43       ` Junio C Hamano
2020-04-10 22:27     ` Jonathan Tan
2020-04-11  0:06       ` Elijah Newren
2020-04-11  1:11         ` Jonathan Tan
2020-04-11  2:46           ` Elijah Newren
  -- strict thread matches above, loose matches on Subject: below --
2020-03-26  7:35 [PATCH 0/3] add travis job for linux with musl libc Đoàn Trần Công Danh
2020-03-26  7:35 ` [PATCH 1/3] ci: libify logic for usage and checking CI_USER Đoàn Trần Công Danh
2020-03-26  7:35 ` [PATCH 2/3] ci: refactor docker runner script Đoàn Trần Công Danh
2020-03-26 16:06   ` Eric Sunshine
2020-03-28 17:53   ` SZEDER Gábor
2020-03-29  6:36     ` Danh Doan
2020-03-26  7:35 ` [PATCH 3/3] travis: build and test on Linux with musl libc and busybox Đoàn Trần Công Danh
2020-03-29  5:49 ` [PATCH 0/3] add travis job for linux with musl libc Junio C Hamano
2020-03-29 10:12 ` [PATCH v2 0/4] Travis + Azure jobs " Đoàn Trần Công Danh
2020-03-29 10:12   ` [PATCH v2 1/4] ci: libify logic for usage and checking CI_USER Đoàn Trần Công Danh
2020-03-29 10:12   ` [PATCH v2 2/4] ci: refactor docker runner script Đoàn Trần Công Danh
2020-04-01 21:51     ` SZEDER Gábor
2020-03-29 10:12   ` [PATCH v2 3/4] travis: build and test on Linux with musl libc and busybox Đoàn Trần Công Danh
2020-04-01 22:18     ` SZEDER Gábor
2020-04-02  1:42       ` Danh Doan
2020-04-07 14:53       ` Johannes Schindelin
2020-04-07 21:35         ` Junio C Hamano
2020-04-10 13:38           ` Johannes Schindelin
2020-03-29 16:23   ` [PATCH v2 0/4] Travis + Azure jobs for linux with musl libc Junio C Hamano
2020-04-02 13:03   ` [PATCH v3 0/6] " Đoàn Trần Công Danh
2020-04-02 13:04     ` [PATCH v3 1/6] ci: make MAKEFLAGS available inside the Docker container in the Linux32 job Đoàn Trần Công Danh
2020-04-02 13:04     ` [PATCH v3 2/6] ci/lib-docker: preserve required environment variables Đoàn Trần Công Danh
2020-04-03  8:22       ` SZEDER Gábor
2020-04-03 10:09         ` Danh Doan
2020-04-03 19:55           ` SZEDER Gábor
2020-04-02 13:04     ` [PATCH v3 3/6] ci/linux32: parameterise command to switch arch Đoàn Trần Công Danh
2020-04-02 13:04     ` [PATCH v3 4/6] ci: refactor docker runner script Đoàn Trần Công Danh
2020-04-02 13:04     ` [PATCH v3 5/6] ci/linux32: libify install-dependencies step Đoàn Trần Công Danh
2020-04-02 13:04     ` [PATCH v3 6/6] travis: build and test on Linux with musl libc and busybox Đoàn Trần Công Danh
2020-04-02 17:53     ` [PATCH v3 0/6] Travis + Azure jobs for linux with musl libc Junio C Hamano
2020-04-03  0:23       ` Danh Doan
2020-04-04  1:08   ` [PATCH v4 0/6] Travis " Đoàn Trần Công Danh
2020-04-04  1:08     ` [PATCH v4 1/6] ci: make MAKEFLAGS available inside the Docker container in the Linux32 job Đoàn Trần Công Danh
2020-04-04  1:08     ` [PATCH v4 2/6] ci/lib-docker: preserve required environment variables Đoàn Trần Công Danh
2020-04-04  1:08     ` [PATCH v4 3/6] ci/linux32: parameterise command to switch arch Đoàn Trần Công Danh
2020-04-04  1:08     ` [PATCH v4 4/6] ci: refactor docker runner script Đoàn Trần Công Danh
2020-04-04  1:08     ` [PATCH v4 5/6] ci/linux32: libify install-dependencies step Đoàn Trần Công Danh
2020-04-04  1:08     ` [PATCH v4 6/6] travis: build and test on Linux with musl libc and busybox Đoàn Trần Công Danh
2020-04-05 20:39     ` [PATCH v4 0/6] Travis jobs for linux with musl libc Junio C Hamano
2020-04-07 14:55       ` Johannes Schindelin
2020-04-07 19:25         ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=e917f451-c00a-c819-7f78-888ba6e8f5ea@gmail.com \
    --to=stolee@gmail.com \
    --cc=congdanhqx@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jonathantanmy@google.com \
    --cc=newren@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).