git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jonathan Tan <jonathantanmy@google.com>
To: newren@gmail.com
Cc: jonathantanmy@google.com, git@vger.kernel.org, stolee@gmail.com,
	git@jeffhostetler.com
Subject: Re: [PATCH] rebase --merge: optionally skip upstreamed commits
Date: Tue, 10 Mar 2020 15:56:41 -0700	[thread overview]
Message-ID: <20200310225641.96556-1-jonathantanmy@google.com> (raw)
In-Reply-To: <CABPp-BF17qYVZE6BWEh56QYKGojDG6yNz8QawX14XD8zeR=jig@mail.gmail.com>

> On Mon, Mar 9, 2020 at 1:58 PM Jonathan Tan <jonathantanmy@google.com> wrote:
> >
> > When rebasing against an upstream that has had many commits since the
> > original branch was created:
> >
> >  O -- O -- ... -- O -- O (upstream)
> >   \
> >    -- O (my-dev-branch)
> >
> > because "git rebase" attempts to exclude commits that are duplicates of
> > upstream ones, it must read the contents of every novel upstream commit,
> > in addition to the tip of the upstream and the merge base. This can be a
> > significant performance hit, especially in a partial clone, wherein a
> > read of an object may end up being a fetch.
> 
> Does this suggest that the cherry-pick detection is suboptimal and
> needs to be improved?  When rebasing, it is typical that you are just
> rebasing a small number of patches compared to how many exist
> upstream.  As such, any upstream patch modifying files outside the set
> of files modified on the rebased side is known to not be PATCHSAME
> without looking at those new files.

That's true - and this would drastically reduce the fetches necessary in
partial clone, perhaps enough that we no longer need this check.

In the absence of partial clone, this also might improve performance
sufficiently, such that we no longer need my new option. (Or it might
not.)

> Or is the issue just the sheer
> number of upstream commits that modify only the files also modified on
> the rebased side is large?
> 
> > Add a flag to "git rebase" to allow suppression of this feature. This
> > flag only works when using the "merge" backend.
> 
> Interesting.  A little over a year ago we discussed not only making
> such a change in behavior, but making it the default; see
> https://public-inbox.org/git/nycvar.QRO.7.76.6.1901211635080.41@tvgsbejvaqbjf.bet/
> (from "Ooh, that's interesting" to "VFS for Git").

Thanks for the pointer! Dscho did propose again to make it the default
[1] and I replied that this can be done later [2].

[1] https://lore.kernel.org/git/nycvar.QRO.7.76.6.2003101315100.46@tvgsbejvaqbjf.bet/
[2] https://lore.kernel.org/git/20200310160035.20252-1-jonathantanmy@google.com/

> Yes, I've wanted to kill that performance overhead too even without
> partial clones, though I was slightly worried about cases of a known
> cherry-pick no longer cleanly applying and thus forcing the user to
> detect that it has become empty.  I guess that's why it's a flag
> instead of the new default, but there's something inside of me that
> asks why this special case is detected for the user when other
> conflict cases aren't...  Not sure if I'm just pigeonholing on
> performance too much.

I haven't dug into this, but the email you linked [3] shows that this
behavior was once-upon-a-time relied upon ("For example, when I did not
use GitGitGadget yet to submit patches..."). So I don't think we should
change it.

[3] https://public-inbox.org/git/nycvar.QRO.7.76.6.1901211635080.41@tvgsbejvaqbjf.bet/


> > I've only implemented this for the "merge" backend since I think that
> > there is an effort to migrate "rebase" to use the "merge" backend by
> > default, and also because "merge" uses diff internally which already has
> > the (per-commit) blob batch prefetching.
> 
> I understand the first half of your reason here, but I don't follow
> the second half.  The apply backend uses diff to generate the patches,
> but diff isn't the relevant operation here; it's the rev-list walking,
> and both call the exact same rev-list walk the last time I checked so
> I'm not sure what the difference is here.  Am I misunderstanding one
> or more things?

Maybe just ignore the second half :-)

I thought and wrote the second half because I noticed that somewhere in
the "am"-related code, blobs were being fetched one by one, but no such
thing was happening when I used the "merge" backend. The rev-list
walking doesn't access blobs, I believe.

> > +--skip-already-present::
> > +--no-skip-already-present::
> > +       Skip commits that are already present in the new upstream.
> > +       This is the default.
> > ++
> > +If the skip-if-already-present feature is unnecessary or undesired,
> > +`--no-skip-already-present` may improve performance since it avoids
> > +the need to read the contents of every commit in the new upstream.
> > +
> 
> I'm afraid the naming might be pretty opaque and confusing to users.
> Even if we do keep the names, it might help to be clearer about the
> ramifications.  And there's a missing reference to the option
> incompatibility.  Perhaps something like:
> 
> --skip-cherry-pick-detection
> --no-skip-cherry-pick-detection
> 
>     Whether rebase tries to determine if commits are already present
> upstream, i.e. if there are commits which are cherry-picks.  If such
> detection is done, any commits being rebased which are cherry-picks
> will be dropped, since those commits are already found upstream.  If
> such detection is not done, those commits will be re-applied, which
> most likely will result in no new changes (as the changes are already
> upstream) and result in the commit being dropped anyway.  cherry-pick
> detection is the default, but can be expensive in repos with a large
> number of upstream commits that need to be read.
> 
> See also INCOMPATIBLE OPTIONS below.

I understand that commits being already present in upstream is usually
due to cherry-picking, but I don't think that's always the case, so
perhaps there is some imprecision here. But this might be better - in
particular, documentation and code will not be so clumsy (the "no", or
0, is the status quo, and the lack of "no", or 1, requires special
handling).

> > +       if (!options.skip_already_present && !is_interactive(&options))
> > +               die(_("--no-skip-already-present does not work with the 'am' backend"));
> > +
> 
> with the *apply* backend, not the 'am' one (the backend was renamed in
> commit 10cdb9f38a ("rebase: rename the two primary rebase backends",
> 2020-02-15))

Thanks. Will do.

> Should there be a config setting to flip the default?  And should
> feature.experimental and/or feature.manyFiles enable it by default?

As above, I think this can be done in a separate patch.

  reply	other threads:[~2020-03-10 22:56 UTC|newest]

Thread overview: 78+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-09 20:55 [PATCH] rebase --merge: optionally skip upstreamed commits Jonathan Tan
2020-03-10  2:10 ` Taylor Blau
2020-03-10 15:51   ` Jonathan Tan
2020-03-10 12:17 ` Johannes Schindelin
2020-03-10 16:00   ` Jonathan Tan
2020-03-10 18:56 ` Elijah Newren
2020-03-10 22:56   ` Jonathan Tan [this message]
2020-03-12 18:04     ` Jonathan Tan
2020-03-12 22:40       ` Elijah Newren
2020-03-14  8:04         ` Elijah Newren
2020-03-17  3:03           ` Jonathan Tan
2020-03-18 17:30 ` [PATCH v2] " Jonathan Tan
2020-03-18 18:47   ` Junio C Hamano
2020-03-18 19:28     ` Jonathan Tan
2020-03-18 19:55       ` Junio C Hamano
2020-03-18 20:41         ` Elijah Newren
2020-03-18 23:39           ` Junio C Hamano
2020-03-19  0:17             ` Elijah Newren
2020-03-18 20:20   ` Junio C Hamano
2020-03-26 17:50   ` Jonathan Tan
2020-03-26 19:17     ` Elijah Newren
2020-03-26 19:27     ` Junio C Hamano
2020-03-29 10:12   ` [PATCH v2 4/4] t3402: use POSIX compliant regex(7) Đoàn Trần Công Danh
2020-03-30  4:06 ` [PATCH v3] rebase --merge: optionally skip upstreamed commits Jonathan Tan
2020-03-30  5:09   ` Junio C Hamano
2020-03-30  5:22   ` Danh Doan
2020-03-30 12:13   ` Derrick Stolee
2020-03-30 16:49     ` Junio C Hamano
2020-03-30 16:57     ` Jonathan Tan
2020-03-31 11:55       ` Derrick Stolee
2020-03-31 16:27   ` Elijah Newren
2020-03-31 18:34     ` Junio C Hamano
2020-03-31 18:43       ` Junio C Hamano
2020-04-10 22:27     ` Jonathan Tan
2020-04-11  0:06       ` Elijah Newren
2020-04-11  1:11         ` Jonathan Tan
2020-04-11  2:46           ` Elijah Newren
  -- strict thread matches above, loose matches on Subject: below --
2020-03-26  7:35 [PATCH 0/3] add travis job for linux with musl libc Đoàn Trần Công Danh
2020-03-26  7:35 ` [PATCH 1/3] ci: libify logic for usage and checking CI_USER Đoàn Trần Công Danh
2020-03-26  7:35 ` [PATCH 2/3] ci: refactor docker runner script Đoàn Trần Công Danh
2020-03-26 16:06   ` Eric Sunshine
2020-03-28 17:53   ` SZEDER Gábor
2020-03-29  6:36     ` Danh Doan
2020-03-26  7:35 ` [PATCH 3/3] travis: build and test on Linux with musl libc and busybox Đoàn Trần Công Danh
2020-03-29  5:49 ` [PATCH 0/3] add travis job for linux with musl libc Junio C Hamano
2020-03-29 10:12 ` [PATCH v2 0/4] Travis + Azure jobs " Đoàn Trần Công Danh
2020-03-29 10:12   ` [PATCH v2 1/4] ci: libify logic for usage and checking CI_USER Đoàn Trần Công Danh
2020-03-29 10:12   ` [PATCH v2 2/4] ci: refactor docker runner script Đoàn Trần Công Danh
2020-04-01 21:51     ` SZEDER Gábor
2020-03-29 10:12   ` [PATCH v2 3/4] travis: build and test on Linux with musl libc and busybox Đoàn Trần Công Danh
2020-04-01 22:18     ` SZEDER Gábor
2020-04-02  1:42       ` Danh Doan
2020-04-07 14:53       ` Johannes Schindelin
2020-04-07 21:35         ` Junio C Hamano
2020-04-10 13:38           ` Johannes Schindelin
2020-03-29 16:23   ` [PATCH v2 0/4] Travis + Azure jobs for linux with musl libc Junio C Hamano
2020-04-02 13:03   ` [PATCH v3 0/6] " Đoàn Trần Công Danh
2020-04-02 13:04     ` [PATCH v3 1/6] ci: make MAKEFLAGS available inside the Docker container in the Linux32 job Đoàn Trần Công Danh
2020-04-02 13:04     ` [PATCH v3 2/6] ci/lib-docker: preserve required environment variables Đoàn Trần Công Danh
2020-04-03  8:22       ` SZEDER Gábor
2020-04-03 10:09         ` Danh Doan
2020-04-03 19:55           ` SZEDER Gábor
2020-04-02 13:04     ` [PATCH v3 3/6] ci/linux32: parameterise command to switch arch Đoàn Trần Công Danh
2020-04-02 13:04     ` [PATCH v3 4/6] ci: refactor docker runner script Đoàn Trần Công Danh
2020-04-02 13:04     ` [PATCH v3 5/6] ci/linux32: libify install-dependencies step Đoàn Trần Công Danh
2020-04-02 13:04     ` [PATCH v3 6/6] travis: build and test on Linux with musl libc and busybox Đoàn Trần Công Danh
2020-04-02 17:53     ` [PATCH v3 0/6] Travis + Azure jobs for linux with musl libc Junio C Hamano
2020-04-03  0:23       ` Danh Doan
2020-04-04  1:08   ` [PATCH v4 0/6] Travis " Đoàn Trần Công Danh
2020-04-04  1:08     ` [PATCH v4 1/6] ci: make MAKEFLAGS available inside the Docker container in the Linux32 job Đoàn Trần Công Danh
2020-04-04  1:08     ` [PATCH v4 2/6] ci/lib-docker: preserve required environment variables Đoàn Trần Công Danh
2020-04-04  1:08     ` [PATCH v4 3/6] ci/linux32: parameterise command to switch arch Đoàn Trần Công Danh
2020-04-04  1:08     ` [PATCH v4 4/6] ci: refactor docker runner script Đoàn Trần Công Danh
2020-04-04  1:08     ` [PATCH v4 5/6] ci/linux32: libify install-dependencies step Đoàn Trần Công Danh
2020-04-04  1:08     ` [PATCH v4 6/6] travis: build and test on Linux with musl libc and busybox Đoàn Trần Công Danh
2020-04-05 20:39     ` [PATCH v4 0/6] Travis jobs for linux with musl libc Junio C Hamano
2020-04-07 14:55       ` Johannes Schindelin
2020-04-07 19:25         ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200310225641.96556-1-jonathantanmy@google.com \
    --to=jonathantanmy@google.com \
    --cc=git@jeffhostetler.com \
    --cc=git@vger.kernel.org \
    --cc=newren@gmail.com \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).