From: Jonathan Tan <jonathantanmy@google.com>
To: gitster@pobox.com
Cc: jonathantanmy@google.com, git@vger.kernel.org, me@ttaylorr.com,
	Johannes.Schindelin@gmx.de, newren@gmail.com
Subject: Re: [PATCH v2] rebase --merge: optionally skip upstreamed commits
Date: Wed, 18 Mar 2020 12:28:21 -0700
Message-ID: <20200318192821.43808-1-jonathantanmy@google.com>
In-Reply-To: <xmqqpnd9fql0.fsf@gitster.c.googlers.com>

> Jonathan Tan <jonathantanmy@google.com> writes:
> 
> > When rebasing against an upstream that has had many commits since the
> > original branch was created:
> >
> >  O -- O -- ... -- O -- O (upstream)
> >   \
> >    -- O (my-dev-branch)
> >
> > it must read the contents of every novel upstream commit, in addition to
> > the tip of the upstream and the merge base, because "git rebase"
> > attempts to exclude commits that are duplicates of upstream ones. This
> > can be a significant performance hit, especially in a partial clone,
> > wherein a read of an object may end up being a fetch.
> 
> OK.  I presume that we do this by comparing patch IDs?

Yes.

> Total disabling would of course be OK as a feature, especially for
> the first cut, but I wonder if it would be a reasonable idea to use
> some heuristic to keep the current "filter the same change" feature
> as much as possible but optimize it by filtering the novel upstream
> commits without hitting their trees and blobs (I am assuming that
> you at least are aware of and have the commit objects on the
> upstream side).
> 
> The most false-negative-prone approach is just to compare the
> <author ident, author timestamp> of a candidate upstream commit with
> what you have---if that author does not appear on my-dev-branch, it
> is very unlikely that your change has been accepted upstream.  Of
> course, two people independently discovering the same solution is
> not all that rare, so taking too few clues from the commits to
> compare does risk false negatives, but at least it is not worse than
> what you are proposing here ;-)  And if one of your commits on
> my-dev-branch _might_ be identical to one of the novel upstream ones,
> at that point, we could dig deeper to actually compute the patch ID
> by fetching the upstream's tree.

As far as I know, the existing patch ID behavior is only based on the
patch contents, so if there was any author name or time rewriting (or if
two people independently discovered the same solution, as you wrote),
then the behavior would be different. Apart from that, this does sound
like a cheap thing to compare before comparing the diff.
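
To make the idea concrete, here is a rough sketch of such a two-stage
check. It is not what "git rebase" does today; the Commit tuple, the
upstreamed_commits() name and the patch_id callback are all made up for
illustration, with patch_id standing in for the expensive step that has
to read trees and blobs.

```python
from typing import Callable, Iterable, NamedTuple, Set


class Commit(NamedTuple):
    oid: str
    author_ident: str   # e.g. "A U Thor <author@example.com>"
    author_time: int    # author timestamp, seconds since the epoch


def upstreamed_commits(
    local: Iterable[Commit],
    upstream: Iterable[Commit],
    patch_id: Callable[[str], str],
) -> Set[str]:
    """Return oids of local commits that look already applied upstream.

    Hypothetical sketch: patch_id(oid) is assumed to be the expensive
    operation, so it is only called for commits that pass the cheap
    <author ident, author timestamp> comparison.
    """
    local = list(local)
    local_keys = {(c.author_ident, c.author_time) for c in local}

    # Cheap pass: needs only the commit objects, not trees or blobs.
    candidates = [
        u for u in upstream
        if (u.author_ident, u.author_time) in local_keys
    ]
    candidate_keys = {(u.author_ident, u.author_time) for u in candidates}

    # Expensive pass: compute patch IDs only for the surviving pairs.
    upstream_ids = {patch_id(u.oid) for u in candidates}
    return {
        c.oid
        for c in local
        if (c.author_ident, c.author_time) in candidate_keys
        and patch_id(c.oid) in upstream_ids
    }
```

Note that an upstream commit with the same patch ID but a rewritten
author or timestamp is never passed to patch_id here, which is the
behavior difference I mentioned above.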

Elijah Newren suggested and I investigated another approach of using a
filename-only diff as a first approximation. The relevant quotations and
explanations are in my email here [1].

[1] https://lore.kernel.org/git/20200312180427.192096-1-jonathantanmy@google.com/
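
As a rough illustration of the filename-only idea (not the code from
that email), the sketch below pairs each local commit only with upstream
commits that touch exactly the same set of paths. The function name and
the dict-of-path-sets input are assumptions for the example; the path
sets are assumed to come from something like a name-only diff without
rename detection, which needs trees but not blob contents.

```python
from typing import Dict, FrozenSet, List


def candidates_by_paths(
    local_paths: Dict[str, FrozenSet[str]],
    upstream_paths: Dict[str, FrozenSet[str]],
) -> Dict[str, List[str]]:
    """Map each local oid to the upstream oids worth a full patch-ID check.

    Hypothetical sketch: two commits are treated as possible duplicates
    only if they change exactly the same set of paths.
    """
    # Index upstream commits by the set of paths they change.
    by_paths: Dict[FrozenSet[str], List[str]] = {}
    for oid, paths in upstream_paths.items():
        by_paths.setdefault(paths, []).append(oid)

    # A local commit with no upstream commit touching the same paths
    # gets an empty list and needs no further comparison.
    return {
        oid: list(by_paths.get(paths, []))
        for oid, paths in local_paths.items()
    }
```

Only the pairs that survive this step would then need the real patch ID
comparison.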

> That's all totally outside the scope of this patch.  It is just a
> random thought to see if anybody wants to pursue it to make the topic
> even better, possibly after it lands.

OK.

> > New in V2: changed parameter name, used Taylor's commit message
> > suggestions, and used Elijah's documentation suggestions.
> 
> Hmph, what was it called earlier?  My gut reaction without much
> thinking finds --no-skip-* a bit confusing double-negation and
> suspect "--[no-]detect-cherry-pick" (which defaults to true for
> backward compatibility) may feel more natural, but I suspect (I do
> not recall details of the discussion on v1) it has been already
> discussed and people found --no-skip-* is OK (in which case I won't
> object)?

It was earlier called "--{,no-}skip-already-present" (with the opposite
meaning, and thus, --skip-already-present is the default), so the double
negative has always existed. "--detect-cherry-pick" might be a better
idea...I'll wait to see if anybody else has an opinion.

> I also wonder if --detect-cherry-pick=(yes|no|auto) may give a
> better end-user experience, with "auto" meaning "do run patch-ID
> based filtering, but if we know it will be expensive (e.g. the
> repository is sparsely cloned), please skip it".  That way, if other
> reasons appear that make patch-ID computation expensive, now or in
> the future, the users are automatically covered.

It might be better to have predictability, and for "auto", I don't know
if we can have a simple and explainable set of rules as to when to use
patch-ID-based filtering - for example, in a partial clone with no
blobs, I would normally want no patch-ID-based filtering, but in a
partial clone with only a blob size limit, I probably will still want
patch-ID-based filtering.
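
To show what I mean, here is one possible rule set for "auto", written
out as a sketch. The function name is hypothetical, and the rules are
only my guesses at reasonable defaults; the input is assumed to be the
partial clone filter spec (e.g. "blob:none" or "blob:limit=1m"), or None
for a full clone.

```python
from typing import Optional


def auto_wants_patch_id_filtering(filter_spec: Optional[str]) -> bool:
    """Decide whether "auto" should run patch-ID-based filtering.

    Hypothetical sketch of one rule set, not an agreed-upon design.
    """
    if filter_spec is None:
        # Full clone: every object is local, so filtering is cheap.
        return True
    if filter_spec == "blob:none":
        # No blobs locally: each patch ID may trigger a fetch.
        return False
    if filter_spec.startswith("blob:limit="):
        # Only large blobs are missing: most diffs stay local.
        return True
    if filter_spec.startswith("tree:"):
        # Trees may be missing too, so even listing paths may fetch.
        return False
    # Anything else (e.g. combined filters): an arbitrary choice.
    return True
```

Even this small version needs a separate case per filter type and an
arbitrary fallback, which is why I am not sure the rules would stay
simple and explainable.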
