git@vger.kernel.org mailing list mirror (one of many)
From: Tom Clarkson <tqclarkson@icloud.com>
To: Ed Maste <emaste@freebsd.org>
Cc: git mailing list <git@vger.kernel.org>
Subject: Re: git-subtree split misbehaviour with a commit having empty ls-tree for the specified subdir
Date: Thu, 18 Jun 2020 11:13:02 +1000
Message-ID: <5CD94CF2-48D4-4EFD-9581-625E6C117F89@icloud.com>
In-Reply-To: <CAPyFy2CMSGwPgGLh2Jbfvuf8oRBcvZ1LRv-m7AVvPybtpEybnw@mail.gmail.com>


> On 18 Jun 2020, at 12:46 am, Ed Maste <emaste@freebsd.org> wrote:
> 
> On Fri, 20 Dec 2019 at 10:56, Ed Maste <emaste@freebsd.org> wrote:
>> 
>> On Wed, 18 Dec 2019 at 19:57, Tom Clarkson <tqclarkson@icloud.com> wrote:
>>> 
>>>> Overall I think your proposed algorithm is reasonable (even though I
>>>> think it won't address some of the cases in our repo). Will your
>>>> algorithm allow us to pass $dir to git rev-list, for the initial
>>>> split?
>>> 
>>> Is this just for performance reasons? As I understand it that was left out because it would exclude relevant commits on an existing subtree, but it could make sense as an optimization for the first split of a large repo.
>> 
>> Yes, it's for performance reasons on a first split that I'd like to
>> see it. On the FreeBSD repo the difference is some 40 minutes vs. a
>> few seconds.
> 
> Following up on this old thread, I plan to revisit the optimization,
> implementing something on top of your work in
> https://github.com/gitgitgadget/git/pull/493. I might look at adding a
> --initial flag to subtree split, having it essentially auto-detect a
> revision to use as the value for --onto. For the common case of an
> initial merge commit with two parents I think we can relatively easily
> determine which is the subtree parent. If that's not sufficiently
> general (or broadly useful outside of our context) we could just
> create a helper script wrapping `subtree split` tailored to the
> FreeBSD cases. We have something like 100 projects we're looking to
> split, as part of our svn to git migration.

The new `use` command might be a better fit than `onto` in this case - it does the same thing as `onto`, except that it also marks the commits as processed and therefore excludes them from the initial rev list.

Actually, on reading the code, I’m not sure onto does quite what the documentation suggests it does - by updating the cache it will shortcut processing of subtree commits that have already been merged into mainline, but has no mechanism for building onto an existing unrelated history.

Reliably differentiating subtree and mainline commits has always been tricky, but should be OK as part of an advanced flag/new command. Perhaps rev-list --merges <path> to find potential unmarked subtree merges, then take the one where a parent's root tree matches the post-merge subdir tree. No doubt it won’t catch everything, but I’d say that’s less of a risk than false positives.
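
To make that heuristic concrete, here is a rough Python sketch, not anything from git-subtree itself. It assumes "root tree" means the root tree of one of the merge's parents, and that the comparison is against the tree at <path> in the merge commit. The function names (match_subtree_merges, find_subtree_merges and friends) are made up for illustration; the only git calls used are rev-list --merges and rev-parse.

```python
import subprocess
from typing import Callable, Iterable, List, Optional, Tuple


def git(*args: str) -> str:
    """Run git in the current repository and return its stripped stdout."""
    result = subprocess.run(
        ("git",) + args, check=True, capture_output=True, text=True
    )
    return result.stdout.strip()


def match_subtree_merges(
    merges: Iterable[Tuple[str, List[str]]],
    root_tree: Callable[[str], Optional[str]],
    subdir_tree: Callable[[str], Optional[str]],
) -> List[Tuple[str, str]]:
    """Return (merge, parent) pairs where the parent's root tree is
    identical to the tree found at the subdir in the merge commit."""
    found = []
    for merge, parents in merges:
        wanted = subdir_tree(merge)
        if wanted is None:
            # The subdir does not exist in this merge, so nothing to match.
            continue
        for parent in parents:
            if root_tree(parent) == wanted:
                found.append((merge, parent))
    return found


def candidate_merges(rev: str, path: str) -> List[Tuple[str, List[str]]]:
    """List merge commits reachable from rev that touch path, with parents."""
    shas = git("rev-list", "--merges", rev, "--", path).split()
    # "<sha>^@" expands to all parents of the commit.
    return [(sha, git("rev-parse", sha + "^@").split()) for sha in shas]


def tree_at(commit: str, path: str) -> Optional[str]:
    """Object id at commit:path, or None if the path is absent there."""
    try:
        return git("rev-parse", "--verify", "--quiet", f"{commit}:{path}")
    except subprocess.CalledProcessError:
        return None


def find_subtree_merges(rev: str, path: str) -> List[Tuple[str, str]]:
    """Hypothetical entry point: candidate (merge, subtree parent) pairs."""
    return match_subtree_merges(
        candidate_merges(rev, path),
        root_tree=lambda c: git("rev-parse", c + "^{tree}"),
        subdir_tree=lambda c: tree_at(c, path),
    )
```

Calling find_subtree_merges("HEAD", "some/subdir") from inside a repository would list the candidates. The matching step is kept separate from the git calls so it can be checked against made-up data, and it only accepts exact tree matches, in line with preferring missed merges over false positives.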

In the context of a helper script, a new command or adding a --auto flag to `use` might be better than adding a flag to split - that way you could easily tell if the expected initial state was found rather than having to wait for the full process to produce something weird.

That would also let you mark the other side of the merge as ignored mainline history - a significant optimization when you’re excluding 200k commits, but risky to include more generally.


