git@vger.kernel.org mailing list mirror (one of many)
From: Marc Balmer <marc@msys.ch>
To: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Cc: Tom Clarkson <tqclarkson@icloud.com>,
	"Strain, Roger L." <roger.strain@swri.org>,
	"git@vger.kernel.org" <git@vger.kernel.org>,
	"gitster@pobox.com" <gitster@pobox.com>,
	"ns@nadavsinai.com" <ns@nadavsinai.com>,
	"pclouds@gmail.com" <pclouds@gmail.com>
Subject: Re: Regression in git-subtree.sh, introduced in 2.20.1, after 315a84f9aa0e2e629b0680068646b0032518ebed
Date: Thu, 12 Mar 2020 11:40:09 +0100
Message-ID: <6FCBB30F-4557-46E4-8255-B9746887F151@msys.ch>
In-Reply-To: <BAB4CF6D-6904-4698-ACE1-EBEEC745E569@msys.ch>

G'day

Due to some issue in git subtree, a subtree push pushed every commit (over 8000) ever made to the main repository.  So the history of the subtree'd repository showed not only the commits made to that particular subtree, but all commits in the whole project (see the e-mail exchange below).

Today we decided to no longer use subtrees, but to use two independent repositories and manage merges manually.

How can we get rid of the subtree cache data?  Is it enough to remove the .git/subtree-cache directory?  Or is that dangerous?  Does git-subtree store any data anywhere else?

Thanks and regards,
Marc


> On 14.12.2019 at 14:59, Marc Balmer <marc@msys.ch> wrote:
> 
> 
> 
>> On 14.12.2019 at 09:29, Marc Balmer <marc@msys.ch> wrote:
>> 
>> 
>> 
>>> On 13.12.2019 at 14:41, Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
>>> 
>>> Hi Tom,
>>> 
>>> On Thu, 12 Dec 2019, Tom Clarkson wrote:
>>> 
>>>> 
>>>>> This makes me wonder if the problem is perhaps related to the hardware
>>>>> involved; maybe the algorithm is doing exactly what it should, but the
>>>>> available RAM isn't sufficient. If that's the problem, perhaps we could
>>>>> find a way to perform the recursive work without using actual
>>>>> recursion, reducing the number of instances on the stack.
>>>> 
>>>> It’s not so much hardware as OS, I think. After adding stack depth (the indent parameter on check_parents) to the log, I have been able to get different results with ulimit settings.
>>> 
>>> Do you mean to say that the stack overflow is reported as a segmentation
>>> fault? If so, that message was sure a red herring...
>> 
>> FWIW, changing the stack limit using ulimit does not change anything on my (Fedora) system.  At some point, with exactly the same two leading numbers (those separated with a /), it seems to enter an endless (recursive) loop, eventually eating up all memory.  And then, after some time, it segfaults.
>> 
>> 
> 
> So today I tried a subtree push again.  It took hours... Then it pushed every single commit that was ever made to the repository.
> 
> I can definitely say that git subtree is totally broken and unusable at this moment.
> 
> We will now split out what once was a subtree into a proper repository of its own.  git subtree was a nice idea, but it does not work.
> 
>>> 
>>> Thanks,
>>> Dscho
>>> 
>>>> 
>>>> With the default stack size on macOS of 8MB, it falls over at depth 445. Being less than the shortest path to the root commit, that matches my initial count, which was just the number of lines in the log.
>>>> 
>>>> Reducing the stack size with ulimit -s 4096 makes it fall over at 225.
>>>> 
>>>> Increasing to the hard limit of 64MB should allow a depth of around 4000, and as it turns out that did allow the script to complete, reaching a maximum depth of 1148.
>>>> 
>>>> I’m not seeing any issues with the hashes being wrong (all show no parents or subtree) but processing all those commits that resolve to nothing does take forever.
>>>> 
>>>> The mainline commit test seems to work ok on my repo, but it’s fairly easy to see scenarios where it would break, such as having a subfolder with the same name within the subtree.
>>>> 
>>>> So while part of the fix will be a more reliable test, it also needs to work before parent commits are processed to mitigate the recursion issues.
>>>> 
>>>> The rules I have come up with so far are below. There are still scenarios where the recursion is unavoidable, such as running an initial split on a large repo, but that should be much less common than using a small subtree with a more complex existing repo.
>>>> 
>>>> In the initial setup of cmd_split, collect some extra information:
>>>> 
>>>> 	- Add rev-list of all git-subtree-split values to the cache. I’d expect subtrees to usually be smaller than mainline, but since we can do that non-recursively we may as well.
>>>> 
>>>> 	- Find the git-subtree-mainline value from subtree add/rejoin. Anything in its rev list should only be reachable by mainline commits. If not (which probably requires doing something convoluted like having subtree include mainline as its own subtree), this is a good place to check that and fall back to the existing behavior.
>>>> 
>>>> 
>>>> When processing each commit:
>>>> 
>>>> If no prior splits were found, we only have mainline commits.
>>>> 
>>>> 	- If $dir exists, it is a mainline commit needing copy - use existing process.
>>>> 	- If $dir does not exist, it is a mainline commit that will map to nothing - no need to process further.
>>>> 
>>>> If we do have some known subtree commits:
>>>> 
>>>> 	- If it is in the cache, it is a subtree commit we don’t need to process further.
>>>> 	- If subtree root is not reachable (rev-list or merge-base), must be mainline pre subtree add. Map to nothing and skip further processing.
>>>> 	- If any subtree root is reachable, could be either mainline commit with subtree merged in, or subtree commit newer than the last add/squash (subtree pull/merge without squash does not use a custom commit message)
>>>> 		- If $dir does not exist, must be subtree - add to the cache as mapped to self, no need to process parents.
>>>> 		- If the folder does exist, it is either a mainline commit to be processed normally, or a subtree that happens to contain a folder with the same name.  Check if mainline root is reachable.
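
The per-commit rules quoted above read like a small decision table. What follows is only a sketch of that table in Python, not code from git-subtree.sh. The names `Action` and `classify` and the four boolean inputs are hypothetical; in a real implementation the booleans would come from the cache lookup, a rev-list or merge-base reachability check, and a test for whether $dir exists in the commit. The last rule in the quoted list stops at "Check if mainline root is reachable", so the sketch stops there as well and does not guess what follows.

```python
from enum import Enum


class Action(Enum):
    """Hypothetical outcomes for a single commit during a split."""
    COPY = "mainline commit needing copy; use the existing process"
    MAP_TO_NOTHING = "mainline commit that maps to nothing; stop here"
    ALREADY_CACHED = "known subtree commit; stop here"
    MAP_TO_SELF = "subtree commit; cache as mapped to self, skip parents"
    CHECK_MAINLINE_ROOT = "ambiguous; check whether mainline root is reachable"


def classify(*, have_prior_splits, in_cache, subtree_root_reachable, dir_exists):
    """Apply the quoted rules in order and return the first match."""
    if not have_prior_splits:
        # No prior splits were found, so every commit is a mainline commit.
        return Action.COPY if dir_exists else Action.MAP_TO_NOTHING
    if in_cache:
        return Action.ALREADY_CACHED
    if not subtree_root_reachable:
        # Mainline commit from before the subtree was added.
        return Action.MAP_TO_NOTHING
    if not dir_exists:
        return Action.MAP_TO_SELF
    # $dir exists and a subtree root is reachable: either a mainline commit
    # or a subtree that happens to contain a folder with the same name.
    return Action.CHECK_MAINLINE_ROOT


if __name__ == "__main__":
    result = classify(have_prior_splits=True, in_cache=False,
                      subtree_root_reachable=False, dir_exists=True)
    print(result.name)
```

Only the first case of the first branch and the final ambiguous case need further work. Every other outcome lets the walk stop before visiting the commit's parents, which is how the rules are meant to cut down the recursion depth.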

