git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* Regression in git-subtree.sh, introduced in 2.20.1, after 315a84f9aa0e2e629b0680068646b0032518ebed
@ 2018-12-31 10:28 Marc Balmer
  2018-12-31 10:51 ` Duy Nguyen
  0 siblings, 1 reply; 33+ messages in thread
From: Marc Balmer @ 2018-12-31 10:28 UTC (permalink / raw)
  To: git; +Cc: roger.strain, gitster

Hi

One of the last three commits in git-subtree.sh introduced a regression leading to a segfault.

Here is the error message when I try to split out my i18n files:

$ git subtree split --prefix=i18n
cache for e39a2a0c6431773a5d831eb3cb7f1cd40d0da623 already exists!
   (Lots of output omitted)
436/627 (1819) [1455]       <- Stays at 436/ while the numbers in () and [] increase, then segfaults:
/usr/libexec/git-core/git-subtree: line 751: 54693 Done                    eval "$grl"
    54694 Segmentation fault      (core dumped) | while read rev parents; do
   process_split_commit "$rev" "$parents" 0;
done

Please note that this regression can not easily be reproduced, normally a subtree split just works.

Reverting the last three commits "fixes" the issue.  So I kindly ask the last three commits to be reverted.





^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Regression in git-subtree.sh, introduced in 2.20.1, after 315a84f9aa0e2e629b0680068646b0032518ebed
  2018-12-31 10:28 Marc Balmer
@ 2018-12-31 10:51 ` Duy Nguyen
  2018-12-31 11:12   ` Marc Balmer
  0 siblings, 1 reply; 33+ messages in thread
From: Duy Nguyen @ 2018-12-31 10:51 UTC (permalink / raw)
  To: Marc Balmer; +Cc: Git Mailing List, roger.strain, Junio C Hamano

On Mon, Dec 31, 2018 at 5:44 PM Marc Balmer <marc@msys.ch> wrote:
>
> Hi
>
> One of the last three commits in git-subtree.sh introduced a regression leading to a segfault.
>
> Here is the error message when I try to split out my i18n files:
>
> $ git subtree split --prefix=i18n
> cache for e39a2a0c6431773a5d831eb3cb7f1cd40d0da623 already exists!
>    (Lots of output omitted)
> 436/627 (1819) [1455]       <- Stays at 436/ while the numbers in () and [] increase, then segfaults:
> /usr/libexec/git-core/git-subtree: line 751: 54693 Done                    eval "$grl"
>     54694 Segmentation fault      (core dumped) | while read rev parents; do

Do you still have this core dump? Could you run it and see if it's
"git" that crashed (and where) or "sh"?

>    process_split_commit "$rev" "$parents" 0;
> done
>
> Please note that this regression can not easily be reproduced, normally a subtree split just works.
>
> Reverting the last three commits "fixes" the issue.  So I kindly ask the last three commits to be reverted.

Please provide the SHA-1 of the "good" commit you tested.
-- 
Duy

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Regression in git-subtree.sh, introduced in 2.20.1, after 315a84f9aa0e2e629b0680068646b0032518ebed
  2018-12-31 10:51 ` Duy Nguyen
@ 2018-12-31 11:12   ` Marc Balmer
  2018-12-31 11:20     ` Duy Nguyen
  0 siblings, 1 reply; 33+ messages in thread
From: Marc Balmer @ 2018-12-31 11:12 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Git Mailing List, roger.strain, Junio C Hamano



> Am 31.12.2018 um 11:51 schrieb Duy Nguyen <pclouds@gmail.com>:
> 
> On Mon, Dec 31, 2018 at 5:44 PM Marc Balmer <marc@msys.ch> wrote:
>> 
>> Hi
>> 
>> One of the last three commits in git-subtree.sh introduced a regression leading to a segfault.
>> 
>> Here is the error message when I try to split out my i18n files:
>> 
>> $ git subtree split --prefix=i18n
>> cache for e39a2a0c6431773a5d831eb3cb7f1cd40d0da623 already exists!
>>   (Lots of output omitted)
>> 436/627 (1819) [1455]       <- Stays at 436/ while the numbers in () and [] increase, then segfaults:
>> /usr/libexec/git-core/git-subtree: line 751: 54693 Done                    eval "$grl"
>>    54694 Segmentation fault      (core dumped) | while read rev parents; do
> 
> Do you still have this core dump? Could you run it and see if it's
> "git" that crashed (and where) or "sh"?

It is /usr/bin/bash that segfaults.  My guess is, that it runs out of memory (as described above, git-subtree enters an infinite loop untils it segafults).

> 
>>   process_split_commit "$rev" "$parents" 0;
>> done
>> 
>> Please note that this regression can not easily be reproduced, normally a subtree split just works.
>> 
>> Reverting the last three commits "fixes" the issue.  So I kindly ask the last three commits to be reverted.
> 
> Please provide the SHA-1 of the "good" commit you tested.

I reverted these three commits (actually the last three commits to contrib/subtree/git-subtree.sh):

19ad68d95d6f8104eca1e86f8d1dfae50c7fb268
68f8ff81513fb3599ef3dfc3dd11da36d868e91b
315a84f9aa0e2e629b0680068646b0032518ebed

And then it worked.

- Marc

-- 
> Duy


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Regression in git-subtree.sh, introduced in 2.20.1, after 315a84f9aa0e2e629b0680068646b0032518ebed
  2018-12-31 11:12   ` Marc Balmer
@ 2018-12-31 11:20     ` Duy Nguyen
  2018-12-31 11:24       ` Marc Balmer
  0 siblings, 1 reply; 33+ messages in thread
From: Duy Nguyen @ 2018-12-31 11:20 UTC (permalink / raw)
  To: Marc Balmer; +Cc: Git Mailing List, roger.strain, Junio C Hamano

On Mon, Dec 31, 2018 at 6:13 PM Marc Balmer <marc@msys.ch> wrote:
>
>
>
> > Am 31.12.2018 um 11:51 schrieb Duy Nguyen <pclouds@gmail.com>:
> >
> > On Mon, Dec 31, 2018 at 5:44 PM Marc Balmer <marc@msys.ch> wrote:
> >>
> >> Hi
> >>
> >> One of the last three commits in git-subtree.sh introduced a regression leading to a segfault.
> >>
> >> Here is the error message when I try to split out my i18n files:
> >>
> >> $ git subtree split --prefix=i18n
> >> cache for e39a2a0c6431773a5d831eb3cb7f1cd40d0da623 already exists!
> >>   (Lots of output omitted)
> >> 436/627 (1819) [1455]       <- Stays at 436/ while the numbers in () and [] increase, then segfaults:
> >> /usr/libexec/git-core/git-subtree: line 751: 54693 Done                    eval "$grl"
> >>    54694 Segmentation fault      (core dumped) | while read rev parents; do
> >
> > Do you still have this core dump? Could you run it and see if it's
> > "git" that crashed (and where) or "sh"?
>
> It is /usr/bin/bash that segfaults.  My guess is, that it runs out of memory (as described above, git-subtree enters an infinite loop untils it segafults).

Ah that's better (I was worried about "git" crashing). The problematic
commit should be 19ad68d95d (subtree: performance improvement for
finding unexpected parent commits - 2018-10-12) then, although I can't
see why.

I don't think we have any release coming up soon, so maybe Roger can
still have some time to fix it instead of a just a revert.

>
> >
> >>   process_split_commit "$rev" "$parents" 0;
> >> done
> >>
> >> Please note that this regression can not easily be reproduced, normally a subtree split just works.
> >>
> >> Reverting the last three commits "fixes" the issue.  So I kindly ask the last three commits to be reverted.
> >
> > Please provide the SHA-1 of the "good" commit you tested.
>
> I reverted these three commits (actually the last three commits to contrib/subtree/git-subtree.sh):
>
> 19ad68d95d6f8104eca1e86f8d1dfae50c7fb268
> 68f8ff81513fb3599ef3dfc3dd11da36d868e91b
> 315a84f9aa0e2e629b0680068646b0032518ebed
>
> And then it worked.
>
> - Marc
>
> --
> > Duy
>


-- 
Duy

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Regression in git-subtree.sh, introduced in 2.20.1, after 315a84f9aa0e2e629b0680068646b0032518ebed
  2018-12-31 11:20     ` Duy Nguyen
@ 2018-12-31 11:24       ` Marc Balmer
  2018-12-31 11:36         ` Duy Nguyen
  2019-01-02 20:20         ` Strain, Roger L.
  0 siblings, 2 replies; 33+ messages in thread
From: Marc Balmer @ 2018-12-31 11:24 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Git Mailing List, roger.strain, Junio C Hamano



> Am 31.12.2018 um 12:20 schrieb Duy Nguyen <pclouds@gmail.com>:
> 
> On Mon, Dec 31, 2018 at 6:13 PM Marc Balmer <marc@msys.ch> wrote:
>> 
>> 
>> 
>>> Am 31.12.2018 um 11:51 schrieb Duy Nguyen <pclouds@gmail.com>:
>>> 
>>> On Mon, Dec 31, 2018 at 5:44 PM Marc Balmer <marc@msys.ch> wrote:
>>>> 
>>>> Hi
>>>> 
>>>> One of the last three commits in git-subtree.sh introduced a regression leading to a segfault.
>>>> 
>>>> Here is the error message when I try to split out my i18n files:
>>>> 
>>>> $ git subtree split --prefix=i18n
>>>> cache for e39a2a0c6431773a5d831eb3cb7f1cd40d0da623 already exists!
>>>>  (Lots of output omitted)
>>>> 436/627 (1819) [1455]       <- Stays at 436/ while the numbers in () and [] increase, then segfaults:
>>>> /usr/libexec/git-core/git-subtree: line 751: 54693 Done                    eval "$grl"
>>>>   54694 Segmentation fault      (core dumped) | while read rev parents; do
>>> 
>>> Do you still have this core dump? Could you run it and see if it's
>>> "git" that crashed (and where) or "sh"?
>> 
>> It is /usr/bin/bash that segfaults.  My guess is, that it runs out of memory (as described above, git-subtree enters an infinite loop untils it segafults).
> 
> Ah that's better (I was worried about "git" crashing). The problematic
> commit should be 19ad68d95d (subtree: performance improvement for
> finding unexpected parent commits - 2018-10-12) then, although I can't
> see why.
> 
> I don't think we have any release coming up soon, so maybe Roger can
> still have some time to fix it instead of a just a revert.

In a (private) Email to me, he indicated that had no time for a fix.  Maybe he can speak up here?

In any case, if I can help testing, I am in.  I just don't know the inner workings of git-subtree.sh (I am a mere user of it...)


> 
>> 
>>> 
>>>>  process_split_commit "$rev" "$parents" 0;
>>>> done
>>>> 
>>>> Please note that this regression can not easily be reproduced, normally a subtree split just works.
>>>> 
>>>> Reverting the last three commits "fixes" the issue.  So I kindly ask the last three commits to be reverted.
>>> 
>>> Please provide the SHA-1 of the "good" commit you tested.
>> 
>> I reverted these three commits (actually the last three commits to contrib/subtree/git-subtree.sh):
>> 
>> 19ad68d95d6f8104eca1e86f8d1dfae50c7fb268
>> 68f8ff81513fb3599ef3dfc3dd11da36d868e91b
>> 315a84f9aa0e2e629b0680068646b0032518ebed
>> 
>> And then it worked.
>> 
>> - Marc
>> 
>> --
>>> Duy
>> 
> 
> 
> -- 
> Duy

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Regression in git-subtree.sh, introduced in 2.20.1, after 315a84f9aa0e2e629b0680068646b0032518ebed
  2018-12-31 11:24       ` Marc Balmer
@ 2018-12-31 11:36         ` Duy Nguyen
  2018-12-31 12:31           ` Marc Balmer
  2019-01-02 20:20         ` Strain, Roger L.
  1 sibling, 1 reply; 33+ messages in thread
From: Duy Nguyen @ 2018-12-31 11:36 UTC (permalink / raw)
  To: Marc Balmer; +Cc: Git Mailing List, roger.strain, Junio C Hamano

On Mon, Dec 31, 2018 at 6:24 PM Marc Balmer <marc@msys.ch> wrote:
> In a (private) Email to me, he indicated that had no time for a fix.  Maybe he can speak up here?

Well, I guess Junio will revert when he's back after the holidays
then. Meanwhile..

> In any case, if I can help testing, I am in.  I just don't know the inner workings of git-subtree.sh (I am a mere user of it...)

If the repo you're facing the problem is publicly available, that
would be great so some of us could try reproduce.

Otherwise we'll need your help to track this problem down. in
git-subtree script line 640 (or somewhere close)

    progress "$revcount/$revmax ($createcount) [$extracount]"

could you update it to show $parents and $rev as well, e.g.

    progress "$revcount/$revmax ($createcount) [$extracount] ($parents) ($rev)"

Then please run these commands and post the output here

    git rev-parse <that-rev>^@

and

    git show -s --pretty=%P <that-rev>

where <that-rev> is $rev from the last few progress lines before bash crashes.
-- 
Duy

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Regression in git-subtree.sh, introduced in 2.20.1, after 315a84f9aa0e2e629b0680068646b0032518ebed
  2018-12-31 11:36         ` Duy Nguyen
@ 2018-12-31 12:31           ` Marc Balmer
  2019-01-01 13:19             ` Duy Nguyen
  0 siblings, 1 reply; 33+ messages in thread
From: Marc Balmer @ 2018-12-31 12:31 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Git Mailing List, roger.strain, Junio C Hamano



> Am 31.12.2018 um 12:36 schrieb Duy Nguyen <pclouds@gmail.com>:
> 
> On Mon, Dec 31, 2018 at 6:24 PM Marc Balmer <marc@msys.ch> wrote:
>> In a (private) Email to me, he indicated that had no time for a fix.  Maybe he can speak up here?
> 
> Well, I guess Junio will revert when he's back after the holidays
> then. Meanwhile..
> 
>> In any case, if I can help testing, I am in.  I just don't know the inner workings of git-subtree.sh (I am a mere user of it...)
> 
> If the repo you're facing the problem is publicly available, that
> would be great so some of us could try reproduce.

Unfortunately it is not.

> 
> Otherwise we'll need your help to track this problem down. in
> git-subtree script line 640 (or somewhere close)
> 
>    progress "$revcount/$revmax ($createcount) [$extracount]"
> 
> could you update it to show $parents and $rev as well, e.g.
> 
>    progress "$revcount/$revmax ($createcount) [$extracount] ($parents) ($rev)"

I did add this, plus changed progress to output a linefeed, and now just before the crash, the output looks like this:

436/627 (2013) [1649] (6e54a90a29e4e01fa2d6a42c232e02e08e912b2d) (2ca7b24e731ff91c94c9abf214686cb29cdc367e)
436/627 (2014) [1650] (1ef866e5a18012e80eed36315deb932c2b66d34a) (6e54a90a29e4e01fa2d6a42c232e02e08e912b2d)
436/627 (2015) [1651] (c8585f441548dd43f113a96ba48f6fa70363d388) (1ef866e5a18012e80eed36315deb932c2b66d34a)
436/627 (2016) [1652] (663bb110a58decfe889cf7c6b766f1d0c032ba39) (c8585f441548dd43f113a96ba48f6fa70363d388)
436/627 (2017) [1653] (edbdd28e009e52c8001bb54e53a56b059167e07d) (663bb110a58decfe889cf7c6b766f1d0c032ba39)
436/627 (2018) [1654] (c47739713912ae6e94714b9a1a6732407b236932) (edbdd28e009e52c8001bb54e53a56b059167e07d)
436/627 (2019) [1655] (d444823b97d9a8e53c4e721a44e4c49619d0b372) (c47739713912ae6e94714b9a1a6732407b236932)
436/627 (2020) [1656] (15a7ccecb2ca8bc47c77a997f8c74e7ac3b13325) (d444823b97d9a8e53c4e721a44e4c49619d0b372)
436/627 (2021) [1657] (b9bc5c9b33b100b57e23626ff422dac73f94384e) (15a7ccecb2ca8bc47c77a997f8c74e7ac3b13325)
436/627 (2022) [1658] (eec0f28c6fe5f7d664c41a913883d64cdf53c111) (b9bc5c9b33b100b57e23626ff422dac73f94384e)
436/627 (2023) [1659] (e0ddd9c60f71283996cfb169f1dbb77e8f7c4b94) (eec0f28c6fe5f7d664c41a913883d64cdf53c111)
436/627 (2024) [1660] (27b96988847caf3bfd71df2d7f58cbe6ba78208a) (e0ddd9c60f71283996cfb169f1dbb77e8f7c4b94)
436/627 (2025) [1661] (11e5861e50f88237ce362b6c7531e4e90bac86ac) (27b96988847caf3bfd71df2d7f58cbe6ba78208a)
/usr/libexec/git-core/git-subtree: line 751: 122202 Done                    eval "$grl"
     122203 Segmentation fault      (core dumped) | while read rev parents; do
    process_split_commit "$rev" "$parents" 0;
done


> 
> Then please run these commands and post the output here
> 
>    git rev-parse <that-rev>^@

Did that with the last three lines:

$ git rev-parse 27b96988847caf3bfd71df2d7f58cbe6ba78208a^@
11e5861e50f88237ce362b6c7531e4e90bac86ac
$ git rev-parse e0ddd9c60f71283996cfb169f1dbb77e8f7c4b94^@
27b96988847caf3bfd71df2d7f58cbe6ba78208a
$ git rev-parse eec0f28c6fe5f7d664c41a913883d64cdf53c111^@
e0ddd9c60f71283996cfb169f1dbb77e8f7c4b94

> 
> and
> 
>    git show -s --pretty=%P <that-rev>

$ git show -s --pretty=%P 27b96988847caf3bfd71df2d7f58cbe6ba78208a
11e5861e50f88237ce362b6c7531e4e90bac86ac
$ git show -s --pretty=%P e0ddd9c60f71283996cfb169f1dbb77e8f7c4b94
27b96988847caf3bfd71df2d7f58cbe6ba78208a
$ git show -s --pretty=%P eec0f28c6fe5f7d664c41a913883d64cdf53c111
e0ddd9c60f71283996cfb169f1dbb77e8f7c4b94

> 
> where <that-rev> is $rev from the last few progress lines before bash crashes.
> -- 
> Duy


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Regression in git-subtree.sh, introduced in 2.20.1, after 315a84f9aa0e2e629b0680068646b0032518ebed
  2018-12-31 12:31           ` Marc Balmer
@ 2019-01-01 13:19             ` Duy Nguyen
  2019-01-02  9:13               ` Marc Balmer
  0 siblings, 1 reply; 33+ messages in thread
From: Duy Nguyen @ 2019-01-01 13:19 UTC (permalink / raw)
  To: Marc Balmer; +Cc: Git Mailing List, roger.strain, Junio C Hamano

On Mon, Dec 31, 2018 at 01:31:21PM +0100, Marc Balmer wrote:
> 
> 
> > Am 31.12.2018 um 12:36 schrieb Duy Nguyen <pclouds@gmail.com>:
> > 
> > On Mon, Dec 31, 2018 at 6:24 PM Marc Balmer <marc@msys.ch> wrote:
> >> In a (private) Email to me, he indicated that had no time for a fix.  Maybe he can speak up here?
> > 
> > Well, I guess Junio will revert when he's back after the holidays
> > then. Meanwhile..
> > 
> >> In any case, if I can help testing, I am in.  I just don't know the inner workings of git-subtree.sh (I am a mere user of it...)
> > 
> > If the repo you're facing the problem is publicly available, that
> > would be great so some of us could try reproduce.
> 
> Unfortunately it is not.
> 
> > 
> > Otherwise we'll need your help to track this problem down. in
> > git-subtree script line 640 (or somewhere close)
> > 
> >    progress "$revcount/$revmax ($createcount) [$extracount]"
> > 
> > could you update it to show $parents and $rev as well, e.g.
> > 
> >    progress "$revcount/$revmax ($createcount) [$extracount] ($parents) ($rev)"
> 
> I did add this, plus changed progress to output a linefeed, and now just before the crash, the output looks like this:
> 
> 436/627 (2013) [1649] (6e54a90a29e4e01fa2d6a42c232e02e08e912b2d) (2ca7b24e731ff91c94c9abf214686cb29cdc367e)
> 436/627 (2014) [1650] (1ef866e5a18012e80eed36315deb932c2b66d34a) (6e54a90a29e4e01fa2d6a42c232e02e08e912b2d)
> 436/627 (2015) [1651] (c8585f441548dd43f113a96ba48f6fa70363d388) (1ef866e5a18012e80eed36315deb932c2b66d34a)
> 436/627 (2016) [1652] (663bb110a58decfe889cf7c6b766f1d0c032ba39) (c8585f441548dd43f113a96ba48f6fa70363d388)
> 436/627 (2017) [1653] (edbdd28e009e52c8001bb54e53a56b059167e07d) (663bb110a58decfe889cf7c6b766f1d0c032ba39)
> 436/627 (2018) [1654] (c47739713912ae6e94714b9a1a6732407b236932) (edbdd28e009e52c8001bb54e53a56b059167e07d)
> 436/627 (2019) [1655] (d444823b97d9a8e53c4e721a44e4c49619d0b372) (c47739713912ae6e94714b9a1a6732407b236932)
> 436/627 (2020) [1656] (15a7ccecb2ca8bc47c77a997f8c74e7ac3b13325) (d444823b97d9a8e53c4e721a44e4c49619d0b372)
> 436/627 (2021) [1657] (b9bc5c9b33b100b57e23626ff422dac73f94384e) (15a7ccecb2ca8bc47c77a997f8c74e7ac3b13325)
> 436/627 (2022) [1658] (eec0f28c6fe5f7d664c41a913883d64cdf53c111) (b9bc5c9b33b100b57e23626ff422dac73f94384e)
> 436/627 (2023) [1659] (e0ddd9c60f71283996cfb169f1dbb77e8f7c4b94) (eec0f28c6fe5f7d664c41a913883d64cdf53c111)
> 436/627 (2024) [1660] (27b96988847caf3bfd71df2d7f58cbe6ba78208a) (e0ddd9c60f71283996cfb169f1dbb77e8f7c4b94)
> 436/627 (2025) [1661] (11e5861e50f88237ce362b6c7531e4e90bac86ac) (27b96988847caf3bfd71df2d7f58cbe6ba78208a)
> /usr/libexec/git-core/git-subtree: line 751: 122202 Done                    eval "$grl"
>      122203 Segmentation fault      (core dumped) | while read rev parents; do
>     process_split_commit "$rev" "$parents" 0;
> done
> 
> 
> > 
> > Then please run these commands and post the output here
> > 
> >    git rev-parse <that-rev>^@
> 
> Did that with the last three lines:
> 
> $ git rev-parse 27b96988847caf3bfd71df2d7f58cbe6ba78208a^@
> 11e5861e50f88237ce362b6c7531e4e90bac86ac
> $ git rev-parse e0ddd9c60f71283996cfb169f1dbb77e8f7c4b94^@
> 27b96988847caf3bfd71df2d7f58cbe6ba78208a
> $ git rev-parse eec0f28c6fe5f7d664c41a913883d64cdf53c111^@
> e0ddd9c60f71283996cfb169f1dbb77e8f7c4b94
> 
> > 
> > and
> > 
> >    git show -s --pretty=%P <that-rev>
> 
> $ git show -s --pretty=%P 27b96988847caf3bfd71df2d7f58cbe6ba78208a
> 11e5861e50f88237ce362b6c7531e4e90bac86ac
> $ git show -s --pretty=%P e0ddd9c60f71283996cfb169f1dbb77e8f7c4b94
> 27b96988847caf3bfd71df2d7f58cbe6ba78208a
> $ git show -s --pretty=%P eec0f28c6fe5f7d664c41a913883d64cdf53c111
> e0ddd9c60f71283996cfb169f1dbb77e8f7c4b94
> 

Hmm.. I'm not that familiar with git-subtree.sh, so here's one last
blind shot.

There's a format change between git-show and git-rev-parse. The former
separates commits by spaces while the latter by newlines. Will this
help?

diff --git a/contrib/subtree/git-subtree.sh b/contrib/subtree/git-subtree.sh
index 147201dc6c..23f570beee 100755
--- a/contrib/subtree/git-subtree.sh
+++ b/contrib/subtree/git-subtree.sh
@@ -633,7 +633,7 @@ process_split_commit () {
 	else
 		# processing commit without normal parent information;
 		# fetch from repo
-		parents=$(git rev-parse "$rev^@")
+		parents=$(git rev-parse "$rev^@" | tr '\n' ' ')
 		extracount=$(($extracount + 1))
 	fi
 
--
Duy

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: Regression in git-subtree.sh, introduced in 2.20.1, after 315a84f9aa0e2e629b0680068646b0032518ebed
  2019-01-01 13:19             ` Duy Nguyen
@ 2019-01-02  9:13               ` Marc Balmer
  0 siblings, 0 replies; 33+ messages in thread
From: Marc Balmer @ 2019-01-02  9:13 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Git Mailing List, roger.strain, Junio C Hamano



> [...]
> 
>> 
> 
> Hmm.. I'm not that familiar with git-subtree.sh, so here's one last
> blind shot.
> 
> There's a format change between git-show and git-rev-parse. The former
> separates commits by spaces while the latter by newlines. Will this
> help?
> 
> diff --git a/contrib/subtree/git-subtree.sh b/contrib/subtree/git-subtree.sh
> index 147201dc6c..23f570beee 100755
> --- a/contrib/subtree/git-subtree.sh
> +++ b/contrib/subtree/git-subtree.sh
> @@ -633,7 +633,7 @@ process_split_commit () {
> 	else
> 		# processing commit without normal parent information;
> 		# fetch from repo
> -		parents=$(git rev-parse "$rev^@")
> +		parents=$(git rev-parse "$rev^@" | tr '\n' ' ')
> 		extracount=$(($extracount + 1))
> 	fi
> 

Unfortunately, this did not change the situation.  It still segfaults.

> --
> Duy


^ permalink raw reply	[flat|nested] 33+ messages in thread

* RE: Regression in git-subtree.sh, introduced in 2.20.1, after 315a84f9aa0e2e629b0680068646b0032518ebed
  2018-12-31 11:24       ` Marc Balmer
  2018-12-31 11:36         ` Duy Nguyen
@ 2019-01-02 20:20         ` Strain, Roger L.
  2019-01-03 13:50           ` Johannes Schindelin
  1 sibling, 1 reply; 33+ messages in thread
From: Strain, Roger L. @ 2019-01-02 20:20 UTC (permalink / raw)
  To: Marc Balmer, Duy Nguyen; +Cc: Git Mailing List, Junio C Hamano

> -----Original Message-----
> From: Marc Balmer <marc@msys.ch>
> Sent: Monday, December 31, 2018 5:24 AM
> To: Duy Nguyen <pclouds@gmail.com>
> Cc: Git Mailing List <git@vger.kernel.org>; Strain, Roger L.
> <roger.strain@swri.org>; Junio C Hamano <gitster@pobox.com>
> Subject: Re: Regression in git-subtree.sh, introduced in 2.20.1, after
> 315a84f9aa0e2e629b0680068646b0032518ebed
> 
> 
> 
> > Am 31.12.2018 um 12:20 schrieb Duy Nguyen <pclouds@gmail.com>:
> >
> > On Mon, Dec 31, 2018 at 6:13 PM Marc Balmer <marc@msys.ch> wrote:
> >>
> >>
> >>
> >>> Am 31.12.2018 um 11:51 schrieb Duy Nguyen <pclouds@gmail.com>:
> >>>
> >>> On Mon, Dec 31, 2018 at 5:44 PM Marc Balmer <marc@msys.ch> wrote:
> >>>>
> >>>> Hi
> >>>>
> >>>> One of the last three commits in git-subtree.sh introduced a regression
> leading to a segfault.
> >>>>
> >>>> Here is the error message when I try to split out my i18n files:
> >>>>
> >>>> $ git subtree split --prefix=i18n
> >>>> cache for e39a2a0c6431773a5d831eb3cb7f1cd40d0da623 already exists!
> >>>>  (Lots of output omitted)
> >>>> 436/627 (1819) [1455]       <- Stays at 436/ while the numbers in () and []
> increase, then segfaults:
> >>>> /usr/libexec/git-core/git-subtree: line 751: 54693 Done                    eval
> "$grl"
> >>>>   54694 Segmentation fault      (core dumped) | while read rev parents;
> do
> >>>
> >>> Do you still have this core dump? Could you run it and see if it's
> >>> "git" that crashed (and where) or "sh"?
> >>
> >> It is /usr/bin/bash that segfaults.  My guess is, that it runs out of memory
> (as described above, git-subtree enters an infinite loop untils it segafults).
> >
> > Ah that's better (I was worried about "git" crashing). The problematic
> > commit should be 19ad68d95d (subtree: performance improvement for
> > finding unexpected parent commits - 2018-10-12) then, although I can't
> > see why.
> >
> > I don't think we have any release coming up soon, so maybe Roger can
> > still have some time to fix it instead of a just a revert.
> 
> In a (private) Email to me, he indicated that had no time for a fix.  Maybe he
> can speak up here?
> 
> In any case, if I can help testing, I am in.  I just don't know the inner workings
> of git-subtree.sh (I am a mere user of it...)

TL;DR: Current script uses git rev-list to retrieve all commits which are reachable from HEAD but not from <abc123>. Is there a syntax that will instead return all commits reachable from HEAD, but stop traversing when <abc123> is encountered? It's a subtle distinction, but important.

Long version:

While I don't have a lot of time to troubleshoot this as deeply as I'd like, I've at least tried to give it a little bit of thought. If we assume it's an out of memory problem, that's likely due to commit 315a84f9aa0e2e629b0680068646b0032518ebed, which introduced a recursive history search. It might be possible to refactor that logic to avoid the recursive approach, but shell scripting is not my native tongue, and I didn't immediately see the right way to do it. I can explain the problem the commit was trying to address, and if anyone has suggestions about how to better fix the problem, I'd love to hear them.

This part of the subtree script is attempting to identify all mainline commits which need to be evaluated for subtree content. When subtree rejoin commits are included, it short-circuits this process by identifying all commits which can be reached from the current HEAD, but which CANNOT be reached from any of the known subtree rejoin points. While this does simplify the number of commits that need to be evaluated, it causes the script to miss commits that do in fact need to be evaluated when merges bypass those rejoins. Consider the following:

(Apologies for the crude diagram, I'm having to use Outlook at work and have no idea if this will format properly.)

[A]--(B)--(D)--[E]---(G)
 \       /          /
  ---(C)-------(F)--

In this graph, [A] represents an initial commit, which is a subtree rejoin (presumably mapping to subtree commit A'). From this, commits B and C were created, then those two were merged as commit D, and a subtree operation was performed to push these changes to the subtree. The result was rejoined as commit [E] (mapping as subtree commit E'). We then have a commit F with a single parent of C, and a merge of E and F into commit G.

At this point, I want to push the commits through G back to the subtree. The original script used git rev-list to list all commits which can be reached from G, but which cannot be reached from known split points A and E. This results in a commit list of G, F. It then looks at commit F to determine how it fits into the subtree, sees that it has a parent C, and looks for what subtree commit C maps to. Unfortunately C wasn't included in the list of commits to evaluate, so that check failed, and the script treats F as an initial commit. The subtree generated then includes F' as a complete commit, unrelated to any previous commits in the subtree, rather than connecting it to C' as a parent.

The ideal solution to this would be a call to git rev-list (or another tool, if appropriate) that could actually generate a commit list of G, F, C in the above diagram. Rather than being "All commits I can reach from G but not from A or E", what it really needs is "All commits I can reach from G, but stop traversing the history any time I reach A or E". If someone with better knowledge can provide that command, I think we could restore the script to a non-recursive approach and give better information about the true progress of the command, rather than having to rely on the "extra count" I introduced to keep track of how many unexpected commits are being recursively evaluated because the initial rev-list didn't include them.

As an aside, there's still another problem that we've run into which is forcing us to still run a custom version of the subtree script. There's a section in the script commented "ugly.  is there no better way to tell if this is a subtree vs. a mainline commit?  Does it matter?" Unfortunately, that section isn't working properly in our repo at this point, and I've had to include a VERY ugly check that looks for the existence of a known file in our specific mainline repo to make this determination. I'm still trying to sort through a better approach that might work for others, but have come up blank to this point. If anyone else has thoughts, I'd really love to hear those as well.

Anyway, sorry for the novel, but I wanted to brain dump what I've learned in this process. If anyone can suggest a syntax for rev-list that gives the appropriate set of commits, I'd love to get that in place; otherwise we could definitely try to unwind the recursion approach, but my shell script naivety may get me in trouble.

-- 
Roger

^ permalink raw reply	[flat|nested] 33+ messages in thread

* RE: Regression in git-subtree.sh, introduced in 2.20.1, after 315a84f9aa0e2e629b0680068646b0032518ebed
  2019-01-02 20:20         ` Strain, Roger L.
@ 2019-01-03 13:50           ` Johannes Schindelin
  2019-01-03 15:30             ` Strain, Roger L.
  0 siblings, 1 reply; 33+ messages in thread
From: Johannes Schindelin @ 2019-01-03 13:50 UTC (permalink / raw)
  To: Strain, Roger L.
  Cc: Marc Balmer, Duy Nguyen, Git Mailing List, Junio C Hamano

Hi Roger,


On Wed, 2 Jan 2019, Strain, Roger L. wrote:

> TL;DR: Current script uses git rev-list to retrieve all commits which
> are reachable from HEAD but not from <abc123>. Is there a syntax that
> will instead return all commits reachable from HEAD, but stop traversing
> when <abc123> is encountered? It's a subtle distinction, but important.

Maybe you are looking for the --ancestry-path option? Essentially, `git
rev-list --ancestry-path A..B` will list only commits that are reachable
from B, not reachable from A, but that *can* reach A (i.e. that are
descendants of A).

Ciao,
Johannes

^ permalink raw reply	[flat|nested] 33+ messages in thread

* RE: Regression in git-subtree.sh, introduced in 2.20.1, after 315a84f9aa0e2e629b0680068646b0032518ebed
  2019-01-03 13:50           ` Johannes Schindelin
@ 2019-01-03 15:30             ` Strain, Roger L.
  0 siblings, 0 replies; 33+ messages in thread
From: Strain, Roger L. @ 2019-01-03 15:30 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Marc Balmer, Duy Nguyen, Git Mailing List, Junio C Hamano

> -----Original Message-----
> From: Johannes Schindelin <Johannes.Schindelin@gmx.de>
> 
> Hi Roger,
> 
> 
> On Wed, 2 Jan 2019, Strain, Roger L. wrote:
> 
> > TL;DR: Current script uses git rev-list to retrieve all commits which
> > are reachable from HEAD but not from <abc123>. Is there a syntax that
> > will instead return all commits reachable from HEAD, but stop
> traversing
> > when <abc123> is encountered? It's a subtle distinction, but
> important.
> 
> Maybe you are looking for the --ancestry-path option? Essentially, `git
> rev-list --ancestry-path A..B` will list only commits that are reachable
> from B, not reachable from A, but that *can* reach A (i.e. that are
> descendants of A).
> 
> Ciao,
> Johannes

Thanks for the suggestion, but I don't think that one does quite what is needed here. It did provide a good sample graph to consider, though. Subtree needs to rebuild history and tie things in to previously reconstructed commits. Here's the sample graph from the --ancestry-path portion of the git-rev-list manpage:

	    D---E-------F
	   /     \       \
	  B---C---G---H---I---J
	 /                     \
	A-------K---------------L--M

Subtree maps mainline commits to known subtree commits, so let's assume we have a mapping of D to D'. As documented, if we were to rev-list D..M normally, we'd get all commits except D itself, and D's ancestors B and A. So the "normal" result would be:

	        E-------F
	         \       \
	      C---G---H---I---J
	                       \
	        K---------------L--M

This is bad for subtree, because commit C's parent is B, which is not a known commit to subtree, and which wasn't included in the list of commits to convert. It therefore assumes C is an initial commit, which is wrong. Likewise K's parent A isn't in the list to convert, so K is assumed to be an initial commit, which also is wrong. (E is okay here, because E's parent is D, and D maps to D', so we can stitch that history together properly.)

By using --ancestry-path, we would instead get only the things directly between D and M, as documented:

	        E-------F
	         \       \
	          G---H---I---J
	                       \
	                        L--M

This actually moves us in the wrong direction, as now both G and L have one known parent and one unknown parent; I'm not sure how the script would handle this, but we actually end up with less information.

In this case, what I need is a way to trace back history along all merge parents, stopping only when I hit one of multiple known commits that I can directly tie back to. In this instance, subtree *knows* what D maps to, so any time D is encountered, we can stop tracing back. But if I can get to one of D's ancestors through another path, I need to keep following that path. Here's what I need for this to work properly:

	        E-------F
	         \       \
	  B---C---G---H---I---J
	 /                     \
	A-------K---------------L--M

To give one more example (since removing a single commit frankly isn't very interesting) let's say that I have known subtree mappings for both D = D' and G = G'. I would therefore need to find all commits which are ancestors of M, but stop tracing history when I reach *either* D or G. Note that if I can reach a commit from any other path, I still need to know about it. Here's what we ultimately would want to find:

	        E-------F
	                 \
	              H---I---J
	                       \
	A-------K---------------L--M

In this case, commit E will reference known commit D as a parent and maps to D', and is good. Commit H references known commit G as a parent and maps to G', and is good. Commit K references A, which itself is an initial commit so is converted to A' (just as it has been previous times subtree has run), and is good.

I'll keep digging around a little bit, but I'm starting to think the necessary plumbing for this operation might not exist. If I can't find it, I'll see if there's some way to unroll that recursive call.

-- 
Roger

^ permalink raw reply	[flat|nested] 33+ messages in thread

* RE: Regression in git-subtree.sh, introduced in 2.20.1, after 315a84f9aa0e2e629b0680068646b0032518ebed
@ 2019-12-08 10:30 Nadav SInai
  2019-12-09 14:11 ` Strain, Roger L.
  0 siblings, 1 reply; 33+ messages in thread
From: Nadav SInai @ 2019-12-08 10:30 UTC (permalink / raw)
  To: roger.strain; +Cc: Johannes.Schindelin, git, gitster, marc, pclouds

Hi, I'm curious if any of you had any luck in preventing that
seg-fault in git-subtree script
I'm encountering it myself using git 2.24.0.windows.2., seg-fault is
in the same while loop (currently on line 757)
When I tried your suggestion of adding the ($parents) ($rev) to the
progress print I see that the last commit have only one revision
printed
like this:

259/290 (523) [271] (843dd34090d36dfabd6a2e3e8459a4887427313b)
(a69ee056f66acf66c63f89f55d26c0cc17036623)
259/290 (525) [273] (f5eea1a3cbe1e16acba53e8a9fe07b6525a8b97c)
(843dd34090d36dfabd6a2e3e8459a4887427313b)
259/290 (527) [275] (82303752a428cf1d789ac9f156008adb2798b7b5)
(f5eea1a3cbe1e16acba53e8a9fe07b6525a8b97c)
259/290 (528) [276]
(7187897883c9fb4d33d4c87a02b876f8603728ff05f0945ae2ce9f98a35135)
259/290 (529) [277]
(a00a3665343439a426671958dd90ed0407a22cad9ac9f156008adb2798b7b5)
259/290 (530) [278]
(90beb94ebd331c457d79d05341453f5829a50bfcd4c87a02b876f8603728ff)
259/290 (531) [279]
(9582e0acbed1910173564e250f350b5cc4291a7f671958dd90ed0407a22cad)
259/290 (532) [280]
(f183930d6fabd3dccdddc5ec35d754ad28caf3b879d05341453f5829a50bfc)
259/290 (533) [281]
(c9309f3a38c41f7991d9e78ddb47f7e85b8521eb564e250f350b5cc4291a7f)
259/290 (534) [282]
(3bcf08f63a0e2b93ecc376bd679a16c80e99e7b1ddc5ec35d754ad28caf3b8)
259/290 (535) [283]
(134621bb55a0470cdf6519ce08d6909af43ce0e5d9e78ddb47f7e85b8521eb)
259/290 (536) [284]
(edb3471fbba29748f9784d29b3cee1dee2df4b37c376bd679a16c80e99e7b1)
259/290 (537) [285]
(dd947a095df07a32dfd56666a395a7c42b25ca116519ce08d6909af43ce0e5)
259/290 (538) [286]
(a639e09d2cbe1ea1149c080c1c95b8b018340ae2784d29b3cee1dee2df4b37)
C:/Program Files/Git/mingw64/libexec/git-core\git-subtree: line 757:
8853 Done                    eval "$grl"
      8854 Segmentation fault      (core dumped) | while read rev parents; do
    process_split_commit "$rev" "$parents" 0;
done

I downgraded git to 2.19.0-windows.1 and it works now.


I'm thankful for your insights
Nadav Sinai
Web Tech lead
Philips-Algotec

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Regression in git-subtree.sh, introduced in 2.20.1, after 315a84f9aa0e2e629b0680068646b0032518ebed
  2019-12-09 14:11 ` Strain, Roger L.
@ 2019-12-09 11:45   ` Ed Maste
  2019-12-09 16:19     ` Strain, Roger L.
  2019-12-09 14:13   ` Marc Balmer
  1 sibling, 1 reply; 33+ messages in thread
From: Ed Maste @ 2019-12-09 11:45 UTC (permalink / raw)
  To: Strain, Roger L.; +Cc: git@vger.kernel.org, marc@msys.ch

On Mon, 9 Dec 2019 at 09:29, Strain, Roger L. <roger.strain@swri.org> wrote:
>
> I've had to further
> customize the script for our internal use, and those changes aren't
> something that would be useful for the public at large.

Would you describe the sort of problem you have to work around with
custom changes?

I'm starting on a path of trying to fix git-subtree for failures[1]
encountered in a prototype conversion of the FreeBSD repository from
svn to git. The misbehaviour I encounter occurs when split encounters
a commit for which the path being split is empty in 'git ls-tree', and
the commit is actually not a subtree commit. I'm currently
experimenting with hacks to skip specific hashes during the initial
subtree split. On reading your mail I realize I could address my issue
by testing for the existence of a specific file though, which makes me
wonder if the issue you have is similar.

[1] https://lore.kernel.org/git/CAPyFy2AsmaxU-BDf_teZJE5hiaVpTSZc8fftnuXPb_4-j7j5Fw@mail.gmail.com/

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Regression in git-subtree.sh, introduced in 2.20.1, after 315a84f9aa0e2e629b0680068646b0032518ebed
  2019-12-08 10:30 Regression in git-subtree.sh, introduced in 2.20.1, after 315a84f9aa0e2e629b0680068646b0032518ebed Nadav SInai
@ 2019-12-09 14:11 ` Strain, Roger L.
  2019-12-09 11:45   ` Ed Maste
  2019-12-09 14:13   ` Marc Balmer
  0 siblings, 2 replies; 33+ messages in thread
From: Strain, Roger L. @ 2019-12-09 14:11 UTC (permalink / raw)
  To: ns@nadavsinai.com
  Cc: git@vger.kernel.org, Johannes.Schindelin@gmx.de,
	gitster@pobox.com, pclouds@gmail.com, marc@msys.ch

I haven't been able to find anything relating to the issue, but I also
haven't had a repo that exposes the problem to test more thoroughly
against. If this happens to be a public repo somewhere, I'd be more
than happy to take a second look.

That being said, if the community feels it would be better to revert
the changes that were introduced, I won't object. I've had to further
customize the script for our internal use, and those changes aren't
something that would be useful for the public at large. (A few changes relate to the presence/absence of a specific file, which I certainly wouldn't expect anyone else to have.) Short story is we're going to have to use a custom script going forward, so keeping or reverting the changes here make no difference to us. I still feel that the changes which were made make the script more correct, but clearly there's some undiagnosed logic error somewhere.

Honestly, I'm surprised we didn't see this particular issue show up on
our own repo; it's ridiculously large and complex. At least if it had,
I'd be able to troubleshoot it more reliably.

--  
Roger Strain

-----Original Message-----
From: Nadav SInai <ns@nadavsinai.com>
To: roger.strain@swri.org
Cc: Johannes.Schindelin@gmx.de, git@vger.kernel.org, gitster@pobox.com,
marc@msys.ch, pclouds@gmail.com
Subject: RE: Regression in git-subtree.sh, introduced in 2.20.1, after
315a84f9aa0e2e629b0680068646b0032518ebed
Date: Sun, 08 Dec 2019 12:30:48 +0200

[EXTERNAL EMAIL]

Hi, I'm curious if any of you had any luck in preventing that
seg-fault in git-subtree script
I'm encountering it myself using git 2.24.0.windows.2., seg-fault is
in the same while loop (currently on line 757)
When I tried your suggestion of adding the ($parents) ($rev) to the
progress print I see that the last commit have only one revision
printed
like this:

259/290 (523) [271] (843dd34090d36dfabd6a2e3e8459a4887427313b)
(a69ee056f66acf66c63f89f55d26c0cc17036623)
259/290 (525) [273] (f5eea1a3cbe1e16acba53e8a9fe07b6525a8b97c)
(843dd34090d36dfabd6a2e3e8459a4887427313b)
259/290 (527) [275] (82303752a428cf1d789ac9f156008adb2798b7b5)
(f5eea1a3cbe1e16acba53e8a9fe07b6525a8b97c)
259/290 (528) [276]
(7187897883c9fb4d33d4c87a02b876f8603728ff05f0945ae2ce9f98a35135)
259/290 (529) [277]
(a00a3665343439a426671958dd90ed0407a22cad9ac9f156008adb2798b7b5)
259/290 (530) [278]
(90beb94ebd331c457d79d05341453f5829a50bfcd4c87a02b876f8603728ff)
259/290 (531) [279]
(9582e0acbed1910173564e250f350b5cc4291a7f671958dd90ed0407a22cad)
259/290 (532) [280]
(f183930d6fabd3dccdddc5ec35d754ad28caf3b879d05341453f5829a50bfc)
259/290 (533) [281]
(c9309f3a38c41f7991d9e78ddb47f7e85b8521eb564e250f350b5cc4291a7f)
259/290 (534) [282]
(3bcf08f63a0e2b93ecc376bd679a16c80e99e7b1ddc5ec35d754ad28caf3b8)
259/290 (535) [283]
(134621bb55a0470cdf6519ce08d6909af43ce0e5d9e78ddb47f7e85b8521eb)
259/290 (536) [284]
(edb3471fbba29748f9784d29b3cee1dee2df4b37c376bd679a16c80e99e7b1)
259/290 (537) [285]
(dd947a095df07a32dfd56666a395a7c42b25ca116519ce08d6909af43ce0e5)
259/290 (538) [286]
(a639e09d2cbe1ea1149c080c1c95b8b018340ae2784d29b3cee1dee2df4b37)
C:/Program Files/Git/mingw64/libexec/git-core\git-subtree: line 757:
8853 Done                    eval "$grl"
      8854 Segmentation fault      (core dumped) | while read rev
parents; do
    process_split_commit "$rev" "$parents" 0;
done

I downgraded git to 2.19.0-windows.1 and it works now.


I'm thankful for your insights
Nadav Sinai
Web Tech lead
Philips-Algotec



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Regression in git-subtree.sh, introduced in 2.20.1, after 315a84f9aa0e2e629b0680068646b0032518ebed
  2019-12-09 14:11 ` Strain, Roger L.
  2019-12-09 11:45   ` Ed Maste
@ 2019-12-09 14:13   ` Marc Balmer
  2019-12-09 14:18     ` Strain, Roger L.
  1 sibling, 1 reply; 33+ messages in thread
From: Marc Balmer @ 2019-12-09 14:13 UTC (permalink / raw)
  To: Strain, Roger L.
  Cc: ns@nadavsinai.com, git@vger.kernel.org,
	Johannes.Schindelin@gmx.de, gitster@pobox.com, pclouds@gmail.com

Roger,

I am all for reverting it. if that does not cause any other regressions or headaches (or both...)

- Marc


> Am 09.12.2019 um 15:11 schrieb Strain, Roger L. <roger.strain@swri.org>:
> 
> I haven't been able to find anything relating to the issue, but I also
> haven't had a repo that exposes the problem to test more thoroughly
> against. If this happens to be a public repo somewhere, I'd be more
> than happy to take a second look.
> 
> That being said, if the community feels it would be better to revert
> the changes that were introduced, I won't object. I've had to further
> customize the script for our internal use, and those changes aren't
> something that would be useful for the public at large. (A few changes relate to the presence/absence of a specific file, which I certainly wouldn't expect anyone else to have.) Short story is we're going to have to use a custom script going forward, so keeping or reverting the changes here make no difference to us. I still feel that the changes which were made make the script more correct, but clearly there's some undiagnosed logic error somewhere.
> 
> Honestly, I'm surprised we didn't see this particular issue show up on
> our own repo; it's ridiculously large and complex. At least if it had,
> I'd be able to troubleshoot it more reliably.
> 
> --  
> Roger Strain
> 
> -----Original Message-----
> From: Nadav SInai <ns@nadavsinai.com>
> To: roger.strain@swri.org
> Cc: Johannes.Schindelin@gmx.de, git@vger.kernel.org, gitster@pobox.com,
> marc@msys.ch, pclouds@gmail.com
> Subject: RE: Regression in git-subtree.sh, introduced in 2.20.1, after
> 315a84f9aa0e2e629b0680068646b0032518ebed
> Date: Sun, 08 Dec 2019 12:30:48 +0200
> 
> [EXTERNAL EMAIL]
> 
> Hi, I'm curious if any of you had any luck in preventing that
> seg-fault in git-subtree script
> I'm encountering it myself using git 2.24.0.windows.2., seg-fault is
> in the same while loop (currently on line 757)
> When I tried your suggestion of adding the ($parents) ($rev) to the
> progress print I see that the last commit have only one revision
> printed
> like this:
> 
> 259/290 (523) [271] (843dd34090d36dfabd6a2e3e8459a4887427313b)
> (a69ee056f66acf66c63f89f55d26c0cc17036623)
> 259/290 (525) [273] (f5eea1a3cbe1e16acba53e8a9fe07b6525a8b97c)
> (843dd34090d36dfabd6a2e3e8459a4887427313b)
> 259/290 (527) [275] (82303752a428cf1d789ac9f156008adb2798b7b5)
> (f5eea1a3cbe1e16acba53e8a9fe07b6525a8b97c)
> 259/290 (528) [276]
> (7187897883c9fb4d33d4c87a02b876f8603728ff05f0945ae2ce9f98a35135)
> 259/290 (529) [277]
> (a00a3665343439a426671958dd90ed0407a22cad9ac9f156008adb2798b7b5)
> 259/290 (530) [278]
> (90beb94ebd331c457d79d05341453f5829a50bfcd4c87a02b876f8603728ff)
> 259/290 (531) [279]
> (9582e0acbed1910173564e250f350b5cc4291a7f671958dd90ed0407a22cad)
> 259/290 (532) [280]
> (f183930d6fabd3dccdddc5ec35d754ad28caf3b879d05341453f5829a50bfc)
> 259/290 (533) [281]
> (c9309f3a38c41f7991d9e78ddb47f7e85b8521eb564e250f350b5cc4291a7f)
> 259/290 (534) [282]
> (3bcf08f63a0e2b93ecc376bd679a16c80e99e7b1ddc5ec35d754ad28caf3b8)
> 259/290 (535) [283]
> (134621bb55a0470cdf6519ce08d6909af43ce0e5d9e78ddb47f7e85b8521eb)
> 259/290 (536) [284]
> (edb3471fbba29748f9784d29b3cee1dee2df4b37c376bd679a16c80e99e7b1)
> 259/290 (537) [285]
> (dd947a095df07a32dfd56666a395a7c42b25ca116519ce08d6909af43ce0e5)
> 259/290 (538) [286]
> (a639e09d2cbe1ea1149c080c1c95b8b018340ae2784d29b3cee1dee2df4b37)
> C:/Program Files/Git/mingw64/libexec/git-core\git-subtree: line 757:
> 8853 Done                    eval "$grl"
>      8854 Segmentation fault      (core dumped) | while read rev
> parents; do
>    process_split_commit "$rev" "$parents" 0;
> done
> 
> I downgraded git to 2.19.0-windows.1 and it works now.
> 
> 
> I'm thankful for your insights
> Nadav Sinai
> Web Tech lead
> Philips-Algotec
> 
> 


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Regression in git-subtree.sh, introduced in 2.20.1, after 315a84f9aa0e2e629b0680068646b0032518ebed
  2019-12-09 14:13   ` Marc Balmer
@ 2019-12-09 14:18     ` Strain, Roger L.
  2019-12-09 14:30       ` Marc Balmer
  0 siblings, 1 reply; 33+ messages in thread
From: Strain, Roger L. @ 2019-12-09 14:18 UTC (permalink / raw)
  To: marc@msys.ch
  Cc: git@vger.kernel.org, Johannes.Schindelin@gmx.de,
	gitster@pobox.com, ns@nadavsinai.com, pclouds@gmail.com

As I said, I'm using a custom script here. I don't know if anybody else
benefited from the change and hasn't said anything, but I won't object
to someone submitting that revert.

-- 
Roger

-----Original Message-----
From: Marc Balmer <marc@msys.ch>
To: "Strain, Roger L." <roger.strain@swri.org>
Cc: ns@nadavsinai.com <ns@nadavsinai.com>, git@vger.kernel.org <
git@vger.kernel.org>, Johannes.Schindelin@gmx.de <
Johannes.Schindelin@gmx.de>, gitster@pobox.com <gitster@pobox.com>, 
pclouds@gmail.com <pclouds@gmail.com>
Subject: Re: Regression in git-subtree.sh, introduced in 2.20.1, after
315a84f9aa0e2e629b0680068646b0032518ebed
Date: Mon, 09 Dec 2019 15:13:47 +0100

Roger,

I am all for reverting it. if that does not cause any other regressions
or headaches (or both...)

- Marc



Am 09.12.2019 um 15:11 schrieb Strain, Roger L. <roger.strain@swri.org>
:

I haven't been able to find anything relating to the issue, but I also
haven't had a repo that exposes the problem to test more thoroughly
against. If this happens to be a public repo somewhere, I'd be more
than happy to take a second look.

That being said, if the community feels it would be better to revert
the changes that were introduced, I won't object. I've had to further
customize the script for our internal use, and those changes aren't
something that would be useful for the public at large. (A few changes
relate to the presence/absence of a specific file, which I certainly
wouldn't expect anyone else to have.) Short story is we're going to
have to use a custom script going forward, so keeping or reverting the
changes here make no difference to us. I still feel that the changes
which were made make the script more correct, but clearly there's some
undiagnosed logic error somewhere.

Honestly, I'm surprised we didn't see this particular issue show up on
our own repo; it's ridiculously large and complex. At least if it had,
I'd be able to troubleshoot it more reliably.

--  
Roger Strain

-----Original Message-----
From: Nadav SInai <ns@nadavsinai.com>
To: roger.strain@swri.org
Cc: Johannes.Schindelin@gmx.de, git@vger.kernel.org, gitster@pobox.com,
marc@msys.ch, pclouds@gmail.com
Subject: RE: Regression in git-subtree.sh, introduced in 2.20.1, after
315a84f9aa0e2e629b0680068646b0032518ebed
Date: Sun, 08 Dec 2019 12:30:48 +0200

[EXTERNAL EMAIL]

Hi, I'm curious if any of you had any luck in preventing that
seg-fault in git-subtree script
I'm encountering it myself using git 2.24.0.windows.2., seg-fault is
in the same while loop (currently on line 757)
When I tried your suggestion of adding the ($parents) ($rev) to the
progress print I see that the last commit have only one revision
printed
like this:

259/290 (523) [271] (843dd34090d36dfabd6a2e3e8459a4887427313b)
(a69ee056f66acf66c63f89f55d26c0cc17036623)
259/290 (525) [273] (f5eea1a3cbe1e16acba53e8a9fe07b6525a8b97c)
(843dd34090d36dfabd6a2e3e8459a4887427313b)
259/290 (527) [275] (82303752a428cf1d789ac9f156008adb2798b7b5)
(f5eea1a3cbe1e16acba53e8a9fe07b6525a8b97c)
259/290 (528) [276]
(7187897883c9fb4d33d4c87a02b876f8603728ff05f0945ae2ce9f98a35135)
259/290 (529) [277]
(a00a3665343439a426671958dd90ed0407a22cad9ac9f156008adb2798b7b5)
259/290 (530) [278]
(90beb94ebd331c457d79d05341453f5829a50bfcd4c87a02b876f8603728ff)
259/290 (531) [279]
(9582e0acbed1910173564e250f350b5cc4291a7f671958dd90ed0407a22cad)
259/290 (532) [280]
(f183930d6fabd3dccdddc5ec35d754ad28caf3b879d05341453f5829a50bfc)
259/290 (533) [281]
(c9309f3a38c41f7991d9e78ddb47f7e85b8521eb564e250f350b5cc4291a7f)
259/290 (534) [282]
(3bcf08f63a0e2b93ecc376bd679a16c80e99e7b1ddc5ec35d754ad28caf3b8)
259/290 (535) [283]
(134621bb55a0470cdf6519ce08d6909af43ce0e5d9e78ddb47f7e85b8521eb)
259/290 (536) [284]
(edb3471fbba29748f9784d29b3cee1dee2df4b37c376bd679a16c80e99e7b1)
259/290 (537) [285]
(dd947a095df07a32dfd56666a395a7c42b25ca116519ce08d6909af43ce0e5)
259/290 (538) [286]
(a639e09d2cbe1ea1149c080c1c95b8b018340ae2784d29b3cee1dee2df4b37)
C:/Program Files/Git/mingw64/libexec/git-core\git-subtree: line 757:
8853 Done                    eval "$grl"
     8854 Segmentation fault      (core dumped) | while read rev
parents; do
   process_split_commit "$rev" "$parents" 0;
done

I downgraded git to 2.19.0-windows.1 and it works now.


I'm thankful for your insights
Nadav Sinai
Web Tech lead
Philips-Algotec






^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Regression in git-subtree.sh, introduced in 2.20.1, after 315a84f9aa0e2e629b0680068646b0032518ebed
  2019-12-09 14:18     ` Strain, Roger L.
@ 2019-12-09 14:30       ` Marc Balmer
  2019-12-09 15:26         ` Johannes Schindelin
  0 siblings, 1 reply; 33+ messages in thread
From: Marc Balmer @ 2019-12-09 14:30 UTC (permalink / raw)
  To: Strain, Roger L.
  Cc: git@vger.kernel.org, Johannes.Schindelin@gmx.de,
	gitster@pobox.com, ns@nadavsinai.com, pclouds@gmail.com

I am not familiar with the source code, so I can not send in that revert.  I can, however, say that I am grateful to whomever does it ;)

- Marc


> Am 09.12.2019 um 15:18 schrieb Strain, Roger L. <roger.strain@swri.org>:
> 
> As I said, I'm using a custom script here. I don't know if anybody else
> benefited from the change and hasn't said anything, but I won't object
> to someone submitting that revert.
> 
> -- 
> Roger
> 
> -----Original Message-----
> From: Marc Balmer <marc@msys.ch>
> To: "Strain, Roger L." <roger.strain@swri.org>
> Cc: ns@nadavsinai.com <ns@nadavsinai.com>, git@vger.kernel.org <
> git@vger.kernel.org>, Johannes.Schindelin@gmx.de <
> Johannes.Schindelin@gmx.de>, gitster@pobox.com <gitster@pobox.com>, 
> pclouds@gmail.com <pclouds@gmail.com>
> Subject: Re: Regression in git-subtree.sh, introduced in 2.20.1, after
> 315a84f9aa0e2e629b0680068646b0032518ebed
> Date: Mon, 09 Dec 2019 15:13:47 +0100
> 
> Roger,
> 
> I am all for reverting it. if that does not cause any other regressions
> or headaches (or both...)
> 
> - Marc
> 
> 
> 
> Am 09.12.2019 um 15:11 schrieb Strain, Roger L. <roger.strain@swri.org>
> :
> 
> I haven't been able to find anything relating to the issue, but I also
> haven't had a repo that exposes the problem to test more thoroughly
> against. If this happens to be a public repo somewhere, I'd be more
> than happy to take a second look.
> 
> That being said, if the community feels it would be better to revert
> the changes that were introduced, I won't object. I've had to further
> customize the script for our internal use, and those changes aren't
> something that would be useful for the public at large. (A few changes
> relate to the presence/absence of a specific file, which I certainly
> wouldn't expect anyone else to have.) Short story is we're going to
> have to use a custom script going forward, so keeping or reverting the
> changes here make no difference to us. I still feel that the changes
> which were made make the script more correct, but clearly there's some
> undiagnosed logic error somewhere.
> 
> Honestly, I'm surprised we didn't see this particular issue show up on
> our own repo; it's ridiculously large and complex. At least if it had,
> I'd be able to troubleshoot it more reliably.
> 
> --  
> Roger Strain
> 
> -----Original Message-----
> From: Nadav SInai <ns@nadavsinai.com>
> To: roger.strain@swri.org
> Cc: Johannes.Schindelin@gmx.de, git@vger.kernel.org, gitster@pobox.com,
> marc@msys.ch, pclouds@gmail.com
> Subject: RE: Regression in git-subtree.sh, introduced in 2.20.1, after
> 315a84f9aa0e2e629b0680068646b0032518ebed
> Date: Sun, 08 Dec 2019 12:30:48 +0200
> 
> [EXTERNAL EMAIL]
> 
> Hi, I'm curious if any of you had any luck in preventing that
> seg-fault in git-subtree script
> I'm encountering it myself using git 2.24.0.windows.2., seg-fault is
> in the same while loop (currently on line 757)
> When I tried your suggestion of adding the ($parents) ($rev) to the
> progress print I see that the last commit have only one revision
> printed
> like this:
> 
> 259/290 (523) [271] (843dd34090d36dfabd6a2e3e8459a4887427313b)
> (a69ee056f66acf66c63f89f55d26c0cc17036623)
> 259/290 (525) [273] (f5eea1a3cbe1e16acba53e8a9fe07b6525a8b97c)
> (843dd34090d36dfabd6a2e3e8459a4887427313b)
> 259/290 (527) [275] (82303752a428cf1d789ac9f156008adb2798b7b5)
> (f5eea1a3cbe1e16acba53e8a9fe07b6525a8b97c)
> 259/290 (528) [276]
> (7187897883c9fb4d33d4c87a02b876f8603728ff05f0945ae2ce9f98a35135)
> 259/290 (529) [277]
> (a00a3665343439a426671958dd90ed0407a22cad9ac9f156008adb2798b7b5)
> 259/290 (530) [278]
> (90beb94ebd331c457d79d05341453f5829a50bfcd4c87a02b876f8603728ff)
> 259/290 (531) [279]
> (9582e0acbed1910173564e250f350b5cc4291a7f671958dd90ed0407a22cad)
> 259/290 (532) [280]
> (f183930d6fabd3dccdddc5ec35d754ad28caf3b879d05341453f5829a50bfc)
> 259/290 (533) [281]
> (c9309f3a38c41f7991d9e78ddb47f7e85b8521eb564e250f350b5cc4291a7f)
> 259/290 (534) [282]
> (3bcf08f63a0e2b93ecc376bd679a16c80e99e7b1ddc5ec35d754ad28caf3b8)
> 259/290 (535) [283]
> (134621bb55a0470cdf6519ce08d6909af43ce0e5d9e78ddb47f7e85b8521eb)
> 259/290 (536) [284]
> (edb3471fbba29748f9784d29b3cee1dee2df4b37c376bd679a16c80e99e7b1)
> 259/290 (537) [285]
> (dd947a095df07a32dfd56666a395a7c42b25ca116519ce08d6909af43ce0e5)
> 259/290 (538) [286]
> (a639e09d2cbe1ea1149c080c1c95b8b018340ae2784d29b3cee1dee2df4b37)
> C:/Program Files/Git/mingw64/libexec/git-core\git-subtree: line 757:
> 8853 Done                    eval "$grl"
>     8854 Segmentation fault      (core dumped) | while read rev
> parents; do
>   process_split_commit "$rev" "$parents" 0;
> done
> 
> I downgraded git to 2.19.0-windows.1 and it works now.
> 
> 
> I'm thankful for your insights
> Nadav Sinai
> Web Tech lead
> Philips-Algotec
> 
> 
> 
> 
> 


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Regression in git-subtree.sh, introduced in 2.20.1, after 315a84f9aa0e2e629b0680068646b0032518ebed
  2019-12-09 14:30       ` Marc Balmer
@ 2019-12-09 15:26         ` Johannes Schindelin
  2019-12-09 15:31           ` Marc Balmer
  0 siblings, 1 reply; 33+ messages in thread
From: Johannes Schindelin @ 2019-12-09 15:26 UTC (permalink / raw)
  To: Marc Balmer
  Cc: Strain, Roger L., git@vger.kernel.org, gitster@pobox.com,
	ns@nadavsinai.com, pclouds@gmail.com

Hi,

On Mon, 9 Dec 2019, Marc Balmer wrote:

> I am not familiar with the source code, so I can not send in that
> revert.  I can, however, say that I am grateful to whomever does it ;)

I am against reverting the change without knowing the root cause.

The recent reporter only compared Git for Windows v2.19.0 vs v2.20.1,
which is _quite_ a big difference.

For what I know, the problem might be a change in the MSYS2 runtime that
is mistaken by some malware for malicious code (we did introduce some code
to emulate Ctrl+C in MinTTY which injects a remote thread and executes
ExitProcess() there, which might very well be construed as an attack, even
if it is actually very much desired behavior).

These segmentation faults in `git subtree` on Windows have traditionally
been _all_ because of overzealous anti-malware.

So first, a much more fine-grained analysis would be required, e.g.
comparing v2.20.1 against v2.20.0, then copying _just_ the `git-subtree`
file from a working into a non-working version (or vice versa; I would
highly recommend using the portable versions for such side-by side
comparison).

Ciao,
Johannes

>
> - Marc
>
>
> > Am 09.12.2019 um 15:18 schrieb Strain, Roger L. <roger.strain@swri.org>:
> >
> > As I said, I'm using a custom script here. I don't know if anybody else
> > benefited from the change and hasn't said anything, but I won't object
> > to someone submitting that revert.
> >
> > --
> > Roger
> >
> > -----Original Message-----
> > From: Marc Balmer <marc@msys.ch>
> > To: "Strain, Roger L." <roger.strain@swri.org>
> > Cc: ns@nadavsinai.com <ns@nadavsinai.com>, git@vger.kernel.org <
> > git@vger.kernel.org>, Johannes.Schindelin@gmx.de <
> > Johannes.Schindelin@gmx.de>, gitster@pobox.com <gitster@pobox.com>,
> > pclouds@gmail.com <pclouds@gmail.com>
> > Subject: Re: Regression in git-subtree.sh, introduced in 2.20.1, after
> > 315a84f9aa0e2e629b0680068646b0032518ebed
> > Date: Mon, 09 Dec 2019 15:13:47 +0100
> >
> > Roger,
> >
> > I am all for reverting it. if that does not cause any other regressions
> > or headaches (or both...)
> >
> > - Marc
> >
> >
> >
> > Am 09.12.2019 um 15:11 schrieb Strain, Roger L. <roger.strain@swri.org>
> > :
> >
> > I haven't been able to find anything relating to the issue, but I also
> > haven't had a repo that exposes the problem to test more thoroughly
> > against. If this happens to be a public repo somewhere, I'd be more
> > than happy to take a second look.
> >
> > That being said, if the community feels it would be better to revert
> > the changes that were introduced, I won't object. I've had to further
> > customize the script for our internal use, and those changes aren't
> > something that would be useful for the public at large. (A few changes
> > relate to the presence/absence of a specific file, which I certainly
> > wouldn't expect anyone else to have.) Short story is we're going to
> > have to use a custom script going forward, so keeping or reverting the
> > changes here make no difference to us. I still feel that the changes
> > which were made make the script more correct, but clearly there's some
> > undiagnosed logic error somewhere.
> >
> > Honestly, I'm surprised we didn't see this particular issue show up on
> > our own repo; it's ridiculously large and complex. At least if it had,
> > I'd be able to troubleshoot it more reliably.
> >
> > --
> > Roger Strain
> >
> > -----Original Message-----
> > From: Nadav SInai <ns@nadavsinai.com>
> > To: roger.strain@swri.org
> > Cc: Johannes.Schindelin@gmx.de, git@vger.kernel.org, gitster@pobox.com,
> > marc@msys.ch, pclouds@gmail.com
> > Subject: RE: Regression in git-subtree.sh, introduced in 2.20.1, after
> > 315a84f9aa0e2e629b0680068646b0032518ebed
> > Date: Sun, 08 Dec 2019 12:30:48 +0200
> >
> > [EXTERNAL EMAIL]
> >
> > Hi, I'm curious if any of you had any luck in preventing that
> > seg-fault in git-subtree script
> > I'm encountering it myself using git 2.24.0.windows.2., seg-fault is
> > in the same while loop (currently on line 757)
> > When I tried your suggestion of adding the ($parents) ($rev) to the
> > progress print I see that the last commit have only one revision
> > printed
> > like this:
> >
> > 259/290 (523) [271] (843dd34090d36dfabd6a2e3e8459a4887427313b)
> > (a69ee056f66acf66c63f89f55d26c0cc17036623)
> > 259/290 (525) [273] (f5eea1a3cbe1e16acba53e8a9fe07b6525a8b97c)
> > (843dd34090d36dfabd6a2e3e8459a4887427313b)
> > 259/290 (527) [275] (82303752a428cf1d789ac9f156008adb2798b7b5)
> > (f5eea1a3cbe1e16acba53e8a9fe07b6525a8b97c)
> > 259/290 (528) [276]
> > (7187897883c9fb4d33d4c87a02b876f8603728ff05f0945ae2ce9f98a35135)
> > 259/290 (529) [277]
> > (a00a3665343439a426671958dd90ed0407a22cad9ac9f156008adb2798b7b5)
> > 259/290 (530) [278]
> > (90beb94ebd331c457d79d05341453f5829a50bfcd4c87a02b876f8603728ff)
> > 259/290 (531) [279]
> > (9582e0acbed1910173564e250f350b5cc4291a7f671958dd90ed0407a22cad)
> > 259/290 (532) [280]
> > (f183930d6fabd3dccdddc5ec35d754ad28caf3b879d05341453f5829a50bfc)
> > 259/290 (533) [281]
> > (c9309f3a38c41f7991d9e78ddb47f7e85b8521eb564e250f350b5cc4291a7f)
> > 259/290 (534) [282]
> > (3bcf08f63a0e2b93ecc376bd679a16c80e99e7b1ddc5ec35d754ad28caf3b8)
> > 259/290 (535) [283]
> > (134621bb55a0470cdf6519ce08d6909af43ce0e5d9e78ddb47f7e85b8521eb)
> > 259/290 (536) [284]
> > (edb3471fbba29748f9784d29b3cee1dee2df4b37c376bd679a16c80e99e7b1)
> > 259/290 (537) [285]
> > (dd947a095df07a32dfd56666a395a7c42b25ca116519ce08d6909af43ce0e5)
> > 259/290 (538) [286]
> > (a639e09d2cbe1ea1149c080c1c95b8b018340ae2784d29b3cee1dee2df4b37)
> > C:/Program Files/Git/mingw64/libexec/git-core\git-subtree: line 757:
> > 8853 Done                    eval "$grl"
> >     8854 Segmentation fault      (core dumped) | while read rev
> > parents; do
> >   process_split_commit "$rev" "$parents" 0;
> > done
> >
> > I downgraded git to 2.19.0-windows.1 and it works now.
> >
> >
> > I'm thankful for your insights
> > Nadav Sinai
> > Web Tech lead
> > Philips-Algotec
> >
> >
> >
> >
> >
>
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Regression in git-subtree.sh, introduced in 2.20.1, after 315a84f9aa0e2e629b0680068646b0032518ebed
  2019-12-09 15:26         ` Johannes Schindelin
@ 2019-12-09 15:31           ` Marc Balmer
  2019-12-09 19:38             ` Johannes Schindelin
  0 siblings, 1 reply; 33+ messages in thread
From: Marc Balmer @ 2019-12-09 15:31 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Strain, Roger L., git@vger.kernel.org, gitster@pobox.com,
	ns@nadavsinai.com, pclouds@gmail.com

Fwiw, I see the problem on Linux.

It hay nothing to do with overzealos antimalware, it is a regression and it has been well documented.


> Am 09.12.2019 um 17:20 schrieb Johannes Schindelin <Johannes.Schindelin@gmx.de>:
> 
> Hi,
> 
>> On Mon, 9 Dec 2019, Marc Balmer wrote:
>> 
>> I am not familiar with the source code, so I can not send in that
>> revert.  I can, however, say that I am grateful to whomever does it ;)
> 
> I am against reverting the change without knowing the root cause.
> 
> The recent reporter only compared Git for Windows v2.19.0 vs v2.20.1,
> which is _quite_ a big difference.
> 
> For what I know, the problem might be a change in the MSYS2 runtime that
> is mistaken by some malware for malicious code (we did introduce some code
> to emulate Ctrl+C in MinTTY which injects a remote thread and executes
> ExitProcess() there, which might very well be construed as an attack, even
> if it is actually very much desired behavior).
> 
> These segmentation faults in `git subtree` on Windows have traditionally
> been _all_ because of overzealous anti-malware.
> 
> So first, a much more fine-grained analysis would be required, e.g.
> comparing v2.20.1 against v2.20.0, then copying _just_ the `git-subtree`
> file from a working into a non-working version (or vice versa; I would
> highly recommend using the portable versions for such side-by side
> comparison).
> 
> Ciao,
> Johannes
> 
>> 
>> - Marc
>> 
>> 
>>>> Am 09.12.2019 um 15:18 schrieb Strain, Roger L. <roger.strain@swri.org>:
>>> 
>>> As I said, I'm using a custom script here. I don't know if anybody else
>>> benefited from the change and hasn't said anything, but I won't object
>>> to someone submitting that revert.
>>> 
>>> --
>>> Roger
>>> 
>>> -----Original Message-----
>>> From: Marc Balmer <marc@msys.ch>
>>> To: "Strain, Roger L." <roger.strain@swri.org>
>>> Cc: ns@nadavsinai.com <ns@nadavsinai.com>, git@vger.kernel.org <
>>> git@vger.kernel.org>, Johannes.Schindelin@gmx.de <
>>> Johannes.Schindelin@gmx.de>, gitster@pobox.com <gitster@pobox.com>,
>>> pclouds@gmail.com <pclouds@gmail.com>
>>> Subject: Re: Regression in git-subtree.sh, introduced in 2.20.1, after
>>> 315a84f9aa0e2e629b0680068646b0032518ebed
>>> Date: Mon, 09 Dec 2019 15:13:47 +0100
>>> 
>>> Roger,
>>> 
>>> I am all for reverting it. if that does not cause any other regressions
>>> or headaches (or both...)
>>> 
>>> - Marc
>>> 
>>> 
>>> 
>>> Am 09.12.2019 um 15:11 schrieb Strain, Roger L. <roger.strain@swri.org>
>>> :
>>> 
>>> I haven't been able to find anything relating to the issue, but I also
>>> haven't had a repo that exposes the problem to test more thoroughly
>>> against. If this happens to be a public repo somewhere, I'd be more
>>> than happy to take a second look.
>>> 
>>> That being said, if the community feels it would be better to revert
>>> the changes that were introduced, I won't object. I've had to further
>>> customize the script for our internal use, and those changes aren't
>>> something that would be useful for the public at large. (A few changes
>>> relate to the presence/absence of a specific file, which I certainly
>>> wouldn't expect anyone else to have.) Short story is we're going to
>>> have to use a custom script going forward, so keeping or reverting the
>>> changes here make no difference to us. I still feel that the changes
>>> which were made make the script more correct, but clearly there's some
>>> undiagnosed logic error somewhere.
>>> 
>>> Honestly, I'm surprised we didn't see this particular issue show up on
>>> our own repo; it's ridiculously large and complex. At least if it had,
>>> I'd be able to troubleshoot it more reliably.
>>> 
>>> --
>>> Roger Strain
>>> 
>>> -----Original Message-----
>>> From: Nadav SInai <ns@nadavsinai.com>
>>> To: roger.strain@swri.org
>>> Cc: Johannes.Schindelin@gmx.de, git@vger.kernel.org, gitster@pobox.com,
>>> marc@msys.ch, pclouds@gmail.com
>>> Subject: RE: Regression in git-subtree.sh, introduced in 2.20.1, after
>>> 315a84f9aa0e2e629b0680068646b0032518ebed
>>> Date: Sun, 08 Dec 2019 12:30:48 +0200
>>> 
>>> [EXTERNAL EMAIL]
>>> 
>>> Hi, I'm curious if any of you had any luck in preventing that
>>> seg-fault in git-subtree script
>>> I'm encountering it myself using git 2.24.0.windows.2., seg-fault is
>>> in the same while loop (currently on line 757)
>>> When I tried your suggestion of adding the ($parents) ($rev) to the
>>> progress print I see that the last commit have only one revision
>>> printed
>>> like this:
>>> 
>>> 259/290 (523) [271] (843dd34090d36dfabd6a2e3e8459a4887427313b)
>>> (a69ee056f66acf66c63f89f55d26c0cc17036623)
>>> 259/290 (525) [273] (f5eea1a3cbe1e16acba53e8a9fe07b6525a8b97c)
>>> (843dd34090d36dfabd6a2e3e8459a4887427313b)
>>> 259/290 (527) [275] (82303752a428cf1d789ac9f156008adb2798b7b5)
>>> (f5eea1a3cbe1e16acba53e8a9fe07b6525a8b97c)
>>> 259/290 (528) [276]
>>> (7187897883c9fb4d33d4c87a02b876f8603728ff05f0945ae2ce9f98a35135)
>>> 259/290 (529) [277]
>>> (a00a3665343439a426671958dd90ed0407a22cad9ac9f156008adb2798b7b5)
>>> 259/290 (530) [278]
>>> (90beb94ebd331c457d79d05341453f5829a50bfcd4c87a02b876f8603728ff)
>>> 259/290 (531) [279]
>>> (9582e0acbed1910173564e250f350b5cc4291a7f671958dd90ed0407a22cad)
>>> 259/290 (532) [280]
>>> (f183930d6fabd3dccdddc5ec35d754ad28caf3b879d05341453f5829a50bfc)
>>> 259/290 (533) [281]
>>> (c9309f3a38c41f7991d9e78ddb47f7e85b8521eb564e250f350b5cc4291a7f)
>>> 259/290 (534) [282]
>>> (3bcf08f63a0e2b93ecc376bd679a16c80e99e7b1ddc5ec35d754ad28caf3b8)
>>> 259/290 (535) [283]
>>> (134621bb55a0470cdf6519ce08d6909af43ce0e5d9e78ddb47f7e85b8521eb)
>>> 259/290 (536) [284]
>>> (edb3471fbba29748f9784d29b3cee1dee2df4b37c376bd679a16c80e99e7b1)
>>> 259/290 (537) [285]
>>> (dd947a095df07a32dfd56666a395a7c42b25ca116519ce08d6909af43ce0e5)
>>> 259/290 (538) [286]
>>> (a639e09d2cbe1ea1149c080c1c95b8b018340ae2784d29b3cee1dee2df4b37)
>>> C:/Program Files/Git/mingw64/libexec/git-core\git-subtree: line 757:
>>> 8853 Done                    eval "$grl"
>>>    8854 Segmentation fault      (core dumped) | while read rev
>>> parents; do
>>>  process_split_commit "$rev" "$parents" 0;
>>> done
>>> 
>>> I downgraded git to 2.19.0-windows.1 and it works now.
>>> 
>>> 
>>> I'm thankful for your insights
>>> Nadav Sinai
>>> Web Tech lead
>>> Philips-Algotec
>>> 
>>> 
>>> 
>>> 
>>> 
>> 
>> 


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Regression in git-subtree.sh, introduced in 2.20.1, after 315a84f9aa0e2e629b0680068646b0032518ebed
  2019-12-09 11:45   ` Ed Maste
@ 2019-12-09 16:19     ` Strain, Roger L.
  0 siblings, 0 replies; 33+ messages in thread
From: Strain, Roger L. @ 2019-12-09 16:19 UTC (permalink / raw)
  To: emaste@freebsd.org; +Cc: git@vger.kernel.org, marc@msys.ch

So it's been quite a while since I made this specific change, but I'll
attach the relevant portion of the diff below. I may be completely
misremembering portions, and apologize in advance. This was based on an
earlier version of the script, and I can see some other changes have
been made since I forked, but perhaps this will still explain what I
tried to do to work around our problem.

Within process_split_commit, there's logic that tries to distinguish
between commits which are mainline and commits which are subtree.
There's even a comment in the relevant section asking "Is there no
better way? Does it matter?" Well, the answer was yes, it mattered,
because we were picking up mainline commits that there before the
initial add of a subtree, and those were getting sucked in as if they
were subtree commits, and then all the remaining hashes were off.

What this change was meant to do was to check for the existence of a
single, known file. We keep a file called "subtrees.csv" in the root of
our mainline repo, and it defines the various subtrees that comprise
the mainline. Therefore, if that file exists, I can say with certainty
that it is a mainline commit. So when that dodgy check comes up, it
checks for the file first, then falls back to the old behavior.

Partial diff follows, feel free to try it out if it sounds like a
similar problem that you're facing. Change the specific filename for
your needs, obviously.

To be clear, this is NOT something I'm submitting for inclusion in the
general release; it's very repo-specific, and I just hope it might help
a fellow soul.

@@ -506,6 +499,20 @@ subtree_for_commit () {
        done
 }
 
+subtree_for_csv () {
+       commit="$1"
+       dir="$2"
+       git ls-tree "$commit" -- "$dir" |
+       while read mode type tree name
+       do
+               assert test "$name" = "$dir"
+               assert test "$type" = "blob" -o "$type" = "commit"
+               test "$type" = "commit" && continue  # ignore
submodules
+               echo $tree
+               break
+       done
+}
+
 tree_changed () {
        tree=$1
        shift
@@ -667,9 +674,17 @@ process_split_commit () {
        if test -z "$tree"
        then
                set_notree "$rev"
-               if test -n "$newparents"
+               subtreescsv=$(subtree_for_csv "$rev" "subtrees.csv")
+               debug "${indentprefix}  subtrees.csv tree is:
$subtreescsv"
+
+               # ugly.  is there no better way to tell if this is a
subtree
+               # vs. a mainline commit?  Does it matter?
+               if test -z "$subtreescsv"
                then
-                       cache_set "$rev" "$rev"
+                       if test -n "$newparents"
+                       then
+                               cache_set "$rev" "$rev"
+                       fi
                fi
                return
        fi


-- 
Roger Strain

-----Original Message-----
From: Ed Maste <emaste@freebsd.org>
To: "Strain, Roger L." <roger.strain@swri.org>
Cc: git@vger.kernel.org <git@vger.kernel.org>, marc@msys.ch <
marc@msys.ch>
Subject: Re: Regression in git-subtree.sh, introduced in 2.20.1, after
315a84f9aa0e2e629b0680068646b0032518ebed
Date: Mon, 09 Dec 2019 06:45:57 -0500

[EXTERNAL EMAIL]

On Mon, 9 Dec 2019 at 09:29, Strain, Roger L. <roger.strain@swri.org>
wrote:

I've had to further
customize the script for our internal use, and those changes aren't
something that would be useful for the public at large.

Would you describe the sort of problem you have to work around with
custom changes?

I'm starting on a path of trying to fix git-subtree for failures[1]
encountered in a prototype conversion of the FreeBSD repository from
svn to git. The misbehaviour I encounter occurs when split encounters
a commit for which the path being split is empty in 'git ls-tree', and
the commit is actually not a subtree commit. I'm currently
experimenting with hacks to skip specific hashes during the initial
subtree split. On reading your mail I realize I could address my issue
by testing for the existence of a specific file though, which makes me
wonder if the issue you have is similar.

[1] 
https://lore.kernel.org/git/CAPyFy2AsmaxU-BDf_teZJE5hiaVpTSZc8fftnuXPb_4-j7j5Fw@mail.gmail.com/



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Regression in git-subtree.sh, introduced in 2.20.1, after 315a84f9aa0e2e629b0680068646b0032518ebed
  2019-12-09 15:31           ` Marc Balmer
@ 2019-12-09 19:38             ` Johannes Schindelin
  2019-12-11  5:43               ` Tom Clarkson
  0 siblings, 1 reply; 33+ messages in thread
From: Johannes Schindelin @ 2019-12-09 19:38 UTC (permalink / raw)
  To: Marc Balmer
  Cc: Strain, Roger L., git@vger.kernel.org, gitster@pobox.com,
	ns@nadavsinai.com, pclouds@gmail.com

[-- Attachment #1: Type: text/plain, Size: 7036 bytes --]

Hi Marc,

On Mon, 9 Dec 2019, Marc Balmer wrote:

> Fwiw, I see the problem on Linux.

Okay, I must have overlooked that piece of information.

> It hay nothing to do with overzealos antimalware, it is a regression and
> it has been well documented.

Is there a minimal, complete and verifiable example that other developers
could use to analyze the bug?

Ciao,
Johannes

>
>
> > Am 09.12.2019 um 17:20 schrieb Johannes Schindelin <Johannes.Schindelin@gmx.de>:
> >
> > Hi,
> >
> >> On Mon, 9 Dec 2019, Marc Balmer wrote:
> >>
> >> I am not familiar with the source code, so I can not send in that
> >> revert.  I can, however, say that I am grateful to whomever does it ;)
> >
> > I am against reverting the change without knowing the root cause.
> >
> > The recent reporter only compared Git for Windows v2.19.0 vs v2.20.1,
> > which is _quite_ a big difference.
> >
> > For what I know, the problem might be a change in the MSYS2 runtime that
> > is mistaken by some malware for malicious code (we did introduce some code
> > to emulate Ctrl+C in MinTTY which injects a remote thread and executes
> > ExitProcess() there, which might very well be construed as an attack, even
> > if it is actually very much desired behavior).
> >
> > These segmentation faults in `git subtree` on Windows have traditionally
> > been _all_ because of overzealous anti-malware.
> >
> > So first, a much more fine-grained analysis would be required, e.g.
> > comparing v2.20.1 against v2.20.0, then copying _just_ the `git-subtree`
> > file from a working into a non-working version (or vice versa; I would
> > highly recommend using the portable versions for such side-by side
> > comparison).
> >
> > Ciao,
> > Johannes
> >
> >>
> >> - Marc
> >>
> >>
> >>>> Am 09.12.2019 um 15:18 schrieb Strain, Roger L. <roger.strain@swri.org>:
> >>>
> >>> As I said, I'm using a custom script here. I don't know if anybody else
> >>> benefited from the change and hasn't said anything, but I won't object
> >>> to someone submitting that revert.
> >>>
> >>> --
> >>> Roger
> >>>
> >>> -----Original Message-----
> >>> From: Marc Balmer <marc@msys.ch>
> >>> To: "Strain, Roger L." <roger.strain@swri.org>
> >>> Cc: ns@nadavsinai.com <ns@nadavsinai.com>, git@vger.kernel.org <
> >>> git@vger.kernel.org>, Johannes.Schindelin@gmx.de <
> >>> Johannes.Schindelin@gmx.de>, gitster@pobox.com <gitster@pobox.com>,
> >>> pclouds@gmail.com <pclouds@gmail.com>
> >>> Subject: Re: Regression in git-subtree.sh, introduced in 2.20.1, after
> >>> 315a84f9aa0e2e629b0680068646b0032518ebed
> >>> Date: Mon, 09 Dec 2019 15:13:47 +0100
> >>>
> >>> Roger,
> >>>
> >>> I am all for reverting it. if that does not cause any other regressions
> >>> or headaches (or both...)
> >>>
> >>> - Marc
> >>>
> >>>
> >>>
> >>> Am 09.12.2019 um 15:11 schrieb Strain, Roger L. <roger.strain@swri.org>
> >>> :
> >>>
> >>> I haven't been able to find anything relating to the issue, but I also
> >>> haven't had a repo that exposes the problem to test more thoroughly
> >>> against. If this happens to be a public repo somewhere, I'd be more
> >>> than happy to take a second look.
> >>>
> >>> That being said, if the community feels it would be better to revert
> >>> the changes that were introduced, I won't object. I've had to further
> >>> customize the script for our internal use, and those changes aren't
> >>> something that would be useful for the public at large. (A few changes
> >>> relate to the presence/absence of a specific file, which I certainly
> >>> wouldn't expect anyone else to have.) Short story is we're going to
> >>> have to use a custom script going forward, so keeping or reverting the
> >>> changes here make no difference to us. I still feel that the changes
> >>> which were made make the script more correct, but clearly there's some
> >>> undiagnosed logic error somewhere.
> >>>
> >>> Honestly, I'm surprised we didn't see this particular issue show up on
> >>> our own repo; it's ridiculously large and complex. At least if it had,
> >>> I'd be able to troubleshoot it more reliably.
> >>>
> >>> --
> >>> Roger Strain
> >>>
> >>> -----Original Message-----
> >>> From: Nadav SInai <ns@nadavsinai.com>
> >>> To: roger.strain@swri.org
> >>> Cc: Johannes.Schindelin@gmx.de, git@vger.kernel.org, gitster@pobox.com,
> >>> marc@msys.ch, pclouds@gmail.com
> >>> Subject: RE: Regression in git-subtree.sh, introduced in 2.20.1, after
> >>> 315a84f9aa0e2e629b0680068646b0032518ebed
> >>> Date: Sun, 08 Dec 2019 12:30:48 +0200
> >>>
> >>> [EXTERNAL EMAIL]
> >>>
> >>> Hi, I'm curious if any of you had any luck in preventing that
> >>> seg-fault in git-subtree script
> >>> I'm encountering it myself using git 2.24.0.windows.2., seg-fault is
> >>> in the same while loop (currently on line 757)
> >>> When I tried your suggestion of adding the ($parents) ($rev) to the
> >>> progress print I see that the last commit have only one revision
> >>> printed
> >>> like this:
> >>>
> >>> 259/290 (523) [271] (843dd34090d36dfabd6a2e3e8459a4887427313b)
> >>> (a69ee056f66acf66c63f89f55d26c0cc17036623)
> >>> 259/290 (525) [273] (f5eea1a3cbe1e16acba53e8a9fe07b6525a8b97c)
> >>> (843dd34090d36dfabd6a2e3e8459a4887427313b)
> >>> 259/290 (527) [275] (82303752a428cf1d789ac9f156008adb2798b7b5)
> >>> (f5eea1a3cbe1e16acba53e8a9fe07b6525a8b97c)
> >>> 259/290 (528) [276]
> >>> (7187897883c9fb4d33d4c87a02b876f8603728ff05f0945ae2ce9f98a35135)
> >>> 259/290 (529) [277]
> >>> (a00a3665343439a426671958dd90ed0407a22cad9ac9f156008adb2798b7b5)
> >>> 259/290 (530) [278]
> >>> (90beb94ebd331c457d79d05341453f5829a50bfcd4c87a02b876f8603728ff)
> >>> 259/290 (531) [279]
> >>> (9582e0acbed1910173564e250f350b5cc4291a7f671958dd90ed0407a22cad)
> >>> 259/290 (532) [280]
> >>> (f183930d6fabd3dccdddc5ec35d754ad28caf3b879d05341453f5829a50bfc)
> >>> 259/290 (533) [281]
> >>> (c9309f3a38c41f7991d9e78ddb47f7e85b8521eb564e250f350b5cc4291a7f)
> >>> 259/290 (534) [282]
> >>> (3bcf08f63a0e2b93ecc376bd679a16c80e99e7b1ddc5ec35d754ad28caf3b8)
> >>> 259/290 (535) [283]
> >>> (134621bb55a0470cdf6519ce08d6909af43ce0e5d9e78ddb47f7e85b8521eb)
> >>> 259/290 (536) [284]
> >>> (edb3471fbba29748f9784d29b3cee1dee2df4b37c376bd679a16c80e99e7b1)
> >>> 259/290 (537) [285]
> >>> (dd947a095df07a32dfd56666a395a7c42b25ca116519ce08d6909af43ce0e5)
> >>> 259/290 (538) [286]
> >>> (a639e09d2cbe1ea1149c080c1c95b8b018340ae2784d29b3cee1dee2df4b37)
> >>> C:/Program Files/Git/mingw64/libexec/git-core\git-subtree: line 757:
> >>> 8853 Done                    eval "$grl"
> >>>    8854 Segmentation fault      (core dumped) | while read rev
> >>> parents; do
> >>>  process_split_commit "$rev" "$parents" 0;
> >>> done
> >>>
> >>> I downgraded git to 2.19.0-windows.1 and it works now.
> >>>
> >>>
> >>> I'm thankful for your insights
> >>> Nadav Sinai
> >>> Web Tech lead
> >>> Philips-Algotec
> >>>
> >>>
> >>>
> >>>
> >>>
> >>
> >>
>
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Regression in git-subtree.sh, introduced in 2.20.1, after 315a84f9aa0e2e629b0680068646b0032518ebed
  2019-12-09 19:38             ` Johannes Schindelin
@ 2019-12-11  5:43               ` Tom Clarkson
  2019-12-11 14:39                 ` Strain, Roger L.
  0 siblings, 1 reply; 33+ messages in thread
From: Tom Clarkson @ 2019-12-11  5:43 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Marc Balmer, Strain, Roger L., git@vger.kernel.org,
	gitster@pobox.com, ns@nadavsinai.com, pclouds@gmail.com


> Is there a minimal, complete and verifiable example that other developers
> could use to analyze the bug?

I ran into this bug today, and while not much closer to a solution, I believe I understand why it is happening. 

The recursive search is required because the original rev-list based approach could leave out some relevant commits - ie reversion would replace an obvious bug with a hidden one.

The search will stop when it reaches either a root commit or one already mapped to a subtree commit.

If you have a small repo or run git subtree add directly on your master branch, the search will terminate fairly quickly.

However, if you do everything via pull requests, the search will hit a merge commit where one side is just ahead of the subtree mapping, but the other side is several thousand commits with no sign of either a root or any subtrees.

I’m not sure yet if it is the number of commits or merges specifically, but the script seems to be able to handle around 400-500 commits before it falls over.

>> 
>> 
>>> Am 09.12.2019 um 17:20 schrieb Johannes Schindelin <Johannes.Schindelin@gmx.de>:
>>> 
>>> Hi,
>>> 
>>>> On Mon, 9 Dec 2019, Marc Balmer wrote:
>>>> 
>>>> I am not familiar with the source code, so I can not send in that
>>>> revert.  I can, however, say that I am grateful to whomever does it ;)
>>> 
>>> I am against reverting the change without knowing the root cause.
>>> 
>>> The recent reporter only compared Git for Windows v2.19.0 vs v2.20.1,
>>> which is _quite_ a big difference.
>>> 
>>> For what I know, the problem might be a change in the MSYS2 runtime that
>>> is mistaken by some malware for malicious code (we did introduce some code
>>> to emulate Ctrl+C in MinTTY which injects a remote thread and executes
>>> ExitProcess() there, which might very well be construed as an attack, even
>>> if it is actually very much desired behavior).
>>> 
>>> These segmentation faults in `git subtree` on Windows have traditionally
>>> been _all_ because of overzealous anti-malware.
>>> 
>>> So first, a much more fine-grained analysis would be required, e.g.
>>> comparing v2.20.1 against v2.20.0, then copying _just_ the `git-subtree`
>>> file from a working into a non-working version (or vice versa; I would
>>> highly recommend using the portable versions for such side-by side
>>> comparison).
>>> 
>>> Ciao,
>>> Johannes
>>> 
>>>> 
>>>> - Marc
>>>> 
>>>> 
>>>>>> Am 09.12.2019 um 15:18 schrieb Strain, Roger L. <roger.strain@swri.org>:
>>>>> 
>>>>> As I said, I'm using a custom script here. I don't know if anybody else
>>>>> benefited from the change and hasn't said anything, but I won't object
>>>>> to someone submitting that revert.
>>>>> 
>>>>> --
>>>>> Roger
>>>>> 
>>>>> -----Original Message-----
>>>>> From: Marc Balmer <marc@msys.ch>
>>>>> To: "Strain, Roger L." <roger.strain@swri.org>
>>>>> Cc: ns@nadavsinai.com <ns@nadavsinai.com>, git@vger.kernel.org <
>>>>> git@vger.kernel.org>, Johannes.Schindelin@gmx.de <
>>>>> Johannes.Schindelin@gmx.de>, gitster@pobox.com <gitster@pobox.com>,
>>>>> pclouds@gmail.com <pclouds@gmail.com>
>>>>> Subject: Re: Regression in git-subtree.sh, introduced in 2.20.1, after
>>>>> 315a84f9aa0e2e629b0680068646b0032518ebed
>>>>> Date: Mon, 09 Dec 2019 15:13:47 +0100
>>>>> 
>>>>> Roger,
>>>>> 
>>>>> I am all for reverting it. if that does not cause any other regressions
>>>>> or headaches (or both...)
>>>>> 
>>>>> - Marc
>>>>> 
>>>>> 
>>>>> 
>>>>> Am 09.12.2019 um 15:11 schrieb Strain, Roger L. <roger.strain@swri.org>
>>>>> :
>>>>> 
>>>>> I haven't been able to find anything relating to the issue, but I also
>>>>> haven't had a repo that exposes the problem to test more thoroughly
>>>>> against. If this happens to be a public repo somewhere, I'd be more
>>>>> than happy to take a second look.
>>>>> 
>>>>> That being said, if the community feels it would be better to revert
>>>>> the changes that were introduced, I won't object. I've had to further
>>>>> customize the script for our internal use, and those changes aren't
>>>>> something that would be useful for the public at large. (A few changes
>>>>> relate to the presence/absence of a specific file, which I certainly
>>>>> wouldn't expect anyone else to have.) Short story is we're going to
>>>>> have to use a custom script going forward, so keeping or reverting the
>>>>> changes here make no difference to us. I still feel that the changes
>>>>> which were made make the script more correct, but clearly there's some
>>>>> undiagnosed logic error somewhere.
>>>>> 
>>>>> Honestly, I'm surprised we didn't see this particular issue show up on
>>>>> our own repo; it's ridiculously large and complex. At least if it had,
>>>>> I'd be able to troubleshoot it more reliably.
>>>>> 
>>>>> --
>>>>> Roger Strain
>>>>> 
>>>>> -----Original Message-----
>>>>> From: Nadav SInai <ns@nadavsinai.com>
>>>>> To: roger.strain@swri.org
>>>>> Cc: Johannes.Schindelin@gmx.de, git@vger.kernel.org, gitster@pobox.com,
>>>>> marc@msys.ch, pclouds@gmail.com
>>>>> Subject: RE: Regression in git-subtree.sh, introduced in 2.20.1, after
>>>>> 315a84f9aa0e2e629b0680068646b0032518ebed
>>>>> Date: Sun, 08 Dec 2019 12:30:48 +0200
>>>>> 
>>>>> [EXTERNAL EMAIL]
>>>>> 
>>>>> Hi, I'm curious if any of you had any luck in preventing that
>>>>> seg-fault in git-subtree script
>>>>> I'm encountering it myself using git 2.24.0.windows.2., seg-fault is
>>>>> in the same while loop (currently on line 757)
>>>>> When I tried your suggestion of adding the ($parents) ($rev) to the
>>>>> progress print I see that the last commit have only one revision
>>>>> printed
>>>>> like this:
>>>>> 
>>>>> 259/290 (523) [271] (843dd34090d36dfabd6a2e3e8459a4887427313b)
>>>>> (a69ee056f66acf66c63f89f55d26c0cc17036623)
>>>>> 259/290 (525) [273] (f5eea1a3cbe1e16acba53e8a9fe07b6525a8b97c)
>>>>> (843dd34090d36dfabd6a2e3e8459a4887427313b)
>>>>> 259/290 (527) [275] (82303752a428cf1d789ac9f156008adb2798b7b5)
>>>>> (f5eea1a3cbe1e16acba53e8a9fe07b6525a8b97c)
>>>>> 259/290 (528) [276]
>>>>> (7187897883c9fb4d33d4c87a02b876f8603728ff05f0945ae2ce9f98a35135)
>>>>> 259/290 (529) [277]
>>>>> (a00a3665343439a426671958dd90ed0407a22cad9ac9f156008adb2798b7b5)
>>>>> 259/290 (530) [278]
>>>>> (90beb94ebd331c457d79d05341453f5829a50bfcd4c87a02b876f8603728ff)
>>>>> 259/290 (531) [279]
>>>>> (9582e0acbed1910173564e250f350b5cc4291a7f671958dd90ed0407a22cad)
>>>>> 259/290 (532) [280]
>>>>> (f183930d6fabd3dccdddc5ec35d754ad28caf3b879d05341453f5829a50bfc)
>>>>> 259/290 (533) [281]
>>>>> (c9309f3a38c41f7991d9e78ddb47f7e85b8521eb564e250f350b5cc4291a7f)
>>>>> 259/290 (534) [282]
>>>>> (3bcf08f63a0e2b93ecc376bd679a16c80e99e7b1ddc5ec35d754ad28caf3b8)
>>>>> 259/290 (535) [283]
>>>>> (134621bb55a0470cdf6519ce08d6909af43ce0e5d9e78ddb47f7e85b8521eb)
>>>>> 259/290 (536) [284]
>>>>> (edb3471fbba29748f9784d29b3cee1dee2df4b37c376bd679a16c80e99e7b1)
>>>>> 259/290 (537) [285]
>>>>> (dd947a095df07a32dfd56666a395a7c42b25ca116519ce08d6909af43ce0e5)
>>>>> 259/290 (538) [286]
>>>>> (a639e09d2cbe1ea1149c080c1c95b8b018340ae2784d29b3cee1dee2df4b37)
>>>>> C:/Program Files/Git/mingw64/libexec/git-core\git-subtree: line 757:
>>>>> 8853 Done                    eval "$grl"
>>>>>   8854 Segmentation fault      (core dumped) | while read rev
>>>>> parents; do
>>>>> process_split_commit "$rev" "$parents" 0;
>>>>> done
>>>>> 
>>>>> I downgraded git to 2.19.0-windows.1 and it works now.
>>>>> 
>>>>> 
>>>>> I'm thankful for your insights
>>>>> Nadav Sinai
>>>>> Web Tech lead
>>>>> Philips-Algotec
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>> 
>>>> 
>> 
>> 


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Regression in git-subtree.sh, introduced in 2.20.1, after 315a84f9aa0e2e629b0680068646b0032518ebed
  2019-12-11  5:43               ` Tom Clarkson
@ 2019-12-11 14:39                 ` Strain, Roger L.
  2019-12-12  5:02                   ` Tom Clarkson
  0 siblings, 1 reply; 33+ messages in thread
From: Strain, Roger L. @ 2019-12-11 14:39 UTC (permalink / raw)
  To: Johannes.Schindelin@gmx.de, tqclarkson@icloud.com
  Cc: git@vger.kernel.org, gitster@pobox.com, ns@nadavsinai.com,
	marc@msys.ch, pclouds@gmail.com

The comment about "400-500 commits" is interesting to me; our repos
regularly have to process thousands (!) of commits like this, which is
part of the reason I felt a need to try to fix the behavior. When I
manage to get things into a reasonably sane state, the commit count may
stay in the hundreds, but if things go too long, I've regularly seen
counts well over 5000.

This makes me wonder if the problem is perhaps related to the hardware
involved; maybe the algorithm is doing exactly what it should, but the
available RAM isn't sufficient. If that's the problem, perhaps we could
find a way to perform the recursive work without using actual
recursion, reducing the number of instances on the stack.

-- 
Roger Strain

On Wed, 2019-12-11 at 16:43 +1100, Tom Clarkson wrote:
> > Is there a minimal, complete and verifiable example that other
> > developers
> > could use to analyze the bug?
> 
> I ran into this bug today, and while not much closer to a solution, I
> believe I understand why it is happening. 
> 
> The recursive search is required because the original rev-list based
> approach could leave out some relevant commits - ie reversion would
> replace an obvious bug with a hidden one.
> 
> The search will stop when it reaches either a root commit or one
> already mapped to a subtree commit.
> 
> If you have a small repo or run git subtree add directly on your
> master branch, the search will terminate fairly quickly.
> 
> However, if you do everything via pull requests, the search will hit
> a merge commit where one side is just ahead of the subtree mapping,
> but the other side is several thousand commits with no sign of either
> a root or any subtrees.
> 
> I’m not sure yet if it is the number of commits or merges
> specifically, but the script seems to be able to handle around 400-
> 500 commits before it falls over.
> 
> > > 
> > > 
> > > > Am 09.12.2019 um 17:20 schrieb Johannes Schindelin <
> > > > Johannes.Schindelin@gmx.de>:
> > > > 
> > > > Hi,
> > > > 
> > > > > On Mon, 9 Dec 2019, Marc Balmer wrote:
> > > > > 
> > > > > I am not familiar with the source code, so I can not send in
> > > > > that
> > > > > revert.  I can, however, say that I am grateful to whomever
> > > > > does it ;)
> > > > 
> > > > I am against reverting the change without knowing the root
> > > > cause.
> > > > 
> > > > The recent reporter only compared Git for Windows v2.19.0 vs
> > > > v2.20.1,
> > > > which is _quite_ a big difference.
> > > > 
> > > > For what I know, the problem might be a change in the MSYS2
> > > > runtime that
> > > > is mistaken by some malware for malicious code (we did
> > > > introduce some code
> > > > to emulate Ctrl+C in MinTTY which injects a remote thread and
> > > > executes
> > > > ExitProcess() there, which might very well be construed as an
> > > > attack, even
> > > > if it is actually very much desired behavior).
> > > > 
> > > > These segmentation faults in `git subtree` on Windows have
> > > > traditionally
> > > > been _all_ because of overzealous anti-malware.
> > > > 
> > > > So first, a much more fine-grained analysis would be required,
> > > > e.g.
> > > > comparing v2.20.1 against v2.20.0, then copying _just_ the
> > > > `git-subtree`
> > > > file from a working into a non-working version (or vice versa;
> > > > I would
> > > > highly recommend using the portable versions for such side-by
> > > > side
> > > > comparison).
> > > > 
> > > > Ciao,
> > > > Johannes
> > > > 
> > > > > 
> > > > > - Marc
> > > > > 
> > > > > 
> > > > > > > Am 09.12.2019 um 15:18 schrieb Strain, Roger L. <
> > > > > > > roger.strain@swri.org>:
> > > > > > 
> > > > > > As I said, I'm using a custom script here. I don't know if
> > > > > > anybody else
> > > > > > benefited from the change and hasn't said anything, but I
> > > > > > won't object
> > > > > > to someone submitting that revert.
> > > > > > 
> > > > > > --
> > > > > > Roger
> > > > > > 
> > > > > > -----Original Message-----
> > > > > > From: Marc Balmer <marc@msys.ch>
> > > > > > To: "Strain, Roger L." <roger.strain@swri.org>
> > > > > > Cc: ns@nadavsinai.com <ns@nadavsinai.com>, 
> > > > > > git@vger.kernel.org <
> > > > > > git@vger.kernel.org>, Johannes.Schindelin@gmx.de <
> > > > > > Johannes.Schindelin@gmx.de>, gitster@pobox.com <
> > > > > > gitster@pobox.com>,
> > > > > > pclouds@gmail.com <pclouds@gmail.com>
> > > > > > Subject: Re: Regression in git-subtree.sh, introduced in
> > > > > > 2.20.1, after
> > > > > > 315a84f9aa0e2e629b0680068646b0032518ebed
> > > > > > Date: Mon, 09 Dec 2019 15:13:47 +0100
> > > > > > 
> > > > > > Roger,
> > > > > > 
> > > > > > I am all for reverting it. if that does not cause any other
> > > > > > regressions
> > > > > > or headaches (or both...)
> > > > > > 
> > > > > > - Marc
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > Am 09.12.2019 um 15:11 schrieb Strain, Roger L. <
> > > > > > roger.strain@swri.org>
> > > > > > :
> > > > > > 
> > > > > > I haven't been able to find anything relating to the issue,
> > > > > > but I also
> > > > > > haven't had a repo that exposes the problem to test more
> > > > > > thoroughly
> > > > > > against. If this happens to be a public repo somewhere, I'd
> > > > > > be more
> > > > > > than happy to take a second look.
> > > > > > 
> > > > > > That being said, if the community feels it would be better
> > > > > > to revert
> > > > > > the changes that were introduced, I won't object. I've had
> > > > > > to further
> > > > > > customize the script for our internal use, and those
> > > > > > changes aren't
> > > > > > something that would be useful for the public at large. (A
> > > > > > few changes
> > > > > > relate to the presence/absence of a specific file, which I
> > > > > > certainly
> > > > > > wouldn't expect anyone else to have.) Short story is we're
> > > > > > going to
> > > > > > have to use a custom script going forward, so keeping or
> > > > > > reverting the
> > > > > > changes here make no difference to us. I still feel that
> > > > > > the changes
> > > > > > which were made make the script more correct, but clearly
> > > > > > there's some
> > > > > > undiagnosed logic error somewhere.
> > > > > > 
> > > > > > Honestly, I'm surprised we didn't see this particular issue
> > > > > > show up on
> > > > > > our own repo; it's ridiculously large and complex. At least
> > > > > > if it had,
> > > > > > I'd be able to troubleshoot it more reliably.
> > > > > > 
> > > > > > --
> > > > > > Roger Strain
> > > > > > 
> > > > > > -----Original Message-----
> > > > > > From: Nadav SInai <ns@nadavsinai.com>
> > > > > > To: roger.strain@swri.org
> > > > > > Cc: Johannes.Schindelin@gmx.de, git@vger.kernel.org, 
> > > > > > gitster@pobox.com,
> > > > > > marc@msys.ch, pclouds@gmail.com
> > > > > > Subject: RE: Regression in git-subtree.sh, introduced in
> > > > > > 2.20.1, after
> > > > > > 315a84f9aa0e2e629b0680068646b0032518ebed
> > > > > > Date: Sun, 08 Dec 2019 12:30:48 +0200
> > > > > > 
> > > > > > [EXTERNAL EMAIL]
> > > > > > 
> > > > > > Hi, I'm curious if any of you had any luck in preventing
> > > > > > that
> > > > > > seg-fault in git-subtree script
> > > > > > I'm encountering it myself using git 2.24.0.windows.2.,
> > > > > > seg-fault is
> > > > > > in the same while loop (currently on line 757)
> > > > > > When I tried your suggestion of adding the ($parents)
> > > > > > ($rev) to the
> > > > > > progress print I see that the last commit have only one
> > > > > > revision
> > > > > > printed
> > > > > > like this:
> > > > > > 
> > > > > > 259/290 (523) [271]
> > > > > > (843dd34090d36dfabd6a2e3e8459a4887427313b)
> > > > > > (a69ee056f66acf66c63f89f55d26c0cc17036623)
> > > > > > 259/290 (525) [273]
> > > > > > (f5eea1a3cbe1e16acba53e8a9fe07b6525a8b97c)
> > > > > > (843dd34090d36dfabd6a2e3e8459a4887427313b)
> > > > > > 259/290 (527) [275]
> > > > > > (82303752a428cf1d789ac9f156008adb2798b7b5)
> > > > > > (f5eea1a3cbe1e16acba53e8a9fe07b6525a8b97c)
> > > > > > 259/290 (528) [276]
> > > > > > (7187897883c9fb4d33d4c87a02b876f8603728ff05f0945ae2ce9f98a3
> > > > > > 5135)
> > > > > > 259/290 (529) [277]
> > > > > > (a00a3665343439a426671958dd90ed0407a22cad9ac9f156008adb2798
> > > > > > b7b5)
> > > > > > 259/290 (530) [278]
> > > > > > (90beb94ebd331c457d79d05341453f5829a50bfcd4c87a02b876f86037
> > > > > > 28ff)
> > > > > > 259/290 (531) [279]
> > > > > > (9582e0acbed1910173564e250f350b5cc4291a7f671958dd90ed0407a2
> > > > > > 2cad)
> > > > > > 259/290 (532) [280]
> > > > > > (f183930d6fabd3dccdddc5ec35d754ad28caf3b879d05341453f5829a5
> > > > > > 0bfc)
> > > > > > 259/290 (533) [281]
> > > > > > (c9309f3a38c41f7991d9e78ddb47f7e85b8521eb564e250f350b5cc429
> > > > > > 1a7f)
> > > > > > 259/290 (534) [282]
> > > > > > (3bcf08f63a0e2b93ecc376bd679a16c80e99e7b1ddc5ec35d754ad28ca
> > > > > > f3b8)
> > > > > > 259/290 (535) [283]
> > > > > > (134621bb55a0470cdf6519ce08d6909af43ce0e5d9e78ddb47f7e85b85
> > > > > > 21eb)
> > > > > > 259/290 (536) [284]
> > > > > > (edb3471fbba29748f9784d29b3cee1dee2df4b37c376bd679a16c80e99
> > > > > > e7b1)
> > > > > > 259/290 (537) [285]
> > > > > > (dd947a095df07a32dfd56666a395a7c42b25ca116519ce08d6909af43c
> > > > > > e0e5)
> > > > > > 259/290 (538) [286]
> > > > > > (a639e09d2cbe1ea1149c080c1c95b8b018340ae2784d29b3cee1dee2df
> > > > > > 4b37)
> > > > > > C:/Program Files/Git/mingw64/libexec/git-core\git-subtree:
> > > > > > line 757:
> > > > > > 8853 Done                    eval "$grl"
> > > > > >   8854 Segmentation fault      (core dumped) | while read
> > > > > > rev
> > > > > > parents; do
> > > > > > process_split_commit "$rev" "$parents" 0;
> > > > > > done
> > > > > > 
> > > > > > I downgraded git to 2.19.0-windows.1 and it works now.
> > > > > > 
> > > > > > 
> > > > > > I'm thankful for your insights
> > > > > > Nadav Sinai
> > > > > > Web Tech lead
> > > > > > Philips-Algotec
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > 
> > > > > 
> > > 
> > > 
> 
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Regression in git-subtree.sh, introduced in 2.20.1, after 315a84f9aa0e2e629b0680068646b0032518ebed
  2019-12-11 14:39                 ` Strain, Roger L.
@ 2019-12-12  5:02                   ` Tom Clarkson
  2019-12-13 13:41                     ` Johannes Schindelin
  2019-12-16  3:50                     ` Tom Clarkson
  0 siblings, 2 replies; 33+ messages in thread
From: Tom Clarkson @ 2019-12-12  5:02 UTC (permalink / raw)
  To: Strain, Roger L.
  Cc: Johannes.Schindelin@gmx.de, git@vger.kernel.org,
	gitster@pobox.com, ns@nadavsinai.com, marc@msys.ch,
	pclouds@gmail.com


> This makes me wonder if the problem is perhaps related to the hardware
> involved; maybe the algorithm is doing exactly what it should, but the
> available RAM isn't sufficient. If that's the problem, perhaps we could
> find a way to perform the recursive work without using actual
> recursion, reducing the number of instances on the stack.

It’s not so much hardware as OS I think - After adding stack depth (the indent parameter on check_parents) to the log, I have been able to get different results with ulimit settings.

With the default stack size on macOS of 8MB, It falls over at depth 445. Being less than the shortest path to the root commit, that matches my initial count, which was just the number of lines in the log.

Reducing the stack size with ulimit -s 4096 makes it fall over at 225

Increasing to the hard limit of 64MB should allow a depth of around 4000, and as it turns out that did allow the script to complete, reaching a maximum depth of 1148.

I’m not seeing any issues with the hashes being wrong (all show no parents or subtree) but processing all those commits that resolve to nothing does take forever.

The mainline commit test seems to work ok on my repo, but it’s fairly easy to see scenarios where it would break, such as having a  subfolder with the same name within the subtree.

So while part of the fix will be a more reliable test, it also needs to work before parent commits are processed to mitigate the recursion issues.

The rules I have  come up with so far are below. There are still scenarios where the recursion is unavoidable such as running an initial split on a large repo, but that should be much less common than using a small subtree with a more complex existing repo.

In the initial setup of cmd_split, collect some extra information:

	- Add rev-list of all git-subtree-split values to the cache. I’d expect subtrees to usually be smaller than mainline, but since we can do that non-recursively we may as well.

	- Find the git-subtree-mainline value from subtree add/rejoin. Anything in its rev list should only be reachable by mainline commits. If not (which probably requires doing something convoluted like having subtree include mainline as its own subtree), this is a good place to check that and fall back to the existing behavior. 


When processing each commit:

If no prior splits were found, we only have mainline commits.

 	- If $dir exists, it is a mainline commit needing copy - use existing process.
	- If $dir does not exist, it is a mainline commit that will map to nothing - no need to process further.

If we do have some known subtree commits:

	- If it is in the cache, it is a subtree commit we don’t need to process further.
	- If subtree root is not reachable (rev-list or merge-base), must be mainline pre subtree add. Map to nothing and skip further processing.
	- If any subtree root is reachable, could be either mainline commit with subtree merged in, or subtree commit newer than the last add/squash (subtree pull/merge without squash does not use a custom commit message)
		- If $dir does not exist, must be subtree - add to the cache as mapped to self, no need to process parents.
		- If the folder does exist, it is  either a mainline commit to be processed normally, or a subtree that happens to contain a folder with the same name.  Check if mainline root is reachable.




^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Regression in git-subtree.sh, introduced in 2.20.1, after 315a84f9aa0e2e629b0680068646b0032518ebed
  2019-12-12  5:02                   ` Tom Clarkson
@ 2019-12-13 13:41                     ` Johannes Schindelin
  2019-12-14  8:29                       ` Marc Balmer
  2019-12-16  3:50                     ` Tom Clarkson
  1 sibling, 1 reply; 33+ messages in thread
From: Johannes Schindelin @ 2019-12-13 13:41 UTC (permalink / raw)
  To: Tom Clarkson
  Cc: Strain, Roger L., git@vger.kernel.org, gitster@pobox.com,
	ns@nadavsinai.com, marc@msys.ch, pclouds@gmail.com

[-- Attachment #1: Type: text/plain, Size: 3707 bytes --]

Hi Tom,

On Thu, 12 Dec 2019, Tom Clarkson wrote:

>
> > This makes me wonder if the problem is perhaps related to the hardware
> > involved; maybe the algorithm is doing exactly what it should, but the
> > available RAM isn't sufficient. If that's the problem, perhaps we could
> > find a way to perform the recursive work without using actual
> > recursion, reducing the number of instances on the stack.
>
> It’s not so much hardware as OS I think - After adding stack depth (the indent parameter on check_parents) to the log, I have been able to get different results with ulimit settings.

Do you mean to say that the stack overflow is reported as a segmentation
fault? If so, that message was sure a red herring...

Thanks,
Dscho

>
> With the default stack size on macOS of 8MB, It falls over at depth 445. Being less than the shortest path to the root commit, that matches my initial count, which was just the number of lines in the log.
>
> Reducing the stack size with ulimit -s 4096 makes it fall over at 225
>
> Increasing to the hard limit of 64MB should allow a depth of around 4000, and as it turns out that did allow the script to complete, reaching a maximum depth of 1148.
>
> I’m not seeing any issues with the hashes being wrong (all show no parents or subtree) but processing all those commits that resolve to nothing does take forever.
>
> The mainline commit test seems to work ok on my repo, but it’s fairly easy to see scenarios where it would break, such as having a  subfolder with the same name within the subtree.
>
> So while part of the fix will be a more reliable test, it also needs to work before parent commits are processed to mitigate the recursion issues.
>
> The rules I have  come up with so far are below. There are still scenarios where the recursion is unavoidable such as running an initial split on a large repo, but that should be much less common than using a small subtree with a more complex existing repo.
>
> In the initial setup of cmd_split, collect some extra information:
>
> 	- Add rev-list of all git-subtree-split values to the cache. I’d expect subtrees to usually be smaller than mainline, but since we can do that non-recursively we may as well.
>
> 	- Find the git-subtree-mainline value from subtree add/rejoin. Anything in its rev list should only be reachable by mainline commits. If not (which probably requires doing something convoluted like having subtree include mainline as its own subtree), this is a good place to check that and fall back to the existing behavior.
>
>
> When processing each commit:
>
> If no prior splits were found, we only have mainline commits.
>
>  	- If $dir exists, it is a mainline commit needing copy - use existing process.
> 	- If $dir does not exist, it is a mainline commit that will map to nothing - no need to process further.
>
> If we do have some known subtree commits:
>
> 	- If it is in the cache, it is a subtree commit we don’t need to process further.
> 	- If subtree root is not reachable (rev-list or merge-base), must be mainline pre subtree add. Map to nothing and skip further processing.
> 	- If any subtree root is reachable, could be either mainline commit with subtree merged in, or subtree commit newer than the last add/squash (subtree pull/merge without squash does not use a custom commit message)
> 		- If $dir does not exist, must be subtree - add to the cache as mapped to self, no need to process parents.
> 		- If the folder does exist, it is  either a mainline commit to be processed normally, or a subtree that happens to contain a folder with the same name.  Check if mainline root is reachable.
>
>
>
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Regression in git-subtree.sh, introduced in 2.20.1, after 315a84f9aa0e2e629b0680068646b0032518ebed
  2019-12-13 13:41                     ` Johannes Schindelin
@ 2019-12-14  8:29                       ` Marc Balmer
       [not found]                         ` <BAB4CF6D-6904-4698-ACE1-EBEEC745E569@msys.ch>
  0 siblings, 1 reply; 33+ messages in thread
From: Marc Balmer @ 2019-12-14  8:29 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Tom Clarkson, Strain, Roger L., git@vger.kernel.org,
	gitster@pobox.com, ns@nadavsinai.com, pclouds@gmail.com



> Am 13.12.2019 um 14:41 schrieb Johannes Schindelin <Johannes.Schindelin@gmx.de>:
> 
> Hi Tom,
> 
> On Thu, 12 Dec 2019, Tom Clarkson wrote:
> 
>> 
>>> This makes me wonder if the problem is perhaps related to the hardware
>>> involved; maybe the algorithm is doing exactly what it should, but the
>>> available RAM isn't sufficient. If that's the problem, perhaps we could
>>> find a way to perform the recursive work without using actual
>>> recursion, reducing the number of instances on the stack.
>> 
>> It’s not so much hardware as OS I think - After adding stack depth (the indent parameter on check_parents) to the log, I have been able to get different results with ulimit settings.
> 
> Do you mean to say that the stack overflow is reported as a segmentation
> fault? If so, that message was sure a red herring...

FWIW, changing the stack limit using ulimit does not change anything on my (Fedora) system.  At some point, with exactly the same two leading numbers (those separated with a /)it seems to enter an endless (recursive) loop, eventually eating up all memory.  And then, after some time, it segfaults.


> 
> Thanks,
> Dscho
> 
>> 
>> With the default stack size on macOS of 8MB, It falls over at depth 445. Being less than the shortest path to the root commit, that matches my initial count, which was just the number of lines in the log.
>> 
>> Reducing the stack size with ulimit -s 4096 makes it fall over at 225
>> 
>> Increasing to the hard limit of 64MB should allow a depth of around 4000, and as it turns out that did allow the script to complete, reaching a maximum depth of 1148.
>> 
>> I’m not seeing any issues with the hashes being wrong (all show no parents or subtree) but processing all those commits that resolve to nothing does take forever.
>> 
>> The mainline commit test seems to work ok on my repo, but it’s fairly easy to see scenarios where it would break, such as having a  subfolder with the same name within the subtree.
>> 
>> So while part of the fix will be a more reliable test, it also needs to work before parent commits are processed to mitigate the recursion issues.
>> 
>> The rules I have  come up with so far are below. There are still scenarios where the recursion is unavoidable such as running an initial split on a large repo, but that should be much less common than using a small subtree with a more complex existing repo.
>> 
>> In the initial setup of cmd_split, collect some extra information:
>> 
>> 	- Add rev-list of all git-subtree-split values to the cache. I’d expect subtrees to usually be smaller than mainline, but since we can do that non-recursively we may as well.
>> 
>> 	- Find the git-subtree-mainline value from subtree add/rejoin. Anything in its rev list should only be reachable by mainline commits. If not (which probably requires doing something convoluted like having subtree include mainline as its own subtree), this is a good place to check that and fall back to the existing behavior.
>> 
>> 
>> When processing each commit:
>> 
>> If no prior splits were found, we only have mainline commits.
>> 
>> 	- If $dir exists, it is a mainline commit needing copy - use existing process.
>> 	- If $dir does not exist, it is a mainline commit that will map to nothing - no need to process further.
>> 
>> If we do have some known subtree commits:
>> 
>> 	- If it is in the cache, it is a subtree commit we don’t need to process further.
>> 	- If subtree root is not reachable (rev-list or merge-base), must be mainline pre subtree add. Map to nothing and skip further processing.
>> 	- If any subtree root is reachable, could be either mainline commit with subtree merged in, or subtree commit newer than the last add/squash (subtree pull/merge without squash does not use a custom commit message)
>> 		- If $dir does not exist, must be subtree - add to the cache as mapped to self, no need to process parents.
>> 		- If the folder does exist, it is  either a mainline commit to be processed normally, or a subtree that happens to contain a folder with the same name.  Check if mainline root is reachable.
>> 
>> 
>> 
>> 


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Regression in git-subtree.sh, introduced in 2.20.1, after 315a84f9aa0e2e629b0680068646b0032518ebed
       [not found]                         ` <BAB4CF6D-6904-4698-ACE1-EBEEC745E569@msys.ch>
@ 2019-12-14 14:27                           ` Tom Clarkson
  2019-12-16 11:30                             ` Ed Maste
  2020-03-12 10:40                           ` Marc Balmer
  1 sibling, 1 reply; 33+ messages in thread
From: Tom Clarkson @ 2019-12-14 14:27 UTC (permalink / raw)
  To: Marc Balmer
  Cc: Johannes Schindelin, Strain, Roger L., git@vger.kernel.org,
	gitster@pobox.com, ns@nadavsinai.com, pclouds@gmail.com

>> FWIW, changing the stack limit using ulimit does not change anything on my (Fedora) system.  At some point, with exactly the same two leading numbers (those separated with a /)it seems to enter an endless (recursive) loop, eventually eating up all memory.  And then, after some time, it segfaults.
>> 
> 
> So today I tried a subtree push again.  It took hours.... Then it pushed every single commit that was ever done to repository.

So not actually an infinite loop, but close enough to make no difference?  I think we have about three bugs interacting in interesting ways here - Is your repo one where the subtree was originally extracted from mainline?

When you say it pushed every commit, does that mean that a bunch of mainline commits erroneously ended up in the subtree repo?


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Regression in git-subtree.sh, introduced in 2.20.1, after 315a84f9aa0e2e629b0680068646b0032518ebed
  2019-12-12  5:02                   ` Tom Clarkson
  2019-12-13 13:41                     ` Johannes Schindelin
@ 2019-12-16  3:50                     ` Tom Clarkson
  1 sibling, 0 replies; 33+ messages in thread
From: Tom Clarkson @ 2019-12-16  3:50 UTC (permalink / raw)
  To: Strain, Roger L.
  Cc: Johannes.Schindelin@gmx.de, git@vger.kernel.org,
	gitster@pobox.com, ns@nadavsinai.com, marc@msys.ch,
	pclouds@gmail.com

I have put together a patch (currently testing on GitGitGadget) that at least fixes things for my repo - Commits from before subtree add are recognized as a dead end, so it no longer runs out of stack space while finding its way to a root commit.

I think the issue Marc ran into is a bit more complex in that the recursion eventually producing the wrong result suggests that correct identification of mainline commits remains an issue. While I have some ideas on how to improve that, it’s probably best handled separately.

However, there is a decent chance that excluding a large number of known irrelevant commits will catch the problematic ones in that scenario - and should match the previous behavior of treating the problem commits as initial.


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Regression in git-subtree.sh, introduced in 2.20.1, after 315a84f9aa0e2e629b0680068646b0032518ebed
  2019-12-14 14:27                           ` Tom Clarkson
@ 2019-12-16 11:30                             ` Ed Maste
  2019-12-18  0:15                               ` Tom Clarkson
  0 siblings, 1 reply; 33+ messages in thread
From: Ed Maste @ 2019-12-16 11:30 UTC (permalink / raw)
  To: Tom Clarkson
  Cc: Marc Balmer, Johannes Schindelin, Strain, Roger L.,
	git@vger.kernel.org, gitster@pobox.com, ns@nadavsinai.com,
	pclouds@gmail.com

On Sat, 14 Dec 2019 at 09:27, Tom Clarkson <tqclarkson@icloud.com> wrote:
>
> When you say it pushed every commit, does that mean that a bunch of mainline commits erroneously ended up in the subtree repo?

We encounter this case when trying to use subtree on the FreeBSD
repository. In our case it's caused by commit that should not be
classified as a commit to the subtree, but has no files in the
subtree. In our case it looks like an artifact of svn-git conversion
of an odd working branch, but the same issue is reproducible in other
ways. For example, it will appear if the subtree is deleted at some
point and later re-added.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Regression in git-subtree.sh, introduced in 2.20.1, after 315a84f9aa0e2e629b0680068646b0032518ebed
  2019-12-16 11:30                             ` Ed Maste
@ 2019-12-18  0:15                               ` Tom Clarkson
  0 siblings, 0 replies; 33+ messages in thread
From: Tom Clarkson @ 2019-12-18  0:15 UTC (permalink / raw)
  To: Ed Maste
  Cc: Marc Balmer, Johannes Schindelin, Strain, Roger L.,
	git@vger.kernel.org, gitster@pobox.com, ns@nadavsinai.com,
	pclouds@gmail.com



> On 16 Dec 2019, at 10:30 pm, Ed Maste <emaste@freebsd.org> wrote:
> 
> On Sat, 14 Dec 2019 at 09:27, Tom Clarkson <tqclarkson@icloud.com> wrote:
>> 
>> When you say it pushed every commit, does that mean that a bunch of mainline commits erroneously ended up in the subtree repo?
> 
> We encounter this case when trying to use subtree on the FreeBSD
> repository. In our case it's caused by commit that should not be
> classified as a commit to the subtree, but has no files in the
> subtree. In our case it looks like an artifact of svn-git conversion
> of an odd working branch, but the same issue is reproducible in other
> ways. For example, it will appear if the subtree is deleted at some
> point and later re-added.

Deleting and re-adding a subtree is an interesting case. My patch won’t avoid the recursion there, because it can only be certain about the irrelevance of commits from before the first add.

However, I think that may be ok to leave in as something of an edge case - you may get more recursion than your system can handle, but assuming process_split_commit is correct, you can work around it by increasing ulimit, and can avoid any subsequent performance issues with a rejoin commit. Maybe we could display some sort of “your repo is doing something weird” warning to make it clearer where there are problems to be worked around.

Although the recursion would no doubt fall over on pretty much any machine when depth gets to 200k, it looks like the FreeBSD repo isn’t getting to that point, so let’s  cover the details of more reliable mainline detection on its own thread.


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Regression in git-subtree.sh, introduced in 2.20.1, after 315a84f9aa0e2e629b0680068646b0032518ebed
       [not found]                         ` <BAB4CF6D-6904-4698-ACE1-EBEEC745E569@msys.ch>
  2019-12-14 14:27                           ` Tom Clarkson
@ 2020-03-12 10:40                           ` Marc Balmer
  1 sibling, 0 replies; 33+ messages in thread
From: Marc Balmer @ 2020-03-12 10:40 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Tom Clarkson, Strain, Roger L., git@vger.kernel.org,
	gitster@pobox.com, ns@nadavsinai.com, pclouds@gmail.com

G'day

Due to some issue in git subtree, a subtree push pushed all commits (over 8000) ever done to the main repository.  So the history of the subtree'ed repository not only showed commits done to the particular subtree, but all commits in the whole project (see E-mail exchange below).

Today we decided to no longer use subtrees, but to use two independend repository and managing merges manually.

How can we get rid of a subtree cache data?  Is it enough to remove the .git/subtree-cache directory?  Or is that dangerous?  Does git-subtree store any data anywhere else?

Thanks and regards,
Marc


> Am 14.12.2019 um 14:59 schrieb Marc Balmer <marc@msys.ch>:
> 
> 
> 
>> Am 14.12.2019 um 09:29 schrieb Marc Balmer <marc@msys.ch>:
>> 
>> 
>> 
>>> Am 13.12.2019 um 14:41 schrieb Johannes Schindelin <Johannes.Schindelin@gmx.de>:
>>> 
>>> Hi Tom,
>>> 
>>> On Thu, 12 Dec 2019, Tom Clarkson wrote:
>>> 
>>>> 
>>>>> This makes me wonder if the problem is perhaps related to the hardware
>>>>> involved; maybe the algorithm is doing exactly what it should, but the
>>>>> available RAM isn't sufficient. If that's the problem, perhaps we could
>>>>> find a way to perform the recursive work without using actual
>>>>> recursion, reducing the number of instances on the stack.
>>>> 
>>>> It’s not so much hardware as OS I think - After adding stack depth (the indent parameter on check_parents) to the log, I have been able to get different results with ulimit settings.
>>> 
>>> Do you mean to say that the stack overflow is reported as a segmentation
>>> fault? If so, that message was sure a red herring...
>> 
>> FWIW, changing the stack limit using ulimit does not change anything on my (Fedora) system.  At some point, with exactly the same two leading numbers (those separated with a /)it seems to enter an endless (recursive) loop, eventually eating up all memory.  And then, after some time, it segfaults.
>> 
>> 
> 
> So today I tried a subtree push again.  It took hours.... Then it pushed every single commit that was ever done to repository.
> 
> I can definitely say that git subtree is totally broken and unusable at this moment.
> 
> We will now split out what once was a subtree into a proper repository of it's own.  git subtree was a nice idea, but it does not work.
> 
>>> 
>>> Thanks,
>>> Dscho
>>> 
>>>> 
>>>> With the default stack size on macOS of 8MB, It falls over at depth 445. Being less than the shortest path to the root commit, that matches my initial count, which was just the number of lines in the log.
>>>> 
>>>> Reducing the stack size with ulimit -s 4096 makes it fall over at 225
>>>> 
>>>> Increasing to the hard limit of 64MB should allow a depth of around 4000, and as it turns out that did allow the script to complete, reaching a maximum depth of 1148.
>>>> 
>>>> I’m not seeing any issues with the hashes being wrong (all show no parents or subtree) but processing all those commits that resolve to nothing does take forever.
>>>> 
>>>> The mainline commit test seems to work ok on my repo, but it’s fairly easy to see scenarios where it would break, such as having a  subfolder with the same name within the subtree.
>>>> 
>>>> So while part of the fix will be a more reliable test, it also needs to work before parent commits are processed to mitigate the recursion issues.
>>>> 
>>>> The rules I have  come up with so far are below. There are still scenarios where the recursion is unavoidable such as running an initial split on a large repo, but that should be much less common than using a small subtree with a more complex existing repo.
>>>> 
>>>> In the initial setup of cmd_split, collect some extra information:
>>>> 
>>>> 	- Add rev-list of all git-subtree-split values to the cache. I’d expect subtrees to usually be smaller than mainline, but since we can do that non-recursively we may as well.
>>>> 
>>>> 	- Find the git-subtree-mainline value from subtree add/rejoin. Anything in its rev list should only be reachable by mainline commits. If not (which probably requires doing something convoluted like having subtree include mainline as its own subtree), this is a good place to check that and fall back to the existing behavior.
>>>> 
>>>> 
>>>> When processing each commit:
>>>> 
>>>> If no prior splits were found, we only have mainline commits.
>>>> 
>>>> 	- If $dir exists, it is a mainline commit needing copy - use existing process.
>>>> 	- If $dir does not exist, it is a mainline commit that will map to nothing - no need to process further.
>>>> 
>>>> If we do have some known subtree commits:
>>>> 
>>>> 	- If it is in the cache, it is a subtree commit we don’t need to process further.
>>>> 	- If subtree root is not reachable (rev-list or merge-base), must be mainline pre subtree add. Map to nothing and skip further processing.
>>>> 	- If any subtree root is reachable, could be either mainline commit with subtree merged in, or subtree commit newer than the last add/squash (subtree pull/merge without squash does not use a custom commit message)
>>>> 		- If $dir does not exist, must be subtree - add to the cache as mapped to self, no need to process parents.
>>>> 		- If the folder does exist, it is  either a mainline commit to be processed normally, or a subtree that happens to contain a folder with the same name.  Check if mainline root is reachable.


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Regression in git-subtree.sh, introduced in 2.20.1, after 315a84f9aa0e2e629b0680068646b0032518ebed
       [not found] <3E84DE22-9614-4E1B-9717-69F6777DD219@msys.ch>
@ 2020-03-12 10:43 ` Tom Clarkson
  0 siblings, 0 replies; 33+ messages in thread
From: Tom Clarkson @ 2020-03-12 10:43 UTC (permalink / raw)
  To: Marc Balmer
  Cc: Johannes Schindelin, Strain, Roger L., git@vger.kernel.org,
	gitster@pobox.com, ns@nadavsinai.com, pclouds@gmail.com

Deleting the subtree-cache directory should be safe enough, it only holds a record of previously created commit mappings, and deleting it just makes subsequent runs slower.

The data that actually determines whether you have a regular merge or a subtree merge is part of the commit message, and everything else is derived from that.

> On 12 Mar 2020, at 9:33 pm, Marc Balmer <marc@msys.ch> wrote:
> 
> G'day
> 
> Due to some issue in git subtree, a subtree push pushed all commits (over 8000) ever done to the main repository.  So the history of the subtree'ed repository not only showed commits done to the particular subtree, but all commits in the whole project (see E-mail exchange below).
> 
> Today we decided to no longer use subtrees, but to use two independend repository and managing merges manually.
> 
> How can we get rid of a subtree cache data?  Is it enough to remove the .git/subtree-cache directory?  Or is that dangerous?  Does git-subtree store any data anywhere else?
> 
> Thanks and regards,
> Marc

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2020-03-12 10:43 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-12-08 10:30 Regression in git-subtree.sh, introduced in 2.20.1, after 315a84f9aa0e2e629b0680068646b0032518ebed Nadav SInai
2019-12-09 14:11 ` Strain, Roger L.
2019-12-09 11:45   ` Ed Maste
2019-12-09 16:19     ` Strain, Roger L.
2019-12-09 14:13   ` Marc Balmer
2019-12-09 14:18     ` Strain, Roger L.
2019-12-09 14:30       ` Marc Balmer
2019-12-09 15:26         ` Johannes Schindelin
2019-12-09 15:31           ` Marc Balmer
2019-12-09 19:38             ` Johannes Schindelin
2019-12-11  5:43               ` Tom Clarkson
2019-12-11 14:39                 ` Strain, Roger L.
2019-12-12  5:02                   ` Tom Clarkson
2019-12-13 13:41                     ` Johannes Schindelin
2019-12-14  8:29                       ` Marc Balmer
     [not found]                         ` <BAB4CF6D-6904-4698-ACE1-EBEEC745E569@msys.ch>
2019-12-14 14:27                           ` Tom Clarkson
2019-12-16 11:30                             ` Ed Maste
2019-12-18  0:15                               ` Tom Clarkson
2020-03-12 10:40                           ` Marc Balmer
2019-12-16  3:50                     ` Tom Clarkson
     [not found] <3E84DE22-9614-4E1B-9717-69F6777DD219@msys.ch>
2020-03-12 10:43 ` Tom Clarkson
  -- strict thread matches above, loose matches on Subject: below --
2018-12-31 10:28 Marc Balmer
2018-12-31 10:51 ` Duy Nguyen
2018-12-31 11:12   ` Marc Balmer
2018-12-31 11:20     ` Duy Nguyen
2018-12-31 11:24       ` Marc Balmer
2018-12-31 11:36         ` Duy Nguyen
2018-12-31 12:31           ` Marc Balmer
2019-01-01 13:19             ` Duy Nguyen
2019-01-02  9:13               ` Marc Balmer
2019-01-02 20:20         ` Strain, Roger L.
2019-01-03 13:50           ` Johannes Schindelin
2019-01-03 15:30             ` Strain, Roger L.

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).