git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "Strain, Roger L." <roger.strain@swri.org>
To: Marc Balmer <marc@msys.ch>, Duy Nguyen <pclouds@gmail.com>
Cc: Git Mailing List <git@vger.kernel.org>,
	Junio C Hamano <gitster@pobox.com>
Subject: RE: Regression in git-subtree.sh, introduced in 2.20.1, after 315a84f9aa0e2e629b0680068646b0032518ebed
Date: Wed, 2 Jan 2019 20:20:52 +0000	[thread overview]
Message-ID: <59718f73a0a14b828a6d4fd4c8c222d1@MBX260.adm.swri.edu> (raw)
In-Reply-To: <0F754615-C852-49D8-8E0C-DD2A00A15ED1@msys.ch>

> -----Original Message-----
> From: Marc Balmer <marc@msys.ch>
> Sent: Monday, December 31, 2018 5:24 AM
> To: Duy Nguyen <pclouds@gmail.com>
> Cc: Git Mailing List <git@vger.kernel.org>; Strain, Roger L.
> <roger.strain@swri.org>; Junio C Hamano <gitster@pobox.com>
> Subject: Re: Regression in git-subtree.sh, introduced in 2.20.1, after
> 315a84f9aa0e2e629b0680068646b0032518ebed
> 
> 
> 
> > Am 31.12.2018 um 12:20 schrieb Duy Nguyen <pclouds@gmail.com>:
> >
> > On Mon, Dec 31, 2018 at 6:13 PM Marc Balmer <marc@msys.ch> wrote:
> >>
> >>
> >>
> >>> Am 31.12.2018 um 11:51 schrieb Duy Nguyen <pclouds@gmail.com>:
> >>>
> >>> On Mon, Dec 31, 2018 at 5:44 PM Marc Balmer <marc@msys.ch> wrote:
> >>>>
> >>>> Hi
> >>>>
> >>>> One of the last three commits in git-subtree.sh introduced a regression
> leading to a segfault.
> >>>>
> >>>> Here is the error message when I try to split out my i18n files:
> >>>>
> >>>> $ git subtree split --prefix=i18n
> >>>> cache for e39a2a0c6431773a5d831eb3cb7f1cd40d0da623 already exists!
> >>>>  (Lots of output omitted)
> >>>> 436/627 (1819) [1455]       <- Stays at 436/ while the numbers in () and []
> increase, then segfaults:
> >>>> /usr/libexec/git-core/git-subtree: line 751: 54693 Done                    eval
> "$grl"
> >>>>   54694 Segmentation fault      (core dumped) | while read rev parents;
> do
> >>>
> >>> Do you still have this core dump? Could you run it and see if it's
> >>> "git" that crashed (and where) or "sh"?
> >>
> >> It is /usr/bin/bash that segfaults.  My guess is, that it runs out of memory
> (as described above, git-subtree enters an infinite loop untils it segafults).
> >
> > Ah that's better (I was worried about "git" crashing). The problematic
> > commit should be 19ad68d95d (subtree: performance improvement for
> > finding unexpected parent commits - 2018-10-12) then, although I can't
> > see why.
> >
> > I don't think we have any release coming up soon, so maybe Roger can
> > still have some time to fix it instead of a just a revert.
> 
> In a (private) Email to me, he indicated that had no time for a fix.  Maybe he
> can speak up here?
> 
> In any case, if I can help testing, I am in.  I just don't know the inner workings
> of git-subtree.sh (I am a mere user of it...)

TL;DR: Current script uses git rev-list to retrieve all commits which are reachable from HEAD but not from <abc123>. Is there a syntax that will instead return all commits reachable from HEAD, but stop traversing when <abc123> is encountered? It's a subtle distinction, but important.

Long version:

While I don't have a lot of time to troubleshoot this as deeply as I'd like, I've at least tried to give it a little bit of thought. If we assume it's an out of memory problem, that's likely due to commit 315a84f9aa0e2e629b0680068646b0032518ebed, which introduced a recursive history search. It might be possible to refactor that logic to avoid the recursive approach, but shell scripting is not my native tongue, and I didn't immediately see the right way to do it. I can explain the problem the commit was trying to address, and if anyone has suggestions about how to better fix the problem, I'd love to hear them.

This part of the subtree script is attempting to identify all mainline commits which need to be evaluated for subtree content. When subtree rejoin commits are included, it short-circuits this process by identifying all commits which can be reached from the current HEAD, but which CANNOT be reached from any of the known subtree rejoin points. While this does simplify the number of commits that need to be evaluated, it causes the script to miss commits that do in fact need to be evaluated when merges bypass those rejoins. Consider the following:

(Apologies for the crude diagram, I'm having to use Outlook at work and have no idea if this will format properly.)

[A]--(B)--(D)--[E]---(G)
 \       /          /
  ---(C)-------(F)--

In this graph, [A] represents an initial commit, which is a subtree rejoin (presumably mapping to subtree commit A'). From this, commits B and C were created, then those two were merged as commit D, and a subtree operation was performed to push these changes to the subtree. The result was rejoined as commit [E] (mapping as subtree commit E'). We then have a commit F with a single parent of C, and a merge of E and F into commit G.

At this point, I want to push the commits through G back to the subtree. The original script used git rev-list to list all commits which can be reached from G, but which cannot be reached from known split points A and E. This results in a commit list of G, F. It then looks at commit F to determine how it fits into the subtree, sees that it has a parent C, and looks for what subtree commit C maps to. Unfortunately C wasn't included in the list of commits to evaluate, so that check failed, and the script treats F as an initial commit. The subtree generated then includes F' as a complete commit, unrelated to any previous commits in the subtree, rather than connecting it to C' as a parent.

The ideal solution to this would be a call to git rev-list (or another tool, if appropriate) that could actually generate a commit list of G, F, C in the above diagram. Rather than being "All commits I can reach from G but not from A or E", what it really needs is "All commits I can reach from G, but stop traversing the history any time I reach A or E". If someone with better knowledge can provide that command, I think we could restore the script to a non-recursive approach and give better information about the true progress of the command, rather than having to rely on the "extra count" I introduced to keep track of how many unexpected commits are being recursively evaluated because the initial rev-list didn't include them.

As an aside, there's still another problem that we've run into which is forcing us to still run a custom version of the subtree script. There's a section in the script commented "ugly.  is there no better way to tell if this is a subtree vs. a mainline commit?  Does it matter?" Unfortunately, that section isn't working properly in our repo at this point, and I've had to include a VERY ugly check that looks for the existence of a known file in our specific mainline repo to make this determination. I'm still trying to sort through a better approach that might work for others, but have come up blank to this point. If anyone else has thoughts, I'd really love to hear those as well.

Anyway, sorry for the novel, but I wanted to brain dump what I've learned in this process. If anyone can suggest a syntax for rev-list that gives the appropriate set of commits, I'd love to get that in place; otherwise we could definitely try to unwind the recursion approach, but my shell script naivety may get me in trouble.

-- 
Roger

  parent reply	other threads:[~2019-01-02 20:45 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-12-31 10:28 Regression in git-subtree.sh, introduced in 2.20.1, after 315a84f9aa0e2e629b0680068646b0032518ebed Marc Balmer
2018-12-31 10:51 ` Duy Nguyen
2018-12-31 11:12   ` Marc Balmer
2018-12-31 11:20     ` Duy Nguyen
2018-12-31 11:24       ` Marc Balmer
2018-12-31 11:36         ` Duy Nguyen
2018-12-31 12:31           ` Marc Balmer
2019-01-01 13:19             ` Duy Nguyen
2019-01-02  9:13               ` Marc Balmer
2019-01-02 20:20         ` Strain, Roger L. [this message]
2019-01-03 13:50           ` Johannes Schindelin
2019-01-03 15:30             ` Strain, Roger L.
  -- strict thread matches above, loose matches on Subject: below --
2019-12-08 10:30 Nadav SInai
2019-12-09 14:11 ` Strain, Roger L.
2019-12-09 11:45   ` Ed Maste
2019-12-09 16:19     ` Strain, Roger L.
2019-12-09 14:13   ` Marc Balmer
2019-12-09 14:18     ` Strain, Roger L.
2019-12-09 14:30       ` Marc Balmer
2019-12-09 15:26         ` Johannes Schindelin
2019-12-09 15:31           ` Marc Balmer
2019-12-09 19:38             ` Johannes Schindelin
2019-12-11  5:43               ` Tom Clarkson
2019-12-11 14:39                 ` Strain, Roger L.
2019-12-12  5:02                   ` Tom Clarkson
2019-12-13 13:41                     ` Johannes Schindelin
2019-12-14  8:29                       ` Marc Balmer
     [not found]                         ` <BAB4CF6D-6904-4698-ACE1-EBEEC745E569@msys.ch>
2019-12-14 14:27                           ` Tom Clarkson
2019-12-16 11:30                             ` Ed Maste
2019-12-18  0:15                               ` Tom Clarkson
2020-03-12 10:40                           ` Marc Balmer
2019-12-16  3:50                     ` Tom Clarkson
     [not found] <3E84DE22-9614-4E1B-9717-69F6777DD219@msys.ch>
2020-03-12 10:43 ` Tom Clarkson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=59718f73a0a14b828a6d4fd4c8c222d1@MBX260.adm.swri.edu \
    --to=roger.strain@swri.org \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=marc@msys.ch \
    --cc=pclouds@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).