From: Robert Stryker <rstryker@redhat.com>
To: git@vger.kernel.org
Subject: 30min Script in git 2.7.4 takes 22+ hrs in git 2.9.3
Date: Thu, 27 Apr 2017 12:36:54 -0400
Message-ID: <CA+Up40iusByn-R55=2=2Ae8KH1mkj4hGF_E9dX3vn1vboyMwMw@mail.gmail.com>

Hi all:

The following script attempts to merge 4 git repos into one,
maintaining tag and branch content (but not SHAs). Each original repo
basically gets its own subfolder in the new one. Each original repo is
first rewritten so that its history looks as if its files had always
lived in the target subfolder.

The problem: the script takes 30 minutes in one environment with
git 2.7.4, and generates a repo of about 30 MB. When run by a coworker
using git 2.9.3, it takes 22+ hours and generates a 10 GB repo.

Clearly something here is very wrong. Either there's a pretty horrible
regression or my idea is a pretty bad one ;)

General process for the script:
  - check out 4 repos
  - rewrite their history so they always thought they were in a subfolder
  - copy these 4 rewritten folders to a temporary location
  - get a list of branches and tags for each of the 4 repos
  - initialize a new repo with a readme.md
  - for each unique tag:
      - check the 4 rewritten / backed up repos for the tag
      - for each of the 4 rewritten repos:
          - if the tag exists in that repo, merge it into the new repo
            in a test branch
          - git pull --no-edit ../intermediate/oneRewrittenRepo    (SLOW PART)
      - save the tag
  - for each unique branch (same logic)
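
For illustration, the tag loop could be sketched roughly as follows.
This is a minimal sketch under assumptions, not the actual script: the
function names, the test-merge branch name, and the BASE_COMMIT
variable (the commit that holds readme.md) are hypothetical, and it
pulls each tag by name, which the real script may not do.

```shell
#!/bin/sh
# Hypothetical layout: the current directory is the new combined repo and
# the rewritten repos are passed in as paths (e.g. ../intermediate/<name>).

# Collapse newline-separated names on stdin into a sorted list
# without duplicates.
unique_names() {
    sort -u
}

# Print the tag names of every repo given as an argument, de-duplicated.
all_tags() {
    for repo in "$@"; do
        git -C "$repo" tag --list
    done | unique_names
}

# Succeed if the repo at $1 has a tag named $2.
has_tag() {
    git -C "$1" rev-parse --verify --quiet "refs/tags/$2" >/dev/null
}

# Build one combined tag: reset the test branch to the initial commit,
# pull the tag from each repo that has it, then record the result.
# BASE_COMMIT is an assumed variable naming the readme.md commit.
merge_tag() {
    tag=$1
    shift
    git checkout --quiet -B test-merge "$BASE_COMMIT"
    for repo in "$@"; do
        if has_tag "$repo" "$tag"; then
            # The slow step. Note: git 2.9 and later refuse to merge
            # unrelated histories by default, so this may need adjusting.
            git pull --no-edit "$repo" "refs/tags/$tag"
        fi
    done
    # Overwrites any same-named tag that may already exist locally.
    git tag --force "$tag"
}
```

With those helpers, the outer loop would be something like
`all_tags ../intermediate/* | while read -r t; do merge_tag "$t" ../intermediate/*; done`,
and the branch loop would follow the same shape.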

So... yeah... 30 MB + 30 minutes -> 11 GB + 22 hours somewhere between
these two versions of git?

According to my coworker:

During each pass of the tag loop, it sits for a long time on:

  git pull --no-edit ../intermediate/webtools.common

which in turn runs

  git fetch --update-head-ok ../intermediate/webtools.common

which in turn runs

  git-upload-pack ../intermediate/webtools.common
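
One way to narrow this down might be to time the slow command with
git's tracing enabled and compare the size of the object store before
and after. A minimal sketch, assuming a POSIX shell; the measure helper
is hypothetical, while GIT_TRACE, GIT_TRACE_PERFORMANCE and
git count-objects are standard git features:

```shell
#!/bin/sh
# Run the given command, then report its wall-clock time and exit status
# on stderr. The command's own exit status is passed through.
measure() {
    start=$(date +%s)
    "$@" && status=0 || status=$?
    end=$(date +%s)
    echo "elapsed: $((end - start))s (exit $status): $*" >&2
    return "$status"
}
```

It could then be used from inside the new repo along these lines:

  git count-objects -vH
  measure env GIT_TRACE=1 GIT_TRACE_PERFORMANCE=1 \
      git pull --no-edit ../intermediate/webtools.common
  git count-objects -vH

Running the same three commands under both git versions should show
whether the extra time and disk usage come from this one step.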


Any ideas here are much appreciated =/

The Script in question is here:
   https://gist.github.com/robstryker/4854fc86ab3714a5e1af353b98cbc768

