git@vger.kernel.org mailing list mirror (one of many)
From: "Joachim Kuebart via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: Luke Diamand <luke@diamand.org>,
	Joachim Kuebart <joachim.kuebart@gmail.com>,
	Joachim Kuebart <joachim.kuebart@gmail.com>
Subject: [PATCH v2 0/2] git-p4: speed up search for branch parent
Date: Wed, 05 May 2021 11:56:24 +0000
Message-ID: <pull.1013.v2.git.git.1620215786.gitgitgadget@gmail.com>
In-Reply-To: <pull.1013.git.git.1619640416533.gitgitgadget@gmail.com>

In this iteration, I have added more context and timing measurements to the
commit message of the second patch.
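The two wall-clock ("real") timings quoted in the commit message in the range-diff below work out to roughly a 45-fold speed-up. This back-of-the-envelope check uses only those two figures. It reflects a single run on one Windows machine, so it says nothing about variance or other platforms:

```python
def to_seconds(minutes: int, seconds: float) -> float:
    """Convert a `time`-style "XmY.YYYs" reading into seconds."""
    return minutes * 60 + seconds


before = to_seconds(7, 41.458)  # "real 7m41.458s" before the patch
after = to_seconds(0, 10.235)   # "real 0m10.235s" after the patch
speedup = before / after

print(f"{before:.3f}s -> {after:.3f}s: about {speedup:.0f}x faster")
```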

I have also made small improvements to the code, as suggested by reviewers.

I have extended t9801-git-p4-branch.sh to test this functionality, namely that
each new branch is branched off at the correct point in its parent's history.
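Which commit counts as "the correct point" depends on the search order when the same tree occurs more than once in the parent's history. The range-diff below shows that both v1 and v2 drop `--reverse`. As a result, `git rev-list` walks newest-first (its default order), and the first match is the most recent commit with the target tree. The toy history below uses made-up commit and tree names to illustrate the difference. Whether this is precisely the scenario the new t9801 test exercises is an assumption, not something stated in this cover letter:

```python
from typing import Iterable, Optional, Tuple


def first_match(history: Iterable[Tuple[str, str]], target_tree: str) -> Optional[str]:
    """Return the first commit in `history` whose tree equals `target_tree`."""
    for commit, tree in history:
        if tree == target_tree:
            return commit
    return None


# Hypothetical parent history, oldest first. Tree "T1" occurs twice,
# e.g. because change c3 reverted change c2.
oldest_first = [("c1", "T1"), ("c2", "T2"), ("c3", "T1"), ("c4", "T3")]
newest_first = list(reversed(oldest_first))  # rev-list's default order

print(first_match(oldest_first, "T1"))  # c1, as with --reverse
print(first_match(newest_first, "T1"))  # c3, the most recent match
```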

Signed-off-by: Joachim Kuebart <joachim.kuebart@gmail.com>

cc: Joachim Kuebart <joachim.kuebart@gmail.com>

Joachim Kuebart (2):
  git-p4: ensure complex branches are cloned correctly
  git-p4: speed up search for branch parent

 git-p4.py                | 21 ++++++++++-----------
 t/t9801-git-p4-branch.sh |  2 ++
 2 files changed, 12 insertions(+), 11 deletions(-)


base-commit: 311531c9de557d25ac087c1637818bd2aad6eb3a
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1013%2Fjkuebart%2Fp4-faster-parent-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1013/jkuebart/p4-faster-parent-v2
Pull-Request: https://github.com/git/git/pull/1013

Range-diff vs v1:

 -:  ------------ > 1:  0ee0b7b55691 git-p4: ensure complex branches are cloned correctly
 1:  a171f7e6c023 ! 2:  41b3a23f682c git-p4: speed up search for branch parent
     @@ Metadata
       ## Commit message ##
          git-p4: speed up search for branch parent
      
     -    Previously, the code iterated through the parent branch commits and
     -    compared each one to the target tree using diff-tree.
     +    For every new branch that git-p4 imports, it needs to find the commit
     +    where it branched off its parent branch. While p4 doesn't record this
     +    information explicitly, the first changelist on a branch is usually an
     +    identical copy of the parent branch.
      
     -    This patch outputs the revision's tree hash along with the commit hash,
     -    thereby saving the diff-tree invocation. This results in a considerable
     -    speed-up, at least on Windows.
     +    The method searchParent() tries to find a commit in the history of the
     +    given "parent" branch whose tree exactly matches the initial changelist
     +    of the new branch, "target". The code iterates through the parent
     +    commits and compares each of them to this initial changelist using
     +    diff-tree.
     +
     +    Since we already know the tree object name we are looking for, spawning
     +    diff-tree for each commit is wasteful.
     +
     +    Use the "--format" option of "rev-list" to find out the tree object name
     +    of each commit in the history, and find the tree whose name is exactly
     +    the same as the tree of the target commit to optimize this.
     +
     +    This results in a considerable speed-up, at least on Windows. On one
     +    Windows machine with a fairly large repository of about 16000 commits in
     +    the parent branch, the current code takes over 7 minutes, while the new
     +    code only takes just over 10 seconds for the same changelist:
     +
     +    Before:
     +
     +        $ time git p4 sync
     +        Importing from/into multiple branches
     +        Depot paths: //depot
     +        Importing revision 31274 (100.0%)
     +        Updated branches: b1
     +
     +        real    7m41.458s
     +        user    0m0.000s
     +        sys     0m0.077s
     +
     +    After:
     +
     +        $ time git p4 sync
     +        Importing from/into multiple branches
     +        Depot paths: //depot
     +        Importing revision 31274 (100.0%)
     +        Updated branches: b1
     +
     +        real    0m10.235s
     +        user    0m0.000s
     +        sys     0m0.062s
      
          Signed-off-by: Joachim Kuebart <joachim.kuebart@gmail.com>
     +    Helped-by: Junio C Hamano <gitster@pobox.com>
     +    Helped-by: Luke Diamand <luke@diamand.org>
      
       ## git-p4.py ##
      @@ git-p4.py: def importNewBranch(self, branch, maxChange):
     @@ git-p4.py: def importNewBranch(self, branch, maxChange):
           def searchParent(self, parent, branch, target):
      -        parentFound = False
      -        for blob in read_pipe_lines(["git", "rev-list", "--reverse",
     -+        for tree in read_pipe_lines(["git", "rev-parse",
     -+                                     "{}^{{tree}}".format(target)]):
     -+            targetTree = tree.strip()
     -+        for blob in read_pipe_lines(["git", "rev-list", "--format=%H %T",
     ++        targetTree = read_pipe(["git", "rev-parse",
     ++                                "{}^{{tree}}".format(target)]).strip()
     ++        for line in read_pipe_lines(["git", "rev-list", "--format=%H %T",
                                            "--no-merges", parent]):
      -            blob = blob.strip()
      -            if len(read_pipe(["git", "diff-tree", blob, target])) == 0:
      -                parentFound = True
     -+            if blob[:7] == "commit ":
     ++            if line.startswith("commit "):
      +                continue
     -+            blob = blob.strip().split(" ")
     -+            if blob[1] == targetTree:
     ++            commit, tree = line.strip().split(" ")
     ++            if tree == targetTree:
                       if self.verbose:
      -                    print("Found parent of %s in commit %s" % (branch, blob))
      -                break
     @@ git-p4.py: def importNewBranch(self, branch, maxChange):
      -            return blob
      -        else:
      -            return None
     -+                    print("Found parent of %s in commit %s" % (branch, blob[0]))
     -+                return blob[0]
     ++                    print("Found parent of %s in commit %s" % (branch, commit))
     ++                return commit
      +        return None
       
           def importChanges(self, changes, origin_revision=0):
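To make the v2 loop easy to experiment with outside git-p4, here is a minimal sketch that separates parsing from process spawning. It swaps git-p4's read_pipe()/read_pipe_lines() helpers for subprocess.run(). The names find_parent_commit and search_parent are hypothetical and are not part of git-p4.py. The parsing half can be exercised on canned rev-list output without a repository:

```python
import subprocess
from typing import Iterable, Optional


def find_parent_commit(rev_list_lines: Iterable[str], target_tree: str) -> Optional[str]:
    """Return the first commit whose tree name equals target_tree.

    rev_list_lines is expected to be the output of
    `git rev-list --format="%H %T" --no-merges <parent>`: for each commit,
    rev-list prints a "commit <hash>" header line followed by the
    formatted "<commit hash> <tree hash>" line.
    """
    for line in rev_list_lines:
        if line.startswith("commit "):
            continue  # header line added by --format; it carries no tree
        commit, tree = line.strip().split(" ")
        if tree == target_tree:
            # By default rev-list lists newest commits first,
            # so this is the most recent matching commit.
            return commit
    return None


def search_parent(parent: str, target: str) -> Optional[str]:
    """Hypothetical standalone equivalent of searchParent() using subprocess."""
    # "{}^{{tree}}" renders as "<target>^{tree}"; doubled braces are literal.
    target_tree = subprocess.run(
        ["git", "rev-parse", "{}^{{tree}}".format(target)],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    rev_list = subprocess.run(
        ["git", "rev-list", "--format=%H %T", "--no-merges", parent],
        capture_output=True, text=True, check=True,
    ).stdout
    return find_parent_commit(rev_list.splitlines(), target_tree)


# Offline usage with made-up, shortened hashes in rev-list's output shape:
sample = [
    "commit 2222",
    "2222 bbbb",
    "commit 1111",
    "1111 aaaa",
]
print(find_parent_commit(sample, "aaaa"))  # 1111
```

The design point from the commit message shows up in search_parent. It starts exactly two git processes regardless of history length, whereas the old loop started one diff-tree process per candidate commit. That difference matters most where process creation is expensive.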

-- 
gitgitgadget


Thread overview: 10+ messages
2021-04-28 20:06 [PATCH] git-p4: speed up search for branch parent Joachim Kuebart via GitGitGadget
2021-04-29  2:22 ` Junio C Hamano
2021-04-29  7:48   ` Joachim Kuebart
2021-04-29  8:22     ` Luke Diamand
2021-04-29  8:31       ` Junio C Hamano
2021-04-29 19:31         ` Joachim Kuebart
2021-04-29 11:30       ` Joachim Kuebart
2021-05-05 11:56 ` Joachim Kuebart via GitGitGadget [this message]
2021-05-05 11:56   ` [PATCH v2 1/2] git-p4: ensure complex branches are cloned correctly Joachim Kuebart via GitGitGadget
2021-05-05 11:56   ` [PATCH v2 2/2] git-p4: speed up search for branch parent Joachim Kuebart via GitGitGadget
