From: "Joachim Kuebart via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: Luke Diamand <luke@diamand.org>,
Joachim Kuebart <joachim.kuebart@gmail.com>
Subject: [PATCH v2 0/2] git-p4: speed up search for branch parent
Date: Wed, 05 May 2021 11:56:24 +0000
Message-ID: <pull.1013.v2.git.git.1620215786.gitgitgadget@gmail.com>
In-Reply-To: <pull.1013.git.git.1619640416533.gitgitgadget@gmail.com>
In this iteration, I have added more context and measurements to the commit
message.
I have also made small improvements to the code, as suggested by reviewers.
I enhanced t9801-git-p4-branch.sh to test the functionality, namely that
branches are branched off at the correct point in their parents' history.
Signed-off-by: Joachim Kuebart <joachim.kuebart@gmail.com>
cc: Joachim Kuebart <joachim.kuebart@gmail.com>
Joachim Kuebart (2):
  git-p4: ensure complex branches are cloned correctly
  git-p4: speed up search for branch parent

 git-p4.py                | 21 ++++++++++-----------
 t/t9801-git-p4-branch.sh |  2 ++
 2 files changed, 12 insertions(+), 11 deletions(-)
base-commit: 311531c9de557d25ac087c1637818bd2aad6eb3a
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1013%2Fjkuebart%2Fp4-faster-parent-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1013/jkuebart/p4-faster-parent-v2
Pull-Request: https://github.com/git/git/pull/1013
Range-diff vs v1:
-: ------------ > 1: 0ee0b7b55691 git-p4: ensure complex branches are cloned correctly
1: a171f7e6c023 ! 2: 41b3a23f682c git-p4: speed up search for branch parent
@@ Metadata
## Commit message ##
git-p4: speed up search for branch parent
- Previously, the code iterated through the parent branch commits and
- compared each one to the target tree using diff-tree.
+ For every new branch that git-p4 imports, it needs to find the commit
+ where it branched off its parent branch. While p4 doesn't record this
+ information explicitly, the first changelist on a branch is usually an
+ identical copy of the parent branch.
- This patch outputs the revision's tree hash along with the commit hash,
- thereby saving the diff-tree invocation. This results in a considerable
- speed-up, at least on Windows.
+ The method searchParent() tries to find a commit in the history of the
+ given "parent" branch whose tree exactly matches the initial changelist
+ of the new branch, "target". The code iterates through the parent
+ commits and compares each of them to this initial changelist using
+ diff-tree.
+
+ Since we already know the tree object name we are looking for, spawning
+ diff-tree for each commit is wasteful.
+
+ Use the "--format" option of "rev-list" to find out the tree object name
+ of each commit in the history, and find the tree whose name is exactly
+ the same as the tree of the target commit to optimize this.
+
+ This results in a considerable speed-up, at least on Windows. On one
+ Windows machine with a fairly large repository of about 16000 commits in
+ the parent branch, the current code takes over 7 minutes, while the new
+ code only takes just over 10 seconds for the same changelist:
+
+ Before:
+
+ $ time git p4 sync
+ Importing from/into multiple branches
+ Depot paths: //depot
+ Importing revision 31274 (100.0%)
+ Updated branches: b1
+
+ real 7m41.458s
+ user 0m0.000s
+ sys 0m0.077s
+
+ After:
+
+ $ time git p4 sync
+ Importing from/into multiple branches
+ Depot paths: //depot
+ Importing revision 31274 (100.0%)
+ Updated branches: b1
+
+ real 0m10.235s
+ user 0m0.000s
+ sys 0m0.062s
Signed-off-by: Joachim Kuebart <joachim.kuebart@gmail.com>
+ Helped-by: Junio C Hamano <gitster@pobox.com>
+ Helped-by: Luke Diamand <luke@diamand.org>
## git-p4.py ##
@@ git-p4.py: def importNewBranch(self, branch, maxChange):
@@ git-p4.py: def importNewBranch(self, branch, maxChange):
def searchParent(self, parent, branch, target):
- parentFound = False
- for blob in read_pipe_lines(["git", "rev-list", "--reverse",
-+ for tree in read_pipe_lines(["git", "rev-parse",
-+ "{}^{{tree}}".format(target)]):
-+ targetTree = tree.strip()
-+ for blob in read_pipe_lines(["git", "rev-list", "--format=%H %T",
++ targetTree = read_pipe(["git", "rev-parse",
++ "{}^{{tree}}".format(target)]).strip()
++ for line in read_pipe_lines(["git", "rev-list", "--format=%H %T",
"--no-merges", parent]):
- blob = blob.strip()
- if len(read_pipe(["git", "diff-tree", blob, target])) == 0:
- parentFound = True
-+ if blob[:7] == "commit ":
++ if line.startswith("commit "):
+ continue
-+ blob = blob.strip().split(" ")
-+ if blob[1] == targetTree:
++ commit, tree = line.strip().split(" ")
++ if tree == targetTree:
if self.verbose:
- print("Found parent of %s in commit %s" % (branch, blob))
- break
@@ git-p4.py: def importNewBranch(self, branch, maxChange):
- return blob
- else:
- return None
-+ print("Found parent of %s in commit %s" % (branch, blob[0]))
-+ return blob[0]
++ print("Found parent of %s in commit %s" % (branch, commit))
++ return commit
+ return None
def importChanges(self, changes, origin_revision=0):
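
For readers who want to try the tree-matching idea from the commit message
outside of git-p4, here is a minimal standalone sketch. It is not the code
from the patch: the names find_commit_with_tree() and search_parent() are
hypothetical, and it calls git through subprocess instead of git-p4's
read_pipe() helpers. It assumes Python 3.7 or later and that it runs inside
a git repository.

```python
import subprocess


def find_commit_with_tree(rev_list_lines, target_tree):
    """Return the first commit whose tree name equals target_tree, else None.

    rev_list_lines is the output of
    "git rev-list --format='%H %T' ..." split into lines.
    """
    for line in rev_list_lines:
        # With --format, rev-list prints a "commit <hash>" header line
        # before each formatted line; those headers carry no tree name.
        if line.startswith("commit "):
            continue
        fields = line.split()
        if len(fields) != 2:
            # Skip blank or unexpected lines instead of failing.
            continue
        commit, tree = fields
        if tree == target_tree:
            return commit
    return None


def search_parent(parent, target):
    """Hypothetical wrapper: find a commit on `parent` whose tree matches `target`."""
    # Resolve the tree object name of the target commit once.
    target_tree = subprocess.check_output(
        ["git", "rev-parse", "{}^{{tree}}".format(target)],
        text=True).strip()
    # A single rev-list call lists every non-merge commit with its tree name.
    history = subprocess.check_output(
        ["git", "rev-list", "--format=%H %T", "--no-merges", parent],
        text=True)
    return find_commit_with_tree(history.splitlines(), target_tree)
```

The point of the split is that the comparison itself is a plain string
match on tree object names, so only two git processes are spawned no matter
how long the parent history is.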
--
gitgitgadget