From: "Joachim Kuebart via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: Luke Diamand <luke@diamand.org>,
Joachim Kuebart <joachim.kuebart@gmail.com>
Subject: [PATCH v2 2/2] git-p4: speed up search for branch parent
Date: Wed, 05 May 2021 11:56:26 +0000
Message-ID: <41b3a23f682cddb3720de14723854c5956f25704.1620215786.git.gitgitgadget@gmail.com>
In-Reply-To: <pull.1013.v2.git.git.1620215786.gitgitgadget@gmail.com>
From: Joachim Kuebart <joachim.kuebart@gmail.com>
For every new branch that git-p4 imports, it needs to find the commit
where it branched off its parent branch. While p4 doesn't record this
information explicitly, the first changelist on a branch is usually an
identical copy of the parent branch.
The method searchParent() tries to find a commit in the history of the
given "parent" branch whose tree exactly matches the initial changelist
of the new branch, "target". The code iterates through the commits of
the parent branch and compares each of them to this initial changelist using
diff-tree.
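For illustration, the loop being replaced can be modelled in plain Python. This is a minimal sketch, not the git-p4 code: search_parent_old and the diff_tree callback are hypothetical names, and the callback stands in for spawning "git diff-tree <commit> <target>" (an empty result meaning the trees are identical). Commits are assumed to arrive oldest first, as with "rev-list --reverse". The counter makes the cost visible: one child process per commit examined.

```python
from typing import Callable, Iterable, Optional, Tuple


def search_parent_old(
    commits: Iterable[str],
    target: str,
    diff_tree: Callable[[str, str], str],
) -> Tuple[Optional[str], int]:
    """Return the first commit whose tree matches target, plus the number of
    diff_tree calls (i.e. spawned processes) needed to find it."""
    calls = 0
    for commit in commits:  # assumed oldest first, like "rev-list --reverse"
        calls += 1  # each call stands in for one "git diff-tree" process
        if len(diff_tree(commit, target)) == 0:  # empty diff: identical trees
            return commit, calls
    return None, calls


# Made-up commit -> tree mapping used only to drive the example.
TREES = {"c1": "t1", "c2": "t2", "c3": "t3", "target": "t2"}


def fake_diff_tree(a: str, b: str) -> str:
    # Hypothetical stand-in: empty output when the two trees are identical.
    return "" if TREES[a] == TREES[b] else "M\tsome/file\n"


print(search_parent_old(["c1", "c2", "c3"], "target", fake_diff_tree))  # ('c2', 2)
```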
Since we already know the tree object name we are looking for, spawning
diff-tree for each commit is wasteful.
To optimize this, use the "--format" option of "rev-list" to obtain the
tree object name of each commit in the history, and look for the commit
whose tree name exactly matches the tree of the target commit.
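The new search can be sketched as a pure function over the lines that "git rev-list --format=%H %T --no-merges <parent>" prints. With "--format", rev-list emits a "commit <hash>" header line before each formatted line, which is why the patch skips lines starting with "commit ". In this sketch, search_parent_new is a hypothetical name, the target tree name is assumed to come from "git rev-parse <target>^{tree}" as in the patch, and the sample hashes are made up and shortened.

```python
from typing import Iterable, Optional


def search_parent_new(rev_list_lines: Iterable[str], target_tree: str) -> Optional[str]:
    """Return the first commit whose tree name equals target_tree, or None."""
    for line in rev_list_lines:
        if line.startswith("commit "):
            continue  # header line that rev-list prints before each formatted line
        commit, tree = line.strip().split(" ")  # "<commit hash> <tree hash>"
        if tree == target_tree:
            return commit  # plain string comparison, no extra process spawned
    return None


# Made-up rev-list output; real hashes are 40 hex digits.
sample = [
    "commit aaaa\n", "aaaa 1111\n",
    "commit bbbb\n", "bbbb 2222\n",
    "commit cccc\n", "cccc 3333\n",
]
print(search_parent_new(sample, "2222"))  # bbbb
```

Because the whole history comes from a single rev-list process, the per-commit work is a string comparison rather than a process spawn.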
This results in a considerable speed-up, at least on Windows. On one
Windows machine with a fairly large repository of about 16000 commits in
the parent branch, the current code takes over 7 minutes, while the new
code takes just over 10 seconds for the same changelist:
Before:

    $ time git p4 sync
    Importing from/into multiple branches
    Depot paths: //depot
    Importing revision 31274 (100.0%)
    Updated branches: b1

    real    7m41.458s
    user    0m0.000s
    sys     0m0.077s

After:

    $ time git p4 sync
    Importing from/into multiple branches
    Depot paths: //depot
    Importing revision 31274 (100.0%)
    Updated branches: b1

    real    0m10.235s
    user    0m0.000s
    sys     0m0.062s

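As a quick check of the figures above, the "real" (wall-clock) times can be converted to seconds and compared. parse_time is a hypothetical helper that only understands the "<minutes>m<seconds>s" form the shell's time keyword printed here.

```python
import re


def parse_time(value: str) -> float:
    """Convert a time field such as "7m41.458s" to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", value)
    if match is None:
        raise ValueError(f"unexpected time format: {value!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


before = parse_time("7m41.458s")  # 461.458 seconds
after = parse_time("0m10.235s")   # 10.235 seconds
print(f"speed-up: {before / after:.1f}x")  # speed-up: 45.1x
```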
Signed-off-by: Joachim Kuebart <joachim.kuebart@gmail.com>
Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Luke Diamand <luke@diamand.org>
---
git-p4.py | 21 ++++++++++-----------
1 file changed, 10 insertions(+), 11 deletions(-)
diff --git a/git-p4.py b/git-p4.py
index 09c9e93ac401..d34a1946b754 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -3600,19 +3600,18 @@ def importNewBranch(self, branch, maxChange):
         return True

     def searchParent(self, parent, branch, target):
-        parentFound = False
-        for blob in read_pipe_lines(["git", "rev-list", "--reverse",
+        targetTree = read_pipe(["git", "rev-parse",
+                                "{}^{{tree}}".format(target)]).strip()
+        for line in read_pipe_lines(["git", "rev-list", "--format=%H %T",
                                      "--no-merges", parent]):
-            blob = blob.strip()
-            if len(read_pipe(["git", "diff-tree", blob, target])) == 0:
-                parentFound = True
+            if line.startswith("commit "):
+                continue
+            commit, tree = line.strip().split(" ")
+            if tree == targetTree:
                 if self.verbose:
-                    print("Found parent of %s in commit %s" % (branch, blob))
-                break
-        if parentFound:
-            return blob
-        else:
-            return None
+                    print("Found parent of %s in commit %s" % (branch, commit))
+                return commit
+        return None

     def importChanges(self, changes, origin_revision=0):
         cnt = 1
--
gitgitgadget
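One detail in the new code that is easy to misread is the format string "{}^{{tree}}": in str.format, doubled braces produce literal braces, so the result is the "<rev>^{tree}" syntax that git rev-parse peels to the commit's tree object. The sketch below uses hypothetical helper names (tree_rev, search_commands) and a made-up ref name; it also shows that the new search runs a fixed two git commands, whatever the length of the history.

```python
from typing import List


def tree_rev(target: str) -> str:
    # "{{" and "}}" are literal braces in str.format, giving "<target>^{tree}".
    return "{}^{{tree}}".format(target)


def search_commands(parent: str, target: str) -> List[List[str]]:
    # The two commands the patched searchParent runs, independent of history size.
    return [
        ["git", "rev-parse", tree_rev(target)],
        ["git", "rev-list", "--format=%H %T", "--no-merges", parent],
    ]


print(tree_rev("p4/b1"))  # p4/b1^{tree}
```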