* [PATCH] git-p4: speed up search for branch parent @ 2021-04-28 20:06 Joachim Kuebart via GitGitGadget 2021-04-29 2:22 ` Junio C Hamano 2021-05-05 11:56 ` [PATCH v2 0/2] " Joachim Kuebart via GitGitGadget 0 siblings, 2 replies; 10+ messages in thread From: Joachim Kuebart via GitGitGadget @ 2021-04-28 20:06 UTC (permalink / raw) To: git; +Cc: Joachim Kuebart, Joachim Kuebart From: Joachim Kuebart <joachim.kuebart@gmail.com> Previously, the code iterated through the parent branch commits and compared each one to the target tree using diff-tree. This patch outputs the revision's tree hash along with the commit hash, thereby saving the diff-tree invocation. This results in a considerable speed-up, at least on Windows. Signed-off-by: Joachim Kuebart <joachim.kuebart@gmail.com> --- git-p4: speed up search for branch parent Previously, the code iterated through the parent branch commits and compared each one to the target tree using diff-tree. This patch outputs the revision's tree hash along with the commit hash, thereby saving the diff-tree invocation. This results in a considerable speed-up, at least on Windows. Signed-off-by: Joachim Kuebart joachim.kuebart@gmail.com Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1013%2Fjkuebart%2Fp4-faster-parent-v1 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1013/jkuebart/p4-faster-parent-v1 Pull-Request: https://github.com/git/git/pull/1013 git-p4.py | 22 +++++++++++----------- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/git-p4.py b/git-p4.py index 09c9e93ac401..dbe94e6fb83b 100755 --- a/git-p4.py +++ b/git-p4.py @@ -3600,19 +3600,19 @@ def importNewBranch(self, branch, maxChange): return True def searchParent(self, parent, branch, target): - parentFound = False - for blob in read_pipe_lines(["git", "rev-list", "--reverse", + for tree in read_pipe_lines(["git", "rev-parse", + "{}^{{tree}}".format(target)]): + targetTree = tree.strip() + for blob in read_pipe_lines(["git", "rev-list", "--format=%H %T", "--no-merges", parent]): - blob = blob.strip() - if len(read_pipe(["git", "diff-tree", blob, target])) == 0: - parentFound = True + if blob[:7] == "commit ": + continue + blob = blob.strip().split(" ") + if blob[1] == targetTree: if self.verbose: - print("Found parent of %s in commit %s" % (branch, blob)) - break - if parentFound: - return blob - else: - return None + print("Found parent of %s in commit %s" % (branch, blob[0])) + return blob[0] + return None def importChanges(self, changes, origin_revision=0): cnt = 1 base-commit: 311531c9de557d25ac087c1637818bd2aad6eb3a -- gitgitgadget ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH] git-p4: speed up search for branch parent 2021-04-28 20:06 [PATCH] git-p4: speed up search for branch parent Joachim Kuebart via GitGitGadget @ 2021-04-29 2:22 ` Junio C Hamano 2021-04-29 7:48 ` Joachim Kuebart 2021-05-05 11:56 ` [PATCH v2 0/2] " Joachim Kuebart via GitGitGadget 1 sibling, 1 reply; 10+ messages in thread From: Junio C Hamano @ 2021-04-29 2:22 UTC (permalink / raw) To: Joachim Kuebart via GitGitGadget; +Cc: git, Joachim Kuebart, Luke Diamand "Joachim Kuebart via GitGitGadget" <gitgitgadget@gmail.com> writes: > From: Joachim Kuebart <joachim.kuebart@gmail.com> Thanks. As git-p4 is not in my area of expertise, I'll make a style critique first, while pinging Luke as an area expert (you can learn who they are with "git shortlog --no-merges --since=18.months.ago git-p4.py"). > Previously, the code iterated through the parent branch commits and > compared each one to the target tree using diff-tree. It is customary in this project to describe the problem in the present tense. In other words, whoever is writing the log message still lives in the world without this patch applied to the system. The code iterates through the parent commits and compares each of them to the target tree using diff-tree. But before that sentence, please prepare the reader with a bit larger picture. A reader may not know what purpose the comparison serves. Do we know that the tree of one of the parents of the commit must match the tree of the target, and trying to see which parent is the one with the same tree? What is helped by learning which parent has the same tree? Perhaps The method searchParent() is used to find a commit in the history of the given 'parent' commit whose tree exactly matches the tree of the given 'target commit. The code iterates through the commits in the history and compares each of them to the target tree by invoking diff-tree. And then our log message would make observation, pointing out what is wrong with it (after all comparing with diff-tree is not giving us a wrong result---the point of this change is that spawning diff-tree for each commit is wasteful when we only want to see exact matches). Because we only are interested in finding a tree that is exactly the same, and not interested in how other trees are different, having to spawn diff-tree for each and every commit is wasteful. > This patch outputs the revision's tree hash along with the commit hash, > thereby saving the diff-tree invocation. This results in a considerable > speed-up, at least on Windows. And then our log message would order the codebase to "become like so", in order to resolve the issue(s) pointed out in the observation. Perhaps Use the "--format" option of "rev-list" to find out the tree object name of each commit in the history, and find the tree whose name is exactly the same as the tree of the target commit to optimize this. When making a claim on performance, it is helpful to our readers to give some numbers, even in a limited test, e.g. In a sample history where ~100 commits needed to be traversed to find the fork point on my Windows box, the current code took 10.4 seconds to complete, while the new code yields the same result in 1.8 seconds, which is a significant speed-up. or something along these lines. > Signed-off-by: Joachim Kuebart <joachim.kuebart@gmail.com> > git-p4.py | 22 +++++++++++----------- > 1 file changed, 11 insertions(+), 11 deletions(-) > > diff --git a/git-p4.py b/git-p4.py > index 09c9e93ac401..dbe94e6fb83b 100755 > --- a/git-p4.py > +++ b/git-p4.py > @@ -3600,19 +3600,19 @@ def importNewBranch(self, branch, maxChange): > return True > > def searchParent(self, parent, branch, target): > + for tree in read_pipe_lines(["git", "rev-parse", > + "{}^{{tree}}".format(target)]): > + targetTree = tree.strip() It looks very strange to run a commit that you expect a single line of output, and read the result in a loop. Doesn't git-p4.py supply a more suitable helper to read a single line output from a command? > + for blob in read_pipe_lines(["git", "rev-list", "--format=%H %T", > "--no-merges", parent]): This is not a new problem you introduced, but when we hear word "blob" in the context of this project, it reminds us of the "blob" object, while the 'blob' variable used in this loop has nothing to do with it. Perhaps rename it to say 'line' or something? > + if blob[:7] == "commit ": > + continue Perhaps blob.startswith("commit ") to avoid hardcoded string length? > + blob = blob.strip().split(" ") > + if blob[1] == targetTree: > if self.verbose: > + print("Found parent of %s in commit %s" % (branch, blob[0])) > + return blob[0] > + return None ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] git-p4: speed up search for branch parent 2021-04-29 2:22 ` Junio C Hamano @ 2021-04-29 7:48 ` Joachim Kuebart 2021-04-29 8:22 ` Luke Diamand 0 siblings, 1 reply; 10+ messages in thread From: Joachim Kuebart @ 2021-04-29 7:48 UTC (permalink / raw) To: Junio C Hamano; +Cc: Joachim Kuebart via GitGitGadget, git, Luke Diamand On Thu, 29 Apr 2021 at 04:22, Junio C Hamano <gitster@pobox.com> wrote: > > "Joachim Kuebart via GitGitGadget" <gitgitgadget@gmail.com> writes: > > > From: Joachim Kuebart <joachim.kuebart@gmail.com> > > Thanks. As git-p4 is not in my area of expertise, I'll make a style > critique first, while pinging Luke as an area expert (you can learn > who they are with "git shortlog --no-merges --since=18.months.ago > git-p4.py"). Hi Junio, thanks for your timely and thorough review and for putting up with my greenhorn mistakes ;-) > > Previously, the code iterated through the parent branch commits and > > compared each one to the target tree using diff-tree. > > It is customary in this project to describe the problem in the > present tense. In other words, whoever is writing the log message > still lives in the world without this patch applied to the system. I will rephrase the commit message and give better details as you mentioned. Thanks a lot for your suggestions! > When making a claim on performance, it is helpful to our readers to > give some numbers, even in a limited test, e.g. > > In a sample history where ~100 commits needed to be traversed to > find the fork point on my Windows box, the current code took > 10.4 seconds to complete, while the new code yields the same > result in 1.8 seconds, which is a significant speed-up. > > or something along these lines. I will add some measurements. > > Signed-off-by: Joachim Kuebart <joachim.kuebart@gmail.com> > > git-p4.py | 22 +++++++++++----------- > > 1 file changed, 11 insertions(+), 11 deletions(-) > > > > diff --git a/git-p4.py b/git-p4.py > > index 09c9e93ac401..dbe94e6fb83b 100755 > > --- a/git-p4.py > > +++ b/git-p4.py > > @@ -3600,19 +3600,19 @@ def importNewBranch(self, branch, maxChange): > > return True > > > > def searchParent(self, parent, branch, target): > > + for tree in read_pipe_lines(["git", "rev-parse", > > + "{}^{{tree}}".format(target)]): > > + targetTree = tree.strip() > > It looks very strange to run a commit that you expect a single line > of output, and read the result in a loop. Doesn't git-p4.py supply > a more suitable helper to read a single line output from a command? You're absolutely right that this isn't very readable. I had a quick look around for a function that reads a single-line response, but I'll look again and come up with a clearer solution. > > + for blob in read_pipe_lines(["git", "rev-list", "--format=%H %T", > > "--no-merges", parent]): > > This is not a new problem you introduced, but when we hear word > "blob" in the context of this project, it reminds us of the "blob" > object, while the 'blob' variable used in this loop has nothing to > do with it. Perhaps rename it to say 'line' or something? Will do, thanks! > > + if blob[:7] == "commit ": > > + continue > > Perhaps blob.startswith("commit ") to avoid hardcoded string length? Yes, that's the name of the function that I can never think of when I need it. Thanks again for your comments, Joachim ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] git-p4: speed up search for branch parent 2021-04-29 7:48 ` Joachim Kuebart @ 2021-04-29 8:22 ` Luke Diamand 2021-04-29 8:31 ` Junio C Hamano 2021-04-29 11:30 ` Joachim Kuebart 0 siblings, 2 replies; 10+ messages in thread From: Luke Diamand @ 2021-04-29 8:22 UTC (permalink / raw) To: Joachim Kuebart, Junio C Hamano; +Cc: Joachim Kuebart via GitGitGadget, git On 29/04/2021 07:48, Joachim Kuebart wrote: > On Thu, 29 Apr 2021 at 04:22, Junio C Hamano <gitster@pobox.com> wrote: >> >> "Joachim Kuebart via GitGitGadget" <gitgitgadget@gmail.com> writes: >> >>> From: Joachim Kuebart <joachim.kuebart@gmail.com> >> >> Thanks. As git-p4 is not in my area of expertise, I'll make a style >> critique first, while pinging Luke as an area expert (you can learn >> who they are with "git shortlog --no-merges --since=18.months.ago >> git-p4.py"). > > Hi Junio, thanks for your timely and thorough review and for putting > up with my greenhorn mistakes ;-) > >>> Previously, the code iterated through the parent branch commits and >>> compared each one to the target tree using diff-tree. >> >> It is customary in this project to describe the problem in the >> present tense. In other words, whoever is writing the log message >> still lives in the world without this patch applied to the system. > > I will rephrase the commit message and give better details as you > mentioned. Thanks a lot for your suggestions! > >> When making a claim on performance, it is helpful to our readers to >> give some numbers, even in a limited test, e.g. >> >> In a sample history where ~100 commits needed to be traversed to >> find the fork point on my Windows box, the current code took >> 10.4 seconds to complete, while the new code yields the same >> result in 1.8 seconds, which is a significant speed-up. >> >> or something along these lines. > > I will add some measurements. > >>> Signed-off-by: Joachim Kuebart <joachim.kuebart@gmail.com> >>> git-p4.py | 22 +++++++++++----------- >>> 1 file changed, 11 insertions(+), 11 deletions(-) >>> >>> diff --git a/git-p4.py b/git-p4.py >>> index 09c9e93ac401..dbe94e6fb83b 100755 >>> --- a/git-p4.py >>> +++ b/git-p4.py >>> @@ -3600,19 +3600,19 @@ def importNewBranch(self, branch, maxChange): >>> return True >>> >>> def searchParent(self, parent, branch, target): >>> + for tree in read_pipe_lines(["git", "rev-parse", >>> + "{}^{{tree}}".format(target)]): >>> + targetTree = tree.strip() >> >> It looks very strange to run a commit that you expect a single line >> of output, and read the result in a loop. Doesn't git-p4.py supply >> a more suitable helper to read a single line output from a command? > > You're absolutely right that this isn't very readable. I had a quick > look around for a function that reads a single-line response, but I'll > look again and come up with a clearer solution. I don't think there is one - git-p4 has lots of functions for calling `p4', but for calling git, it just uses Python's Popen() API. A good question is whether we can start taking advantage of the newer features in Python3 which will obviously break backward compatibility. > >>> + for blob in read_pipe_lines(["git", "rev-list", "--format=%H %T", >>> "--no-merges", parent]): >> >> This is not a new problem you introduced, but when we hear word >> "blob" in the context of this project, it reminds us of the "blob" >> object, while the 'blob' variable used in this loop has nothing to >> do with it. Perhaps rename it to say 'line' or something? > > Will do, thanks! It confused me as well. > >>> + if blob[:7] == "commit ": >>> + continue >> >> Perhaps blob.startswith("commit ") to avoid hardcoded string length? > > Yes, that's the name of the function that I can never think of when I need it. > > Thanks again for your comments, > > Joachim > There are existing tests for importing branches which should cover this. I don't know if they need to be extended or not, you might want to check. Looks good otherwise. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] git-p4: speed up search for branch parent 2021-04-29 8:22 ` Luke Diamand @ 2021-04-29 8:31 ` Junio C Hamano 2021-04-29 19:31 ` Joachim Kuebart 2021-04-29 11:30 ` Joachim Kuebart 1 sibling, 1 reply; 10+ messages in thread From: Junio C Hamano @ 2021-04-29 8:31 UTC (permalink / raw) To: Luke Diamand; +Cc: Joachim Kuebart, Joachim Kuebart via GitGitGadget, git Luke Diamand <luke@diamand.org> writes: >>>> def searchParent(self, parent, branch, target): >>>> + for tree in read_pipe_lines(["git", "rev-parse", >>>> + "{}^{{tree}}".format(target)]): >>>> + targetTree = tree.strip() >>> >>> It looks very strange to run a commit that you expect a single line >>> of output, and read the result in a loop. Doesn't git-p4.py supply >>> a more suitable helper to read a single line output from a command? >> You're absolutely right that this isn't very readable. I had a quick >> look around for a function that reads a single-line response, but I'll >> look again and come up with a clearer solution. > > I don't think there is one - git-p4 has lots of functions for calling > `p4', but for calling git, it just uses Python's Popen() API. OK. It just felt "strange", not "wrong", so I am OK with the construct at least for now. > There are existing tests for importing branches which should cover > this. I don't know if they need to be extended or not, you might want > to check. > > Looks good otherwise. Thanks for a prompt review. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] git-p4: speed up search for branch parent 2021-04-29 8:31 ` Junio C Hamano @ 2021-04-29 19:31 ` Joachim Kuebart 0 siblings, 0 replies; 10+ messages in thread From: Joachim Kuebart @ 2021-04-29 19:31 UTC (permalink / raw) To: Junio C Hamano; +Cc: Luke Diamand, Joachim Kuebart via GitGitGadget, git On Thu, 29 Apr 2021 at 10:31, Junio C Hamano <gitster@pobox.com> wrote: > > Luke Diamand <luke@diamand.org> writes: > > > > Looks good otherwise. > > Thanks for a prompt review. I addressed your comments and updated my PR at https://github.com/git/git/pull/1013, but CI seems stuck and hasn't kicked off yet. I'd like to see it pass especially since I modified a test. Is there anything else I need to do? Joachim ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] git-p4: speed up search for branch parent 2021-04-29 8:22 ` Luke Diamand 2021-04-29 8:31 ` Junio C Hamano @ 2021-04-29 11:30 ` Joachim Kuebart 1 sibling, 0 replies; 10+ messages in thread From: Joachim Kuebart @ 2021-04-29 11:30 UTC (permalink / raw) To: Luke Diamand; +Cc: Junio C Hamano, Joachim Kuebart via GitGitGadget, git On Thu, 29 Apr 2021 at 10:22, Luke Diamand <luke@diamand.org> wrote: > > There are existing tests for importing branches which should cover this. > I don't know if they need to be extended or not, you might want to check. I enhanced t9801-git-p4-branch to check for this functionality, i.e. that the branches are branched off at the correct commits from their parents. As far as I could see, there was no test for this before. Thank you as well for your quick response! Cheers, Joachim ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v2 0/2] git-p4: speed up search for branch parent 2021-04-28 20:06 [PATCH] git-p4: speed up search for branch parent Joachim Kuebart via GitGitGadget 2021-04-29 2:22 ` Junio C Hamano @ 2021-05-05 11:56 ` Joachim Kuebart via GitGitGadget 2021-05-05 11:56 ` [PATCH v2 1/2] git-p4: ensure complex branches are cloned correctly Joachim Kuebart via GitGitGadget 2021-05-05 11:56 ` [PATCH v2 2/2] git-p4: speed up search for branch parent Joachim Kuebart via GitGitGadget 1 sibling, 2 replies; 10+ messages in thread From: Joachim Kuebart via GitGitGadget @ 2021-05-05 11:56 UTC (permalink / raw) To: git; +Cc: Luke Diamand, Joachim Kuebart, Joachim Kuebart In this iteration, I have added more context and measurements to the commit message. I have also made small improvements to the code suggested by reviewers. I enhanced t9801-git-p4-branch.sh to test for the functionality, namely that branches are branched off at the correct point in their parents' history. Signed-off-by: Joachim Kuebart joachim.kuebart@gmail.com cc: Joachim Kuebart joachim.kuebart@gmail.com Joachim Kuebart (2): git-p4: ensure complex branches are cloned correctly git-p4: speed up search for branch parent git-p4.py | 21 ++++++++++----------- t/t9801-git-p4-branch.sh | 2 ++ 2 files changed, 12 insertions(+), 11 deletions(-) base-commit: 311531c9de557d25ac087c1637818bd2aad6eb3a Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1013%2Fjkuebart%2Fp4-faster-parent-v2 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1013/jkuebart/p4-faster-parent-v2 Pull-Request: https://github.com/git/git/pull/1013 Range-diff vs v1: -: ------------ > 1: 0ee0b7b55691 git-p4: ensure complex branches are cloned correctly 1: a171f7e6c023 ! 2: 41b3a23f682c git-p4: speed up search for branch parent @@ Metadata ## Commit message ## git-p4: speed up search for branch parent - Previously, the code iterated through the parent branch commits and - compared each one to the target tree using diff-tree. + For every new branch that git-p4 imports, it needs to find the commit + where it branched off its parent branch. While p4 doesn't record this + information explicitly, the first changelist on a branch is usually an + identical copy of the parent branch. - This patch outputs the revision's tree hash along with the commit hash, - thereby saving the diff-tree invocation. This results in a considerable - speed-up, at least on Windows. + The method searchParent() tries to find a commit in the history of the + given "parent" branch whose tree exactly matches the initial changelist + of the new branch, "target". The code iterates through the parent + commits and compares each of them to this initial changelist using + diff-tree. + + Since we already know the tree object name we are looking for, spawning + diff-tree for each commit is wasteful. + + Use the "--format" option of "rev-list" to find out the tree object name + of each commit in the history, and find the tree whose name is exactly + the same as the tree of the target commit to optimize this. + + This results in a considerable speed-up, at least on Windows. On one + Windows machine with a fairly large repository of about 16000 commits in + the parent branch, the current code takes over 7 minutes, while the new + code only takes just over 10 seconds for the same changelist: + + Before: + + $ time git p4 sync + Importing from/into multiple branches + Depot paths: //depot + Importing revision 31274 (100.0%) + Updated branches: b1 + + real 7m41.458s + user 0m0.000s + sys 0m0.077s + + After: + + $ time git p4 sync + Importing from/into multiple branches + Depot paths: //depot + Importing revision 31274 (100.0%) + Updated branches: b1 + + real 0m10.235s + user 0m0.000s + sys 0m0.062s Signed-off-by: Joachim Kuebart <joachim.kuebart@gmail.com> + Helped-by: Junio C Hamano <gitster@pobox.com> + Helped-by: Luke Diamand <luke@diamand.org> ## git-p4.py ## @@ git-p4.py: def importNewBranch(self, branch, maxChange): @@ git-p4.py: def importNewBranch(self, branch, maxChange): def searchParent(self, parent, branch, target): - parentFound = False - for blob in read_pipe_lines(["git", "rev-list", "--reverse", -+ for tree in read_pipe_lines(["git", "rev-parse", -+ "{}^{{tree}}".format(target)]): -+ targetTree = tree.strip() -+ for blob in read_pipe_lines(["git", "rev-list", "--format=%H %T", ++ targetTree = read_pipe(["git", "rev-parse", ++ "{}^{{tree}}".format(target)]).strip() ++ for line in read_pipe_lines(["git", "rev-list", "--format=%H %T", "--no-merges", parent]): - blob = blob.strip() - if len(read_pipe(["git", "diff-tree", blob, target])) == 0: - parentFound = True -+ if blob[:7] == "commit ": ++ if line.startswith("commit "): + continue -+ blob = blob.strip().split(" ") -+ if blob[1] == targetTree: ++ commit, tree = line.strip().split(" ") ++ if tree == targetTree: if self.verbose: - print("Found parent of %s in commit %s" % (branch, blob)) - break @@ git-p4.py: def importNewBranch(self, branch, maxChange): - return blob - else: - return None -+ print("Found parent of %s in commit %s" % (branch, blob[0])) -+ return blob[0] ++ print("Found parent of %s in commit %s" % (branch, commit)) ++ return commit + return None def importChanges(self, changes, origin_revision=0): -- gitgitgadget ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v2 1/2] git-p4: ensure complex branches are cloned correctly 2021-05-05 11:56 ` [PATCH v2 0/2] " Joachim Kuebart via GitGitGadget @ 2021-05-05 11:56 ` Joachim Kuebart via GitGitGadget 2021-05-05 11:56 ` [PATCH v2 2/2] git-p4: speed up search for branch parent Joachim Kuebart via GitGitGadget 1 sibling, 0 replies; 10+ messages in thread From: Joachim Kuebart via GitGitGadget @ 2021-05-05 11:56 UTC (permalink / raw) To: git; +Cc: Luke Diamand, Joachim Kuebart, Joachim Kuebart, Joachim Kuebart From: Joachim Kuebart <joachim.kuebart@gmail.com> When importing a branch from p4, git-p4 searches the history of the parent branch for the branch point. The test for the complex branch structure ensures all files have the expected contents, but doesn't examine the branch structure. Check for the correct branch structure by making sure that the initial commit on each branch is empty. This ensures that the initial commit's parent is indeed the correct branch-off point. Signed-off-by: Joachim Kuebart <joachim.kuebart@gmail.com> --- t/t9801-git-p4-branch.sh | 2 ++ 1 file changed, 2 insertions(+) diff --git a/t/t9801-git-p4-branch.sh b/t/t9801-git-p4-branch.sh index ff94c3f17df1..50a6f8bad5c5 100755 --- a/t/t9801-git-p4-branch.sh +++ b/t/t9801-git-p4-branch.sh @@ -294,11 +294,13 @@ test_expect_success 'git p4 clone complex branches' ' test_path_is_file file3 && grep update file2 && git reset --hard p4/depot/branch4 && + git diff-tree --quiet HEAD && test_path_is_file file1 && test_path_is_file file2 && test_path_is_missing file3 && ! grep update file2 && git reset --hard p4/depot/branch5 && + git diff-tree --quiet HEAD && test_path_is_file file1 && test_path_is_file file2 && test_path_is_file file3 && -- gitgitgadget ^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH v2 2/2] git-p4: speed up search for branch parent 2021-05-05 11:56 ` [PATCH v2 0/2] " Joachim Kuebart via GitGitGadget 2021-05-05 11:56 ` [PATCH v2 1/2] git-p4: ensure complex branches are cloned correctly Joachim Kuebart via GitGitGadget @ 2021-05-05 11:56 ` Joachim Kuebart via GitGitGadget 1 sibling, 0 replies; 10+ messages in thread From: Joachim Kuebart via GitGitGadget @ 2021-05-05 11:56 UTC (permalink / raw) To: git; +Cc: Luke Diamand, Joachim Kuebart, Joachim Kuebart, Joachim Kuebart From: Joachim Kuebart <joachim.kuebart@gmail.com> For every new branch that git-p4 imports, it needs to find the commit where it branched off its parent branch. While p4 doesn't record this information explicitly, the first changelist on a branch is usually an identical copy of the parent branch. The method searchParent() tries to find a commit in the history of the given "parent" branch whose tree exactly matches the initial changelist of the new branch, "target". The code iterates through the parent commits and compares each of them to this initial changelist using diff-tree. Since we already know the tree object name we are looking for, spawning diff-tree for each commit is wasteful. Use the "--format" option of "rev-list" to find out the tree object name of each commit in the history, and find the tree whose name is exactly the same as the tree of the target commit to optimize this. This results in a considerable speed-up, at least on Windows. On one Windows machine with a fairly large repository of about 16000 commits in the parent branch, the current code takes over 7 minutes, while the new code only takes just over 10 seconds for the same changelist: Before: $ time git p4 sync Importing from/into multiple branches Depot paths: //depot Importing revision 31274 (100.0%) Updated branches: b1 real 7m41.458s user 0m0.000s sys 0m0.077s After: $ time git p4 sync Importing from/into multiple branches Depot paths: //depot Importing revision 31274 (100.0%) Updated branches: b1 real 0m10.235s user 0m0.000s sys 0m0.062s Signed-off-by: Joachim Kuebart <joachim.kuebart@gmail.com> Helped-by: Junio C Hamano <gitster@pobox.com> Helped-by: Luke Diamand <luke@diamand.org> --- git-p4.py | 21 ++++++++++----------- 1 file changed, 10 insertions(+), 11 deletions(-) diff --git a/git-p4.py b/git-p4.py index 09c9e93ac401..d34a1946b754 100755 --- a/git-p4.py +++ b/git-p4.py @@ -3600,19 +3600,18 @@ def importNewBranch(self, branch, maxChange): return True def searchParent(self, parent, branch, target): - parentFound = False - for blob in read_pipe_lines(["git", "rev-list", "--reverse", + targetTree = read_pipe(["git", "rev-parse", + "{}^{{tree}}".format(target)]).strip() + for line in read_pipe_lines(["git", "rev-list", "--format=%H %T", "--no-merges", parent]): - blob = blob.strip() - if len(read_pipe(["git", "diff-tree", blob, target])) == 0: - parentFound = True + if line.startswith("commit "): + continue + commit, tree = line.strip().split(" ") + if tree == targetTree: if self.verbose: - print("Found parent of %s in commit %s" % (branch, blob)) - break - if parentFound: - return blob - else: - return None + print("Found parent of %s in commit %s" % (branch, commit)) + return commit + return None def importChanges(self, changes, origin_revision=0): cnt = 1 -- gitgitgadget ^ permalink raw reply related [flat|nested] 10+ messages in thread
end of thread, other threads:[~2021-05-05 11:56 UTC | newest] Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2021-04-28 20:06 [PATCH] git-p4: speed up search for branch parent Joachim Kuebart via GitGitGadget 2021-04-29 2:22 ` Junio C Hamano 2021-04-29 7:48 ` Joachim Kuebart 2021-04-29 8:22 ` Luke Diamand 2021-04-29 8:31 ` Junio C Hamano 2021-04-29 19:31 ` Joachim Kuebart 2021-04-29 11:30 ` Joachim Kuebart 2021-05-05 11:56 ` [PATCH v2 0/2] " Joachim Kuebart via GitGitGadget 2021-05-05 11:56 ` [PATCH v2 1/2] git-p4: ensure complex branches are cloned correctly Joachim Kuebart via GitGitGadget 2021-05-05 11:56 ` [PATCH v2 2/2] git-p4: speed up search for branch parent Joachim Kuebart via GitGitGadget
Code repositories for project(s) associated with this public inbox https://80x24.org/mirrors/git.git This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).