git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Luke Diamand <luke@diamand.org>
To: git@vger.kernel.org
Cc: "Eric Scouten" <eric@scouten.com>,
	"SZEDER Gábor" <szeder.dev@gmail.com>,
	"Merland Romain" <merlorom@yahoo.fr>,
	"Miguel Torroja" <miguel.torroja@gmail.com>,
	"Lars Schneider" <larsxschneider@gmail.com>,
	"Lex Spoon" <lex@lexspoon.org>,
	viniciusalexandre@gmail.com, "Luke Diamand" <luke@diamand.org>
Subject: [PATCHv2 6/6] git-p4: auto-size the block
Date: Fri,  8 Jun 2018 21:32:48 +0100	[thread overview]
Message-ID: <20180608203248.16311-7-luke@diamand.org> (raw)
In-Reply-To: <20180608203248.16311-6-luke@diamand.org>

git-p4 originally would fetch changes in one query. On large repos this
could fail because of the limits that Perforce imposes on the number of
items returned and the number of queries in the database.

To fix this, git-p4 learned to query changes in blocks of 512 changes,
However, this can be very slow - if you have a few million changes,
with each chunk taking about a second, it can be an hour or so.

Although it's possible to tune this value manually with the
"--changes-block-size" option, it's far from obvious to ordinary users
that this is what needs doing.

This change alters the block size dynamically by looking for the
specific error messages returned from the Perforce server, and reducing
the block size if the error is seen, either to the limit reported by the
server, or to half the current block size.

That means we can start out with a very large block size, and then let
it automatically drop down to a value that works without error, while
still failing correctly if some other error occurs.

Signed-off-by: Luke Diamand <luke@diamand.org>
---
 git-p4.py               | 27 +++++++++++++++++++++------
 t/t9818-git-p4-block.sh |  8 ++++++++
 2 files changed, 29 insertions(+), 6 deletions(-)

diff --git a/git-p4.py b/git-p4.py
index 5f59feb5bb..0354d4df5c 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -47,8 +47,8 @@ def __str__(self):
 # Only labels/tags matching this will be imported/exported
 defaultLabelRegexp = r'[a-zA-Z0-9_\-.]+$'
 
-# Grab changes in blocks of this many revisions, unless otherwise requested
-defaultBlockSize = 512
+# The block size is reduced automatically if required
+defaultBlockSize = 1<<20
 
 p4_access_checked = False
 
@@ -969,7 +969,8 @@ def p4ChangesForPaths(depotPaths, changeRange, requestedBlockSize):
     changes = set()
 
     # Retrieve changes a block at a time, to prevent running
-    # into a MaxResults/MaxScanRows error from the server.
+    # into a MaxResults/MaxScanRows error from the server. If
+    # we _do_ hit one of those errors, turn down the block size
 
     while True:
         cmd = ['changes']
@@ -983,10 +984,24 @@ def p4ChangesForPaths(depotPaths, changeRange, requestedBlockSize):
         for p in depotPaths:
             cmd += ["%s...@%s" % (p, revisionRange)]
 
+        # fetch the changes
+        try:
+            result = p4CmdList(cmd, errors_as_exceptions=True)
+        except P4RequestSizeException as e:
+            if not block_size:
+                block_size = e.limit
+            elif block_size > e.limit:
+                block_size = e.limit
+            else:
+                block_size = max(2, block_size // 2)
+
+            if verbose: print("block size error, retrying with block size {0}".format(block_size))
+            continue
+        except P4Exception as e:
+            die('Error retrieving changes description ({0})'.format(e.p4ExitCode))
+
         # Insert changes in chronological order
-        for entry in reversed(p4CmdList(cmd)):
-            if entry.has_key('p4ExitCode'):
-                die('Error retrieving changes descriptions ({})'.format(entry['p4ExitCode']))
+        for entry in reversed(result):
             if not entry.has_key('change'):
                 continue
             changes.add(int(entry['change']))
diff --git a/t/t9818-git-p4-block.sh b/t/t9818-git-p4-block.sh
index 8840a183ac..ce7cb22ad3 100755
--- a/t/t9818-git-p4-block.sh
+++ b/t/t9818-git-p4-block.sh
@@ -129,6 +129,7 @@ test_expect_success 'Create a repo with multiple depot paths' '
 '
 
 test_expect_success 'Clone repo with multiple depot paths' '
+	test_when_finished cleanup_git &&
 	(
 		cd "$git" &&
 		git p4 clone --changes-block-size=4 //depot/pathA@all //depot/pathB@all \
@@ -138,6 +139,13 @@ test_expect_success 'Clone repo with multiple depot paths' '
 	)
 '
 
+test_expect_success 'Clone repo with self-sizing block size' '
+	test_when_finished cleanup_git &&
+	git p4 clone --changes-block-size=1000000 //depot@all --destination="$git" &&
+	git -C "$git" log --oneline >log &&
+	test_line_count \> 10 log
+'
+
 test_expect_success 'kill p4d' '
 	kill_p4d
 '
-- 
2.17.0.392.gdeb1a6e9b7


  reply	other threads:[~2018-06-08 20:33 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-06-08 20:32 [PATCHv2 0/6] git-p4: some small fixes updated Luke Diamand
2018-06-08 20:32 ` [PATCHv2 1/6] git-p4: disable-rebase: allow setting this via configuration Luke Diamand
2018-06-08 20:32   ` [PATCHv2 2/6] git-p4: add option to disable syncing of p4/master with p4 Luke Diamand
2018-06-08 20:32     ` [PATCHv2 3/6] git-p4: better error reporting when p4 fails Luke Diamand
2018-06-08 20:32       ` [PATCHv2 4/6] git-p4: raise exceptions from p4CmdList based on error from p4 server Luke Diamand
2018-06-08 20:32         ` [PATCHv2 5/6] git-p4: narrow the scope of exceptions caught when parsing an int Luke Diamand
2018-06-08 20:32           ` Luke Diamand [this message]
2018-06-12 17:10 ` [PATCHv2 0/6] git-p4: some small fixes updated Junio C Hamano
2018-06-12 21:24   ` Luke Diamand
2018-06-12 21:35     ` Junio C Hamano
2018-06-12 21:49       ` Luke Diamand
2018-06-12 21:53         ` Eric Sunshine
2018-06-12 22:23           ` Luke Diamand

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180608203248.16311-7-luke@diamand.org \
    --to=luke@diamand.org \
    --cc=eric@scouten.com \
    --cc=git@vger.kernel.org \
    --cc=larsxschneider@gmail.com \
    --cc=lex@lexspoon.org \
    --cc=merlorom@yahoo.fr \
    --cc=miguel.torroja@gmail.com \
    --cc=szeder.dev@gmail.com \
    --cc=viniciusalexandre@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).