git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH] git-p4: improve performance with large files
@ 2009-03-04 21:54 Sam Hocevar
  2009-03-04 23:05 ` thestar
  0 siblings, 1 reply; 15+ messages in thread
From: Sam Hocevar @ 2009-03-04 21:54 UTC (permalink / raw
  To: git

   The current git-p4 way of concatenating strings performs in O(n^2)
and is therefore terribly slow with large files because of unnecessary
memory copies. The following patch makes the operation O(n).

   Using this patch, importing a 17GB repository with large files
(50 to 500MB) takes 2 hours instead of a week.

Signed-off-by: Sam Hocevar <sam@zoy.org>
---
 contrib/fast-import/git-p4 |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/contrib/fast-import/git-p4 b/contrib/fast-import/git-p4
index 9fdb0c6..09e9746 100755
--- a/contrib/fast-import/git-p4
+++ b/contrib/fast-import/git-p4
@@ -990,11 +990,12 @@ class P4Sync(Command):
         while j < len(filedata):
             stat = filedata[j]
             j += 1
-            text = ''
+            data = []
             while j < len(filedata) and filedata[j]['code'] in ('text', 'unicode', 'binary'):
-                text += filedata[j]['data']
+                data.append(filedata[j]['data'])
                 del filedata[j]['data']
                 j += 1
+            text = "".join(data)
 
             if not stat.has_key('depotFile'):
                 sys.stderr.write("p4 print fails with: %s\n" % repr(stat))
-- 
1.6.1.3

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH] git-p4: improve performance with large files
  2009-03-04 21:54 [PATCH] git-p4: improve performance with large files Sam Hocevar
@ 2009-03-04 23:05 ` thestar
  2009-03-05 17:23   ` Sam Hocevar
  0 siblings, 1 reply; 15+ messages in thread
From: thestar @ 2009-03-04 23:05 UTC (permalink / raw
  To: Sam Hocevar; +Cc: git

Quoting Sam Hocevar <sam@zoy.org>:

>    The current git-p4 way of concatenating strings performs in O(n^2)
> and is therefore terribly slow with large files because of unnecessary
> memory copies. The following patch makes the operation O(n).

The reason why it uses simple concatenation is to cut down on memory usage.
  - It is a tradeoff.

I think the modification you have made below is reasonable, however be  
aware that memory usage could double, which substantially reduce the  
size of the changesets that git-p4 would be able to import /at all/,  
rather than to merely be slow.

That said, you do need to delete the data temporary array to cut down  
on memory.
  -- I would do this immediately after the "".join(data).

>
>    Using this patch, importing a 17GB repository with large files
> (50 to 500MB) takes 2 hours instead of a week.
>
> Signed-off-by: Sam Hocevar <sam@zoy.org>
> ---
>  contrib/fast-import/git-p4 |    5 +++--
>  1 files changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/contrib/fast-import/git-p4 b/contrib/fast-import/git-p4
> index 9fdb0c6..09e9746 100755
> --- a/contrib/fast-import/git-p4
> +++ b/contrib/fast-import/git-p4
> @@ -990,11 +990,12 @@ class P4Sync(Command):
>          while j < len(filedata):
>              stat = filedata[j]
>              j += 1
> -            text = ''
> +            data = []
>              while j < len(filedata) and filedata[j]['code'] in   
> ('text', 'unicode', 'binary'):
> -                text += filedata[j]['data']
> +                data.append(filedata[j]['data'])
>                  del filedata[j]['data']
>                  j += 1
> +            text = "".join(data)
>
>              if not stat.has_key('depotFile'):
>                  sys.stderr.write("p4 print fails with: %s\n" % repr(stat))
> --
> 1.6.1.3
> --
> To unsubscribe from this list: send the line "unsubscribe git" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] git-p4: improve performance with large files
  2009-03-04 23:05 ` thestar
@ 2009-03-05 17:23   ` Sam Hocevar
  2009-03-06  0:01     ` thestar
  2009-03-06  1:14     ` Junio C Hamano
  0 siblings, 2 replies; 15+ messages in thread
From: Sam Hocevar @ 2009-03-05 17:23 UTC (permalink / raw
  To: git

On Thu, Mar 05, 2009, thestar@fussycoder.id.au wrote:

> >   The current git-p4 way of concatenating strings performs in O(n^2)
> >and is therefore terribly slow with large files because of unnecessary
> >memory copies. The following patch makes the operation O(n).
> 
> The reason why it uses simple concatenation is to cut down on memory usage.
>  - It is a tradeoff.
> 
> I think the modification you have made below is reasonable, however be  
> aware that memory usage could double, which substantially reduce the  
> size of the changesets that git-p4 would be able to import /at all/,  
> rather than to merely be slow.

   Uhm, no. The memory usage could be an additional X, where X is the
size of the biggest file in the commit. Remember that commit() stores
the complete commit data in memory before sending it to fast-import.
Also, on my machine the extra memory is already used because at some
point, "text += foo" calls realloc() anyway and often duplicates the
memory used by text.

   The ideal solution is to use a generator and refactor the commit
handling as a stream. I am working on that but it involves deeper
changes, so as I am not sure it will be accepted, I'm providing the
attached compromise patch first. At least it solves the appaling speed
issue. I tuned it so that it never uses more than 32 MiB extra memory.

Signed-off-by: Sam Hocevar <sam@zoy.org>
---
 contrib/fast-import/git-p4 |   10 +++++++++-
 1 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/contrib/fast-import/git-p4 b/contrib/fast-import/git-p4
index 3832f60..151ae1c 100755
--- a/contrib/fast-import/git-p4
+++ b/contrib/fast-import/git-p4
@@ -984,11 +984,19 @@ class P4Sync(Command):
         while j < len(filedata):
             stat = filedata[j]
             j += 1
+            data = []
             text = ''
             while j < len(filedata) and filedata[j]['code'] in ('text', 'unicod
e', 'binary'):
-                text += filedata[j]['data']
+                data.append(filedata[j]['data'])
                 del filedata[j]['data']
+                # p4 sends 4k chunks, make sure we don't use more than 32 MiB
+                # of additional memory while rebuilding the file data.
+                if len(data) > 8192:
+                    text += ''.join(data)
+                    data = []
                 j += 1
+            text += ''.join(data)
+            del data

             if not stat.has_key('depotFile'):
                 sys.stderr.write("p4 print fails with: %s\n" % repr(stat))

-- 
Sam.

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH] git-p4: improve performance with large files
  2009-03-05 17:23   ` Sam Hocevar
@ 2009-03-06  0:01     ` thestar
  2009-03-06  1:14     ` Junio C Hamano
  1 sibling, 0 replies; 15+ messages in thread
From: thestar @ 2009-03-06  0:01 UTC (permalink / raw
  To: Sam Hocevar; +Cc: git

Quoting Sam Hocevar <sam@zoy.org>:

> On Thu, Mar 05, 2009, thestar@fussycoder.id.au wrote:
>
>> >   The current git-p4 way of concatenating strings performs in O(n^2)
>> >and is therefore terribly slow with large files because of unnecessary
>> >memory copies. The following patch makes the operation O(n).
>>
>> The reason why it uses simple concatenation is to cut down on memory usage.
>>  - It is a tradeoff.
>>
>> I think the modification you have made below is reasonable, however be
>> aware that memory usage could double, which substantially reduce the
>> size of the changesets that git-p4 would be able to import /at all/,
>> rather than to merely be slow.
>
>    Uhm, no. The memory usage could be an additional X, where X is the
> size of the biggest file in the commit. Remember that commit() stores
> the complete commit data in memory before sending it to fast-import.
> Also, on my machine the extra memory is already used because at some
> point, "text += foo" calls realloc() anyway and often duplicates the
> memory used by text.

You are correct - sorry, got confused as I somehow got mistaken that  
that form (foo += bar) can potentially be optimized, but it doesn't.   
That's what I get for not paying enough attention to the language  
used... :(

>    The ideal solution is to use a generator and refactor the commit
> handling as a stream. I am working on that but it involves deeper
> changes, so as I am not sure it will be accepted, I'm providing the
> attached compromise patch first. At least it solves the appaling speed
> issue. I tuned it so that it never uses more than 32 MiB extra memory.

That is definitely the ideal solution - I would hope that it gets  
accepted if you manage to cut down on memory usage - as it really is a  
limitation for certain repositories. (Infact, this is why I no-longer  
use git-p4).

>
> Signed-off-by: Sam Hocevar <sam@zoy.org>
> ---
>  contrib/fast-import/git-p4 |   10 +++++++++-
>  1 files changed, 9 insertions(+), 1 deletions(-)
>
> diff --git a/contrib/fast-import/git-p4 b/contrib/fast-import/git-p4
> index 3832f60..151ae1c 100755
> --- a/contrib/fast-import/git-p4
> +++ b/contrib/fast-import/git-p4
> @@ -984,11 +984,19 @@ class P4Sync(Command):
>          while j < len(filedata):
>              stat = filedata[j]
>              j += 1
> +            data = []
>              text = ''
>              while j < len(filedata) and filedata[j]['code'] in   
> ('text', 'unicod
> e', 'binary'):
> -                text += filedata[j]['data']
> +                data.append(filedata[j]['data'])
>                  del filedata[j]['data']
> +                # p4 sends 4k chunks, make sure we don't use more   
> than 32 MiB
> +                # of additional memory while rebuilding the file data.
> +                if len(data) > 8192:
> +                    text += ''.join(data)
> +                    data = []
>                  j += 1
> +            text += ''.join(data)
> +            del data
>
>              if not stat.has_key('depotFile'):
>                  sys.stderr.write("p4 print fails with: %s\n" % repr(stat))
>
> --
> Sam.
> --
> To unsubscribe from this list: send the line "unsubscribe git" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] git-p4: improve performance with large files
  2009-03-05 17:23   ` Sam Hocevar
  2009-03-06  0:01     ` thestar
@ 2009-03-06  1:14     ` Junio C Hamano
  2009-03-06  1:25       ` Han-Wen Nienhuys
  1 sibling, 1 reply; 15+ messages in thread
From: Junio C Hamano @ 2009-03-06  1:14 UTC (permalink / raw
  To: Simon Hausmann, Han-Wen Nienhuys; +Cc: git, Sam Hocevar


Sam Hocevar <sam@zoy.org> writes:

> ... The ideal solution is to use a generator and refactor the commit
> handling as a stream. I am working on that but it involves deeper
> changes, so as I am not sure it will be accepted, I'm providing the
> attached compromise patch first. At least it solves the appaling speed
> issue. I tuned it so that it never uses more than 32 MiB extra memory.
>
> Signed-off-by: Sam Hocevar <sam@zoy.org>
> ---

I do not do p4, but the patch looks obviously correct.  Comments?

>  contrib/fast-import/git-p4 |   10 +++++++++-
>  1 files changed, 9 insertions(+), 1 deletions(-)
>
> diff --git a/contrib/fast-import/git-p4 b/contrib/fast-import/git-p4
> index 3832f60..151ae1c 100755
> --- a/contrib/fast-import/git-p4
> +++ b/contrib/fast-import/git-p4
> @@ -984,11 +984,19 @@ class P4Sync(Command):
>          while j < len(filedata):
>              stat = filedata[j]
>              j += 1
> +            data = []
>              text = ''
>              while j < len(filedata) and filedata[j]['code'] in ('text', 'unicod
> e', 'binary'):
> -                text += filedata[j]['data']
> +                data.append(filedata[j]['data'])
>                  del filedata[j]['data']
> +                # p4 sends 4k chunks, make sure we don't use more than 32 MiB
> +                # of additional memory while rebuilding the file data.
> +                if len(data) > 8192:
> +                    text += ''.join(data)
> +                    data = []
>                  j += 1
> +            text += ''.join(data)
> +            del data
>
>              if not stat.has_key('depotFile'):
>                  sys.stderr.write("p4 print fails with: %s\n" % repr(stat))
>
> -- 
> Sam.
> --
> To unsubscribe from this list: send the line "unsubscribe git" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] git-p4: improve performance with large files
  2009-03-06  1:14     ` Junio C Hamano
@ 2009-03-06  1:25       ` Han-Wen Nienhuys
  2009-03-06  8:53         ` Sam Hocevar
  0 siblings, 1 reply; 15+ messages in thread
From: Han-Wen Nienhuys @ 2009-03-06  1:25 UTC (permalink / raw
  To: Junio C Hamano; +Cc: Simon Hausmann, git, Sam Hocevar

I don't understand the point of trying to save the 32 mb, if people
are sending you blobs that are that large.

The approach to avoid sequences of appends looks sound.


On Thu, Mar 5, 2009 at 10:14 PM, Junio C Hamano <gitster@pobox.com> wrote:

>> ... The ideal solution is to use a generator and refactor the commit
>> handling as a stream. I am working on that but it involves deeper
>> changes, so as I am not sure it will be accepted, I'm providing the
>> attached compromise patch first. At least it solves the appaling speed
>> issue. I tuned it so that it never uses more than 32 MiB extra memory.

>> +            text += ''.join(data)
>> +            del data

i'd say

  data = []

add a comment that you're trying to save memory. There is no reason to
remove data from the namespace.

-- 
Han-Wen Nienhuys
Google Engineering Belo Horizonte
hanwen@google.com

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] git-p4: improve performance with large files
  2009-03-06  1:25       ` Han-Wen Nienhuys
@ 2009-03-06  8:53         ` Sam Hocevar
  2009-03-06  9:42           ` Junio C Hamano
  0 siblings, 1 reply; 15+ messages in thread
From: Sam Hocevar @ 2009-03-06  8:53 UTC (permalink / raw
  To: git

On Thu, Mar 05, 2009, Han-Wen Nienhuys wrote:

> i'd say
> 
>   data = []
> 
> add a comment that you're trying to save memory. There is no reason to
> remove data from the namespace.

   Okay. Here is an improved version.

Signed-off-by: Sam Hocevar <sam@zoy.org>
---
 contrib/fast-import/git-p4 |   13 ++++++++++++-
 1 files changed, 12 insertions(+), 1 deletions(-)

diff --git a/contrib/fast-import/git-p4 b/contrib/fast-import/git-p4
index 3832f60..db0ea0a 100755
--- a/contrib/fast-import/git-p4
+++ b/contrib/fast-import/git-p4
@@ -984,11 +984,22 @@ class P4Sync(Command):
         while j < len(filedata):
             stat = filedata[j]
             j += 1
+            data = []
             text = ''
+            # Append data every 8192 chunks to 1) ensure decent performance
+            # by not making too many string concatenations and 2) avoid
+            # excessive memory usage by purging "data" often enough. p4
+            # sends 4k chunks, so we should not use more than 32 MiB of
+            # additional memory while rebuilding the file data.
             while j < len(filedata) and filedata[j]['code'] in ('text', 'unicod
e', 'binary'):
-                text += filedata[j]['data']
+                data.append(filedata[j]['data'])
                 del filedata[j]['data']
+                if len(data) > 8192:
+                    text += ''.join(data)
+                    data = []
                 j += 1
+            text += ''.join(data)
+            data = None

             if not stat.has_key('depotFile'):
                 sys.stderr.write("p4 print fails with: %s\n" % repr(stat))

-- 
Sam.

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH] git-p4: improve performance with large files
  2009-03-06  8:53         ` Sam Hocevar
@ 2009-03-06  9:42           ` Junio C Hamano
  2009-03-06 10:13             ` [PATCH v4] " Sam Hocevar
  0 siblings, 1 reply; 15+ messages in thread
From: Junio C Hamano @ 2009-03-06  9:42 UTC (permalink / raw
  To: git

Sam Hocevar <sam@zoy.org> writes:

> On Thu, Mar 05, 2009, Han-Wen Nienhuys wrote:
>
>> i'd say
>> 
>>   data = []
>> 
>> add a comment that you're trying to save memory. There is no reason to
>> remove data from the namespace.
>
>    Okay. Here is an improved version.
>
> Signed-off-by: Sam Hocevar <sam@zoy.org>

Sorry, but that's not a commit log message, so signing it off does not add
much value to the patch.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v4] git-p4: improve performance with large files
  2009-03-06  9:42           ` Junio C Hamano
@ 2009-03-06 10:13             ` Sam Hocevar
  2009-03-07 12:25               ` [PATCH v5] git-p4: improve performance when importing huge files by reducing the number of string concatenations while constraining memory usage Sam Hocevar
  0 siblings, 1 reply; 15+ messages in thread
From: Sam Hocevar @ 2009-03-06 10:13 UTC (permalink / raw
  To: git

On Fri, Mar 06, 2009, Junio C Hamano wrote:

> Sorry, but that's not a commit log message, so signing it off does not add
> much value to the patch.

   I'm sorry, for some reason git format-patch is eating my commit
messages (other commits appear just fine). As I'm doing this from a
Windows machine, I smell CR/LF issues in msysgit. Here is a clean
version:


git-p4: improve performance when importing huge files by reducing
the number of string concatenations while constraining memory usage.

Signed-off-by: Sam Hocevar <sam@zoy.org>
---
 contrib/fast-import/git-p4 |   13 ++++++++++++-
 1 files changed, 12 insertions(+), 1 deletions(-)

diff --git a/contrib/fast-import/git-p4 b/contrib/fast-import/git-p4
index 3832f60..db0ea0a 100755
--- a/contrib/fast-import/git-p4
+++ b/contrib/fast-import/git-p4
@@ -984,11 +984,22 @@ class P4Sync(Command):
         while j < len(filedata):
             stat = filedata[j]
             j += 1
+            data = []
             text = ''
+            # Append data every 8192 chunks to 1) ensure decent performance
+            # by not making too many string concatenations and 2) avoid
+            # excessive memory usage by purging "data" often enough. p4
+            # sends 4k chunks, so we should not use more than 32 MiB of
+            # additional memory while rebuilding the file data.
             while j < len(filedata) and filedata[j]['code'] in ('text', 'unicod
e', 'binary'):
-                text += filedata[j]['data']
+                data.append(filedata[j]['data'])
                 del filedata[j]['data']
+                if len(data) > 8192:
+                    text += ''.join(data)
+                    data = []
                 j += 1
+            text += ''.join(data)
+            data = None

             if not stat.has_key('depotFile'):
                 sys.stderr.write("p4 print fails with: %s\n" % repr(stat))
-- 
Sam.

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH] git-p4: remove unnecessary semicolons at end of lines
@ 2009-03-06 15:53 Sam Hocevar
  2009-03-06 16:55 ` Brandon Casey
  2009-03-07 12:26 ` [PATCH v2] git-p4: remove unnecessary semicolons at end of lines Sam Hocevar
  0 siblings, 2 replies; 15+ messages in thread
From: Sam Hocevar @ 2009-03-06 15:53 UTC (permalink / raw
  To: git

   This is a purely cosmetic patch that makes the git-p4 code look more
pythonish by getting rid of end-of-line semicolons.


git-p4: remove unnecessary semicolons at end of lines.

Signed-off-by: Sam Hocevar <sam@zoy.org>
---
 contrib/fast-import/git-p4 |   46 ++++++++++++++++++++++----------------------
 1 files changed, 23 insertions(+), 23 deletions(-)

diff --git a/contrib/fast-import/git-p4 b/contrib/fast-import/git-p4
index 3832f60..7ea5ac6 100755
--- a/contrib/fast-import/git-p4
+++ b/contrib/fast-import/git-p4
@@ -12,7 +12,7 @@ import optparse, sys, os, marshal, popen2, subprocess, shelve
 import tempfile, getopt, sha, os.path, time, platform
 import re

-from sets import Set;
+from sets import Set

 verbose = False

@@ -240,7 +240,7 @@ def p4Cmd(cmd):
     result = {}
     for entry in list:
         result.update(entry)
-    return result;
+    return result

 def p4Where(depotPath):
     if not depotPath.endswith("/"):
@@ -281,7 +281,7 @@ def currentGitBranch():
 def isValidGitDir(path):
     if (os.path.exists(path + "/HEAD")
         and os.path.exists(path + "/refs") and os.path.exists(path + "/objects")):
-        return True;
+        return True
     return False

 def parseRevision(ref):
@@ -328,8 +328,8 @@ def extractSettingsGitLog(log):

 def gitBranchExists(branch):
     proc = subprocess.Popen(["git", "rev-parse", branch],
-                            stderr=subprocess.PIPE, stdout=subprocess.PIPE);
-    return proc.wait() == 0;
+                            stderr=subprocess.PIPE, stdout=subprocess.PIPE)
+    return proc.wait() == 0

 _gitConfig = {}
 def gitConfig(key):
@@ -492,7 +492,7 @@ class P4RollBack(Command):
         maxChange = int(args[0])

         if "p4ExitCode" in p4Cmd("changes -m 1"):
-            die("Problems executing p4");
+            die("Problems executing p4")

         if self.rollbackLocalBranches:
             refPrefix = "refs/heads/"
@@ -663,7 +663,7 @@ class P4Submit(Command):
             if response == "s":
                 print "Skipping! Good luck with the next patches..."
                 for f in editedFiles:
-                    p4_system("revert \"%s\"" % f);
+                    p4_system("revert \"%s\"" % f)
                 for f in filesToAdd:
                     system("rm %s" %f)
                 return
@@ -734,7 +734,7 @@ class P4Submit(Command):
             if os.environ.has_key("P4EDITOR"):
                 editor = os.environ.get("P4EDITOR")
             else:
-                editor = os.environ.get("EDITOR", defaultEditor);
+                editor = os.environ.get("EDITOR", defaultEditor)
             system(editor + " " + fileName)

             response = "y"
@@ -753,9 +753,9 @@ class P4Submit(Command):
                 p4_write_pipe("submit -i", submitTemplate)
             else:
                 for f in editedFiles:
-                    p4_system("revert \"%s\"" % f);
+                    p4_system("revert \"%s\"" % f)
                 for f in filesToAdd:
-                    p4_system("revert \"%s\"" % f);
+                    p4_system("revert \"%s\"" % f)
                     system("rm %s" %f)

             os.remove(fileName)
@@ -977,9 +977,9 @@ class P4Sync(Command):

             if "p4ExitCode" in filedata[0]:
                 die("Problems executing p4. Error: [%d]."
-                    % (filedata[0]['p4ExitCode']));
+                    % (filedata[0]['p4ExitCode']))

-        j = 0;
+        j = 0
         contents = {}
         while j < len(filedata):
             stat = filedata[j]
@@ -1303,8 +1303,8 @@ class P4Sync(Command):
     def importNewBranch(self, branch, maxChange):
         # make fast-import flush all changes to disk and update the refs using the checkpoint
         # command so that we can try to find the branch parent in the git history
-        self.gitStream.write("checkpoint\n\n");
-        self.gitStream.flush();
+        self.gitStream.write("checkpoint\n\n")
+        self.gitStream.flush()
         branchPrefix = self.depotPaths[0] + branch + "/"
         range = "@1,%s" % maxChange
         #print "prefix" + branchPrefix
@@ -1364,12 +1364,12 @@ class P4Sync(Command):
                                 fullBranch = self.projectName + branch
                                 if fullBranch not in self.p4BranchesInGit:
                                     if not self.silent:
-                                        print("\n    Importing new branch %s" % fullBranch);
+                                        print("\n    Importing new branch %s" % fullBranch)
                                     if self.importNewBranch(branch, change - 1):
                                         parent = ""
                                         self.p4BranchesInGit.append(fullBranch)
                                     if not self.silent:
-                                        print("\n    Resuming with change %s" % change);
+                                        print("\n    Resuming with change %s" % change)

                                 if self.verbose:
                                     print "parent determined through known branches: %s" % parent
@@ -1485,7 +1485,7 @@ class P4Sync(Command):
             self.branch = self.refPrefix + "master"
             if gitBranchExists("refs/heads/p4") and self.importIntoRemotes:
                 system("git update-ref %s refs/heads/p4" % self.branch)
-                system("git branch -D p4");
+                system("git branch -D p4")
             # create it /after/ importing, when master exists
             if not gitBranchExists(self.refPrefix + "HEAD") and self.importIntoRemotes and gitBranchExists(self.branch):
                 system("git symbolic-ref %sHEAD %s" % (self.refPrefix, self.branch))
@@ -1591,7 +1591,7 @@ class P4Sync(Command):
         self.loadUserMapFromCache()
         self.labels = {}
         if self.detectLabels:
-            self.getLabels();
+            self.getLabels()

         if self.detectBranches:
             ## FIXME - what's a P4 projectName ?
@@ -1615,7 +1615,7 @@ class P4Sync(Command):

         importProcess = subprocess.Popen(["git", "fast-import"],
                                          stdin=subprocess.PIPE, stdout=subprocess.PIPE,
-                                         stderr=subprocess.PIPE);
+                                         stderr=subprocess.PIPE)
         self.gitOutput = importProcess.stdout
         self.gitStream = importProcess.stdin
         self.gitError = importProcess.stderr
@@ -1688,9 +1688,9 @@ class P4Rebase(Command):

     def rebase(self):
         if os.system("git update-index --refresh") != 0:
-            die("Some files in your working directory are modified and different than what is in your index. You can use git update-index <filename> to bring the index up-to-date or stash away all your changes with git stash.");
+            die("Some files in your working directory are modified and different than what is in your index. You can use git update-index <filename> to bring the index up-to-date or stash away all your changes with git stash.")
         if len(read_pipe("git diff-index HEAD --")) > 0:
-            die("You have uncommited changes. Please commit them before rebasing or stash them away with git stash.");
+            die("You have uncommited changes. Please commit them before rebasing or stash them away with git stash.")

         [upstream, settings] = findUpstreamBranchPoint()
         if len(upstream) == 0:
@@ -1866,7 +1866,7 @@ def main():
                                        description = cmd.description,
                                        formatter = HelpFormatter())

-        (cmd, args) = parser.parse_args(sys.argv[2:], cmd);
+        (cmd, args) = parser.parse_args(sys.argv[2:], cmd)
     global verbose
     verbose = cmd.verbose
     if cmd.needsGit:
@@ -1877,7 +1877,7 @@ def main():
                 if os.path.exists(cmd.gitdir):
                     cdup = read_pipe("git rev-parse --show-cdup").strip()
                     if len(cdup) > 0:
-                        chdir(cdup);
+                        chdir(cdup)

         if not isValidGitDir(cmd.gitdir):
             if isValidGitDir(cmd.gitdir + "/.git"):

-- 
Sam.

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH] git-p4: remove unnecessary semicolons at end of lines
  2009-03-06 15:53 [PATCH] git-p4: remove unnecessary semicolons at end of lines Sam Hocevar
@ 2009-03-06 16:55 ` Brandon Casey
  2009-03-06 17:11   ` msysgit corrupting commit messages? Sam Hocevar
  2009-03-07 12:26 ` [PATCH v2] git-p4: remove unnecessary semicolons at end of lines Sam Hocevar
  1 sibling, 1 reply; 15+ messages in thread
From: Brandon Casey @ 2009-03-06 16:55 UTC (permalink / raw
  To: Sam Hocevar; +Cc: git

Sam Hocevar wrote:
>    This is a purely cosmetic patch that makes the git-p4 code look more
> pythonish by getting rid of end-of-line semicolons.


I get the impression that you do not intend for the comments above to be
part of the commit message.  If that is true, then they should be placed
further down after the '---' and before the diff-stat, or in some other
way partitioned from the commit message (maybe using --->8--- notation).
Otherwise they will _become_ part of the commit message if Junio applies
this patch as-is.

-brandon


> git-p4: remove unnecessary semicolons at end of lines.
> 
> Signed-off-by: Sam Hocevar <sam@zoy.org>
> ---


Comments not intended to be part of the commit message go here, where they
will be ignored by git-apply.


>  contrib/fast-import/git-p4 |   46 ++++++++++++++++++++++----------------------
>  1 files changed, 23 insertions(+), 23 deletions(-)
> 
> diff --git a/contrib/fast-import/git-p4 b/contrib/fast-import/git-p4
> index 3832f60..7ea5ac6 100755
> --- a/contrib/fast-import/git-p4
> +++ b/contrib/fast-import/git-p4
> @@ -12,7 +12,7 @@ import optparse, sys, os, marshal, popen2, subprocess, shelve
<snip>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* msysgit corrupting commit messages?
  2009-03-06 16:55 ` Brandon Casey
@ 2009-03-06 17:11   ` Sam Hocevar
  2009-03-07  2:48     ` Johannes Schindelin
  0 siblings, 1 reply; 15+ messages in thread
From: Sam Hocevar @ 2009-03-06 17:11 UTC (permalink / raw
  To: git

On Fri, Mar 06, 2009, Brandon Casey wrote:

> I get the impression that you do not intend for the comments above to be
> part of the commit message.  If that is true, then they should be placed
> further down after the '---' and before the diff-stat, or in some other
> way partitioned from the commit message (maybe using --->8--- notation).
> Otherwise they will _become_ part of the commit message if Junio applies
> this patch as-is.

   Okay. I'm definitely having a problem with git on MSYS. Is anyone
else seeing it butcher commits it exports? They appear fine in git log
or git log -p, but not in git format-patch. I've never had the problem
on a Linux system.

-- 
Sam.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: msysgit corrupting commit messages?
  2009-03-06 17:11   ` msysgit corrupting commit messages? Sam Hocevar
@ 2009-03-07  2:48     ` Johannes Schindelin
  0 siblings, 0 replies; 15+ messages in thread
From: Johannes Schindelin @ 2009-03-07  2:48 UTC (permalink / raw
  To: Sam Hocevar; +Cc: git

Hi,

On Fri, 6 Mar 2009, Sam Hocevar wrote:

> On Fri, Mar 06, 2009, Brandon Casey wrote:
> 
> > I get the impression that you do not intend for the comments above to be
> > part of the commit message.  If that is true, then they should be placed
> > further down after the '---' and before the diff-stat, or in some other
> > way partitioned from the commit message (maybe using --->8--- notation).
> > Otherwise they will _become_ part of the commit message if Junio applies
> > this patch as-is.
> 
>    Okay. I'm definitely having a problem with git on MSYS. Is anyone
> else seeing it butcher commits it exports? They appear fine in git log
> or git log -p, but not in git format-patch. I've never had the problem
> on a Linux system.

I have never had problems with msysGit's format-patch.  From Brandon's 
description however, it appears as if the offending changes were done 
manually...

Note: I missed the original message.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v5] git-p4: improve performance when importing huge files by reducing the number of string concatenations while constraining memory usage.
  2009-03-06 10:13             ` [PATCH v4] " Sam Hocevar
@ 2009-03-07 12:25               ` Sam Hocevar
  0 siblings, 0 replies; 15+ messages in thread
From: Sam Hocevar @ 2009-03-07 12:25 UTC (permalink / raw
  To: git


Signed-off-by: Sam Hocevar <sam@zoy.org>
---
 This is a properly formatted version of the previous patch.

 contrib/fast-import/git-p4 |   13 ++++++++++++-
 1 files changed, 12 insertions(+), 1 deletions(-)

diff --git a/contrib/fast-import/git-p4 b/contrib/fast-import/git-p4
index 3832f60..db0ea0a 100755
--- a/contrib/fast-import/git-p4
+++ b/contrib/fast-import/git-p4
@@ -984,11 +984,22 @@ class P4Sync(Command):
         while j < len(filedata):
             stat = filedata[j]
             j += 1
+            data = []
             text = ''
+            # Append data every 8192 chunks to 1) ensure decent performance
+            # by not making too many string concatenations and 2) avoid
+            # excessive memory usage by purging "data" often enough. p4
+            # sends 4k chunks, so we should not use more than 32 MiB of
+            # additional memory while rebuilding the file data.
             while j < len(filedata) and filedata[j]['code'] in ('text', 'unicode', 'binary'):
-                text += filedata[j]['data']
+                data.append(filedata[j]['data'])
                 del filedata[j]['data']
+                if len(data) > 8192:
+                    text += ''.join(data)
+                    data = []
                 j += 1
+            text += ''.join(data)
+            data = None
 
             if not stat.has_key('depotFile'):
                 sys.stderr.write("p4 print fails with: %s\n" % repr(stat))
-- 
1.6.1.3

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2] git-p4: remove unnecessary semicolons at end of lines.
  2009-03-06 15:53 [PATCH] git-p4: remove unnecessary semicolons at end of lines Sam Hocevar
  2009-03-06 16:55 ` Brandon Casey
@ 2009-03-07 12:26 ` Sam Hocevar
  1 sibling, 0 replies; 15+ messages in thread
From: Sam Hocevar @ 2009-03-07 12:26 UTC (permalink / raw
  To: git


Signed-off-by: Sam Hocevar <sam@zoy.org>
---
 This is a properly formatted version of the previous patch.

 contrib/fast-import/git-p4 |   46 ++++++++++++++++++++++----------------------
 1 files changed, 23 insertions(+), 23 deletions(-)

diff --git a/contrib/fast-import/git-p4 b/contrib/fast-import/git-p4
index 3832f60..7ea5ac6 100755
--- a/contrib/fast-import/git-p4
+++ b/contrib/fast-import/git-p4
@@ -12,7 +12,7 @@ import optparse, sys, os, marshal, popen2, subprocess, shelve
 import tempfile, getopt, sha, os.path, time, platform
 import re
 
-from sets import Set;
+from sets import Set
 
 verbose = False
 
@@ -240,7 +240,7 @@ def p4Cmd(cmd):
     result = {}
     for entry in list:
         result.update(entry)
-    return result;
+    return result
 
 def p4Where(depotPath):
     if not depotPath.endswith("/"):
@@ -281,7 +281,7 @@ def currentGitBranch():
 def isValidGitDir(path):
     if (os.path.exists(path + "/HEAD")
         and os.path.exists(path + "/refs") and os.path.exists(path + "/objects")):
-        return True;
+        return True
     return False
 
 def parseRevision(ref):
@@ -328,8 +328,8 @@ def extractSettingsGitLog(log):
 
 def gitBranchExists(branch):
     proc = subprocess.Popen(["git", "rev-parse", branch],
-                            stderr=subprocess.PIPE, stdout=subprocess.PIPE);
-    return proc.wait() == 0;
+                            stderr=subprocess.PIPE, stdout=subprocess.PIPE)
+    return proc.wait() == 0
 
 _gitConfig = {}
 def gitConfig(key):
@@ -492,7 +492,7 @@ class P4RollBack(Command):
         maxChange = int(args[0])
 
         if "p4ExitCode" in p4Cmd("changes -m 1"):
-            die("Problems executing p4");
+            die("Problems executing p4")
 
         if self.rollbackLocalBranches:
             refPrefix = "refs/heads/"
@@ -663,7 +663,7 @@ class P4Submit(Command):
             if response == "s":
                 print "Skipping! Good luck with the next patches..."
                 for f in editedFiles:
-                    p4_system("revert \"%s\"" % f);
+                    p4_system("revert \"%s\"" % f)
                 for f in filesToAdd:
                     system("rm %s" %f)
                 return
@@ -734,7 +734,7 @@ class P4Submit(Command):
             if os.environ.has_key("P4EDITOR"):
                 editor = os.environ.get("P4EDITOR")
             else:
-                editor = os.environ.get("EDITOR", defaultEditor);
+                editor = os.environ.get("EDITOR", defaultEditor)
             system(editor + " " + fileName)
 
             response = "y"
@@ -753,9 +753,9 @@ class P4Submit(Command):
                 p4_write_pipe("submit -i", submitTemplate)
             else:
                 for f in editedFiles:
-                    p4_system("revert \"%s\"" % f);
+                    p4_system("revert \"%s\"" % f)
                 for f in filesToAdd:
-                    p4_system("revert \"%s\"" % f);
+                    p4_system("revert \"%s\"" % f)
                     system("rm %s" %f)
 
             os.remove(fileName)
@@ -977,9 +977,9 @@ class P4Sync(Command):
 
             if "p4ExitCode" in filedata[0]:
                 die("Problems executing p4. Error: [%d]."
-                    % (filedata[0]['p4ExitCode']));
+                    % (filedata[0]['p4ExitCode']))
 
-        j = 0;
+        j = 0
         contents = {}
         while j < len(filedata):
             stat = filedata[j]
@@ -1303,8 +1303,8 @@ class P4Sync(Command):
     def importNewBranch(self, branch, maxChange):
         # make fast-import flush all changes to disk and update the refs using the checkpoint
         # command so that we can try to find the branch parent in the git history
-        self.gitStream.write("checkpoint\n\n");
-        self.gitStream.flush();
+        self.gitStream.write("checkpoint\n\n")
+        self.gitStream.flush()
         branchPrefix = self.depotPaths[0] + branch + "/"
         range = "@1,%s" % maxChange
         #print "prefix" + branchPrefix
@@ -1364,12 +1364,12 @@ class P4Sync(Command):
                                 fullBranch = self.projectName + branch
                                 if fullBranch not in self.p4BranchesInGit:
                                     if not self.silent:
-                                        print("\n    Importing new branch %s" % fullBranch);
+                                        print("\n    Importing new branch %s" % fullBranch)
                                     if self.importNewBranch(branch, change - 1):
                                         parent = ""
                                         self.p4BranchesInGit.append(fullBranch)
                                     if not self.silent:
-                                        print("\n    Resuming with change %s" % change);
+                                        print("\n    Resuming with change %s" % change)
 
                                 if self.verbose:
                                     print "parent determined through known branches: %s" % parent
@@ -1485,7 +1485,7 @@ class P4Sync(Command):
             self.branch = self.refPrefix + "master"
             if gitBranchExists("refs/heads/p4") and self.importIntoRemotes:
                 system("git update-ref %s refs/heads/p4" % self.branch)
-                system("git branch -D p4");
+                system("git branch -D p4")
             # create it /after/ importing, when master exists
             if not gitBranchExists(self.refPrefix + "HEAD") and self.importIntoRemotes and gitBranchExists(self.branch):
                 system("git symbolic-ref %sHEAD %s" % (self.refPrefix, self.branch))
@@ -1591,7 +1591,7 @@ class P4Sync(Command):
         self.loadUserMapFromCache()
         self.labels = {}
         if self.detectLabels:
-            self.getLabels();
+            self.getLabels()
 
         if self.detectBranches:
             ## FIXME - what's a P4 projectName ?
@@ -1615,7 +1615,7 @@ class P4Sync(Command):
 
         importProcess = subprocess.Popen(["git", "fast-import"],
                                          stdin=subprocess.PIPE, stdout=subprocess.PIPE,
-                                         stderr=subprocess.PIPE);
+                                         stderr=subprocess.PIPE)
         self.gitOutput = importProcess.stdout
         self.gitStream = importProcess.stdin
         self.gitError = importProcess.stderr
@@ -1688,9 +1688,9 @@ class P4Rebase(Command):
 
     def rebase(self):
         if os.system("git update-index --refresh") != 0:
-            die("Some files in your working directory are modified and different than what is in your index. You can use git update-index <filename> to bring the index up-to-date or stash away all your changes with git stash.");
+            die("Some files in your working directory are modified and different than what is in your index. You can use git update-index <filename> to bring the index up-to-date or stash away all your changes with git stash.")
         if len(read_pipe("git diff-index HEAD --")) > 0:
-            die("You have uncommited changes. Please commit them before rebasing or stash them away with git stash.");
+            die("You have uncommited changes. Please commit them before rebasing or stash them away with git stash.")
 
         [upstream, settings] = findUpstreamBranchPoint()
         if len(upstream) == 0:
@@ -1866,7 +1866,7 @@ def main():
                                        description = cmd.description,
                                        formatter = HelpFormatter())
 
-        (cmd, args) = parser.parse_args(sys.argv[2:], cmd);
+        (cmd, args) = parser.parse_args(sys.argv[2:], cmd)
     global verbose
     verbose = cmd.verbose
     if cmd.needsGit:
@@ -1877,7 +1877,7 @@ def main():
                 if os.path.exists(cmd.gitdir):
                     cdup = read_pipe("git rev-parse --show-cdup").strip()
                     if len(cdup) > 0:
-                        chdir(cdup);
+                        chdir(cdup)
 
         if not isValidGitDir(cmd.gitdir):
             if isValidGitDir(cmd.gitdir + "/.git"):
-- 
1.6.1.3

^ permalink raw reply related	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2009-03-07 12:33 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-03-06 15:53 [PATCH] git-p4: remove unnecessary semicolons at end of lines Sam Hocevar
2009-03-06 16:55 ` Brandon Casey
2009-03-06 17:11   ` msysgit corrupting commit messages? Sam Hocevar
2009-03-07  2:48     ` Johannes Schindelin
2009-03-07 12:26 ` [PATCH v2] git-p4: remove unnecessary semicolons at end of lines Sam Hocevar
  -- strict thread matches above, loose matches on Subject: below --
2009-03-04 21:54 [PATCH] git-p4: improve performance with large files Sam Hocevar
2009-03-04 23:05 ` thestar
2009-03-05 17:23   ` Sam Hocevar
2009-03-06  0:01     ` thestar
2009-03-06  1:14     ` Junio C Hamano
2009-03-06  1:25       ` Han-Wen Nienhuys
2009-03-06  8:53         ` Sam Hocevar
2009-03-06  9:42           ` Junio C Hamano
2009-03-06 10:13             ` [PATCH v4] " Sam Hocevar
2009-03-07 12:25               ` [PATCH v5] git-p4: improve performance when importing huge files by reducing the number of string concatenations while constraining memory usage Sam Hocevar

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).