* [RFC PATCH] git-p4: add option to store files in Git LFS on import
From: larsxschneider @ 2015-08-28 12:24 UTC
To: git, luke, technoweenie; +Cc: Lars Schneider

From: Lars Schneider <larsxschneider@gmail.com>

I am migrating huge Perforce repositories including history to Git. Some of them contain large files that would blow up the resulting Git repositories. This patch adds an option to store these files in Git LFS [1] on git-p4 clone.

In order to run the unit tests you need to install the Git LFS extension [2].

Known limitations:
The option "use-lfs-if-size-exceeds" looks at the uncompressed file size. Sometimes huge XML files are tiny if compressed. I wonder if there is an easy way to learn about the size of a file in a git pack file. I assume compressing it is the only way to know.

Feedback is highly appreciated.

Thank you,
Lars

[1] https://git-lfs.github.com/
[2] https://github.com/github/git-lfs/releases/

Lars Schneider (1):
  git-p4: add option to store files in Git LFS on import

 Documentation/git-p4.txt |  12 ++
 git-p4.py                |  94 ++++++++++++++--
 t/t9822-git-p4-lfs.sh    | 277 +++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 374 insertions(+), 9 deletions(-)
 create mode 100755 t/t9822-git-p4-lfs.sh

--
1.9.5 (Apple Git-50.3)

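On the known limitation above: one way to estimate how small a file becomes is to run it through zlib, which is the compression Git applies to objects. The following is only a sketch under stated assumptions. The function names are hypothetical and not part of git-p4, the code is written for Python 3 byte strings, and the result ignores the Git object header and any delta compression in pack files, so it approximates the packed size rather than reporting it.

```python
import os
import zlib


def compressed_size(chunks, level=6):
    """Return the zlib-compressed size in bytes of an iterable of byte chunks.

    Hypothetical helper: streams the chunks through one compressor so the
    whole file never has to be joined in memory.
    """
    compressor = zlib.compressobj(level)
    size = 0
    for chunk in chunks:
        size += len(compressor.compress(chunk))
    size += len(compressor.flush())
    return size


def exceeds_threshold_after_compression(chunks, threshold):
    """True if the content is still larger than `threshold` bytes once compressed."""
    return compressed_size(chunks) > threshold


if __name__ == "__main__":
    xml_like = [b"<row>value</row>\n" * 10000]   # highly repetitive content
    random_like = [os.urandom(4096)]             # practically incompressible
    print(compressed_size(xml_like), compressed_size(random_like))
```

The trade-off is that every candidate file is compressed once just to make the decision, which slows the clone down.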
* [RFC PATCH] git-p4: add option to store files in Git LFS on import
From: larsxschneider @ 2015-08-28 12:24 UTC
To: git, luke, technoweenie; +Cc: Lars Schneider

From: Lars Schneider <larsxschneider@gmail.com>

Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
---
 Documentation/git-p4.txt |  12 ++
 git-p4.py                |  94 ++++++++++++++--
 t/t9822-git-p4-lfs.sh    | 277 +++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 374 insertions(+), 9 deletions(-)
 create mode 100755 t/t9822-git-p4-lfs.sh

diff --git a/Documentation/git-p4.txt b/Documentation/git-p4.txt
index 82aa5d6..a188840 100644
--- a/Documentation/git-p4.txt
+++ b/Documentation/git-p4.txt
@@ -252,6 +252,18 @@ Git repository:
 	Use a client spec to find the list of interesting files in p4.
 	See the "CLIENT SPEC" section below.
 
+--use-lfs-if-size-exceeds <n>::
+	Store files that have an uncompressed size exceeding 'n' bytes in
+	Git LFS. Download and install the Git LFS command line extension to
+	use that option.
+	More info here: https://git-lfs.github.com/
+
+--use-lfs-for-extension <extension>::
+	Store files with 'extension' in Git LFS. Do not prefix the extensions
+	with a '.'. You can use this option multiple times. Download and
+	install the Git LFS command line extension to use that option.
+	More info here: https://git-lfs.github.com/
+
 -/ <path>::
 	Exclude selected depot paths when cloning or syncing.
 
diff --git a/git-p4.py b/git-p4.py
index 073f87b..e031021 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -22,6 +22,7 @@ import platform
 import re
 import shutil
 import stat
+import errno
 
 try:
     from subprocess import CalledProcessError
@@ -104,6 +105,16 @@ def chdir(path, is_client_path=False):
         path = os.getcwd()
     os.environ['PWD'] = path
 
+def mkdir_p(path):
+    # Copied from http://stackoverflow.com/questions/600268/mkdir-p-functionality-in-python
+    try:
+        os.makedirs(path)
+    except OSError as exc: # Python >2.5
+        if exc.errno == errno.EEXIST and os.path.isdir(path):
+            pass
+        else:
+            raise
+
 def die(msg):
     if verbose:
         raise Exception(msg)
@@ -1994,6 +2005,11 @@ class P4Sync(Command, P4UserMap):
                 optparse.make_option("-/", dest="cloneExclude",
                                      action="append", type="string",
                                      help="exclude depot path"),
+                optparse.make_option("--use-lfs-if-size-exceeds", dest="lfsMinimumFileSize", type="int",
+                                     help="Use LFS to store files bigger than the given threshold in bytes."),
+                optparse.make_option("--use-lfs-for-extension", dest="lfsFileExtensions",
+                                     action="append", type="string",
+                                     help="Use LFS to store files with the given file extension(s)."),
         ]
         self.description = """Imports from Perforce into a git repository.\n
     example:
@@ -2025,6 +2041,9 @@ class P4Sync(Command, P4UserMap):
         self.clientSpecDirs = None
         self.tempBranches = []
         self.tempBranchLocation = "git-p4-tmp"
+        self.lfsFiles = []
+        self.lfsMinimumFileSize = None
+        self.lfsFileExtensions = []
 
         if gitConfig("git-p4.syncFromOrigin") == "false":
             self.syncWithOrigin = False
@@ -2145,6 +2164,63 @@ class P4Sync(Command, P4UserMap):
 
         return branches
 
+    def writeToGitStream(self, gitMode, relPath, contents):
+        self.gitStream.write('M %s inline %s\n' % (gitMode, relPath))
+        self.gitStream.write('data %d\n' % sum(len(d) for d in contents))
+        for d in contents:
+            self.gitStream.write(d)
+        self.gitStream.write('\n')
+
+    def writeGitAttributesToStream(self):
+        gitAttributes = [f + ' filter=lfs -text\n' for f in self.lfsFiles if not self.hasFileLFSExtension(f)]
+        self.writeToGitStream(
+            '100644',
+            '.gitattributes',
+            ['*.' + f + ' filter=lfs -text\n' for f in self.lfsFileExtensions] +
+            [f + ' filter=lfs -text\n' for f in self.lfsFiles if not self.hasFileLFSExtension(f)]
+        )
+
+    def hasFileLFSExtension(self, relPath):
+        return reduce(
+            lambda a, b: a or b,
+            [relPath.endswith('.' + e) for e in self.lfsFileExtensions],
+            False
+        )
+
+    def isFileLargerThanLFSTreshold(self, relPath, contents):
+        return self.lfsMinimumFileSize and sum(len(d) for d in contents) >= self.lfsMinimumFileSize
+
+    def generateLFSPointerFile(self, relPath, contents):
+        # Write P4 content to temp file
+        p4ContentTempFile = tempfile.NamedTemporaryFile(prefix='git-lfs', delete=False)
+        for d in contents:
+            p4ContentTempFile.write(d)
+        p4ContentTempFile.flush()
+
+        # Generate LFS pointer file based on P4 content
+        lfsProcess = subprocess.Popen(
+            ['git', 'lfs', 'pointer', '--file=' + p4ContentTempFile.name],
+            stdout=subprocess.PIPE
+        )
+        lfsPointerFile = lfsProcess.stdout.read()
+        if lfsProcess.wait():
+            die('git-lfs command failed. Did you install the extension?')
+        contents = [i+'\n' for i in lfsPointerFile.split('\n')[2:][:-1]]
+
+        # Write P4 content to LFS
+        oid = contents[1].split(' ')[1].split(':')[1][:-1]
+        oidPath = os.path.join(self.cloneDestination, '.git', 'lfs', 'objects', oid[:2], oid[2:4])
+        mkdir_p(oidPath)
+        shutil.move(p4ContentTempFile.name, os.path.join(oidPath, oid))
+
+        # Update Git attributes
+        self.lfsFiles.append(relPath)
+        self.writeGitAttributesToStream()
+
+        # LFS Spec states that pointer files should not have the executable bit set.
+        gitMode = '100644'
+        return (gitMode, contents)
+
     # output one file from the P4 stream
     # - helper for streamP4Files
 
@@ -2213,17 +2289,13 @@ class P4Sync(Command, P4UserMap):
             text = regexp.sub(r'$\1$', text)
             contents = [ text ]
 
-        self.gitStream.write("M %s inline %s\n" % (git_mode, relPath))
+        if relPath == '.gitattributes':
+            die('.gitattributes already exists in P4.')
 
-        # total length...
-        length = 0
-        for d in contents:
-            length = length + len(d)
+        if self.isFileLargerThanLFSTreshold(relPath, contents) or self.hasFileLFSExtension(relPath):
+            (git_mode, contents) = self.generateLFSPointerFile(relPath, contents)
 
-        self.gitStream.write("data %d\n" % length)
-        for d in contents:
-            self.gitStream.write(d)
-        self.gitStream.write("\n")
+        self.writeToGitStream(git_mode, relPath, contents)
 
     def streamOneP4Deletion(self, file):
         relPath = self.stripRepoPath(file['path'], self.branchPrefixes)
@@ -2231,6 +2303,10 @@ class P4Sync(Command, P4UserMap):
             sys.stderr.write("delete %s\n" % relPath)
         self.gitStream.write("D %s\n" % relPath)
 
+        if relPath in self.lfsFiles:
+            self.lfsFiles.remove(relPath)
+            self.writeGitAttributesToStream()
+
     # handle another chunk of streaming data
     def streamP4FilesCb(self, marshalled):
 
diff --git a/t/t9822-git-p4-lfs.sh b/t/t9822-git-p4-lfs.sh
new file mode 100755
index 0000000..b27bf29
--- /dev/null
+++ b/t/t9822-git-p4-lfs.sh
@@ -0,0 +1,277 @@
+#!/bin/sh
+
+test_description='Clone repositories and store files in LFS'
+
+( git lfs help ) >/dev/null 2>&1 || {
+	skip_all='skipping git p4 LFS tests; no git lfs'
+	test_done
+}
+
+. ./lib-git-p4.sh
+
+test_expect_success 'start p4d' '
+	start_p4d
+'
+
+test_expect_success 'Create repo with binary files' '
+	client_view "//depot/... //client/..." &&
+	(
+		cd "$cli" &&
+		echo "text" >file.txt &&
+		echo "bin 13 bytes" >file.dat &&
+		p4 add file.txt &&
+		p4 add file.dat &&
+		p4 submit -d "Add text and binary file" &&
+		echo "bin 13 bytes" >file2.bin &&
+		p4 add file2.bin &&
+		p4 submit -d "Add another binary file with same content"
+		echo "bin 14 bytess" >file3.bin &&
+		p4 add file3.bin &&
+		p4 submit -d "Add another binary file with different content"
+	)
+'
+
+test_expect_success 'Store files in LFS based on size (10 bytes)' '
+	client_view "//depot/... //client/..." &&
+	git p4 clone --use-client-spec --use-lfs-if-size-exceeds=10 --destination="$git" //depot@all &&
+	test_when_finished cleanup_git &&
+	(
+		cd "$git" &&
+
+		cat >expect <<-\EOF &&
+		.git/lfs/objects/d4/43/d443795c1aa3ff7e62afd89d6a86bb84ceba0305f6c22151aa8ee95077a39101
+		.git/lfs/objects/e5/fe/e5fec48503cd7b85eb9ffaea3311cde2fe9542078b9640369032b26bb5403fff
+		EOF
+		find ".git/lfs/objects" -type f >actual &&
+		test_cmp expect actual &&
+
+		cat >expect <<-\EOF &&
+		version https://git-lfs.github.com/spec/v1
+		oid sha256:d443795c1aa3ff7e62afd89d6a86bb84ceba0305f6c22151aa8ee95077a39101
+		size 13
+		EOF
+		cat file.dat >actual &&
+		test_cmp expect actual &&
+
+		cat >expect <<-\EOF &&
+		version https://git-lfs.github.com/spec/v1
+		oid sha256:d443795c1aa3ff7e62afd89d6a86bb84ceba0305f6c22151aa8ee95077a39101
+		size 13
+		EOF
+		cat file2.bin >actual &&
+		test_cmp expect actual &&
+
+		cat >expect <<-\EOF &&
+		version https://git-lfs.github.com/spec/v1
+		oid sha256:e5fec48503cd7b85eb9ffaea3311cde2fe9542078b9640369032b26bb5403fff
+		size 14
+		EOF
+		cat file3.bin >actual &&
+		test_cmp expect actual &&
+
+		cat >expect <<-\EOF &&
+		file.dat filter=lfs -text
+		file2.bin filter=lfs -text
+		file3.bin filter=lfs -text
+		EOF
+		cat .gitattributes >actual &&
+		test_cmp expect actual
+	)
+'
+
+test_expect_success 'Store files in LFS based on size (14 bytes)' '
+	client_view "//depot/... //client/..." &&
+	git p4 clone --use-client-spec --use-lfs-if-size-exceeds=14 --destination="$git" //depot@all &&
+	test_when_finished cleanup_git &&
+	(
+		cd "$git" &&
+
+		cat >expect <<-\EOF &&
+		.git/lfs/objects/e5/fe/e5fec48503cd7b85eb9ffaea3311cde2fe9542078b9640369032b26bb5403fff
+		EOF
+		find ".git/lfs/objects" -type f >actual &&
+		test_cmp expect actual &&
+
+		cat >expect <<-\EOF &&
+		bin 13 bytes
+		EOF
+		cat file.dat >actual &&
+		test_cmp expect actual &&
+
+		cat >expect <<-\EOF &&
+		bin 13 bytes
+		EOF
+		cat file2.bin >actual &&
+		test_cmp expect actual &&
+
+		cat >expect <<-\EOF &&
+		version https://git-lfs.github.com/spec/v1
+		oid sha256:e5fec48503cd7b85eb9ffaea3311cde2fe9542078b9640369032b26bb5403fff
+		size 14
+		EOF
+		cat file3.bin >actual &&
+		test_cmp expect actual &&
+
+		cat >expect <<-\EOF &&
+		file3.bin filter=lfs -text
+		EOF
+		cat .gitattributes >actual &&
+		test_cmp expect actual
+	)
+'
+
+test_expect_success 'Store files in LFS based on extension (dat)' '
+	client_view "//depot/... //client/..." &&
+	git p4 clone --use-client-spec --use-lfs-for-extension=dat --destination="$git" //depot@all &&
+	test_when_finished cleanup_git &&
+	(
+		cd "$git" &&
+
+		cat >expect <<-\EOF &&
+		.git/lfs/objects/d4/43/d443795c1aa3ff7e62afd89d6a86bb84ceba0305f6c22151aa8ee95077a39101
+		EOF
+		find ".git/lfs/objects" -type f >actual &&
+		test_cmp expect actual &&
+
+		cat >expect <<-\EOF &&
+		version https://git-lfs.github.com/spec/v1
+		oid sha256:d443795c1aa3ff7e62afd89d6a86bb84ceba0305f6c22151aa8ee95077a39101
+		size 13
+		EOF
+		cat file.dat >actual &&
+		test_cmp expect actual &&
+
+		cat >expect <<-\EOF &&
+		bin 13 bytes
+		EOF
+		cat file2.bin >actual &&
+		test_cmp expect actual &&
+
+		cat >expect <<-\EOF &&
+		bin 14 bytess
+		EOF
+		cat file3.bin >actual &&
+		test_cmp expect actual &&
+
+		cat >expect <<-\EOF &&
+		*.dat filter=lfs -text
+		EOF
+		cat .gitattributes >actual &&
+		test_cmp expect actual
+	)
+'
+
+test_expect_success 'Store files in LFS based on size (14 bytes) and extension (dat)' '
+	client_view "//depot/... //client/..." &&
+	git p4 clone \
+		--use-client-spec \
+		--use-lfs-if-size-exceeds=14 \
+		--use-lfs-for-extension=dat \
+		--destination="$git" //depot@all &&
+	test_when_finished cleanup_git &&
+	(
+		cd "$git" &&
+
+		cat >expect <<-\EOF &&
+		.git/lfs/objects/d4/43/d443795c1aa3ff7e62afd89d6a86bb84ceba0305f6c22151aa8ee95077a39101
+		.git/lfs/objects/e5/fe/e5fec48503cd7b85eb9ffaea3311cde2fe9542078b9640369032b26bb5403fff
+		EOF
+		find ".git/lfs/objects" -type f >actual &&
+		test_cmp expect actual &&
+
+		cat >expect <<-\EOF &&
+		version https://git-lfs.github.com/spec/v1
+		oid sha256:d443795c1aa3ff7e62afd89d6a86bb84ceba0305f6c22151aa8ee95077a39101
+		size 13
+		EOF
+		cat file.dat >actual &&
+		test_cmp expect actual &&
+
+		cat >expect <<-\EOF &&
+		bin 13 bytes
+		EOF
+		cat file2.bin >actual &&
+		test_cmp expect actual &&
+
+		cat >expect <<-\EOF &&
+		version https://git-lfs.github.com/spec/v1
+		oid sha256:e5fec48503cd7b85eb9ffaea3311cde2fe9542078b9640369032b26bb5403fff
+		size 14
+		EOF
+		cat file3.bin >actual &&
+		test_cmp expect actual &&
+
+		cat >expect <<-\EOF &&
+		*.dat filter=lfs -text
+		file3.bin filter=lfs -text
+		EOF
+		cat .gitattributes >actual &&
+		test_cmp expect actual
+	)
+'
+
+test_expect_success 'Remove file from repo and store files in LFS based on size (10 bytes)' '
+	client_view "//depot/... //client/..." &&
+	(
+		cd "$cli" &&
+		p4 delete file3.bin &&
+		p4 submit -d "Remove file"
+	) &&
+
+	git p4 clone --use-client-spec --use-lfs-if-size-exceeds=10 --destination="$git" //depot@all &&
+	test_when_finished cleanup_git &&
+	(
+		cd "$git" &&
+
+		# Note that file3 remains here as it referenced in the history
+		cat >expect <<-\EOF &&
+		.git/lfs/objects/d4/43/d443795c1aa3ff7e62afd89d6a86bb84ceba0305f6c22151aa8ee95077a39101
+		.git/lfs/objects/e5/fe/e5fec48503cd7b85eb9ffaea3311cde2fe9542078b9640369032b26bb5403fff
+		EOF
+		find ".git/lfs/objects" -type f >actual &&
+		test_cmp expect actual &&
+
+		cat >expect <<-\EOF &&
+		version https://git-lfs.github.com/spec/v1
+		oid sha256:d443795c1aa3ff7e62afd89d6a86bb84ceba0305f6c22151aa8ee95077a39101
+		size 13
+		EOF
+		cat file.dat >actual &&
+		test_cmp expect actual &&
+
+		cat >expect <<-\EOF &&
+		version https://git-lfs.github.com/spec/v1
+		oid sha256:d443795c1aa3ff7e62afd89d6a86bb84ceba0305f6c22151aa8ee95077a39101
+		size 13
+		EOF
+		cat file2.bin >actual &&
+		test_cmp expect actual &&
+
+		cat >expect <<-\EOF &&
+		file.dat filter=lfs -text
+		file2.bin filter=lfs -text
+		EOF
+		cat .gitattributes >actual &&
+		test_cmp expect actual
+	)
+'
+
+test_expect_success 'Clone repo with existing .gitattributes file' '
+	client_view "//depot/... //client/..." &&
+	(
+		cd "$cli" &&
+
+		echo "*.txt text" >.gitattributes &&
+		p4 add .gitattributes &&
+		p4 submit -d "Add .gitattributes"
+	) &&
+
+	test_must_fail git p4 clone --use-client-spec --destination="$git" //depot 2>errs &&
+	grep ".gitattributes already exists in P4." errs
+'
+
+test_expect_success 'kill p4d' '
+	kill_p4d
+'
+
+test_done
--
1.9.5 (Apple Git-50.3)

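The patch obtains the pointer by running `git lfs pointer` and slicing its output. As the expected files in the tests show, a pointer consists of three lines (`version`, `oid sha256:...`, `size`), and the object is stored under `.git/lfs/objects/<oid[:2]>/<oid[2:4]>/<oid>`. Assuming that layout, the pointer could also be built directly with `hashlib`. This is a hedged sketch, not what the patch does: `make_lfs_pointer` and `lfs_object_path` are hypothetical names, and the code targets Python 3 byte strings.

```python
import hashlib
import os

# Version line as it appears in the expected pointer files of the tests.
LFS_SPEC_VERSION = "https://git-lfs.github.com/spec/v1"


def make_lfs_pointer(chunks):
    """Return (oid, pointer_text) for an iterable of byte chunks.

    Hypothetical helper: hashes the content incrementally with SHA-256 and
    formats the three pointer lines seen in the tests.
    """
    sha = hashlib.sha256()
    size = 0
    for chunk in chunks:
        sha.update(chunk)
        size += len(chunk)
    oid = sha.hexdigest()
    pointer = "version %s\noid sha256:%s\nsize %d\n" % (LFS_SPEC_VERSION, oid, size)
    return oid, pointer


def lfs_object_path(git_dir, oid):
    """Path of the stored object below `git_dir`, mirroring the layout in the tests."""
    return os.path.join(git_dir, "lfs", "objects", oid[:2], oid[2:4], oid)


if __name__ == "__main__":
    oid, pointer = make_lfs_pointer([b"bin 13 bytes\n"])
    print(pointer, end="")
    print(lfs_object_path(".git", oid))
```

Building the pointer in-process would avoid parsing the tool's output by position, at the cost of duplicating knowledge of the pointer format inside git-p4.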
* Re: [RFC PATCH] git-p4: add option to store files in Git LFS on import
From: Luke Diamand @ 2015-08-30 9:08 UTC
To: Lars Schneider; +Cc: Git Users, technoweenie

Comments below.

> diff --git a/Documentation/git-p4.txt b/Documentation/git-p4.txt
> index 82aa5d6..a188840 100644
> --- a/Documentation/git-p4.txt
> +++ b/Documentation/git-p4.txt
> @@ -252,6 +252,18 @@ Git repository:
>  	Use a client spec to find the list of interesting files in p4.
>  	See the "CLIENT SPEC" section below.
>

<snip>

> +def mkdir_p(path):
> +    # Copied from http://stackoverflow.com/questions/600268/mkdir-p-functionality-in-python
> +    try:
> +        os.makedirs(path)
> +    except OSError as exc: # Python >2.5
> +        if exc.errno == errno.EEXIST and os.path.isdir(path):
> +            pass
> +        else:
> +            raise

Sigh. We need to upgrade to Python 3....

Could also just do:

    if not os.path.isdir(path):
        os.makedirs(path)

(Although there is a tiny race hazard if something else creates the same path between isdir and makedirs, but the way you're using it, that seems unlikely.)

<snip>

> +    def isFileLargerThanLFSTreshold(self, relPath, contents):
> +        return self.lfsMinimumFileSize and sum(len(d) for d in contents) >= self.lfsMinimumFileSize

Could have a command-line option "--try-compress-first" (or some such) which compresses the file, and if it's very compressible, leaves it alone. It would trade cloning speed against not using LFS needlessly.

> +
> +    def generateLFSPointerFile(self, relPath, contents):
> +        # Write P4 content to temp file
> +        p4ContentTempFile = tempfile.NamedTemporaryFile(prefix='git-lfs', delete=False)
> +        for d in contents:
> +            p4ContentTempFile.write(d)
> +        p4ContentTempFile.flush()
> +
> +        # Generate LFS pointer file based on P4 content
> +        lfsProcess = subprocess.Popen(
> +            ['git', 'lfs', 'pointer', '--file=' + p4ContentTempFile.name],
> +            stdout=subprocess.PIPE
> +        )
> +        lfsPointerFile = lfsProcess.stdout.read()
> +        if lfsProcess.wait():
> +            die('git-lfs command failed. Did you install the extension?')

We're going to leave the P4 content file lying around undeleted here; is there any way to avoid that and clean up nicely?

<snip>

> +        if self.isFileLargerThanLFSTreshold(relPath, contents) or self.hasFileLFSExtension(relPath):

s/Treshold/Threshold/g

<snip>

> +test_description='Clone repositories and store files in LFS'
> +
> +( git lfs help ) >/dev/null 2>&1 || {

Does this need to be in a subshell?

> +	skip_all='skipping git p4 LFS tests; no git lfs'
> +	test_done
> +}

<snip>

> +test_expect_success 'Store files in LFS based on size (10 bytes)' '
> +	client_view "//depot/... //client/..." &&
> +	git p4 clone --use-client-spec --use-lfs-if-size-exceeds=10 --destination="$git" //depot@all &&
> +	test_when_finished cleanup_git &&
> +	(
> +		cd "$git" &&
> +
> +		cat >expect <<-\EOF &&
> +		.git/lfs/objects/d4/43/d443795c1aa3ff7e62afd89d6a86bb84ceba0305f6c22151aa8ee95077a39101
> +		.git/lfs/objects/e5/fe/e5fec48503cd7b85eb9ffaea3311cde2fe9542078b9640369032b26bb5403fff

This feels like it could be very fragile. Every time a new file gets added to the tests we'll end up having to mess around with SHA-256 digests. Surely all we care about is that LFS can recreate the files we gave it originally?

Plus it makes it quite hard to understand what's going on!

<snip>

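On the question of the temporary P4 content file being left behind when `git lfs pointer` fails: one possible shape for the cleanup is a small context manager that removes the file unless the caller has already moved it into the LFS object store. This is a sketch under stated assumptions rather than a proposed patch. `temp_content_file` is a hypothetical name, and the code assumes Python 3 byte strings.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def temp_content_file(chunks, prefix="git-lfs"):
    """Write byte chunks to a named temp file and yield its path.

    Hypothetical helper: on exit, normal or via an exception, the file is
    deleted if it still exists. A caller that moved the file away (for
    example into the LFS object store) is left alone.
    """
    tmp = tempfile.NamedTemporaryFile(prefix=prefix, delete=False)
    try:
        for chunk in chunks:
            tmp.write(chunk)
        tmp.close()  # make sure everything is on disk before others read it
        yield tmp.name
    finally:
        tmp.close()  # closing twice is harmless
        if os.path.exists(tmp.name):
            os.remove(tmp.name)


if __name__ == "__main__":
    with temp_content_file([b"bin 13 ", b"bytes\n"]) as path:
        print(os.path.getsize(path))
    print(os.path.exists(path))
```

Because the cleanup sits in a `finally` clause, it also runs when the body exits through `sys.exit()`, which raises `SystemExit`.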
* Re: [RFC PATCH] git-p4: add option to store files in Git LFS on import
From: Luke Diamand @ 2015-08-30 8:49 UTC
To: Lars Schneider; +Cc: Git Users, technoweenie

On 28 August 2015 at 13:24, <larsxschneider@gmail.com> wrote:
> From: Lars Schneider <larsxschneider@gmail.com>
>
> I am migrating huge Perforce repositories including history to Git. Some of them contain large files that would blow up the resulting Git repositories. This patch adds an option to store these files in Git LFS [1] on git-p4 clone.

I'm a bit worried by this. LFS isn't the only way to handle large files in git - there's also git annex (which I've used in a similar situation) and obviously random homebrew solutions. We're going to end up with git-p4 sprouting ever increasing numbers of --use-XXX-if-size-exceeds options. On the other hand, having it integrated into git-p4 is quite nice as it saves a lot of messing around.

Would it be possible as a start to have (within git-p4) a generic spot-big-files-and-handle-them-differently patch, and a second patch to add specific LFS support? That then means that other schemes would be a lot easier to add in future.

Some other comments inline.

>
> In order to run the unit tests you need to install the Git LFS extension [2].
>
> Known limitations:
> The option "use-lfs-if-size-exceeds" looks at the uncompressed file size. Sometimes huge XML files are tiny if compressed. I wonder if there is an easy way to learn about the size of a file in a git pack file. I assume compressing it is the only way to know.
>
> Feedback is highly appreciated.
>
> Thank you,
> Lars
>
>
> [1] https://git-lfs.github.com/
> [2] https://github.com/github/git-lfs/releases/
>
> Lars Schneider (1):
>   git-p4: add option to store files in Git LFS on import
>
>  Documentation/git-p4.txt |  12 ++
>  git-p4.py                |  94 ++++++++++++++--
>  t/t9822-git-p4-lfs.sh    | 277 +++++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 374 insertions(+), 9 deletions(-)
>  create mode 100755 t/t9822-git-p4-lfs.sh
>
> --
> 1.9.5 (Apple Git-50.3)

Can you switch to a newer git - this one's quite old now so if there are regressions introduced later, you won't know about them!

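To make the generic "spot big files and handle them differently" idea more concrete, here is one possible shape for the split: a base class that only decides which files qualify, and a subclass that knows how to produce a Git LFS pointer. Everything in it is an assumption for illustration. The class and method names are hypothetical, the size comparison copies the `>=` used in the patch, the pointer layout is taken from the expected files in the tests, and the code targets Python 3 byte strings.

```python
import hashlib


class LargeFileSystem(object):
    """Hypothetical backend-independent part: decides which files are 'large'."""

    def __init__(self, min_size=None, extensions=()):
        self.min_size = min_size
        self.extensions = tuple(extensions)

    def has_large_file_extension(self, rel_path):
        return any(rel_path.endswith('.' + e) for e in self.extensions)

    def exceeds_threshold(self, chunks):
        # Same comparison as the patch: ">=" on the uncompressed size.
        return bool(self.min_size) and sum(len(c) for c in chunks) >= self.min_size

    def generate_pointer(self, rel_path, chunks):
        """Return (git_mode, pointer_chunks); implemented per backend."""
        raise NotImplementedError

    def process(self, git_mode, rel_path, chunks):
        """Return the (git_mode, chunks) that should go into the Git stream.

        `chunks` must be a list, since it may be iterated more than once.
        """
        if self.exceeds_threshold(chunks) or self.has_large_file_extension(rel_path):
            return self.generate_pointer(rel_path, chunks)
        return git_mode, chunks


class GitLFS(LargeFileSystem):
    """Hypothetical LFS backend; storing the object itself is left out here."""

    def generate_pointer(self, rel_path, chunks):
        data = b''.join(chunks)
        oid = hashlib.sha256(data).hexdigest()
        pointer = ('version https://git-lfs.github.com/spec/v1\n'
                   'oid sha256:%s\nsize %d\n' % (oid, len(data)))
        # The patch notes that pointer files must not be executable.
        return '100644', [pointer.encode('ascii')]


if __name__ == "__main__":
    lfs = GitLFS(min_size=14, extensions=['dat'])
    print(lfs.process('100755', 'file.dat', [b'bin 13 bytes\n']))
    print(lfs.process('100755', 'file2.bin', [b'bin 13 bytes\n']))
```

With this split, another scheme such as git annex would only need its own subclass, and git-p4 would need one option to select the backend instead of one `--use-XXX-if-size-exceeds` option per scheme.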
* Re: [RFC PATCH] git-p4: add option to store files in Git LFS on import
From: Lars Schneider @ 2015-08-30 10:18 UTC
To: Luke Diamand; +Cc: Git Users, technoweenie

Thanks for your feedback!

I like the “handle big files” plugin kind of idea. However, I wonder if it makes sense to put more and more stuff into git-p4.py (>3000 LOC already). What do you think about splitting git-p4 into multiple files?

Regarding Python 3:
Would you drop Python 2 support or do you want to support Python 2/3 in parallel? I would prefer the former…

- Lars

On 30 Aug 2015, at 10:49, Luke Diamand <luke@diamand.org> wrote:

> On 28 August 2015 at 13:24, <larsxschneider@gmail.com> wrote:
>> From: Lars Schneider <larsxschneider@gmail.com>
>>
>> I am migrating huge Perforce repositories including history to Git. Some of them contain large files that would blow up the resulting Git repositories. This patch adds an option to store these files in Git LFS [1] on git-p4 clone.
>
> I'm a bit worried by this. LFS isn't the only way to handle large
> files in git - there's also git annex (which I've used in a similar
> situation) and obviously random homebrew solutions. We're going to end
> up with git-p4 sprouting ever increasing numbers of
> --use-XXX-if-size-exceeds options. On the other hand, having it
> integrated into git-p4 is quite nice as it saves a lot of messing
> around.
>
> Would it be possible as a start to have (within git-p4) a generic
> spot-big-files-and-handle-them-differently patch, and a second patch
> to add specific LFS support? That then means that other schemes would
> be a lot easier to add in future.
>
> Some other comments inline.
>
>>
>> In order to run the unit tests you need to install the Git LFS extension [2].
>>
>> Known limitations:
>> The option "use-lfs-if-size-exceeds" looks at the uncompressed file size. Sometimes huge XML files are tiny if compressed. I wonder if there is an easy way to learn about the size of a file in a git pack file. I assume compressing it is the only way to know.
>>
>> Feedback is highly appreciated.
>>
>> Thank you,
>> Lars
>>
>>
>> [1] https://git-lfs.github.com/
>> [2] https://github.com/github/git-lfs/releases/
>>
>> Lars Schneider (1):
>>   git-p4: add option to store files in Git LFS on import
>>
>>  Documentation/git-p4.txt |  12 ++
>>  git-p4.py                |  94 ++++++++++++++--
>>  t/t9822-git-p4-lfs.sh    | 277 +++++++++++++++++++++++++++++++++++++++++++++++
>>  3 files changed, 374 insertions(+), 9 deletions(-)
>>  create mode 100755 t/t9822-git-p4-lfs.sh
>>
>> --
>> 1.9.5 (Apple Git-50.3)
>
> Can you switch to a newer git - this one's quite old now so if there
> are regressions introduced later, you won't know about them!

* Re: [RFC PATCH] git-p4: add option to store files in Git LFS on import
From: Luke Diamand @ 2015-08-30 16:36 UTC
To: Lars Schneider; +Cc: Git Users, Rick Olson

On 30 August 2015 at 11:18, Lars Schneider <larsxschneider@gmail.com> wrote:
> Thanks for your feedback!
>
> I like the “handle big files” plugin kind of idea. However, I wonder if it makes sense to put more and more stuff into git-p4.py (>3000 LOC already). What do you think about splitting git-p4 into multiple files?

I was wondering about that. I think for now, the simplicity of keeping
everything in one file is worth the slight extra pain. I don't imagine
that the big-file-handler code would be very large.

>
> Regarding Python 3:
> Would you drop Python 2 support or do you want to support Python 2/3 in parallel? I would prefer the former…

For quite some time we would need to support both; we can't just have
a release of git that one day breaks git-p4 for people stuck on Python
2. But it might not be that hard to support both (though converting
all those print statements could be quite tiresome).

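[Editorial sketch, not part of the original message.] On the print statements mentioned above: one common way to keep a single file working on Python 2.6+ and Python 3 is the `print_function` future import, which makes `print` a function on both. The `report` helper below is a hypothetical name used only for illustration; it is not a function from git-p4.py.

```python
from __future__ import print_function

import sys


def report(message, stream=None):
    """Write one line of output; behaves the same on Python 2.6+ and 3.x."""
    if stream is None:
        stream = sys.stdout
    # With the future import, print is a function on Python 2 as well,
    # so keyword arguments such as file= are available on both versions.
    print(message, file=stream)


report("Importing revision ...")
report("warning: something looks odd", stream=sys.stderr)
```

The conversion is mechanical (`print "x"` becomes `print("x")`), which is probably why it is tiresome rather than hard.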
* Re: [RFC PATCH] git-p4: add option to store files in Git LFS on import
From: Lars Schneider @ 2015-09-03 9:40 UTC
To: Luke Diamand; +Cc: Git Users

On 30 Aug 2015, at 18:36, Luke Diamand <luke@diamand.org> wrote:

> On 30 August 2015 at 11:18, Lars Schneider <larsxschneider@gmail.com> wrote:
>> Thanks for your feedback!
>>
>> I like the “handle big files” plugin kind of idea. However, I wonder if it makes sense to put more and more stuff into git-p4.py (>3000 LOC already). What do you think about splitting git-p4 into multiple files?
>
> I was wondering about that. I think for now, the simplicity of keeping
> everything in one file is worth the slight extra pain. I don't imagine
> that the big-file-handler code would be very large.
OK.

>
>>
>> Regarding Python 3:
>> Would you drop Python 2 support or do you want to support Python 2/3 in parallel? I would prefer the former…
>
> For quite some time we would need to support both; we can't just have
> a release of git that one day breaks git-p4 for people stuck on Python
> 2. But it might not be that hard to support both (though converting
> all those print statements could be quite tiresome).
Agreed. However supporting both versions increases code complexity as well as testing effort. Would a compromise like the following work? We fork “git-p4.py” to “git-p4-python2.py” and just apply important bug fixes to that file. All new development happens on a Python 3 only git-p4.py.

Cheers,
Lars

* Re: [RFC PATCH] git-p4: add option to store files in Git LFS on import
From: Luke Diamand @ 2015-09-03 10:07 UTC
To: Lars Schneider; +Cc: Git Users

>>>
>>> Regarding Python 3:
>>> Would you drop Python 2 support or do you want to support Python 2/3 in parallel? I would prefer the former…
>>
>> For quite some time we would need to support both; we can't just have
>> a release of git that one day breaks git-p4 for people stuck on Python
>> 2. But it might not be that hard to support both (though converting
>> all those print statements could be quite tiresome).
> Agreed. However supporting both versions increases code complexity as well as testing effort. Would a compromise like the following work? We fork “git-p4.py” to “git-p4-python2.py” and just apply important bug fixes to that file. All new development happens on a Python 3 only git-p4.py.

I'm not a python expert, but I think we're quite a way from that point anyway. I think we'd want to run 2to3 on it and make it work - at that point it should work on both python 2.7 (and earlier? I don't know) and python 3.x. By the time that's done, we may well find that we _can_ just drop python2 support, or fork, as you suggest.

Running 2to3 also includes adding test cases for all the code that is in there that's not currently covered so that end-users don't find out the hard way that we've missed bits. That's why I think it's a fairly long-term goal.

Regardless, I think we'd want to have a wider discussion about the best way forward, and there doesn't seem much point having that discussion now when there's no actual code!

* Re: [RFC PATCH] git-p4: add option to store files in Git LFS on import
From: John Keeping @ 2015-09-03 10:12 UTC
To: Lars Schneider; +Cc: Luke Diamand, Git Users

On Thu, Sep 03, 2015 at 11:40:20AM +0200, Lars Schneider wrote:
>
> On 30 Aug 2015, at 18:36, Luke Diamand <luke@diamand.org> wrote:
>
> > On 30 August 2015 at 11:18, Lars Schneider <larsxschneider@gmail.com> wrote:
> >> Thanks for your feedback!
> >>
> >> I like the “handle big files” plugin kind of idea. However, I
> >> wonder if it makes sense to put more and more stuff into git-p4.py
> >> (>3000 LOC already). What do you think about splitting git-p4 into
> >> multiple files?
> >
> > I was wondering about that. I think for now, the simplicity of keeping
> > everything in one file is worth the slight extra pain. I don't imagine
> > that the big-file-handler code would be very large.
> OK.
>
> >
> >>
> >> Regarding Python 3:
> >> Would you drop Python 2 support or do you want to support Python
> >> 2/3 in parallel? I would prefer the former…
> >
> > For quite some time we would need to support both; we can't just have
> > a release of git that one day breaks git-p4 for people stuck on Python
> > 2. But it might not be that hard to support both (though converting
> > all those print statements could be quite tiresome).
> Agreed. However supporting both versions increases code complexity as
> well as testing effort. Would a compromise like the following work? We
> fork “git-p4.py” to “git-p4-python2.py” and just apply important bug
> fixes to that file. All new development happens on a Python 3 only
> git-p4.py.

Documentation/CodingGuidelines currently says:

 - As a minimum, we aim to be compatible with Python 2.6 and 2.7.

 - Where required libraries do not restrict us to Python 2, we try to
   also be compatible with Python 3.1 and later.

That was added in commit 9ef43dd (CodingGuidelines: add Python coding
guidelines, 2013-01-30), which gives the following rationale in the
commit message:

 - Advocating Python 3 support in all scripts is currently unrealistic
   because:

   - 'p4 -G' provides output in a format that is very hard to use with
     Python 3 (and its documentation claims Python 3 is unsupported).

Has that changed?

I also found a message describing why the output is hard to use with
Python 3:

	http://permalink.gmane.org/gmane.comp.version-control.git/213316

If that problem can be solved, I don't think it would be difficult to
support 2.6+ and 3.x with a single file.

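[Editorial sketch, not part of the original message.] The linked message is not reproduced in this thread, so the following rests on an assumption: that the difficulty is that 'p4 -G' emits marshalled Python dictionaries whose string keys and values arrive as `bytes` when loaded under Python 3. Under that assumption, a decoding layer could be sketched as below. The function names are hypothetical, and treating a `data` key as raw file content is likewise an assumption made for illustration.

```python
import io
import marshal


def decode_record(record, encoding='utf-8', binary_keys=('data',)):
    """Return a copy of record with bytes keys and values decoded to text.

    Values under binary_keys are assumed to hold file content and are
    left as bytes (an assumption for this sketch).
    """
    decoded = {}
    for key, value in record.items():
        if isinstance(key, bytes):
            key = key.decode(encoding)
        if isinstance(value, bytes) and key not in binary_keys:
            value = value.decode(encoding, 'replace')
        decoded[key] = value
    return decoded


def read_p4_records(stream):
    """Yield decoded dicts from a binary stream of marshalled dictionaries."""
    while True:
        try:
            record = marshal.load(stream)
        except EOFError:
            break  # no more marshalled objects in the stream
        yield decode_record(record)


# Simulated input; a real caller would pass the stdout pipe of 'p4 -G ...'.
sample = io.BytesIO(
    marshal.dumps({b'code': b'stat', b'depotFile': b'//depot/a.txt'})
    + marshal.dumps({b'code': b'binary', b'data': b'\xff\x00'})
)
records = list(read_p4_records(sample))
```

Keeping the decoding in one helper means the rest of the script can use ordinary string keys on both Python versions, which is what would make a single-file 2.6+/3.x script plausible.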