git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH v2 0/2] git-p4: fix Git LFS pointer parsing
@ 2016-04-20  8:10 larsxschneider
  2016-04-20  8:10 ` [PATCH v2 1/2] travis-ci: update Git-LFS and P4 to the latest version larsxschneider
  2016-04-20  8:10 ` [PATCH v2 2/2] git-p4: fix Git LFS pointer parsing larsxschneider
  0 siblings, 2 replies; 9+ messages in thread
From: larsxschneider @ 2016-04-20  8:10 UTC (permalink / raw)
  To: git; +Cc: luke, sschuberth, Lars Schneider

From: Lars Schneider <larsxschneider@gmail.com>

v1: http://thread.gmane.org/gmane.comp.version-control.git/291917/

diff to v1:
* make git-p4 LFS compatible with Git LFS versions prior to 1.2.0
  Thanks Junio for the little push ... I guess I was a bit lazy here :)
* improve the git-p4 LFS tests to detect invalid LFS pointer files

Thanks,
Lars

Lars Schneider (2):
  travis-ci: update Git-LFS and P4 to the latest version
  git-p4: fix Git LFS pointer parsing

 .travis.yml               |  4 ++--
 git-p4.py                 | 15 ++++++++++++---
 t/t9824-git-p4-git-lfs.sh |  4 ++++
 3 files changed, 18 insertions(+), 5 deletions(-)

--
2.5.1

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH v2 1/2] travis-ci: update Git-LFS and P4 to the latest version
  2016-04-20  8:10 [PATCH v2 0/2] git-p4: fix Git LFS pointer parsing larsxschneider
@ 2016-04-20  8:10 ` larsxschneider
  2016-04-20  8:10 ` [PATCH v2 2/2] git-p4: fix Git LFS pointer parsing larsxschneider
  1 sibling, 0 replies; 9+ messages in thread
From: larsxschneider @ 2016-04-20  8:10 UTC (permalink / raw)
  To: git; +Cc: luke, sschuberth, Lars Schneider

From: Lars Schneider <larsxschneider@gmail.com>

Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
---
 .travis.yml | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/.travis.yml b/.travis.yml
index 78e433b..4acf617 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -22,8 +22,8 @@ addons:
 env:
   global:
     - DEVELOPER=1
-    - P4_VERSION="15.2"
-    - GIT_LFS_VERSION="1.1.0"
+    - P4_VERSION="16.1"
+    - GIT_LFS_VERSION="1.2.0"
     - DEFAULT_TEST_TARGET=prove
     - GIT_PROVE_OPTS="--timer --jobs 3 --state=failed,slow,save"
     - GIT_TEST_OPTS="--verbose --tee"
-- 
2.5.1

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v2 2/2] git-p4: fix Git LFS pointer parsing
  2016-04-20  8:10 [PATCH v2 0/2] git-p4: fix Git LFS pointer parsing larsxschneider
  2016-04-20  8:10 ` [PATCH v2 1/2] travis-ci: update Git-LFS and P4 to the latest version larsxschneider
@ 2016-04-20  8:10 ` larsxschneider
  2016-04-20  8:59   ` Sebastian Schuberth
  2016-04-20 21:01   ` Junio C Hamano
  1 sibling, 2 replies; 9+ messages in thread
From: larsxschneider @ 2016-04-20  8:10 UTC (permalink / raw)
  To: git; +Cc: luke, sschuberth, Lars Schneider

From: Lars Schneider <larsxschneider@gmail.com>

Git LFS 1.2.0 removed a preamble from the output of the 'git lfs pointer'
command [1] which broke the parsing of this output. Adjust the parser
to support the old and the new format.

[1] https://github.com/github/git-lfs/commit/da2935d9a739592bc775c98d8ef4df9c72ea3b43

Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
---
 git-p4.py                 | 15 ++++++++++++---
 t/t9824-git-p4-git-lfs.sh |  4 ++++
 2 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/git-p4.py b/git-p4.py
index 527d44b..ab52bc3 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -1064,8 +1064,17 @@ class GitLFS(LargeFileSystem):
         if pointerProcess.wait():
             os.remove(contentFile)
             die('git-lfs pointer command failed. Did you install the extension?')
-        pointerContents = [i+'\n' for i in pointerFile.split('\n')[2:][:-1]]
-        oid = pointerContents[1].split(' ')[1].split(':')[1][:-1]
+
+        # Git LFS removed the preamble in the output of the 'pointer' command
+        # starting from version 1.2.0. Check for the preamble here to support
+        # earlier versions.
+        # c.f. https://github.com/github/git-lfs/commit/da2935d9a739592bc775c98d8ef4df9c72ea3b43
+        preamble = 'Git LFS pointer for ' + contentFile + '\n\n'
+        if pointerFile.startswith(preamble):
+            pointerFile = pointerFile[len(preamble):]
+
+        oidEntry = [i for i in pointerFile.split('\n') if i.startswith('oid')]
+        oid = oidEntry[0].split(' ')[1].split(':')[1]
         localLargeFile = os.path.join(
             os.getcwd(),
             '.git', 'lfs', 'objects', oid[:2], oid[2:4],
@@ -1073,7 +1082,7 @@ class GitLFS(LargeFileSystem):
         )
         # LFS Spec states that pointer files should not have the executable bit set.
         gitMode = '100644'
-        return (gitMode, pointerContents, localLargeFile)
+        return (gitMode, pointerFile, localLargeFile)
 
     def pushFile(self, localLargeFile):
         uploadProcess = subprocess.Popen(
diff --git a/t/t9824-git-p4-git-lfs.sh b/t/t9824-git-p4-git-lfs.sh
index 0b664a3..ca93ac8 100755
--- a/t/t9824-git-p4-git-lfs.sh
+++ b/t/t9824-git-p4-git-lfs.sh
@@ -13,6 +13,10 @@ test_file_in_lfs () {
 	FILE="$1" &&
 	SIZE="$2" &&
 	EXPECTED_CONTENT="$3" &&
+	sed -n '1,1 p' "$FILE" | grep "^version " &&
+	sed -n '2,2 p' "$FILE" | grep "^oid " &&
+	sed -n '3,3 p' "$FILE" | grep "^size " &&
+	test_line_count = 3 "$FILE" &&
 	cat "$FILE" | grep "size $SIZE" &&
 	HASH=$(cat "$FILE" | grep "oid sha256:" | sed -e "s/oid sha256://g") &&
 	LFS_FILE=".git/lfs/objects/$(echo "$HASH" | cut -c1-2)/$(echo "$HASH" | cut -c3-4)/$HASH" &&
-- 
2.5.1

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 2/2] git-p4: fix Git LFS pointer parsing
  2016-04-20  8:10 ` [PATCH v2 2/2] git-p4: fix Git LFS pointer parsing larsxschneider
@ 2016-04-20  8:59   ` Sebastian Schuberth
  2016-04-20 18:30     ` Junio C Hamano
  2016-04-20 21:01   ` Junio C Hamano
  1 sibling, 1 reply; 9+ messages in thread
From: Sebastian Schuberth @ 2016-04-20  8:59 UTC (permalink / raw)
  To: larsxschneider; +Cc: Git Mailing List, luke

On Wed, Apr 20, 2016 at 10:10 AM,  <larsxschneider@gmail.com> wrote:

> --- a/git-p4.py
> +++ b/git-p4.py
> @@ -1064,8 +1064,17 @@ class GitLFS(LargeFileSystem):
>          if pointerProcess.wait():
>              os.remove(contentFile)
>              die('git-lfs pointer command failed. Did you install the extension?')
> -        pointerContents = [i+'\n' for i in pointerFile.split('\n')[2:][:-1]]
> -        oid = pointerContents[1].split(' ')[1].split(':')[1][:-1]
> +
> +        # Git LFS removed the preamble in the output of the 'pointer' command

git-lfs did not remove the output. It simply goes to stderr instead of
stdout now. That said, could a fix simply be to capture both stdout
and sterr? If the output to the streams remain interleaved it should
look exactly like before.

> +        # starting from version 1.2.0. Check for the preamble here to support
> +        # earlier versions.
> +        # c.f. https://github.com/github/git-lfs/commit/da2935d9a739592bc775c98d8ef4df9c72ea3b43
> +        preamble = 'Git LFS pointer for ' + contentFile + '\n\n'
> +        if pointerFile.startswith(preamble):
> +            pointerFile = pointerFile[len(preamble):]
> +
> +        oidEntry = [i for i in pointerFile.split('\n') if i.startswith('oid')]
> +        oid = oidEntry[0].split(' ')[1].split(':')[1]

Why do we need to remove the preamble at all, if present? If all we
want is the oid, we should simply only look at the line that starts
with that keyword, which would skip any preamble. Which is what you
already do here. However, I'd probably use .splitlines() instead of
.split('\n') and .startswith('oid ') (note the trailing space) instead
of .startswith('oid') to ensure "oid" is a separate word.

But then again, I wonder why there's so much split() logic involved in
extracting the oid. Couldn't we replace all of that with a regexp like

oid = re.search(r"^oid \w+:(\w+)", pointerFile, re.MULTILINE).group(1)

-- 
Sebastian Schuberth

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 2/2] git-p4: fix Git LFS pointer parsing
  2016-04-20  8:59   ` Sebastian Schuberth
@ 2016-04-20 18:30     ` Junio C Hamano
  2016-04-20 20:03       ` Lars Schneider
  0 siblings, 1 reply; 9+ messages in thread
From: Junio C Hamano @ 2016-04-20 18:30 UTC (permalink / raw)
  To: Sebastian Schuberth; +Cc: larsxschneider, Git Mailing List, luke

Sebastian Schuberth <sschuberth@gmail.com> writes:

> Why do we need to remove the preamble at all, if present? If all we
> want is the oid, we should simply only look at the line that starts
> with that keyword, which would skip any preamble. Which is what you
> already do here. However, I'd probably use .splitlines() instead of
> .split('\n') and .startswith('oid ') (note the trailing space) instead
> of .startswith('oid') to ensure "oid" is a separate word.
>
> But then again, I wonder why there's so much split() logic involved in
> extracting the oid. Couldn't we replace all of that with a regexp like
>
> oid = re.search(r"^oid \w+:(\w+)", pointerFile, re.MULTILINE).group(1)

Yup, all of that is a very useful suggestion.  If we know how the
piece of information we want is identified in the output,
specifically looking for it would future-proof the code better, as
it will not be affected by future change that adds unexpected cruft
to the output we are reading from.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 2/2] git-p4: fix Git LFS pointer parsing
  2016-04-20 18:30     ` Junio C Hamano
@ 2016-04-20 20:03       ` Lars Schneider
  0 siblings, 0 replies; 9+ messages in thread
From: Lars Schneider @ 2016-04-20 20:03 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Sebastian Schuberth, Git Mailing List, luke


On 20 Apr 2016, at 20:30, Junio C Hamano <gitster@pobox.com> wrote:

> Sebastian Schuberth <sschuberth@gmail.com> writes:
> 
>> Why do we need to remove the preamble at all, if present?

Because I need the content of a valid Git LFS pointer file 
a few lines below:
https://github.com/larsxschneider/git/blob/5ee7601c49b6eaa9da5eb47db5cf12b5d8d672c9/git-p4.py#L1085

This pointer file content is then written to the Git repository 
instead of the actual file content.


>> If all we
>> want is the oid, we should simply only look at the line that starts
>> with that keyword, which would skip any preamble. Which is what you
>> already do here. However, I'd probably use .splitlines() instead of
>> .split('\n') and .startswith('oid ') (note the trailing space) instead
>> of .startswith('oid') to ensure "oid" is a separate word.
>> 
>> But then again, I wonder why there's so much split() logic involved in
>> extracting the oid. Couldn't we replace all of that with a regexp like
>> 
>> oid = re.search(r"^oid \w+:(\w+)", pointerFile, re.MULTILINE).group(1)
> 
> Yup, all of that is a very useful suggestion.  If we know how the
> piece of information we want is identified in the output,
> specifically looking for it would future-proof the code better, as
> it will not be affected by future change that adds unexpected cruft
> to the output we are reading from.

I agree that this is a better solution.

@Junio: Can you squash the patch below or do you prefer a v3?

Thanks,
Lars


diff --git a/git-p4.py b/git-p4.py
index ab52bc3..a0d529b 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -1073,8 +1073,7 @@ class GitLFS(LargeFileSystem):
         if pointerFile.startswith(preamble):
             pointerFile = pointerFile[len(preamble):]

-        oidEntry = [i for i in pointerFile.split('\n') if i.startswith('oid')]
-        oid = oidEntry[0].split(' ')[1].split(':')[1]
+        oid = re.search(r'^oid \w+:(\w+)', pointerFile, re.MULTILINE).group(1)
         localLargeFile = os.path.join(
             os.getcwd(),
             '.git', 'lfs', 'objects', oid[:2], oid[2:4],

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 2/2] git-p4: fix Git LFS pointer parsing
  2016-04-20  8:10 ` [PATCH v2 2/2] git-p4: fix Git LFS pointer parsing larsxschneider
  2016-04-20  8:59   ` Sebastian Schuberth
@ 2016-04-20 21:01   ` Junio C Hamano
  2016-04-22  7:53     ` Lars Schneider
  1 sibling, 1 reply; 9+ messages in thread
From: Junio C Hamano @ 2016-04-20 21:01 UTC (permalink / raw)
  To: larsxschneider; +Cc: git, luke, sschuberth

larsxschneider@gmail.com writes:

> -        pointerContents = [i+'\n' for i in pointerFile.split('\n')[2:][:-1]]
> -        oid = pointerContents[1].split(' ')[1].split(':')[1][:-1]
> +
> +        # Git LFS removed the preamble in the output of the 'pointer' command
> +        # starting from version 1.2.0. Check for the preamble here to support
> +        # earlier versions.
> +        # c.f. https://github.com/github/git-lfs/commit/da2935d9a739592bc775c98d8ef4df9c72ea3b43
> +        preamble = 'Git LFS pointer for ' + contentFile + '\n\n'
> +        if pointerFile.startswith(preamble):
> +            pointerFile = pointerFile[len(preamble):]
> +
> +        oidEntry = [i for i in pointerFile.split('\n') if i.startswith('oid')]
> +        oid = oidEntry[0].split(' ')[1].split(':')[1]
>          localLargeFile = os.path.join(
>              os.getcwd(),
>              '.git', 'lfs', 'objects', oid[:2], oid[2:4],
> @@ -1073,7 +1082,7 @@ class GitLFS(LargeFileSystem):
>          )
>          # LFS Spec states that pointer files should not have the executable bit set.
>          gitMode = '100644'
> -        return (gitMode, pointerContents, localLargeFile)
> +        return (gitMode, pointerFile, localLargeFile)

It seems to me that you used to return pointerContents which is an
array of lines (each of which are LF terminated); the updated one
returns pointerFile which is a bare string with many lines.

Is that change intentional?  Does the difference matter to the
caller of this method?  Even if it doesn't, is it a good idea to
change it as part of this commit?

>      def pushFile(self, localLargeFile):
>          uploadProcess = subprocess.Popen(
> diff --git a/t/t9824-git-p4-git-lfs.sh b/t/t9824-git-p4-git-lfs.sh
> index 0b664a3..ca93ac8 100755
> --- a/t/t9824-git-p4-git-lfs.sh
> +++ b/t/t9824-git-p4-git-lfs.sh
> @@ -13,6 +13,10 @@ test_file_in_lfs () {
>  	FILE="$1" &&
>  	SIZE="$2" &&
>  	EXPECTED_CONTENT="$3" &&
> +	sed -n '1,1 p' "$FILE" | grep "^version " &&
> +	sed -n '2,2 p' "$FILE" | grep "^oid " &&
> +	sed -n '3,3 p' "$FILE" | grep "^size " &&
> +	test_line_count = 3 "$FILE" &&
>  	cat "$FILE" | grep "size $SIZE" &&
>  	HASH=$(cat "$FILE" | grep "oid sha256:" | sed -e "s/oid sha256://g") &&
>  	LFS_FILE=".git/lfs/objects/$(echo "$HASH" | cut -c1-2)/$(echo "$HASH" | cut -c3-4)/$HASH" &&

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 2/2] git-p4: fix Git LFS pointer parsing
  2016-04-20 21:01   ` Junio C Hamano
@ 2016-04-22  7:53     ` Lars Schneider
  2016-04-22  7:55       ` Sebastian Schuberth
  0 siblings, 1 reply; 9+ messages in thread
From: Lars Schneider @ 2016-04-22  7:53 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git Users, luke, sschuberth


> On 20 Apr 2016, at 23:01, Junio C Hamano <gitster@pobox.com> wrote:
> 
> larsxschneider@gmail.com writes:
> 
>> -        pointerContents = [i+'\n' for i in pointerFile.split('\n')[2:][:-1]]
pointerFile was split here by '\n'. The splitting removes the newline and
the i+'\n' adds it again. This back and forth makes it unnecessary hard
to read.


>> -        oid = pointerContents[1].split(' ')[1].split(':')[1][:-1]
>> +
>> +        # Git LFS removed the preamble in the output of the 'pointer' command
>> +        # starting from version 1.2.0. Check for the preamble here to support
>> +        # earlier versions.
>> +        # c.f. https://github.com/github/git-lfs/commit/da2935d9a739592bc775c98d8ef4df9c72ea3b43
>> +        preamble = 'Git LFS pointer for ' + contentFile + '\n\n'
>> +        if pointerFile.startswith(preamble):
>> +            pointerFile = pointerFile[len(preamble):]
>> +
>> +        oidEntry = [i for i in pointerFile.split('\n') if i.startswith('oid')]
>> +        oid = oidEntry[0].split(' ')[1].split(':')[1]
>>         localLargeFile = os.path.join(
>>             os.getcwd(),
>>             '.git', 'lfs', 'objects', oid[:2], oid[2:4],
>> @@ -1073,7 +1082,7 @@ class GitLFS(LargeFileSystem):
>>         )
>>         # LFS Spec states that pointer files should not have the executable bit set.
>>         gitMode = '100644'
>> -        return (gitMode, pointerContents, localLargeFile)
>> +        return (gitMode, pointerFile, localLargeFile)
> 
> It seems to me that you used to return pointerContents which is an
> array of lines (each of which are LF terminated); the updated one
> returns pointerFile which is a bare string with many lines.
The pointerContents is a string with LF separators (see comment above). 
The only difference is that the old implementation was a list of strings 
each with a LF at the end ['line1\n'],['line2\n'] and the new 
implementation is one string with LF separators 'line1\nline2\n'.

pointerContents goes through a few layers and is eventually used in 
the "writeToGitStream" method [1] like this:

        for d in contents:
            self.gitStream.write(d)

At this point it doesn't matter if it is an array of strings or one
string.

[1] https://github.com/git/git/blob/e6ac6e1f7d54584c2b03f073b5f329a37f4a9561/git-p4.py#L2401-L2402


> Is that change intentional?  Does the difference matter to the
> caller of this method?  Even if it doesn't, is it a good idea to
> change it as part of this commit?
Yes, it was intentional. I wanted to make the code simpler/more readable.
As shown above the change doesn't matter to the caller method.

I understand your concern of making this change part of the commit.
However, I don't see another way because passing pointerFile as
is in the old implementation would brake Git LFS 1.x support (because
pointerFile contains the preamble).

What would be the best way forward? A v3 with a better commit message
mentioning the array -> string change?

Thanks for the review,
Lars

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 2/2] git-p4: fix Git LFS pointer parsing
  2016-04-22  7:53     ` Lars Schneider
@ 2016-04-22  7:55       ` Sebastian Schuberth
  0 siblings, 0 replies; 9+ messages in thread
From: Sebastian Schuberth @ 2016-04-22  7:55 UTC (permalink / raw)
  To: Lars Schneider; +Cc: Junio C Hamano, Git Users, luke

On Fri, Apr 22, 2016 at 9:53 AM, Lars Schneider
<larsxschneider@gmail.com> wrote:

> What would be the best way forward? A v3 with a better commit message
> mentioning the array -> string change?

I'd vote for that, yes. Also v3 could then properly incorporate my regexp.

-- 
Sebastian Schuberth

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2016-04-22  7:55 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-04-20  8:10 [PATCH v2 0/2] git-p4: fix Git LFS pointer parsing larsxschneider
2016-04-20  8:10 ` [PATCH v2 1/2] travis-ci: update Git-LFS and P4 to the latest version larsxschneider
2016-04-20  8:10 ` [PATCH v2 2/2] git-p4: fix Git LFS pointer parsing larsxschneider
2016-04-20  8:59   ` Sebastian Schuberth
2016-04-20 18:30     ` Junio C Hamano
2016-04-20 20:03       ` Lars Schneider
2016-04-20 21:01   ` Junio C Hamano
2016-04-22  7:53     ` Lars Schneider
2016-04-22  7:55       ` Sebastian Schuberth

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).