git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH] archive: Store checksum correctly
@ 2019-07-23  2:57 Matt Turner
  2019-07-23 16:49 ` Junio C Hamano
  2019-07-24  1:04 ` Jonathan Nieder
  0 siblings, 2 replies; 8+ messages in thread
From: Matt Turner @ 2019-07-23  2:57 UTC (permalink / raw)
  To: git; +Cc: David Oberhollenzer, Matt Turner

tar2sqfs (part of https://github.com/topics/tar2sqfs) rejects tarballs
made with git archive with the message

    invalid tar header checksum!

tar2sqfs recomputes the tarball's checksum to verify it. Its checksum
implementation agrees with GNU tar, which contains a comment that states

    Fill in the checksum field.  It's formatted differently from the
    other fields: it has [6] digits, a null, then a space ...

Correcting this allows tar2sqfs to correctly process the tarballs made
by git archive.

Signed-off-by: Matt Turner <mattst88@gmail.com>
---
 archive-tar.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/archive-tar.c b/archive-tar.c
index 3e53aac1e6..f9a157bfd1 100644
--- a/archive-tar.c
+++ b/archive-tar.c
@@ -215,7 +215,9 @@ static void prepare_header(struct archiver_args *args,
 	memcpy(header->magic, "ustar", 6);
 	memcpy(header->version, "00", 2);
 
-	xsnprintf(header->chksum, sizeof(header->chksum), "%07o", ustar_header_chksum(header));
+	xsnprintf(header->chksum, sizeof(header->chksum), "%06o", ustar_header_chksum(header));
+	header->chksum[6] = '\0';
+	header->chksum[7] = ' ';
 }
 
 static void write_extended_header(struct archiver_args *args,
-- 
2.22.0.dirty


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] archive: Store checksum correctly
  2019-07-23  2:57 [PATCH] archive: Store checksum correctly Matt Turner
@ 2019-07-23 16:49 ` Junio C Hamano
  2019-07-23 19:31   ` Jeff King
  2019-07-23 19:38   ` René Scharfe
  2019-07-24  1:04 ` Jonathan Nieder
  1 sibling, 2 replies; 8+ messages in thread
From: Junio C Hamano @ 2019-07-23 16:49 UTC (permalink / raw)
  To: Matt Turner, Rene Scharfe; +Cc: git, David Oberhollenzer

Matt Turner <mattst88@gmail.com> writes:

> tar2sqfs (part of https://github.com/topics/tar2sqfs) rejects tarballs
> made with git archive with the message
>
>     invalid tar header checksum!
>
> tar2sqfs recomputes the tarball's checksum to verify it. Its checksum
> implementation agrees with GNU tar, which contains a comment that states
>
>     Fill in the checksum field.  It's formatted differently from the
>     other fields: it has [6] digits, a null, then a space ...
>
> Correcting this allows tar2sqfs to correctly process the tarballs made
> by git archive.
>
> Signed-off-by: Matt Turner <mattst88@gmail.com>
> ---
>  archive-tar.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/archive-tar.c b/archive-tar.c
> index 3e53aac1e6..f9a157bfd1 100644
> --- a/archive-tar.c
> +++ b/archive-tar.c
> @@ -215,7 +215,9 @@ static void prepare_header(struct archiver_args *args,
>  	memcpy(header->magic, "ustar", 6);
>  	memcpy(header->version, "00", 2);
>  
> -	xsnprintf(header->chksum, sizeof(header->chksum), "%07o", ustar_header_chksum(header));
> +	xsnprintf(header->chksum, sizeof(header->chksum), "%06o", ustar_header_chksum(header));
> +	header->chksum[6] = '\0';
> +	header->chksum[7] = ' ';

Wow.  The choice of %07o is as old as the very original of tar-tree
implementation in our codebase, starting at ae64bbc1 ("tar-tree:
Introduce write_entry()", 2006-03-25).

I think the updated behaviour matches Wikipedia [*1*] where it
spells out that 6 octal is followed by a NUL and a SP; it also says
various implementations do not adhere to this format---perhaps they
meant us ;-)

Frustratingly, I couldn't find this spelled out in POSIX [*2*] X-<.
The closest I found there was

    All other fields are leading zero-filled octal numbers using
    digits from the ISO/IEC 646:1991 standard IRV. Each numeric
    field is terminated by one or more <space> or NUL characters.

which sounds as if the standard wanted to be deliberately more
inclusive than what tar2sqfs is insisting on.

I think a change like this would impact kernel.org folks' tarball
uploading tool, but that is not a reason not to apply this patch.



[References].

*1* https://en.wikipedia.org/wiki/Tar_(computing)
*2* http://pubs.opengroup.org/onlinepubs/9699919799/utilities/pax.html


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] archive: Store checksum correctly
  2019-07-23 16:49 ` Junio C Hamano
@ 2019-07-23 19:31   ` Jeff King
  2019-07-23 19:38   ` René Scharfe
  1 sibling, 0 replies; 8+ messages in thread
From: Jeff King @ 2019-07-23 19:31 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Matt Turner, Rene Scharfe, git, David Oberhollenzer

On Tue, Jul 23, 2019 at 09:49:44AM -0700, Junio C Hamano wrote:

> I think a change like this would impact kernel.org folks' tarball
> uploading tool, but that is not a reason not to apply this patch.

That was my thought, too. Distro projects like homebrew rely on
stable hashes of upstream tarballs, and they get cranky when those
tarballs change. Ideally the projects they package would provide
byte-stable tarballs, but many of them rely on on-the-fly tarball
generation by hosting sites like GitHub.

Which isn't to say we should never fix bugs in the tarballs that we
produce, but it's not entirely clear to me that this _is_ a bug, and not
just one tool being overly picky.

-Peff

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] archive: Store checksum correctly
  2019-07-23 16:49 ` Junio C Hamano
  2019-07-23 19:31   ` Jeff King
@ 2019-07-23 19:38   ` René Scharfe
  2019-07-23 20:08     ` René Scharfe
  2019-07-23 21:21     ` Junio C Hamano
  1 sibling, 2 replies; 8+ messages in thread
From: René Scharfe @ 2019-07-23 19:38 UTC (permalink / raw)
  To: Junio C Hamano, Matt Turner; +Cc: git, David Oberhollenzer

Am 23.07.19 um 18:49 schrieb Junio C Hamano:
> Matt Turner <mattst88@gmail.com> writes:
>
>> tar2sqfs (part of https://github.com/topics/tar2sqfs) rejects tarballs
>> made with git archive with the message
>>
>>     invalid tar header checksum!
>>
>> tar2sqfs recomputes the tarball's checksum to verify it. Its checksum
>> implementation agrees with GNU tar, which contains a comment that states
>>
>>     Fill in the checksum field.  It's formatted differently from the
>>     other fields: it has [6] digits, a null, then a space ...
>>
>> Correcting this allows tar2sqfs to correctly process the tarballs made
>> by git archive.
>>
>> Signed-off-by: Matt Turner <mattst88@gmail.com>
>> ---
>>  archive-tar.c | 4 +++-
>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/archive-tar.c b/archive-tar.c
>> index 3e53aac1e6..f9a157bfd1 100644
>> --- a/archive-tar.c
>> +++ b/archive-tar.c
>> @@ -215,7 +215,9 @@ static void prepare_header(struct archiver_args *args,
>>  	memcpy(header->magic, "ustar", 6);
>>  	memcpy(header->version, "00", 2);
>>
>> -	xsnprintf(header->chksum, sizeof(header->chksum), "%07o", ustar_header_chksum(header));
>> +	xsnprintf(header->chksum, sizeof(header->chksum), "%06o", ustar_header_chksum(header));
>> +	header->chksum[6] = '\0';
>> +	header->chksum[7] = ' ';
>
> Wow.  The choice of %07o is as old as the very original of tar-tree
> implementation in our codebase, starting at ae64bbc1 ("tar-tree:
> Introduce write_entry()", 2006-03-25).

Actually it's already in 731ab9ccf2 ("[PATCH] create tar archives of
tree on the fly", 2005-04-28).

> I think the updated behaviour matches Wikipedia [*1*] where it
> spells out that 6 octal is followed by a NUL and a SP; it also says
> various implementations do not adhere to this format---perhaps they
> meant us ;-)

OpenBSD's pax(1) does the same if I read
https://github.com/openbsd/src/blob/master/bin/pax/tar.c correctly.

> Frustratingly, I couldn't find this spelled out in POSIX [*2*] X-<.
> The closest I found there was
>
>     All other fields are leading zero-filled octal numbers using
>     digits from the ISO/IEC 646:1991 standard IRV. Each numeric
>     field is terminated by one or more <space> or NUL characters.
>
> which sounds as if the standard wanted to be deliberately more
> inclusive than what tar2sqfs is insisting on.

Yes, I remember following that specific paragraph when writing the
code.  The wording probably tries to account for the different
variants invented by vendors over the decades.

is_checksum_valid() in
https://github.com/AgentD/squashfs-tools-ng/blob/master/lib/tar/checksum.c
compares the formatted checksum byte-by-byte.  That seems
unnecessarily strict.  Parsing and comparing the numerical value
of the checksum would be more forgiving, better adhere to POSIX and
might be a tiny bit quicker.

René

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] archive: Store checksum correctly
  2019-07-23 19:38   ` René Scharfe
@ 2019-07-23 20:08     ` René Scharfe
  2019-07-23 21:34       ` David Oberhollenzer
  2019-07-23 21:21     ` Junio C Hamano
  1 sibling, 1 reply; 8+ messages in thread
From: René Scharfe @ 2019-07-23 20:08 UTC (permalink / raw)
  To: Matt Turner, David Oberhollenzer; +Cc: Junio C Hamano, git

Am 23.07.19 um 21:38 schrieb René Scharfe:
> is_checksum_valid() in
> https://github.com/AgentD/squashfs-tools-ng/blob/master/lib/tar/checksum.c
> compares the formatted checksum byte-by-byte.  That seems
> unnecessarily strict.  Parsing and comparing the numerical value
> of the checksum would be more forgiving, better adhere to POSIX and
> might be a tiny bit quicker.

I mean something like the patch below.  Code and text size are bigger,
but it's more lenient and writes less.  Untested.

(Side note: I'm a bit surprised that GCC 8.3 adds the eight spaces one
 by one in the middle loop with -O2..)

---
 lib/tar/checksum.c | 31 ++++++++++++++++++++++---------
 1 file changed, 22 insertions(+), 9 deletions(-)

diff --git a/lib/tar/checksum.c b/lib/tar/checksum.c
index a2a101a..af94ab4 100644
--- a/lib/tar/checksum.c
+++ b/lib/tar/checksum.c
@@ -1,15 +1,27 @@
 /* SPDX-License-Identifier: GPL-3.0-or-later */
 #include "internal.h"

-void update_checksum(tar_header_t *hdr)
+static unsigned int get_checksum(const tar_header_t *hdr)
 {
+	const unsigned char *header_start = (const unsigned char *)hdr;
+	const unsigned char *chksum_start = (const unsigned char *)hdr->chksum;
+	const unsigned char *header_end = header_start + sizeof(*hdr);
+	const unsigned char *chksum_end = chksum_start + sizeof(hdr->chksum);
+	const unsigned char *p;
 	unsigned int chksum = 0;
-	size_t i;

-	memset(hdr->chksum, ' ', sizeof(hdr->chksum));
+	for (p = header_start; p < chksum_start; p++)
+		chksum += *p;
+	for (; p < chksum_end; p++)
+		chksum += ' ';
+	for (; p < header_end; p++)
+		chksum += *p;
+	return chksum;
+}

-	for (i = 0; i < sizeof(*hdr); ++i)
-		chksum += ((unsigned char *)hdr)[i];
+void update_checksum(tar_header_t *hdr)
+{
+	unsigned int chksum = get_checksum(hdr);

 	sprintf(hdr->chksum, "%06o", chksum);
 	hdr->chksum[6] = '\0';
@@ -18,9 +30,10 @@ void update_checksum(tar_header_t *hdr)

 bool is_checksum_valid(const tar_header_t *hdr)
 {
-	tar_header_t copy;
+	unsigned int calculated_chksum = get_checksum(hdr);
+	uint64_t read_chksum;

-	memcpy(&copy, hdr, sizeof(*hdr));
-	update_checksum(&copy);
-	return memcmp(hdr, &copy, sizeof(*hdr)) == 0;
+	if (read_octal(hdr->chksum, sizeof(hdr->chksum), &read_chksum))
+		return 0;
+	return read_chksum == calculated_chksum;
 }
--
2.22.0

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] archive: Store checksum correctly
  2019-07-23 19:38   ` René Scharfe
  2019-07-23 20:08     ` René Scharfe
@ 2019-07-23 21:21     ` Junio C Hamano
  1 sibling, 0 replies; 8+ messages in thread
From: Junio C Hamano @ 2019-07-23 21:21 UTC (permalink / raw)
  To: René Scharfe; +Cc: Matt Turner, git, David Oberhollenzer

René Scharfe <l.s.r@web.de> writes:

>> Wow.  The choice of %07o is as old as the very original of tar-tree
>> implementation in our codebase, starting at ae64bbc1 ("tar-tree:
>> Introduce write_entry()", 2006-03-25).
>
> Actually it's already in 731ab9ccf2 ("[PATCH] create tar archives of
> tree on the fly", 2005-04-28).

Yup, after viewing "git show ae64bbc1" I found out the commit added
a new helper to do %07o without touching the existing one that did
the same.  Problem with relying on "git blame" too much X-<.

>> I think the updated behaviour matches Wikipedia [*1*] where it
>> spells out that 6 octal is followed by a NUL and a SP; it also says
>> various implementations do not adhere to this format---perhaps they
>> meant us ;-)
>
> OpenBSD's pax(1) does the same if I read
> https://github.com/openbsd/src/blob/master/bin/pax/tar.c correctly.

What's more interesting is that their verifier in tar_id() compares
the ulong value read from textual checksum with the ulong value
computed.  I agree with you that it would be the more robust way
than what is done by squshfs tools (ng).

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] archive: Store checksum correctly
  2019-07-23 20:08     ` René Scharfe
@ 2019-07-23 21:34       ` David Oberhollenzer
  0 siblings, 0 replies; 8+ messages in thread
From: David Oberhollenzer @ 2019-07-23 21:34 UTC (permalink / raw)
  To: René Scharfe, Matt Turner; +Cc: Junio C Hamano, git

Hi,

the implementation of tar2sqfs was primarily based on `man 5 tar`[0],
Wikipedia[1] and trial/error with archives generated by GNU tar and
samples from [2] which I also integrated as test cases.

On 7/23/19 10:08 PM, René Scharfe wrote:
> Am 23.07.19 um 21:38 schrieb René Scharfe:
>> is_checksum_valid() in
>> https://github.com/AgentD/squashfs-tools-ng/blob/master/lib/tar/checksum.c
>> compares the formatted checksum byte-by-byte.  That seems
>> unnecessarily strict.  Parsing and comparing the numerical value
>> of the checksum would be more forgiving, better adhere to POSIX and
>> might be a tiny bit quicker.
>

I agree with that. The current code was probably the simplest way to
move forward with existing code after implementing sqfs2tar and it stayed
that way, since it worked with all archives I tested it with.

As for every file format parser, tar2sqfs should of course be as tolerant
as possible in what it accepts.

> I mean something like the patch below.  Code and text size are bigger,
> but it's more lenient and writes less.  Untested.
> 
I applied and tested the patch. It looks similar to the solution I came up
with and the test cases still pass. Also, on my system tar2sqfs now accepts
git-archive tar balls which it didn't before applying the patch, so I
would go with this.

Thanks,

David


[0] https://www.freebsd.org/cgi/man.cgi?query=tar&sektion=5&manpath=FreeBSD+12.0-RELEASE&arch=default&format=html
[1] https://en.wikipedia.org/wiki/Tar_(computing)#File_format
[2] https://dev.gentoo.org/~mgorny/articles/portability-of-tar-features.html

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] archive: Store checksum correctly
  2019-07-23  2:57 [PATCH] archive: Store checksum correctly Matt Turner
  2019-07-23 16:49 ` Junio C Hamano
@ 2019-07-24  1:04 ` Jonathan Nieder
  1 sibling, 0 replies; 8+ messages in thread
From: Jonathan Nieder @ 2019-07-24  1:04 UTC (permalink / raw)
  To: Matt Turner; +Cc: git, David Oberhollenzer, Rene Scharfe

(cc: René Scharfe, "git archive" expert)
Matt Turner wrote:

> tar2sqfs (part of https://github.com/topics/tar2sqfs) rejects tarballs
> made with git archive with the message
>
>     invalid tar header checksum!
>
> tar2sqfs recomputes the tarball's checksum to verify it. Its checksum
> implementation agrees with GNU tar, which contains a comment that states
>
>     Fill in the checksum field.  It's formatted differently from the
>     other fields: it has [6] digits, a null, then a space ...
>
> Correcting this allows tar2sqfs to correctly process the tarballs made
> by git archive.
>
> Signed-off-by: Matt Turner <mattst88@gmail.com>
> ---
>  archive-tar.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)

Nice.  Is this something that can be covered in tests as well?  (See
t500* for existing "git archive" tests, and see test_lazy_prereq in case
you'd like the test to use an external tool like tar2sqfs that not all
users may have.)

Thanks,
Jonathan

(patch left unsnipped for reference)

> diff --git a/archive-tar.c b/archive-tar.c
> index 3e53aac1e6..f9a157bfd1 100644
> --- a/archive-tar.c
> +++ b/archive-tar.c
> @@ -215,7 +215,9 @@ static void prepare_header(struct archiver_args *args,
>  	memcpy(header->magic, "ustar", 6);
>  	memcpy(header->version, "00", 2);
>  
> -	xsnprintf(header->chksum, sizeof(header->chksum), "%07o", ustar_header_chksum(header));
> +	xsnprintf(header->chksum, sizeof(header->chksum), "%06o", ustar_header_chksum(header));
> +	header->chksum[6] = '\0';
> +	header->chksum[7] = ' ';
>  }
>  
>  static void write_extended_header(struct archiver_args *args,

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2019-07-24  1:04 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-07-23  2:57 [PATCH] archive: Store checksum correctly Matt Turner
2019-07-23 16:49 ` Junio C Hamano
2019-07-23 19:31   ` Jeff King
2019-07-23 19:38   ` René Scharfe
2019-07-23 20:08     ` René Scharfe
2019-07-23 21:34       ` David Oberhollenzer
2019-07-23 21:21     ` Junio C Hamano
2019-07-24  1:04 ` Jonathan Nieder

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).