* [PATCH] Add initial support for pax extended attributes @ 2018-05-22 10:05 Pedro Alvarez 2018-05-23 2:34 ` Junio C Hamano 2018-05-23 22:54 ` [PATCH v2 0/1] import-tars: read overlong names from pax extended header Pedro Alvarez 0 siblings, 2 replies; 6+ messages in thread From: Pedro Alvarez @ 2018-05-22 10:05 UTC (permalink / raw) To: git Cc: Pedro Alvarez Piedehierro, Felipe Contreras, Junio C Hamano, Johannes Schindelin From: Pedro Alvarez Piedehierro <palvarez89@gmail.com> Sometimes the tar files will contain pax extended attributes to deal with cases where the information needed doesn't fit in a standard ustar entry. One of these cases is when the path is larger than 100 characters. A pax entry will appear containing two standard ustart entries. The first entry will have an 'x' typeflag, and contain the the extended attributes. The pax extended attributes contain one or multiple records constructed as follows: "%d %s=%s\n", <length>, <keyword>, <value> This commit makes sure that we always read the extended attibutes from pax entries, and in the case of finding one, we parse its records looking for 'path' information. If this information is found, it's stored to be used in the next ustar entry. Information about the Pax Interchange Format can be found at: https://www.freebsd.org/cgi/man.cgi?manpath=FreeBSD+8-current&query=tar&sektion=5. Before this change, importing gcc tarballs[1] would fail with the following error: fast-import crash report: fast-import process: 82899 parent process : 82897 at 2018-05-21 12:35:27 +0000 fatal: Unsupported command: 29 atime=1516870168.93527949 Most Recent Commands Before Crash --------------------------------- M 644 :22495 gcc-7.3.0/libstdc++-v3/testsuite/20_util/duration/PaxHeaders.4467/comparison_operators M 644 :140367 gcc-7.3.0/gcc/ada/s-gloloc-mingw.adb M 644 :75143 gcc-7.3.0/gcc/testsuite/gcc.c-torture/execute/builtins/PaxHeaders.4467/strncat-chk-lib.c <snip> M 644 :135585 gcc-7.3.0/gcc/testsuite/c-c++-common/attr-warn-unused-result.c M 644 :54956 gcc-7.3.0/gcc/testsuite/go.test/test/fixedbugs/PaxHeaders.4467/bug335.dir M 644 :20632 27 mtime=1483272463.905435 * 29 atime=1516870168.93527949 [1]: http://ftp.gnu.org/gnu/gcc/gcc-7.3.0/gcc-7.3.0.tar.xz Signed-off-by: Pedro Alvarez <palvarez89@gmail.com> --- contrib/fast-import/import-tars.perl | 32 ++++++++++++++++++++++++++++++-- 1 file changed, 30 insertions(+), 2 deletions(-) diff --git a/contrib/fast-import/import-tars.perl b/contrib/fast-import/import-tars.perl index d60b4315ed..c2e54ec7a3 100755 --- a/contrib/fast-import/import-tars.perl +++ b/contrib/fast-import/import-tars.perl @@ -63,6 +63,8 @@ foreach my $tar_file (@ARGV) my $have_top_dir = 1; my ($top_dir, %files); + my $next_path = ''; + while (read(I, $_, 512) == 512) { my ($name, $mode, $uid, $gid, $size, $mtime, $chksum, $typeflag, $linkname, $magic, @@ -70,6 +72,13 @@ foreach my $tar_file (@ARGV) $prefix) = unpack 'Z100 Z8 Z8 Z8 Z12 Z12 Z8 Z1 Z100 Z6 Z2 Z32 Z32 Z8 Z8 Z*', $_; + + unless ($next_path eq '') { + # Recover name from previous extended header + $name = $next_path; + $next_path = ''; + } + last unless length($name); if ($name eq '././@LongLink') { # GNU tar extension @@ -90,13 +99,32 @@ foreach my $tar_file (@ARGV) Z8 Z1 Z100 Z6 Z2 Z32 Z32 Z8 Z8 Z*', $_; } - next if $name =~ m{/\z}; $mode = oct $mode; $size = oct $size; $mtime = oct $mtime; next if $typeflag == 5; # directory - if ($typeflag != 1) { # handle hard links later + if ($typeflag eq 'x') { # extended header + # If extended header, check for path + my $pax_header = ''; + while ($size > 0 && read(I, $_, 512) == 512) { + $pax_header = $pax_header . substr($_, 0, $size); + $size -= 512; + } + + my @lines = split /\n/, $pax_header; + foreach my $line (@lines) { + my ($len, $entry) = split / /, $line; + my ($key, $value) = split /=/, $entry; + if ($key eq 'path') { + $next_path = $value; + } + } + next; + } elsif ($name =~ m{/\z}) { + # If it's a folder, ignore + next; + } elsif ($typeflag != 1) { # handle hard links later print FI "blob\n", "mark :$next_mark\n"; if ($typeflag == 2) { # symbolic link print FI "data ", length($linkname), "\n", -- 2.11.0 ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH] Add initial support for pax extended attributes 2018-05-22 10:05 [PATCH] Add initial support for pax extended attributes Pedro Alvarez @ 2018-05-23 2:34 ` Junio C Hamano 2018-05-23 4:57 ` Jeff King 2018-05-23 22:54 ` [PATCH v2 0/1] import-tars: read overlong names from pax extended header Pedro Alvarez 1 sibling, 1 reply; 6+ messages in thread From: Junio C Hamano @ 2018-05-23 2:34 UTC (permalink / raw) To: Pedro Alvarez Cc: git, Pedro Alvarez Piedehierro, Felipe Contreras, Johannes Schindelin Pedro Alvarez <pedro.alvarez@codethink.co.uk> writes: > From: Pedro Alvarez Piedehierro <palvarez89@gmail.com> > Subject: [PATCH] Add initial support for pax extended attributes Lead it with the name of the area you are adding support for pax ext header, e.g. Subject: [PATCH] import-tars: read overlong names from pax extended header or something. > Sometimes the tar files will contain pax extended attributes to deal > with cases where the information needed doesn't fit in a standard > ustar entry. > > One of these cases is when the path is larger than 100 characters. A > pax entry will appear containing two standard ustart entries. The first u-start? us-tart? sound yummy. I think s/ustart entries/ustar headers/ > entry will have an 'x' typeflag, and contain the the extended attributes. s/contain/&s/ > > The pax extended attributes contain one or multiple records constructed as > follows: > > "%d %s=%s\n", <length>, <keyword>, <value> > This commit makes sure that we always read the extended attibutes from s/This commit makes sure/Make sure/; > pax entries, and in the case of finding one, we parse its records > looking for 'path' information. If this information is found, it's > stored to be used in the next ustar entry. > > Information about the Pax Interchange Format can be found at: > > https://www.freebsd.org/cgi/man.cgi?manpath=FreeBSD+8-current&query=tar&sektion=5. > Before this change, importing gcc tarballs[1] would fail with the > following error: > > fast-import crash report: > fast-import process: 82899 > parent process : 82897 > at 2018-05-21 12:35:27 +0000 > > fatal: Unsupported command: 29 atime=1516870168.93527949 Drop "Before this change, " and move the above to the very beginning of the proposed log message. The problem description is always "without this patch applied, we have this problem that needs to be fixed", so "Before this change" is an unnecessary thing to say. The remainder of the crash log may or may not be in the problem description, if we want to shoot for brevity. If I were writing a log message for this patch, I'd go for even shorter version, e.g. Importing gcc tarballs[1] with import-tars script (in contrib/) fails when hitting a pax extended header that records a long pathname. Teach the code to parse and grab information from pax extended headers, and reconstruct a long pathname that is split into multiple records, to correct this problem. The code to parse pax extended headers were written, consulting the Pax Interchange Format documentation [2]. [1] http://ftp.gnu.org/gnu/gcc/gcc-7.3.0/gcc-7.3.0.tar.xz [2] https://www.freebsd.org/cgi/man.cgi?manpath=FreeBSD+8-current&query=tar&sektion=5 > index d60b4315ed..c2e54ec7a3 100755 > --- a/contrib/fast-import/import-tars.perl > +++ b/contrib/fast-import/import-tars.perl > @@ -63,6 +63,8 @@ foreach my $tar_file (@ARGV) > my $have_top_dir = 1; > my ($top_dir, %files); > > + my $next_path = ''; > + > while (read(I, $_, 512) == 512) { > my ($name, $mode, $uid, $gid, $size, $mtime, > $chksum, $typeflag, $linkname, $magic, > @@ -70,6 +72,13 @@ foreach my $tar_file (@ARGV) > $prefix) = unpack 'Z100 Z8 Z8 Z8 Z12 Z12 > Z8 Z1 Z100 Z6 > Z2 Z32 Z32 Z8 Z8 Z*', $_; > + > + unless ($next_path eq '') { > + # Recover name from previous extended header > + $name = $next_path; > + $next_path = ''; > + } > + > last unless length($name); > if ($name eq '././@LongLink') { > # GNU tar extension > @@ -90,13 +99,32 @@ foreach my $tar_file (@ARGV) > Z8 Z1 Z100 Z6 > Z2 Z32 Z32 Z8 Z8 Z*', $_; > } > - next if $name =~ m{/\z}; > $mode = oct $mode; > $size = oct $size; > $mtime = oct $mtime; > next if $typeflag == 5; # directory > > - if ($typeflag != 1) { # handle hard links later > + if ($typeflag eq 'x') { # extended header > + # If extended header, check for path > + my $pax_header = ''; > + while ($size > 0 && read(I, $_, 512) == 512) { Would we ever get a short-read (i.e. we ask to read 512 bytes, syscall returns after reading only 256 bytes, even though next call to read would give the remaining 256 bytes and later ones)? If we do get a short-read, would that be an error (in which case, how are we handling it)? If it is not an error, should we continue reading, instead of leaving the loop? > + $pax_header = $pax_header . substr($_, 0, $size); > + $size -= 512; > + } > + > + my @lines = split /\n/, $pax_header; > + foreach my $line (@lines) { > + my ($len, $entry) = split / /, $line; > + my ($key, $value) = split /=/, $entry; > + if ($key eq 'path') { > + $next_path = $value; > + } > + } > + next; > + } elsif ($name =~ m{/\z}) { > + # If it's a folder, ignore It is a very good idea to add comment on why we are ignoring and doing 'next' here, but the tar format documentation you cited never uses the word 'folder'. s/folder/directory/, probably. > + next; > + } elsif ($typeflag != 1) { # handle hard links later > print FI "blob\n", "mark :$next_mark\n"; > if ($typeflag == 2) { # symbolic link > print FI "data ", length($linkname), "\n", Thanks. ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] Add initial support for pax extended attributes 2018-05-23 2:34 ` Junio C Hamano @ 2018-05-23 4:57 ` Jeff King 2018-05-23 23:38 ` Junio C Hamano 0 siblings, 1 reply; 6+ messages in thread From: Jeff King @ 2018-05-23 4:57 UTC (permalink / raw) To: Junio C Hamano Cc: Pedro Alvarez, git, Pedro Alvarez Piedehierro, Felipe Contreras, Johannes Schindelin On Wed, May 23, 2018 at 11:34:52AM +0900, Junio C Hamano wrote: > > @@ -90,13 +99,32 @@ foreach my $tar_file (@ARGV) > > Z8 Z1 Z100 Z6 > > Z2 Z32 Z32 Z8 Z8 Z*', $_; > > } > > - next if $name =~ m{/\z}; > > $mode = oct $mode; > > $size = oct $size; > > $mtime = oct $mtime; > > next if $typeflag == 5; # directory > > > > - if ($typeflag != 1) { # handle hard links later > > + if ($typeflag eq 'x') { # extended header > > + # If extended header, check for path > > + my $pax_header = ''; > > + while ($size > 0 && read(I, $_, 512) == 512) { > > Would we ever get a short-read (i.e. we ask to read 512 bytes, > syscall returns after reading only 256 bytes, even though next call > to read would give the remaining 256 bytes and later ones)? No, because perl's read() is buffered (you need sysread() to get a real syscall read). We might read fewer than 512 if we hit EOF, but I think that would be a truncated input, then, since ustar does everything in 512-byte records. I do think we'd fail to notice the truncation, which isn't ideal. But it looks like the rest of the script suffers from the same issue. If anybody cares, it might not be too hard to wrap all of the 512-byte read calls into a helper that dies on bogus input. I sort of assumed this was mostly a proof of concept script and nobody used it, though. :) It makes me wonder if there is a better-tested tar-reading module in CPAN that could be used (though at the expense of requiring an extra dependency). -Peff ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] Add initial support for pax extended attributes 2018-05-23 4:57 ` Jeff King @ 2018-05-23 23:38 ` Junio C Hamano 0 siblings, 0 replies; 6+ messages in thread From: Junio C Hamano @ 2018-05-23 23:38 UTC (permalink / raw) To: Jeff King Cc: Pedro Alvarez, git, Pedro Alvarez Piedehierro, Felipe Contreras, Johannes Schindelin Jeff King <peff@peff.net> writes: > I do think we'd fail to notice the truncation, which isn't ideal. But it > looks like the rest of the script suffers from the same issue. > > If anybody cares, it might not be too hard to wrap all of the 512-byte > read calls into a helper that dies on bogus input. Perhaps. In any case, the patch presented here is an improvement over the status quo, so let's move the world forward by taking it without any further "while at it" fixes, which can come later when people feel inclined to do so. Thanks, both, for writing and reviewing ;-) ^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH v2 0/1] import-tars: read overlong names from pax extended header 2018-05-22 10:05 [PATCH] Add initial support for pax extended attributes Pedro Alvarez 2018-05-23 2:34 ` Junio C Hamano @ 2018-05-23 22:54 ` Pedro Alvarez 2018-05-23 22:54 ` [PATCH v2 1/1] " Pedro Alvarez 1 sibling, 1 reply; 6+ messages in thread From: Pedro Alvarez @ 2018-05-23 22:54 UTC (permalink / raw) To: git; +Cc: gitster, peff, Pedro Alvarez Piedehierro From: Pedro Alvarez Piedehierro <palvarez89@gmail.com> Hello! In this version I've trimmed and improved the commit message as suggested. Regarding the error handling, as Jeff mentioned, could be improved in general in the entire script. But I guess I could do it if needed to get this patch approved. Thanks for reviewing and giving me some feedback! Pedro. Pedro Alvarez Piedehierro (1): import-tars: read overlong names from pax extended header contrib/fast-import/import-tars.perl | 31 +++++++++++++++++++++++++++++-- 1 file changed, 29 insertions(+), 2 deletions(-) -- 2.11.0 ^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH v2 1/1] import-tars: read overlong names from pax extended header 2018-05-23 22:54 ` [PATCH v2 0/1] import-tars: read overlong names from pax extended header Pedro Alvarez @ 2018-05-23 22:54 ` Pedro Alvarez 0 siblings, 0 replies; 6+ messages in thread From: Pedro Alvarez @ 2018-05-23 22:54 UTC (permalink / raw) To: git Cc: gitster, peff, Pedro Alvarez Piedehierro, Felipe Contreras, Johannes Schindelin From: Pedro Alvarez Piedehierro <palvarez89@gmail.com> Importing gcc tarballs[1] with import-tars script (in contrib) fails when hitting a pax extended header. Make sure we always read the extended attributes from the pax entries, and store the 'path' value if found to be used in the next ustar entry. The code to parse pax extended headers was written consulting the Pax Pax Interchange Format documentation [2]. [1] http://ftp.gnu.org/gnu/gcc/gcc-7.3.0/gcc-7.3.0.tar.xz [2] https://www.freebsd.org/cgi/man.cgi?manpath=FreeBSD+8-current&query=tar&sektion=5 Signed-off-by: Pedro Alvarez <palvarez89@gmail.com> --- contrib/fast-import/import-tars.perl | 31 +++++++++++++++++++++++++++++-- 1 file changed, 29 insertions(+), 2 deletions(-) diff --git a/contrib/fast-import/import-tars.perl b/contrib/fast-import/import-tars.perl index d60b4315ed..e800d9f5c9 100755 --- a/contrib/fast-import/import-tars.perl +++ b/contrib/fast-import/import-tars.perl @@ -63,6 +63,8 @@ foreach my $tar_file (@ARGV) my $have_top_dir = 1; my ($top_dir, %files); + my $next_path = ''; + while (read(I, $_, 512) == 512) { my ($name, $mode, $uid, $gid, $size, $mtime, $chksum, $typeflag, $linkname, $magic, @@ -70,6 +72,13 @@ foreach my $tar_file (@ARGV) $prefix) = unpack 'Z100 Z8 Z8 Z8 Z12 Z12 Z8 Z1 Z100 Z6 Z2 Z32 Z32 Z8 Z8 Z*', $_; + + unless ($next_path eq '') { + # Recover name from previous extended header + $name = $next_path; + $next_path = ''; + } + last unless length($name); if ($name eq '././@LongLink') { # GNU tar extension @@ -90,13 +99,31 @@ foreach my $tar_file (@ARGV) Z8 Z1 Z100 Z6 Z2 Z32 Z32 Z8 Z8 Z*', $_; } - next if $name =~ m{/\z}; $mode = oct $mode; $size = oct $size; $mtime = oct $mtime; next if $typeflag == 5; # directory - if ($typeflag != 1) { # handle hard links later + if ($typeflag eq 'x') { # extended header + # If extended header, check for path + my $pax_header = ''; + while ($size > 0 && read(I, $_, 512) == 512) { + $pax_header = $pax_header . substr($_, 0, $size); + $size -= 512; + } + + my @lines = split /\n/, $pax_header; + foreach my $line (@lines) { + my ($len, $entry) = split / /, $line; + my ($key, $value) = split /=/, $entry; + if ($key eq 'path') { + $next_path = $value; + } + } + next; + } elsif ($name =~ m{/\z}) { # directory + next; + } elsif ($typeflag != 1) { # handle hard links later print FI "blob\n", "mark :$next_mark\n"; if ($typeflag == 2) { # symbolic link print FI "data ", length($linkname), "\n", -- 2.11.0 ^ permalink raw reply related [flat|nested] 6+ messages in thread
end of thread, other threads:[~2018-05-23 23:38 UTC | newest] Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2018-05-22 10:05 [PATCH] Add initial support for pax extended attributes Pedro Alvarez 2018-05-23 2:34 ` Junio C Hamano 2018-05-23 4:57 ` Jeff King 2018-05-23 23:38 ` Junio C Hamano 2018-05-23 22:54 ` [PATCH v2 0/1] import-tars: read overlong names from pax extended header Pedro Alvarez 2018-05-23 22:54 ` [PATCH v2 1/1] " Pedro Alvarez
Code repositories for project(s) associated with this public inbox https://80x24.org/mirrors/git.git This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).