git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: Pedro Alvarez <pedro.alvarez@codethink.co.uk>
Cc: git@vger.kernel.org,
	Pedro Alvarez Piedehierro <palvarez89@gmail.com>,
	Felipe Contreras <felipe.contreras@gmail.com>,
	Johannes Schindelin <Johannes.Schindelin@gmx.de>
Subject: Re: [PATCH] Add initial support for pax extended attributes
Date: Wed, 23 May 2018 11:34:52 +0900	[thread overview]
Message-ID: <xmqqd0xnw14j.fsf@gitster-ct.c.googlers.com> (raw)
In-Reply-To: <20180522100548.29881-1-pedro.alvarez@codethink.co.uk> (Pedro Alvarez's message of "Tue, 22 May 2018 11:05:48 +0100")

Pedro Alvarez <pedro.alvarez@codethink.co.uk> writes:

> From: Pedro Alvarez Piedehierro <palvarez89@gmail.com>
> Subject: [PATCH] Add initial support for pax extended attributes

Lead it with the name of the area you are adding support for pax ext
header, e.g.

    Subject: [PATCH] import-tars: read overlong names from pax extended header

or something.

> Sometimes the tar files will contain pax extended attributes to deal
> with cases where the information needed doesn't fit in a standard
> ustar entry.
>
> One of these cases is when the path is larger than 100 characters. A
> pax entry will appear containing two standard ustart entries. The first

u-start? us-tart? sound yummy.  I think s/ustart entries/ustar headers/

> entry will have an 'x' typeflag, and contain the the extended attributes.

s/contain/&s/

>
> The pax extended attributes contain one or multiple records constructed as
> follows:
>
>     "%d %s=%s\n", <length>, <keyword>, <value>

> This commit makes sure that we always read the extended attibutes from

s/This commit makes sure/Make sure/;

> pax entries, and in the case of finding one, we parse its records
> looking for 'path' information. If this information is found, it's
> stored to be used in the next ustar entry.
>
> Information about the Pax Interchange Format can be found at:
>
>     https://www.freebsd.org/cgi/man.cgi?manpath=FreeBSD+8-current&query=tar&sektion=5.

> Before this change, importing gcc tarballs[1] would fail with the
> following error:
>
>     fast-import crash report:
>         fast-import process: 82899
>         parent process     : 82897
>         at 2018-05-21 12:35:27 +0000
>
>     fatal: Unsupported command: 29 atime=1516870168.93527949

Drop "Before this change, " and move the above to the very beginning
of the proposed log message.  The problem description is always
"without this patch applied, we have this problem that needs to be
fixed", so "Before this change" is an unnecessary thing to say.

The remainder of the crash log may or may not be in the problem
description, if we want to shoot for brevity.  If I were writing a
log message for this patch, I'd go for even shorter version, e.g.

	Importing gcc tarballs[1] with import-tars script (in
	contrib/) fails when hitting a pax extended header that
	records a long pathname.

	Teach the code to parse and grab information from pax
	extended headers, and reconstruct a long pathname that is
	split into multiple records, to correct this problem.

        The code to parse pax extended headers were written,
        consulting the Pax Interchange Format documentation [2].

	[1] http://ftp.gnu.org/gnu/gcc/gcc-7.3.0/gcc-7.3.0.tar.xz
	[2] https://www.freebsd.org/cgi/man.cgi?manpath=FreeBSD+8-current&query=tar&sektion=5

> index d60b4315ed..c2e54ec7a3 100755
> --- a/contrib/fast-import/import-tars.perl
> +++ b/contrib/fast-import/import-tars.perl
> @@ -63,6 +63,8 @@ foreach my $tar_file (@ARGV)
>  	my $have_top_dir = 1;
>  	my ($top_dir, %files);
>  
> +	my $next_path = '';
> +
>  	while (read(I, $_, 512) == 512) {
>  		my ($name, $mode, $uid, $gid, $size, $mtime,
>  			$chksum, $typeflag, $linkname, $magic,
> @@ -70,6 +72,13 @@ foreach my $tar_file (@ARGV)
>  			$prefix) = unpack 'Z100 Z8 Z8 Z8 Z12 Z12
>  			Z8 Z1 Z100 Z6
>  			Z2 Z32 Z32 Z8 Z8 Z*', $_;
> +
> +		unless ($next_path eq '') {
> +			# Recover name from previous extended header
> +			$name = $next_path;
> +			$next_path = '';
> +		}
> +
>  		last unless length($name);
>  		if ($name eq '././@LongLink') {
>  			# GNU tar extension
> @@ -90,13 +99,32 @@ foreach my $tar_file (@ARGV)
>  			Z8 Z1 Z100 Z6
>  			Z2 Z32 Z32 Z8 Z8 Z*', $_;
>  		}
> -		next if $name =~ m{/\z};
>  		$mode = oct $mode;
>  		$size = oct $size;
>  		$mtime = oct $mtime;
>  		next if $typeflag == 5; # directory
>  
> -		if ($typeflag != 1) { # handle hard links later
> +		if ($typeflag eq 'x') { # extended header
> +			# If extended header, check for path
> +			my $pax_header = '';
> +			while ($size > 0 && read(I, $_, 512) == 512) {

Would we ever get a short-read (i.e. we ask to read 512 bytes,
syscall returns after reading only 256 bytes, even though next call
to read would give the remaining 256 bytes and later ones)?

If we do get a short-read, would that be an error (in which case,
how are we handling it)?  If it is not an error, should we continue
reading, instead of leaving the loop?

> +				$pax_header = $pax_header . substr($_, 0, $size);
> +				$size -= 512;
> +			}
> +
> +			my @lines = split /\n/, $pax_header;
> +			foreach my $line (@lines) {
> +				my ($len, $entry) = split / /, $line;
> +				my ($key, $value) = split /=/, $entry;
> +				if ($key eq 'path') {
> +					$next_path = $value;
> +				}
> +			}
> +			next;
> +		} elsif ($name =~ m{/\z}) {
> +			# If it's a folder, ignore

It is a very good idea to add comment on why we are ignoring and
doing 'next' here, but the tar format documentation you cited never
uses the word 'folder'.  s/folder/directory/, probably.

> +			next;
> +		} elsif ($typeflag != 1) { # handle hard links later
>  			print FI "blob\n", "mark :$next_mark\n";
>  			if ($typeflag == 2) { # symbolic link
>  				print FI "data ", length($linkname), "\n",

Thanks.

  reply	other threads:[~2018-05-23  2:35 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-05-22 10:05 [PATCH] Add initial support for pax extended attributes Pedro Alvarez
2018-05-23  2:34 ` Junio C Hamano [this message]
2018-05-23  4:57   ` Jeff King
2018-05-23 23:38     ` Junio C Hamano
2018-05-23 22:54 ` [PATCH v2 0/1] import-tars: read overlong names from pax extended header Pedro Alvarez
2018-05-23 22:54   ` [PATCH v2 1/1] " Pedro Alvarez

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqd0xnw14j.fsf@gitster-ct.c.googlers.com \
    --to=gitster@pobox.com \
    --cc=Johannes.Schindelin@gmx.de \
    --cc=felipe.contreras@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=palvarez89@gmail.com \
    --cc=pedro.alvarez@codethink.co.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).