git@vger.kernel.org mailing list mirror (one of many)
From: Jeff King <peff@peff.net>
To: Keegan Carruthers-Smith <keegan.csmith@gmail.com>
Cc: git@vger.kernel.org
Subject: Re: git archive generates tar with malformed pax extended attribute
Date: Fri, 24 May 2019 04:13:37 -0400
Message-ID: <20190524081337.GA9082@sigill.intra.peff.net>
In-Reply-To: <CAMVcy0ThtcDNjqat0+nQ4B91hC30NTUe=RW8v9WDxA2Q-4SyRA@mail.gmail.com>

On Fri, May 24, 2019 at 09:35:51AM +0200, Keegan Carruthers-Smith wrote:

> > I can't reproduce on Linux with either GNU tar (1.30) or bsdtar 3.3.3
> > (from Debian's bsdtar package). What does your "tar --version" say?
> 
> bsdtar 2.8.3 - libarchive 2.8.3

Interesting. I wonder if there was a libarchive bug that was fixed
between 2.8.3 and 3.3.3.

> > Git does write a pax header with the commit id in it as a comment.
> > Presumably that's what it's complaining about (but it is not malformed
> > according to any tar I've tried). If you feed git-archive a tree rather
> > than a commit, that is omitted. What does:
> >
> >   git archive --format tar c21b98da2^{tree} | tar tf - >/dev/null
> >
> > say? If it doesn't complain, then we know it's indeed the pax comment
> > field.
> 
> It also complains
> 
>   $ git archive --format tar c21b98da2^{tree} | tar tf - >/dev/null
>   tar: Ignoring malformed pax extended attribute
>   tar: Error exit delayed from previous errors.

Ah, OK. So it's not the comment field at all, but some other entry.
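
A way to double-check the comment-header question without depending on
any particular tar implementation is git get-tar-commit-id, which reads
the start of the archive and exits non-zero when no commit ID is
recorded there. A minimal sketch; the helper name has_commit_header is
hypothetical and only for illustration:

```shell
# Succeeds if the archive for the given commit-ish or tree-ish carries
# the pax comment header holding a commit ID, fails otherwise.
# (has_commit_header is a made-up name, not part of Git.)
has_commit_header() {
    # get-tar-commit-id only reads the first 1024 bytes of its input
    # and exits with status 1 when no commit ID is found.
    git archive --format=tar "$1" | git get-tar-commit-id >/dev/null
}
```

For a commit such as c21b98da2 this should succeed, and for
c21b98da2^{tree} it should fail, which is consistent with the warning
coming from some other entry in the archive.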

> Some more context: I work at Sourcegraph.com. We mirror a lot of repos
> from github.com. We usually interact with a working copy by running
> git archive on it in our infrastructure. This is the first repository
> that I have noticed which produces this error. An interesting thing to
> note is that the commit metadata contains a lot of non-ASCII text,
> which was my guess at what may be tripping up the tar creation.

Yeah, though the only thing that makes it into the tarfile is the actual
tree entries. I'd imagine the file content is not likely to be a source
of problems, as it's common to see binary gunk there. Most of the
filenames are pretty mundane, but this symlink destination is a little
funny:

  $ git archive ... | tar tvf - | grep nicovideo4as.swc
  lrwxrwxrwx root/root         0 2019-05-24 03:05 libs/nicovideo4as.swc -> PK\003\004\024

That's not the full story, though. It is indeed a symlink in the
tree:

  $ git ls-tree -r HEAD libs/nicovideo4as.swc
  120000 blob ec3137b5fcaeae25cf67927068af116517683806	libs/nicovideo4as.swc

But the contents of that blob, which should be the destination filename,
are definitely not:

  $ git cat-file blob ec3137b5f | wc -c
  57804
  $ git cat-file blob ec3137b5f | xxd | head -1
  00000000: 504b 0304 1400 0800 0800 5069 694e 0000  PK........PiiN..

There's quite a bit more data there: 57804 bytes, starting with the
"PK\003\004" magic of a ZIP file. What tar showed us stops at the first
NUL (the sixth byte), which does not seem surprising.
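
The manual check above could be generalized to scan a whole tree for
symlink entries whose blob is implausibly large for a path. A rough
sketch, assuming a 4096-byte limit (an arbitrary threshold chosen for
illustration, not something Git enforces) and a hypothetical helper
name find_big_symlinks:

```shell
# List symlink entries (mode 120000) whose target blob exceeds a size
# limit. Prints "<size><TAB><path>" for each suspicious entry.
# Usage: find_big_symlinks [tree-ish] [max-bytes]
find_big_symlinks() {
    treeish=${1:-HEAD}
    limit=${2:-4096}    # assumed threshold, purely illustrative
    # "ls-tree -l" prints "<mode> <type> <object> <size>\t<path>".
    git ls-tree -r -l "$treeish" |
    awk -F '\t' -v limit="$limit" '{
        # Split the metadata before the tab on runs of whitespace,
        # since the size column is padded with spaces.
        split($1, meta, " ")
        if (meta[1] == "120000" && meta[4] + 0 > limit)
            print meta[4] "\t" $2
    }'
}
```

Run against the repository discussed here, this would be expected to
flag libs/nicovideo4as.swc with its 57804-byte "target".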

It's possible Git is doing the wrong thing on the writing side, but
given that newer versions of bsdtar handle it fine, I'd guess that the
old one simply had problems consuming poorly formed symlink filenames.

-Peff
