git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "René Scharfe" <l.s.r@web.de>
To: Jeff King <peff@peff.net>, git@vger.kernel.org
Subject: Re: [PATCH 1/2] archive-tar: write extended headers for file sizes >= 8GB
Date: Tue, 21 Jun 2016 00:54:11 +0200	[thread overview]
Message-ID: <57687413.3030609@web.de> (raw)
In-Reply-To: <20160616043733.GA18323@sigill.intra.peff.net>

Am 16.06.2016 um 06:37 schrieb Jeff King:
> The ustar format has a fixed-length field for the size of
> each file entry which is supposed to contain up to 11 bytes
> of octal-formatted data plus a NUL or space terminator.
>
> These means that the largest size we can represent is
> 077777777777, or 1 byte short of 8GB. The correct solution
> for a larger file, according to POSIX.1-2001, is to add an
> extended pax header, similar to how we handle long
> filenames. This patch does that, and writes zero for the
> size field in the ustar header (the last bit is not
> mentioned by POSIX, but it matches how GNU tar behaves with
> --format=pax).
>
> This should be a strict improvement over the current
> behavior, which is to die in xsnprintf with a "BUG".
> However, there's some interesting history here.
>
> Prior to f2f0267 (archive-tar: use xsnprintf for trivial
> formatting, 2015-09-24), we silently overflowed the "size"
> field. The extra bytes ended up in the "mtime" field of the
> header, which was then immediately written itself,
> overwriting our extra bytes. What that means depends on how
> many bytes we wrote.
>
> If the size was 64GB or greater, then we actually overflowed
> digits into the mtime field, meaning our value was was
> effectively right-shifted by those lost octal digits. And
> this patch is again a strict improvement over that.
>
> But if the size was between 8GB and 64GB, then our 12-byte
> field held all of the actual digits, and only our NUL
> terminator overflowed. According to POSIX, there should be a
> NUL or space at the end of the field. However, GNU tar seems
> to be lenient here, and will correctly parse a size up 64GB
> (minus one) from the field. So sizes in this range might
> have just worked, depending on the implementation reading
> the tarfile.
>
> This patch is mostly still an improvement there, as the 8GB
> limit is specifically mentioned in POSIX as the correct
> limit. But it's possible that it could be a regression
> (versus the pre-f2f0267 state) if all of the following are
> true:
>
>    1. You have a file between 8GB and 64GB.
>
>    2. Your tar implementation _doesn't_ know about pax
>       extended headers.
>
>    3. Your tar implementation _does_ parse 12-byte sizes from
>       the ustar header without a delimiter.
>
> It's probably not worth worrying about such an obscure set
> of conditions, but I'm documenting it here just in case.
>
> There's no test included here. I did confirm that this works
> (using GNU tar) with:
>
>    $ dd if=/dev/zero seek=64G bs=1 count=1 of=huge
>    $ git add huge
>    $ git commit -q -m foo
>    $ git archive HEAD | head -c 10000 | tar tvf - 2>/dev/null
>    -rw-rw-r-- root/root 68719476737 2016-06-15 21:07 huge
>
> Pre-f2f0267, this would yield a bogus size of 8GB, and
> post-f2f0267, git-archive simply dies.
>
> Unfortunately, it's quite an expensive test to run. For one
> thing, unless your filesystem supports files with holes, it
> takes 64GB of disk space (you might think piping straight to
> `hash-object --stdin` would be better, but it's not; that
> tries to buffer all 64GB in RAM!). Furthermore, hashing and
> compressing the object takes several minutes of CPU time.
>
> We could ship just the resulting compressed object data as a
> loose object, but even that takes 64MB. So sadly, this code
> path remains untested in the test suite.

If we could set the limit to a lower value than 8GB for testing then we 
could at least check if the extended header is written, e.g. if 
ustar_size() could be convinced to return 0 every time using a hidden 
command line parameter or an environment variable or something better.

>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>   archive-tar.c | 28 +++++++++++++++++++++++++++-
>   1 file changed, 27 insertions(+), 1 deletion(-)
>
> diff --git a/archive-tar.c b/archive-tar.c
> index cb99df2..7340b64 100644
> --- a/archive-tar.c
> +++ b/archive-tar.c
> @@ -137,6 +137,20 @@ static void strbuf_append_ext_header(struct strbuf *sb, const char *keyword,
>   	strbuf_addch(sb, '\n');
>   }
>
> +/*
> + * Like strbuf_append_ext_header, but for numeric values.
> + */
> +static void strbuf_append_ext_header_uint(struct strbuf *sb,
> +					  const char *keyword,
> +					  uintmax_t value)
> +{
> +	char buf[40]; /* big enough for 2^128 in decimal, plus NUL */
> +	int len;
> +
> +	len = xsnprintf(buf, sizeof(buf), "%"PRIuMAX, value);
> +	strbuf_append_ext_header(sb, keyword, buf, len);
> +}
> +
>   static unsigned int ustar_header_chksum(const struct ustar_header *header)
>   {
>   	const unsigned char *p = (const unsigned char *)header;
> @@ -163,12 +177,21 @@ static size_t get_path_prefix(const char *path, size_t pathlen, size_t maxlen)
>   	return i;
>   }
>
> +static inline unsigned long ustar_size(uintmax_t size)
> +{
> +	if (size < 077777777777UL)

Shouldn't that be less-or-equal?

> +		return size;
> +	else
> +		return 0;
> +}
> +
>   static void prepare_header(struct archiver_args *args,
>   			   struct ustar_header *header,
>   			   unsigned int mode, unsigned long size)
>   {
>   	xsnprintf(header->mode, sizeof(header->mode), "%07o", mode & 07777);
> -	xsnprintf(header->size, sizeof(header->size), "%011lo", S_ISREG(mode) ? size : 0);
> +	xsnprintf(header->size, sizeof(header->size), "%011lo",
> +		  S_ISREG(mode) ? ustar_size(size) : 0);
>   	xsnprintf(header->mtime, sizeof(header->mtime), "%011lo", (unsigned long) args->time);
>
>   	xsnprintf(header->uid, sizeof(header->uid), "%07o", 0);
> @@ -267,6 +290,9 @@ static int write_tar_entry(struct archiver_args *args,
>   			memcpy(header.linkname, buffer, size);
>   	}
>
> +	if (ustar_size(size) != size)
> +		strbuf_append_ext_header_uint(&ext_header, "size", size);

It needs "S_ISREG(mode) && " as well, no?  In practice it probably 
doesn't matter (until someone stores a 8GB long symlink target), but the 
size field should only be set for regular files.

> +
>   	prepare_header(args, &header, mode, size);
>
>   	if (ext_header.len > 0) {
>

  reply	other threads:[~2016-06-20 22:54 UTC|newest]

Thread overview: 61+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-06-16  4:35 [PATCH 0/2] friendlier handling of overflows in archive-tar Jeff King
2016-06-16  4:37 ` [PATCH 1/2] archive-tar: write extended headers for file sizes >= 8GB Jeff King
2016-06-20 22:54   ` René Scharfe [this message]
2016-06-21 15:59     ` Jeff King
2016-06-21 16:02       ` Jeff King
2016-06-21 20:42       ` René Scharfe
2016-06-21 20:57         ` René Scharfe
2016-06-21 21:04           ` Jeff King
2016-06-22  5:46             ` René Scharfe
2016-06-21 21:02         ` Jeff King
2016-06-22  5:46           ` René Scharfe
2016-06-23 19:21             ` Jeff King
2016-06-21 20:54       ` René Scharfe
2016-06-21 19:44   ` Robin H. Johnson
2016-06-21 20:57     ` Jeff King
2016-06-16  4:37 ` [PATCH 2/2] archive-tar: write extended headers for far-future mtime Jeff King
2016-06-20 22:54   ` René Scharfe
2016-06-22  5:46     ` René Scharfe
2016-06-23 19:22       ` Jeff King
2016-06-23 21:38         ` René Scharfe
2016-06-23 21:39           ` Jeff King
2016-06-16 17:55 ` [PATCH 0/2] friendlier handling of overflows in archive-tar Junio C Hamano
2016-06-21 16:16 ` Jeff King
2016-06-21 16:16   ` [PATCH v2 1/2] archive-tar: write extended headers for file sizes >= 8GB Jeff King
2016-06-21 16:17   ` [PATCH v2 2/2] archive-tar: write extended headers for far-future mtime Jeff King
2016-06-21 18:43   ` [PATCH 0/2] friendlier handling of overflows in archive-tar Junio C Hamano
2016-06-23 23:15   ` [PATCH v3] " Jeff King
2016-06-23 23:20     ` [PATCH v3 1/4] t5000: test tar files that overflow ustar headers Jeff King
2016-06-23 23:31       ` Jeff King
2016-06-24 16:38       ` Johannes Sixt
2016-06-24 16:46         ` Jeff King
2016-06-24 17:05           ` Johannes Sixt
2016-06-24 19:39             ` [PATCH 0/4] portable signal-checking in tests Jeff King
2016-06-24 19:43               ` [PATCH 1/4] tests: factor portable signal check out of t0005 Jeff King
2016-06-24 20:52                 ` Johannes Sixt
2016-06-24 21:05                   ` Jeff King
2016-06-24 21:32                     ` Johannes Sixt
2016-06-24 19:44               ` [PATCH 2/4] t0005: use test_match_signal as appropriate Jeff King
2016-06-24 19:45               ` [PATCH 3/4] test_must_fail: use test_match_signal Jeff King
2016-06-24 19:45               ` [PATCH 4/4] t/lib-git-daemon: " Jeff King
2016-06-24 19:48               ` [PATCH 0/4] portable signal-checking in tests Jeff King
2016-06-24 18:56       ` [PATCH v3 1/4] t5000: test tar files that overflow ustar headers Junio C Hamano
2016-06-24 19:07         ` Jeff King
2016-06-24 19:44           ` Junio C Hamano
2016-06-24 20:58           ` Eric Sunshine
2016-06-24 21:09             ` Jeff King
2016-06-24 20:58           ` Jeff King
2016-06-24 22:41             ` Junio C Hamano
2016-06-24 23:22               ` Jeff King
2016-06-23 23:21     ` [PATCH v3 2/4] archive-tar: write extended headers for file sizes >= 8GB Jeff King
2016-06-24 19:01       ` Junio C Hamano
2016-06-24 19:10         ` Jeff King
2016-06-24 19:45           ` Junio C Hamano
2016-06-24 19:46             ` Jeff King
2016-06-23 23:21     ` [PATCH v3 3/4] archive-tar: write extended headers for far-future mtime Jeff King
2016-06-24 19:06       ` Junio C Hamano
2016-06-24 19:16         ` Jeff King
2016-06-23 23:21     ` [PATCH v3 4/4] archive-tar: drop return value Jeff King
2016-06-24 11:49       ` Remi Galan Alfonso
2016-06-24 13:13         ` Jeff King
2016-06-24 19:10           ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=57687413.3030609@web.de \
    --to=l.s.r@web.de \
    --cc=git@vger.kernel.org \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).