From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
	by dcvr.yhbt.net (Postfix) with ESMTP id ECE512018A
	for ; Fri, 24 Jun 2016 19:07:49 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1751082AbcFXTHs (ORCPT );
	Fri, 24 Jun 2016 15:07:48 -0400
Received: from cloud.peff.net ([50.56.180.127]:59884 "HELO cloud.peff.net"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with SMTP
	id S1750876AbcFXTHr (ORCPT );
	Fri, 24 Jun 2016 15:07:47 -0400
Received: (qmail 29712 invoked by uid 102); 24 Jun 2016 19:07:47 -0000
Received: from Unknown (HELO peff.net) (10.0.1.2)
	by cloud.peff.net (qpsmtpd/0.84) with SMTP; Fri, 24 Jun 2016 15:07:47 -0400
Received: (qmail 21691 invoked by uid 107); 24 Jun 2016 19:08:02 -0000
Received: from sigill.intra.peff.net (HELO sigill.intra.peff.net) (10.0.0.7)
	by peff.net (qpsmtpd/0.84) with SMTP; Fri, 24 Jun 2016 15:08:02 -0400
Received: by sigill.intra.peff.net (sSMTP sendmail emulation); Fri, 24 Jun 2016 15:07:45 -0400
Date: Fri, 24 Jun 2016 15:07:44 -0400
From: Jeff King
To: Junio C Hamano
Cc: git@vger.kernel.org, René Scharfe, "Robin H. Johnson"
Subject: Re: [PATCH v3 1/4] t5000: test tar files that overflow ustar headers
Message-ID: <20160624190744.GA32118@sigill.intra.peff.net>
References: <20160623231512.GA27683@sigill.intra.peff.net>
	<20160623232041.GA3668@sigill.intra.peff.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
In-Reply-To:
Sender: git-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: git@vger.kernel.org

On Fri, Jun 24, 2016 at 11:56:19AM -0700, Junio C Hamano wrote:

> Jeff King writes:
>
> > The ustar format only has room for 11 (or 12, depending on
> > some implementations) octal digits for the size and mtime of
> > each file. After this, we have to add pax extended headers
> > to specify the real data, and git does not yet know how to
> > do so.
>
> I am not a native speaker but "After" above made me hiccup.  I think
> I am correct to understand that it means "after passing this limit",
> aka "to represent files bigger or newer than these", but still it
> felt somewhat strange.

Yeah, I agree that it reads badly. I'm not sure what I was thinking.
I'll tweak it in the re-roll.

> > +# See if our system tar can handle a tar file with huge sizes and dates far in
> > +# the future, and that we can actually parse its output.
> > +#
> > +# The reference file was generated by GNU tar, and the magic time and size are
> > +# both octal 01000000000001, which overflows normal ustar fields.
> > +#
> > +# When parsing, we'll pull out only the year from the date; that
> > +# avoids any question of timezones impacting the result.
>
> ... as long as the month-day part is not close to the year boundary.
> So this explanation is insufficient to convince the reader that
> "that avoids any question" is correct, without saying that it is in
> August of year 4147.

I thought that part didn't need to be said, but I can say it
(technically we can include the month, too, but I don't think that level
of accuracy is really important for these tests).

> > +tar_info () {
> > +	"$TAR" tvf "$1" | awk '{print $3 " " $4}' | cut -d- -f1
> > +}
>
> A blank after the shell function to make it easier to see the
> boundary.

I was intentionally trying to couple it with prereq below, as the
comment describes both of them.

> Seeing an awk piped into cut always makes me want to suggest a
> single sed/awk/perl invocation.

I want the auto-splitting of awk, but then to auto-split the result
using a different delimiter. Is there a not-painful way to do that in
awk?

I could certainly come up with a regex to do it in sed, but I wanted to
keep the parsing as liberal and generic as possible.

Certainly I could do it in perl, but I had the general impression that
we prefer to keep the dependency on perl to a minimum. Maybe it doesn't
matter.
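
On the awk question, one possibility (a sketch, not something taken from
the patch) is awk's built-in split(), which re-splits a single field on a
different delimiter. The function name tar_info_awk and the sample line
are made up for illustration; the sample only mimics the usual "tar tvf"
column layout, and the function reads that listing on stdin instead of
running "$TAR" itself.

```shell
# Hypothetical variant of tar_info: reads "tar tvf" output on stdin and
# prints "<size> <year>", using split() in place of the trailing cut.
tar_info_awk () {
	awk '{ split($4, d, "-"); print $3 " " d[1] }'
}

# Made-up listing line in the usual "tar tvf" layout (illustration only).
line='-rw-rw-r-- root/root 68719476737 4147-08-20 03:32 huge'
printf '%s\n' "$line" | tar_info_awk
# prints: 68719476737 4147
```

This keeps awk's whitespace auto-splitting for the columns and only
applies the "-" delimiter to the date field.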

> > +# We expect git to die with SIGPIPE here (otherwise we
> > +# would generate the whole 64GB).
> > +test_expect_failure BUNZIP 'generate tar with huge size' '
> > +	{
> > +		git archive HEAD
> > +		echo $? >exit-code
> > +	} | head -c 4096 >huge.tar &&
> > +	echo 141 >expect &&
> > +	test_cmp expect exit-code
> > +'
>
> "head -c" is GNU-ism, isn't it?

You're right; for some reason I thought it was in POSIX. We do have a
couple instances of it, but they are all in the valgrind setup code
(which I guess most people don't ever run).

> "dd bs=1 count=4096" is hopefully more portable.

Hmm. I always wonder whether dd is actually very portable, but we do use
it already, at least.

Perhaps the perl monstrosity in t9300 could be replaced with that, too.

> ksh signal death you already know about.  I wonder if we want to
> expose something like list_contains as a friend of test_cmp.
>
> 	list_contains 141,269 $(cat exit-code)

I think we would want something more like:

  test_signal_match 13 $(cat exit-code)

Each call site should not have to know about every signal convention
(and in your example, the magic "3" of Windows is left out).

-Peff
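
A rough sketch of what such a helper could look like, assuming only the
three conventions mentioned in this thread: 128 + signal (the 141 case),
256 + signal (the 269 case, as ksh reports it), and a bare 3 on Windows.
The name test_signal_match comes from the proposal above; the body is an
assumption for illustration, not an existing test-lib function.

```shell
# Hypothetical helper: succeed if exit code $2 looks like death by
# signal number $1 under any of the conventions named in the thread.
test_signal_match () {
	if test "$2" = "$(($1 + 128))"
	then
		# common shell convention: 128 + signal number
		return 0
	elif test "$2" = "$(($1 + 256))"
	then
		# ksh convention: 256 + signal number
		return 0
	elif test "$2" = 3
	then
		# Windows: signal death is reported as a plain 3
		return 0
	fi
	return 1
}
```

A call site would then only name the signal, e.g.
"test_signal_match 13 $(cat exit-code)" for SIGPIPE, and the per-platform
knowledge stays in one place.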