git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "René Scharfe" <l.s.r@web.de>
To: suzuki toshiya <mpsuzuki@hiroshima-u.ac.jp>
Cc: "git@vger.kernel.org" <git@vger.kernel.org>
Subject: Re: [PATCH] git-archive: accept --owner and --group like GNU tar
Date: Tue, 2 Jan 2018 22:36:27 +0100	[thread overview]
Message-ID: <e45b654e-4fa5-2ad0-4a38-cf853ab2f0df@web.de> (raw)
In-Reply-To: <5A4B2DA5.907@hiroshima-u.ac.jp>

Am 02.01.2018 um 07:58 schrieb suzuki toshiya:
> Dear René ,
> 
> René Scharfe wrote:
>> Am 29.12.2017 um 15:05 schrieb suzuki toshiya:
>>> The ownership of files created by git-archive is always
>>> root:root. Add --owner and --group options which work
>>> like the GNU tar equivalent to allow overriding these
>>> defaults.
>>
>> In which situations do you use the new options?
>>
>> (The sender would need to know the names and/or IDs on the receiving
>> end.  And the receiver would need to be root to set both IDs, or be a
>> group member to set the group ID; I guess the latter is more common.)
> 
> Thank you for asking the background.
> 
> In the case that additional contents are appended to the tar file
> generated by git-archive, the part by git-archive and the part
> appended by common tar would have different UID/GID, because common
> tar preserves the UID/GID of the original files.
> 
> Of cource, both of GNU tar and bsdtar have the options to set
> UID/GID manually, but their syntax are different.
> 
> In the recent source package of poppler (poppler.freedesktop.org),
> there are 2 sets of UID/GIDs are found:
> https://poppler.freedesktop.org/poppler-0.62.0.tar.xz
> 
> I've discussed with the maintainers of poppler, and there was a
> suggestion to propose a feature to git.
> https://lists.freedesktop.org/archives/poppler/2017-December/012739.html

So this is in the context of generating release tarballs that contain
untracked files as well.  That's done in Git's own Makefile, too:

  dist: git-archive$(X) configure
          ./git-archive --format=tar \
                  --prefix=$(GIT_TARNAME)/ HEAD^{tree} > $(GIT_TARNAME).tar
          @mkdir -p $(GIT_TARNAME)
          @cp configure $(GIT_TARNAME)
          @echo $(GIT_VERSION) > $(GIT_TARNAME)/version
          @$(MAKE) -C git-gui TARDIR=../$(GIT_TARNAME)/git-gui dist-version
          $(TAR) rf $(GIT_TARNAME).tar \
                  $(GIT_TARNAME)/configure \
                  $(GIT_TARNAME)/version \
                  $(GIT_TARNAME)/git-gui/version
          @$(RM) -r $(GIT_TARNAME)
          gzip -f -9 $(GIT_TARNAME).tar

Having files with different owners and groups is a non-issue when
extracting with --no-same-owner, which is the default for regular users.
I assume this covers most use cases in the wild.

The generated archive leaks the IDs of the user preparing the archive in
the appended entries for untracked files.  I think that's more of a
concern.  Publishing a valid non-root username on your build system may
invite attackers.

Changing the build procedure to set owner and group to root as well as
UID and GID to zero seems like a better idea.  This is complicated by
the inconsistent command line options for GNU tar and bsdtar, as you
mentioned.

So how about making it possible to append untracked files using git
archive?  This could simplify the dist target for Git as well.  It's
orthogonal to adding the ability to explicitly specify owner and group,
but might suffice in most (all?) cases.

Not sure what kind of file name transformation abilities would be
needed and how to package them nicely.  The --transform option of GNU
tar with its sed replace expressions seems quite heavy for me.  With
poppler it's only used to add the --prefix string; I'd expect that to
be done for all appended files anyway.

Perhaps something like --append-file=<FILE> with no transformation
feature is already enough for most cases?

>> Would it make sense to support the new options for ZIP files as well?
> 
> I was not aware of the availability of UID/GID in pkzip file format...
> Oh, checking APPNOTE.TXT ( https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT ),
> there is a storage! (see 4.5.7-Unix Extra Field). But it seems
> that current git-archive emits pkzip without the field.

Indeed.  Git doesn't track owners and groups, so it doesn't make
sense to emit that kind of information so far.  If git archive grows
options to specify such meta-information then it should be supported
by all archive formats (or documented to be tar-specific).

However, the UNIX Extra Field in ZIP archives only allows to store
UID and GID (no names), which is useless unless the sender knows the
ID range of the receiver -- which is unlikely when distributing
software on the Internet.  And even then it won't work with Windows,
which has long Security Identifiers (SIDs) instead.

So these are more advantages for letting git archive append untracked
files: It's format-agnostic and more portable.


[snipped interesting history of security-related tar options]

Btw. I like how bsdtar has --insecure as a synonym for -p (preserve
file permissions when extracting).  It's a bit sad that this is still
the default for root, though.  OpenBSD cut that behavior out of their
tar almost 20 years ago.  (An evil tar archive could be used to fill
the quota of unsuspecting users, or add setuid executables.)

>>> +#if ULONG_MAX > 0xFFFFFFFFUL
>>> +    /*
>>> +     * --owner, --group rejects uid/gid greater than 32-bit
>>> +     * limits, even on 64-bit platforms.
>>> +     */
>>> +    if (ul > 0xFFFFFFFFUL)
>>> +        return STR_IS_DIGIT_TOO_LARGE;
>>> +#endif
>>
>> The #if is not really necessary, is it?  Compilers should be able to
>> optimize the conditional out on 32-bit platforms.
> 
> Thanks for finding this, I'm glad to have a chance to ask a
> question; git is not needed to care for 16-bit platforms?

I'm not aware of attempts to compile Git on 16-bit systems, but I
doubt it would be much fun to use the result anyway.  It couldn't
handle files bigger than 64KB for a start.

>> GNU tar and bsdtar show the names of owner and group with -t -v at
>> least, albeit in slightly different formats.  Can this help avoid
>> parsing the archive on our own?
> 
> Yeah, writing yet another tar archive parser in C, to avoid the additional
> dependency to Python or newer Perl (Archive::Tar since perl-5.10), is
> painful, I feel (not only for me but also for the maintainers).
> If tar command itself works well, it would be the best.
> 
> But, I'm not sure whether the format of "tar tv" output is stably
> standardized. It's the reason why I wrote Python tool. If I execute
> git-archive with sufficently long randomized username & uid in
> several times, it would be good test?

The append-untracked feature would avoid that issue as well.

A unique username doesn't have to be long or random if you control all
other strings that "tar tv" would show (in particular file names).  You
could e.g. use "UserName" and then simply grep for it.

Checking that git archive doesn't add unwanted characters is harder if
the format isn't strictly known -- how to reliably recognize it sneaking
in an extra space or slash?  Not sure if such a test is really needed,
though.

>> But getting a short program like zipdetails for tar would be nice as
>> well of course. :)
> 
> I wrote something in C:
> https://github.com/mpsuzuki/git/blob/pullreq-20171227-c/t/helper/test-parse-tar-file.c

OK.  (But the append-untracked feature could be tested without such a
helper. :)

Just some loose comments instead of a real review: You could get the
line count down by including Git's tar.h and parse-options.h
(Documentation/technical/api-parse-options.txt).

> but if somebody wants the support of other tar variants,
> he/she would have some headache :-)

Yeah, the history of tar and its various platform-specific formats
fills volumes, no doubt.

René

  reply	other threads:[~2018-01-02 21:36 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-12-29 14:05 [PATCH] git-archive: accept --owner and --group like GNU tar suzuki toshiya
2018-01-01 18:29 ` René Scharfe
2018-01-02  0:32   ` Perry Hutchison
2018-01-04  0:43     ` René Scharfe
     [not found] ` <df39f62558314cf6a9d9df3e23f31dd8@OS2PR01MB1147.jpnprd01.prod.outlook.com>
2018-01-02  6:58   ` suzuki toshiya
2018-01-02 21:36     ` René Scharfe [this message]
     [not found]     ` <59a1fc058278463996ed68c970a5e08a@OS2PR01MB1147.jpnprd01.prod.outlook.com>
2018-01-04  1:29       ` suzuki toshiya
     [not found]       ` <955dae095d504b00b3e1c8a956ba852a@OS2PR01MB1147.jpnprd01.prod.outlook.com>
2018-01-04  2:25         ` suzuki toshiya
2018-01-04 16:59           ` René Scharfe
2018-01-04 18:22             ` Junio C Hamano
2018-01-05 13:54               ` René Scharfe
2018-01-05 19:10                 ` Junio C Hamano
     [not found]             ` <2112f9c245d64f4e89361df7e9de9374@OS2PR01MB1147.jpnprd01.prod.outlook.com>
2018-01-05  4:23               ` suzuki toshiya

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=e45b654e-4fa5-2ad0-4a38-cf853ab2f0df@web.de \
    --to=l.s.r@web.de \
    --cc=git@vger.kernel.org \
    --cc=mpsuzuki@hiroshima-u.ac.jp \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).