git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: suzuki toshiya <mpsuzuki@hiroshima-u.ac.jp>
To: "René Scharfe" <l.s.r@web.de>
Cc: "git@vger.kernel.org" <git@vger.kernel.org>
Subject: Re: [PATCH] git-archive: accept --owner and --group like GNU tar
Date: Thu, 04 Jan 2018 10:29:17 +0900	[thread overview]
Message-ID: <5A4D836D.2010409@hiroshima-u.ac.jp> (raw)
In-Reply-To: <59a1fc058278463996ed68c970a5e08a@OS2PR01MB1147.jpnprd01.prod.outlook.com>

Dear René ,

By overlooking your response, I was writing a patch to add
uid/gid into zip archive X-D (not finished yet)
https://github.com/mpsuzuki/git/tree/add-zip-uid-gid
However, I found that most unix platforms use infozip's
extension to store uid/gid instead of pkzip's extension...

> So this is in the context of generating release tarballs that contain
> untracked files as well.  That's done in Git's own Makefile, too:

Oh, I should check other software's tarball :-)

> The generated archive leaks the IDs of the user preparing the archive in
> the appended entries for untracked files.  I think that's more of a
> concern.  Publishing a valid non-root username on your build system may
> invite attackers.

Hmm, I was not aware of such security concern about the
tarball including the developers username.

> So how about making it possible to append untracked files using git
> archive?  This could simplify the dist target for Git as well.  It's
> orthogonal to adding the ability to explicitly specify owner and group,
> but might suffice in most (all?) cases.

Hmm, it could be reasonable to assume that --append-file
would serve more cases than --uid --gid options. There
might be many people who don't care multiple UID/GID in
the source tarball, but want to append some files to the
archive generated by git-archive. I would take a look how
to do that. A point I'm afraid is that some people may
request to pass the file listing the pathnames instead of
giving many --append-file options (and a few people could
want to have a built-in default list specified by GNU
convention :-)).

I want to hear other experts' comment; no need for me to
work "--uid" "--gid" anymore, and should I switch to
"--append-file" options?

Regards,
mpsuzuki

René Scharfe wrote:
> Am 02.01.2018 um 07:58 schrieb suzuki toshiya:
>> Dear René ,
>>
>> René Scharfe wrote:
>>> Am 29.12.2017 um 15:05 schrieb suzuki toshiya:
>>>> The ownership of files created by git-archive is always
>>>> root:root. Add --owner and --group options which work
>>>> like the GNU tar equivalent to allow overriding these
>>>> defaults.
>>> In which situations do you use the new options?
>>>
>>> (The sender would need to know the names and/or IDs on the receiving
>>> end.  And the receiver would need to be root to set both IDs, or be a
>>> group member to set the group ID; I guess the latter is more common.)
>> Thank you for asking the background.
>>
>> In the case that additional contents are appended to the tar file
>> generated by git-archive, the part by git-archive and the part
>> appended by common tar would have different UID/GID, because common
>> tar preserves the UID/GID of the original files.
>>
>> Of cource, both of GNU tar and bsdtar have the options to set
>> UID/GID manually, but their syntax are different.
>>
>> In the recent source package of poppler (poppler.freedesktop.org),
>> there are 2 sets of UID/GIDs are found:
>> https://poppler.freedesktop.org/poppler-0.62.0.tar.xz
>>
>> I've discussed with the maintainers of poppler, and there was a
>> suggestion to propose a feature to git.
>> https://lists.freedesktop.org/archives/poppler/2017-December/012739.html
> 
> So this is in the context of generating release tarballs that contain
> untracked files as well.  That's done in Git's own Makefile, too:
> 
>   dist: git-archive$(X) configure
>           ./git-archive --format=tar \
>                   --prefix=$(GIT_TARNAME)/ HEAD^{tree} > $(GIT_TARNAME).tar
>           @mkdir -p $(GIT_TARNAME)
>           @cp configure $(GIT_TARNAME)
>           @echo $(GIT_VERSION) > $(GIT_TARNAME)/version
>           @$(MAKE) -C git-gui TARDIR=../$(GIT_TARNAME)/git-gui dist-version
>           $(TAR) rf $(GIT_TARNAME).tar \
>                   $(GIT_TARNAME)/configure \
>                   $(GIT_TARNAME)/version \
>                   $(GIT_TARNAME)/git-gui/version
>           @$(RM) -r $(GIT_TARNAME)
>           gzip -f -9 $(GIT_TARNAME).tar
> 
> Having files with different owners and groups is a non-issue when
> extracting with --no-same-owner, which is the default for regular users.
> I assume this covers most use cases in the wild.
> 
> The generated archive leaks the IDs of the user preparing the archive in
> the appended entries for untracked files.  I think that's more of a
> concern.  Publishing a valid non-root username on your build system may
> invite attackers.
> 
> Changing the build procedure to set owner and group to root as well as
> UID and GID to zero seems like a better idea.  This is complicated by
> the inconsistent command line options for GNU tar and bsdtar, as you
> mentioned.
> 
> So how about making it possible to append untracked files using git
> archive?  This could simplify the dist target for Git as well.  It's
> orthogonal to adding the ability to explicitly specify owner and group,
> but might suffice in most (all?) cases.
> 
> Not sure what kind of file name transformation abilities would be
> needed and how to package them nicely.  The --transform option of GNU
> tar with its sed replace expressions seems quite heavy for me.  With
> poppler it's only used to add the --prefix string; I'd expect that to
> be done for all appended files anyway.
> 
> Perhaps something like --append-file=<FILE> with no transformation
> feature is already enough for most cases?
> 
>>> Would it make sense to support the new options for ZIP files as well?
>> I was not aware of the availability of UID/GID in pkzip file format...
>> Oh, checking APPNOTE.TXT ( https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT ),
>> there is a storage! (see 4.5.7-Unix Extra Field). But it seems
>> that current git-archive emits pkzip without the field.
> 
> Indeed.  Git doesn't track owners and groups, so it doesn't make
> sense to emit that kind of information so far.  If git archive grows
> options to specify such meta-information then it should be supported
> by all archive formats (or documented to be tar-specific).
> 
> However, the UNIX Extra Field in ZIP archives only allows to store
> UID and GID (no names), which is useless unless the sender knows the
> ID range of the receiver -- which is unlikely when distributing
> software on the Internet.  And even then it won't work with Windows,
> which has long Security Identifiers (SIDs) instead.
> 
> So these are more advantages for letting git archive append untracked
> files: It's format-agnostic and more portable.
> 
> 
> [snipped interesting history of security-related tar options]
> 
> Btw. I like how bsdtar has --insecure as a synonym for -p (preserve
> file permissions when extracting).  It's a bit sad that this is still
> the default for root, though.  OpenBSD cut that behavior out of their
> tar almost 20 years ago.  (An evil tar archive could be used to fill
> the quota of unsuspecting users, or add setuid executables.)
> 
>>>> +#if ULONG_MAX > 0xFFFFFFFFUL
>>>> +    /*
>>>> +     * --owner, --group rejects uid/gid greater than 32-bit
>>>> +     * limits, even on 64-bit platforms.
>>>> +     */
>>>> +    if (ul > 0xFFFFFFFFUL)
>>>> +        return STR_IS_DIGIT_TOO_LARGE;
>>>> +#endif
>>> The #if is not really necessary, is it?  Compilers should be able to
>>> optimize the conditional out on 32-bit platforms.
>> Thanks for finding this, I'm glad to have a chance to ask a
>> question; git is not needed to care for 16-bit platforms?
> 
> I'm not aware of attempts to compile Git on 16-bit systems, but I
> doubt it would be much fun to use the result anyway.  It couldn't
> handle files bigger than 64KB for a start.
> 
>>> GNU tar and bsdtar show the names of owner and group with -t -v at
>>> least, albeit in slightly different formats.  Can this help avoid
>>> parsing the archive on our own?
>> Yeah, writing yet another tar archive parser in C, to avoid the additional
>> dependency to Python or newer Perl (Archive::Tar since perl-5.10), is
>> painful, I feel (not only for me but also for the maintainers).
>> If tar command itself works well, it would be the best.
>>
>> But, I'm not sure whether the format of "tar tv" output is stably
>> standardized. It's the reason why I wrote Python tool. If I execute
>> git-archive with sufficently long randomized username & uid in
>> several times, it would be good test?
> 
> The append-untracked feature would avoid that issue as well.
> 
> A unique username doesn't have to be long or random if you control all
> other strings that "tar tv" would show (in particular file names).  You
> could e.g. use "UserName" and then simply grep for it.
> 
> Checking that git archive doesn't add unwanted characters is harder if
> the format isn't strictly known -- how to reliably recognize it sneaking
> in an extra space or slash?  Not sure if such a test is really needed,
> though.
> 
>>> But getting a short program like zipdetails for tar would be nice as
>>> well of course. :)
>> I wrote something in C:
>> https://github.com/mpsuzuki/git/blob/pullreq-20171227-c/t/helper/test-parse-tar-file.c
> 
> OK.  (But the append-untracked feature could be tested without such a
> helper. :)
> 
> Just some loose comments instead of a real review: You could get the
> line count down by including Git's tar.h and parse-options.h
> (Documentation/technical/api-parse-options.txt).
> 
>> but if somebody wants the support of other tar variants,
>> he/she would have some headache :-)
> 
> Yeah, the history of tar and its various platform-specific formats
> fills volumes, no doubt.
> 
> René
> 


  parent reply	other threads:[~2018-01-04  1:29 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-12-29 14:05 [PATCH] git-archive: accept --owner and --group like GNU tar suzuki toshiya
2018-01-01 18:29 ` René Scharfe
2018-01-02  0:32   ` Perry Hutchison
2018-01-04  0:43     ` René Scharfe
     [not found] ` <df39f62558314cf6a9d9df3e23f31dd8@OS2PR01MB1147.jpnprd01.prod.outlook.com>
2018-01-02  6:58   ` suzuki toshiya
2018-01-02 21:36     ` René Scharfe
     [not found]     ` <59a1fc058278463996ed68c970a5e08a@OS2PR01MB1147.jpnprd01.prod.outlook.com>
2018-01-04  1:29       ` suzuki toshiya [this message]
     [not found]       ` <955dae095d504b00b3e1c8a956ba852a@OS2PR01MB1147.jpnprd01.prod.outlook.com>
2018-01-04  2:25         ` suzuki toshiya
2018-01-04 16:59           ` René Scharfe
2018-01-04 18:22             ` Junio C Hamano
2018-01-05 13:54               ` René Scharfe
2018-01-05 19:10                 ` Junio C Hamano
     [not found]             ` <2112f9c245d64f4e89361df7e9de9374@OS2PR01MB1147.jpnprd01.prod.outlook.com>
2018-01-05  4:23               ` suzuki toshiya

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5A4D836D.2010409@hiroshima-u.ac.jp \
    --to=mpsuzuki@hiroshima-u.ac.jp \
    --cc=git@vger.kernel.org \
    --cc=l.s.r@web.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).