git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: suzuki toshiya <mpsuzuki@hiroshima-u.ac.jp>
To: "René Scharfe" <l.s.r@web.de>
Cc: "git@vger.kernel.org" <git@vger.kernel.org>
Subject: Re: [PATCH] git-archive: accept --owner and --group like GNU tar
Date: Thu, 04 Jan 2018 11:25:13 +0900	[thread overview]
Message-ID: <5A4D9089.3050209@hiroshima-u.ac.jp> (raw)
In-Reply-To: <955dae095d504b00b3e1c8a956ba852a@OS2PR01MB1147.jpnprd01.prod.outlook.com>

Hi,

> Hmm, it could be reasonable to assume that --append-file
> would serve more cases than --uid --gid options. There
> might be many people who don't care multiple UID/GID in
> the source tarball, but want to append some files to the
> archive generated by git-archive. I would take a look how
> to do that. A point I'm afraid is that some people may
> request to pass the file listing the pathnames instead of
> giving many --append-file options (and a few people could
> want to have a built-in default list specified by GNU
> convention :-)).

Taking a glance on parse-options.h, I could not find the
existing class collecting the operands as an array (or
linked list) from multiple "--xxx=yyy" options. Similar
things might be the collecting the pathnames to pathspec
structure. Should I write something with OPTION_CALLBACK?

Regards,
mpsuzuki

suzuki toshiya wrote:
> Dear René ,
> 
> By overlooking your response, I was writing a patch to add
> uid/gid into zip archive X-D (not finished yet)
> https://github.com/mpsuzuki/git/tree/add-zip-uid-gid
> However, I found that most unix platforms use infozip's
> extension to store uid/gid instead of pkzip's extension...
> 
>> So this is in the context of generating release tarballs that contain
>> untracked files as well.  That's done in Git's own Makefile, too:
> 
> Oh, I should check other software's tarball :-)
> 
>> The generated archive leaks the IDs of the user preparing the archive in
>> the appended entries for untracked files.  I think that's more of a
>> concern.  Publishing a valid non-root username on your build system may
>> invite attackers.
> 
> Hmm, I was not aware of such security concern about the
> tarball including the developers username.
> 
>> So how about making it possible to append untracked files using git
>> archive?  This could simplify the dist target for Git as well.  It's
>> orthogonal to adding the ability to explicitly specify owner and group,
>> but might suffice in most (all?) cases.
> 
> Hmm, it could be reasonable to assume that --append-file
> would serve more cases than --uid --gid options. There
> might be many people who don't care multiple UID/GID in
> the source tarball, but want to append some files to the
> archive generated by git-archive. I would take a look how
> to do that. A point I'm afraid is that some people may
> request to pass the file listing the pathnames instead of
> giving many --append-file options (and a few people could
> want to have a built-in default list specified by GNU
> convention :-)).
> 
> I want to hear other experts' comment; no need for me to
> work "--uid" "--gid" anymore, and should I switch to
> "--append-file" options?
> 
> Regards,
> mpsuzuki
> 
> René Scharfe wrote:
>> Am 02.01.2018 um 07:58 schrieb suzuki toshiya:
>>> Dear René ,
>>>
>>> René Scharfe wrote:
>>>> Am 29.12.2017 um 15:05 schrieb suzuki toshiya:
>>>>> The ownership of files created by git-archive is always
>>>>> root:root. Add --owner and --group options which work
>>>>> like the GNU tar equivalent to allow overriding these
>>>>> defaults.
>>>> In which situations do you use the new options?
>>>>
>>>> (The sender would need to know the names and/or IDs on the receiving
>>>> end.  And the receiver would need to be root to set both IDs, or be a
>>>> group member to set the group ID; I guess the latter is more common.)
>>> Thank you for asking the background.
>>>
>>> In the case that additional contents are appended to the tar file
>>> generated by git-archive, the part by git-archive and the part
>>> appended by common tar would have different UID/GID, because common
>>> tar preserves the UID/GID of the original files.
>>>
>>> Of cource, both of GNU tar and bsdtar have the options to set
>>> UID/GID manually, but their syntax are different.
>>>
>>> In the recent source package of poppler (poppler.freedesktop.org),
>>> there are 2 sets of UID/GIDs are found:
>>> https://poppler.freedesktop.org/poppler-0.62.0.tar.xz
>>>
>>> I've discussed with the maintainers of poppler, and there was a
>>> suggestion to propose a feature to git.
>>> https://lists.freedesktop.org/archives/poppler/2017-December/012739.html
>> So this is in the context of generating release tarballs that contain
>> untracked files as well.  That's done in Git's own Makefile, too:
>>
>>   dist: git-archive$(X) configure
>>           ./git-archive --format=tar \
>>                   --prefix=$(GIT_TARNAME)/ HEAD^{tree} > $(GIT_TARNAME).tar
>>           @mkdir -p $(GIT_TARNAME)
>>           @cp configure $(GIT_TARNAME)
>>           @echo $(GIT_VERSION) > $(GIT_TARNAME)/version
>>           @$(MAKE) -C git-gui TARDIR=../$(GIT_TARNAME)/git-gui dist-version
>>           $(TAR) rf $(GIT_TARNAME).tar \
>>                   $(GIT_TARNAME)/configure \
>>                   $(GIT_TARNAME)/version \
>>                   $(GIT_TARNAME)/git-gui/version
>>           @$(RM) -r $(GIT_TARNAME)
>>           gzip -f -9 $(GIT_TARNAME).tar
>>
>> Having files with different owners and groups is a non-issue when
>> extracting with --no-same-owner, which is the default for regular users.
>> I assume this covers most use cases in the wild.
>>
>> The generated archive leaks the IDs of the user preparing the archive in
>> the appended entries for untracked files.  I think that's more of a
>> concern.  Publishing a valid non-root username on your build system may
>> invite attackers.
>>
>> Changing the build procedure to set owner and group to root as well as
>> UID and GID to zero seems like a better idea.  This is complicated by
>> the inconsistent command line options for GNU tar and bsdtar, as you
>> mentioned.
>>
>> So how about making it possible to append untracked files using git
>> archive?  This could simplify the dist target for Git as well.  It's
>> orthogonal to adding the ability to explicitly specify owner and group,
>> but might suffice in most (all?) cases.
>>
>> Not sure what kind of file name transformation abilities would be
>> needed and how to package them nicely.  The --transform option of GNU
>> tar with its sed replace expressions seems quite heavy for me.  With
>> poppler it's only used to add the --prefix string; I'd expect that to
>> be done for all appended files anyway.
>>
>> Perhaps something like --append-file=<FILE> with no transformation
>> feature is already enough for most cases?
>>
>>>> Would it make sense to support the new options for ZIP files as well?
>>> I was not aware of the availability of UID/GID in pkzip file format...
>>> Oh, checking APPNOTE.TXT ( https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT ),
>>> there is a storage! (see 4.5.7-Unix Extra Field). But it seems
>>> that current git-archive emits pkzip without the field.
>> Indeed.  Git doesn't track owners and groups, so it doesn't make
>> sense to emit that kind of information so far.  If git archive grows
>> options to specify such meta-information then it should be supported
>> by all archive formats (or documented to be tar-specific).
>>
>> However, the UNIX Extra Field in ZIP archives only allows to store
>> UID and GID (no names), which is useless unless the sender knows the
>> ID range of the receiver -- which is unlikely when distributing
>> software on the Internet.  And even then it won't work with Windows,
>> which has long Security Identifiers (SIDs) instead.
>>
>> So these are more advantages for letting git archive append untracked
>> files: It's format-agnostic and more portable.
>>
>>
>> [snipped interesting history of security-related tar options]
>>
>> Btw. I like how bsdtar has --insecure as a synonym for -p (preserve
>> file permissions when extracting).  It's a bit sad that this is still
>> the default for root, though.  OpenBSD cut that behavior out of their
>> tar almost 20 years ago.  (An evil tar archive could be used to fill
>> the quota of unsuspecting users, or add setuid executables.)
>>
>>>>> +#if ULONG_MAX > 0xFFFFFFFFUL
>>>>> +    /*
>>>>> +     * --owner, --group rejects uid/gid greater than 32-bit
>>>>> +     * limits, even on 64-bit platforms.
>>>>> +     */
>>>>> +    if (ul > 0xFFFFFFFFUL)
>>>>> +        return STR_IS_DIGIT_TOO_LARGE;
>>>>> +#endif
>>>> The #if is not really necessary, is it?  Compilers should be able to
>>>> optimize the conditional out on 32-bit platforms.
>>> Thanks for finding this, I'm glad to have a chance to ask a
>>> question; git is not needed to care for 16-bit platforms?
>> I'm not aware of attempts to compile Git on 16-bit systems, but I
>> doubt it would be much fun to use the result anyway.  It couldn't
>> handle files bigger than 64KB for a start.
>>
>>>> GNU tar and bsdtar show the names of owner and group with -t -v at
>>>> least, albeit in slightly different formats.  Can this help avoid
>>>> parsing the archive on our own?
>>> Yeah, writing yet another tar archive parser in C, to avoid the additional
>>> dependency to Python or newer Perl (Archive::Tar since perl-5.10), is
>>> painful, I feel (not only for me but also for the maintainers).
>>> If tar command itself works well, it would be the best.
>>>
>>> But, I'm not sure whether the format of "tar tv" output is stably
>>> standardized. It's the reason why I wrote Python tool. If I execute
>>> git-archive with sufficently long randomized username & uid in
>>> several times, it would be good test?
>> The append-untracked feature would avoid that issue as well.
>>
>> A unique username doesn't have to be long or random if you control all
>> other strings that "tar tv" would show (in particular file names).  You
>> could e.g. use "UserName" and then simply grep for it.
>>
>> Checking that git archive doesn't add unwanted characters is harder if
>> the format isn't strictly known -- how to reliably recognize it sneaking
>> in an extra space or slash?  Not sure if such a test is really needed,
>> though.
>>
>>>> But getting a short program like zipdetails for tar would be nice as
>>>> well of course. :)
>>> I wrote something in C:
>>> https://github.com/mpsuzuki/git/blob/pullreq-20171227-c/t/helper/test-parse-tar-file.c
>> OK.  (But the append-untracked feature could be tested without such a
>> helper. :)
>>
>> Just some loose comments instead of a real review: You could get the
>> line count down by including Git's tar.h and parse-options.h
>> (Documentation/technical/api-parse-options.txt).
>>
>>> but if somebody wants the support of other tar variants,
>>> he/she would have some headache :-)
>> Yeah, the history of tar and its various platform-specific formats
>> fills volumes, no doubt.
>>
>> René
>>
> 
> 


  parent reply	other threads:[~2018-01-04  2:25 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-12-29 14:05 [PATCH] git-archive: accept --owner and --group like GNU tar suzuki toshiya
2018-01-01 18:29 ` René Scharfe
2018-01-02  0:32   ` Perry Hutchison
2018-01-04  0:43     ` René Scharfe
     [not found] ` <df39f62558314cf6a9d9df3e23f31dd8@OS2PR01MB1147.jpnprd01.prod.outlook.com>
2018-01-02  6:58   ` suzuki toshiya
2018-01-02 21:36     ` René Scharfe
     [not found]     ` <59a1fc058278463996ed68c970a5e08a@OS2PR01MB1147.jpnprd01.prod.outlook.com>
2018-01-04  1:29       ` suzuki toshiya
     [not found]       ` <955dae095d504b00b3e1c8a956ba852a@OS2PR01MB1147.jpnprd01.prod.outlook.com>
2018-01-04  2:25         ` suzuki toshiya [this message]
2018-01-04 16:59           ` René Scharfe
2018-01-04 18:22             ` Junio C Hamano
2018-01-05 13:54               ` René Scharfe
2018-01-05 19:10                 ` Junio C Hamano
     [not found]             ` <2112f9c245d64f4e89361df7e9de9374@OS2PR01MB1147.jpnprd01.prod.outlook.com>
2018-01-05  4:23               ` suzuki toshiya

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5A4D9089.3050209@hiroshima-u.ac.jp \
    --to=mpsuzuki@hiroshima-u.ac.jp \
    --cc=git@vger.kernel.org \
    --cc=l.s.r@web.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).