git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: suzuki toshiya <mpsuzuki@hiroshima-u.ac.jp>
To: "René Scharfe" <l.s.r@web.de>
Cc: "git@vger.kernel.org" <git@vger.kernel.org>
Subject: Re: [PATCH] git-archive: accept --owner and --group like GNU tar
Date: Tue, 02 Jan 2018 15:58:45 +0900	[thread overview]
Message-ID: <5A4B2DA5.907@hiroshima-u.ac.jp> (raw)
In-Reply-To: <df39f62558314cf6a9d9df3e23f31dd8@OS2PR01MB1147.jpnprd01.prod.outlook.com>

Dear René ,

René Scharfe wrote:
> Am 29.12.2017 um 15:05 schrieb suzuki toshiya:
>> The ownership of files created by git-archive is always
>> root:root. Add --owner and --group options which work
>> like the GNU tar equivalent to allow overriding these
>> defaults.
> 
> In which situations do you use the new options?
> 
> (The sender would need to know the names and/or IDs on the receiving
> end.  And the receiver would need to be root to set both IDs, or be a
> group member to set the group ID; I guess the latter is more common.)

Thank you for asking the background.

In the case that additional contents are appended to the tar file
generated by git-archive, the part by git-archive and the part
appended by common tar would have different UID/GID, because common
tar preserves the UID/GID of the original files.

Of cource, both of GNU tar and bsdtar have the options to set
UID/GID manually, but their syntax are different.

In the recent source package of poppler (poppler.freedesktop.org),
there are 2 sets of UID/GIDs are found:
https://poppler.freedesktop.org/poppler-0.62.0.tar.xz

I've discussed with the maintainers of poppler, and there was a
suggestion to propose a feature to git.
https://lists.freedesktop.org/archives/poppler/2017-December/012739.html

So now I'm trying.

> Would it make sense to support the new options for ZIP files as well?

I was not aware of the availability of UID/GID in pkzip file format...
Oh, checking APPNOTE.TXT ( 
https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT ),
there is a storage! (see 4.5.7-Unix Extra Field). But it seems
that current git-archive emits pkzip without the field.

The background why I propose the options for tar format was described
in above. Similar things are hoped by pkzip users? If it's required,
I will try.

>> +--owner=<name>[:<uid>]::
>> +	Force <name> as owner and <uid> as uid for the files in the tar
>> +	archive.  If <uid> is not supplied, <name> can be either a user
>> +	name or numeric UID.  In this case the missing part (UID or
>> +	name) will be inferred from the current host's user database.
>> +
>> +--group=<name>[:<gid>]::
>> +	Force <name> as group and <gid> as gid for the files in the tar
>> +	archive.  If <gid> is not supplied, <name> can be either a group
>> +	name or numeric GID.  In this case the missing part (GID or
>> +	name) will be inferred from the current host's group database.
>> +
> 
> IIUC the default behavior is kept, i.e. without these options the
> archive entries appear to be owned by root:root.  I think it's a good
> idea to mention this here.

Indeed. The default behaviour of git-archive without these options
(root:root) would be different from that of (common) tar (preserving
uid/gid of the files to be archived), it should be clarified.

> bsdtar has --uname, --uid, --gname, and -gid, which seem simpler.  At
> least you could use OPT_STRING and OPT_INTEGER with them (plus a range
> check).  And they should be easier to explain.

Thank you very much for proposing good alternative. Indeed, such well-
separated options make the code simple & stable. However, according
to the manual search systems of FreeBSD ( https://www.freebsd.org/cgi/man.cgi ),
the options for such functionalities are not always same.

FreeBSD 8.2 and earlier: --uname, --gname, --uid, --gid are unavailable.
it seems that using "mtree" was the preferred way to specify such).

FreeBSD 8.3 and later: --uname, --gname, --uid, --gid are available.
the manual says follows:

      --uid id
	     Use the provided user id number and ignore	the user name from the
	     archive.  On create, if --uname is	not also specified, the	user
	     name will be set to match the user	id.

      --uname name
	     Use the provided user name.  On extract, this overrides the user
	     name in the archive; if the provided user name does not exist on
	     the system, it will be ignored and	the user id (from the archive
	     or	from the --uid option) will be used instead.  On create, this
	     sets the user name	that will be stored in the archive; the	name
	     is	not verified against the system	user database.

Thus, to emulate (post 2012-) bsdtar perfectly, getpwnam(), getpwuid() etc
would be still needed to implement "--uid" X-(.

Tracking the history of bsdtar, maybe I should track the history of GNU
tar. According to ChangeLog, even --owner --group are rather newer option
since 1.13.18 (released on 2000-10-29). The original syntax was like this.

`--owner=USER'
      Specifies that `tar' should use USER as the owner of members when
      creating archives, instead of the user associated with the source
      file.  USER is first decoded as a user symbolic name, but if this
      interpretation fails, it has to be a decimal numeric user ID.

      There is no value indicating a missing number, and `0' usually
      means `root'.  Some people like to force `0' as the value to offer
      in their distributions for the owner of files, because the `root'
      user is anonymous anyway, so that might as well be the owner of
      anonymous archives.

      This option does not affect extraction from archives.

Oh, there is no colon separated syntax! According to ChangeLog, the
introduction of colon separated syntax was on 2011-08-13 and
released as GNU tar-1.27 (2013-10-06).

`--owner=USER'
      Specifies that `tar' should use USER as the owner of members when
      creating archives, instead of the user associated with the source
      file.  USER can specify a symbolic name, or a numeric ID, or both
      as NAME:ID.  *Note override::.

      This option does not affect extraction from archives.

Comparing the original --owner and current --owner description, a
strange point is that the original description says "USER is first
decoded as a user symbolic name, but if this interpretation fails,
it has to be a decimal numeric user ID." What? It seems that
"checking whether the specified username is known by the host system
and its numerical uid is resolvable - if unresolvable, try to
parse as decimal value - if failed, take it as fatal error". Here
I quote the related part.

tar-1.14/src/names.c

     119 /* Given UNAME, set the corresponding UID and return 1, or else, return 
0.  */
     120 int
     121 uname_to_uid (char const *uname, uid_t *uidp)
     122 {
     123   struct passwd *passwd;
     124
     125   if (cached_no_such_uname
     126       && strcmp (uname, cached_no_such_uname) == 0)
     127     return 0;
     128
     129   if (!cached_uname
     130       || uname[0] != cached_uname[0]
     131       || strcmp (uname, cached_uname) != 0)
     132     {
     133       passwd = getpwnam (uname);
     134       if (passwd)
     135         {
     136           cached_uid = passwd->pw_uid;
     137           assign_string (&cached_uname, passwd->pw_name);
     138         }
     139       else
     140         {
     141           assign_string (&cached_no_such_uname, uname);
     142           return 0;
     143         }
     144     }
     145   *uidp = cached_uid;
     146   return 1;
     147 }   1087       case OWNER_OPTION:

tar-1.14/src/tar.c
    1088         if (! (strlen (optarg) < UNAME_FIELD_SIZE
    1089                && uname_to_uid (optarg, &owner_option)))
    1090           {
    1091             uintmax_t u;
    1092             if (xstrtoumax (optarg, 0, 10, &u, "") == LONGINT_OK
    1093                 && u == (uid_t) u)
    1094               owner_option = u;
    1095             else
    1096               FATAL_ERROR ((0, 0, "%s: %s", quotearg_colon (optarg),
    1097                             _("Invalid owner")));
    1098           }
    1099         break;

In summary, there are following types.

a) older GNU tar
--owner must match with the host database, no option to set
uname & uid separately.

b) newer GNU tar
--owner accepts unknown username and/or uid.
if only one part is given and known by the host system,
the missing part is deduced by it.
if only one part is given and unknown by the host system,
the missing part is unchanged from the file to be archived.

c) newer bsd tar
--uname/--uid accept unknown username and/or uid.
username is just used to override uname entry of the archive,
but uid is used to override both of uid and uname entries,
if uid is known and username is not specified.
If uid is unknown, uid is overriden, but the username entry
is unchanged from the file to be archived.

which behaviour is to be simulated? I want to propose
yet another one, similar to c) but incompatble.

d) --uname, --uid, --gname, --gid check only the syntax
(to kick the username starting with digit, non-digit
uid, etc) and no check for known/unknown.

>> +#if ULONG_MAX > 0xFFFFFFFFUL
>> +	/*
>> +	 * --owner, --group rejects uid/gid greater than 32-bit
>> +	 * limits, even on 64-bit platforms.
>> +	 */
>> +	if (ul > 0xFFFFFFFFUL)
>> +		return STR_IS_DIGIT_TOO_LARGE;
>> +#endif
> 
> The #if is not really necessary, is it?  Compilers should be able to
> optimize the conditional out on 32-bit platforms.

Thanks for finding this, I'm glad to have a chance to ask a
question; git is not needed to care for 16-bit platforms?

>> +	/* the operand is known to be non-digit */
>> +
>> +	args->uname = xstrdup(tar_owner);
>> +	pw = getpwnam(tar_owner);
> 
> How well does this work on Windows?  In daemon.c we avoid calling
> getpwnam(3), getgrnam(3) etc. if NO_POSIX_GOODIES is not defined.

OK, I can enclose them by ifdefs of NO_POSIX_GOODIES. But,
maybe the design the options would be discussed for first.

Both of latest GNU and BSD tar call getpwnam() or getpwuid(),
but designing as all of --uname --uid --gname --gid as "only syntax
is checked (non-digit UID/GID should be refused), but known/unknown
is not checked" would be the most portable.

> GNU tar and bsdtar show the names of owner and group with -t -v at
> least, albeit in slightly different formats.  Can this help avoid
> parsing the archive on our own?

Yeah, writing yet another tar archive parser in C, to avoid the additional
dependency to Python or newer Perl (Archive::Tar since perl-5.10), is
painful, I feel (not only for me but also for the maintainers).
If tar command itself works well, it would be the best.

But, I'm not sure whether the format of "tar tv" output is stably
standardized. It's the reason why I wrote Python tool. If I execute
git-archive with sufficently long randomized username & uid in
several times, it would be good test?

> But getting a short program like zipdetails for tar would be nice as
> well of course. :)

I wrote something in C:
https://github.com/mpsuzuki/git/blob/pullreq-20171227-c/t/helper/test-parse-tar-file.c

but if somebody wants the support of other tar variants,
he/she would have some headache :-)

Regards,
mpsuzuki


  parent reply	other threads:[~2018-01-02  6:58 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-12-29 14:05 [PATCH] git-archive: accept --owner and --group like GNU tar suzuki toshiya
2018-01-01 18:29 ` René Scharfe
2018-01-02  0:32   ` Perry Hutchison
2018-01-04  0:43     ` René Scharfe
     [not found] ` <df39f62558314cf6a9d9df3e23f31dd8@OS2PR01MB1147.jpnprd01.prod.outlook.com>
2018-01-02  6:58   ` suzuki toshiya [this message]
2018-01-02 21:36     ` René Scharfe
     [not found]     ` <59a1fc058278463996ed68c970a5e08a@OS2PR01MB1147.jpnprd01.prod.outlook.com>
2018-01-04  1:29       ` suzuki toshiya
     [not found]       ` <955dae095d504b00b3e1c8a956ba852a@OS2PR01MB1147.jpnprd01.prod.outlook.com>
2018-01-04  2:25         ` suzuki toshiya
2018-01-04 16:59           ` René Scharfe
2018-01-04 18:22             ` Junio C Hamano
2018-01-05 13:54               ` René Scharfe
2018-01-05 19:10                 ` Junio C Hamano
     [not found]             ` <2112f9c245d64f4e89361df7e9de9374@OS2PR01MB1147.jpnprd01.prod.outlook.com>
2018-01-05  4:23               ` suzuki toshiya

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5A4B2DA5.907@hiroshima-u.ac.jp \
    --to=mpsuzuki@hiroshima-u.ac.jp \
    --cc=git@vger.kernel.org \
    --cc=l.s.r@web.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).