git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* Bug in git archive + .gitattributes + relative path
@ 2023-03-03 10:25 Cristian Le
  2023-03-03 15:19 ` René Scharfe
  0 siblings, 1 reply; 22+ messages in thread
From: Cristian Le @ 2023-03-03 10:25 UTC (permalink / raw)
  To: git

[-- Attachment #1: Type: text/plain, Size: 1730 bytes --]

Using `git archive` with or without `--worktree-attributes` does not 
properly read `.gitattributes` files if using a relative path in 
`<tree-ish>`.
Related github comment: 
https://github.com/rpm-software-management/tito/pull/445#issuecomment-1450298871
Related stackoverflow discussion: 
https://stackoverflow.com/questions/52804334/how-to-ignore-files-directories-in-git-archive-and-only-create-an-archive-of-a

Git version: `2.39.2`
Mwe git repo: Two files:
```
# .gitattributes:
.git_archival.txt export-subst
```
```
# .git_archival.txt:
node: $Format:%H$
```

Commands to reproduce and expected behaviour:
```console
$ git archive HEAD:./ --output=test.tar.gz
$ tar -axf test.tar.gz .git_archival.txt -O
node: 745ce26169fb44e04d91d40ee581cccd591c941e
```
Important: Notice the path `./` given after `HEAD`.

Actual output:
```console
$ tar -axf test.tar.gz .git_archival.txt -O
node: $Format:%H$
```

It doesn't matter if `.gitattributes` is in a subfolder, or if I change 
the relative path `./` to a subfolder, the files are still not properly 
generated.

Using `--worktree-attributes` did not have any effect either.
According to the documentation, I understand that the expected behaviour 
with regards to `--worktree-attributes`:
- Read the `.gitattributes` of the relative path, e.g. `./sub_dir` 
regardless of `--worktree-attributes`. (similar behaviour as not passing 
a relative path)
- Include the `.gitattributes` of the top-level path if 
`--worktree-attributes` is passed

Maybe the intended behaviour is to completely ignore all 
`.gitattributes` unless `--worktree-attributes` is provided, in which 
case, it does not have the intended behaviour and please include a flag 
to achieve the above behaviour.


[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4273 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Bug in git archive + .gitattributes + relative path
  2023-03-03 10:25 Bug in git archive + .gitattributes + relative path Cristian Le
@ 2023-03-03 15:19 ` René Scharfe
  2023-03-03 15:38   ` Cristian Le
  0 siblings, 1 reply; 22+ messages in thread
From: René Scharfe @ 2023-03-03 15:19 UTC (permalink / raw)
  To: Cristian Le, git

Am 03.03.23 um 11:25 schrieb Cristian Le:
> Using `git archive` with or without `--worktree-attributes` does not properly read `.gitattributes` files if using a relative path in `<tree-ish>`.
> Related github comment: https://github.com/rpm-software-management/tito/pull/445#issuecomment-1450298871
> Related stackoverflow discussion: https://stackoverflow.com/questions/52804334/how-to-ignore-files-directories-in-git-archive-and-only-create-an-archive-of-a
>
> Git version: `2.39.2`
> Mwe git repo: Two files:
> ```
> # .gitattributes:
> .git_archival.txt export-subst
> ```
> ```
> # .git_archival.txt:
> node: $Format:%H$
> ```
>
> Commands to reproduce and expected behaviour:
> ```console
> $ git archive HEAD:./ --output=test.tar.gz
> $ tar -axf test.tar.gz .git_archival.txt -O
> node: 745ce26169fb44e04d91d40ee581cccd591c941e
> ```
> Important: Notice the path `./` given after `HEAD`.
>
> Actual output:
> ```console
> $ tar -axf test.tar.gz .git_archival.txt -O
> node: $Format:%H$
> ``>
> It doesn't matter if `.gitattributes` is in a subfolder, or if I change the relative path `./` to a subfolder, the files are still not properly generated.
>
> Using `--worktree-attributes` did not have any effect either.

That's expected behavior.  You specify a tree (HEAD:./), while
export-subst requires a commit.  git-archive(1) doesn't spell that out,
but advises to "See gitattributes(5) for details.", which states: "The
expansion depends on the availability of a commit ID, i.e., if
git-archive(1) has been given a tree instead of a commit or a tag then
no replacement will be done.".

The other attribute supported by git archive, export-ignore, does not
require a commit.

The Stack Overflow discussion seems to be about a different issue:
Attributes are always based on the repository root directory, which
gives unexpected results when archiving a subtree.

> According to the documentation, I understand that the expected behaviour with regards to `--worktree-attributes`:
> - Read the `.gitattributes` of the relative path, e.g. `./sub_dir` regardless of `--worktree-attributes`. (similar behaviour as not passing a relative path)
> - Include the `.gitattributes` of the top-level path if `--worktree-attributes` is passed
>
> Maybe the intended behaviour is to completely ignore all `.gitattributes` unless `--worktree-attributes` is provided, in which case, it does not have the intended behaviour and please include a flag to achieve the above behaviour.
>

The option --worktree-attributes allows uncommitted .gitattribute files
to be read, but has no effect on whether export-subst can actually be
applied.

So on the Git side we could improve the documentation and improve
reading attributes when archiving subtrees (no idea how, though,
admittedly).  But that wouldn't help you, I suppose.  In your issue
#444 you write that "git archive HEAD" works, but "git archive HEAD:./"
doesn't.  Why do you need to use the latter?

René


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Bug in git archive + .gitattributes + relative path
  2023-03-03 15:19 ` René Scharfe
@ 2023-03-03 15:38   ` Cristian Le
  2023-03-04 13:58     ` René Scharfe
  0 siblings, 1 reply; 22+ messages in thread
From: Cristian Le @ 2023-03-03 15:38 UTC (permalink / raw)
  To: René Scharfe, git

[-- Attachment #1: Type: text/plain, Size: 4187 bytes --]

> In your issue #444 you write that "git archive HEAD" works, but "git archive HEAD:./" doesn't. Why do you need to use the latter?

Specifically we want to allow for `HEAD:./sub_dir` where `./sub_dir` 
contains `.gitattributes` and `.git_archive.txt`.

Alternatively, it would be helpful if we can pass `--transform` commands 
of `tar` directly so that we can change the paths.

Overall what we are doing in tito is that the source would be in `./src` 
and outside is metadata like `./my_package.spec`. We are using `git 
archive HEAD:./src --prefix=my_package-1.0.0` to pass the appropriate 
form that the rpm spec file can locate. In a tar command we can use 
`--transform=s|^src/|my_package-1.0.0/|` to achieve the equivalent. 
However we cannot use the `tar` directly because that would affect the 
timestamps and permissions of the file that are set by `git archive`.

So allowing for something like `git archive HEAD 
--transform=s|^src/|my_package-1.0.0/|`, where the transform is done 
after `.gitattributes` is performed would solve this issue.

On 2023/03/03 16:19, René Scharfe wrote:
> External Email
> ________________________________
>
> Am 03.03.23 um 11:25 schrieb Cristian Le:
>> Using `git archive` with or without `--worktree-attributes` does not properly read `.gitattributes` files if using a relative path in `<tree-ish>`.
>> Related github comment: https://github.com/rpm-software-management/tito/pull/445#issuecomment-1450298871
>> Related stackoverflow discussion: https://stackoverflow.com/questions/52804334/how-to-ignore-files-directories-in-git-archive-and-only-create-an-archive-of-a
>>
>> Git version: `2.39.2`
>> Mwe git repo: Two files:
>> ```
>> # .gitattributes:
>> .git_archival.txt export-subst
>> ```
>> ```
>> # .git_archival.txt:
>> node: $Format:%H$
>> ```
>>
>> Commands to reproduce and expected behaviour:
>> ```console
>> $ git archive HEAD:./ --output=test.tar.gz
>> $ tar -axf test.tar.gz .git_archival.txt -O
>> node: 745ce26169fb44e04d91d40ee581cccd591c941e
>> ```
>> Important: Notice the path `./` given after `HEAD`.
>>
>> Actual output:
>> ```console
>> $ tar -axf test.tar.gz .git_archival.txt -O
>> node: $Format:%H$
>> ``>
>> It doesn't matter if `.gitattributes` is in a subfolder, or if I change the relative path `./` to a subfolder, the files are still not properly generated.
>>
>> Using `--worktree-attributes` did not have any effect either.
> That's expected behavior.  You specify a tree (HEAD:./), while
> export-subst requires a commit.  git-archive(1) doesn't spell that out,
> but advises to "See gitattributes(5) for details.", which states: "The
> expansion depends on the availability of a commit ID, i.e., if
> git-archive(1) has been given a tree instead of a commit or a tag then
> no replacement will be done.".
>
> The other attribute supported by git archive, export-ignore, does not
> require a commit.
>
> The Stack Overflow discussion seems to be about a different issue:
> Attributes are always based on the repository root directory, which
> gives unexpected results when archiving a subtree.
>
>> According to the documentation, I understand that the expected behaviour with regards to `--worktree-attributes`:
>> - Read the `.gitattributes` of the relative path, e.g. `./sub_dir` regardless of `--worktree-attributes`. (similar behaviour as not passing a relative path)
>> - Include the `.gitattributes` of the top-level path if `--worktree-attributes` is passed
>>
>> Maybe the intended behaviour is to completely ignore all `.gitattributes` unless `--worktree-attributes` is provided, in which case, it does not have the intended behaviour and please include a flag to achieve the above behaviour.
>>
> The option --worktree-attributes allows uncommitted .gitattribute files
> to be read, but has no effect on whether export-subst can actually be
> applied.
>
> So on the Git side we could improve the documentation and improve
> reading attributes when archiving subtrees (no idea how, though,
> admittedly).  But that wouldn't help you, I suppose.  In your issue
> #444 you write that "git archive HEAD" works, but "git archive HEAD:./"
> doesn't.  Why do you need to use the latter?
>
> René
>

[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4273 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Bug in git archive + .gitattributes + relative path
  2023-03-03 15:38   ` Cristian Le
@ 2023-03-04 13:58     ` René Scharfe
  2023-03-04 15:11       ` Cristian Le
                         ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: René Scharfe @ 2023-03-04 13:58 UTC (permalink / raw)
  To: Cristian Le, git

Am 03.03.23 um 16:38 schrieb Cristian Le:
>> In your issue #444 you write that "git archive HEAD" works, but
>> "git archive HEAD:./" doesn't. Why do you need to use the latter?
>
> Specifically we want to allow for `HEAD:./sub_dir` where `./sub_dir`
> contains `.gitattributes` and `.git_archive.txt`.
>
> Alternatively, it would be helpful if we can pass `--transform`
> commands of `tar` directly so that we can change the paths.
>
> Overall what we are doing in tito is that the source would be in
> `./src` and outside is metadata like `./my_package.spec`. We are
> using `git archive HEAD:./src --prefix=my_package-1.0.0` to pass the
> appropriate form that the rpm spec file can locate. In a tar command
> we can use `--transform=s|^src/|my_package-1.0.0/|` to achieve the
> equivalent.

What is Tito?  https://github.com/rpm-software-management/tito says:
"Tito is a tool for managing RPM based projects using git for their
source code repository."  It supports Git repositories containing
multiple projects.  I suppose that means e.g. for Git's own repo that
Tito would allow creating a separate RPM file for e.g. git-gui.

Side note: Tito features include: "Create reliable tar.gz files with
consistent checksums from any tag."  That's achieved by compressing
using "gzip -n -c".  Avoiding the native tgz support of git archive --
probably only because the code predates it -- shields Tito from the
change to use our internal gzip implementation discussed recently in
https://lore.kernel.org/git/a812a664-67ea-c0ba-599f-cb79e2d96694@gmail.com/

Note, however, that the tar output of git archive is not guaranteed to
be stable between Git versions, either.  Recently adding such a stable
format was proposed in
https://lore.kernel.org/git/20230205221728.4179674-1-sandals@crustytoothpaste.net/

The code for calling git archive with a tree was present in Tito's
initial commit, which says that it was taken from Spacewalk:
https://github.com/rpm-software-management/tito/commit/e87345d7b7.
There it was introduced along with a script that changes the mtime
of archive entries from the current time to the commit timestamp by
https://github.com/spacewalkproject/spacewalk/commit/34267e39d472.

I don't fully understand the explanation in its commit message
("make it possible to call make srpm even if the directory of the
package has changed"); perhaps it requires more domain knowledge.
But I can understand the need for archiving sub-directories in the
context of supporting multi-project repositories.

> However we cannot use the `tar` directly because that would affect
> the timestamps and permissions of the file that are set by `git
> archive`.

GNU tar has the options --mode and --mtime to chose permissions and
modifications of files added to an archive.

git archive is going to get an --mtime option as well in the next
release, by the way.

> So allowing for something like `git archive HEAD
> --transform=s|^src/|my_package-1.0.0/|`, where the transform is done
> after `.gitattributes` is performed would solve this issue.

GNU tar has this --transform option, bsdtar similarly has -s.  Both
also have --strip-components (GNU tar only for extraction, though),
which is a bit simpler and should suffice for your use case.

--- >8 ---
Subject: [PATCH] archive: add --strip-components

Allow removing leading elements from paths of archive entries.  That's
useful when archiving sub-directories and not wanting to keep the
common path prefix, e.g.:

   $ git archive --strip-components=1 HEAD sha1dc | tar tf -
   .gitattributes
   LICENSE.txt
   sha1.c
   sha1.h
   ubc_check.c
   ubc_check.h

The same can be achieved by specifying a tree instead of a commit and
a pathspec:

   $ git archive HEAD:sha1dc | tar tf -
   .gitattributes
   LICENSE.txt
   sha1.c
   sha1.h
   ubc_check.c
   ubc_check.h

However, this doesn't support the export-subst attribute, doesn't
include the commit hash as an archive comment and uses the current time
instead of the commit date as mtime for archive entries.

The new option is adapted from bsdtar.  GNU tar provides it as well, but
only for extraction.

The new option does not affect the paths of entries added by --add-file
and --add-virtual-file because they are handcrafted to their desired
values already.  Similarly, the value of --prefix is not subject to
component stripping.

Signed-off-by: René Scharfe <l.s.r@web.de>
---
 Documentation/git-archive.txt |  6 ++++++
 archive.c                     | 16 ++++++++++++++++
 archive.h                     |  1 +
 t/t5000-tar-tree.sh           | 13 +++++++++++++
 4 files changed, 36 insertions(+)

diff --git a/Documentation/git-archive.txt b/Documentation/git-archive.txt
index 6bab201d37..5dad917e7b 100644
--- a/Documentation/git-archive.txt
+++ b/Documentation/git-archive.txt
@@ -55,6 +55,12 @@ OPTIONS
 	rightmost value is used for all tracked files.  See below which
 	value gets used by `--add-file` and `--add-virtual-file`.

+--strip-components=<n>::
+	Remove the specified number of leading path elements.  Pathnames
+	with fewer elements will be silently skipped.  Does not affect
+	the prefix added by `--prefix`, nor entries added with
+	`--add-file` or `--add-virtual-file`.
+
 -o <file>::
 --output=<file>::
 	Write the archive to <file> instead of stdout.
diff --git a/archive.c b/archive.c
index 9aeaf2bd87..8308d4d9c4 100644
--- a/archive.c
+++ b/archive.c
@@ -166,6 +166,18 @@ static int write_archive_entry(const struct object_id *oid, const char *base,
 		args->convert = check_attr_export_subst(check);
 	}

+	if (args->strip_components > 0) {
+		size_t orig_baselen = baselen;
+		for (int i = 0; i < args->strip_components; i++) {
+			const char *slash = memchr(base, '/', baselen);
+			if (!slash)
+				return S_ISDIR(mode) ? READ_TREE_RECURSIVE : 0;
+			baselen -= slash - base + 1;
+			base = slash + 1;
+		}
+		strbuf_remove(&path, args->baselen, orig_baselen - baselen);
+	}
+
 	if (args->verbose)
 		fprintf(stderr, "%.*s\n", (int)path.len, path.buf);

@@ -593,12 +605,15 @@ static int parse_archive_args(int argc, const char **argv,
 	int verbose = 0;
 	int i;
 	int list = 0;
+	int strip_components = 0;
 	int worktree_attributes = 0;
 	struct option opts[] = {
 		OPT_GROUP(""),
 		OPT_STRING(0, "format", &format, N_("fmt"), N_("archive format")),
 		OPT_STRING(0, "prefix", &base, N_("prefix"),
 			N_("prepend prefix to each pathname in the archive")),
+		OPT_INTEGER(0, "strip-components", &strip_components,
+			N_("remove leading path elements")),
 		{ OPTION_CALLBACK, 0, "add-file", args, N_("file"),
 		  N_("add untracked file to archive"), 0, add_file_cb,
 		  (intptr_t)&base },
@@ -675,6 +690,7 @@ static int parse_archive_args(int argc, const char **argv,
 	args->baselen = strlen(base);
 	args->worktree_attributes = worktree_attributes;
 	args->mtime_option = mtime_option;
+	args->strip_components = strip_components;

 	return argc;
 }
diff --git a/archive.h b/archive.h
index 7178e2a9a2..e9becbd57d 100644
--- a/archive.h
+++ b/archive.h
@@ -23,6 +23,7 @@ struct archiver_args {
 	unsigned int worktree_attributes : 1;
 	unsigned int convert : 1;
 	int compression_level;
+	int strip_components;
 	struct string_list extra_files;
 	struct pretty_print_context *pretty_ctx;
 };
diff --git a/t/t5000-tar-tree.sh b/t/t5000-tar-tree.sh
index 918a2fc7c6..629d2e78d7 100755
--- a/t/t5000-tar-tree.sh
+++ b/t/t5000-tar-tree.sh
@@ -271,6 +271,19 @@ test_expect_success 'git get-tar-commit-id' '
 	test_cmp expect actual
 '

+test_expect_success 'git archive --strip-components' '
+	git archive --strip-components=3 HEAD >strip3.tar &&
+	(
+		mkdir strip3 &&
+		cd strip3 &&
+		"$TAR" xf ../strip3.tar &&
+		find . | grep -v "^\.\$" | sort >../strip3.lst
+	) &&
+	sed -ne "s-\([^/]*/\)\{3\}-./-p" a.lst >expect &&
+	test_cmp expect strip3.lst &&
+	diff -r a/long_path_to_a_file/long_path_to_a_file strip3
+'
+
 test_expect_success 'git archive with --output, override inferred format' '
 	git archive --format=tar --output=d4.zip HEAD &&
 	test_cmp_bin b.tar d4.zip
--
2.39.2

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: Bug in git archive + .gitattributes + relative path
  2023-03-04 13:58     ` René Scharfe
@ 2023-03-04 15:11       ` Cristian Le
  2023-03-05  9:32         ` René Scharfe
  2023-03-06 16:56       ` Junio C Hamano
  2023-03-06 17:27       ` Junio C Hamano
  2 siblings, 1 reply; 22+ messages in thread
From: Cristian Le @ 2023-03-04 15:11 UTC (permalink / raw)
  To: René Scharfe, git

[-- Attachment #1: Type: text/plain, Size: 2503 bytes --]

> I suppose that means e.g. for Git's own repo that
> Tito would allow creating a separate RPM file for e.g. git-gui.
Indeed, that pretty much sums the main idea of why they need sub 
directories there. Ideally we don't want different special cases, and 
instead use as much native behaviour as possible. The main issue here is 
just making it work with `export-subst`, and if it can be achieved with 
both `strip-components` and `prefix`, that might work, even though 
`--transform` would be more flexible.

> However we cannot use the `tar` directly because that would affect
> the timestamps and permissions of the file that are set by `git
> archive`.
I should have explained more thoroughly, I have tried to work around by 
doing a `git archive` and then extracting and re-compressing after 
fixing the paths, but this does not preserve the owner and timestamp of 
the original `git archive`, nor can I use the current implementation of 
their tar fixer to correct these since the headers are different. I do 
not have enough expertise to know what headers need to be set, how to 
set the timestamps and so on. I don't know if the `gzip -n -c` could do 
a better job at that, but we would still not be able to use it as is 
because we would want the correctly generate `export-subst` files, for 
example for a project built with `setuptools_scm` that injects the 
version of the python package from the last tag:

https://github.com/pypa/setuptools_scm#git-archives

> There it was introduced along with a script that changes the mtime
> of archive entries from the current time to the commit timestamp by
> https://github.com/spacewalkproject/spacewalk/commit/34267e39d472.
Thanks for pointing me to this. From your understanding, if we only use 
the git commit directly, we would in principle not need the whole tar 
fixer 
(https://github.com/rpm-software-management/tito/blob/91ef962220ec5154722760dbbd982bed032ee484/src/tito/tar.py#L28-L60 
and 
https://github.com/rpm-software-management/tito/blob/91ef962220ec5154722760dbbd982bed032ee484/src/tito/common.py#L871-L890)? 
If there is no crucial information in the header that is different 
between the `git archive` and the `tar`/`gzip` with appropriate `mtime`, 
`mode` and maybe others, maybe there is some hope in fixing this locally.

I'll be looking forward to `--strip-components`, but just to confirm my 
reading of the email, the intent is to have it working with 
`export-subst` right?

Cheers and thanks for your intuitive answers.


[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4273 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Bug in git archive + .gitattributes + relative path
  2023-03-04 15:11       ` Cristian Le
@ 2023-03-05  9:32         ` René Scharfe
  0 siblings, 0 replies; 22+ messages in thread
From: René Scharfe @ 2023-03-05  9:32 UTC (permalink / raw)
  To: Cristian Le, git

Am 04.03.23 um 16:11 schrieb Cristian Le:
> I should have explained more thoroughly, I have tried to work around
> by doing a `git archive` and then extracting and re-compressing after
> fixing the paths, but this does not preserve the owner and timestamp
> of the original `git archive`, nor can I use the current
> implementation of their tar fixer to correct these since the headers
> are different. I do not have enough expertise to know what headers
> need to be set, how to set the timestamps and so on.

GNU tar has the options --owner, --group and --mtime that allow you to
specify arbitrary values for these fields when archiving, no header
tinkering needed.  That said, tarring with git archive, untarring and
retarring with GNU tar is overly complicated and horrible when git
archive can do the whole job itself.

> I don't know if the `gzip -n -c` could do a better job at that,

That just compresses tar files to tgz (or tar.gz) format and does not
affect any tar headers.

> but we would still
> not be able to use it as is because we would want the correctly
> generate `export-subst` files, for example for a project built with
> `setuptools_scm` that injects the version of the python package from
> the last tag:
>
> https://github.com/pypa/setuptools_scm#git-archives

Note that git archive allows injecting extra files with the options
 --add-file and --add-virtual-file.  Could that feature perhaps be used
instead of export-subst?

>> There it was introduced along with a script that changes the mtime
>> of archive entries from the current time to the commit timestamp by
>> https://github.com/spacewalkproject/spacewalk/commit/34267e39d472.
> Thanks for pointing me to this. From your understanding, if we only use the git commit directly, we would in principle not need the whole tar fixer (https://github.com/rpm-software-management/tito/blob/91ef962220ec5154722760dbbd982bed032ee484/src/tito/tar.py#L28-L60 and https://github.com/rpm-software-management/tito/blob/91ef962220ec5154722760dbbd982bed032ee484/src/tito/common.py#L871-L890)? If there is no crucial information in the header that is different between the `git archive` and the `tar`/`gzip` with appropriate `mtime`, `mode` and maybe others, maybe there is some hope in fixing this locally.

The tar fixer was rewritten in Python in the meantime and I didn't look
it it closely, but if it only sets the mtime of all archive entries to
the commit date then it's not needed when giving git archive a commit
or tag instead of a tree.

> I'll be looking forward to `--strip-components`, but just to confirm
> my reading of the email, the intent is to have it working with
> `export-subst` right?
That option would work the same regardless whether git archive is given
a tree, commit or tag, and it doesn't care about attributes or file
contents.  It would be especially helpful when giving a commit or tag
and a pathspec, and export-subst needs a commit or tag.  So in short:
Yes. :)

René

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Bug in git archive + .gitattributes + relative path
  2023-03-04 13:58     ` René Scharfe
  2023-03-04 15:11       ` Cristian Le
@ 2023-03-06 16:56       ` Junio C Hamano
  2023-03-06 17:51         ` René Scharfe
  2023-03-06 17:27       ` Junio C Hamano
  2 siblings, 1 reply; 22+ messages in thread
From: Junio C Hamano @ 2023-03-06 16:56 UTC (permalink / raw)
  To: René Scharfe; +Cc: Cristian Le, git

René Scharfe <l.s.r@web.de> writes:

>    $ git archive --strip-components=1 HEAD sha1dc | tar tf -
>    .gitattributes
>    LICENSE.txt
>    sha1.c
>    sha1.h
>    ubc_check.c
>    ubc_check.h

What should happen to paths that match the given pathspec that do
not have enough number of components?  E.g. "cache.h" when the
command is "git archive --strip-components=1 HEAD \*.h"?  Should it
be documented?

> The new option does not affect the paths of entries added by --add-file
> and --add-virtual-file because they are handcrafted to their desired
> values already.  Similarly, the value of --prefix is not subject to
> component stripping.

Very sensible.

> diff --git a/archive.c b/archive.c
> index 9aeaf2bd87..8308d4d9c4 100644
> --- a/archive.c
> +++ b/archive.c
> @@ -166,6 +166,18 @@ static int write_archive_entry(const struct object_id *oid, const char *base,
>  		args->convert = check_attr_export_subst(check);
>  	}

We probably could save attribute lookup overhead by moving the new
logic a bit higher in the function?

No, that would invalidate the path_without_prefix variable by using
strbuf_remove() on &path, and will break the attribute look-up.  The
variable is used only once before this point and never used later,
but as an independent future-proofing, we may want to remove the
variable or narrow the scope.  It's totally out of scope of the
patch, though.

> +	if (args->strip_components > 0) {
> +		size_t orig_baselen = baselen;
> +		for (int i = 0; i < args->strip_components; i++) {
> +			const char *slash = memchr(base, '/', baselen);
> +			if (!slash)
> +				return S_ISDIR(mode) ? READ_TREE_RECURSIVE : 0;
> +			baselen -= slash - base + 1;
> +			base = slash + 1;
> +		}
> +		strbuf_remove(&path, args->baselen, orig_baselen - baselen);
> +	}

Nice to see that the core logic of the new feature is surprisingly
small.

>  	if (args->verbose)
>  		fprintf(stderr, "%.*s\n", (int)path.len, path.buf);

By having the verbose output after the path stripping, we won't show
the leading components we stripped, making it similar to what we
would see when we piped the resulting archive to "| tar tf -".  I
guess this makes sense than showing the original path.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Bug in git archive + .gitattributes + relative path
  2023-03-04 13:58     ` René Scharfe
  2023-03-04 15:11       ` Cristian Le
  2023-03-06 16:56       ` Junio C Hamano
@ 2023-03-06 17:27       ` Junio C Hamano
  2023-03-06 18:28         ` René Scharfe
  2 siblings, 1 reply; 22+ messages in thread
From: Junio C Hamano @ 2023-03-06 17:27 UTC (permalink / raw)
  To: René Scharfe; +Cc: Cristian Le, git

René Scharfe <l.s.r@web.de> writes:

> Subject: [PATCH] archive: add --strip-components
>
> Allow removing leading elements from paths of archive entries.  That's
> useful when archiving sub-directories and not wanting to keep the
> common path prefix, e.g.:
>
>    $ git archive --strip-components=1 HEAD sha1dc | tar tf -
>    .gitattributes
>    LICENSE.txt
>    sha1.c
>    sha1.h
>    ubc_check.c
>    ubc_check.h
>
> The same can be achieved by specifying a tree instead of a commit and
> a pathspec:
>
>    $ git archive HEAD:sha1dc | tar tf -
>    .gitattributes
>    LICENSE.txt
>    sha1.c
>    sha1.h
>    ubc_check.c
>    ubc_check.h

Another way I am not sure is working as designed is

    $ cd sha1dc && git archive HEAD . | tar tf -
    .gitattributes
    LICENSE.txt
    sha1.c
    sha1.h
    ubc_check.c
    ubc_check.h

I didn't check if the attribute look-up is done on the correct path
or export-subst kicks in in such a use, though.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Bug in git archive + .gitattributes + relative path
  2023-03-06 16:56       ` Junio C Hamano
@ 2023-03-06 17:51         ` René Scharfe
  0 siblings, 0 replies; 22+ messages in thread
From: René Scharfe @ 2023-03-06 17:51 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Cristian Le, git

Am 06.03.23 um 17:56 schrieb Junio C Hamano:
> René Scharfe <l.s.r@web.de> writes:
>
>>    $ git archive --strip-components=1 HEAD sha1dc | tar tf -
>>    .gitattributes
>>    LICENSE.txt
>>    sha1.c
>>    sha1.h
>>    ubc_check.c
>>    ubc_check.h
>
> What should happen to paths that match the given pathspec that do
> not have enough number of components?  E.g. "cache.h" when the
> command is "git archive --strip-components=1 HEAD \*.h"?  Should it
> be documented?

Entries whose full path is stripped away don't make it into the archive.
That behavior is copied from bsdtar along with the option name and most
of its description in git-archive.txt.

Alternatively we could warn or die.  The latter would be a bit awkward
because we'd either have to check all paths first or risk reporting them
after writing at least some headers.

No strong preference, but following the precedence set by bsdtar makes
the most sense to me.

>> The new option does not affect the paths of entries added by --add-file
>> and --add-virtual-file because they are handcrafted to their desired
>> values already.  Similarly, the value of --prefix is not subject to
>> component stripping.
>
> Very sensible.
>
>> diff --git a/archive.c b/archive.c
>> index 9aeaf2bd87..8308d4d9c4 100644
>> --- a/archive.c
>> +++ b/archive.c
>> @@ -166,6 +166,18 @@ static int write_archive_entry(const struct object_id *oid, const char *base,
>>  		args->convert = check_attr_export_subst(check);
>>  	}
>
> We probably could save attribute lookup overhead by moving the new
> logic a bit higher in the function?
>
> No, that would invalidate the path_without_prefix variable by using
> strbuf_remove() on &path, and will break the attribute look-up.  The
> variable is used only once before this point and never used later,
> but as an independent future-proofing, we may want to remove the
> variable or narrow the scope.  It's totally out of scope of the
> patch, though.

Would you have noticed that attribute lookup breakage without the
presence of that variable? :)

The sad thing is that we concatenate base and filename here and
then attr.c::collect_some_attrs() goes and splits them again.  It
also uses the concatenated path, but perhaps that can be avoided?

>> +	if (args->strip_components > 0) {
>> +		size_t orig_baselen = baselen;
>> +		for (int i = 0; i < args->strip_components; i++) {
>> +			const char *slash = memchr(base, '/', baselen);
>> +			if (!slash)
>> +				return S_ISDIR(mode) ? READ_TREE_RECURSIVE : 0;
>> +			baselen -= slash - base + 1;
>> +			base = slash + 1;
>> +		}
>> +		strbuf_remove(&path, args->baselen, orig_baselen - baselen);
>> +	}
>
> Nice to see that the core logic of the new feature is surprisingly
> small.
>
>>  	if (args->verbose)
>>  		fprintf(stderr, "%.*s\n", (int)path.len, path.buf);
>
> By having the verbose output after the path stripping, we won't show
> the leading components we stripped, making it similar to what we
> would see when we piped the resulting archive to "| tar tf -".  I
> guess this makes sense than showing the original path.

Right, printing the path as it appears in the archive makes sense.
bsdtar does the same..

René

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Bug in git archive + .gitattributes + relative path
  2023-03-06 17:27       ` Junio C Hamano
@ 2023-03-06 18:28         ` René Scharfe
  2023-03-06 18:59           ` Junio C Hamano
  0 siblings, 1 reply; 22+ messages in thread
From: René Scharfe @ 2023-03-06 18:28 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Cristian Le, git

Am 06.03.23 um 18:27 schrieb Junio C Hamano:
> René Scharfe <l.s.r@web.de> writes:
>
>> Subject: [PATCH] archive: add --strip-components
>>
>> Allow removing leading elements from paths of archive entries.  That's
>> useful when archiving sub-directories and not wanting to keep the
>> common path prefix, e.g.:
>>
>>    $ git archive --strip-components=1 HEAD sha1dc | tar tf -
>>    .gitattributes
>>    LICENSE.txt
>>    sha1.c
>>    sha1.h
>>    ubc_check.c
>>    ubc_check.h
>>
>> The same can be achieved by specifying a tree instead of a commit and
>> a pathspec:
>>
>>    $ git archive HEAD:sha1dc | tar tf -
>>    .gitattributes
>>    LICENSE.txt
>>    sha1.c
>>    sha1.h
>>    ubc_check.c
>>    ubc_check.h
>
> Another way I am not sure is working as designed is
>
>     $ cd sha1dc && git archive HEAD . | tar tf -
>     .gitattributes
>     LICENSE.txt
>     sha1.c
>     sha1.h
>     ubc_check.c
>     ubc_check.hq
>
> I didn't check if the attribute look-up is done on the correct path
> or export-subst kicks in in such a use, though.

export-subst is supported in that invocation because git archive has a
commit to work with.

I can kinda see others preferring the directory prefix "sha1dc/" added
to those entries.  Perhaps it depends on what git archive is supposed to
archive: A commit or the files of a commit?  I'm in the latter camp, and
expect to see the same paths as given by git ls-files or git ls-tree.

But that invocation in a sub-directory probably has the same problem
with attributes as the one with a sub-tree above it, i.e. that
attributes are always looked up relative to the repository root.  I
wonder if 47cfc9bd7d (attr: add flag `--source` to work with tree-ish,
2023-01-14) provided the means to fix this when it added a tree_oid
parameter to git_check_attr().

René

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Bug in git archive + .gitattributes + relative path
  2023-03-06 18:28         ` René Scharfe
@ 2023-03-06 18:59           ` Junio C Hamano
  2023-03-06 21:32             ` René Scharfe
  0 siblings, 1 reply; 22+ messages in thread
From: Junio C Hamano @ 2023-03-06 18:59 UTC (permalink / raw)
  To: René Scharfe; +Cc: Cristian Le, git

René Scharfe <l.s.r@web.de> writes:

>> Another way I am not sure is working as designed is
>>
>>     $ cd sha1dc && git archive HEAD . | tar tf -
>>     .gitattributes
>>     LICENSE.txt
>>     sha1.c
>>     sha1.h
>>     ubc_check.c
>>     ubc_check.hq
>>
>> I didn't check if the attribute look-up is done on the correct path
>> or export-subst kicks in in such a use, though.
>
> export-subst is supported in that invocation because git archive has a
> commit to work with.
>
> I can kinda see others preferring the directory prefix "sha1dc/" added
> to those entries.  Perhaps it depends on what git archive is supposed to
> archive: A commit or the files of a commit?  I'm in the latter camp, and
> expect to see the same paths as given by git ls-files or git ls-tree.
>
> But that invocation in a sub-directory probably has the same problem
> with attributes as the one with a sub-tree above it, i.e. that
> attributes are always looked up relative to the repository root.  I
> wonder if 47cfc9bd7d (attr: add flag `--source` to work with tree-ish,
> 2023-01-14) provided the means to fix this when it added a tree_oid
> parameter to git_check_attr().

It somehow feels that the use of pathspec in "git archive" is
somewhat iffy, e.g.

   $ cd sha1dc && git archive HEAD :/ | tar tf -

does not compare very well with

   $ cd sha1dc && git ls-tree -r HEAD :/

For that matter, replacing ":/" (full tree) with ".." (we know one
level above is the root of the working tree) has the same "why don't
they work the same way???" confusion.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Bug in git archive + .gitattributes + relative path
  2023-03-06 18:59           ` Junio C Hamano
@ 2023-03-06 21:32             ` René Scharfe
  2023-03-06 22:34               ` Junio C Hamano
  0 siblings, 1 reply; 22+ messages in thread
From: René Scharfe @ 2023-03-06 21:32 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Cristian Le, git, Matthias Görgens

Am 06.03.23 um 19:59 schrieb Junio C Hamano:
> René Scharfe <l.s.r@web.de> writes:
>
> It somehow feels that the use of pathspec in "git archive" is
> somewhat iffy, e.g.
>
>    $ cd sha1dc && git archive HEAD :/ | tar tf -
>
> does not compare very well with
>
>    $ cd sha1dc && git ls-tree -r HEAD :/
>
> For that matter, replacing ":/" (full tree) with ".." (we know one
> level above is the root of the working tree) has the same "why don't
> they work the same way???" confusion.

Right,
https://lore.kernel.org/git/CA+X7ob8DWGmoVTxSVvrFN68V=pcaZripfP=s+LpWvXN-6L7W7Q@mail.gmail.com/
reports the second one as well.

It's a consequence of effectively cd'ing into the prefix tree in
archive.c::parse_treeish_arg() and using the empty string as prefix from
then on.  If we stop doing that then we're getting much closer (path at
the end):

   $ git -C xdiff ls-files '../sha1dc/*.c' '../xdiff/*.c'
   ../sha1dc/sha1.c
   ../sha1dc/ubc_check.c
   xdiffi.c
   xemit.c
   xhistogram.c
   xmerge.c
   xpatience.c
   xprepare.c
   xutils.c


   $ git -C xdiff archive HEAD '../sha1dc/*.c' '../xdiff/*.c' | tar tf -
   ../sha1dc/
   ../sha1dc/sha1.c
   ../sha1dc/ubc_check.c
   xdiffi.c
   xemit.c
   xhistogram.c
   xmerge.c
   xpatience.c
   xprepare.c
   xutils.c

Not sure if we want those leading double dots.  bsdtar has them:

   $ (cd xdiff && tar cf - ../sha1dc/*.c ../xdiff/*.c) | tar tf -
   ../sha1dc/sha1.c
   ../sha1dc/ubc_check.c
   ../xdiff/xdiffi.c
   ../xdiff/xemit.c
   ../xdiff/xhistogram.c
   ../xdiff/xmerge.c
   ../xdiff/xpatience.c
   ../xdiff/xprepare.c
   ../xdiff/xutils.c

... but GNU tar strips them off and warns about them:

   $ (cd xdiff && gtar cf - ../sha1dc/*.c ../xdiff/*.c) | tar tf -
   gtar: Removing leading `../' from member names
   gtar: Removing leading `../' from hard link targets
   sha1dc/sha1.c
   sha1dc/ubc_check.c
   xdiff/xdiffi.c
   xdiff/xemit.c
   xdiff/xhistogram.c
   xdiff/xmerge.c
   xdiff/xpatience.c
   xdiff/xprepare.c
   xdiff/xutils.c

Neither of them resolves "../$PWD/" parts to "" like git ls-tree does,
but I can accept that difference.  And then we'd need to keep leading
"../", I suppose.  Still unsure.

And I don't know why PATHSPEC_PREFER_CWD is necessary.

So no sign-off, yet.


 archive.c | 60 +++++++++++++++++++++++++++++++++++-------------------------
 1 file changed, 35 insertions(+), 25 deletions(-)

diff --git a/archive.c b/archive.c
index 9aeaf2bd87..c7e9f58b02 100644
--- a/archive.c
+++ b/archive.c
@@ -139,6 +139,7 @@ static int write_archive_entry(const struct object_id *oid, const char *base,
 		void *context)
 {
 	static struct strbuf path = STRBUF_INIT;
+	static struct strbuf scratch = STRBUF_INIT;
 	struct archiver_context *c = context;
 	struct archiver_args *args = c->args;
 	write_archive_entry_fn_t write_entry = c->write_entry;
@@ -148,6 +149,14 @@ static int write_archive_entry(const struct object_id *oid, const char *base,
 	void *buffer;
 	enum object_type type;

+	/*
+	 * NEEDSWORK: variable names could be clearer:
+	 * - args->prefix is the current working directory,
+	 * - args->base with args->baselen is the --prefix value,
+	 * - base with baselen is the path of the current tree,
+	 * - args->base + base + filename is the path in the archive,
+	 * - path_without_prefix is base + filename.
+	 */
 	args->convert = 0;
 	strbuf_reset(&path);
 	strbuf_grow(&path, PATH_MAX);
@@ -166,6 +175,15 @@ static int write_archive_entry(const struct object_id *oid, const char *base,
 		args->convert = check_attr_export_subst(check);
 	}

+	if (args->prefix) {
+		const char *rel = relative_path(path_without_prefix,
+						args->prefix, &scratch);
+		if (!strcmp(rel, "./"))
+			return S_ISDIR(mode) ? READ_TREE_RECURSIVE : 0;
+		strbuf_setlen(&path, args->baselen);
+		strbuf_addstr(&path, rel);
+	}
+
 	if (args->verbose)
 		fprintf(stderr, "%.*s\n", (int)path.len, path.buf);

@@ -401,14 +419,15 @@ static int reject_entry(const struct object_id *oid UNUSED,
 	return ret;
 }

-static int path_exists(struct archiver_args *args, const char *path)
+static int path_exists(struct archiver_args *args, const char *prefix,
+		       const char *path)
 {
 	const char *paths[] = { path, NULL };
 	struct path_exists_context ctx;
 	int ret;

 	ctx.args = args;
-	parse_pathspec(&ctx.pathspec, 0, 0, "", paths);
+	parse_pathspec(&ctx.pathspec, 0, 0, prefix, paths);
 	ctx.pathspec.recursive = 1;
 	ret = read_tree(args->repo, args->tree,
 			&ctx.pathspec,
@@ -417,30 +436,35 @@ static int path_exists(struct archiver_args *args, const char *path)
 	return ret != 0;
 }

-static void parse_pathspec_arg(const char **pathspec,
+static void parse_pathspec_arg(const char **pathspec, const char *prefix,
 		struct archiver_args *ar_args)
 {
+	const char *match_all[] = { ".", NULL };
+
+	if (prefix && !*pathspec)
+		pathspec = match_all;
+
 	/*
 	 * must be consistent with parse_pathspec in path_exists()
 	 * Also if pathspec patterns are dependent, we're in big
 	 * trouble as we test each one separately
 	 */
 	parse_pathspec(&ar_args->pathspec, 0,
-		       PATHSPEC_PREFER_FULL,
-		       "", pathspec);
+		       PATHSPEC_PREFER_CWD,
+		       prefix, pathspec);
 	ar_args->pathspec.recursive = 1;
 	if (pathspec) {
 		while (*pathspec) {
-			if (**pathspec && !path_exists(ar_args, *pathspec))
+			if (**pathspec &&
+			    !path_exists(ar_args, prefix, *pathspec))
 				die(_("pathspec '%s' did not match any files"), *pathspec);
 			pathspec++;
 		}
 	}
 }

-static void parse_treeish_arg(const char **argv,
-		struct archiver_args *ar_args, const char *prefix,
-		int remote)
+static void parse_treeish_arg(const char **argv, struct archiver_args *ar_args,
+			      int remote)
 {
 	const char *name = argv[0];
 	const struct object_id *commit_oid;
@@ -479,20 +503,6 @@ static void parse_treeish_arg(const char **argv,
 	if (!tree)
 		die(_("not a tree object: %s"), oid_to_hex(&oid));

-	if (prefix) {
-		struct object_id tree_oid;
-		unsigned short mode;
-		int err;
-
-		err = get_tree_entry(ar_args->repo,
-				     &tree->object.oid,
-				     prefix, &tree_oid,
-				     &mode);
-		if (err || !S_ISDIR(mode))
-			die(_("current working directory is untracked"));
-
-		tree = parse_tree_indirect(&tree_oid);
-	}
 	ar_args->refname = ref;
 	ar_args->tree = tree;
 	ar_args->commit_oid = commit_oid;
@@ -710,8 +720,8 @@ int write_archive(int argc, const char **argv, const char *prefix,
 		setup_git_directory();
 	}

-	parse_treeish_arg(argv, &args, prefix, remote);
-	parse_pathspec_arg(argv + 1, &args);
+	parse_treeish_arg(argv, &args, remote);
+	parse_pathspec_arg(argv + 1, prefix, &args);

 	rc = ar->write_archive(ar, &args);



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: Bug in git archive + .gitattributes + relative path
  2023-03-06 21:32             ` René Scharfe
@ 2023-03-06 22:34               ` Junio C Hamano
  2023-03-11 20:47                 ` René Scharfe
  0 siblings, 1 reply; 22+ messages in thread
From: Junio C Hamano @ 2023-03-06 22:34 UTC (permalink / raw)
  To: René Scharfe; +Cc: Cristian Le, git, Matthias Görgens

René Scharfe <l.s.r@web.de> writes:

> Neither of them resolves "../$PWD/" parts to "" like git ls-tree does,
> but I can accept that difference.  And then we'd need to keep leading
> "../", I suppose.  Still unsure.

I offhand do not know how well it would mix with --strip-components
if we leave the leading "../".

But it certainly would be nice if we somehow:

 * can keep the current behaviour where "git -C sub archive" records
   paths relative to "sub" for backward compatibility.

 * fail loudly when "git -C sub archive <pathspec>" makes us use
   "../" prefix because <pathspec> goes above the $PWD for backward
   compatibility and sanity.

 * with --some-option, make "git -C sub archive --some-option :/"
   act exactly like "git archive :/".

> And I don't know why PATHSPEC_PREFER_CWD is necessary.
>
> So no sign-off, yet.
>
>
>  archive.c | 60 +++++++++++++++++++++++++++++++++++-------------------------
>  1 file changed, 35 insertions(+), 25 deletions(-)
>
> diff --git a/archive.c b/archive.c
> index 9aeaf2bd87..c7e9f58b02 100644
> --- a/archive.c
> +++ b/archive.c
> @@ -139,6 +139,7 @@ static int write_archive_entry(const struct object_id *oid, const char *base,
>  		void *context)
>  {
>  	static struct strbuf path = STRBUF_INIT;
> +	static struct strbuf scratch = STRBUF_INIT;
>  	struct archiver_context *c = context;
>  	struct archiver_args *args = c->args;
>  	write_archive_entry_fn_t write_entry = c->write_entry;
> @@ -148,6 +149,14 @@ static int write_archive_entry(const struct object_id *oid, const char *base,
>  	void *buffer;
>  	enum object_type type;
>
> +	/*
> +	 * NEEDSWORK: variable names could be clearer:
> +	 * - args->prefix is the current working directory,
> +	 * - args->base with args->baselen is the --prefix value,
> +	 * - base with baselen is the path of the current tree,
> +	 * - args->base + base + filename is the path in the archive,
> +	 * - path_without_prefix is base + filename.
> +	 */
>  	args->convert = 0;
>  	strbuf_reset(&path);
>  	strbuf_grow(&path, PATH_MAX);
> @@ -166,6 +175,15 @@ static int write_archive_entry(const struct object_id *oid, const char *base,
>  		args->convert = check_attr_export_subst(check);
>  	}
>
> +	if (args->prefix) {
> +		const char *rel = relative_path(path_without_prefix,
> +						args->prefix, &scratch);
> +		if (!strcmp(rel, "./"))
> +			return S_ISDIR(mode) ? READ_TREE_RECURSIVE : 0;
> +		strbuf_setlen(&path, args->baselen);
> +		strbuf_addstr(&path, rel);
> +	}
> +
>  	if (args->verbose)
>  		fprintf(stderr, "%.*s\n", (int)path.len, path.buf);
>
> @@ -401,14 +419,15 @@ static int reject_entry(const struct object_id *oid UNUSED,
>  	return ret;
>  }
>
> -static int path_exists(struct archiver_args *args, const char *path)
> +static int path_exists(struct archiver_args *args, const char *prefix,
> +		       const char *path)
>  {
>  	const char *paths[] = { path, NULL };
>  	struct path_exists_context ctx;
>  	int ret;
>
>  	ctx.args = args;
> -	parse_pathspec(&ctx.pathspec, 0, 0, "", paths);
> +	parse_pathspec(&ctx.pathspec, 0, 0, prefix, paths);
>  	ctx.pathspec.recursive = 1;
>  	ret = read_tree(args->repo, args->tree,
>  			&ctx.pathspec,
> @@ -417,30 +436,35 @@ static int path_exists(struct archiver_args *args, const char *path)
>  	return ret != 0;
>  }
>
> -static void parse_pathspec_arg(const char **pathspec,
> +static void parse_pathspec_arg(const char **pathspec, const char *prefix,
>  		struct archiver_args *ar_args)
>  {
> +	const char *match_all[] = { ".", NULL };
> +
> +	if (prefix && !*pathspec)
> +		pathspec = match_all;
> +
>  	/*
>  	 * must be consistent with parse_pathspec in path_exists()
>  	 * Also if pathspec patterns are dependent, we're in big
>  	 * trouble as we test each one separately
>  	 */
>  	parse_pathspec(&ar_args->pathspec, 0,
> -		       PATHSPEC_PREFER_FULL,
> -		       "", pathspec);
> +		       PATHSPEC_PREFER_CWD,
> +		       prefix, pathspec);
>  	ar_args->pathspec.recursive = 1;
>  	if (pathspec) {
>  		while (*pathspec) {
> -			if (**pathspec && !path_exists(ar_args, *pathspec))
> +			if (**pathspec &&
> +			    !path_exists(ar_args, prefix, *pathspec))
>  				die(_("pathspec '%s' did not match any files"), *pathspec);
>  			pathspec++;
>  		}
>  	}
>  }
>
> -static void parse_treeish_arg(const char **argv,
> -		struct archiver_args *ar_args, const char *prefix,
> -		int remote)
> +static void parse_treeish_arg(const char **argv, struct archiver_args *ar_args,
> +			      int remote)
>  {
>  	const char *name = argv[0];
>  	const struct object_id *commit_oid;
> @@ -479,20 +503,6 @@ static void parse_treeish_arg(const char **argv,
>  	if (!tree)
>  		die(_("not a tree object: %s"), oid_to_hex(&oid));
>
> -	if (prefix) {
> -		struct object_id tree_oid;
> -		unsigned short mode;
> -		int err;
> -
> -		err = get_tree_entry(ar_args->repo,
> -				     &tree->object.oid,
> -				     prefix, &tree_oid,
> -				     &mode);
> -		if (err || !S_ISDIR(mode))
> -			die(_("current working directory is untracked"));
> -
> -		tree = parse_tree_indirect(&tree_oid);
> -	}
>  	ar_args->refname = ref;
>  	ar_args->tree = tree;
>  	ar_args->commit_oid = commit_oid;
> @@ -710,8 +720,8 @@ int write_archive(int argc, const char **argv, const char *prefix,
>  		setup_git_directory();
>  	}
>
> -	parse_treeish_arg(argv, &args, prefix, remote);
> -	parse_pathspec_arg(argv + 1, &args);
> +	parse_treeish_arg(argv, &args, remote);
> +	parse_pathspec_arg(argv + 1, prefix, &args);
>
>  	rc = ar->write_archive(ar, &args);

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Bug in git archive + .gitattributes + relative path
  2023-03-06 22:34               ` Junio C Hamano
@ 2023-03-11 20:47                 ` René Scharfe
  2023-03-12 21:25                   ` Junio C Hamano
  0 siblings, 1 reply; 22+ messages in thread
From: René Scharfe @ 2023-03-11 20:47 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Cristian Le, git, Matthias Görgens

Am 06.03.23 um 23:34 schrieb Junio C Hamano:
> René Scharfe <l.s.r@web.de> writes:
>
>> Neither of them resolves "../$PWD/" parts to "" like git ls-tree does,
>> but I can accept that difference.  And then we'd need to keep leading
>> "../", I suppose.  Still unsure.
>
> I offhand do not know how well it would mix with --strip-components
> if we leave the leading "../".

Not too well when entries from $PWD are shortened and mixed with ones
from elsewhere that aren't.  But that seems like a strange thing to do.
If two sub-trees are needed then git archive should be started in a
shared parent directory higher up.

> But it certainly would be nice if we somehow:
>
>  * can keep the current behaviour where "git -C sub archive" records
>    paths relative to "sub" for backward compatibility.

Right.  That's what relative_path() provides in the patch.

>  * fail loudly when "git -C sub archive <pathspec>" makes us use
>    "../" prefix because <pathspec> goes above the $PWD for backward
>    compatibility and sanity.

Without the patch this fails, but are there really people that depend on
it failing?  We could certainly forbid it, but do we need to?  It costs
a few lines of code (see patch below).

>  * with --some-option, make "git -C sub archive --some-option :/"
>    act exactly like "git archive :/".

Perhaps I'm reading this too literally, but it would be easier to remove
"-C sub" from that command. Or to add "-C $(git rev-parse --show-cdup)".
We could add a shortcut for that (see patch below).

>> And I don't know why PATHSPEC_PREFER_CWD is necessary.

PATHSPEC_PREFER_FULL makes the empty pathspec match everything, while
PATHSPEC_PREFER_CWD makes it match the current working directory, only.
In git archive without these patches both mean the same because we
make the tree of the current working directory the root of our pathspec
worldview.  When we stop doing that it actually makes a difference and
we need the more limited variant.

René


 Documentation/git.txt |  4 ++++
 archive.c             | 36 ++++++++++++++++++++++--------------
 git.c                 | 18 ++++++++++++++++++
 t/t0056-git-C.sh      |  6 ++++++
 4 files changed, 50 insertions(+), 14 deletions(-)

diff --git a/Documentation/git.txt b/Documentation/git.txt
index 74973d3cc4..e254572fec 100644
--- a/Documentation/git.txt
+++ b/Documentation/git.txt
@@ -73,6 +73,10 @@ example the following invocations are equivalent:
     git --git-dir=a.git --work-tree=b -C c status
     git --git-dir=c/a.git --work-tree=c/b status

+--cdup::
+	Run as if git was started at the root of the repository or
+	worktree.  Same as `-C $(git rev-parse --show-cdup)`.
+
 -c <name>=<value>::
 	Pass a configuration parameter to the command. The value
 	given will override values from configuration files.
diff --git a/archive.c b/archive.c
index c7e9f58b02..6f587876e5 100644
--- a/archive.c
+++ b/archive.c
@@ -396,6 +396,7 @@ static const struct archiver *lookup_archiver(const char *name)
 struct path_exists_context {
 	struct pathspec pathspec;
 	struct archiver_args *args;
+	int dotdot;
 };

 static int reject_entry(const struct object_id *oid UNUSED,
@@ -405,35 +406,43 @@ static int reject_entry(const struct object_id *oid UNUSED,
 {
 	int ret = -1;
 	struct path_exists_context *ctx = context;
+	const char *prefix = ctx->args->prefix;
+	struct strbuf sb = STRBUF_INIT;
+	struct strbuf scratch = STRBUF_INIT;

+	strbuf_addbuf(&sb, base);
+	strbuf_addstr(&sb, filename);
 	if (S_ISDIR(mode)) {
-		struct strbuf sb = STRBUF_INIT;
-		strbuf_addbuf(&sb, base);
-		strbuf_addstr(&sb, filename);
 		if (!match_pathspec(ctx->args->repo->index,
 				    &ctx->pathspec,
 				    sb.buf, sb.len, 0, NULL, 1))
 			ret = READ_TREE_RECURSIVE;
-		strbuf_release(&sb);
+	} else {
+		if (starts_with(relative_path(sb.buf, prefix, &scratch), "../"))
+			ctx->dotdot = 1;
 	}
+	strbuf_release(&sb);
+	strbuf_release(&scratch);
 	return ret;
 }

-static int path_exists(struct archiver_args *args, const char *prefix,
+static void check_path(struct archiver_args *args, const char *prefix,
 		       const char *path)
 {
 	const char *paths[] = { path, NULL };
 	struct path_exists_context ctx;
-	int ret;

 	ctx.args = args;
+	ctx.dotdot = 0;
 	parse_pathspec(&ctx.pathspec, 0, 0, prefix, paths);
 	ctx.pathspec.recursive = 1;
-	ret = read_tree(args->repo, args->tree,
-			&ctx.pathspec,
-			reject_entry, &ctx);
+	if (!read_tree(args->repo, args->tree, &ctx.pathspec,
+		       reject_entry, &ctx))
+		die(_("pathspec '%s' did not match any files"), path);
+	if (ctx.dotdot)
+		die(_("pathspec '%s' matches files above current directory"),
+		    path);
 	clear_pathspec(&ctx.pathspec);
-	return ret != 0;
 }

 static void parse_pathspec_arg(const char **pathspec, const char *prefix,
@@ -445,7 +454,7 @@ static void parse_pathspec_arg(const char **pathspec, const char *prefix,
 		pathspec = match_all;

 	/*
-	 * must be consistent with parse_pathspec in path_exists()
+	 * must be consistent with parse_pathspec in check_path()
 	 * Also if pathspec patterns are dependent, we're in big
 	 * trouble as we test each one separately
 	 */
@@ -455,9 +464,8 @@ static void parse_pathspec_arg(const char **pathspec, const char *prefix,
 	ar_args->pathspec.recursive = 1;
 	if (pathspec) {
 		while (*pathspec) {
-			if (**pathspec &&
-			    !path_exists(ar_args, prefix, *pathspec))
-				die(_("pathspec '%s' did not match any files"), *pathspec);
+			if (**pathspec)
+				check_path(ar_args, prefix, *pathspec);
 			pathspec++;
 		}
 	}
diff --git a/git.c b/git.c
index 6171fd6769..22993991a9 100644
--- a/git.c
+++ b/git.c
@@ -293,6 +293,24 @@ static int handle_options(const char ***argv, int *argc, int *envchanged)
 			}
 			(*argv)++;
 			(*argc)--;
+		} else if (!strcmp(cmd, "--cdup")) {
+			struct child_process cp = CHILD_PROCESS_INIT;
+			struct strbuf out = STRBUF_INIT;
+			struct strbuf err = STRBUF_INIT;
+
+			cp.git_cmd = 1;
+			strvec_pushl(&cp.args, "rev-parse", "--show-cdup", NULL);
+			if (pipe_command(&cp, NULL, 0, &out, 0, &err, 0)) {
+				strbuf_trim_trailing_newline(&err);
+				if (err.len)
+					die("%s", err.buf);
+				die(_("unable to get repo root or worktree"));
+			}
+			strbuf_trim_trailing_newline(&out);
+			if (out.len && chdir(out.buf))
+				die_errno(_("cannot change to '%s'"), out.buf);
+			strbuf_release(&out);
+			strbuf_release(&err);
 		} else if (skip_prefix(cmd, "--list-cmds=", &cmd)) {
 			trace2_cmd_name("_query_");
 			if (!strcmp(cmd, "parseopt")) {
diff --git a/t/t0056-git-C.sh b/t/t0056-git-C.sh
index 752aa8c945..95107d724e 100755
--- a/t/t0056-git-C.sh
+++ b/t/t0056-git-C.sh
@@ -92,4 +92,10 @@ test_expect_success 'Relative followed by fullpath: "-C ./here -C /there" is equ
 	test_cmp expected actual
 '

+test_expect_success '"--cdup" goes back' '
+	git status >expected &&
+	git -C c --cdup status >actual &&
+	test_cmp expected actual
+'
+
 test_done

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: Bug in git archive + .gitattributes + relative path
  2023-03-11 20:47                 ` René Scharfe
@ 2023-03-12 21:25                   ` Junio C Hamano
  2023-03-18 21:30                     ` René Scharfe
  0 siblings, 1 reply; 22+ messages in thread
From: Junio C Hamano @ 2023-03-12 21:25 UTC (permalink / raw)
  To: René Scharfe; +Cc: Cristian Le, git, Matthias Görgens

René Scharfe <l.s.r@web.de> writes:

>> I offhand do not know how well it would mix with --strip-components
>> if we leave the leading "../".
>
> Not too well when entries from $PWD are shortened and mixed with ones
> from elsewhere that aren't.  But that seems like a strange thing to do.

Yes, it is a strange thing to do.

> If two sub-trees are needed then git archive should be started in a
> shared parent directory higher up.
>
>> But it certainly would be nice if we somehow:
>>
>>  * can keep the current behaviour where "git -C sub archive" records
>>    paths relative to "sub" for backward compatibility.
>
> Right.  That's what relative_path() provides in the patch.
>
>>  * fail loudly when "git -C sub archive <pathspec>" makes us use
>>    "../" prefix because <pathspec> goes above the $PWD for backward
>>    compatibility and sanity.
>
> Without the patch this fails, but are there really people that depend on
> it failing?  We could certainly forbid it, but do we need to?

I dunno.  It was an obvious way to avoid having to think about
interaction with --strip-components and "../", but there certainly
may be other solutions for it people can think of. 

Also on the receiving end, don't people get upset to see that their
"tar xf" escapes the directory they just created only to extract the
tarball?

>>  * with --some-option, make "git -C sub archive --some-option :/"
>>    act exactly like "git archive :/".
>
> Perhaps I'm reading this too literally, but it would be easier to remove
> "-C sub" from that command. Or to add "-C $(git rev-parse --show-cdup)".
> We could add a shortcut for that (see patch below).

More like

	$ cd some/deep/place
	... work work work
	$ git archive --full-tree :/other :/hier :/archy

is what I had in mind.  Without --full-tree, due to the first bullet
point above, paths in our archive are relative to some/deep/place.

Thanks.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Bug in git archive + .gitattributes + relative path
  2023-03-12 21:25                   ` Junio C Hamano
@ 2023-03-18 21:30                     ` René Scharfe
  2023-03-20 16:16                       ` Junio C Hamano
  2023-03-20 20:02                       ` [PATCH] archive: improve support for running in a subdirectory René Scharfe
  0 siblings, 2 replies; 22+ messages in thread
From: René Scharfe @ 2023-03-18 21:30 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Cristian Le, git, Matthias Görgens

Am 12.03.23 um 22:25 schrieb Junio C Hamano:
> René Scharfe <l.s.r@web.de> writes:
>
>>>  * fail loudly when "git -C sub archive <pathspec>" makes us use
>>>    "../" prefix because <pathspec> goes above the $PWD for backward
>>>    compatibility and sanity.
>>
>> Without the patch this fails, but are there really people that depend on
>> it failing?  We could certainly forbid it, but do we need to?
>
> I dunno.  It was an obvious way to avoid having to think about
> interaction with --strip-components and "../", but there certainly
> may be other solutions for it people can think of.
>
> Also on the receiving end, don't people get upset to see that their
> "tar xf" escapes the directory they just created only to extract the
> tarball?

bsdtar creates entries starting with "../" without complaint, but
refuses to extract them as-is -- you need to e.g. use --strip-components
to get rid of those dots.  Awkward.

Not allowing those entries to be created is more consistent, especially
since we're already restrictive like that.  OK.

>>>  * with --some-option, make "git -C sub archive --some-option :/"
>>>    act exactly like "git archive :/".
>>
>> Perhaps I'm reading this too literally, but it would be easier to remove
>> "-C sub" from that command. Or to add "-C $(git rev-parse --show-cdup)".
>> We could add a shortcut for that (see patch below).
>
> More like
>
> 	$ cd some/deep/place
> 	... work work work
> 	$ git archive --full-tree :/other :/hier :/archy
>
> is what I had in mind.  Without --full-tree, due to the first bullet
> point above, paths in our archive are relative to some/deep/place.

I don't see the difference.  Here ":/other" is the youngest commit with
"other" in its message, ":/hier" and ":/archy" are pathspecs selecting
subdirectories.  If we stay in a different subdirectory then this is
simply currently forbidden and we'd keep it like that:

   $ cd xdiff
   $ git archive :/needle :/t/t0019 :/t/t4020 | tar tf -
   fatal: pathspec ':/t/t0019' did not match any files

We'd just improve the message a bit.  Going up still helps:

   $ git -C "$(git rev-parse --show-cdup)" archive :/needle :/t/t0019 :/t/t4020 | tar tf -
   t/
   t/t0019/
   t/t0019/parse_json.perl
   t/t4020/
   t/t4020/diff.NUL

   # We could shorten that.
   $ ../git --cdup archive :/needle :/t/t0019 :/t/t4020 | tar tf -
   t/
   t/t0019/
   t/t0019/parse_json.perl
   t/t4020/
   t/t4020/diff.NUL

Or am I missing something here?

René

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Bug in git archive + .gitattributes + relative path
  2023-03-18 21:30                     ` René Scharfe
@ 2023-03-20 16:16                       ` Junio C Hamano
  2023-03-20 20:02                       ` [PATCH] archive: improve support for running in a subdirectory René Scharfe
  1 sibling, 0 replies; 22+ messages in thread
From: Junio C Hamano @ 2023-03-20 16:16 UTC (permalink / raw)
  To: René Scharfe; +Cc: Cristian Le, git, Matthias Görgens

René Scharfe <l.s.r@web.de> writes:

>> 	$ cd some/deep/place
>> 	... work work work
>> 	$ git archive --full-tree :/other :/hier :/archy
>>
>> is what I had in mind.  Without --full-tree, due to the first bullet
>> point above, paths in our archive are relative to some/deep/place.
>
> I don't see the difference.

OK.  I do not work in Java anyway ;-)

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH] archive: improve support for running in a subdirectory
  2023-03-18 21:30                     ` René Scharfe
  2023-03-20 16:16                       ` Junio C Hamano
@ 2023-03-20 20:02                       ` René Scharfe
  2023-03-21 22:59                         ` Junio C Hamano
  2023-03-24 22:27                         ` [PATCH v2] archive: improve support for running in subdirectory René Scharfe
  1 sibling, 2 replies; 22+ messages in thread
From: René Scharfe @ 2023-03-20 20:02 UTC (permalink / raw)
  To: git; +Cc: Cristian Le, Matthias Görgens, Junio C Hamano

When git archive is started in a subdirectory, it archives its
corresponding tree and its child objects, only.  That is intended.  It
does that by effectively cd'ing into that tree and setting "prefix" to
the empty string.

This has unfortunate consequences, though: Attributes are anchored at
the root of the repository and git archive still applies them to
subtrees, causing mismatches.  And when checking pathspecs it cannot
tell the difference between one that doesn't match anything and one that
matches something outside of the subdirectory, leading to a confusing
error message.

Fix that by keeping the "prefix" value and passing it to functions
related to pathspecs and attributes, and shortening the paths written to
the archive and (if --verbose is given) to stdout using relative_path().

Still reject attempts to archive files outside the current directory,
but print a more specific error in that case.  Recognizing it requires a
full traversal of the subtree for each pathspec, however.  Allowing them
would be easier, but archive entry paths starting with "../" can be
problematic to extract -- e.g. bsdtar skips them by default.

Reported-by: Cristian Le <cristian.le@mpsd.mpg.de>
Reported-by: Matthias Görgens <matthias.goergens@gmail.com>
Signed-off-by: René Scharfe <l.s.r@web.de>
---
 archive.c               | 71 +++++++++++++++++++++++++++++------------
 t/t5000-tar-tree.sh     | 13 ++++++++
 t/t5001-archive-attr.sh | 16 ++++++++++
 3 files changed, 79 insertions(+), 21 deletions(-)

diff --git a/archive.c b/archive.c
index 1c2ca78e52..c8d66169d1 100644
--- a/archive.c
+++ b/archive.c
@@ -168,6 +168,25 @@ static int write_archive_entry(const struct object_id *oid, const char *base,
 		args->convert = check_attr_export_subst(check);
 	}

+	if (args->prefix) {
+		static struct strbuf buf = STRBUF_INIT;
+		const char *rel;
+
+		rel = relative_path(path_without_prefix, args->prefix, &buf);
+
+		/*
+		 * We don't add an entry for the current working
+		 * directory when we are at the root; skip it also when
+		 * we're in a subdirectory or submodule.  Skip entries
+		 * higher up as well.
+		 */
+		if (!strcmp(rel, "./") || starts_with(rel, "../"))
+			return S_ISDIR(mode) ? READ_TREE_RECURSIVE : 0;
+
+		strbuf_setlen(&path, args->baselen);
+		strbuf_addstr(&path, rel);
+	}
+
 	if (args->verbose)
 		fprintf(stderr, "%.*s\n", (int)path.len, path.buf);

@@ -403,6 +422,27 @@ static int reject_entry(const struct object_id *oid UNUSED,
 	return ret;
 }

+static int reject_outside(const struct object_id *oid UNUSED,
+			  struct strbuf *base, const char *filename,
+			  unsigned mode, void *context)
+{
+	struct archiver_args *args = context;
+	struct strbuf buf = STRBUF_INIT;
+	struct strbuf path = STRBUF_INIT;
+	int ret = 0;
+
+	if (S_ISDIR(mode))
+		return READ_TREE_RECURSIVE;
+
+	strbuf_addbuf(&path, base);
+	strbuf_addstr(&path, filename);
+	if (starts_with(relative_path(path.buf, args->prefix, &buf), "../"))
+		ret = -1;
+	strbuf_release(&buf);
+	strbuf_release(&path);
+	return ret;
+}
+
 static int path_exists(struct archiver_args *args, const char *path)
 {
 	const char *paths[] = { path, NULL };
@@ -410,8 +450,13 @@ static int path_exists(struct archiver_args *args, const char *path)
 	int ret;

 	ctx.args = args;
-	parse_pathspec(&ctx.pathspec, 0, 0, "", paths);
+	parse_pathspec(&ctx.pathspec, 0, PATHSPEC_PREFER_CWD,
+		       args->prefix, paths);
 	ctx.pathspec.recursive = 1;
+	if (args->prefix && read_tree(args->repo, args->tree, &ctx.pathspec,
+				      reject_outside, args))
+		die(_("pathspec '%s' matches files outside the "
+		      "current directory"), path);
 	ret = read_tree(args->repo, args->tree,
 			&ctx.pathspec,
 			reject_entry, &ctx);
@@ -427,9 +472,8 @@ static void parse_pathspec_arg(const char **pathspec,
 	 * Also if pathspec patterns are dependent, we're in big
 	 * trouble as we test each one separately
 	 */
-	parse_pathspec(&ar_args->pathspec, 0,
-		       PATHSPEC_PREFER_FULL,
-		       "", pathspec);
+	parse_pathspec(&ar_args->pathspec, 0, PATHSPEC_PREFER_CWD,
+		       ar_args->prefix, pathspec);
 	ar_args->pathspec.recursive = 1;
 	if (pathspec) {
 		while (*pathspec) {
@@ -441,8 +485,7 @@ static void parse_pathspec_arg(const char **pathspec,
 }

 static void parse_treeish_arg(const char **argv,
-		struct archiver_args *ar_args, const char *prefix,
-		int remote)
+			      struct archiver_args *ar_args, int remote)
 {
 	const char *name = argv[0];
 	const struct object_id *commit_oid;
@@ -481,20 +524,6 @@ static void parse_treeish_arg(const char **argv,
 	if (!tree)
 		die(_("not a tree object: %s"), oid_to_hex(&oid));

-	if (prefix) {
-		struct object_id tree_oid;
-		unsigned short mode;
-		int err;
-
-		err = get_tree_entry(ar_args->repo,
-				     &tree->object.oid,
-				     prefix, &tree_oid,
-				     &mode);
-		if (err || !S_ISDIR(mode))
-			die(_("current working directory is untracked"));
-
-		tree = parse_tree_indirect(&tree_oid);
-	}
 	ar_args->refname = ref;
 	ar_args->tree = tree;
 	ar_args->commit_oid = commit_oid;
@@ -712,7 +741,7 @@ int write_archive(int argc, const char **argv, const char *prefix,
 		setup_git_directory();
 	}

-	parse_treeish_arg(argv, &args, prefix, remote);
+	parse_treeish_arg(argv, &args, remote);
 	parse_pathspec_arg(argv + 1, &args);

 	rc = ar->write_archive(ar, &args);
diff --git a/t/t5000-tar-tree.sh b/t/t5000-tar-tree.sh
index 918a2fc7c6..a60ae6145e 100755
--- a/t/t5000-tar-tree.sh
+++ b/t/t5000-tar-tree.sh
@@ -433,6 +433,19 @@ test_expect_success 'catch non-matching pathspec' '
 	test_must_fail git archive -v HEAD -- "*.abc" >/dev/null
 '

+test_expect_success 'reject paths outside the current directory' '
+	test_must_fail git -C a/bin archive HEAD .. >/dev/null 2>err &&
+	grep "outside the current directory" err
+'
+
+test_expect_success 'allow pathspecs that resolve to the current directory' '
+	git -C a/bin archive -v HEAD ../bin >/dev/null 2>actual &&
+	cat >expect <<-\EOF &&
+	sh
+	EOF
+	test_cmp expect actual
+'
+
 # Pull the size and date of each entry in a tarfile using the system tar.
 #
 # We'll pull out only the year from the date; that avoids any question of
diff --git a/t/t5001-archive-attr.sh b/t/t5001-archive-attr.sh
index 04d300eeda..0ff47a239d 100755
--- a/t/t5001-archive-attr.sh
+++ b/t/t5001-archive-attr.sh
@@ -33,6 +33,13 @@ test_expect_success 'setup' '
 	echo ignored-by-tree.d export-ignore >>.gitattributes &&
 	git add ignored-by-tree ignored-by-tree.d .gitattributes &&

+	mkdir subdir &&
+	>subdir/included &&
+	>subdir/ignored-by-subtree &&
+	>subdir/ignored-by-tree &&
+	echo ignored-by-subtree export-ignore >subdir/.gitattributes &&
+	git add subdir &&
+
 	echo ignored by worktree >ignored-by-worktree &&
 	echo ignored-by-worktree export-ignore >.gitattributes &&
 	git add ignored-by-worktree &&
@@ -93,6 +100,15 @@ test_expect_exists	archive-pathspec-wildcard/ignored-by-worktree
 test_expect_missing	archive-pathspec-wildcard/excluded-by-pathspec.d
 test_expect_missing	archive-pathspec-wildcard/excluded-by-pathspec.d/file

+test_expect_success 'git -C subdir archive' '
+	git -C subdir archive HEAD >archive-subdir.tar &&
+	extract_tar_to_dir archive-subdir
+'
+
+test_expect_exists	archive-subdir/included
+test_expect_missing	archive-subdir/ignored-by-subtree
+test_expect_missing	archive-subdir/ignored-by-tree
+
 test_expect_success 'git archive with worktree attributes' '
 	git archive --worktree-attributes HEAD >worktree.tar &&
 	(mkdir worktree && cd worktree && "$TAR" xf -) <worktree.tar
--
2.40.0

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH] archive: improve support for running in a subdirectory
  2023-03-20 20:02                       ` [PATCH] archive: improve support for running in a subdirectory René Scharfe
@ 2023-03-21 22:59                         ` Junio C Hamano
  2023-03-24 22:26                           ` René Scharfe
  2023-03-24 22:27                         ` [PATCH v2] archive: improve support for running in subdirectory René Scharfe
  1 sibling, 1 reply; 22+ messages in thread
From: Junio C Hamano @ 2023-03-21 22:59 UTC (permalink / raw)
  To: René Scharfe; +Cc: git, Cristian Le, Matthias Görgens

René Scharfe <l.s.r@web.de> writes:

>  archive.c               | 71 +++++++++++++++++++++++++++++------------
>  t/t5000-tar-tree.sh     | 13 ++++++++
>  t/t5001-archive-attr.sh | 16 ++++++++++
>  3 files changed, 79 insertions(+), 21 deletions(-)

There are a handful of CI failures that can be seen at

  https://github.com/git/git/actions/runs/4482588035/jobs/7880821225#step:6:1803
  https://github.com/git/git/actions/runs/4482588035/jobs/7880821849#step:4:1811

which is with this topic in 'seen'.  Exactly the same 'seen' without
this topic seems to pass

  https://github.com/git/git/actions/runs/4484290871

Thanks.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] archive: improve support for running in a subdirectory
  2023-03-21 22:59                         ` Junio C Hamano
@ 2023-03-24 22:26                           ` René Scharfe
  0 siblings, 0 replies; 22+ messages in thread
From: René Scharfe @ 2023-03-24 22:26 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Cristian Le, Matthias Görgens

Am 21.03.23 um 23:59 schrieb Junio C Hamano:
> René Scharfe <l.s.r@web.de> writes:
>
>>  archive.c               | 71 +++++++++++++++++++++++++++++------------
>>  t/t5000-tar-tree.sh     | 13 ++++++++
>>  t/t5001-archive-attr.sh | 16 ++++++++++
>>  3 files changed, 79 insertions(+), 21 deletions(-)
>
> There are a handful of CI failures that can be seen at
>
>   https://github.com/git/git/actions/runs/4482588035/jobs/7880821225#step:6:1803
>   https://github.com/git/git/actions/runs/4482588035/jobs/7880821849#step:4:1811
>
> which is with this topic in 'seen'.  Exactly the same 'seen' without
> this topic seems to pass
>
>   https://github.com/git/git/actions/runs/4484290871

Uh, nasty.  The linux-asan run reveals a use of the return value of
relative_path() after manipulating the underlying buffer.  I hope
fixing that helps linux-musl as well.

René



^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v2] archive: improve support for running in subdirectory
  2023-03-20 20:02                       ` [PATCH] archive: improve support for running in a subdirectory René Scharfe
  2023-03-21 22:59                         ` Junio C Hamano
@ 2023-03-24 22:27                         ` René Scharfe
  2023-03-27 16:09                           ` Junio C Hamano
  1 sibling, 1 reply; 22+ messages in thread
From: René Scharfe @ 2023-03-24 22:27 UTC (permalink / raw)
  To: git; +Cc: Cristian Le, Matthias Görgens, Junio C Hamano

When git archive is started in a subdirectory, it archives its
corresponding tree and its child objects, only.  That is intended.  It
does that by effectively cd'ing into that tree and setting "prefix" to
the empty string.

This has unfortunate consequences, though: Attributes are anchored at
the root of the repository and git archive still applies them to
subtrees, causing mismatches.  And when checking pathspecs it cannot
tell the difference between one that doesn't match anthing or one that
matches some actual blob outside of the subdirectory, leading to a
confusing error message.

Fix that by keeping the "prefix" value and passing it to pathspec and
attribute functions, and shortening it using relative_path() for paths
written to the archive and (if --verbose is given) to stdout.

Still reject attempts to archive files outside the current directory,
but print a more specific error in that case.  Recognizing it requires a
full traversal of the subtree for each pathspec, however.  Allowing them
would be easier, but archive entry paths starting with "../" can be
problematic to extract -- e.g. bsdtar skips them by default.

Reported-by: Cristian Le <cristian.le@mpsd.mpg.de>
Reported-by: Matthias Görgens <matthias.goergens@gmail.com>
Signed-off-by: René Scharfe <l.s.r@web.de>
---
Interdiff:
  diff --git a/archive.c b/archive.c
  index c8d66169d1..a61615d180 100644
  --- a/archive.c
  +++ b/archive.c
  @@ -169,6 +169,7 @@ static int write_archive_entry(const struct object_id *oid, const char *base,
   	}

   	if (args->prefix) {
  +		static struct strbuf new_path = STRBUF_INIT;
   		static struct strbuf buf = STRBUF_INIT;
   		const char *rel;

  @@ -183,8 +184,11 @@ static int write_archive_entry(const struct object_id *oid, const char *base,
   		if (!strcmp(rel, "./") || starts_with(rel, "../"))
   			return S_ISDIR(mode) ? READ_TREE_RECURSIVE : 0;

  -		strbuf_setlen(&path, args->baselen);
  -		strbuf_addstr(&path, rel);
  +		/* rel can refer to path, so don't edit it in place */
  +		strbuf_reset(&new_path);
  +		strbuf_add(&new_path, args->base, args->baselen);
  +		strbuf_addstr(&new_path, rel);
  +		strbuf_swap(&path, &new_path);
   	}

   	if (args->verbose)

 archive.c               | 75 +++++++++++++++++++++++++++++------------
 t/t5000-tar-tree.sh     | 13 +++++++
 t/t5001-archive-attr.sh | 16 +++++++++
 3 files changed, 83 insertions(+), 21 deletions(-)

diff --git a/archive.c b/archive.c
index 1c2ca78e52..a61615d180 100644
--- a/archive.c
+++ b/archive.c
@@ -168,6 +168,29 @@ static int write_archive_entry(const struct object_id *oid, const char *base,
 		args->convert = check_attr_export_subst(check);
 	}

+	if (args->prefix) {
+		static struct strbuf new_path = STRBUF_INIT;
+		static struct strbuf buf = STRBUF_INIT;
+		const char *rel;
+
+		rel = relative_path(path_without_prefix, args->prefix, &buf);
+
+		/*
+		 * We don't add an entry for the current working
+		 * directory when we are at the root; skip it also when
+		 * we're in a subdirectory or submodule.  Skip entries
+		 * higher up as well.
+		 */
+		if (!strcmp(rel, "./") || starts_with(rel, "../"))
+			return S_ISDIR(mode) ? READ_TREE_RECURSIVE : 0;
+
+		/* rel can refer to path, so don't edit it in place */
+		strbuf_reset(&new_path);
+		strbuf_add(&new_path, args->base, args->baselen);
+		strbuf_addstr(&new_path, rel);
+		strbuf_swap(&path, &new_path);
+	}
+
 	if (args->verbose)
 		fprintf(stderr, "%.*s\n", (int)path.len, path.buf);

@@ -403,6 +426,27 @@ static int reject_entry(const struct object_id *oid UNUSED,
 	return ret;
 }

+static int reject_outside(const struct object_id *oid UNUSED,
+			  struct strbuf *base, const char *filename,
+			  unsigned mode, void *context)
+{
+	struct archiver_args *args = context;
+	struct strbuf buf = STRBUF_INIT;
+	struct strbuf path = STRBUF_INIT;
+	int ret = 0;
+
+	if (S_ISDIR(mode))
+		return READ_TREE_RECURSIVE;
+
+	strbuf_addbuf(&path, base);
+	strbuf_addstr(&path, filename);
+	if (starts_with(relative_path(path.buf, args->prefix, &buf), "../"))
+		ret = -1;
+	strbuf_release(&buf);
+	strbuf_release(&path);
+	return ret;
+}
+
 static int path_exists(struct archiver_args *args, const char *path)
 {
 	const char *paths[] = { path, NULL };
@@ -410,8 +454,13 @@ static int path_exists(struct archiver_args *args, const char *path)
 	int ret;

 	ctx.args = args;
-	parse_pathspec(&ctx.pathspec, 0, 0, "", paths);
+	parse_pathspec(&ctx.pathspec, 0, PATHSPEC_PREFER_CWD,
+		       args->prefix, paths);
 	ctx.pathspec.recursive = 1;
+	if (args->prefix && read_tree(args->repo, args->tree, &ctx.pathspec,
+				      reject_outside, args))
+		die(_("pathspec '%s' matches files outside the "
+		      "current directory"), path);
 	ret = read_tree(args->repo, args->tree,
 			&ctx.pathspec,
 			reject_entry, &ctx);
@@ -427,9 +476,8 @@ static void parse_pathspec_arg(const char **pathspec,
 	 * Also if pathspec patterns are dependent, we're in big
 	 * trouble as we test each one separately
 	 */
-	parse_pathspec(&ar_args->pathspec, 0,
-		       PATHSPEC_PREFER_FULL,
-		       "", pathspec);
+	parse_pathspec(&ar_args->pathspec, 0, PATHSPEC_PREFER_CWD,
+		       ar_args->prefix, pathspec);
 	ar_args->pathspec.recursive = 1;
 	if (pathspec) {
 		while (*pathspec) {
@@ -441,8 +489,7 @@ static void parse_pathspec_arg(const char **pathspec,
 }

 static void parse_treeish_arg(const char **argv,
-		struct archiver_args *ar_args, const char *prefix,
-		int remote)
+			      struct archiver_args *ar_args, int remote)
 {
 	const char *name = argv[0];
 	const struct object_id *commit_oid;
@@ -481,20 +528,6 @@ static void parse_treeish_arg(const char **argv,
 	if (!tree)
 		die(_("not a tree object: %s"), oid_to_hex(&oid));

-	if (prefix) {
-		struct object_id tree_oid;
-		unsigned short mode;
-		int err;
-
-		err = get_tree_entry(ar_args->repo,
-				     &tree->object.oid,
-				     prefix, &tree_oid,
-				     &mode);
-		if (err || !S_ISDIR(mode))
-			die(_("current working directory is untracked"));
-
-		tree = parse_tree_indirect(&tree_oid);
-	}
 	ar_args->refname = ref;
 	ar_args->tree = tree;
 	ar_args->commit_oid = commit_oid;
@@ -712,7 +745,7 @@ int write_archive(int argc, const char **argv, const char *prefix,
 		setup_git_directory();
 	}

-	parse_treeish_arg(argv, &args, prefix, remote);
+	parse_treeish_arg(argv, &args, remote);
 	parse_pathspec_arg(argv + 1, &args);

 	rc = ar->write_archive(ar, &args);
diff --git a/t/t5000-tar-tree.sh b/t/t5000-tar-tree.sh
index 918a2fc7c6..a60ae6145e 100755
--- a/t/t5000-tar-tree.sh
+++ b/t/t5000-tar-tree.sh
@@ -433,6 +433,19 @@ test_expect_success 'catch non-matching pathspec' '
 	test_must_fail git archive -v HEAD -- "*.abc" >/dev/null
 '

+test_expect_success 'reject paths outside the current directory' '
+	test_must_fail git -C a/bin archive HEAD .. >/dev/null 2>err &&
+	grep "outside the current directory" err
+'
+
+test_expect_success 'allow pathspecs that resolve to the current directory' '
+	git -C a/bin archive -v HEAD ../bin >/dev/null 2>actual &&
+	cat >expect <<-\EOF &&
+	sh
+	EOF
+	test_cmp expect actual
+'
+
 # Pull the size and date of each entry in a tarfile using the system tar.
 #
 # We'll pull out only the year from the date; that avoids any question of
diff --git a/t/t5001-archive-attr.sh b/t/t5001-archive-attr.sh
index 04d300eeda..0ff47a239d 100755
--- a/t/t5001-archive-attr.sh
+++ b/t/t5001-archive-attr.sh
@@ -33,6 +33,13 @@ test_expect_success 'setup' '
 	echo ignored-by-tree.d export-ignore >>.gitattributes &&
 	git add ignored-by-tree ignored-by-tree.d .gitattributes &&

+	mkdir subdir &&
+	>subdir/included &&
+	>subdir/ignored-by-subtree &&
+	>subdir/ignored-by-tree &&
+	echo ignored-by-subtree export-ignore >subdir/.gitattributes &&
+	git add subdir &&
+
 	echo ignored by worktree >ignored-by-worktree &&
 	echo ignored-by-worktree export-ignore >.gitattributes &&
 	git add ignored-by-worktree &&
@@ -93,6 +100,15 @@ test_expect_exists	archive-pathspec-wildcard/ignored-by-worktree
 test_expect_missing	archive-pathspec-wildcard/excluded-by-pathspec.d
 test_expect_missing	archive-pathspec-wildcard/excluded-by-pathspec.d/file

+test_expect_success 'git -C subdir archive' '
+	git -C subdir archive HEAD >archive-subdir.tar &&
+	extract_tar_to_dir archive-subdir
+'
+
+test_expect_exists	archive-subdir/included
+test_expect_missing	archive-subdir/ignored-by-subtree
+test_expect_missing	archive-subdir/ignored-by-tree
+
 test_expect_success 'git archive with worktree attributes' '
 	git archive --worktree-attributes HEAD >worktree.tar &&
 	(mkdir worktree && cd worktree && "$TAR" xf -) <worktree.tar
--
2.40.0


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH v2] archive: improve support for running in subdirectory
  2023-03-24 22:27                         ` [PATCH v2] archive: improve support for running in subdirectory René Scharfe
@ 2023-03-27 16:09                           ` Junio C Hamano
  0 siblings, 0 replies; 22+ messages in thread
From: Junio C Hamano @ 2023-03-27 16:09 UTC (permalink / raw)
  To: René Scharfe; +Cc: git, Cristian Le, Matthias Görgens

René Scharfe <l.s.r@web.de> writes:

>   +		/* rel can refer to path, so don't edit it in place */
>   +		strbuf_reset(&new_path);
>   +		strbuf_add(&new_path, args->base, args->baselen);
>   +		strbuf_addstr(&new_path, rel);
>   +		strbuf_swap(&path, &new_path);

Thanks.  That indeed was nasty and tricky.


^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2023-03-27 16:09 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-03 10:25 Bug in git archive + .gitattributes + relative path Cristian Le
2023-03-03 15:19 ` René Scharfe
2023-03-03 15:38   ` Cristian Le
2023-03-04 13:58     ` René Scharfe
2023-03-04 15:11       ` Cristian Le
2023-03-05  9:32         ` René Scharfe
2023-03-06 16:56       ` Junio C Hamano
2023-03-06 17:51         ` René Scharfe
2023-03-06 17:27       ` Junio C Hamano
2023-03-06 18:28         ` René Scharfe
2023-03-06 18:59           ` Junio C Hamano
2023-03-06 21:32             ` René Scharfe
2023-03-06 22:34               ` Junio C Hamano
2023-03-11 20:47                 ` René Scharfe
2023-03-12 21:25                   ` Junio C Hamano
2023-03-18 21:30                     ` René Scharfe
2023-03-20 16:16                       ` Junio C Hamano
2023-03-20 20:02                       ` [PATCH] archive: improve support for running in a subdirectory René Scharfe
2023-03-21 22:59                         ` Junio C Hamano
2023-03-24 22:26                           ` René Scharfe
2023-03-24 22:27                         ` [PATCH v2] archive: improve support for running in subdirectory René Scharfe
2023-03-27 16:09                           ` Junio C Hamano

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).