git@vger.kernel.org mailing list mirror (one of many)
From: "René Scharfe" <l.s.r@web.de>
To: Cristian Le <cristian.le@mpsd.mpg.de>, git@vger.kernel.org
Subject: Re: Bug in git archive + .gitattributes + relative path
Date: Sat, 4 Mar 2023 14:58:40 +0100
Message-ID: <70f10864-2cc7-cb9e-f868-2ac0011cad58@web.de>
In-Reply-To: <8d04019d-511f-0f99-42cc-d0b25720cd71@mpsd.mpg.de>

Am 03.03.23 um 16:38 schrieb Cristian Le:
>> In your issue #444 you write that "git archive HEAD" works, but
>> "git archive HEAD:./" doesn't. Why do you need to use the latter?
>
> Specifically we want to allow for `HEAD:./sub_dir` where `./sub_dir`
> contains `.gitattributes` and `.git_archive.txt`.
>
> Alternatively, it would be helpful if we can pass `--transform`
> commands of `tar` directly so that we can change the paths.
>
> Overall what we are doing in tito is that the source would be in
> `./src` and outside is metadata like `./my_package.spec`. We are
> using `git archive HEAD:./src --prefix=my_package-1.0.0` to pass the
> appropriate form that the rpm spec file can locate. In a tar command
> we can use `--transform=s|^src/|my_package-1.0.0/|` to achieve the
> equivalent.

What is Tito?  https://github.com/rpm-software-management/tito says:
"Tito is a tool for managing RPM based projects using git for their
source code repository."  It supports Git repositories containing
multiple projects.  I suppose that means that for Git's own repo, Tito
would allow creating a separate RPM file for a sub-project such as
git-gui.

Side note: Tito features include: "Create reliable tar.gz files with
consistent checksums from any tag."  That's achieved by compressing
using "gzip -n -c".  Avoiding the native tgz support of git archive --
probably only because the code predates it -- shields Tito from the
change to use our internal gzip implementation discussed recently in
https://lore.kernel.org/git/a812a664-67ea-c0ba-599f-cb79e2d96694@gmail.com/
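To illustrate why "gzip -n" helps with consistent checksums, here is a
minimal sketch.  The payload is a made-up stand-in for a tar stream; how
Tito actually wires up its pipeline is not shown here.  The -n option
tells gzip not to store the original name and timestamp in the header,
so compressing the same bytes twice should give the same output:

```shell
# Hypothetical payload; in practice this would be a tar stream.
payload='example payload'

# Compress the same input twice, one second apart, and checksum the results.
first=$(printf '%s\n' "$payload" | gzip -n -c | cksum)
sleep 1
second=$(printf '%s\n' "$payload" | gzip -n -c | cksum)

if [ "$first" = "$second" ]; then
    echo "compressed output is identical"
else
    echo "compressed output differs"
fi
```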

Note, however, that the tar output of git archive is not guaranteed to
be stable between Git versions, either.  Adding such a stable format
was proposed recently in
https://lore.kernel.org/git/20230205221728.4179674-1-sandals@crustytoothpaste.net/

The code for calling git archive with a tree was present in Tito's
initial commit, which says that it was taken from Spacewalk:
https://github.com/rpm-software-management/tito/commit/e87345d7b7.
There it was introduced along with a script that changes the mtime
of archive entries from the current time to the commit timestamp by
https://github.com/spacewalkproject/spacewalk/commit/34267e39d472.

I don't fully understand the explanation in its commit message
("make it possible to call make srpm even if the directory of the
package has changed"); perhaps it requires more domain knowledge.
But I can understand the need for archiving sub-directories in the
context of supporting multi-project repositories.

> However we cannot use the `tar` directly because that would affect
> the timestamps and permissions of the file that are set by `git
> archive`.

GNU tar has the options --mode and --mtime to choose the permissions
and modification times of files added to an archive.
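A rough sketch of how these two options could be used, assuming GNU tar
is installed as "tar".  The directory, file name, mode and date below
are made up for the example:

```shell
# Assumes GNU tar; all names and values here are illustrative only.
work=$(mktemp -d)
mkdir "$work/src"
printf 'int main(void) { return 0; }\n' >"$work/src/main.c"
chmod 600 "$work/src/main.c"

# Force mode 0644 and a fixed modification time on every entry added.
tar -C "$work" --mode=0644 --mtime='2023-03-04 00:00:00 UTC' \
    -cf "$work/src.tar" src/main.c

# List the archive in UTC to check the recorded mode and timestamp.
listing=$(TZ=UTC tar -tvf "$work/src.tar")
printf '%s\n' "$listing"
rm -rf "$work"
```

The file on disk keeps its own mode and mtime; only the archive entry
is affected.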

git archive is going to get an --mtime option as well in the next
release, by the way.

> So allowing for something like `git archive HEAD
> --transform=s|^src/|my_package-1.0.0/|`, where the transform is done
> after `.gitattributes` is performed would solve this issue.

GNU tar has this --transform option, bsdtar similarly has -s.  Both
also have --strip-components (GNU tar only for extraction, though),
which is a bit simpler and should suffice for your use case.
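To make the intended semantics of component stripping concrete, here is
a small sketch in plain shell.  The helper strip_components is
hypothetical and not part of Git or tar.  It only mimics the rule used
by the patch below for file entries: remove the given number of leading
path elements, and skip paths that have too few of them.

```shell
# Hypothetical helper: print the path with the first n leading elements
# removed, or fail without output if the path has fewer than n of them.
# Note that n and path are ordinary (global) shell variables.
strip_components() {
    n=$1
    path=$2
    while [ "$n" -gt 0 ]; do
        case $path in
        */*) path=${path#*/} ;;   # drop everything up to the first slash
        *) return 1 ;;            # too few elements: skip this entry
        esac
        n=$((n - 1))
    done
    printf '%s\n' "$path"
}

# Example paths are made up for illustration.
for p in README sha1dc/sha1.c sha1dc/lib/ubc_check.c; do
    strip_components 1 "$p" || echo "(skipped: $p)"
done
```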

--- >8 ---
Subject: [PATCH] archive: add --strip-components

Allow removing leading elements from paths of archive entries.  That's
useful when archiving sub-directories and not wanting to keep the
common path prefix, e.g.:

   $ git archive --strip-components=1 HEAD sha1dc | tar tf -
   .gitattributes
   LICENSE.txt
   sha1.c
   sha1.h
   ubc_check.c
   ubc_check.h

The same can be achieved by specifying a tree instead of a commit and
a pathspec:

   $ git archive HEAD:sha1dc | tar tf -
   .gitattributes
   LICENSE.txt
   sha1.c
   sha1.h
   ubc_check.c
   ubc_check.h

However, this doesn't support the export-subst attribute, doesn't
include the commit hash as an archive comment and uses the current time
instead of the commit date as mtime for archive entries.

The new option is adapted from bsdtar.  GNU tar provides it as well, but
only for extraction.

The new option does not affect the paths of entries added by --add-file
and --add-virtual-file because they are handcrafted to their desired
values already.  Similarly, the value of --prefix is not subject to
component stripping.

Signed-off-by: René Scharfe <l.s.r@web.de>
---
 Documentation/git-archive.txt |  6 ++++++
 archive.c                     | 16 ++++++++++++++++
 archive.h                     |  1 +
 t/t5000-tar-tree.sh           | 13 +++++++++++++
 4 files changed, 36 insertions(+)

diff --git a/Documentation/git-archive.txt b/Documentation/git-archive.txt
index 6bab201d37..5dad917e7b 100644
--- a/Documentation/git-archive.txt
+++ b/Documentation/git-archive.txt
@@ -55,6 +55,12 @@ OPTIONS
 	rightmost value is used for all tracked files.  See below which
 	value gets used by `--add-file` and `--add-virtual-file`.

+--strip-components=<n>::
+	Remove the specified number of leading path elements.  Pathnames
+	with fewer elements will be silently skipped.  Does not affect
+	the prefix added by `--prefix`, nor entries added with
+	`--add-file` or `--add-virtual-file`.
+
 -o <file>::
 --output=<file>::
 	Write the archive to <file> instead of stdout.
diff --git a/archive.c b/archive.c
index 9aeaf2bd87..8308d4d9c4 100644
--- a/archive.c
+++ b/archive.c
@@ -166,6 +166,18 @@ static int write_archive_entry(const struct object_id *oid, const char *base,
 		args->convert = check_attr_export_subst(check);
 	}

+	if (args->strip_components > 0) {
+		size_t orig_baselen = baselen;
+		for (int i = 0; i < args->strip_components; i++) {
+			const char *slash = memchr(base, '/', baselen);
+			if (!slash)
+				return S_ISDIR(mode) ? READ_TREE_RECURSIVE : 0;
+			baselen -= slash - base + 1;
+			base = slash + 1;
+		}
+		strbuf_remove(&path, args->baselen, orig_baselen - baselen);
+	}
+
 	if (args->verbose)
 		fprintf(stderr, "%.*s\n", (int)path.len, path.buf);

@@ -593,12 +605,15 @@ static int parse_archive_args(int argc, const char **argv,
 	int verbose = 0;
 	int i;
 	int list = 0;
+	int strip_components = 0;
 	int worktree_attributes = 0;
 	struct option opts[] = {
 		OPT_GROUP(""),
 		OPT_STRING(0, "format", &format, N_("fmt"), N_("archive format")),
 		OPT_STRING(0, "prefix", &base, N_("prefix"),
 			N_("prepend prefix to each pathname in the archive")),
+		OPT_INTEGER(0, "strip-components", &strip_components,
+			N_("remove leading path elements")),
 		{ OPTION_CALLBACK, 0, "add-file", args, N_("file"),
 		  N_("add untracked file to archive"), 0, add_file_cb,
 		  (intptr_t)&base },
@@ -675,6 +690,7 @@ static int parse_archive_args(int argc, const char **argv,
 	args->baselen = strlen(base);
 	args->worktree_attributes = worktree_attributes;
 	args->mtime_option = mtime_option;
+	args->strip_components = strip_components;

 	return argc;
 }
diff --git a/archive.h b/archive.h
index 7178e2a9a2..e9becbd57d 100644
--- a/archive.h
+++ b/archive.h
@@ -23,6 +23,7 @@ struct archiver_args {
 	unsigned int worktree_attributes : 1;
 	unsigned int convert : 1;
 	int compression_level;
+	int strip_components;
 	struct string_list extra_files;
 	struct pretty_print_context *pretty_ctx;
 };
diff --git a/t/t5000-tar-tree.sh b/t/t5000-tar-tree.sh
index 918a2fc7c6..629d2e78d7 100755
--- a/t/t5000-tar-tree.sh
+++ b/t/t5000-tar-tree.sh
@@ -271,6 +271,19 @@ test_expect_success 'git get-tar-commit-id' '
 	test_cmp expect actual
 '

+test_expect_success 'git archive --strip-components' '
+	git archive --strip-components=3 HEAD >strip3.tar &&
+	(
+		mkdir strip3 &&
+		cd strip3 &&
+		"$TAR" xf ../strip3.tar &&
+		find . | grep -v "^\.\$" | sort >../strip3.lst
+	) &&
+	sed -ne "s-\([^/]*/\)\{3\}-./-p" a.lst >expect &&
+	test_cmp expect strip3.lst &&
+	diff -r a/long_path_to_a_file/long_path_to_a_file strip3
+'
+
 test_expect_success 'git archive with --output, override inferred format' '
 	git archive --format=tar --output=d4.zip HEAD &&
 	test_cmp_bin b.tar d4.zip
--
2.39.2
