git@vger.kernel.org mailing list mirror (one of many)
From: "Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: "René Scharfe" <l.s.r@web.de>,
	"brian m. carlson" <sandals@crustytoothpaste.net>,
	"Junio C Hamano" <gitster@pobox.com>,
	"Johannes Schindelin" <johannes.schindelin@gmx.de>
Subject: [PATCH v2] import-tars: ignore the global PAX header
Date: Tue, 24 Mar 2020 19:35:45 +0000
Message-ID: <pull.577.v2.git.1585078545448.gitgitgadget@gmail.com>
In-Reply-To: <pull.577.git.1584968924555.gitgitgadget@gmail.com>

From: Johannes Schindelin <johannes.schindelin@gmx.de>

The tar importer in `contrib/fast-import/import-tars.perl` has a very
convenient feature: if _all_ paths stored in the imported `.tar` start
with a common prefix, e.g. `git-2.26.0/` in the tar at
https://github.com/git/git/archive/v2.26.0.tar.gz, then this prefix is
stripped.

This feature makes a ton of sense because it is relatively common to
import two or more revisions of the same project into Git, and obviously
we don't want all files to live in a tree whose name changes from
revision to revision.

Now, the problem with that feature is that it breaks down if there is a
`pax_global_header` "file" located outside of said prefix, at the top of
the tree. This is the case for `.tar` files generated by Git's very own
`git archive` command: it inserts that header, and `git archive` allows
specifying a common prefix (that the header does _not_ share with the
other files contained in the archive) via `--prefix=my-project-1.0.0/`.

Let's just skip any global header when importing `.tar` files into Git.

Note: this global header might contain useful information. For example,
in the output of `git archive`, it lists the original commit, which _is_
useful information. A future improvement to the `import-tars.perl`
script might be to include that information in the commit message, or do
other things with the information (e.g. use `mtime` information
contained in the global header as date of the commit). This patch does
not prevent any future patch from making that happen, it only prevents
the header from being treated as if it was a regular file.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
    Ignore the global PAX header in import-tars.perl
    
    This problem came up in Pacman-related work, where PKGBUILD definitions
    would reference the tarballs downloaded from GitHub, and patches would
    be applied on top. To work on those patches efficiently (e.g. when an
    upgrade to a new version of the project no longer lets those patches
    apply), I need to be able to import those tarballs into playground
    worktrees and work on them. I like to use 
    contrib/fast-import/import-tars.perl for that purpose, but it really
    needs to strip the prefix, otherwise it is too tedious to work with it.
    
    Changes since v1:
    
     * Mentioned the implicit prefix-stripping feature of import-tars.perl
       in the commit message; without that context, it is really hard to
       understand the motivation for this patch.
     * Clarified in the commit message that this patch does not prevent any
       future patches that would use the information contained in the global
       header.

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-577%2Fdscho%2Fimport-tars-skip-pax-header-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-577/dscho/import-tars-skip-pax-header-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/577

Range-diff vs v1:

 1:  718bde8f4a7 ! 1:  842dabe6128 import-tars: ignore the global PAX header
     @@ -2,12 +2,34 @@
      
          import-tars: ignore the global PAX header
      
     -    Git's own `git archive` inserts that header, but it often gets into the
     -    way of `import-tars.perl` e.g. when a prefix was specified (for example
     -    via `--prefix=my-project-1.0.0/`, or when downloading a `.tar.gz` from
     -    GitHub releases): this prefix _should_ be stripped.
     +    The tar importer in `contrib/fast-import/import-tars.perl` has a very
     +    convenient feature: if _all_ paths stored in the imported `.tar` start
     +    with a common prefix, e.g. `git-2.26.0/` in the tar at
     +    https://github.com/git/git/archive/v2.26.0.tar.gz, then this prefix is
     +    stripped.
      
     -    Let's just skip it.
     +    This feature makes a ton of sense because it is relatively common to
     +    import two or more revisions of the same project into Git, and obviously
     +    we don't want all files to live in a tree whose name changes from
     +    revision to revision.
     +
     +    Now, the problem with that feature is that it breaks down if there is a
     +    `pax_global_header` "file" located outside of said prefix, at the top of
     +    the tree. This is the case for `.tar` files generated by Git's very own
     +    `git archive` command: it inserts that header, and `git archive` allows
     +    specifying a common prefix (that the header does _not_ share with the
     +    other files contained in the archive) via `--prefix=my-project-1.0.0/`.
     +
     +    Let's just skip any global header when importing `.tar` files into Git.
     +
     +    Note: this global header might contain useful information. For example,
     +    in the output of `git archive`, it lists the original commit, which _is_
     +    useful information. A future improvement to the `import-tars.perl`
     +    script might be to include that information in the commit message, or do
     +    other things with the information (e.g. use `mtime` information
     +    contained in the global header as date of the commit). This patch does
     +    not prevent any future patch from making that happen, it only prevents
     +    the header from being treated as if it was a regular file.
      
          Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
      


 contrib/fast-import/import-tars.perl | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/contrib/fast-import/import-tars.perl b/contrib/fast-import/import-tars.perl
index e800d9f5c9c..d50ce26d5d9 100755
--- a/contrib/fast-import/import-tars.perl
+++ b/contrib/fast-import/import-tars.perl
@@ -139,6 +139,8 @@
 			print FI "\n";
 		}
 
+		next if ($typeflag eq 'g'); # ignore global header
+
 		my $path;
 		if ($prefix) {
 			$path = "$prefix/$name";

base-commit: b4374e96c84ed9394fed363973eb540da308ed4f
-- 
gitgitgadget
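
To illustrate the effect of the one-line change above, here is a minimal
sketch in Python (not the Perl of `import-tars.perl`, and not its actual
algorithm). It assumes plain ustar-style 512-byte headers with short names,
and it only looks at the first path component when computing the shared
prefix. The helper names `iter_headers`, `member_names`, `common_prefix` and
`sample_archive` are hypothetical. The sample archive is written with
Python's `tarfile` module, whose global header may carry a different name
than the `pax_global_header` written by `git archive`; what matters for the
sketch is the type flag `g`.

```python
import io
import tarfile

BLOCK = 512  # tar headers and payload padding use 512-byte blocks


def iter_headers(data):
    """Yield (name, typeflag, size) for each header block in raw tar bytes."""
    pos = 0
    while pos + BLOCK <= len(data):
        header = data[pos:pos + BLOCK]
        if header == bytes(BLOCK):  # an all-zero block ends the archive
            break
        name = header[0:100].split(b"\0", 1)[0].decode("utf-8")
        size_field = header[124:136].split(b"\0", 1)[0].strip()
        size = int(size_field or b"0", 8)  # size is stored as octal text
        typeflag = chr(header[156])
        yield name, typeflag, size
        # Skip the header plus the payload, rounded up to whole blocks.
        pos += BLOCK + (size + BLOCK - 1) // BLOCK * BLOCK


def member_names(data, skip_global=True):
    """List entry names, optionally ignoring global PAX headers ('g')."""
    return [name for name, typeflag, _ in iter_headers(data)
            if not (skip_global and typeflag == "g")]


def common_prefix(names):
    """Return the first path component shared by all names, else ''."""
    heads = set()
    for name in names:
        head, sep, _ = name.partition("/")
        if not sep:
            return ""  # a top-level entry means nothing is shared
        heads.add(head)
    return heads.pop() if len(heads) == 1 else ""


def sample_archive():
    """Build an in-memory tar with a global PAX header and a common prefix."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT,
                      pax_headers={"comment": "0" * 40}) as tar:
        for path in ("my-project-1.0.0/README", "my-project-1.0.0/src/main.c"):
            payload = b"hello\n"
            info = tarfile.TarInfo(path)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


if __name__ == "__main__":
    data = sample_archive()
    print(repr(common_prefix(member_names(data, skip_global=False))))
    print(repr(common_prefix(member_names(data))))
```

With the global header counted as an entry, the sketch should find no shared
prefix; with it skipped, the shared prefix should be `my-project-1.0.0`, which
is the situation the patch restores for archives made by `git archive
--prefix=my-project-1.0.0/`.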

