git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 1/3] Add support for new %w wildcard in checkout filter
@ 2021-09-06 18:10 Calum McConnell
  2021-09-06 18:10 ` [PATCH 2/3] Die if filter is attempted without a worktree Calum McConnell
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Calum McConnell @ 2021-09-06 18:10 UTC (permalink / raw)
  To: git; +Cc: Calum McConnell

When building content filters with gitattributes, for instance to ensure
git stores the plain-text rather than the binary form of data for certain
formats, it is often advantageous to separate the filters into separate,
potentially complex scripts.  However, as the $PWD where content filters
are executed is unspecified the path to scripts needs to be specified as
an absolute path.  That means that the guide for setting up a repository
which uses scripts to filter content cannot simply consist of "include
the following lines in your .git/config file", and it means that the
otherwise safe operation of moving a git repository from one folder to
another is decidedly unsafe.

This %w (short for 'work tree') will allow such scripts to exist and
be executed on each checkout, without needing to be added to the PATH
or be dependent upon the $PWD of the checkout call.

Signed-off-by: Calum McConnell <calumlikesapplepie@gmail.com>
---
 convert.c | 20 ++++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/convert.c b/convert.c
index 0d6fb3410a..5d64ccce57 100644
--- a/convert.c
+++ b/convert.c
@@ -9,6 +9,7 @@
 #include "sub-process.h"
 #include "utf8.h"
 #include "ll-merge.h"
+#include "repository.h"
 
 /*
  * convert.c - convert a file when checking it out and checking it in.
@@ -630,19 +631,26 @@ static int filter_buffer_or_fd(int in, int out, void *data)
 
 	/* apply % substitution to cmd */
 	struct strbuf cmd = STRBUF_INIT;
-	struct strbuf path = STRBUF_INIT;
+	struct strbuf filePath = STRBUF_INIT;
+	struct strbuf worktreePath = STRBUF_INIT;
 	struct strbuf_expand_dict_entry dict[] = {
 		{ "f", NULL, },
+		{ "w", NULL, }, 
 		{ NULL, NULL, },
 	};
 
-	/* quote the path to preserve spaces, etc. */
-	sq_quote_buf(&path, params->path);
-	dict[0].value = path.buf;
+	/* quote the paths to preserve spaces, etc. */
+	sq_quote_buf(&filePath, params->path);
+	dict[0].value = filePath.buf;
+	
+	sq_quote_buf(&worktreePath, the_repository->worktree);
+	dict[1].value = worktreePath.buf;
 
-	/* expand all %f with the quoted path */
+	/* expand all %f or %w with the quoted path */
 	strbuf_expand(&cmd, params->cmd, strbuf_expand_dict_cb, &dict);
-	strbuf_release(&path);
+	strbuf_release(&filePath);
+  	strbuf_release(&worktreePath);
+
 
 	strvec_push(&child_process.args, cmd.buf);
 	child_process.use_shell = 1;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 2/3] Die if filter is attempted without a worktree
  2021-09-06 18:10 [PATCH 1/3] Add support for new %w wildcard in checkout filter Calum McConnell
@ 2021-09-06 18:10 ` Calum McConnell
  2021-09-06 22:09   ` Ævar Arnfjörð Bjarmason
  2021-09-07  8:18   ` Bagas Sanjaya
  2021-09-06 18:10 ` [PATCH 3/3] Document the new gitattributes change Calum McConnell
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 8+ messages in thread
From: Calum McConnell @ 2021-09-06 18:10 UTC (permalink / raw)
  To: git; +Cc: Calum McConnell

As far as I know, this isn't possible.  Rather than add a bunch of
code to workarround something that might not be possible, lets just
halt and catch fire if it does.  This might need to be removed before
the change goes into master

Signed-off-by: Calum McConnell <calumlikesapplepie@gmail.com>
---
 convert.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/convert.c b/convert.c
index 5d64ccce57..df70c250b0 100644
--- a/convert.c
+++ b/convert.c
@@ -646,6 +646,11 @@ static int filter_buffer_or_fd(int in, int out, void *data)
 	sq_quote_buf(&worktreePath, the_repository->worktree);
 	dict[1].value = worktreePath.buf;
 
+	/* The results of a nonexistent worktree could be... weird.  Lets avoid*/
+	if(dict[1].value == NULL){
+		BUG("There is no worktree for this worktree substitution");
+	}
+
 	/* expand all %f or %w with the quoted path */
 	strbuf_expand(&cmd, params->cmd, strbuf_expand_dict_cb, &dict);
 	strbuf_release(&filePath);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 3/3] Document the new gitattributes change
  2021-09-06 18:10 [PATCH 1/3] Add support for new %w wildcard in checkout filter Calum McConnell
  2021-09-06 18:10 ` [PATCH 2/3] Die if filter is attempted without a worktree Calum McConnell
@ 2021-09-06 18:10 ` Calum McConnell
  2021-09-07 16:33 ` [PATCH 1/3] Add support for new %w wildcard in checkout filter Jeff King
  2021-09-07 20:28 ` Junio C Hamano
  3 siblings, 0 replies; 8+ messages in thread
From: Calum McConnell @ 2021-09-06 18:10 UTC (permalink / raw)
  To: git; +Cc: Calum McConnell

The warning I included might make this feature hard-to-use: but
people are clever, and there are ways around it, which also
grant other useful properties to smudge- and clean- filters.

For instance, if you add at the beginning of each file you filter,
you add GITTATTRIBUTES_FILTERED_VERSION_1, that grants the "no
double clean" property and allows for versioning of the script.

To do this perfectly, you'd want to let the user declare the
script file, and then have git figure out what version of it
to run.  But that would take an understanding of the conversion
code which I lack, and would also be a lot of a work for a
little-used addition to an underused feature.

Signed-off-by: Calum McConnell <calumlikesapplepie@gmail.com>
---
 Documentation/gitattributes.txt | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index 83fd4e19a4..33b0087d4f 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -486,6 +486,19 @@ not exist, or may have different contents. So, smudge and clean commands
 should not try to access the file on disk, but only act as filters on the
 content provided to them on standard input.
 
+Sequence "%w" on the filter command line is replaced with the absolute path
+to the root of the repository working tree.  This is useful if you have
+complex scripts specific to your project.  Note that these scripts should
+be mostly static for the life of the project, since the script version that 
+is run may significantly predate or follow the version of the file being
+processed
+
+------------------------
+[filter "unzip-file"]
+	clean = %w/scripts/filter-unzip-clean
+	smudge = %w/scripts/filter-unzip-smudge
+------------------------
+
 Long Running Filter Process
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH 2/3] Die if filter is attempted without a worktree
  2021-09-06 18:10 ` [PATCH 2/3] Die if filter is attempted without a worktree Calum McConnell
@ 2021-09-06 22:09   ` Ævar Arnfjörð Bjarmason
  2021-09-07 14:56     ` Calum McConnell
  2021-09-07  8:18   ` Bagas Sanjaya
  1 sibling, 1 reply; 8+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-06 22:09 UTC (permalink / raw)
  To: Calum McConnell; +Cc: git


On Mon, Sep 06 2021, Calum McConnell wrote:

> As far as I know, this isn't possible.  Rather than add a bunch of
> code to workarround something that might not be possible, lets just
> halt and catch fire if it does.  This might need to be removed before
> the change goes into master
>
> Signed-off-by: Calum McConnell <calumlikesapplepie@gmail.com>
> ---
>  convert.c | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/convert.c b/convert.c
> index 5d64ccce57..df70c250b0 100644
> --- a/convert.c
> +++ b/convert.c
> @@ -646,6 +646,11 @@ static int filter_buffer_or_fd(int in, int out, void *data)
>  	sq_quote_buf(&worktreePath, the_repository->worktree);
>  	dict[1].value = worktreePath.buf;
>  
> +	/* The results of a nonexistent worktree could be... weird.  Lets avoid*/
> +	if(dict[1].value == NULL){
> +		BUG("There is no worktree for this worktree substitution");
> +	}

This BUG() addition is itself buggy, elsewhere e.g. in builtin/gc.c you
can see where we have conditions like:

    the_repository->worktree ? the_repository->worktree : the_repository->gitdir;

I'm not bothering much with the greater context here, but if we suppose
that we have a case where worktreePath.buf is NULL, then
the_repository->worktree surely must have been NULL, and if you check
what sq_quote_buf() does, you'll see:

    void sq_quote_buf(struct strbuf *dst, const char *src)
    [...]
            while (*src) {

I.e. we'd segfault anyway if that "src" were to be NULL.

Even if that weren't the case then that's not the same as the
worktreePath.buf being NULL, which even if we suppose sq_quote_buf()
won't segfault and just returned won't AFAICT ever be the case, see the
comment for strbuf_slopbuf in strbuf.c. So I think that even if you
somehow reached this with a NULL worktree that BUG() won't ever be
reached.

I think this can probably just be dropped, to the extent that we need
some check like this it seems like it should happen a lot earlier in
convert.c than here, i.e. during the early setup can't we detect & abort
if we don't have a required worktree?


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 2/3] Die if filter is attempted without a worktree
  2021-09-06 18:10 ` [PATCH 2/3] Die if filter is attempted without a worktree Calum McConnell
  2021-09-06 22:09   ` Ævar Arnfjörð Bjarmason
@ 2021-09-07  8:18   ` Bagas Sanjaya
  1 sibling, 0 replies; 8+ messages in thread
From: Bagas Sanjaya @ 2021-09-07  8:18 UTC (permalink / raw)
  To: Calum McConnell, git

On 07/09/21 01.10, Calum McConnell wrote:
> +	/* The results of a nonexistent worktree could be... weird.  Lets avoid*/
> +	if(dict[1].value == NULL){
> +		BUG("There is no worktree for this worktree substitution");
> +	}
> +

Why don't simply print that error message without BUG() (aka using 
die(_("message"))? It can be l10n-ed if you using the approach.

-- 
An old man doll... just what I always wanted! - Clara

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 2/3] Die if filter is attempted without a worktree
  2021-09-06 22:09   ` Ævar Arnfjörð Bjarmason
@ 2021-09-07 14:56     ` Calum McConnell
  0 siblings, 0 replies; 8+ messages in thread
From: Calum McConnell @ 2021-09-07 14:56 UTC (permalink / raw)
  To: git@vger.kernel.org

[-- Attachment #1: Type: text/plain, Size: 3272 bytes --]

On Tue, 2021-09-07 at 00:09 +0200, Ævar Arnfjörð Bjarmason wrote:
> 
> On Mon, Sep 06 2021, Calum McConnell wrote:
> 
> > As far as I know, this isn't possible.  Rather than add a bunch of
> > code to workarround something that might not be possible, lets just
> > halt and catch fire if it does.  This might need to be removed before
> > the change goes into master
> > 
> > Signed-off-by: Calum McConnell <calumlikesapplepie@gmail.com>
> > ---
> >  convert.c | 5 +++++
> >  1 file changed, 5 insertions(+)
> > 
> > diff --git a/convert.c b/convert.c
> > index 5d64ccce57..df70c250b0 100644
> > --- a/convert.c
> > +++ b/convert.c
> > @@ -646,6 +646,11 @@ static int filter_buffer_or_fd(int in, int out,
> > void *data)
> >         sq_quote_buf(&worktreePath, the_repository->worktree);
> >         dict[1].value = worktreePath.buf;
> >  
> > +       /* The results of a nonexistent worktree could be... weird. 
> > Lets avoid*/
> > +       if(dict[1].value == NULL){
> > +               BUG("There is no worktree for this worktree
> > substitution");
> > +       }
> 
> This BUG() addition is itself buggy, elsewhere e.g. in builtin/gc.c you
> can see where we have conditions like:
> 
>     the_repository->worktree ? the_repository->worktree :
> the_repository->gitdir;
> 
> I'm not bothering much with the greater context here, but if we suppose
> that we have a case where worktreePath.buf is NULL, then
> the_repository->worktree surely must have been NULL, and if you check
> what sq_quote_buf() does, you'll see:
> 
>     void sq_quote_buf(struct strbuf *dst, const char *src)
>     [...]
>             while (*src) {
> 
> I.e. we'd segfault anyway if that "src" were to be NULL.
> 
> Even if that weren't the case then that's not the same as the
> worktreePath.buf being NULL, which even if we suppose sq_quote_buf()
> won't segfault and just returned won't AFAICT ever be the case, see the
> comment for strbuf_slopbuf in strbuf.c. So I think that even if you
> somehow reached this with a NULL worktree that BUG() won't ever be
> reached.
> 
> I think this can probably just be dropped, to the extent that we need
> some check like this it seems like it should happen a lot earlier in
> convert.c than here, i.e. during the early setup can't we detect & abort
> if we don't have a required worktree?

Part of the reason I inserted this check was because I wasn't sure about
the ordering on a checkout into an empty directory (eg, would there
actually be the filter script when it ran?).  On deeper thought, however,
that problem wouldn't even be solved by this.  Not to mention how you
would need to include the script on the initial commit, and such.

Since the check doesn't even do what I thought it would do, I thought of a
better approach: rather than having it expand to the working tree, it
expands to the git directory.  This means that you can place your scripts
in a much better location: since .git/config must be modified anyways to
use a filter, it doesn't entail a loss of features.

Why I ever thought to use the worktree is beyond me. Patch V2 coming in a
few days.

Calum McConnell


[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/3] Add support for new %w wildcard in checkout filter
  2021-09-06 18:10 [PATCH 1/3] Add support for new %w wildcard in checkout filter Calum McConnell
  2021-09-06 18:10 ` [PATCH 2/3] Die if filter is attempted without a worktree Calum McConnell
  2021-09-06 18:10 ` [PATCH 3/3] Document the new gitattributes change Calum McConnell
@ 2021-09-07 16:33 ` Jeff King
  2021-09-07 20:28 ` Junio C Hamano
  3 siblings, 0 replies; 8+ messages in thread
From: Jeff King @ 2021-09-07 16:33 UTC (permalink / raw)
  To: Calum McConnell; +Cc: git

On Mon, Sep 06, 2021 at 02:10:00PM -0400, Calum McConnell wrote:

> When building content filters with gitattributes, for instance to ensure
> git stores the plain-text rather than the binary form of data for certain
> formats, it is often advantageous to separate the filters into separate,
> potentially complex scripts.  However, as the $PWD where content filters
> are executed is unspecified the path to scripts needs to be specified as
> an absolute path.  That means that the guide for setting up a repository
> which uses scripts to filter content cannot simply consist of "include
> the following lines in your .git/config file", and it means that the
> otherwise safe operation of moving a git repository from one folder to
> another is decidedly unsafe.
> 
> This %w (short for 'work tree') will allow such scripts to exist and
> be executed on each checkout, without needing to be added to the PATH
> or be dependent upon the $PWD of the checkout call.

I sympathize with your goal here, but I have two high-level thoughts.

One is regarding security.

I was worried a bit that this might provide an opportunity for untrusted
repositories to trigger scripts from within their working trees without
the user having a chance to OK them. But the "%w" here would be used in
the config that the user must manually specify. So that gives them an
opportunity to investigate it as much as they want.

But it does mean we're pointing to code in the working tree that could
change after a git-pull, etc. You _can_ be careful about that (e.g., by
fetching and looking at the results, and then merging only if OK), but
the "use from working tree by default" model makes it easy to screw up
(especially if you put it into your ~/.gitconfig, and then it's used
with every repository, trusted or not).

When discussing similar features (like, say, taking project-level config
after making sure it's OK), our general thinking has been that we should
make sure Git can access some non-worktree location, and make it easy for
users to copy the data to it.

So in its most awkward state, that's asking people to copy the script
somewhere in their PATH (and then update it as needed, after verifying
it each time). That's not _too_ bad as instructions go, but gets hard if
your audience doesn't have a consistent PATH set up.

I wonder if we could do better with one or both of:

  - some way of specifying the repository directory in a config option.
    I think you can do something like that now with:

      mkdir -p .git/scripts
      cp myfilter .git/scripts/foo-filter
      git config filter.foo.clean '$(git rev-parse --git-dir)/scripts/foo-filter'

    which is admittedly a bit awkward. Some kind of syntactic candy
    could help there (having Git recognize "$GIT_SCRIPTS/foo-filter" or
    something).

  - some way of specifying an in-repo blob via config. I think right now
    you could do something like:

      git config filter.foo.clean '
        tmp=$(mktemp -t foo-filter) &&
	trap "rm -f \"$tmp\"" 0 &&
	git cat-file 1234abcd >"$tmp" &&
	"$tmp"
      '

    That's obviously even more horrible, but if we had some kind of
    syntactic sugar, you could set the config to "$oid:1234abcd" or
    something.

    That saves you from the awkward "mkdir/cp" in the first example, but
    gets around the security implications because of the immutability of
    Git objects.

My second thought was that this is a more general problem. It's one for
config itself, and for other script programs you might point to via
config. So it seems like we should be able to come up with a solution
that is more general than just clean/smudge filters.

-Peff

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/3] Add support for new %w wildcard in checkout filter
  2021-09-06 18:10 [PATCH 1/3] Add support for new %w wildcard in checkout filter Calum McConnell
                   ` (2 preceding siblings ...)
  2021-09-07 16:33 ` [PATCH 1/3] Add support for new %w wildcard in checkout filter Jeff King
@ 2021-09-07 20:28 ` Junio C Hamano
  3 siblings, 0 replies; 8+ messages in thread
From: Junio C Hamano @ 2021-09-07 20:28 UTC (permalink / raw)
  To: Calum McConnell; +Cc: git

> Subject: [PATCH 1/3] Add support for new %w wildcard in checkout filter

Please write your title to help readers of "git shortlog --no-merges".
With the above, readers would not know where the %w would newly be
allowed to appear, what it does, and how it helps what use case.

> When building content filters with gitattributes, for instance to ensure
> git stores the plain-text rather than the binary form of data for certain
> formats, it is often advantageous to separate the filters into separate,
> potentially complex scripts.  However, as the $PWD where content filters
> are executed is unspecified the path to scripts needs to be specified as
> an absolute path.

Isn't $PWD at least stable in a repository, relative to the top of
worktree or something?  IOW, isn't the above raise a separate
documentation issue that is better solved without any new code?

> That means that the guide for setting up a repository
> which uses scripts to filter content cannot simply consist of "include
> the following lines in your .git/config file", and it means that the
> otherwise safe operation of moving a git repository from one folder to
> another is decidedly unsafe.

If these paths are given as absolute paths (e.g. ~/filter-scripts/),
the repositories that refer to these can be moved freely, as long as
the location of the the directory that holds these auxiliary scripts
is kept stable, no?

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2021-09-07 20:28 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-06 18:10 [PATCH 1/3] Add support for new %w wildcard in checkout filter Calum McConnell
2021-09-06 18:10 ` [PATCH 2/3] Die if filter is attempted without a worktree Calum McConnell
2021-09-06 22:09   ` Ævar Arnfjörð Bjarmason
2021-09-07 14:56     ` Calum McConnell
2021-09-07  8:18   ` Bagas Sanjaya
2021-09-06 18:10 ` [PATCH 3/3] Document the new gitattributes change Calum McConnell
2021-09-07 16:33 ` [PATCH 1/3] Add support for new %w wildcard in checkout filter Jeff King
2021-09-07 20:28 ` Junio C Hamano

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).