git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: tboegi@web.de
Cc: git@vger.kernel.org, random_n0body@icloud.com,
	evraiphilippeblain@gmail.com,
	Philippe Blain <levraiphilippeblain@gmail.com>
Subject: Re: [PATCH/RFC v1 1/1] git restore -p . and precomposed unicode
Date: Sun, 24 Jan 2021 11:51:33 -0800	[thread overview]
Message-ID: <xmqqsg6qyuyi.fsf@gitster.c.googlers.com> (raw)
In-Reply-To: <20210124151306.23185-1-tboegi@web.de> (tboegi@web.de's message of "Sun, 24 Jan 2021 16:13:06 +0100")

tboegi@web.de writes:

> The solution is to read the config variable "core.precomposeunicode" early.

For a single command like "restore", we need to fail if it is run
outside a repository anyway, so it is OK, but code in "git.c" in
general can be called outside a repository, we may not know where
our configuration files are, though.  So I'd prefer to avoid
adding too many "do this thing early for every single command".

Where do we normally read that variable?  I see calls to
precompose_argv() on only a few codepaths

builtin/diff-files.c:38:	precompose_argv(argc, argv);
builtin/diff-index.c:28:	precompose_argv(argc, argv);
builtin/diff-tree.c:129:	precompose_argv(argc, argv);
builtin/diff.c:455:	precompose_argv(argc, argv);
builtin/submodule--helper.c:1260:	precompose_argv(diff_args.nr, diff_args.v);
parse-options.c:872:	precompose_argv(argc, argv);

I guess the reason we can get away with it is because most of the
newer commands use the parse_options() API, and the call to
precompose_argv() is used by the codepaths implementing these
commands.  And as a rule, these commands read config first before
calling parse_options(), so by the time the control reaches there,
the value of the variable may be already known.

The question is why "restore -p" is so special?  Or does this patch
mean everybody, even the ones that use parse_options() is broken?

I guess "prefix" needs to be munged for everybody, so "restore -p"
on the title of the patch is a red herring, and the problem applies
to all "git" commands---in which case, inside "git.c" would be the
right place to do so.

I wonder if everybody who calls precompoase_argv() has access to the
prefix, though.  If so, would it make more sense to extend the API
to

	precompose_argv(int argc, char **argv, char **prefix)

so that the callers only need to call a single function, without any
additional code like this patch does?

Also, as the current precompose_argv() begins like this:

        void precompose_argv(int argc, const char **argv)
        {
                int i = 0;
                const char *oldarg;
                char *newarg;
                iconv_t ic_precompose;

                if (precomposed_unicode != 1)
                        return;

and environment.c initializes the global to (-1), I wonder if we can
get away with an approach not to read the "config" anywhere outside
precompose_argv() function.  Instead, can't the above snippet become
more like:

                if (precomposed_unicode < 0)
                        precomposed_unicode = read from config;
		if (precomposed_unicode != 1)
			return;

That way, you do not even have to touch anything outside
compat/precompose_utf8.c, other than adjusting the callers to pass
the pointer to their prefix to be munged.

Namely

> +int precompose_read_config_gently(void)

This can become file-scope static.

> +{
> +	git_config_get_bool("core.precomposeunicode", &precomposed_unicode);
> +	return precomposed_unicode == 1;
> +}
>
>  void probe_utf8_pathname_composition(void)
>  {
> @@ -60,6 +65,25 @@ void probe_utf8_pathname_composition(void)
>  	strbuf_release(&path);
>  }
>
> +char *precompose_string_if_needed(const char *in)
> +{

This too.

> diff --git a/compat/precompose_utf8.h b/compat/precompose_utf8.h
> index 6f843d3e1a..ce82857d73 100644
> --- a/compat/precompose_utf8.h
> +++ b/compat/precompose_utf8.h
> @@ -28,6 +28,8 @@ typedef struct {
>  	struct dirent_prec_psx *dirent_nfc;
>  } PREC_DIR;
>
> +int precompose_read_config_gently(void);
> +char *precompose_string_if_needed(const char *in);
>  void precompose_argv(int argc, const char **argv);
>  void probe_utf8_pathname_composition(void);

And this can go away.

> diff --git a/git-compat-util.h b/git-compat-util.h
> index 104993b975..f34854b66f 100644
> --- a/git-compat-util.h
> +++ b/git-compat-util.h
> @@ -252,6 +252,14 @@ typedef unsigned long uintptr_t;
>  #ifdef PRECOMPOSE_UNICODE
>  #include "compat/precompose_utf8.h"
>  #else
> +static inline int precompose_read_config_gently(void)
> +{
> +	return 0;
> +}
> +static inline char *precompose_string_if_needed(const char *in)
> +{
> +	return NULL; /* no need to precompose a string */
> +}

So do these.

> diff --git a/git.c b/git.c
> index a00a0a4d94..f09e14f733 100644
> --- a/git.c
> +++ b/git.c
> @@ -421,6 +421,15 @@ static int run_builtin(struct cmd_struct *p, int argc, const char **argv)
>  			prefix = setup_git_directory_gently(&nongit_ok);
>  		}
>
> +		if (precompose_read_config_gently()) {
> +			precompose_argv(argc, argv);
> +			if (prefix) {
> +				const char *prec_pfx;
> +					prec_pfx = precompose_string_if_needed(prefix);
> +				if (prec_pfx)
> +					prefix = prec_pfx; /* memory lost */
> +			}
> +		}

And this would move to the beginning of precompose_argv()
implementation.

Also the code we currently use to read the core.precomposeunicode
configuration variable (presumably somewhere in git_default_config()
callchain; I didn't check) can go away.

Which would be much nicer outcome, if doable (again, I didn't check).

> diff --git a/t/t3910-mac-os-precompose.sh b/t/t3910-mac-os-precompose.sh
> index 54ce19e353..bbbc50da93 100755
> --- a/t/t3910-mac-os-precompose.sh
> +++ b/t/t3910-mac-os-precompose.sh
> @@ -191,6 +191,21 @@ test_expect_failure 'handle existing decomposed filenames' '
>  	test_must_be_empty untracked
>  '
>
> +test_expect_success "unicode decomposed: git restore -p . " '
> +	DIRNAMEPWD=dir.Odiarnfc &&
> +	DIRNAMEINREPO=dir.$Adiarnfc &&
> +	export DIRNAMEPWD DIRNAMEINREPO &&
> +	git init $DIRNAMEPWD &&
> +	( cd $DIRNAMEPWD &&
> +		mkdir $DIRNAMEINREPO &&

Style:

	(
		cd $DIRNAMEPWD &&
		mkdir $DIRNAMEINREPO &&

Is "restore" the only thing that has this issue?  

I would imagine that anything that takes '.' pathspec to limit its
operation to the current subdirectory (e.g. "cd sub && git diff .")
would be affected (clarification: I am *not* hinting that other
commands need to be tested---I am however hinting to update the
proposed log message to explain either (1) this applies in general
to commands that does X, or (2) this affects only "restore -p" which
does this unusual thing Y that no other commands do).

Thanks.


> +		cd $DIRNAMEINREPO &&
> +		echo "Initial" >file &&
> +		git add file &&
> +		echo "More stuff" >>file &&
> +		echo y | git restore -p .
> +	)
> +'
> +
>  # Test if the global core.precomposeunicode stops autosensing
>  # Must be the last test case
>  test_expect_success "respect git config --global core.precomposeunicode" '
> --
> 2.30.0.155.g66e871b664

  reply	other threads:[~2021-01-24 19:54 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-06 11:35 git-bugreport-2021-01-06-1209.txt (git can't deal with special characters) Daniel Troger
2021-01-06 14:21 ` Torsten Bögershausen
2021-01-06 16:49   ` Daniel Troger
2021-01-06 21:47     ` Torsten Bögershausen
2021-01-06 22:21       ` Daniel Troger
2021-01-06 23:07         ` Randall S. Becker
2021-01-07 14:34           ` Philippe Blain
2021-01-07 15:49             ` Torsten Bögershausen
2021-01-07 16:21               ` Philippe Blain
2021-01-08 19:07                 ` Torsten Bögershausen
2021-01-24 15:13 ` [PATCH/RFC v1 1/1] git restore -p . and precomposed unicode tboegi
2021-01-24 19:51   ` Junio C Hamano [this message]
2021-01-25 16:53     ` Torsten Bögershausen
2021-01-29 17:15 ` [PATCH v2 1/1] MacOS: precompose_argv_prefix() tboegi
2021-01-29 23:19   ` Junio C Hamano
2021-01-31  0:43   ` Junio C Hamano
2021-01-31  9:50     ` Torsten =?unknown-8bit?Q?B=C3=B6gershausen?=
2021-02-02 15:11 ` [PATCH v3 " tboegi
2021-02-02 17:43   ` Junio C Hamano
2021-02-03 16:28 ` [PATCH v4 " tboegi
2021-02-03 19:33   ` Junio C Hamano
2021-02-03 22:13     ` Junio C Hamano
2021-02-05 17:31       ` Torsten Bögershausen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqsg6qyuyi.fsf@gitster.c.googlers.com \
    --to=gitster@pobox.com \
    --cc=evraiphilippeblain@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=levraiphilippeblain@gmail.com \
    --cc=random_n0body@icloud.com \
    --cc=tboegi@web.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).