git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: Phillip Wood <phillip.wood123@gmail.com>
Cc: git@vger.kernel.org, Jeff King <peff@peff.net>,
	Taylor Blau <me@ttaylorr.com>,
	"brian m. carlson" <sandals@crustytoothpaste.net>,
	Thomas Guyot-Sionnest <tguyot@gmail.com>
Subject: Re: [PATCH 3/3] diff --no-index: support reading from named pipes
Date: Tue, 27 Jun 2023 12:44:31 -0700	[thread overview]
Message-ID: <xmqqy1k4g068.fsf@gitster.g> (raw)
In-Reply-To: <990e71882bfdc697285c5b04b92c290679ca22ab.1687874975.git.phillip.wood@dunelm.org.uk> (Phillip Wood's message of "Tue, 27 Jun 2023 15:10:16 +0100")

Phillip Wood <phillip.wood123@gmail.com> writes:

> From: Phillip Wood <phillip.wood@dunelm.org.uk>
>
> In some shells, such as bash and zsh, it's possible to use a command
> substitution to provide the output of a command as a file argument to
> another process, like so:
>
>   diff -u <(printf "a\nb\n") <(printf "a\nc\n")
>
> However, this syntax does not produce useful results with "git diff
> --no-index". On macOS, the arguments to the command are named pipes
> under /dev/fd, and git diff doesn't know how to handle a named pipe. On
> Linux, the arguments are symlinks to pipes, so git diff "helpfully"
> diffs these symlinks, comparing their targets like "pipe:[1234]" and
> "pipe:[5678]".
>
> To address this "diff --no-index" is changed so that if a path given on
> the commandline is a named pipe or a symbolic link that resolves to a
> named pipe then we read the data to diff from that pipe. This is
> implemented by generalizing the code that already exists to handle
> reading from stdin when the user passes the path "-".
>
> As process substitution is not support by POSIX this change is tested by
> using a pipe and a symbolic link to a pipe.
>
> Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
> ---
>  diff-no-index.c          | 80 ++++++++++++++++++++++++----------------
>  t/t4053-diff-no-index.sh | 25 +++++++++++++
>  2 files changed, 73 insertions(+), 32 deletions(-)

This looks good, if a bit invasive, to a cursory read, at least to
me.  It is very focused to the real problem at hand, and shows that
the way we split the "no-index" mode out to its own implementation
of filespec population code does make sense.

> -static void populate_from_stdin(struct diff_filespec *s)
> +static void populate_from_pipe(struct diff_filespec *s, int is_stdin)
>  {
>  	struct strbuf buf = STRBUF_INIT;
>  	size_t size = 0;
> +	int fd = 0;
>  
> -	if (strbuf_read(&buf, 0, 0) < 0)
> +	if (!is_stdin)
> +		fd = xopen(s->path, O_RDONLY);
> +	if (strbuf_read(&buf, fd, 0) < 0)
>  		die_errno("error while reading from stdin");
> +	if (!is_stdin)
> +		close(fd);

Given that the error message explicitly says "stdin", and there are
many "if ([!]is_stdin)" sprinkled in the code, I actually suspect
that there should be two separate helpers, one for stdin and one for
non-stdin pipe.  It is especially true since there is only one
caller that does this:

> +	if (is_pipe)
> +		populate_from_pipe(s, name == file_from_standard_input);

which can be

	if (is_pipe) {
		if (name == file_from_standard_input)
			populate_from_stdin(s);
		else
			populate_from_pipe(s);
	}

without losing clarity.  The code that you are sharing by forcing
them to be a single helper to wrap up a handful of members in the s
structure can become its own helper that is called from these two
helper functions.

>  static int queue_diff(struct diff_options *o,
> -		      const char *name1, const char *name2)
> +		      const char *name1, int is_pipe1,
> +		      const char *name2, int is_pipe2)
>  {
>  	int mode1 = 0, mode2 = 0;
>  
> -	if (get_mode(name1, &mode1) || get_mode(name2, &mode2))
> +	if (get_mode(name1, is_pipe1, &mode1) ||
> +	    get_mode(name2, is_pipe2, &mode2))
>  		return -1;

Makes me wonder why the caller of queue_diff() even needs to know if
these two names are pipes; we are calling get_mode() which would run
stat(2) anyway, and the result from stat(2) is what you use (in the
caller) to determine the values of is_pipeN.  Wouldn't it be more
appropriate to leave the caller oblivious of special casing of the
pipes and let get_mode() handle this?  After all, that is how the
existing code special cases the standard input so there is a strong
precedence.

If we go that route, it may make sense to further isolate the
"address comparison" trick used for the standard input mode.
Perhaps we can and do something like

    static int get_mode(const char *path, int *mode, int *special)
    {
	struct stat st;

+	*special = 0; /* default - nothing special */
	...
	else if (path == file_from_standard_input) {
		*mode = create_ce_mode(0666);
+		*pipe_kind = 1; /* STDIN */
+	} else if (stat(path, &st)) {
+		... error ...
+	} else if (S_ISFIFO(st.st_mode)) {
+		*mode = create_ce_mode(0666);
+		*pipe_kind = 2; /* FIFO */
	} else if (lstat(path, &st)) {
		... error ...
	} else {
		*mode = st.st_mode;
	}

and have the caller act on "special" to choose among calling
populate_from_stdin(), populate_from_pipe(), or do nothing for
the regular files?

    Side note: this has an added benefit of highlighting that we do
    stat() and lstat() because of dereferencing.  What I suspect is
    that "git diff --no-index" mode was primarily to give Git
    niceties like rename detection and diff algorithms to those who
    wanted to use in contexts (i.e. contents not tracked by Git)
    they use "diff" by other people like GNU, without bothering to
    update "diff" by other people.  I further suspect that "compare
    the readlink contents", which is very much necessary within the
    Git context, may not fall into the "Git niceties" when they
    invoke "--no-index" mode.  Which leads me to imagine a future
    direction where we only use stat() and not lstat() in the
    "--no-index" codepath.  Having everything including these
    lstat() and stat() calls inside get_mode() will allow such a
    future transition hopefully simpler.

I do not quite see why you decided to move the "is_dir" processing
up and made the caller responsible.  Specifically,

> -	fixup_paths(paths, &replacement);
> +	if (!is_pipe[0] && !is_pipe[1])
> +		fixup_paths(paths, is_dir, &replacement);

this seems fishy when one side is pipe and the other one is not.
When the user says

    $ git diff --no-index <(command) path

fixup_paths() are bypassed because one of them is pipe.  It makes me
suspect that it should be an error if "path" is a directory.  I do
not know if fixup_paths() is the best place for doing such checking,
but somebody should be doing that, no?



  reply	other threads:[~2023-06-27 19:44 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-06-27 14:10 [PATCH 0/3] diff --no-index: support reading from named pipes Phillip Wood
2023-06-27 14:10 ` [PATCH 1/3] diff --no-index: die on error reading stdin Phillip Wood
2023-06-27 14:10 ` [PATCH 2/3] t4054: test diff --no-index with stdin Phillip Wood
2023-06-27 14:10 ` [PATCH 3/3] diff --no-index: support reading from named pipes Phillip Wood
2023-06-27 19:44   ` Junio C Hamano [this message]
2023-06-28 10:05     ` Phillip Wood
2023-07-05 19:49 ` [PATCH v2 0/4] " Phillip Wood
2023-07-05 19:49   ` [PATCH v2 1/4] diff --no-index: refuse to compare stdin to a directory Phillip Wood
2023-07-05 21:17     ` Junio C Hamano
2023-07-05 19:49   ` [PATCH v2 2/4] diff --no-index: die on error reading stdin Phillip Wood
2023-07-05 21:18     ` Junio C Hamano
2023-07-05 19:49   ` [PATCH v2 3/4] t4054: test diff --no-index with stdin Phillip Wood
2023-07-05 21:22     ` Junio C Hamano
2023-07-05 19:49   ` [PATCH v2 4/4] diff --no-index: support reading from named pipes Phillip Wood
2023-07-05 22:19     ` Junio C Hamano
2023-08-09 17:17     ` Jeff King
2023-08-10 12:56       ` Phillip Wood

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqy1k4g068.fsf@gitster.g \
    --to=gitster@pobox.com \
    --cc=git@vger.kernel.org \
    --cc=me@ttaylorr.com \
    --cc=peff@peff.net \
    --cc=phillip.wood123@gmail.com \
    --cc=sandals@crustytoothpaste.net \
    --cc=tguyot@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).