git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Heiko Voigt <hvoigt@hvoigt.net>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH] help: correct behavior for is_executable on Windows
Date: Thu, 16 Aug 2012 00:29:13 +0200	[thread overview]
Message-ID: <20120815222911.GA44080@book.hvoigt.net> (raw)
In-Reply-To: <7v7gt06nyk.fsf@alter.siamese.dyndns.org>

Hi Junio,

On Wed, Aug 15, 2012 at 10:53:55AM -0700, Junio C Hamano wrote:
> Heiko Voigt <hvoigt@hvoigt.net> writes:
> 
> > On Mon, Aug 13, 2012 at 10:48:14AM -0700, Junio C Hamano wrote:
> >> Heiko Voigt <hvoigt@hvoigt.net> writes:
> >> > What do you think?
> >> 
> >> Does having the "stat()" help on Windows in any way?  Does it ever
> >> return an executable bit by itself?
> >
> > No, AFAIK it does not return anything about executability. But I think
> > the stat is still necessary to verify that the file exists and is a
> > regular file.
> 
> But if you are going to read it anyway, you can tell it from open()
> and read() of the first 2 bytes failing, no?  That will still be an
> implementation detail of platform specific "is_path_executable()".

No I am not opening the file anyway. Only when it does not have a
".exe" postfix. That at least was the intention of my previous patch in
this thread.

> So you are forcing Windows an extra and unnecessary stat() that only
> is needed on Cygwin, no?

My first patch in this thread (which lead to this extraction) is about
avoiding the open because it is possibly very costly on our git binaries
in case a virus scanner is running.

The optimized (and correct) check does only look at the given filename
if it contains a ".exe" postfix. A directory (although unlikely) could
also have such a suffix. Should we then consider that directory to be a
git command? AFAICS there is no such check in the earlier codepath.

A stat is much cheaper since it does not open the file. For scripts the
open is much cheaper than for our binaries because they are much
smaller. For clarification: All the git builtin binaries basically
contain the same code. Even though they are hardlinked, the system (or
the scanner) does still consider them like individual files and executes
a scan for each of them. That at least seems to be the case on a number
of systems.

> > @@ -1347,7 +1348,8 @@ ifneq (,$(findstring MINGW,$(uname_S)))
> >  	COMPAT_CFLAGS += -DSTRIP_EXTENSION=\".exe\"
> >  	COMPAT_OBJS += compat/mingw.o compat/winansi.o \
> >  		compat/win32/pthread.o compat/win32/syslog.o \
> > -		compat/win32/poll.o compat/win32/dirent.o
> > +		compat/win32/poll.o compat/win32/dirent.o \
> > +		compat/win32/executable.o
> 
> Looks sensible, even though the filename does not tell what it does
> about "executable".  is_executable.o might be a better name for them.

Agreed, I was not sure about the name anyway. is_executable.o sounds
better.

> > diff --git a/help.c b/help.c
> > index ebc2c42..d9fae3c 100644
> > --- a/help.c
> > +++ b/help.c
> > @@ -106,34 +106,8 @@ static int is_executable(const char *name)
> >  	    !S_ISREG(st.st_mode))
> >  		return 0;
> >  
> > -#if defined(WIN32) || defined(__CYGWIN__)
> > -	/* On Windows we cannot use the executable bit. The executable
> > -	 * state is determined by extension only. We do this first
> > -	 * because with virus scanners opening an executeable for
> > -	 * reading is potentially expensive.
> > -	 */
> > -	if (has_extension(name, ".exe"))
> > -		return S_IXUSR;
> > -
> > -#if defined(__CYGWIN__)
> > -if ((st.st_mode & S_IXUSR) == 0)
> > -#endif
> > -{	/* now that we know it does not have an executable extension,
> > -	   peek into the file instead */
> > -	char buf[3] = { 0 };
> > -	int n;
> > -	int fd = open(name, O_RDONLY);
> > -	st.st_mode &= ~S_IXUSR;
> > -	if (fd >= 0) {
> > -		n = read(fd, buf, 2);
> > -		if (n == 2)
> > -			/* look for a she-bang */
> > -			if (!strcmp(buf, "#!"))
> > -				st.st_mode |= S_IXUSR;
> > -		close(fd);
> > -	}
> > -}
> > -#endif
> > +	correct_executable_stat(name, &st);
> > +
> 
> Yuck.
> 
> Why should we need even a single line of the implementation of a
> function that tells if a given pathname contains an executable
> command, which we know is platform specific?  
> 
> On Posix systems, the implementation will be "stat() and check
> S_IXUSR".  On pure Windows, it will be "check .exe, or open it and
> read the first two bytes". On Cygwin, it will also be "check .exe,
> stat() and check S_IXUSR, or open it and read the first two bytes.
> 
> It is not like the caller of is_executable() needed to run stat for
> other purposes on its own and we can optimize by either borrowing
> the stat data the caller already collected for us, or returning the
> stat data we collected for later use by the caller.  The use of stat
> is entirely contained in the POSIX implementation of this function.
> 
> Why are you so dead-set to shoehorn the different semantics into
> "struct stat" that we know is not an appropriate abstraction across
> platforms?

I simply think filling in the correct information into the data
structure we use on all platforms is the most transparent approach. We
could also call this function

	correct_executable_stat_if_needed_by_platform()

but I considered that name to be too long. As explained above: To be
fully correct the stat call is still needed to make sure we are not
looking at a directory. list_commands_in_dir() from help.c is iterating
over all entries in a given directory so if we skip the stat we are not
sure whether we look at something that is not a regular file. Even
though its unlikely that there will be a directory named something.exe
in the git executable directory I would still like to keep the check in
there.

When first looking at this I also thought about extending the windows
stat replacement to do this whole logic. Then we could completely remove
this platform specific thing. But I decided against that because when
the file does not have a ".exe" postfix we would have an open call in all
stats and to my knowledge that is very expensive on Windows.

I do not know why you are against filling that information into "struct
stat". I could rephrase the above into something like this

	if (platform_is_executeable(name))
		return 1;

but its not that easy to replace to a no op by a macro anymore.

Cheers Heiko

  parent reply	other threads:[~2012-08-15 22:30 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-08-11  7:00 [PATCH] help: correct behavior for is_executable on Windows Heiko Voigt
2012-08-12  4:30 ` Junio C Hamano
2012-08-13 17:02   ` Heiko Voigt
2012-08-13 17:48     ` Junio C Hamano
2012-08-15 16:50       ` Heiko Voigt
2012-08-15 17:53         ` Junio C Hamano
2012-08-15 19:39           ` Junio C Hamano
2012-08-15 22:29           ` Heiko Voigt [this message]
2012-08-15 23:15             ` Junio C Hamano
2012-08-16  2:02               ` Junio C Hamano
2012-08-16 16:35                 ` Heiko Voigt
  -- strict thread matches above, loose matches on Subject: below --
2017-01-27 13:50 Johannes Schindelin
2017-01-27 18:43 ` Junio C Hamano
2017-01-30 12:44   ` Johannes Schindelin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120815222911.GA44080@book.hvoigt.net \
    --to=hvoigt@hvoigt.net \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).