git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* Windows long file paths bug(s) with "-c core.longpaths=true" whilst cloning
@ 2022-06-15 12:45 Lukas Buricin
  2022-06-18 21:57 ` Johannes Schindelin
  0 siblings, 1 reply; 2+ messages in thread
From: Lukas Buricin @ 2022-06-15 12:45 UTC (permalink / raw)
  To: git

Hi

I am seeing multiple problems when cloning. I use "-c
core.longpaths=true" and I of course have the long paths enabled in
Windows.

Cloning long file names appears to work only when the file names
themselves breach the limit MAX_PATH 260 characters, but not in other
cases.

1) When cloning in a directory path that is on its own longer than
MAX_PATH. For example, when cloning to

C:\my_very_long_named_directory_..._reaching_over_MAX_PATH_260_long

then the content of '.git' folder in

C:\my_very_long_named_directory_..._reaching_over_MAX_PATH_260_long\.git

fails to get created. Whilst this may seem like an unlikely scenario
for a single repository being in such a long-named directory, it can
happen with a more complex hierarchy of submodules, where the problem
becomes more obvious ("external" represents directories containing
submodules):

c:\my_projects\my_project\external\app_components\external\app_api_layer\external\framework\external\video_components\external\core_components\external\base\external\windows-third-parties\gtest\...

The first problem seems to be in compat\mingw.c, inside char
*mingw_mktemp(char *template) where we still use wchar_t
wtemplate[MAX_PATH];

After replacing it with MAX_LONG_PATH the problem moves further to
usages of xutftowcs_path() that internally assumes everything to be
just MAX_PATH.

When replaced all the calls to xutftowcs_path() with
xutftowcs_path_ex() providing MAX_LONG_PATH and passing
"core_long_paths", the initial parent git process moves on. However
...

2) The git process spawns another child git process, which doesn't
reflect the parent core_long_paths (being 1), because in the child
process it's 0, hence all the functions that should be allowed to
prepend the paths with \\?\ are told not to extend anything. This
obviously leads to more failures.

3) Even when the directory paths and the files do fit in the 260 limit
(I have shrunk all the paths by renaming 'external' to 'ext' in order
to fit in the MAX_PATH), git falls over in its internal .git folder
because it stores some information in significantly longer paths than
what is present in the repository itself. Using the example of

c:\my_projects\my_project\external\app_components\external\app_api_layer\external\framework\external\video_components\external\core_components\external\base\external\windows-third-parties\gtest\...

there will be folders 'modules' added at each level in the .git
internal folder for all submodules, so will end up somewhat like:

c:\my_projects\my_project\.git\modules\external\app_components\modules\external\app_api_layer\modules\external\framework\modules\external\video_components\modules\external\core_components\modules\external\base\modules\external\windows-third-parties\modules\gtest\...

So the content in .git is much more likely to breach the MAX_PATH
limit than the repository "user" files themselves.

In general, aften seeing the code (admittedly first time ever), I also
have a few questions, because I might be missing some context ...

1) Why do we use MAX_PATH for buffers at all? This might be my
misunderstanding, but I would have thought that providing long file
name support would essentially mean not using that contant other than
for determining whether to prepend given wchar_t* output by \\?\ when
the input const char* is longer than MAX_PATH, before passing the
wchar_t* in Windows API calls? This conversion and eventual prepending
could be done in one place and should be no burden for the CPU.

2) Why not having long paths by default? Again, this might be me
missing some historical context of regression, but it just seems
logical to simply prepend by \\?\ whenever needed rather than fail.

3) Why to have a "-c core.longpaths=true" command line argument at
all? Given that we can detect whether the long file names are enabled
in the OS, we could easily drop that argument and have the full
support for free? (Losing also the problem of passing the flag to
child git processes). In fact we might even not need the OS support
detection, we may simply rely on the Windows API return values and
GetLastError() whilst always proving MAX_LONG_PATH and eventually
prepending paths with \\?\.

Thank you very much in advance for consideration and for eventual explanations.

Have a nice day.

Lukas

^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: Windows long file paths bug(s) with "-c core.longpaths=true" whilst cloning
  2022-06-15 12:45 Windows long file paths bug(s) with "-c core.longpaths=true" whilst cloning Lukas Buricin
@ 2022-06-18 21:57 ` Johannes Schindelin
  0 siblings, 0 replies; 2+ messages in thread
From: Johannes Schindelin @ 2022-06-18 21:57 UTC (permalink / raw)
  To: Lukas Buricin; +Cc: git

Hi Lukas,

On Wed, 15 Jun 2022, Lukas Buricin wrote:

> I am seeing multiple problems when cloning. I use "-c
> core.longpaths=true" and I of course have the long paths enabled in
> Windows.
>
> Cloning long file names appears to work only when the file names
> themselves breach the limit MAX_PATH 260 characters, but not in other
> cases.
>
> 1) When cloning in a directory path that is on its own longer than
> MAX_PATH. For example, when cloning to
>
> C:\my_very_long_named_directory_..._reaching_over_MAX_PATH_260_long
>
> then the content of '.git' folder in
>
> C:\my_very_long_named_directory_..._reaching_over_MAX_PATH_260_long\.git
>
> fails to get created. Whilst this may seem like an unlikely scenario
> for a single repository being in such a long-named directory, it can
> happen with a more complex hierarchy of submodules, where the problem
> becomes more obvious ("external" represents directories containing
> submodules):
>
> c:\my_projects\my_project\external\app_components\external\app_api_layer\external\framework\external\video_components\external\core_components\external\base\external\windows-third-parties\gtest\...
>
> The first problem seems to be in compat\mingw.c, inside char
> *mingw_mktemp(char *template) where we still use wchar_t
> wtemplate[MAX_PATH];
>
> After replacing it with MAX_LONG_PATH the problem moves further to
> usages of xutftowcs_path() that internally assumes everything to be
> just MAX_PATH.

Indeed. You will see much of the same journey, in the form of testable
patches, at https://github.com/git-for-windows/git/pull/3877.

One major problem, of course, is the assumption of `mktemp()` that the
buffer passed as a parameter has the size `MAX_PATH`. So if we start
assuming that the size is actually much larger, we risk buffer overruns
unless we litter Git's code even outside of `compat/mingw.c` with
`MAX_LONG_PATH` instances, a proposal I have no doubt would be shot down
immediately on the Git mailing list.

> When replaced all the calls to xutftowcs_path() with
> xutftowcs_path_ex() providing MAX_LONG_PATH and passing
> "core_long_paths", the initial parent git process moves on. However
> ...
>
> 2) The git process spawns another child git process, which doesn't
> reflect the parent core_long_paths (being 1), because in the child
> process it's 0,

Why would it be 0, though? The `-c core.longPaths=true` option is parsed
into the environment variable `GIT_CONFIG_PARAMETERS`, which is passed
down to the child process.

> hence all the functions that should be allowed to prepend the paths with
> \\?\ are told not to extend anything. This obviously leads to more
> failures.
>
> 3) Even when the directory paths and the files do fit in the 260 limit
> (I have shrunk all the paths by renaming 'external' to 'ext' in order
> to fit in the MAX_PATH), git falls over in its internal .git folder
> because it stores some information in significantly longer paths than
> what is present in the repository itself. Using the example of
>
> c:\my_projects\my_project\external\app_components\external\app_api_layer\external\framework\external\video_components\external\core_components\external\base\external\windows-third-parties\gtest\...
>
> there will be folders 'modules' added at each level in the .git
> internal folder for all submodules, so will end up somewhat like:
>
> c:\my_projects\my_project\.git\modules\external\app_components\modules\external\app_api_layer\modules\external\framework\modules\external\video_components\modules\external\core_components\modules\external\base\modules\external\windows-third-parties\modules\gtest\...
>
> So the content in .git is much more likely to breach the MAX_PATH
> limit than the repository "user" files themselves.

Right, and there is the additional problem that the current working
directory cannot have more than 260 characters, either, when spawning new
processes.

And we _also_ have to deal with the reality that by far not all
executables handle long paths gracefully, even as we teach `git.exe` more
tricks.

For example, the MSYS2 Bash used by Git for Windows does not handle long
paths. Which is a problem because we still have some core functionality in
Git being implemented as Unix shell script: The `submodule`, `bisect`, and
`mergetool` commands are commonly-used examples.

Now, one approach we discussed over at
https://github.com/git-for-windows/git/pull/3877 is to configure that
registry setting that allows longer paths for executables whose manifest
indicate that they are "long-paths aware". The mechanics of rolling
something like that out would get tricky, though, as it would mean that
the MSYS2 runtime would have to be modified, too, and for obvious reasons
it would then require _all_ of the programs that link to it to be made
long paths-aware, too, otherwise we would be bogged down in buffer
overrun-related problems.

> In general, aften seeing the code (admittedly first time ever), I also
> have a few questions, because I might be missing some context ...
>
> 1) Why do we use MAX_PATH for buffers at all?

That is a good question. Historical reasons, I suppose. You mentioned
`mktemp()` above, which is a fine example how the C library functions were
originally designed with a hard-coded assumption as to buffers of size
`MAX_PATH`. On Windows, there is at least `_mktemp_s()`, which takes a
pointer and a size as parameters, but on Linux/Unix, I am unaware of any
such function, they all seem to expect buffers of size `MAX_PATH`, which
essentially limits what Git can use. I guess a MAX_PATH of 1024 "ought to
be enough for anybody".

> This might be my misunderstanding, but I would have thought that
> providing long file name support would essentially mean not using that
> contant other than for determining whether to prepend given wchar_t*
> output by \\?\ when the input const char* is longer than MAX_PATH,
> before passing the wchar_t* in Windows API calls? This conversion and
> eventual prepending could be done in one place and should be no burden
> for the CPU.

Even functions like `GetFullPathNameW()` require you to provide some
reasonably-sized buffer as parameter. That's the reson why we still stick
with MAX_LONG_PATH, which we chose as 4096. Because I (and IIRC Karsten
Blees) guessed at the time that 4096 out to be enough for anybody.

> 2) Why not having long paths by default? Again, this might be me
> missing some historical context of regression, but it just seems
> logical to simply prepend by \\?\ whenever needed rather than fail.

There are a few reasons why they should not be on by default, most
importantly that many 3rd-party programs cannot handle them. For example,
for a long time CMD itself could not. And of course the MSYS2 version of
`rm.exe`, which we tried to use in the test suite, cannot handle them.

Rather than succeeding to clone repositories into long paths by default
only to let the user run into trouble not only using their favorite tools
on the checked out files, but then also be unable to `rm -rf` the thing,
we chose to disable long paths support by default instead.

There are also performance implications. Accessing files via their
relative paths is much quicker than first constructing that `\\?\` form
(which must be an absolute path).

> 3) Why to have a "-c core.longpaths=true" command line argument at
> all? Given that we can detect whether the long file names are enabled
> in the OS, we could easily drop that argument and have the full
> support for free? (Losing also the problem of passing the flag to
> child git processes). In fact we might even not need the OS support
> detection, we may simply rely on the Windows API return values and
> GetLastError() whilst always proving MAX_LONG_PATH and eventually
> prepending paths with \\?\.

Even just the `mktemp()` example you found _already_ puts a stick into the
wheels of your idea.

I also sense that there's confusion between the two forms long paths
support can take:

- Convert paths that would otherwise be too long to the `\\?\` form and
  use the `*W()` functions of the Windows API. This is supported by all
  Windows versions that are supported by Git for Windows.

- Require users to configure a registry setting, use the `*W()` family of
  Win32 API functions exclusively, add a manifest entry that your
  executable is long paths-aware, and skip that `\\?\` prefixing. See
  https://docs.microsoft.com/en-us/windows/win32/fileio/maximum-file-path-limitation
  for more details. This requires Windows 10 Build 1607 or later.

The more elegant solution would be the latter, of course, but its Windows
10 Build 1607 requirement is incompatible with Git for Windows' promise
that it still supports Windows versions all the way back to Vista
(although you can expect Vista support to be dropped after Git for Windows
v2.37.0 is released, with Windows 7 and Windows 8 support to follow around
the end of this year).

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2022-06-18 21:57 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-15 12:45 Windows long file paths bug(s) with "-c core.longpaths=true" whilst cloning Lukas Buricin
2022-06-18 21:57 ` Johannes Schindelin

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).