git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Johannes Schindelin <Johannes.Schindelin@gmx.de>
To: Lukas Buricin <lukas.buricin@cubicmotion.com>
Cc: git@vger.kernel.org
Subject: Re: Windows long file paths bug(s) with "-c core.longpaths=true" whilst cloning
Date: Sat, 18 Jun 2022 23:57:07 +0200 (CEST)	[thread overview]
Message-ID: <nycvar.QRO.7.76.6.2206182308590.349@tvgsbejvaqbjf.bet> (raw)
In-Reply-To: <CA+c+RB=ud_==QYJMgcwQ=X4imQhxsFvMuKp_bP0H_MBY1BbUgQ@mail.gmail.com>

Hi Lukas,

On Wed, 15 Jun 2022, Lukas Buricin wrote:

> I am seeing multiple problems when cloning. I use "-c
> core.longpaths=true" and I of course have the long paths enabled in
> Windows.
>
> Cloning long file names appears to work only when the file names
> themselves breach the limit MAX_PATH 260 characters, but not in other
> cases.
>
> 1) When cloning in a directory path that is on its own longer than
> MAX_PATH. For example, when cloning to
>
> C:\my_very_long_named_directory_..._reaching_over_MAX_PATH_260_long
>
> then the content of '.git' folder in
>
> C:\my_very_long_named_directory_..._reaching_over_MAX_PATH_260_long\.git
>
> fails to get created. Whilst this may seem like an unlikely scenario
> for a single repository being in such a long-named directory, it can
> happen with a more complex hierarchy of submodules, where the problem
> becomes more obvious ("external" represents directories containing
> submodules):
>
> c:\my_projects\my_project\external\app_components\external\app_api_layer\external\framework\external\video_components\external\core_components\external\base\external\windows-third-parties\gtest\...
>
> The first problem seems to be in compat\mingw.c, inside char
> *mingw_mktemp(char *template) where we still use wchar_t
> wtemplate[MAX_PATH];
>
> After replacing it with MAX_LONG_PATH the problem moves further to
> usages of xutftowcs_path() that internally assumes everything to be
> just MAX_PATH.

Indeed. You will see much of the same journey, in the form of testable
patches, at https://github.com/git-for-windows/git/pull/3877.

One major problem, of course, is the assumption of `mktemp()` that the
buffer passed as a parameter has the size `MAX_PATH`. So if we start
assuming that the size is actually much larger, we risk buffer overruns
unless we litter Git's code even outside of `compat/mingw.c` with
`MAX_LONG_PATH` instances, a proposal I have no doubt would be shot down
immediately on the Git mailing list.

> When replaced all the calls to xutftowcs_path() with
> xutftowcs_path_ex() providing MAX_LONG_PATH and passing
> "core_long_paths", the initial parent git process moves on. However
> ...
>
> 2) The git process spawns another child git process, which doesn't
> reflect the parent core_long_paths (being 1), because in the child
> process it's 0,

Why would it be 0, though? The `-c core.longPaths=true` option is parsed
into the environment variable `GIT_CONFIG_PARAMETERS`, which is passed
down to the child process.

> hence all the functions that should be allowed to prepend the paths with
> \\?\ are told not to extend anything. This obviously leads to more
> failures.
>
> 3) Even when the directory paths and the files do fit in the 260 limit
> (I have shrunk all the paths by renaming 'external' to 'ext' in order
> to fit in the MAX_PATH), git falls over in its internal .git folder
> because it stores some information in significantly longer paths than
> what is present in the repository itself. Using the example of
>
> c:\my_projects\my_project\external\app_components\external\app_api_layer\external\framework\external\video_components\external\core_components\external\base\external\windows-third-parties\gtest\...
>
> there will be folders 'modules' added at each level in the .git
> internal folder for all submodules, so will end up somewhat like:
>
> c:\my_projects\my_project\.git\modules\external\app_components\modules\external\app_api_layer\modules\external\framework\modules\external\video_components\modules\external\core_components\modules\external\base\modules\external\windows-third-parties\modules\gtest\...
>
> So the content in .git is much more likely to breach the MAX_PATH
> limit than the repository "user" files themselves.

Right, and there is the additional problem that the current working
directory cannot have more than 260 characters, either, when spawning new
processes.

And we _also_ have to deal with the reality that by far not all
executables handle long paths gracefully, even as we teach `git.exe` more
tricks.

For example, the MSYS2 Bash used by Git for Windows does not handle long
paths. Which is a problem because we still have some core functionality in
Git being implemented as Unix shell script: The `submodule`, `bisect`, and
`mergetool` commands are commonly-used examples.

Now, one approach we discussed over at
https://github.com/git-for-windows/git/pull/3877 is to configure that
registry setting that allows longer paths for executables whose manifest
indicate that they are "long-paths aware". The mechanics of rolling
something like that out would get tricky, though, as it would mean that
the MSYS2 runtime would have to be modified, too, and for obvious reasons
it would then require _all_ of the programs that link to it to be made
long paths-aware, too, otherwise we would be bogged down in buffer
overrun-related problems.

> In general, aften seeing the code (admittedly first time ever), I also
> have a few questions, because I might be missing some context ...
>
> 1) Why do we use MAX_PATH for buffers at all?

That is a good question. Historical reasons, I suppose. You mentioned
`mktemp()` above, which is a fine example how the C library functions were
originally designed with a hard-coded assumption as to buffers of size
`MAX_PATH`. On Windows, there is at least `_mktemp_s()`, which takes a
pointer and a size as parameters, but on Linux/Unix, I am unaware of any
such function, they all seem to expect buffers of size `MAX_PATH`, which
essentially limits what Git can use. I guess a MAX_PATH of 1024 "ought to
be enough for anybody".

> This might be my misunderstanding, but I would have thought that
> providing long file name support would essentially mean not using that
> contant other than for determining whether to prepend given wchar_t*
> output by \\?\ when the input const char* is longer than MAX_PATH,
> before passing the wchar_t* in Windows API calls? This conversion and
> eventual prepending could be done in one place and should be no burden
> for the CPU.

Even functions like `GetFullPathNameW()` require you to provide some
reasonably-sized buffer as parameter. That's the reson why we still stick
with MAX_LONG_PATH, which we chose as 4096. Because I (and IIRC Karsten
Blees) guessed at the time that 4096 out to be enough for anybody.

> 2) Why not having long paths by default? Again, this might be me
> missing some historical context of regression, but it just seems
> logical to simply prepend by \\?\ whenever needed rather than fail.

There are a few reasons why they should not be on by default, most
importantly that many 3rd-party programs cannot handle them. For example,
for a long time CMD itself could not. And of course the MSYS2 version of
`rm.exe`, which we tried to use in the test suite, cannot handle them.

Rather than succeeding to clone repositories into long paths by default
only to let the user run into trouble not only using their favorite tools
on the checked out files, but then also be unable to `rm -rf` the thing,
we chose to disable long paths support by default instead.

There are also performance implications. Accessing files via their
relative paths is much quicker than first constructing that `\\?\` form
(which must be an absolute path).

> 3) Why to have a "-c core.longpaths=true" command line argument at
> all? Given that we can detect whether the long file names are enabled
> in the OS, we could easily drop that argument and have the full
> support for free? (Losing also the problem of passing the flag to
> child git processes). In fact we might even not need the OS support
> detection, we may simply rely on the Windows API return values and
> GetLastError() whilst always proving MAX_LONG_PATH and eventually
> prepending paths with \\?\.

Even just the `mktemp()` example you found _already_ puts a stick into the
wheels of your idea.

I also sense that there's confusion between the two forms long paths
support can take:

- Convert paths that would otherwise be too long to the `\\?\` form and
  use the `*W()` functions of the Windows API. This is supported by all
  Windows versions that are supported by Git for Windows.

- Require users to configure a registry setting, use the `*W()` family of
  Win32 API functions exclusively, add a manifest entry that your
  executable is long paths-aware, and skip that `\\?\` prefixing. See
  https://docs.microsoft.com/en-us/windows/win32/fileio/maximum-file-path-limitation
  for more details. This requires Windows 10 Build 1607 or later.

The more elegant solution would be the latter, of course, but its Windows
10 Build 1607 requirement is incompatible with Git for Windows' promise
that it still supports Windows versions all the way back to Vista
(although you can expect Vista support to be dropped after Git for Windows
v2.37.0 is released, with Windows 7 and Windows 8 support to follow around
the end of this year).

Ciao,
Dscho

      reply	other threads:[~2022-06-18 21:57 UTC|newest]

Thread overview: 2+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-06-15 12:45 Windows long file paths bug(s) with "-c core.longpaths=true" whilst cloning Lukas Buricin
2022-06-18 21:57 ` Johannes Schindelin [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=nycvar.QRO.7.76.6.2206182308590.349@tvgsbejvaqbjf.bet \
    --to=johannes.schindelin@gmx.de \
    --cc=git@vger.kernel.org \
    --cc=lukas.buricin@cubicmotion.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).