git@vger.kernel.org mailing list mirror (one of many)
From: Lukas Buricin <lukas.buricin@cubicmotion.com>
To: git@vger.kernel.org
Subject: Windows long file paths bug(s) with "-c core.longpaths=true" whilst cloning
Date: Wed, 15 Jun 2022 13:45:16 +0100
Message-ID: <CA+c+RB=ud_==QYJMgcwQ=X4imQhxsFvMuKp_bP0H_MBY1BbUgQ@mail.gmail.com>

Hi

I am seeing multiple problems when cloning. I use "-c
core.longpaths=true" and I of course have the long paths enabled in
Windows.

Cloning long file names appears to work only when the file names
themselves breach the MAX_PATH limit of 260 characters, but not in
other cases.

1) When cloning in a directory path that is on its own longer than
MAX_PATH. For example, when cloning to

C:\my_very_long_named_directory_..._reaching_over_MAX_PATH_260_long

then the content of '.git' folder in

C:\my_very_long_named_directory_..._reaching_over_MAX_PATH_260_long\.git

fails to get created. Whilst this may seem like an unlikely scenario
for a single repository being in such a long-named directory, it can
happen with a more complex hierarchy of submodules, where the problem
becomes more obvious ("external" represents directories containing
submodules):

c:\my_projects\my_project\external\app_components\external\app_api_layer\external\framework\external\video_components\external\core_components\external\base\external\windows-third-parties\gtest\...

The first problem seems to be in compat\mingw.c, inside char
*mingw_mktemp(char *template) where we still use wchar_t
wtemplate[MAX_PATH];

After replacing it with MAX_LONG_PATH the problem moves further to
usages of xutftowcs_path() that internally assumes everything to be
just MAX_PATH.

After replacing all the calls to xutftowcs_path() with
xutftowcs_path_ex(), providing MAX_LONG_PATH and passing
"core_long_paths", the initial parent git process moves on. However
...

2) The git process spawns another child git process, which doesn't
reflect the parent core_long_paths (being 1), because in the child
process it's 0, hence all the functions that should be allowed to
prepend the paths with \\?\ are told not to extend anything. This
obviously leads to more failures.

3) Even when the directory paths and the files do fit in the 260 limit
(I have shrunk all the paths by renaming 'external' to 'ext' in order
to fit in the MAX_PATH), git falls over in its internal .git folder
because it stores some information in significantly longer paths than
what is present in the repository itself. Using the example of

c:\my_projects\my_project\external\app_components\external\app_api_layer\external\framework\external\video_components\external\core_components\external\base\external\windows-third-parties\gtest\...

there will be folders 'modules' added at each level in the .git
internal folder for all submodules, so will end up somewhat like:

c:\my_projects\my_project\.git\modules\external\app_components\modules\external\app_api_layer\modules\external\framework\modules\external\video_components\modules\external\core_components\modules\external\base\modules\external\windows-third-parties\modules\gtest\...

So the content in .git is much more likely to breach the MAX_PATH
limit than the repository "user" files themselves.

In general, after seeing the code (admittedly for the first time
ever), I also have a few questions, because I might be missing some
context ...

1) Why do we use MAX_PATH for buffers at all? This might be my
misunderstanding, but I would have thought that providing long file
name support would essentially mean not using that constant other than
for determining whether to prepend given wchar_t* output by \\?\ when
the input const char* is longer than MAX_PATH, before passing the
wchar_t* in Windows API calls? This conversion and eventual prepending
could be done in one place and should be no burden for the CPU.

2) Why not have long paths by default? Again, this might be me
missing some historical context of regression, but it just seems
logical to simply prepend by \\?\ whenever needed rather than fail.

3) Why have a "-c core.longpaths=true" command line argument at
all? Given that we can detect whether long file names are enabled
in the OS, we could easily drop that argument and have the full
support for free? (This would also remove the problem of passing the
flag to child git processes.) In fact, we might not even need the OS
support detection; we may simply rely on the Windows API return values
and GetLastError() whilst always providing MAX_LONG_PATH and eventually
prepending paths with \\?\.

Thank you very much in advance for consideration and for eventual explanations.

Have a nice day.

Lukas

