git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Matheus Tavares Bernardino <matheus.bernardino@usp.br>
To: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Cc: Junio C Hamano <gitster@pobox.com>, git <git@vger.kernel.org>
Subject: Re: mt/dir-iterator-updates, was Re: What's cooking in git.git (Jul 2019, #01; Wed, 3)
Date: Thu, 4 Jul 2019 13:39:34 -0300	[thread overview]
Message-ID: <CAHd-oW6s1kGJqaRhm2f5ZG1C48upZu8ypeB_rw7do2=Vi3jZqw@mail.gmail.com> (raw)
In-Reply-To: <nycvar.QRO.7.76.6.1907041136530.44@tvgsbejvaqbjf.bet>

Hi, Dscho

On Thu, Jul 4, 2019 at 7:02 AM Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
>
> Hi Junio,
>
> On Wed, 3 Jul 2019, Junio C Hamano wrote:
>
> > * mt/dir-iterator-updates (2019-06-25) 10 commits
[...]
> >  Is this ready for 'next'?
>
> No. It still breaks many dozens of test cases on Windows (if not more)
> because it thinks that it can rely on `st_ino` to detect circular
> symlinks.

I wanted to take a look at the failures to see if I could help, but
I'm not very familiar with azure (and travis-ci doesn't run windows'
tests). So the only build I could find, containing the code from this
series, is this[1]. But it only shows 4 failures and they don't seem
to relate with dir-iterator... Could you point me to the right place,
please?

> In
> https://public-inbox.org/git/nycvar.QRO.7.76.6.1906272046180.44@tvgsbejvaqbjf.bet/
> I had suggested to do something like this:
>
[...]
>
> However, in the meantime I thought about this a bit more and I remembered
> how this is done elsewhere: I saw many recursive symlink resolvers that
> just have an arbitrary cut-off after following, say, 32 links.
>
> In fact, Git itself already has this in abspath.c:
>
>         /* We allow "recursive" symbolic links. Only within reason, though. */
>         #ifndef MAXSYMLINKS
>         #define MAXSYMLINKS 32
>         #endif
>
> But then, the patch in question uses `stat()` instead of `lstat()`, so we
> would not have any way to count the number of symbolic links we followed.

Hm, I think `stat()` itself uses this strategy of an arbitrary cut-off
when resolving a path. So we may also "ignore" circular symlinks and
let the iteration continue until the point where `stat()` will return
an ELOOP. (But it may be expensive...)

> Do we _have_ to, though? At some stage the path we come up with is beyond
> `PATH_MAX` and we can stop right then and there.
>
> Besides, the way `find_recursive_symlinks()` is implemented adds quadratic
> behavior.

Yes, indeed. But it only happens when we have a path like this:
`symlink_to_dir1/symlink_to_dir2/symlink_to_dir3/symlink_to_dir4/...`,
right? I think this case won't be very usual on actual filesystems,
thought.

> So I would like to submit the idea of simplifying the code drastically,
> by skipping the `find_recursive_symlinks()` function altogether.
>
> This would solve another issue I have with that function, anyway: The name
> suggests, to me at least, that we follow symlinks recursively. It does
> not. I think that could have been addressed by using the adjective
> "circular" instead of "recursive".

Indeed, "circular" sounds much better then "recursive".

> But I also think there are enough
> reasons to do away with this function in the first place.

We can delegate the circular symlinks problem to `stat()'s ELOOP` or
`path_len > PATH_MAX`. The only downside is the overhead of iterating
through directories which will be latter discarded for being in
circular symlinks' chains. I mean, the overhead at dir-iterator
shouldn't be much, but the one on the code using this API to do
something for each of these directories (and its contents), may be.
Also we would need to "undo" the work done for each of these
directories if we want to ignore circular symlinks and continue the
iteration, whereas if we try to detect it a priori, we can skip it
from the beginning.

> Ciao,
> Dscho

[1]: https://dev.azure.com/git-for-windows/git/_build/results?buildId=38852

  reply	other threads:[~2019-07-04 16:39 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-07-03 22:28 What's cooking in git.git (Jul 2019, #01; Wed, 3) Junio C Hamano
2019-07-04  9:26 ` kb/windows-force-utf8, was " Johannes Schindelin
2019-07-08 21:57   ` Junio C Hamano
2019-07-04  9:32 ` nd/index-dump-in-json, " Johannes Schindelin
2019-07-04  9:36 ` ab/no-kwset, " Johannes Schindelin
2019-07-04  9:56 ` Phillip Wood
2019-07-08 22:02   ` Junio C Hamano
2019-07-11  9:51     ` Phillip Wood
2019-07-04 10:03 ` mt/dir-iterator-updates, was " Johannes Schindelin
2019-07-04 16:39   ` Matheus Tavares Bernardino [this message]
2019-07-04 21:31     ` Johannes Schindelin
2019-07-05 14:29       ` Matheus Tavares Bernardino
2019-07-05 18:17         ` Johannes Schindelin
2019-07-08 22:30           ` Matheus Tavares Bernardino

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAHd-oW6s1kGJqaRhm2f5ZG1C48upZu8ypeB_rw7do2=Vi3jZqw@mail.gmail.com' \
    --to=matheus.bernardino@usp.br \
    --cc=Johannes.Schindelin@gmx.de \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).