git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Johannes Schindelin <Johannes.Schindelin@gmx.de>
To: Matheus Tavares Bernardino <matheus.bernardino@usp.br>
Cc: Junio C Hamano <gitster@pobox.com>, git <git@vger.kernel.org>
Subject: Re: mt/dir-iterator-updates, was Re: What's cooking in git.git (Jul 2019, #01; Wed, 3)
Date: Fri, 5 Jul 2019 20:17:47 +0200 (CEST)	[thread overview]
Message-ID: <nycvar.QRO.7.76.6.1907052009590.44@tvgsbejvaqbjf.bet> (raw)
In-Reply-To: <CAHd-oW6PTFY=_j1RDh8_MdeBmMX77kF+=kOpd-GUnbegMC89yQ@mail.gmail.com>

Hi Matheus,

On Fri, 5 Jul 2019, Matheus Tavares Bernardino wrote:

> On Thu, Jul 4, 2019 at 6:30 PM Johannes Schindelin
> <Johannes.Schindelin@gmx.de> wrote:
> >
> > Hi Matheus,
> >
> > On Thu, 4 Jul 2019, Matheus Tavares Bernardino wrote:
> > >
> > > I wanted to take a look at the failures to see if I could help, [...]
> > > Could you point me to the right place, please?
> [...]
> >
> > I usually click on the "Tests" tab in that page:
> > https://dev.azure.com/gitgitgadget/git/_build/results?buildId=11495&view=ms.vss-test-web.build-test-results-tab
> >
> > You can click on any of the 1,384 (!) failing test cases, it will pop up a
> > pane on the right-hand side that shows the trace of that failing test
> > case. For the full trace of that test script, go to "Attachments" and
> > download the `Standard_Error_Output.log` (via the horizontal bread-crumbs
> > menu you can see when hovering over the file name).
>
> Thanks for the explanation! I inspected some of the
> `Standard_Error_Output.log`'s and it seems the problem is always with
> local clone (which started to use dir-iterator in this series). It
> seems all .git/objects/ dirs are being ignored. That makes sense,
> since st_ino will always be 0 on Windows. But your fixup patch should
> solve this. Is there any azure build for it?

There is no direct Azure Pipeline run for it, as I have not created a
branch with the fixup on top of your branch. But the shears/pu branch in
https://github.com/git-for-windows/git/branches does have the fixup, and
passes all tests.

> [...]
> > >
> > > Hm, I think `stat()` itself uses this strategy of an arbitrary cut-off
> > > when resolving a path. So we may also "ignore" circular symlinks and
> > > let the iteration continue until the point where `stat()` will return
> > > an ELOOP. (But it may be expensive...)
> >
> > This would not be equivalent, though, as your code also tried to address
> > circular references when two or more symlinks are involved, e.g. when
> > one symlink points to a directory that has another symlink that points to
> > the directory containing the first symlink.
>
> Hm, `stat()` also addresses this case doesn't it? For example:
>
> $ mkdir a b
> $ ln -s ../a b/s2a
> $ ln -s ../b a/s2b
> $ stat b/s2a/s2b/s2a/.../s2b
>
> Gives me:
> "too many levels of symbolic links"

Okay, then. Even better.

(With the caveat that I don't know how ubiquitous this behavior is, I
assume you only tested on Linux.)

> > > > Do we _have_ to, though? At some stage the path we come up with is beyond
> > > > `PATH_MAX` and we can stop right then and there.
> > > >
> > > > Besides, the way `find_recursive_symlinks()` is implemented adds quadratic
> > > > behavior.
> > >
> > > Yes, indeed. But it only happens when we have a path like this:
> > > `symlink_to_dir1/symlink_to_dir2/symlink_to_dir3/symlink_to_dir4/...`,
> > > right? I think this case won't be very usual on actual filesystems,
> > > thought.
> >
> > No, the check is performed in a loop, and that loop is executed whether
> > you have symlinks or not. That loop is executed for every item in the
> > iteration, and as we cannot assume a flat directory in general (in fact,
> > we should assume a directory depth proportional to the total number of
> > files), that adds that quadratic behavior.
>
> Oh, you're right. Sorry for the noise, I forgot this function was not
> only called for symlinks but for every directory entry :(
>
> An alternative solution would be to use `lstat()` together with
> `stat()` to only feed symlinked dirs to this function. This should
> reduce a lot the number of calls. Although it'd still be quadratic in
> the worst case, would that be any good?

Why not just skip this logic? At least for now? It really blocks the
development of this patch series, causing `pu` to be broken until the test
failures are resolved.

> [...]
> > > > But I also think there are enough
> > > > reasons to do away with this function in the first place.
> > >
> > > We can delegate the circular symlinks problem to `stat()'s ELOOP`
> >
> > Not really. I mean, we _already_ delegate to the `ELOOP` condition, we
> > cannot even avoid it as long as we keep using `stat()` instead of
> > `lstat()`
>
> Yes, indeed. What I meant is that we may chose to _fully_ delegate to
> ELOOP. The way it is now, we should detect circular symlinks way
> before stat() fails. For example, with the case I showed above, we
> would stop at "b/s2a/s2b" whereas stat() would only fail at a much
> longer "b/s2a/s2b/s2a/s2b/...", far beyond in the iteration.

Sounds like the solution to me that I wanted: drop the special code to
detect circular symlinks. In other words: I like that idea.

> > > The only downside is the overhead of iterating through directories which
> > > will be latter discarded for being in circular symlinks' chains. I mean,
> > > the overhead at dir-iterator shouldn't be much, but the one on the code
> > > using this API to do something for each of these directories (and its
> > > contents), may be. Also we would need to "undo" the work done for each
> > > of these directories if we want to ignore circular symlinks and continue
> > > the iteration, whereas if we try to detect it a priori, we can skip it
> > > from the beginning.
> >
> > Given that the intent of this patch series is a mere refactoring, I wonder
> > why the new, contentious circular symlink detection is slipped into it
> > anyway. It complicates the task, makes reviewing a lot harder, and it
> > holds up the refactoring.
>
> Yes, I apologize for that. I should have split this into 2 or maybe 3
> patchsets... This series started really simple just trying to apply
> the dir-iterator API into local clone. But then, other things became
> necessary and it got more complex.
>
> So, should I send a fixup patch removing find_recursive_symlinks() or
> reroll the series? There's also the option to use stat()+lstat() to
> reduce calls to this function, but it doesn't solve the problem on
> Windows, anyway.

I would suggest to send another iteration that removes
`find_recursive_symlinks()`. Junio most likely interpreted my objections
as a veto against advancing the current iteration to `next`, meaning that
you're good to even rewrite completely in the next iteration, should you
feel the need to. No need for "Oops, fix that" follow-up commits at this
stage.

Ciao,
Dscho

  reply	other threads:[~2019-07-05 18:17 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-07-03 22:28 What's cooking in git.git (Jul 2019, #01; Wed, 3) Junio C Hamano
2019-07-04  9:26 ` kb/windows-force-utf8, was " Johannes Schindelin
2019-07-08 21:57   ` Junio C Hamano
2019-07-04  9:32 ` nd/index-dump-in-json, " Johannes Schindelin
2019-07-04  9:36 ` ab/no-kwset, " Johannes Schindelin
2019-07-04  9:56 ` Phillip Wood
2019-07-08 22:02   ` Junio C Hamano
2019-07-11  9:51     ` Phillip Wood
2019-07-04 10:03 ` mt/dir-iterator-updates, was " Johannes Schindelin
2019-07-04 16:39   ` Matheus Tavares Bernardino
2019-07-04 21:31     ` Johannes Schindelin
2019-07-05 14:29       ` Matheus Tavares Bernardino
2019-07-05 18:17         ` Johannes Schindelin [this message]
2019-07-08 22:30           ` Matheus Tavares Bernardino

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=nycvar.QRO.7.76.6.1907052009590.44@tvgsbejvaqbjf.bet \
    --to=johannes.schindelin@gmx.de \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=matheus.bernardino@usp.br \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).