From: Jim Meyering <jim@meyering.net>
To: Paul Eggert <eggert@cs.ucla.edu>
Cc: Bruno Haible <bruno@clisp.org>,
	Simon Josefsson <simon@josefsson.org>,
	bug-gnulib@gnu.org
Subject: Re: fts: Document this module
Date: Thu, 19 Jan 2023 21:24:01 -0800
Message-ID: <CA+8g5KEBYfqgWEohtbihM5ZcvXUbV-aDsubmXTzZz=J6aRsTkA@mail.gmail.com>
In-Reply-To: <e508a4ec-0e52-c01f-67dd-a7c6f4006ae2@cs.ucla.edu>

On Thu, Jan 19, 2023 at 7:05 PM Paul Eggert <eggert@cs.ucla.edu> wrote:
>
> On 1/19/23 15:41, Bruno Haible wrote:
> > Jim or Paul, what should we state
> > — either in the 'fts' module description, or in the .texi documentation?
>
> The quick thing is to say in both that the description/documentation is
> incomplete, and that people need to read the source code.
>
> Jim may be able to fill in a bit here, since I think he wrote most of
> that stuff. (I haven't checked this though; sorry, I'm a bit crunched
> for time today.)

Thanks for caring/documenting. Here's a quick summary (for more
detail, see the comments in fts_.h).

This started when I found glibc's fts was insufficiently robust to
meet GNU rm's needs (rm was merely the first user; now, many others
use it):
- O(N^2) behavior in the number of file name components due to cycle detection
- max hierarchy depth was 64k due to type of fts_level being a "short"
- subject to O(N^2) effects for directories with many entries, caused by
poor locality of reference; the fix was to process entries in
sorted-inode order (per a heuristic), delaying any "stat" until
operating on the entry
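
To illustrate the sorted-inode idea in the last item, here is a minimal
sketch in C. It is not the gnulib code: `struct entry` and
`sort_entries_by_inode` are hypothetical names, and the sketch assumes the
inode numbers were collected from readdir's d_ino field, so no "stat" is
needed before sorting.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical record for one directory entry: the name plus the inode
   number, assumed to come from readdir's d_ino (no stat call yet).  */
struct entry
{
  ino_t ino;
  const char *name;
};

/* qsort comparison function: ascending inode number.  Written without
   subtraction so that it cannot overflow for large inode values.  */
static int
compare_by_inode (const void *a, const void *b)
{
  ino_t x = ((const struct entry *) a)->ino;
  ino_t y = ((const struct entry *) b)->ino;
  return (x > y) - (x < y);
}

/* Reorder ENTRIES so that a later pass, which stats or otherwise
   operates on each entry, visits them in inode order.  */
void
sort_entries_by_inode (struct entry *entries, size_t n)
{
  assert (entries != NULL || n == 0);
  if (n > 1)
    qsort (entries, n, sizeof *entries, compare_by_inode);
}
```

A caller would fill the array while reading the directory, sort it, and
only then stat each entry as it is processed. Whether sorting pays off
depends on the file system, which is presumably why the message calls
the approach a heuristic.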

Re fts's cycle detection:
- contrast glibc's O(depth)-time algorithm with our O(1) implementation
- our cheap-but-lazy O(1)-memory approach is OK for most applications
- there's also an optional, slightly more costly detect-ASAP approach,
    required for du (it uses O(max-depth-of-hierarchy) memory)


Fixing those things required ABI changes and nontrivial redesign.



