From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on dcvr.yhbt.net X-Spam-Level: X-Spam-ASN: AS22989 209.51.188.0/24 X-Spam-Status: No, score=-3.9 required=3.0 tests=AWL,BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,RCVD_IN_DNSWL_MED, RCVD_IN_MSPIKE_H2,SPF_HELO_PASS,SPF_PASS shortcircuit=no autolearn=ham autolearn_force=no version=3.4.6 Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dcvr.yhbt.net (Postfix) with ESMTPS id D87C71F47C for ; Fri, 20 Jan 2023 05:24:25 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1pIjsw-0004tp-W4; Fri, 20 Jan 2023 00:24:19 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1pIjsv-0004tf-QC for bug-gnulib@gnu.org; Fri, 20 Jan 2023 00:24:17 -0500 Received: from mail-lf1-f51.google.com ([209.85.167.51]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1pIjst-00030m-JK for bug-gnulib@gnu.org; Fri, 20 Jan 2023 00:24:17 -0500 Received: by mail-lf1-f51.google.com with SMTP id d30so6447987lfv.8 for ; Thu, 19 Jan 2023 21:24:15 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=xa7bTb1/D8AAENzCDHUAHVDEP/BJL81I7fYUs8Ho0uU=; b=Jb5rcqAbEmusdNC149s+Sa9H7Mzn0q5vpaFKdKfkzePe4qVX1d3zALh4cXO8oukAiD wJUx8fjcmOOwzHivOgVod7xCd7e/xu/VDnLNeQccY1qiJpRgdNWFYJ4i2lHw6k0CNalq nVoMea1Rmkfp2XIbi1QTwHK+nJDele9d0w9aA9QvdxVvIGA5Dgw8Lh5MEFDw5xF2aiNu 7itL5oBa9cYmvk53Yqd/y5Qv1WQPd9/xNKICtF8VIRS/NcWU5rnSq7WOm1siZd1m8SFg rKBRLxD0Rn5VzFteqL5MSV9rGtQJQ7FSMvo+tcRSHiGafZlj2sddlOC329p9ljdcuS6a KU+A== X-Gm-Message-State: AFqh2krYtrVv95o9p++kny3/nD6pgB3Ksy0/yFghtzWOktQlwTxIYiT2 pRxZb7pRfmi02pE8XoNZ+SLs3ICmIkKDhBuC6TQ= X-Google-Smtp-Source: AMrXdXveLqQfshu1VCAr4dMKxEefEvTH9+EnFA3sLqkSsyWrUWMLhpsNppQWZxo8TsN0Q7PIpjBh8SN3JqF7tkc3Cig= X-Received: by 2002:ac2:5df7:0:b0:4b5:6930:dfab with SMTP id z23-20020ac25df7000000b004b56930dfabmr993067lfq.324.1674192253512; Thu, 19 Jan 2023 21:24:13 -0800 (PST) MIME-Version: 1.0 References: <56097987.UDWvV4KDcT@nimes> <87r0vqhzku.fsf@josefsson.org> <8008067.n2SeQjunJ6@nimes> In-Reply-To: From: Jim Meyering Date: Thu, 19 Jan 2023 21:24:01 -0800 Message-ID: Subject: Re: fts: Document this module To: Paul Eggert Cc: Bruno Haible , Simon Josefsson , bug-gnulib@gnu.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Received-SPF: pass client-ip=209.85.167.51; envelope-from=meyering@gmail.com; helo=mail-lf1-f51.google.com X-Spam_score_int: -13 X-Spam_score: -1.4 X-Spam_bar: - X-Spam_report: (-1.4 / 5.0 requ) BAYES_00=-1.9, FREEMAIL_FORGED_FROMDOMAIN=0.25, FREEMAIL_FROM=0.001, HEADER_FROM_DIFFERENT_DOMAINS=0.249, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H2=-0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=no autolearn_force=no X-Spam_action: no action X-BeenThere: bug-gnulib@gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gnulib discussion list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: bug-gnulib-bounces+normalperson=yhbt.net@gnu.org Sender: bug-gnulib-bounces+normalperson=yhbt.net@gnu.org On Thu, Jan 19, 2023 at 7:05 PM Paul Eggert wrote: > > On 1/19/23 15:41, Bruno Haible wrote: > > Jim or Paul, what should we state > > =E2=80=94 either in the 'fts' module description, or in the .texi docum= entation? > > The quick thing is to say in both that the description/documentation is > incomplete, and that people need to read the source code. > > Jim may be able to fill in a bit here, since I think he wrote most of > that stuff. (I haven't checked this though; sorry, I'm a bit crunched > for time today.) Thanks for caring/documenting. Here's a quick summary (for more detail, see the comments in fts_.h). This started when I found glibc's fts was insufficiently robust to meet GNU rm's needs (rm was merely the first user; now, many others use it): - O(N^2) behavior in the number of file name components due to cycle detect= ion - max hierarchy depth was 64k due to type of fts_level being a "short" - subject to O(N^2) effects for directories with many entries (poor locality of reference, for which the fix was to process entries in sorted-inode order (per a heuristic), delaying any "stat" until operating on the entry) Re fts's cycle detection: - contrast glibc's O(depth) time algorithm vs our O(1) implementation - our cheap-but-lazy O(1)-memory approach is ok for most applications, but - there's an optional, slightly more costly detect-ASAP approach required f= or du (uses O(max-depth-of-hierarchy) memory) Fixing those things required ABI changes and nontrivial redesign.