bug-gnulib@gnu.org mirror (unofficial)
From: Bruno Haible <bruno@clisp.org>
To: bug-gnulib@gnu.org
Subject: getcwd: Speed up on Linux. Add support for Android.
Date: Wed, 18 Jan 2023 13:36:16 +0100
Message-ID: <15107551.Y0gOKygRFC@nimes>

On Android, in the Termux app, I see a test failure:

FAIL: test-getcwd.sh
====================

FAIL test-getcwd.sh (exit status: 5)

What happens, in the test_long_name() function of this test:

- The directory in which the test is run is
    /data/data/com.termux/files/home/testdir1/build/gltests
  The peculiar circumstance is that the ancestor directories
    /data/data
  and
    /data
  are not readable (they produce an error EACCES).

- The test creates a hierarchy by doing 449 times chdir("confdir3"),
  then calls rpl_getcwd.

- rpl_getcwd first calls getcwd_system, which fails with error ENAMETOOLONG.

- Then rpl_getcwd goes to the parent directory 455 times. 454 times this
  succeeds (up to /data/data/com.termux); then it fails (since /data/data
  is not readable).

- At this point rpl_getcwd gives up and fails with errno EACCES.

But we can do better: Android uses the Linux kernel. Therefore it has getcwd
available as a system call, and this system call does not care about unreadable
ancestor directories. However, we cannot use the getcwd system call directly:
it only reports the current directory of the process, so applying it to an
ancestor would require chdir("..") calls and thus make our rpl_getcwd function
not multi-thread safe. But the /proc file system supports a way to translate
an fd to a file name, via readlink [1]. That's what we need here.

So, the fix is to try this /proc file system trick at each step of the climb;
it succeeds as soon as the name of the directory reached so far is at most
4095 bytes long.

This code makes each round of the ".."-climbing loop a bit slower: after
reading all directory entries it now also tries the readlink() call. But
the advantage is that the loop terminates much earlier than before,
which saves dozens or hundreds of loop rounds. And in particular, if
some of the ancestors are not readable, this no longer makes the loop fail.
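
For illustration, the /proc lookup can be sketched in isolation as below.
This is a minimal sketch, not the code from the patch: the function names
fd_to_dir_name and cwd_matches_proc_link are hypothetical, and it assumes
Linux with /proc mounted. It only relies on readlink() not NUL-terminating
its result and on "/proc/self/fd/<fd>" being a symbolic link to the open
directory.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not part of gnulib): store in BUF, NUL-terminated,
   the name of the directory that the open file descriptor FD refers to.
   Return its length, or -1 with errno set.  Linux-specific; needs /proc.  */
ssize_t
fd_to_dir_name (int fd, char *buf, size_t bufsize)
{
  assert (fd >= 0);

  /* "/proc/self/fd/" is 14 bytes, an unsigned int needs at most 10 digits.  */
  char namebuf[14 + 10 + 1];
  snprintf (namebuf, sizeof namebuf, "/proc/self/fd/%u", (unsigned int) fd);

  if (bufsize == 0)
    {
      errno = ERANGE;
      return -1;
    }

  /* readlink does not append a NUL byte, so keep one byte in reserve.  */
  ssize_t len = readlink (namebuf, buf, bufsize - 1);
  if (len < 0)
    return -1;                  /* e.g. ENOENT if /proc is not mounted */
  if ((size_t) len == bufsize - 1)
    {
      /* The result filled the buffer, so it may have been truncated.  */
      errno = ERANGE;
      return -1;
    }
  buf[len] = '\0';
  return len;
}

/* Hypothetical demo: return 1 if the name obtained through /proc for an fd
   opened on "." equals what getcwd reports, 0 if not, -1 on error.  */
int
cwd_matches_proc_link (void)
{
  char from_proc[4096];
  char from_getcwd[4096];

  int fd = open (".", O_RDONLY | O_DIRECTORY);
  if (fd < 0)
    return -1;
  ssize_t len = fd_to_dir_name (fd, from_proc, sizeof from_proc);
  close (fd);
  if (len < 0 || getcwd (from_getcwd, sizeof from_getcwd) == NULL)
    return -1;
  return strcmp (from_proc, from_getcwd) == 0;
}
```

Unlike getcwd, the helper works on any directory fd, not just the current
directory, which is why it can be applied to an ancestor reached via
openat (fd, "..") without a chdir call.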

[1] https://lists.gnu.org/archive/html/bug-gnulib/2022-12/msg00053.html


2023-01-18  Bruno Haible  <bruno@clisp.org>

	getcwd: Speed up on Linux. Add support for Android.
	* lib/getcwd.c (__getcwd_generic): On Linux, use a specific readlink
	call to speed up the operation.

diff --git a/lib/getcwd.c b/lib/getcwd.c
index a4f5c5eae3..5201cd06b5 100644
--- a/lib/getcwd.c
+++ b/lib/getcwd.c
@@ -172,6 +172,9 @@ __getcwd_generic (char *buf, size_t size)
 #if HAVE_OPENAT_SUPPORT
   int fd = AT_FDCWD;
   bool fd_needs_closing = false;
+# if defined __linux__
+  bool proc_fs_not_mounted = false;
+# endif
 #else
   char dots[DEEP_NESTING * sizeof ".." + BIG_FILE_NAME_COMPONENT_LENGTH + 1];
   char *dotlist = dots;
@@ -437,6 +440,67 @@ __getcwd_generic (char *buf, size_t size)
 
       thisdev = dotdev;
       thisino = dotino;
+
+#if HAVE_OPENAT_SUPPORT
+      /* On some platforms, a system call returns the directory that FD points
+         to.  This is useful if some of the ancestor directories of the
+         directory are unreadable, because in this situation the loop that
+         climbs up the ancestor hierarchy runs into an EACCES error.
+         For example, in some Android app, /data/data/com.termux is readable,
+         but /data/data and /data are not.  */
+# if defined __linux__
+      /* On Linux, in particular, if /proc is mounted,
+           readlink ("/proc/self/fd/<fd>")
+         returns the directory, if its length is < 4096.  (If the length is
+         >= 4096, it fails with error ENAMETOOLONG, even if the buffer that we
+         pass to the readlink function would be large enough.)  */
+      if (!proc_fs_not_mounted)
+        {
+          char namebuf[14 + 10 + 1];
+          sprintf (namebuf, "/proc/self/fd/%u", (unsigned int) fd);
+          char linkbuf[4096];
+          ssize_t linklen = readlink (namebuf, linkbuf, sizeof linkbuf);
+          if (linklen < 0)
+            {
+              if (errno != ENAMETOOLONG)
+                /* If this call was not successful, the next one will likely be
+                   not successful either.  */
+                proc_fs_not_mounted = true;
+            }
+          else
+            {
+              dirroom = dirp - dir;
+              if (dirroom < linklen)
+                {
+                  if (size != 0)
+                    {
+                      __set_errno (ERANGE);
+                      goto lose;
+                    }
+                  else
+                    {
+                      char *tmp;
+                      size_t oldsize = allocated;
+
+                      allocated += linklen - dirroom;
+                      if (allocated < oldsize
+                          || ! (tmp = realloc (dir, allocated)))
+                        goto memory_exhausted;
+
+                      /* Move current contents up to the end of the buffer.  */
+                      dirp = memmove (tmp + dirroom + (allocated - oldsize),
+                                      tmp + dirroom,
+                                      oldsize - dirroom);
+                      dir = tmp;
+                    }
+                }
+              dirp -= linklen;
+              memcpy (dirp, linkbuf, linklen);
+              break;
+            }
+        }
+# endif
+#endif
     }
 
   if (dirstream && __closedir (dirstream) != 0)




