unofficial mirror of libc-alpha@sourceware.org
 help / color / mirror / Atom feed
From: Adhemerval Zanella via Libc-alpha <libc-alpha@sourceware.org>
To: Florian Weimer <fweimer@redhat.com>
Cc: James Clarke <jrtc27@debian.org>,
	Adhemerval Zanella via Libc-alpha <libc-alpha@sourceware.org>,
	John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Subject: Re: [PATCH v2 4/9] linux: Use getdents64 on non-LFS readdir
Date: Tue, 20 Oct 2020 11:09:48 -0300	[thread overview]
Message-ID: <9f2115b4-8ddf-835c-0f7b-a75cdabc3818@linaro.org> (raw)
In-Reply-To: <878sc12hy9.fsf@oldenburg2.str.redhat.com>



On 20/10/2020 09:35, Florian Weimer wrote:
> * Adhemerval Zanella:
> 
>> On 20/10/2020 04:38, Florian Weimer wrote:
>>> * Adhemerval Zanella via Libc-alpha:
>>>
>>>> ENAMETOOLONG does make more sense, I have fixed it locally.  I am not
>>>> sure I understood by 'delayed', it it report on the readdir call, but
>>>> the next entry will still be reported in the next readdir call.  The
>>>> construction such as won't work:
>>>>
>>>>   while (readdir (dp) != NULL)
>>>>     {
>>>>       [...]
>>>>     }
>>>>
>>>> But this will work as intended:
>>>>
>>>>   while (1)
>>>>     {
>>>>       errno = 0;
>>>>       struct dirent *entry = readdir (dp);
>>>>       if (entry == NULL)
>>>> 	{
>>>> 	  if (errno == ENAMETOOLONG)
>>>> 	    continue;
>>>> 	  break;
>>>> 	}
>>>>     }
>>>
>>> I think it's unreasonable to expect that this pattern will work.  Itg
>>> assumes that readdir updates dp despite the error.  In other
>>> implementations, this may produce an endless loop.  This is why for
>>> readdir_r, we record the ENAMETOOLONG error as pending, and report it
>>> only at the end.  (We should do the same for EOVERFLOW errors.)
>>
>> This is what POSIX states as the way to check for errors:
>>
>>   "Applications wishing to check for error situations should set errno 
>>    to 0 before calling readdir(). If errno is set to non-zero on return, 
>>    an error occurred."
>>
>> What the standard is not clear is whether the position as the directory 
>> stream is updated in a case of failure.
> 
> Yes, that was my concern.  I found this in POSIX (in 2.3 Error Numbers):
> 
> | If an error condition is detected, the action requested may have been
> | partially performed, unless otherwise stated.
> 
> But in general we would be rather concerned if a regular read or write
> on a file descriptor failed *and* still updated the file offset.  Not
> sure about directory streams.

I think the snippet you stated does not really prevent us to return an
empty entry with a transient error (ENAMETOOLONG), since it is clear
different than entries read error for getdents itself (EINVAL, ENOENT).

Former mean that only the referred entry could not be obtained, where
latter means that either there is a EOF (if the directory has been
removed) or the buffer is too small.  For latter we might still try to
resize the internal buffer to accommodate the required size, but this
incur in the memory failure possible error nonetheless.

> 
>> I think it is a fair assumption
>> since it also states that 'When the end of the directory is encountered, 
>> a null pointer shall be returned and errno is not changed.'.  So there
>> is a distinction between end of directory stream and a failure (in the
>> example above checking for 'errno == 0' should indicate the end of the
>> stream).
> 
> The question is whether you are supposed to be able to continue reading
> and encounter errno == 0 eventually.  In general, this is simply not
> true, consider EIO or EUCLEAN.  This is why I went with delayed error
> reporting for readdir_r: it is simply not reasonable to expect that
> applications continue calling readddir_r or readdir after the first
> error because it's not clear how to can avoid an endless loop (even
> assuming that the failing readdir call somehow advanced the read
> pointer).
> 
> I hope this explains my point of view.

I understood your point, however we are constraint by opendir interface.
Different than readdir_r, for readdir we can't signal that some error
has occurred in obtained the entry while returning the partial entry
information (for the case of ENAMETOOLONG for instance).

We can just skip the ENAMETOOLONG entries is set the errno only for EOF, 
but even  with this approach we might encounter different errors (due 
getdents failure for instance) which might shadow this specific interface
error glibc might trigger.

> 
>> And it also defines EOVERFLOW as possible transient error.  It
>> is possible to do for readdir_r because the function return is different
>> than the returned 'direct' itself, for readdir what we have is the
>> 'errno' to signal such failure.
> 
> I don't understand this point.  The approach to detect EOF vs error is
> different for readdir_r, but the question whether continue after the
> first error is equally relevant.

My point is if we decide to ignore entries with ENAMETOOLONG, how to signal 
the readdir caller this has happened?

> 
>>>> Even if go to buffer resize, application will still need to handle
>>>> ENOMEM so I am not sure if using separated buffer is really an
>>>> improvement here (at least on trying to adequate usual way to call
>>>> opendir functions).
>>>
>>> ENOMEM is a temporary error, ENAMETOOLONG depends on file system content
>>> and may be much more difficult to resolve.
>>
>> Even with ENOMEM, application will require to actually handle a possible
>> readdir failure and acts accordingly (either by retrying or skip the
>> file).  But I consider failing with ENOMEM in fact much more confusing,
>> it exposes internal implementation that needs to be handled by the
>> application (the internal memory reallocation), where ENAMETOOLONG is
>> indeed a system limitation that can be resolved by using LFS support.
> 
> I think readdir can already fail with ENOMEM if the error comes from the
> kernel, for example from ext4_ind_map_blocks and ext4_alloc_branch.  So
> adding another corner case where ENOMEM can result does not seem
> egregiously bad to me.

Is it a possible error for getdents? I couldn't really track it, but I
haven't dig into kernel code that deep. In any case it would be good
to update the getdents implementation.

  reply	other threads:[~2020-10-20 14:09 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-10-02 17:06 [PATCH v2 0/9] Fix getdents{64} regression on some FS Adhemerval Zanella via Libc-alpha
2020-10-02 17:06 ` [PATCH v2 1/9] linux: Move posix dir implementations to Linux Adhemerval Zanella via Libc-alpha
2020-10-13 15:33   ` Florian Weimer via Libc-alpha
2020-10-15 14:08     ` Adhemerval Zanella via Libc-alpha
2020-10-02 17:06 ` [PATCH v2 2/9] linux: Simplify opendir buffer allocation Adhemerval Zanella via Libc-alpha
2020-10-13 15:34   ` Florian Weimer via Libc-alpha
2020-10-02 17:06 ` [PATCH v2 3/9] linux: Add __readdir_unlocked Adhemerval Zanella via Libc-alpha
2020-10-13 15:43   ` Florian Weimer via Libc-alpha
2020-10-15 14:10     ` Adhemerval Zanella via Libc-alpha
2020-10-02 17:06 ` [PATCH v2 4/9] linux: Use getdents64 on non-LFS readdir Adhemerval Zanella via Libc-alpha
2020-10-13 15:59   ` Florian Weimer via Libc-alpha
2020-10-15 14:25     ` Adhemerval Zanella via Libc-alpha
2020-10-19  8:18       ` Florian Weimer via Libc-alpha
2020-10-19 20:00         ` Adhemerval Zanella via Libc-alpha
2020-10-19 20:50           ` Florian Weimer via Libc-alpha
2020-10-19 21:09             ` Adhemerval Zanella via Libc-alpha
2020-10-20  7:38               ` Florian Weimer via Libc-alpha
2020-10-20 12:05                 ` Adhemerval Zanella via Libc-alpha
2020-10-20 12:35                   ` Florian Weimer via Libc-alpha
2020-10-20 14:09                     ` Adhemerval Zanella via Libc-alpha [this message]
2020-10-20 17:42                     ` Adhemerval Zanella via Libc-alpha
2020-10-02 17:06 ` [PATCH v2 5/9] linux: Set internal DIR filepos as off64_t [BZ #23960, BZ #24050] Adhemerval Zanella via Libc-alpha
2020-10-13 16:00   ` Florian Weimer via Libc-alpha
2020-10-15 14:26     ` Adhemerval Zanella via Libc-alpha
2020-10-02 17:06 ` [PATCH v2 6/9] linux: Add __readdir64_unlocked Adhemerval Zanella via Libc-alpha
2020-10-02 17:06 ` [PATCH v2 7/9] linux: Add __old_readdir64_unlocked Adhemerval Zanella via Libc-alpha
2020-10-02 17:06 ` [PATCH v2 8/9] linux: Use getdents64 on readdir64 compat implementation Adhemerval Zanella via Libc-alpha
2020-10-02 17:06 ` [PATCH v2 9/9] dirent: Deprecate getdirentries Adhemerval Zanella via Libc-alpha
2020-10-04 13:08 ` [PATCH v2 0/9] Fix getdents{64} regression on some FS Dave Flogeras via Libc-alpha

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/libc/involved.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=9f2115b4-8ddf-835c-0f7b-a75cdabc3818@linaro.org \
    --to=libc-alpha@sourceware.org \
    --cc=adhemerval.zanella@linaro.org \
    --cc=fweimer@redhat.com \
    --cc=glaubitz@physik.fu-berlin.de \
    --cc=jrtc27@debian.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).