Re: Examples of concurrent coproc usage?

unofficial mirror of libc-alpha@sourceware.org

From: Chet Ramey <chet.ramey@case.edu>
To: Carl Edquist <edquist@cs.wisc.edu>
Cc: chet.ramey@case.edu, Zachary Santer <zsanter@gmail.com>,
	bug-bash <bug-bash@gnu.org>,
	libc-alpha@sourceware.org
Subject: Re: Examples of concurrent coproc usage?
Date: Mon, 8 Apr 2024 12:21:15 -0400
Message-ID: <86c3765e-e29d-48d5-b468-3f20b59916b2@case.edu>
In-Reply-To: <cf2692e7-2b6d-0ba3-678b-29c8efaec1d3@cs.wisc.edu>

On 4/4/24 8:52 AM, Carl Edquist wrote:

> Zack illustrated basically the same point with his example:
> 
>>     exec {fd}< <( some command )
>>     while IFS='' read -r line <&"${fd}"; do
>>       # do stuff
>>     done
>>     {fd}<&-
> 
> A process-substitution open to the shell like this is effectively a 
> one-ended coproc (though not in the jobs list), and it behaves reliably 
> here because the user can count on {fd} to remain open even after the child 
> process terminates.

That exposes the fundamental difference. The procsub is essentially the
same kind of object as a coproc, but it exposes the pipe endpoint(s) as
filenames. The shell maintains open file descriptors to the child process
whose input or output it exposes as a FIFO or a file in /dev/fd, since
you have to have a reader and a writer. When the command that used the
procsub as one of its word expansions (or in a redirection) completes, the
shell closes the file descriptor and, if necessary, removes the FIFO. coprocs
are designed to be longer-lived, and not associated with a particular
command or redirection.
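
A minimal bash sketch of the "filename" side of this. The variable name is only for illustration, and the path printed is system-dependent (often something under /dev/fd, otherwise a named FIFO):

```shell
# The word <(true) is replaced by a pathname before echo runs.
name=$(echo <(true))
printf 'procsub expanded to: %s\n' "$name"
```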

But the important piece is that $fd is not the file descriptor the shell
keeps open to the procsub -- it's a new file descriptor, dup'd from the
original by the redirection. Since it was used with `exec', it persists
until the script explicitly closes it. It doesn't matter when the shell
reaps the procsub and closes the file descriptor(s) -- the copy in $fd
remains until the script explicitly closes it. You might get read returning
failure at some point, but the shell won't close $fd for you.

Since procsubs expand to filenames, even opening them is sufficient to
give you a new file descriptor (with the usual caveats about how different
OSs handle the /dev/fd device).

You can do this yourself with coprocs right now, with no changes to the
shell.
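
A sketch of what that could look like in bash, assuming a coprocess named X running `cat` (both chosen only for illustration). The script keeps its own copy of the read end, so the data stays readable after the shell has reaped the coproc and closed the descriptors it manages:

```shell
coproc X { cat; }              # hypothetical long-lived coprocess
pid=$X_PID                     # saved now; X_PID goes away once X is reaped
exec {from}<&"${X[0]}"         # private copy of the read end

printf 'hello\n' >&"${X[1]}"
exec {X[1]}>&-                 # cat sees EOF and exits
wait "$pid"                    # coproc is gone; the shell closes its own fds

IFS= read -r line <&"$from"    # the copy in $from still works
printf 'read after reap: %s\n' "$line"
exec {from}<&-                 # the script is responsible for closing it
```

The dup is done while `cat` is still blocked on its input, so there is no question of the coproc disappearing first.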

> So, the user can determine when the coproc fds are no longer needed, 
> whether that's when EOF is hit trying to read from the coproc, or whatever 
> other condition.

Duplicating the file descriptor will do that for you.

> Personally I like the idea of 'closing' a coproc explicitly, but if it's a 
> bother to add options to the coproc keyword, then I would say just let the 
> user be responsible for closing the fds.  Once the coproc has terminated 
> _and_ the coproc's fds are closed, then the coproc can be deallocated.

This is not backwards compatible. coprocs may be a little-used feature, but
you're adding a burden on the shell programmer that wasn't there
previously.

> Apparently there is already some detection in there for when the coproc fds 
> get closed, as the {NAME[@]} fd array members get set to -1 automatically 
> when you do, e.g., 'exec {NAME[0]}<&-'.  So perhaps this won't be a 
> radical change.

Yes, there is some limited checking in the redirection code, since the
shell is supposed to manage the coproc file descriptors for the user.

> 
> Alternatively (or, additionally), you could interpret 'unset NAME' for a 
> coproc to mean "deallocate the coproc."  That is, close the {NAME[@]} fds, 
> unset the NAME variable, and remove any coproc bookkeeping for NAME.

Hmmm. That's not unreasonable.

>> What should it do to make sure that the variables don't hang around with 
>> invalid file descriptors?
> 
> First, just to be clear, the fds to/from the coproc pipes are not invalid 
> when the coproc terminates (you can still read from them); they are only 
> invalid after they are closed.

That's only sort of true; writing to a pipe for which there is no
reader generates SIGPIPE, which is a fatal signal. If the coproc
terminates, the file descriptor to write to it becomes invalid because
it's implicitly closed. If you restrict yourself to reading from coprocs,
or doing one initial write and then only reading from there on, you can
avoid this, but it's not the general case.
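
A sketch of the write-side hazard, run in a child bash so that only the child is affected. The coprocess (one `read`, then exit) and the variable names are assumptions for the example. Where SIGPIPE has its default disposition, the child is normally killed by the second write and the reported status is usually 128 plus the signal number; if SIGPIPE is ignored, the write should fail instead:

```shell
# Run in a child bash so that SIGPIPE, if delivered, kills only the child.
bash -c '
  coproc X { read -r; }       # hypothetical coprocess: reads one line, then exits
  pid=$X_PID
  exec {to}>&"${X[1]}"        # private copy of the write end
  echo first >&"$to"          # consumed by the coprocess
  wait "$pid"                 # now nothing holds the read end of the pipe
  echo second >&"$to"         # write with no reader
' 2>/dev/null
status=$?
printf 'child exit status: %s\n' "$status"
```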

> The surprising bit is when they become invalid unexpectedly (from the point 
> of view of the user) because the shell closes them automatically, at the 
> somewhat arbitrary timing when the coproc is reaped.

No real difference from procsubs.

> Second, why is it a problem if the variables keep their (invalid) fds after 
> closing them, if the user is the one that closed them anyway?
> 
> Isn't this how it works with the auto-assigned fd redirections?

Those are different file descriptors.

> 
>      $ exec {d}<.
>      $ echo $d
>      10
>      $ exec {d}<&-
>      $ echo $d
>      10

The shell doesn't try to manage that object in the same way it does a
coproc. The user has explicitly indicated they want to manage it.

> But, as noted, bash apparently already ensures that the variables don't 
> hang around with invalid file descriptors, as once you close them the 
> corresponding variable gets updated to "-1".

Yes, that's the shell trying to be helpful. The coproc is a managed object.
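
The two cases side by side, as a bash sketch. The names d and X and the `cat` coprocess are illustrative. The read end is closed rather than the write end so that the coprocess stays alive while the array is inspected:

```shell
# Unmanaged: a fd the script opened itself. The variable keeps its number.
exec {d}</dev/null
d_before=$d
exec {d}<&-
d_after=$d

# Managed: a coproc fd. Closing it through the array updates the array.
coproc X { cat; }              # hypothetical coprocess; stays blocked on stdin
exec {X[0]}<&-                 # close the shell's read end
x0_after=${X[0]}

printf 'd: %s -> %s, X[0] after close: %s\n' "$d_before" "$d_after" "$x0_after"

exec {X[1]}>&-                 # let cat see EOF
wait
```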

> If the user has explicitly closed both fd ends for a coproc, it should not 
> be a surprise to the user either way - whether the variable gets unset 
> automatically, or whether it remains with (-1 -1).
> 
> Since you are already unsetting the variable when the coproc is deallocated 
> though, I'd say it's fine to keep doing that -- just don't deallocate the 
> coproc before the user has closed both fds.

It's just not backwards compatible. I might add an option to enable that
kind of management, but probably not for bash-5.3.

> *Except* that it's inherently a race condition whether the original 
> variables will still be intact to save them.
> 
> Even if you attempt to save them immediately:
> 
>      coproc X { exit; }
>      X_BACKUP=( ${X[@]} )
> 
> it's not guaranteed that X_BACKUP=(...) will run before coproc X has been 
> deallocated, and the X variable cleared.

That's not what I mean about saving the file descriptors. But there is a
window there where a short-lived coprocess could be reaped before you dup
the file descriptors. Since the original intent of the feature was that
coprocs were a way to communicate with long-lived processes -- something
more persistent than a process substitution -- it was not really a
concern at the time.
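
One way a script can avoid that window today is to make the coprocess wait for a go-ahead before doing anything. This is only a sketch and assumes you control the coprocess body; the handshake line and all names are invented for the example:

```shell
# Hypothetical short-lived coprocess: it waits for a go-ahead line before
# doing its (brief) work, so it cannot exit before the parent is ready.
coproc X { read -r; echo "short-lived output"; }

# Safe: the coprocess is still blocked in read, so X[0] and X[1] are valid.
exec {from}<&"${X[0]}" {to}>&"${X[1]}"

echo go >&"$to"                # release the coprocess
IFS= read -r line <&"$from"    # works even if the coproc was already reaped
printf 'got: %s\n' "$line"

exec {from}<&- {to}>&-
wait
```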

>>> *Or* else add an option to the coproc keyword to explicitly close the 
>>> coproc - which will close both fds and clear the variable.
>>
>> Not going to add any more options to reserved words; that does more 
>> violence to the grammar than I want.
> 
> Not sure how you'd feel about using 'unset' on the coproc variable 
> instead.  (Though as discussed, I think the coproc terminated + fds 
> manually closed condition is also sufficient.)

That does sound promising.

Chet
-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
		 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    chet@case.edu    http://tiswww.cwru.edu/~chet/
