Re: Examples of concurrent coproc usage?

unofficial mirror of libc-alpha@sourceware.org

From: Carl Edquist <edquist@cs.wisc.edu>
To: Chet Ramey <chet.ramey@case.edu>
Cc: Zachary Santer <zsanter@gmail.com>, bug-bash <bug-bash@gnu.org>,
	libc-alpha@sourceware.org
Subject: Re: Examples of concurrent coproc usage?
Date: Sat, 20 Apr 2024 18:11:31 -0500 (CDT)
Message-ID: <0488843f-339a-f25e-a3d2-cb0afeec91d1@cs.wisc.edu>
In-Reply-To: <4625270d-c8f6-42d1-afa0-fafb7a33571e@case.edu>

On Tue, 16 Apr 2024, Chet Ramey wrote:

> The bigger concern was how to synchronize between the processes, but 
> that's something that the script writer has to do on their own.

Right.  It can be tricky and depends entirely on what the user's up to.

> My concern was always coproc fds leaking into other processes, 
> especially pipelines. If someone has a coproc now and is `messy' about 
> cleaning it up, I feel like there's the possibility of deadlock.

I get where you're coming from with the concern.  I would welcome being 
shown otherwise, but as far as I can tell, deadlock is a ghost of a 
concern once the coproc is dead.

Maybe it helps to step through it ...

- First, where does deadlock start?  (In the context of pipes)

I think the answer is: When there is a read or write attempted on a pipe 
that blocks (indefinitely).

- What causes a read or a write on a pipe to block?

A pipe read blocks when a corresponding write-end is open, 
but there is no data available to read.

A pipe write blocks when a corresponding read-end is open, 
but the pipe is full.

- Are the coproc's corresponding ends of the shell's pipe fds open?

Well, not if the coproc is really dead.

- Will a read or write ever be attempted?

If the shell's stray coproc fds are left open, sure they will leak into 
pipelines too - but since they're forgotten, in theory no command will 
actually attempt to use them.

- What if a command attempts to use these stray fds anyway, by mistake?

If the coproc is really dead, then its side of the pipe fds will have been 
closed.  Thus read/write attempts on the fds on the shell's side (either 
from the shell itself, or from commands / pipelines that the fds leaked 
into) WILL NOT BLOCK, and thus will not result in deadlock.

(A read attempt will return any data still buffered in the pipe and then 
hit EOF; a write attempt will get SIGPIPE/EPIPE.)
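
For anyone who wants to poke at this, here is a minimal sketch, assuming 
bash 4.1 or later for the {varname} redirections.  The names CO, from_co 
and to_co are made up for illustration.  The private copies of the fds 
stand in for the proposed deferring behavior, since current bash closes 
${CO[0]} and ${CO[1]} on its own once it notices the coproc has died.

```shell
#!/usr/bin/env bash
# Coproc that exits as soon as it has read one line of input.
coproc CO { read -r _; }

# Keep private copies of the shell's pipe fds (illustrative names) so they
# stay open after bash cleans up the CO array.
exec {from_co}<&"${CO[0]}" {to_co}>&"${CO[1]}"
co_pid=$CO_PID

echo quit >&"$to_co"     # the coproc reads this line and exits
wait "$co_pid"           # from here on the coproc is really dead

trap '' PIPE             # get EPIPE as an error instead of dying by SIGPIPE

# Neither call blocks: the coproc's ends of both pipes are closed.
# The -t 5 timeout is only a safety net for the demonstration.
read -r -t 5 -u "$from_co" _;            rc_read=$?
{ echo more >&"$to_co"; } 2>/dev/null;   rc_write=$?

echo "read rc=$rc_read (EOF), write rc=$rc_write (EPIPE)"
```

Both calls should return promptly with a nonzero status rather than hang.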

HOPEFULLY that is enough to put any reasonable fears of deadlock to bed - 
at least in terms of the shell's leaked fds leading to deadlock.

- But what if the _coproc_ leaked its pipe fds before it died?

At this point I think perhaps we get into what you called a "my arm hurts 
when I do this" situation.  It kind of breaks the whole coproc model: if 
the stdin/stdout of a coproc are still open by one of the coproc's 
children, then I might say the coproc is not really dead.
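
A rough sketch of what "not really dead" looks like from the shell's side, 
again assuming bash 4.1 or later and using made-up names (CO, from_co, 
to_co, child_pid).  The coproc starts a background sleep that inherits the 
coproc's stdout, reports the child's pid, and exits on the first line of 
input.  The private fd copies again stand in for the deferring behavior.

```shell
#!/usr/bin/env bash
# The background sleep inherits the write end of the coproc's output pipe.
coproc CO { sleep 30 & echo "$!"; read -r _; }

# Private copies of the shell's fds (illustrative names).
exec {from_co}<&"${CO[0]}" {to_co}>&"${CO[1]}"
co_pid=$CO_PID

read -r -u "$from_co" child_pid   # pid of the background sleep
echo quit >&"$to_co"              # the coproc parent exits
wait "$co_pid"

# The parent is gone, but its child still holds the pipe's write end,
# so this read waits and times out instead of seeing EOF.
read -r -t 1 -u "$from_co" _; rc_child_alive=$?

kill "$child_pid"                 # now the last write end goes away

# With no writers left, the read returns EOF without waiting out the timeout.
read -r -t 1 -u "$from_co" _; rc_child_dead=$?

echo "child alive: read rc=$rc_child_alive (timeout, status above 128)"
echo "child dead:  read rc=$rc_child_dead (EOF)"
```

The first read only comes back because of the -t timeout; without it, it 
would sit there until the child exits.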

But anyway I want to be a good sport, for completeness.

An existing use case that would lead to trouble would perhaps have to look 
something like this:

The shell sends a quit command to a coproc, without closing the shell's 
coproc fds.

The coproc spawns a child, then exits.  The coproc (parent) is dead.  The 
coproc's child has inherited the coproc's pipe fds.  The script author 
_expects_ that the coproc parent will exit, and expects that this will 
trigger the old behavior, that the shell will automatically close its fds 
to the coproc parent.  Thus the author _expects_ that the coproc exiting 
will, indirectly but automatically, cause any blocked reads/writes on 
stdin/stdout in the coproc's child to stop blocking.  Thus the author 
_expects_ the coproc's child to promptly complete, even though its output 
_will not be consumable_ (because the author _expects_ that its stdout 
will be attached to a broken pipe).

But [here's where the potential problem starts] with the new deferring 
behavior, the shell's coproc fds are not automatically closed, and thus 
the coproc's _child_ does not stop blocking, and thus the author's 
short-lived expectations for this coproc's useless child are dashed to the 
ground, while that child is left standing idle until the cows come home. 
(That is, until the shell exits.)

It really seems like a contrived and senseless scenario, doesn't it? 
(Even to me!)

[And an even more far-fetched scenario: a coproc transmits copies of its 
pipe fds to another process over a unix socket ancillary message 
(SCM_RIGHTS), instead of to a child by inheritance.  The rest of the story 
is the same, and equally senseless.]

> But I don't know how extensively they're used, or all the use cases, so 
> I'm not sure how likely it is. I've learned there are users who do 
> things with shell features I never imagined. (People wanting to use 
> coprocs without the shell as the arbiter, for instance. :-) )

Hehe...

Well, yeah, once you gift-wrap yourself a friendly, reliable interface and 
have the freedom to play with it to your heart's content - you find some 
fun things to do with coprocesses.  (Much like regular shell pipelines.)

I get your meaning though - without knowing all the potential uses, it's 
hard to say with absolute certainty that no user will be negatively 
affected by a new improvement or bug fix.

>> [This is a common model for using coprocs, by the way, where an 
>> auxiliary coprocess is left open for the lifetime of the shell session 
>> and never explicitly closed.  When the shell session exits, the fds are 
>> closed implicitly by the OS, and the coprocess sees EOF and exits on 
>> its own.]
>
> That's one common model, yes. Another is that the shell process 
> explicitly sends a close or shutdown command to the coproc, so 
> termination is expected.

Right, but here also (after sending a quit command) the conclusion is the 
same as my point just below - that if the user is expecting the coproc to 
terminate, and expecting the current behavior that as a result the coproc 
variable will go away automatically, then that variable is as good as 
forgotten to the user.

>> If a user expects the coproc variable to go away automatically, that 
>> user won't be accessing a still-open fd from that variable for 
>> anything.
>
> I'm more concerned about a pipe with unread data that would potentially 
> cause problems. I suppose we just need more testing.

If I understand you right, you are talking about a scenario like this:

- a coproc writes to its output pipe
- the coproc terminates
- the shell leaves its fd for the read end of this pipe open
- there is unread data left sitting in this pipe
- [theoretical concern here]

Is that right?

I can't imagine this possibly leading to deadlock.  Either (1) the user 
has forgotten about this pipe, and never attempts to read from it, or (2) 
the user attempts to read from this pipe, returning some or all of the 
data, and possibly hitting EOF, but in any case DOES NOT BLOCK.
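
Case (2) is easy to try out.  A minimal sketch, assuming bash 4.1 or later; 
a process substitution stands in for the terminated coproc, and the fd 
variable name rd is arbitrary:

```shell
#!/usr/bin/env bash
# The writer leaves one line sitting in the pipe and exits.
exec {rd}< <(printf 'leftover\n')
sleep 0.2                         # give the writer time to exit first

# The -t 5 timeouts are only a safety net; neither read should need them.
IFS= read -r -t 5 -u "$rd" first;  rc_first=$?    # returns the unread data
IFS= read -r -t 5 -u "$rd" second; rc_second=$?   # EOF, returns at once
exec {rd}<&-

echo "first read:  rc=$rc_first data=$first"
echo "second read: rc=$rc_second"
```

The first read should succeed with the leftover line, and the second 
should fail immediately with status 1 for EOF.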

(I'm sorry if this is basically restating what I've already said earlier.)

> That's more of a "my arm hurts when I do this" situation. If a script 
> opened 500 fds using exec redirection, resource exhaustion would be 
> their own responsibility.

Ha, good!

[I had a small fear that fd exhaustion might have been your actual 
concern.]

>> Meanwhile, the bash man page does not specify the shell's behavior for 
>> when a coproc terminates, so you might say there's room for 
>> interpretation and the new deferring behavior would not break any 
>> promises.
>
> I could always enable it in the devel branch and see what happens with 
> the folks who use that. It would be three years after any release when 
> distros would put it into production anyway.

Oh, fun  :)

>> But since you mention it, writing to a broken pipe is still 
>> semantically meaningful also.  (I would even say valid.)  In the 
>> typical case it's expected behavior for a process to get killed when it 
>> attempts this and shell pipeline programming is designed with this in 
>> mind.
>
> You'd be surprised at how often I get requests to put in an internal 
> SIGPIPE handler to avoid problems/shell termination with builtins 
> writing to closed pipes.

Ah, well, I get it though.  It _is_ a bit jarring to see your shell get 
blown away with something like this -

 	$ exec 9> >(typo)
 	$ ...
 	$ echo >&9  # Boom!

So it does not surprise me that you have some users puzzling over it.

But FWIW I do think it is the most consistent & correct behavior.

Plus, of course, the user can install their own shell handler code for 
that case, or downgrade the effect to a non-fatal error with

 	$ trap '' SIGPIPE
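
As a script, a sketch of the same idea might look like the following 
(assuming bash 4.1 or later; the fd variable name wr is arbitrary, and 
`true` plays the part of the reader that went away):

```shell
#!/usr/bin/env bash
trap '' PIPE               # ignore SIGPIPE in this shell

exec {wr}> >(true)         # the reader exits at once
sleep 0.2                  # make sure it is gone before we write

# The builtin's "Broken pipe" complaint goes to stderr; hide it here.
{ echo hello >&"$wr"; } 2>/dev/null; rc=$?
exec {wr}>&-

echo "write rc=$rc, shell still running"
```

One thing to keep in mind: an ignored signal stays ignored in the commands 
the shell runs afterwards, so external programs started from this shell 
will also see EPIPE rather than get killed by SIGPIPE.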

>> So even for write attempts, you introduce uncertain behavior by 
>> automatically closing the fds, when the normal, predictable, valid 
>> thing would be to die by SIGPIPE.
>
> Again, you might be surprised at how many people view that as a bug in 
> the shell.

I'm not terribly surprised, since at first (before reasoning about it) the 
behavior is admittedly alarming.  ("What happened to my terminal?!?!")

But I'd argue the alternative is worse, because then it's an unpredictable 
race between SIGPIPE (which they're complaining about) and EBADF.

> I think we're talking about our different interpretations of `invalid' 
> (EBADF as opposed to EPIPE/SIGPIPE).

Right - just explaining; I think by now we are on the same page.

> My original intention for the coprocs (and Korn's from whence they came) 
> was that the shell would be in the middle -- it's another way for the 
> shell to do IPC.

And coprocesses are great for this, too!

It's just that external commands in a sense are extensions of the shell. 
The arms and legs, you might say, for doing the heavy lifting.

Carl
