Re: Examples of concurrent coproc usage?

unofficial mirror of libc-alpha@sourceware.org
 help / color / mirror / Atom feed

From: Carl Edquist <edquist@cs.wisc.edu>
To: Chet Ramey <chet.ramey@case.edu>
Cc: Zachary Santer <zsanter@gmail.com>, bug-bash <bug-bash@gnu.org>,
	libc-alpha@sourceware.org
Subject: Re: Examples of concurrent coproc usage?
Date: Thu, 4 Apr 2024 07:52:05 -0500 (CDT)	[thread overview]
Message-ID: <cf2692e7-2b6d-0ba3-678b-29c8efaec1d3@cs.wisc.edu> (raw)
In-Reply-To: <2791ad90-a871-474d-89dd-bc6b20cdd1f2@case.edu>

Hi Chet, thanks for taking the time to review this  :D

[My apologies again upfront for another lengthy (comprehensive?) email.]

On Wed, 3 Apr 2024, Chet Ramey wrote:

> On 4/2/24 12:22 PM, Carl Edquist wrote:
>
>>  the forked coproc has to close its fds to/from _all_ other existing
>>  coprocs (as there can be several).
>
> And there is the issue. Without multi-coproc support, the shell only 
> keeps track of one coproc at a time, so there's only one set of pipe 
> file descriptors to close.

Right, exactly.  The example with the default build (showing the essential 
case that causes deadlock) was to highlight that your multi-coproc support 
code apparently does indeed correctly track and close all these fds, and 
thus prevents the deadlock issue.

On Wed, 3 Apr 2024, Chet Ramey wrote:

> It's straightforward: the coproc process terminates, the shell reaps it, 
> marks it as dead, notifies the user that the process exited, and reaps 
> it before printing the next prompt. I don't observe any different 
> behavior between the default and when compiled for multiple coprocs.
>
> It depends on when the process terminates as to whether you get a prompt 
> back and need to run an additional command before reaping the coproc 
> (macOS, RHEL), which gives you the opportunity to run the `read' 
> command:

Ah, my mistake then - thanks for explaining.  I must have been thrown off 
by the timing, running it with and without an intervening interactive 
prompt before the read command.  When run interactively, an extra 'Enter' 
(or not) before the read command changes the behavior.

So in that case, this issue (that the shell closes its read-end of the 
pipe from a reaped coproc, potentially before being able to read the final 
output) was already there and is not specific to the multi-coproc code.

But in any case, it seems like this is a race then?  That is, whether the 
child process terminates before or after the prompt in question.

> $ coproc WC { wc; }
> [1] 48057
> $ exec {WC[1]}>&-
> $ read -u ${WC[0]} X
> [1]+  Done                    coproc WC { wc; }
> bash: DEBUG warning: cpl_reap: deleting 48057
> $ echo $X
> 0 0 0
>
> (I put in a trace statement to show exactly when the coproc gets reaped and
> deallocated.)

Thanks! (for taking the time to play with this)

Though apparently it's still a race here.  If you diagram the shell and 
coproc (child) processes, I think you'll see that your DEBUG statement can 
also happen _before_ the read command, which would then fail.  You can 
contrive this by adding a small sleep (eg, 0.1s) at the end of 
execute_builtin_or_function (in execute_cmd.c), just before it returns.

Eg:

 	diff --git a/execute_cmd.c b/execute_cmd.c
 	index ed1063e..c72f322 100644
 	--- a/execute_cmd.c
 	+++ b/execute_cmd.c
 	@@ -5535,6 +5535,7 @@ execute_builtin_or_function (words, builtin, var, redirects,
 	   discard_unwind_frame ("saved_fifos");
 	 #endif

 	+  usleep(100000);
 	   return (result);
 	 }

If I do this, I consistently see "read: X: invalid file descriptor 
specification" running the above 4-line "coproc WC" example in a script, 
demonstrating that there is no guarantee that the read command will start 
before the WC coproc is reaped and {WC[0]} is closed, even though it's the 
next statement after 'exec {WC[1]}>&-'.

But (as I'll try to show) you can trip up on this race even without 
slowing down bash itself artificially.

> I can't reproduce your results with non-interactive shells, either, with 
> job control enabled or disabled.

That's fair; let's try it with a script:

 	$ cat cope.sh
 	#!/bin/bash

 	coproc WC { wc; }
 	jobs
 	exec {WC[1]}>&-
 	[[ $1 ]] && sleep "$1"
 	jobs
 	read -u ${WC[0]} X
 	echo $X

Run without sleep, the wc output is seen:

 	$ ./cope.sh
 	[1]+  Running                 coproc WC { wc; } &
 	[1]+  Running                 coproc WC { wc; } &
 	0 0 0

Run with a brief sleep after closing the write end, and it breaks:

 	$ ./cope.sh .1
 	[1]+  Running                 coproc WC { wc; } &
 	[1]+  Done                    coproc WC { wc; }
 	./cope.sh: line 8: read: X: invalid file descriptor specification

And, if I run with "0" for a sleep time, it intermittently behaves like 
either of the above.  Racy!

>> This is a bug.  The shell should not automatically close its read pipe 
>> to a coprocess that has terminated -- it should stay open to read the 
>> final output, and the user should be responsible for closing the read 
>> end explicitly.
>
> How long should the shell defer deallocating the coproc after the 
> process terminates?

I only offer my opinion here, but it strikes me that it definitely should 
_not_ be based on an amount of _time_.  That's inherently racy.

In the above example, there is only a single line to read; but the general 
case there may be many 'records' sitting in the pipe waiting to be 
processed, and processing each record may take an arbitrary amount of 
time.  (Consider a coproc containing a sort command, for example, that 
produces all its output lines at once after it sees EOF, and then 
terminates.)

Zack illustrated basically the same point with his example:

>	exec {fd}< <( some command )
>	while IFS='' read -r line <&"${fd}"; do
>	  # do stuff
>	done
>	{fd}<&-

A process-substitution open to the shell like this is effectively a 
one-ended coproc (though not in the jobs list), and it behaves reliably 
here because the user can count on {fd} to remain open even after the 
child process terminates.

So, the user can determine when the coproc fds are no longer needed, 
whether that's when EOF is hit trying to read from the coproc, or whatever 
other condition.

Personally I like the idea of 'closing' a coproc explicitly, but if it's a 
bother to add options to the coproc keyword, then I would say just let the 
user be responsible for closing the fds.  Once the coproc has terminated 
_and_ the coproc's fds are closed, then the coproc can be deallocated.

Apparently there is already some detection in there for when the coproc 
fds get closed, as the {NAME[@]} fd array members get set to -1 
automatically when when you do, eg, 'exec {NAME[0]}<&-'.  So perhaps this 
won't be a radical change.

Alternatively (or, additionally), you could interpret 'unset NAME' for a 
coproc to mean "deallocate the coproc."  That is, close the {NAME[@]} fds, 
unset the NAME variable, and remove any coproc bookkeeping for NAME.

(Though if the coproc child process hasn't terminated on its own yet, 
still it shouldn't be killed, and perhaps it should remain in the jobs 
list as a background process until it's done.)

...

[And if you _really_ don't want to defer deallocating a coproc after it 
terminates, I suppose you can go ahead and deallocate it in terms of 
removing it from the jobs list and dropping any bookkeeping for it - as 
long as you leave the fds and fd variables intact for the user.

It's a little dicey, but in theory it should not lead to deadlock, even if 
copies of these (now untracked) reaped-coproc pipe fds end up in other 
coprocs.  Why not?  Because (1) this coproc is dead and thus won't be 
trying to read anymore from its (now closed) end, and (2) reads from the 
shell's end will not block since there are no copies of the coproc's write 
end open anymore.

Still, it'd be cleaner to defer deallocation, to avoid these stray (albeit 
harmless) copies of fds making their way into new coprocs.]

> What should it do to make sure that the variables don't hang around with 
> invalid file descriptors?

First, just to be clear, the fds to/from the coproc pipes are not invalid 
when the coproc terminates (you can still read from them); they are only 
invalid after they are closed.

The surprising bit is when they become invalid unexpectedly (from the 
point of view of the user) because the shell closes them automatically, 
at the somewhat arbitrary timing when the coproc is reaped.

Second, why is it a problem if the variables keep their (invalid) fds 
after closing them, if the user is the one that closed them anyway?

Isn't this how it works with the auto-assigned fd redirections?

Eg:

 	$ exec {d}<.
 	$ echo $d
 	10
 	$ exec {d}<&-
 	$ echo $d
 	10

But, as noted, bash apparently already ensures that the variables don't 
hang around with invalid file descriptors, as once you close them the 
corresponding variable gets updated to "-1".

> Or should the user be responsible for unsetting the array variable too? 
> (That's never been a requirement, obviously.)

On the one hand, bash is already messing with the coproc array variables 
(setting the values to -1 when the user closes the fds), so it's not 
really a stretch in my mind for bash to unset the whole variable when the 
coproc is deallocated.  On the other hand, as mentioned above, bash leaves 
automatically allocated fd variables intact after the user explicitly 
closes them.

So I guess either way seems reasonable.

If the user has explicitly closed both fd ends for a coproc, it should not 
be a surprise to the user either way - whether the variable gets unset 
automatically, or whether it remains with (-1 -1).

Since you are already unsetting the variable when the coproc is 
deallocated though, I'd say it's fine to keep doing that -- just don't 
deallocate the coproc before the user has closed both fds.

>> It also invites trouble if the shell variable that holds the fds gets 
>> removed unexpectedly when the coprocess terminates.  (Suddenly the 
>> variable expands to an empty string.)  It seems to me that the proper 
>> time to clear the coproc variable (if at all) is after the user has 
>> explicitly closed both of the fds.
>
> That requires adding more plumbing than I want to,

Your project your call :D

> especially since the user can always save the file descriptors of 
> interest into another variable if they want to use them after the coproc 
> terminates.

*Except* that it's inherently a race condition whether the original 
variables will still be intact to save them.

Even if you attempt to save them immediately:

 	coproc X { exit; }
 	X_BACKUP=( ${X[@]} )

it's not guaranteed that X_BACKUP=(...) will run before coproc X has been 
deallocated, and the X variable cleared.

No doubt this hasn't escaped you, but in any case you can see it for 
yourself if you introduce a small delay in execute_coproc, in the parent, 
just after the call to make_child:

 	diff --git a/execute_cmd.c b/execute_cmd.c
 	index ed1063e..5949e3e 100644
 	--- a/execute_cmd.c
 	+++ b/execute_cmd.c
 	@@ -2440,6 +2440,8 @@ execute_coproc (command, pipe_in, pipe_out,
 	fds_to_close)

 	       exit (estat);
 	     }
 	+  else
 	+    usleep(100000);

 	   close (rpipe[1]);
 	   close (wpipe[0]);

When I try this, X_BACKUP is consistently empty.  Though if you add, say, 
a call to "sleep 0.2" to coproc X, then X_BACKUP consistently gets a copy 
of X's fds in time.

I am sorry if this sounds contrived, but I hope it demonstrates that 
closing fds and and unsetting the variable for a coproc automatically when 
it terminates is fundamentally flawed, because it depends on the arbitrary 
race timing between the two processes.

>> *Or* else add an option to the coproc keyword to explicitly close the 
>> coproc - which will close both fds and clear the variable.
>
> Not going to add any more options to reserved words; that does more 
> violence to the grammar than I want.

Not sure how you'd feel about using 'unset' on the coproc variable 
instead.  (Though as discussed, I think the coproc terminated + fds 
manually closed condition is also sufficient.)

.............

Anyway, as far as I'm concerned there's nothing urgent about all this, but 
(along with the multi-coproc support that you implemented), avoiding the 
current automatic deallocation behavior would seem to go a long way toward 
making coproc a correct and generally useful feature.

Thanks for your time!

Carl

PS Zack you're welcome :)

next prev parent reply	other threads:[~2024-04-04 12:51 UTC|newest]

Thread overview: 53+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <CABkLJULa8c0zr1BkzWLTpAxHBcpb15Xms0-Q2OOVCHiAHuL0uA@mail.gmail.com>
     [not found] ` <9831afe6-958a-fbd3-9434-05dd0c9b602a@draigBrady.com>
2024-03-10 15:29   ` RFE: enable buffering on null-terminated data Zachary Santer
2024-03-10 20:36     ` Carl Edquist
2024-03-11  3:48       ` Zachary Santer
2024-03-11 11:54         ` Carl Edquist
2024-03-11 15:12           ` Examples of concurrent coproc usage? Zachary Santer
2024-03-14  9:58             ` Carl Edquist
2024-03-17 19:40               ` Zachary Santer
2024-04-01 19:24               ` Chet Ramey
2024-04-01 19:31                 ` Chet Ramey
2024-04-02 16:22                   ` Carl Edquist
2024-04-03 13:54                     ` Chet Ramey
2024-04-03 14:32               ` Chet Ramey
2024-04-03 17:19                 ` Zachary Santer
2024-04-08 15:07                   ` Chet Ramey
2024-04-09  3:44                     ` Zachary Santer
2024-04-13 18:45                       ` Chet Ramey
2024-04-14  2:09                         ` Zachary Santer
2024-04-04 12:52                 ` Carl Edquist [this message]
2024-04-04 23:23                   ` Martin D Kealey
2024-04-08 19:50                     ` Chet Ramey
2024-04-09 14:46                       ` Zachary Santer
2024-04-13 18:51                         ` Chet Ramey
2024-04-09 15:58                       ` Carl Edquist
2024-04-13 20:10                         ` Chet Ramey
2024-04-14 18:43                           ` Zachary Santer
2024-04-15 18:55                             ` Chet Ramey
2024-04-15 17:01                           ` Carl Edquist
2024-04-17 14:20                             ` Chet Ramey
2024-04-20 22:04                               ` Carl Edquist
2024-04-22 16:06                                 ` Chet Ramey
2024-04-27 16:56                                   ` Carl Edquist
2024-04-28 17:50                                     ` Chet Ramey
2024-04-08 16:21                   ` Chet Ramey
2024-04-12 16:49                     ` Carl Edquist
2024-04-16 15:48                       ` Chet Ramey
2024-04-20 23:11                         ` Carl Edquist
2024-04-22 16:12                           ` Chet Ramey
2024-04-17 14:37               ` Chet Ramey
2024-04-20 22:04                 ` Carl Edquist
2024-03-12  3:34           ` RFE: enable buffering on null-terminated data Zachary Santer
2024-03-14 14:15             ` Carl Edquist
2024-03-18  0:12               ` Zachary Santer
2024-03-19  5:24                 ` Kaz Kylheku
2024-03-19 12:50                   ` Zachary Santer
2024-03-20  8:55                     ` Carl Edquist
2024-04-19  0:16                       ` Modify buffering of standard streams via environment variables (not LD_PRELOAD)? Zachary Santer
2024-04-19  9:32                         ` Pádraig Brady
2024-04-19 11:36                           ` Zachary Santer
2024-04-19 12:26                             ` Pádraig Brady
2024-04-19 16:11                               ` Zachary Santer
2024-04-20 16:00                         ` Carl Edquist
2024-04-20 20:00                           ` Zachary Santer
2024-04-20 21:45                             ` Carl Edquist

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/libc/involved.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=cf2692e7-2b6d-0ba3-678b-29c8efaec1d3@cs.wisc.edu \
    --to=edquist@cs.wisc.edu \
    --cc=bug-bash@gnu.org \
    --cc=chet.ramey@case.edu \
    --cc=libc-alpha@sourceware.org \
    --cc=zsanter@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).