unofficial mirror of libc-alpha@sourceware.org
From: Carl Edquist <edquist@cs.wisc.edu>
To: Zachary Santer <zsanter@gmail.com>
Cc: bug-bash <bug-bash@gnu.org>, libc-alpha@sourceware.org
Subject: Re: Examples of concurrent coproc usage?
Date: Thu, 14 Mar 2024 04:58:48 -0500 (CDT)
Message-ID: <88a67f36-2a56-a838-f763-f55b3073bb50@lando.namek.net>
In-Reply-To: <CABkLJULrT2wi_=VbXxjQUMS6Peso3D5HSFsWp6gJm9-2UbpczQ@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 11125 bytes --]

[My apologies up front for the length of this email.  The short story is I 
played around with the multi-coproc support: the fd closing seems to work 
fine to prevent deadlock, but I found one bug apparently introduced with 
multi-coproc support, and one other coproc bug that is not new.]

On Mon, 11 Mar 2024, Zachary Santer wrote:

> Was "RFE: enable buffering on null-terminated data"
>
> On Mon, Mar 11, 2024 at 7:54 AM Carl Edquist <edquist@cs.wisc.edu> wrote:
>>
>> (Kind of a side-note ... bash's limited coprocess handling was a long 
>> standing annoyance for me in the past, to the point that I wrote a bash 
>> coprocess management library to handle multiple active coprocess and 
>> give convenient methods for interaction.  Perhaps the trickiest bit 
>> about multiple coprocesses open at once (which I suspect is the reason 
>> support was never added to bash) is that you don't want the second and 
>> subsequent coprocesses to inherit the pipe fds of prior open 
>> coprocesses.  This can result in deadlock if, for instance, you close 
>> your write end to coproc1, but coproc1 continues to wait for input 
>> because coproc2 also has a copy of a write end of the pipe to coproc1's 
>> input.  So you need to be smart about subsequent coprocesses first 
>> closing all fds associated with other coprocesses.
>
> https://lists.gnu.org/archive/html/help-bash/2021-03/msg00296.html
> https://lists.gnu.org/archive/html/help-bash/2021-04/msg00136.html

Oh hey!  Look at that.  Thanks for the links to this thread - I gave them 
a read (along with the old thread from 2011-04).  I feel a little bad I 
missed the 2021 discussion.


> You're on the money, though there is a preprocessor directive you can 
> build bash with that will allow it to handle multiple concurrent 
> coprocesses without complaining: MULTIPLE_COPROCS=1.

Who knew!  Thanks for mentioning it.  When I saw that "only one active 
coprocess at a time" was _still_ listed in the bugs section in bash 5, I 
figured multiple coprocess support had just been abandoned.  Chet, that's 
cool that you implemented it.

I kind of went all-out on my bash coprocess management library though 
(mostly back in 2014-2016) ... It's pretty feature-rich and pleasant to 
use -- to the point that I don't think there is any going back to bash's 
internal coproc for me, even with multiple coprocess support.  I 
implemented it with shell functions, so it doesn't rely on compiling 
anything or the latest version of bash being present.  (I even added bash3 
support for older systems.)

> Chet Ramey's sticking point was that he hadn't seen coprocesses used 
> enough in the wild to satisfactorily test that his implementation did in 
> fact keep the coproc file descriptors out of subshells.

To be fair coproc is kind of a niche feature.  But I think more people 
would play with it if it were less awkward to use and if they felt free to 
experiment with multiple coprocs.

By the way, I agree with Chet's exact description of the problems 
here:

     https://lists.gnu.org/archive/html/help-bash/2021-03/msg00282.html

The issue is separate from the stdio buffering discussion; the issue here 
is with child processes (and I think not foreground subshells, but 
specifically background processes, including coprocesses) inheriting the 
shell's fds that are open to pipes connected to an active coprocess.

Not getting a SIGPIPE/write failure leaves a coprocess sitting around 
longer than it ought to, but it's not obvious (to me) how this leads to 
deadlock: the shell has closed its read end of the pipe to that 
coprocess, so at least you aren't going to hang trying to read from it.

On the other hand, a coprocess not seeing EOF will cause deadlock pretty 
readily, especially if it processes all its input before producing output 
(as with wc, sort, sha1sum).  Trying to read from the coprocess will hang 
indefinitely if the coprocess is still waiting for input, which is the 
case if there is another copy of the write end of its read pipe open 
somewhere.


> If you've got examples you can direct him to, I'd really appreciate it.

[My original use cases for multiple coprocesses were (1) for 
programmatically interacting with multiple command-line database clients 
together, and (2) for talking to multiple interactive command-line game 
engines (othello) to play each other.

Perl's IPC::Open2 works, too, but it's easier to experiment on the fly in 
bash.

And in general having the freedom to play with multiple coprocesses helps 
mock up more complicated pipelines, or even webs of interconnected 
processes.]

But you can create a deadlock without doing anything fancy.


Well, *without multi-coproc support*, here's a simple wc example; first 
with a single coproc:

 	$ coproc WC { wc; }
 	$ exec {WC[1]}>&-
 	$ read -u ${WC[0]} X
 	$ echo $X
 	0 0 0

This works as expected.

But if you try it with a second coproc (again, without multi-coproc 
support), the second coproc will inherit copies of the shell's read and 
write pipe fds to the first coproc, and the read will hang (as described 
above), as the first coproc doesn't see EOF:

 	$ coproc WC { wc; }
 	$ coproc CAT { cat; }
 	$ exec {WC[1]}>&-
 	$ read -u ${WC[0]} X

 	# HANGS


But, this can be observed even before attempting the read that hangs.

You can 'ps' to see the user shell (bash), the coprocs' shells (bash), and 
the coprocs' commands (wc & cat).  Then 'ls -l /proc/PID/fd/' to see what 
they have open:

- The user shell has its copies of the read & write fds open for both 
coprocs (as it should)

- The coproc commands (wc & cat) each have only a single read & write pipe 
open, on fd 0 & 1 (as they should)

- The first coproc's shell (WC) has only a single read & write pipe open, 
on fd 0 & 1 (as it should)

- The second coproc's shell (CAT) has its own read & write pipes open, on 
fd 0 & 1 (good), but it also has a copy of the user shell's read & write 
pipe fds to the first coproc (WC) open (on fd 60 & 63 in this case, which 
it inherited when forking from the user shell)

(And in general, later coproc shells will have stray copies of the user 
shell's r/w ends from all previous coprocs.)

So, you can examine the situation after setting up coprocs, to see if all 
the coproc-related processes have just two pipes open (on fd 0 & 1).  If 
this is the case, I think that suffices to convince me anyway that no 
deadlocks related to stray open fds can happen.  But if any of them has 
other pipes open (inherited from the user shell), that indicates the 
problem.
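
[A rough sketch of that check as a shell function, assuming Linux's /proc 
layout.  The name stray_pipes is made up for illustration; it is not part 
of bash or any existing tool.  It simply prints every pipe that a process 
holds on an fd other than 0 or 1.]

```shell
# Hypothetical helper (name is illustrative): report pipes held open on
# any fd other than 0 and 1 by each PID given.  Linux /proc only.
# The body runs in a subshell so its variables don't leak to the caller.
stray_pipes() (
    for pid in "$@"; do
        for fd in /proc/"$pid"/fd/*; do
            n=${fd##*/}
            # fd 0 and 1 are the coproc's own pipes; those are expected.
            case $n in 0|1) continue ;; esac
            # The fd may vanish or be unreadable; skip it in that case.
            target=$(readlink "$fd" 2>/dev/null) || continue
            case $target in
                pipe:*) printf 'pid %s: stray fd %s -> %s\n' "$pid" "$n" "$target" ;;
            esac
        done
    done
)
```

For example, `stray_pipes $(pgrep -P $$)` checks the user shell's direct 
children (the coproc shells); no output means no stray pipes were found.  
Note that a process whose stderr happens to be a pipe would be reported 
too, so this is only a first-pass filter.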


I tried compiling the latest bash with MULTIPLE_COPROCS=1 (version 
5.2.21(1)) to test out the multi-coproc support.

I tried standing up the above WC and CAT coprocs, together with some 
others to check that the behavior looked ok for pipelines also (which I 
think was one of Chet's concerns):

 	$ coproc WC { wc; }
 	$ coproc CAT { cat; }
 	$ coproc CAT3 { cat | cat | cat; }
 	$ coproc CAT4 { cat | cat | cat | cat; }
 	$ coproc CATX { cat ; }

And as far as the fd situation, everything checks out: the user shell has 
fds open to all the coprocs, and the coproc shells & coproc commands 
(including all the cat's in the pipelines) have only a single read & write 
pipe open on fd 0 & 1.  So, the multi-coproc code seems to be closing the 
shell's copies correctly.

[The examples are boring, but their point is just to investigate the 
stray-fd question.]


HOWEVER!!!

Unexpectedly, the new multi-coproc code seems to close the user shell's 
end of a coprocess's pipes, once the coprocess has terminated.  When 
compiled with MULTIPLE_COPROCS=1, this is true even if there is only a 
single coproc:

 	$ coproc WC { wc; }
 	$ exec {WC[1]}>&-
 	[1]+  Done                    coproc WC { wc; }

 	# WC var gets cleared!!
 	# shell's ${WC[0]} is also closed!

 	# now, can't do:

 	$ read -u ${WC[0]} X
 	$ echo $X

I'm attaching a "bad-coproc-log.txt" with more detailed ps & ls output 
examining the open fds at each step, to make it clear what's happening.

This is a bug.  The shell should not automatically close its read pipe to 
a coprocess that has terminated -- it should stay open to read the final 
output, and the user should be responsible for closing the read end 
explicitly.

This is more obvious for commands that wait until they see EOF before 
generating any output (wc, sort, sha1sum).  But it's also true for any 
command that produces output (filters (sed) or generators (ls)).  If the 
shell's read end is closed automatically, any final output waiting in the 
pipe will be discarded.
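
[One defensive pattern a script might use in the meantime (a sketch, not 
something the bash documentation prescribes; the names wc_in and wc_out 
are just illustrative) is to duplicate the read end onto a private fd 
before closing the write end.  The duplicate is a separate descriptor, so 
it should stay readable even if the shell closes its own copy and clears 
the WC variable.]

```shell
#!/usr/bin/env bash
# Sketch only: requires bash 4.1+ for the {varname} redirection forms.
coproc WC { wc; }

# Save what we need right away, while the WC array is still populated.
exec {wc_out}<&"${WC[0]}"   # private duplicate of the read end
wc_in=${WC[1]}              # plain copy of the write-end fd number

# Close the write end: wc sees EOF, prints its counts, and exits.
exec {wc_in}>&-

# Read the final output through the duplicate, whatever happened to WC.
read -r -u "$wc_out" lines words bytes
echo "$lines $words $bytes"

# The user closes the read end explicitly when done with it.
exec {wc_out}<&-
```

With empty input the counts read back are all zero, matching the 
single-coproc example earlier.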

It also invites trouble if the shell variable that holds the fds gets 
removed unexpectedly when the coprocess terminates.  (Suddenly the 
variable expands to an empty string.)  It seems to me that the proper time 
to clear the coproc variable (if at all) is after the user has explicitly 
closed both of the fds.  *Or* else add an option to the coproc keyword to 
explicitly close the coproc - which will close both fds and clear the 
variable.

...

Separately, I consider the following coproc behavior to be weird, fragile, 
and broken.

If you fg a coproc, then stop and bg it, it dies.  Why?  Apparently the 
shell abandons the coproc when it is stopped, closes the pipe fds for it, 
and clears the fd variable.

 	$ coproc CAT { cat; }
 	[1] 10391

 	$ fg
 	coproc CAT { cat; }

 	# oops!

 	^Z
 	[1]+  Stopped                 coproc CAT { cat; }

 	$ echo ${CAT[@]}  # what happened to the fds?

 	$ ls -lgo /proc/$$/fd/
 	total 0
 	lrwx------ 1 64 Mar 14 02:26 0 -> /dev/pts/3
 	lrwx------ 1 64 Mar 14 02:26 1 -> /dev/pts/3
 	lrwx------ 1 64 Mar 14 02:25 2 -> /dev/pts/3
 	lrwx------ 1 64 Mar 14 02:26 255 -> /dev/pts/3

 	$ bg
 	[1]+ coproc CAT { cat; } &

 	$
 	[1]+  Done                    coproc CAT { cat; }

 	$ # sad user :(


This behavior is not new to the multi-coproc support.  But just the same 
it seems broken for the shell to automatically close the fds to 
coprocesses.  That should be done explicitly by the user.


>> Word to the wise: you might encounter this issue (coproc2 prevents 
>> coproc1 from seeing its end-of-input) even though you are rigging this 
>> up yourself with FIFOs rather than bash's coproc builtin.)
>
> In my case, it's mostly a non-issue, because I fork the - now three - 
> background processes before exec'ing automatic fds redirecting to/from 
> their FIFO's in the parent process. All the automatic fds get put in an 
> array, and I do close them all at the beginning of a subsequent process 
> substitution.

That's a nice trick with the shell backgrounding all the coprocesses 
before connecting the fifos.  But yeah, to make subsequent coprocesses you 
do still have to close the copy of the user shell's fds that the coprocess 
shell inherits.  It sounds like you are doing that (nice!), but in any 
case it requires some care, and as these stack up it is really handy to 
have something manage it all for you.
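
[A minimal sketch of the fork-first idea as I understand it from the 
description above; this is not Zachary's actual script.  The workers (wc 
and sort), the FIFO names, and the fd variable names are all made up for 
illustration.]

```shell
#!/usr/bin/env bash
# Sketch only: requires bash 4.1+ for the {varname} redirection forms.
dir=$(mktemp -d)
mkfifo "$dir/wc.in" "$dir/wc.out" "$dir/sort.in" "$dir/sort.out"

# Fork every worker first.  The parent holds no FIFO fds yet, so no
# worker can inherit a stray copy of another worker's pipe.
wc   <"$dir/wc.in"   >"$dir/wc.out"   &
sort <"$dir/sort.in" >"$dir/sort.out" &

# Only now open the parent's ends, in the same order each worker opens
# its own (input first, then output), so the FIFO opens pair up.
exec {wc_w}>"$dir/wc.in"     {wc_r}<"$dir/wc.out"
exec {sort_w}>"$dir/sort.in" {sort_r}<"$dir/sort.out"

printf 'a b\n' >&"$wc_w"
exec {wc_w}>&-          # wc sees EOF even though sort is still running
read -r lines words bytes <&"$wc_r"
echo "$lines $words $bytes"

# Close everything else explicitly, then reap the workers and clean up.
exec {wc_r}<&- {sort_w}>&- {sort_r}<&-
wait
rm -r "$dir"
```

Any process forked after the `exec` lines would inherit all four fds, and 
would need to close them itself before doing anything else.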

(Perhaps this is where I ask if you are happy with your solution or if you 
would like to try out something wildly more flexible...)


Happy coprocessing! :)

Carl

[-- Attachment #2: Type: text/plain, Size: 1358 bytes --]

$ coproc WC { wc; }
[1] 10038

$ ps
  PID TTY          TIME CMD
 9926 pts/3    00:00:00 bash
10038 pts/3    00:00:00 bash
10039 pts/3    00:00:00 wc
10040 pts/3    00:00:00 ps

$ ls -lgo /proc/{$$,10038,10039}/fd/
/proc/10038/fd/:
total 0
lr-x------ 1 64 Mar 14 02:29 0 -> pipe:[81214]
l-wx------ 1 64 Mar 14 02:29 1 -> pipe:[81213]
lrwx------ 1 64 Mar 14 02:28 2 -> /dev/pts/3
lrwx------ 1 64 Mar 14 02:29 255 -> /dev/pts/3

/proc/10039/fd/:
total 0
lr-x------ 1 64 Mar 14 02:29 0 -> pipe:[81214]
l-wx------ 1 64 Mar 14 02:29 1 -> pipe:[81213]
lrwx------ 1 64 Mar 14 02:28 2 -> /dev/pts/3

/proc/9926/fd/:
total 0
lrwx------ 1 64 Mar 14 02:26 0 -> /dev/pts/3
lrwx------ 1 64 Mar 14 02:26 1 -> /dev/pts/3
lrwx------ 1 64 Mar 14 02:25 2 -> /dev/pts/3
lrwx------ 1 64 Mar 14 02:26 255 -> /dev/pts/3
l-wx------ 1 64 Mar 14 02:26 60 -> pipe:[81214]
lr-x------ 1 64 Mar 14 02:26 63 -> pipe:[81213]

$ echo ${WC[@]}
63 60

$ exec {WC[1]}>&-
[1]+  Done                    coproc WC { wc; }

$ ps
  PID TTY          TIME CMD
 9926 pts/3    00:00:00 bash
10042 pts/3    00:00:00 ps

$ echo ${WC[@]}

$ ls -lgo /proc/$$/fd/
total 0
lrwx------ 1 64 Mar 14 02:26 0 -> /dev/pts/3
lrwx------ 1 64 Mar 14 02:26 1 -> /dev/pts/3
lrwx------ 1 64 Mar 14 02:25 2 -> /dev/pts/3
lrwx------ 1 64 Mar 14 02:26 255 -> /dev/pts/3

