Date: Fri, 12 Apr 2024 11:49:08 -0500 (CDT)
From: Carl Edquist
To: Chet Ramey
cc: Zachary Santer, bug-bash, libc-alpha@sourceware.org
Subject: Re: Examples of concurrent coproc usage?
In-Reply-To: <86c3765e-e29d-48d5-b468-3f20b59916b2@case.edu>
Message-ID: <6bcbd956-7296-7150-765f-63318a425d1b@cs.wisc.edu>
References: <9831afe6-958a-fbd3-9434-05dd0c9b602a@draigBrady.com>
 <317fe0e2-8cf9-d4ac-ed56-e6ebcc2baa55@cs.wisc.edu>
 <8c490a55-598a-adf6-67c2-eb2a6099620a@cs.wisc.edu>
 <88a67f36-2a56-a838-f763-f55b3073bb50@lando.namek.net>
 <2791ad90-a871-474d-89dd-bc6b20cdd1f2@case.edu>
 <86c3765e-e29d-48d5-b468-3f20b59916b2@case.edu>

On Mon, 8 Apr 2024, Chet Ramey wrote:

> On 4/4/24 8:52 AM, Carl Edquist wrote:
>
>> Zack illustrated basically the same point with his example:
>>
>>>     exec {fd}< <( some command )
>>>     while IFS='' read -r line <&"${fd}"; do
>>>       # do stuff
>>>     done
>>>     {fd}<&-
>>
>> A process-substitution open to the shell like this is effectively a
>> one-ended coproc (though not in the jobs list), and it behaves reliably
>> here because the user can count on {fd} to remain open even after the
>> child process terminates.
>
> That exposes the fundamental difference. The procsub is essentially the
> same kind of object as a coproc, but it exposes the pipe endpoint(s) as
> filenames. The shell maintains open file descriptors to the child
> process whose input or output it exposes as a FIFO or a file in /dev/fd,
> since you have to have a reader and a writer.
> The shell closes the file
> descriptor and, if necessary, removes the FIFO when the command for
> which that was one of the word expansions (or a redirection) completes.
> coprocs are designed to be longer-lived, and not associated with a
> particular command or redirection.
>
> But the important piece is that $fd is not the file descriptor the shell
> keeps open to the procsub -- it's a new file descriptor, dup'd from the
> original by the redirection. Since it was used with `exec', it persists
> until the script explicitly closes it. It doesn't matter when the shell
> reaps the procsub and closes the file descriptor(s) -- the copy in $fd
> remains until the script explicitly closes it. You might get read
> returning failure at some point, but the shell won't close $fd for you.
>
> Since procsubs expand to filenames, even opening them is sufficient to
> give you a new file descriptor (with the usual caveats about how
> different OSs handle the /dev/fd device).
>
> You can do this yourself with coprocs right now, with no changes to the
> shell.
>
>
>> So, the user can determine when the coproc fds are no longer needed,
>> whether that's when EOF is hit trying to read from the coproc, or
>> whatever other condition.
>
> Duplicating the file descriptor will do that for you.

Thanks for the explanation, that all makes sense. One technical
difference in my mind is that doing this with a procsub is reliably safe:

    exec {fd}< <( some command )

since the expanded pathname (/dev/fd/N or the fifo alternative) will stay
around for the duration of the exec command, so there is no concern about
whether or not the dup redirection will succeed.

Where with a coproc

    coproc X { potentially short lived command with output; }
    exec {xr}<&${X[0]} {xw}>&${X[1]}

there is technically the possibility that the coproc can finish and be
reaped before the exec command gets a chance to run and duplicate the fds.
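
The dup-then-read pattern discussed above can be sketched roughly as follows. This is only an illustration, assuming a reasonably recent bash with `{varname}` redirections. The `tr` command, the `orig_w` helper variable, and `final_output` are stand-ins chosen for the example and are not from the thread. The coproc here deliberately cannot exit before its stdin reaches EOF, which sidesteps the race described above rather than solving it.

```shell
#!/usr/bin/env bash
# Sketch: keep private copies of a coproc's fds so that its final output
# stays readable after the shell has reaped the coproc.

# This coproc cannot exit before its stdin reaches EOF, so the dup on the
# next line is not racing against the shell reaping it.
coproc X { tr a-z A-Z; }

# Duplicate both ends into fds that the script owns (xr, xw).
exec {xr}<&"${X[0]}" {xw}>&"${X[1]}"

printf 'hello\n' >&"$xw"

# Close every write end the shell holds (our copy and the original),
# so that tr sees EOF, flushes its output, and exits.
orig_w=${X[1]}            # hypothetical helper variable for the example
exec {xw}>&- {orig_w}>&-

# The read-side copy stays open regardless of when the coproc is reaped.
IFS= read -r final_output <&"$xr"
exec {xr}<&-

printf '%s\n' "$final_output"
```

With a genuinely short-lived coproc command, the `exec` line that duplicates the fds could still fail if the shell has already reaped the coproc and unset `X`.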

But, I also get what you said, that your design intent with coprocs was
for them to be longer-lived, so immediate termination was not a concern.

>> Personally I like the idea of 'closing' a coproc explicitly, but if
>> it's a bother to add options to the coproc keyword, then I would say
>> just let the user be responsible for closing the fds. Once the coproc
>> has terminated _and_ the coproc's fds are closed, then the coproc can
>> be deallocated.
>
> This is not backwards compatible. coprocs may be a little-used feature,
> but you're adding a burden on the shell programmer that wasn't there
> previously.

Ok, so, I'm trying to imagine a case where this would cause any problems
or extra work for such an existing user. Maybe you can provide an example
from your own uses? (Where it would cause trouble or require adding code
if the coproc deallocation were deferred until the fds are closed
explicitly.)

My first thought is that in the general case, the user doesn't really need
to worry much about closing the fds for a terminated coproc anyway, as
they will all be closed implicitly when the shell exits (either an
interactive session or a script).

[This is a common model for using coprocs, by the way, where an auxiliary
coprocess is left open for the lifetime of the shell session and never
explicitly closed. When the shell session exits, the fds are closed
implicitly by the OS, and the coprocess sees EOF and exits on its own.]

If a user expects the coproc variable to go away automatically, that user
won't be accessing a still-open fd from that variable for anything.

As for the forgotten-about half-closed pipe fds to the reaped coproc, I
don't see how they could lead to deadlock, nor do I see how a shell
programmer expecting the existing behavior would even attempt to access
them at all, apart from programming error.
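
As an illustration of the long-lived model described in the bracketed note above, here is a minimal sketch, assuming bash 4 or later. The coproc name `UPPER`, the `ask` helper, and the upper-casing "service" are all hypothetical and chosen only to keep the example self-contained.

```shell
#!/usr/bin/env bash
# Sketch: an auxiliary coprocess that serves requests for the lifetime of
# the shell session and is never closed explicitly.

coproc UPPER {
    # Answer each request line with its upper-cased form.
    while IFS= read -r request; do
        printf '%s\n' "${request^^}"
    done
}

# Hypothetical helper: send one line to the coprocess, read one line back.
ask() {
    printf '%s\n' "$1" >&"${UPPER[1]}"
    IFS= read -r reply <&"${UPPER[0]}"
}

ask 'first request'
first=$reply
ask 'second request'
second=$reply

printf '%s\n%s\n' "$first" "$second"
# No explicit cleanup: when this shell exits, the OS closes its pipe fds,
# the loop's read hits EOF, and the coprocess exits on its own.
```

Because the coprocess only terminates after the shell has gone away, a script written this way never touches the coproc fds after termination, which is the situation the surrounding paragraphs argue is unaffected by deferred deallocation.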

The only potential issue I can imagine is if a script (or a user at an
interactive prompt) would start _so_ many of these longer-lived coprocs
(more than 500??), one at a time in succession, in a single shell session,
that all the available fds would be exhausted. (That is, if the shell is
not closing them automatically upon coproc termination.) Is that the
backwards compatibility concern? Because otherwise it seems like stray
fds for terminated coprocs would be benign.

...

Meanwhile, the bash man page does not specify the shell's behavior for
when a coproc terminates, so you might say there's room for interpretation
and the new deferring behavior would not break any promises.

And as it strikes me anyway, the real "burden" on the programmer with the
existing behavior is having to make a copy of the coproc fds every time

    coproc X { cmd; }
    exec {xr}<&${X[0]} {xw}>&${X[1]}

and use the copies instead of the originals in order to reliably read the
final output from the coproc.

...

Though I can hear Obi-Wan Kenobi gently saying to Luke, "You must do what
you feel is right, of course."

>>> What should it do to make sure that the variables don't hang around
>>> with invalid file descriptors?
>>
>> First, just to be clear, the fds to/from the coproc pipes are not
>> invalid when the coproc terminates (you can still read from them); they
>> are only invalid after they are closed.
>
> That's only sort of true; writing to a pipe for which there is no reader
> generates SIGPIPE, which is a fatal signal.

Eh, when I talk about an fd being "invalid" here I mean "fd is not a valid
file descriptor" (to use the language for EBADF from the man page for
various system calls like read(2), write(2), close(2)). That's why I say
the fds only become invalid after they are closed.

And of course the primary use I care about is reading the final output
from a completed coproc. (Which is generally after explicitly closing the
write end.)
The shell's read fd is still open, and can be read - it'll either
return data, or return EOF, but that's not an error and not invalid.

But since you mention it, writing to a broken pipe is still semantically
meaningful also. (I would even say valid.) In the typical case it's
expected behavior for a process to get killed when it attempts this and
shell pipeline programming is designed with this in mind.

But when you try to write to a terminated coproc when you have the shell
automatically closing its write end, you get an unpredictable situation:

- If the write happens after the coproc terminates but before the shell
  reaps it (and closes the fds), then you will generate a SIGPIPE, which
  by default gracefully kills the shell (as is normal for programs in a
  pipeline).

- On the other hand, if the write happens after the shell reaps it and
  closes the fds, you will get a bad (invalid) file descriptor error
  message, without killing the shell.

So even for write attempts, you introduce uncertain behavior by
automatically closing the fds, when the normal, predictable, valid thing
would be to die by SIGPIPE.

(That's my take anyway.)

> If the coproc terminates, the file descriptor to write to it becomes
> invalid because it's implicitly closed.

Yes, but the distinction I was making is that they do not become invalid
when or because the coproc terminates, they become invalid when and
because the shell closes them. (I'm saying that if the shell did not
close them automatically, they would remain valid.)

>> The surprising bit is when they become invalid unexpectedly (from the
>> point of view of the user) because the shell closes them
>> automatically, at the somewhat arbitrary timing when the coproc is
>> reaped.
>
> No real difference from procsubs.

I think I disagree? The difference is that the replacement string for a
procsub (/dev/fd/N or a fifo path) remains valid for the command in
question. (Right?) So the command in question can count on that path
being valid.
And if a procsub is used in an exec redirection, in order to
extend its use for future commands (and the redirection is guaranteed to
work, since it is guaranteed to be valid for that exec command), then the
newly opened pipe fd will not be subject to automatic closing either.

As far as I can tell there is no arbitrary timing for when the shell
closes the fds for procsubs. As far as I can tell, it closes them when
the command in question completes, and that's the end of the story.
(There's no waiting for the timing of the background procsub process to
complete.)

>> Second, why is it a problem if the variables keep their (invalid) fds
>> after closing them, if the user is the one that closed them anyway?
>>
>> Isn't this how it works with the auto-assigned fd redirections?
>
> Those are different file descriptors.
>
>>
>>     $ exec {d}<.
>>     $ echo $d
>>     10
>>     $ exec {d}<&-
>>     $ echo $d
>>     10
>
> The shell doesn't try to manage that object in the same way it does a
> coproc. The user has explicitly indicated they want to manage it.

Ok - your intention makes sense then. My reasoning was that
auto-allocated redirection fds ( {x}>file or {x}>&$N ) are a way of asking
the shell to automatically place fds in a variable for you to manage - and
I imagined 'coproc X {...}' the same way.

>> If the user has explicitly closed both fd ends for a coproc, it should
>> not be a surprise to the user either way - whether the variable gets
>> unset automatically, or whether it remains with (-1 -1).
>>
>> Since you are already unsetting the variable when the coproc is
>> deallocated though, I'd say it's fine to keep doing that -- just don't
>> deallocate the coproc before the user has closed both fds.
>
> It's just not backwards compatible. I might add an option to enable
> that kind of management, but probably not for bash-5.3.

Ah, nice idea.
No hurry on my end - but yeah if you imagine the alternate
behavior is somehow going to cause problems for existing uses (eg, the fd
exhaustion mentioned earlier) then yeah a shell option for the
deallocation behavior would at least be a way for users to get reliable
behavior without the burden of duping the fds manually every time.

> But there is a window there where a short-lived coprocess could be
> reaped before you dup the file descriptors. Since the original intent of
> the feature was that coprocs were a way to communicate with long-lived
> processes -- something more persistent than a process substitution -- it
> was not really a concern at the time.

Makes sense. For me, working with coprocesses is largely a more flexible
way of setting up interesting pipelines - which is where the shell excels.

Once a 'pipework' is set up (I'm making up this word now to distinguish
from a simple pipeline), the shell does not have to be in the middle
shoveling data around - the external commands can do that on their own.

So in my mind, thinking about the "lifetime" of a coproc is often not so
different from thinking about the lifetime of a regular pipeline, once you
set up the plumbing for your commands. The timing of individual parts of
a pipeline finishing shouldn't really matter, as long as the pipes serve
their purpose to deliver output from one part to the next.


Thanks for your time, and happy Friday :)

Carl