unofficial mirror of libc-alpha@sourceware.org
From: Zachary Santer <zsanter@gmail.com>
To: Carl Edquist <edquist@cs.wisc.edu>
Cc: libc-alpha@sourceware.org, coreutils@gnu.org, p@draigbrady.com
Subject: Re: RFE: enable buffering on null-terminated data
Date: Mon, 11 Mar 2024 23:34:39 -0400
Message-ID: <CABkLJULSTemarEOFXj+8gOb4t-+dLYhfdGD1OF0E+zVRo=WQ3A@mail.gmail.com>
In-Reply-To: <8c490a55-598a-adf6-67c2-eb2a6099620a@cs.wisc.edu>

On Mon, Mar 11, 2024 at 7:54 AM Carl Edquist <edquist@cs.wisc.edu> wrote:
>
> (In my coprocess management library, I effectively run every coproc with
> --output=L by default, by eval'ing the output of 'env -i stdbuf -oL env',
> because most of the time for a coprocess, that's whats wanted/necessary.)

Surrounded by 'set -a' and 'set +a', I guess? Now that's interesting.
I just added that to a script of mine that runs another command,
generally a build script, and prints each line of its output to the
terminal, overwriting the same line over and over. I want to see if
it updates more continuously that way.
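
For anyone following along, here is a minimal sketch of how I read that
approach. It assumes GNU coreutils stdbuf, and the variable name stdbuf_env
is my own; it is not taken from Carl's library.

```bash
#!/usr/bin/env bash

# Assumes GNU coreutils stdbuf. Run under 'env -i', the inner env prints
# only the variables stdbuf sets for its child (typically _STDBUF_O and
# LD_PRELOAD), one NAME=VALUE per line.
stdbuf_env=''
if command -v stdbuf > /dev/null 2>&1; then
  stdbuf_env="$(env -i stdbuf --output=L env)"
fi

# With allexport on, every assignment that eval performs is also exported,
# so commands started later from this shell inherit the stdbuf settings.
# Caveat: eval would mis-parse a value containing whitespace or shell
# metacharacters, such as an unusual libstdbuf.so path.
set -a
eval "${stdbuf_env}"
set +a
```

Every dynamically linked child started after this point should then behave
as if it had been run under 'stdbuf --output=L', as long as it uses stdio
and doesn't set its own buffering.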

> ... Although, for your example coprocess use, where the shell both
> produces the input for the coproc and consumes its output, you might be
> able to simplify things by making the producer and consumer separate
> processes.  Then you could do a simpler 'producer | filter | consumer'
> without having to worry about buffering at all.  But if the producer and
> consumer need to be in the same process (eg they share state and are
> logically interdependent), then yeah that's where you need a coprocess for
> the filter.

Yeah, there's really no way to break what I'm doing into a standard pipeline.

> (Although given your time output, you might say the performance hit for
> unbuffered is not that huge.)

We see a somewhat bigger difference, at least proportionally, if we
get bash more or less out of the way. See command-buffering, attached.

Standard:
real    0m0.202s
user    0m0.280s
sys     0m0.076s
Line-buffered:
real    0m0.497s
user    0m0.374s
sys     0m0.545s
Unbuffered:
real    0m0.648s
user    0m0.544s
sys     0m0.702s

In coproc-buffering, unbuffered output was 21.7% slower than
line-buffered output, whereas here it's 30.4% slower.
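
The 30.4% figure is just the ratio of the two 'real' times above. As a
quick check (percent_slower is a hypothetical helper name, and this assumes
awk is available):

```bash
#!/usr/bin/env bash

# percent_slower BASE OTHER: print how much slower OTHER is than BASE,
# as a percentage with one decimal place.
percent_slower() {
  awk -v base="${1}" -v other="${2}" \
    'BEGIN { printf "%.1f\n", (other / base - 1) * 100 }'
}

# Line-buffered (0.497s) versus unbuffered (0.648s) from the run above.
percent_slower 0.497 0.648   # prints 30.4
```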

Of course, line-buffered or unbuffered output makes no sense in this
situation, where the input is a file and the output goes to /dev/null.
It might be useful in a pipeline when an earlier command only prints
things occasionally and you want them transformed and printed to the
terminal immediately.
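
A rough sketch of that kind of pipeline, assuming GNU coreutils and
bash 4.2 or later. slow_producer and demo_pipeline are made-up names
standing in for a real command that prints only now and then:

```bash
#!/usr/bin/env bash

# Hypothetical stand-in for a command that only prints occasionally.
# Argument: seconds to wait after each line.
slow_producer() {
  local i
  for i in 1 2 3; do
    printf 'event\t%d\n' "${i}"
    sleep "${1}"
  done
}

# expand writes to a pipe here, so stdio would normally collect its output
# in a full buffer; --output=L asks it to flush at every newline instead.
demo_pipeline() {
  slow_producer "${1}" |
    stdbuf --output=L -- expand --tabs=8 |
    while IFS= read -r line; do
      # Prefix each line with the time at which it arrived.
      printf '%(%T)T %s\n' -1 "${line}"
    done
}

demo_pipeline 1
```

With the stdbuf wrapper, I'd expect the timestamps to come out about a
second apart. Without it, all three lines would most likely show up
together once the producer exits.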

> So ... again in theory I also feel like a null-terminated buffering mode
> for stdbuf(1) (and setbuf(3)) is kind of a missing feature.

My assumption is that line-buffering through setbuf(3) was implemented
for printing to the terminal, so its availability to stdbuf(1) is
just a useful side effect.

In the BUGS section in the man page for stdbuf(1), we see:

    On GLIBC platforms, specifying a buffer size, i.e., using fully
    buffered mode will result in undefined operation.

If I'm not mistaken, then buffer modes other than 0 and L don't
actually work. Maybe I should count my blessings here. I don't know
what's going on in the background that would explain glibc not
supporting any of that, or stdbuf(1) implementing features that aren't
supported on the vast majority of systems where it will be installed.

> It may just
> be that nobody has actually had a real need for it.  (Yet?)

I imagine if anybody has, they just set --output=0 and moved on. Bash
scripts aren't the fastest thing in the world, anyway.
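
To make that workaround concrete, here is a small sketch with
null-terminated records. It assumes GNU sed, whose --null-data option
treats NUL as the record separator. strip_trailing_blanks_nul is a
hypothetical name, and the sed expression is the one from the attached
script.

```bash
#!/usr/bin/env bash

# Strip trailing blanks from each NUL-terminated record on stdin.
# --output=0 makes sed write each record out right away instead of
# waiting for a buffer to fill, since no NUL-aware buffering mode exists.
strip_trailing_blanks_nul() {
  stdbuf --output=0 -- \
    sed --null-data --regexp-extended --expression='s/[[:blank:]]+$//'
}

# Two sample records with trailing blanks, consumed one at a time.
printf '%s\0' 'one  ' $'two\t' |
  strip_trailing_blanks_nul |
  while IFS= read -r -d '' item; do
    printf '[%s]\n' "${item}"
  done
```

This should print [one] and [two] on separate lines. The cost is the
unbuffered write overhead shown in the timings above.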

[-- Attachment #2: command-buffering --]
[-- Type: application/octet-stream, Size: 887 bytes --]

#!/usr/bin/env bash

set -o nounset -o noglob +o braceexpand
shopt -s lastpipe
export LC_ALL='C.UTF-8'

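# Tab stop width passed to expand.
tab_spaces=8

# Strip trailing blanks from each line (extended regex syntax).
sed_expr='s/[[:blank:]]+$//'

# Sample line with leading, embedded, and trailing blanks.
test=$'  \tLine with tabs\t why?\t  '

# Number of input lines to generate, given as the first argument.
repeat="${1}"

# Write the sample line to tab-input.txt that many times.
for (( i = 0; i < repeat; i++ )); do
  printf '%s\n' "${test}"
done > tab-input.txt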

printf '%s' "Standard:"
time {
  sed --binary --regexp-extended --expression="${sed_expr}" < tab-input.txt |
    expand --tabs="${tab_spaces}" > /dev/null
}

printf '%s' "Line-buffered:"
time {
  stdbuf --output=L -- \
      sed --binary --regexp-extended --expression="${sed_expr}" < tab-input.txt |
    stdbuf --output=L -- \
        expand --tabs="${tab_spaces}" > /dev/null
}

printf '%s' "Unbuffered:"
time {
  stdbuf --output=0 -- \
      sed --binary --regexp-extended --expression="${sed_expr}" < tab-input.txt |
    stdbuf --output=0 -- \
        expand --tabs="${tab_spaces}" > /dev/null
}
