From: Carl Edquist <edquist@cs.wisc.edu>
To: Zachary Santer <zsanter@gmail.com>
Cc: libc-alpha@sourceware.org, coreutils@gnu.org, p@draigbrady.com
Subject: Re: RFE: enable buffering on null-terminated data
Date: Sun, 10 Mar 2024 15:36:32 -0500 (CDT)
Message-ID: <317fe0e2-8cf9-d4ac-ed56-e6ebcc2baa55@cs.wisc.edu>
In-Reply-To: <CABkLJUKdbwP-7Bw5PTXGDh5o9qpX14=7TCxSgd5v+1mDfdoEpQ@mail.gmail.com>
Hi Zack,
This sounds like a potentially useful feature (it'd probably belong with a
corresponding new buffer mode in setbuf(3)) ...
> Filenames should be passed between utilities in a null-terminated
> fashion, because the null byte is the only byte that can't appear within
> one.
Out of curiosity, do you have an example command line for your use case?
> If I want to buffer output data on null bytes, the closest I can get is
> 'stdbuf --output=0', which doesn't buffer at all. This is pretty
> inefficient.
I'm just thinking that find(1), for instance, will end up calling write(2)
exactly once per filename (-print or -print0) if run under stdbuf
unbuffered, which is the same as you'd get with a corresponding stdbuf
line-buffered mode (newline or null-terminated).
It seems that where line buffering improves performance over unbuffered is
when there are several calls to (for example) printf(3) in constructing a
single line. find(1), and some filters like grep(1), will write a line at
a time in unbuffered mode, and thus don't seem to benefit at all from line
buffering. On the other hand, cut(1) appears to putchar(3) a byte at a
time, which in unbuffered mode will (like you say) be pretty inefficient.
So, depending on your use case, a new null-terminated line buffered option
may or may not actually improve efficiency over unbuffered mode.
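As a rough illustration of that distinction, here is a toy model in Python rather than libc code. The class and function names (ModelStream, whole_records, bytewise, count_writes) are made up for this sketch. It assumes that one output call in unbuffered mode turns into exactly one write, and it ignores the buffer's capacity limit.

```python
"""Toy model (not libc code): count write(2)-like calls per buffering policy."""


class ModelStream:
    """Hypothetical stand-in for a stdio output stream."""

    def __init__(self, flush_on=None):
        # flush_on=None   -> unbuffered: every put() goes straight out.
        # flush_on=b"\n"  -> ordinary newline line buffering.
        # flush_on=b"\0"  -> the proposed null-terminated variant.
        self.flush_on = flush_on
        self.pending = bytearray()
        self.writes = 0

    def put(self, data):
        if self.flush_on is None:
            self.writes += 1  # assumption: one call, one write
            return
        self.pending += data
        # Simplification: flush everything once the delimiter shows up.
        if self.flush_on in data:
            self.flush()

    def flush(self):
        if self.pending:
            self.writes += 1
            self.pending.clear()


def whole_records(stream, names, delim):
    """find(1)-like producer: one call per name, terminator included."""
    for name in names:
        stream.put(name + delim)
    stream.flush()


def bytewise(stream, names, delim):
    """Producer that emits one byte per call, like the putchar(3) case."""
    for name in names:
        for byte in name + delim:
            stream.put(bytes([byte]))
    stream.flush()


def count_writes(producer, flush_on, names, delim=b"\0"):
    """Run one producer against one policy and report the write count."""
    stream = ModelStream(flush_on)
    producer(stream, names, delim)
    return stream.writes


if __name__ == "__main__":
    names = [b"a.txt", b"dir/b.txt", b"c"]
    for producer in (whole_records, bytewise):
        for label, flush_on in (("unbuffered", None), ("null-buffered", b"\0")):
            print(producer.__name__, label, count_writes(producer, flush_on, names))
```

With these three names the model reports 3 writes for the whole-record producer in both modes, while the byte-at-a-time producer goes from 18 writes unbuffered to 3 with null-terminated buffering.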
You can run your commands under strace like
    stdbuf --output=X strace -c -ewrite command ... | ...
to count the number of actual writes for each buffering mode.
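A small wrapper along these lines could run that comparison across modes. This is only a sketch: the helper names strace_argv and write_summary are hypothetical, the find invocation is just a placeholder command, and stdout is discarded instead of being piped onward. It spells the filter as "-e trace=write", which is the long form of "-ewrite".

```python
import shutil
import subprocess


def strace_argv(mode, command):
    """Build argv for: stdbuf --output=MODE strace -c -e trace=write COMMAND..."""
    return ["stdbuf", f"--output={mode}", "strace", "-c", "-e", "trace=write", *command]


def write_summary(mode, command):
    """Run the command under stdbuf and strace; return what arrives on stderr.

    strace -c prints its per-syscall summary on stderr, so the command's own
    stderr output (if any) is mixed into the returned text.
    """
    result = subprocess.run(
        strace_argv(mode, command),
        stdout=subprocess.DEVNULL,  # the data itself is not interesting here
        stderr=subprocess.PIPE,
        text=True,
        errors="replace",
        check=False,
    )
    return result.stderr


if __name__ == "__main__":
    # Placeholder command; substitute the pipeline stage you care about.
    command = ["find", ".", "-maxdepth", "1", "-print0"]
    if shutil.which("stdbuf") and shutil.which("strace"):
        for mode in ("0", "L"):
            print(f"--output={mode}")
            print(write_summary(mode, command))
    else:
        print("stdbuf and strace are both needed for this sketch")
```

The "calls" column on the write row of each summary is the number to compare between modes.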
Carl
PS, "find -printf" recognizes a '\c' escape to flush the output, in case
that helps. So "find -printf '%p\0\c'" would, for instance, already
behave the same as "stdbuf --output=N find -print0" with the new stdbuf
output mode you're suggesting.
(Though again, this doesn't actually seem to be any more efficient than
running "stdbuf --output=0 find -print0")
On Sun, 10 Mar 2024, Zachary Santer wrote:
> Was "stdbuf feature request - line buffering but for null-terminated data"
>
> See below.
>
> On Sun, Mar 10, 2024 at 5:38 AM Pádraig Brady <P@draigbrady.com> wrote:
>>
>> On 09/03/2024 16:30, Zachary Santer wrote:
>>> 'stdbuf --output=L' will line-buffer the command's output stream.
>>> Pretty useful, but that's looking for newlines. Filenames should be
>>> passed between utilities in a null-terminated fashion, because the
>>> null byte is the only byte that can't appear within one.
>>>
>>> If I want to buffer output data on null bytes, the closest I can get
>>> is 'stdbuf --output=0', which doesn't buffer at all. This is pretty
>>> inefficient.
>>>
>>> 0 means unbuffered, and Z is already taken for, I guess, zebibytes.
>>> --output=N, then?
>>>
>>> Would this require a change to libc implementations, or is it possible now?
>>
>> This does seem like useful functionality,
>> but it would require support for libc implementations first.
>>
>> cheers,
>> Pádraig
>
>