On Sun, Mar 10, 2024 at 4:36 PM Carl Edquist wrote:
>
> Hi Zack,
>
> This sounds like a potentially useful feature (it'd probably belong with a
> corresponding new buffer mode in setbuf(3)) ...
>
> > Filenames should be passed between utilities in a null-terminated
> > fashion, because the null byte is the only byte that can't appear within
> > one.
>
> Out of curiosity, do you have an example command line for your use case?

My use for 'stdbuf --output=L' is to be able to run a command within a
bash coprocess. (Really, a background process communicating with the
parent process through FIFOs, since Bash prints a warning message if
you try to run more than one coprocess at a time. Shouldn't make a
difference here.) See coproc-buffering, attached. Without making the
command's output either line-buffered or unbuffered, what I'm doing
there would deadlock. I feed one line in and then expect to be able to
read a transformed line immediately. If that transformed line is stuck
in a buffer that's still waiting to be filled, then nothing happens.

I swear doing this actually makes sense in my application.

$ ./coproc-buffering 100000
Line-buffered:
real    0m17.795s
user    0m6.234s
sys     0m11.469s
Unbuffered:
real    0m21.656s
user    0m6.609s
sys     0m14.906s

When I initially implemented this thing, I felt lucky that the data I
was passing in were lines ending in newlines, and not null-terminated,
since my script gets to benefit from 'stdbuf --output=L'. Truth be
told, I don't currently have a need for --output=N. Of course, sed and
all sorts of other Linux command-line tools can produce or handle
null-terminated data.

> > If I want to buffer output data on null bytes, the closest I can get is
> > 'stdbuf --output=0', which doesn't buffer at all. This is pretty
> > inefficient.
>
> I'm just thinking that find(1), for instance, will end up calling write(2)
> exactly once per filename (-print or -print0) if run under stdbuf
> unbuffered, which is the same as you'd get with a corresponding stdbuf
> line-buffered mode (newline or null-terminated).
>
> It seems that where line buffering improves performance over unbuffered is
> when there are several calls to (for example) printf(3) in constructing a
> single line. find(1), and some filters like grep(1), will write a line at
> a time in unbuffered mode, and thus don't seem to benefit at all from line
> buffering. On the other hand, cut(1) appears to putchar(3) a byte at a
> time, which in unbuffered mode will (like you say) be pretty inefficient.
>
> So, depending on your use case, a new null-terminated line buffered option
> may or may not actually improve efficiency over unbuffered mode.

I hadn't considered that.

> You can run your commands under strace like
>
>     stdbuf --output=X strace -c -ewrite command ... | ...
>
> to count the number of actual writes for each buffering mode.

I'm running bash in MSYS2 on a Windows machine, so hopefully that
doesn't invalidate any assumptions. Now setting up strace around the
things within the coprocess, and only passing in one line, I now have
coproc-buffering-strace, attached. Giving the argument 'L', both sed
and expand call write() once. Giving the argument 0, sed calls write()
twice and expand calls it a bunch of times, seemingly once for each
character it outputs. So I guess that's it.

$ ./coproc-buffering-strace L
| Line with tabs why?|

$ grep -c -F 'write:' sed-trace.txt expand-trace.txt
sed-trace.txt:1
expand-trace.txt:1

$ ./coproc-buffering-strace 0
| Line with tabs why?|

$ grep -c -F 'write:' sed-trace.txt expand-trace.txt
sed-trace.txt:2
expand-trace.txt:30

> Carl
>
>
> PS, "find -printf" recognizes a '\c' escape to flush the output, in case
> that helps.
> So "find -printf '%p\0\c'" would, for instance, already
> behave the same as "stdbuf --output=N find -print0" with the new stdbuf
> output mode you're suggesting.
>
> (Though again, this doesn't actually seem to be any more efficient than
> running "stdbuf --output=0 find -print0")
>
> On Sun, 10 Mar 2024, Zachary Santer wrote:
>
> > Was "stdbuf feature request - line buffering but for null-terminated data"
> >
> > See below.
> >
> > On Sun, Mar 10, 2024 at 5:38 AM Pádraig Brady wrote:
> >>
> >> On 09/03/2024 16:30, Zachary Santer wrote:
> >>> 'stdbuf --output=L' will line-buffer the command's output stream.
> >>> Pretty useful, but that's looking for newlines. Filenames should be
> >>> passed between utilities in a null-terminated fashion, because the
> >>> null byte is the only byte that can't appear within one.
> >>>
> >>> If I want to buffer output data on null bytes, the closest I can get
> >>> is 'stdbuf --output=0', which doesn't buffer at all. This is pretty
> >>> inefficient.
> >>>
> >>> 0 means unbuffered, and Z is already taken for, I guess, zebibytes.
> >>> --output=N, then?
> >>>
> >>> Would this require a change to libc implementations, or is it possible now?
> >>
> >> This does seem like useful functionality,
> >> but it would require support for libc implementations first.
> >>
> >> cheers,
> >> Pádraig
> >
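
To make the deadlock concern above concrete, here is a minimal sketch of
the one-line-in, one-line-out round trip. It is not the attached
coproc-buffering script: it uses a single bash coprocess rather than
FIFOs, and the names FILTER, start_filter, roundtrip and stop_filter, as
well as the sed expression, are made up for illustration. It assumes
bash 4.1 or later and GNU coreutils stdbuf.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a line-at-a-time exchange with a filter.
# Assumes bash >= 4.1 (coproc, {var}>&-) and GNU coreutils stdbuf.

start_filter() {
    # Line-buffer sed's stdout so each transformed line is written out
    # as soon as its newline is produced. The sed expression is only an
    # illustration: it wraps every line in '|' characters.
    coproc FILTER { stdbuf --output=L sed -e 's/.*/|&|/'; }
    # Copy the descriptors and PID into plain variables, because bash
    # unsets FILTER and FILTER_PID once the coprocess exits.
    filter_out=${FILTER[0]}
    filter_in=${FILTER[1]}
    filter_pid=$FILTER_PID
}

roundtrip() {
    # Send one line to the filter, then block until one line comes back.
    # The result is left in the global variable 'reply'.
    printf '%s\n' "$1" >&"$filter_in"
    IFS= read -r reply <&"$filter_out"
}

stop_filter() {
    # Closing the write end gives sed EOF so that it exits.
    exec {filter_in}>&-
    wait "$filter_pid"
}

start_filter
roundtrip 'first line'
printf '%s\n' "$reply"
roundtrip 'second line'
printf '%s\n' "$reply"
stop_filter
```

Going by the description above, dropping 'stdbuf --output=L' from
start_filter would be expected to leave the first read in roundtrip
blocked, since the reply would sit in sed's output buffer waiting for
more data that never arrives.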