From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on starla X-Spam-Level: X-Spam-Status: No, score=-0.9 required=3.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_HELO_NONE, SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 Received: from server2.sourceware.org (server2.sourceware.org [8.43.85.97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by dcvr.yhbt.net (Postfix) with ESMTPS id 436501F44D for ; Mon, 11 Mar 2024 03:48:53 +0000 (UTC) Authentication-Results: dcvr.yhbt.net; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.a=rsa-sha256 header.s=20230601 header.b=T5BKrK67; dkim-atps=neutral Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id A68933857703 for ; Mon, 11 Mar 2024 03:48:51 +0000 (GMT) Received: from mail-lj1-x22c.google.com (mail-lj1-x22c.google.com [IPv6:2a00:1450:4864:20::22c]) by sourceware.org (Postfix) with ESMTPS id 7F58E3858422 for ; Mon, 11 Mar 2024 03:48:28 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 7F58E3858422 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 7F58E3858422 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2a00:1450:4864:20::22c ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1710128912; cv=none; b=qOj4Nu17zLRo3wJYI8IYES4lh7IYqrfVD3Jw3tQ7oL/PTwQdHxj3ruKyMMHu2sUCxKPOPFL9/+Xdi28oCmRizpgWhYLVFSMd4URj05NdXdqylnaM/zuz5+5z9GT0eavtdSgb7892vxjWsRWn2hegkk9l4VfmMd90PC12Nxhxdjo= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; 
s=key; t=1710128912; c=relaxed/simple; bh=P3IGy3znkPW3IPC+5D+4qO96AM6NzYEPlvlFop4DP2k=; h=DKIM-Signature:MIME-Version:From:Date:Message-ID:Subject:To; b=xFGZTQeC/oaFQIrXJL66W8pXSGVhZPcc98bTfi1MVvcu3VBLShRxhzsTIDkUWyNMnUKTlr5J/aogvFOd5KFWhHNkQVByw06rPZ8haOXmGS/C/MA0XizLP38z3mj0dI0MiKjLA+DJ0XbCiwBK9V+27zZ+Lc/roXzv6fXg3G17luE= ARC-Authentication-Results: i=1; server2.sourceware.org Received: by mail-lj1-x22c.google.com with SMTP id 38308e7fff4ca-2d311081954so45756871fa.2 for ; Sun, 10 Mar 2024 20:48:28 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1710128907; x=1710733707; darn=sourceware.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=XHHa9aAXlfsSOKd/fg57wR9xczaOMZAmU6/22c5Eti4=; b=T5BKrK67q5P1YaHdY4YPhQrdXcgpx1GW75dBer4a0LQcEwm95a2/EJJ1t4DODGp5WO vtlFCnWgEwD0zAI00VWy3z25kN8ET3w660SSfvmU1Y1zBjqNpF1MgqxStZLxKQ05oS1W kqX0yHctRKIqyVa+wMJeqDHSTn+AO7VeS+dqfsel4ImQCV2HjiqjWMBpdC/uwWQmkP5l 2g9DhyRq5+7P0VxwVXTav1au0/qpy6sCwZ+si0Tp1MW2ujjDvmNeSYFPtdjzsMXYcnv6 Ng9W27t87e8GXGHRSmOPLHKmJ2578Ojz8aDbCvrsjN0opwHboZ8l0BCPgt3UBWW+MiSa 6uqA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1710128907; x=1710733707; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=XHHa9aAXlfsSOKd/fg57wR9xczaOMZAmU6/22c5Eti4=; b=pEnOymtuDubs1Oi8Wj8+rrQzMTVwmk/jEE8boEmlV27htnkr3wN2Y6soOBbnqVFye5 QiJSnX8OYUzDS0PpJMkUICpIT6B7jleuLRmlJr/srMh7p24H5xZJPNc906YECACDFHS/ kebm+NFh5rQ0xvn4uQl9891C8a7DpW4k3r7q9v7Dtxzu32o8AWLM2+qMCr1BpEDyshuZ 5wsQRCrkVYSC63qzAd0MeImeWPh+UPSHwdk/G6j0CGMjmsSq9ijrxYMS1K3jBT4l9/1G hIlydjHPCbSFmu5yzYUIjQOCt2d5gdaOMxbcutLiEzqVN3va7c5ThNh4DCKrT4aBZYxl TBng== X-Gm-Message-State: AOJu0Yw/zAYnIOqFGxsBJd01WpSJO/KXLord2PtqKT5XT0ZzTjiMGD95 kBCf6paz5leR8q1m3crRal+iWQ1nnn+fc4A9+GXAbnRcd2BqM6jGiRUWvpaTLmQpd8JZj4oeY5K 
A9ZVPkeZsZNDZq9g6VkifsipXK9Y=
X-Google-Smtp-Source: AGHT+IEDUPnK5hUBfPVAqivDdHhcqc0ZjPKCUYQpdTBP4D0In2nRSbbvGpkCJQz6pv3OpfbbZaFmw5QrYPubFTquOww=
X-Received: by 2002:a05:6512:480f:b0:513:75a5:2da8 with SMTP id eo15-20020a056512480f00b0051375a52da8mr3358496lfb.27.1710128906680; Sun, 10 Mar 2024 20:48:26 -0700 (PDT)
MIME-Version: 1.0
References: <9831afe6-958a-fbd3-9434-05dd0c9b602a@draigBrady.com> <317fe0e2-8cf9-d4ac-ed56-e6ebcc2baa55@cs.wisc.edu>
In-Reply-To: <317fe0e2-8cf9-d4ac-ed56-e6ebcc2baa55@cs.wisc.edu>
From: Zachary Santer
Date: Sun, 10 Mar 2024 23:48:12 -0400
Message-ID:
Subject: Re: RFE: enable buffering on null-terminated data
To: Carl Edquist
Cc: libc-alpha@sourceware.org, coreutils@gnu.org, p@draigbrady.com
Content-Type: multipart/mixed; boundary="0000000000000167e706135a6a37"
X-BeenThere: libc-alpha@sourceware.org
X-Mailman-Version: 2.1.30
Precedence: list
List-Id: Libc-alpha mailing list
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: libc-alpha-bounces+e=80x24.org@sourceware.org

--0000000000000167e706135a6a37
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

On Sun, Mar 10, 2024 at 4:36 PM Carl Edquist wrote:
>
> Hi Zack,
>
> This sounds like a potentially useful feature (it'd probably belong with a
> corresponding new buffer mode in setbuf(3)) ...
>
> > Filenames should be passed between utilities in a null-terminated
> > fashion, because the null byte is the only byte that can't appear within
> > one.
>
> Out of curiosity, do you have an example command line for your use case?

My use for 'stdbuf --output=L' is to be able to run a command within a
bash coprocess. (Really, a background process communicating with the
parent process through FIFOs, since Bash prints a warning message if you
try to run more than one coprocess at a time. Shouldn't make a
difference here.) See coproc-buffering, attached.
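
A minimal sketch of that round trip, written against FIFOs rather than
coproc so that it also runs in plain sh. It assumes GNU sed and coreutils
stdbuf. The FIFO paths, the round_trip helper, and the sample input are
illustrative and are not taken from the attached script:

```sh
#!/bin/sh
# One line in, one line out, through a background filter on two FIFOs.
# Assumes GNU sed and coreutils stdbuf; all names here are illustrative.

dir=$(mktemp -d) || exit 1
mkfifo "$dir/in" "$dir/out" || exit 1

# Line-buffer sed's stdout so each transformed line is written as soon as
# it is complete instead of waiting in a block buffer.
stdbuf --output=L -- sed -E 's/[[:blank:]]+$//' <"$dir/in" >"$dir/out" &
filter_pid=$!

# Keep both FIFOs open for the whole session: fd 3 feeds the filter,
# fd 4 reads its output.
exec 3>"$dir/in" 4<"$dir/out"

cleanup() {
  exec 3>&- 4<&-        # closing fd 3 gives the filter end-of-file
  wait "$filter_pid"
  rm -rf "$dir"
}
trap cleanup EXIT

round_trip() {
  # Feed one line to the filter, then read the transformed line back.
  printf '%s\n' "$1" >&3
  IFS= read -r reply <&4 || return 1
  printf '|%s|\n' "$reply"
}

round_trip 'trailing blanks   '
```

With the stdbuf wrapper removed, sed's stdout is a FIFO and therefore
block-buffered, so the read in round_trip would wait for output that is
still sitting in sed's buffer.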

Without making the command's output either line-buffered or unbuffered,
what I'm doing there would deadlock. I feed one line in and then expect
to be able to read a transformed line immediately. If that transformed
line is stuck in a buffer that's still waiting to be filled, then
nothing happens.

I swear doing this actually makes sense in my application.

$ ./coproc-buffering 100000
Line-buffered:
real 0m17.795s
user 0m6.234s
sys 0m11.469s
Unbuffered:
real 0m21.656s
user 0m6.609s
sys 0m14.906s

When I initially implemented this thing, I felt lucky that the data I
was passing in were lines ending in newlines, and not null-terminated,
since my script gets to benefit from 'stdbuf --output=L'. Truth be told,
I don't currently have a need for --output=N. Of course, sed and all
sorts of other Linux command-line tools can produce or handle
null-terminated data.

> > If I want to buffer output data on null bytes, the closest I can get is
> > 'stdbuf --output=0', which doesn't buffer at all. This is pretty
> > inefficient.
>
> I'm just thinking that find(1), for instance, will end up calling write(2)
> exactly once per filename (-print or -print0) if run under stdbuf
> unbuffered, which is the same as you'd get with a corresponding stdbuf
> line-buffered mode (newline or null-terminated).
>
> It seems that where line buffering improves performance over unbuffered is
> when there are several calls to (for example) printf(3) in constructing a
> single line. find(1), and some filters like grep(1), will write a line at
> a time in unbuffered mode, and thus don't seem to benefit at all from line
> buffering. On the other hand, cut(1) appears to putchar(3) a byte at a
> time, which in unbuffered mode will (like you say) be pretty inefficient.
>
> So, depending on your use case, a new null-terminated line buffered option
> may or may not actually improve efficiency over unbuffered mode.

I hadn't considered that.
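
As a reminder of why null-terminated records come up for filenames at
all, here is a small sketch that counts the records find(1) emits for a
single file whose name contains a newline. It assumes a find that
supports -print0 (GNU or BSD). The temporary directory and the file name
are made up for illustration:

```sh
#!/bin/sh
# Why filenames travel more safely as null-terminated records.
# Assumes find with -print0 (GNU or BSD); the file name is made up.

dir=$(mktemp -d) || exit 1
trap 'rm -rf "$dir"' EXIT

# Create one file whose name contains a newline.
name=$(printf 'two\nlines')
: > "$dir/$name"

# Newline-delimited output makes the single file look like two records.
by_newline=$(find "$dir" -type f -print | wc -l)

# Null-terminated output: keep only the terminators and count them,
# one per file.
by_nul=$(find "$dir" -type f -print0 | tr -cd '\0' | wc -c)

printf 'newline-delimited records: %d\n' "$by_newline"
printf 'null-terminated records:   %d\n' "$by_nul"
```

A consumer that splits on newlines sees two records for the one file,
while a consumer that splits on null bytes sees one.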

> You can run your commands under strace like
>
>     stdbuf --output=X strace -c -ewrite command ... | ...
>
> to count the number of actual writes for each buffering mode.

I'm running bash in MSYS2 on a Windows machine, so hopefully that
doesn't invalidate any assumptions. Setting up strace around the things
within the coprocess, and passing in only one line, I now have
coproc-buffering-strace, attached. Giving the argument 'L', both sed and
expand call write() once. Giving the argument 0, sed calls write() twice
and expand calls it a bunch of times, seemingly once for each character
it outputs. So I guess that's it.

$ ./coproc-buffering-strace L
| Line with tabs why?|
$ grep -c -F 'write:' sed-trace.txt expand-trace.txt
sed-trace.txt:1
expand-trace.txt:1
$ ./coproc-buffering-strace 0
| Line with tabs why?|
$ grep -c -F 'write:' sed-trace.txt expand-trace.txt
sed-trace.txt:2
expand-trace.txt:30

> Carl
>
>
> PS, "find -printf" recognizes a '\c' escape to flush the output, in case
> that helps. So "find -printf '%p\0\c'" would, for instance, already
> behave the same as "stdbuf --output=N find -print0" with the new stdbuf
> output mode you're suggesting.
>
> (Though again, this doesn't actually seem to be any more efficient than
> running "stdbuf --output=0 find -print0")
>
> On Sun, 10 Mar 2024, Zachary Santer wrote:
>
> > Was "stdbuf feature request - line buffering but for null-terminated data"
> >
> > See below.
> >
> > On Sun, Mar 10, 2024 at 5:38 AM Pádraig Brady wrote:
> >>
> >> On 09/03/2024 16:30, Zachary Santer wrote:
> >>> 'stdbuf --output=L' will line-buffer the command's output stream.
> >>> Pretty useful, but that's looking for newlines. Filenames should be
> >>> passed between utilities in a null-terminated fashion, because the
> >>> null byte is the only byte that can't appear within one.
> >>>
> >>> If I want to buffer output data on null bytes, the closest I can get
> >>> is 'stdbuf --output=0', which doesn't buffer at all. This is pretty
> >>> inefficient.
> >>>
> >>> 0 means unbuffered, and Z is already taken for, I guess, zebibytes.
> >>> --output=N, then?
> >>>
> >>> Would this require a change to libc implementations, or is it possible now?
> >>
> >> This does seem like useful functionality,
> >> but it would require support for libc implementations first.
> >>
> >> cheers,
> >> Pádraig
> >
> >

--0000000000000167e706135a6a37
Content-Type: application/octet-stream; name=coproc-buffering
Content-Disposition: attachment; filename=coproc-buffering
Content-Transfer-Encoding: base64
Content-ID:
X-Attachment-Id: f_ltmdflmy0

IyEvdXNyL2Jpbi9lbnYgYmFzaAoKc2V0IC1vIG5vdW5zZXQgLW8gbm9nbG9iICtvIGJyYWNlZXhw YW5kCnNob3B0IC1zIGxhc3RwaXBlCmV4cG9ydCBMQ19BTEw9J0MuVVRGLTgnCgp0YWJfc3BhY2Vz PTgKCnNlZF9leHByPSdzL1tbOmJsYW5rOl1dKyQvLycKCnRlc3Q9JCcgIFx0TGluZSB3aXRoIHRh YnNcdCB3aHk/XHQgICcKCnJlcGVhdD0iJHsxfSIKCmNvcHJvYyBsaW5lX2J1ZmZlcmVkIHsKICBz dGRidWYgLS1vdXRwdXQ9TCAtLSBcCiAgICAgIHNlZCAtLWJpbmFyeSAtLXJlZ2V4cC1leHRlbmRl ZCAtLWV4cHJlc3Npb249IiR7c2VkX2V4cHJ9IiB8CiAgICBzdGRidWYgLS1vdXRwdXQ9TCAtLSBc CiAgICAgICAgZXhwYW5kIC0tdGFicz0iJHt0YWJfc3BhY2VzfSIKfQoKcHJpbnRmICclcycgIkxp bmUtYnVmZmVyZWQ6Igp0aW1lIHsKICBmb3IgKCggaSA9IDA7IGkgPCByZXBlYXQ7IGkrKyApKTsg ZG8KICAgIHByaW50ZiAnJXNcbicgIiR7dGVzdH0iID4mIiR7bGluZV9idWZmZXJlZFsxXX0iCiAg ICBJRlM9JycgcmVhZCAtciBsaW5lIDwmIiR7bGluZV9idWZmZXJlZFswXX0iCiAgICBwcmludGYg J3wlc3xcbicgIiR7bGluZX0iID4gL2Rldi9udWxsCiAgZG9uZQp9CgpleGVjIHtsaW5lX2J1ZmZl cmVkWzBdfTwmLSB7bGluZV9idWZmZXJlZFsxXX0+Ji0Kd2FpdCAiJHtsaW5lX2J1ZmZlcmVkX1BJ RH0iCgpjb3Byb2MgdW5idWZmZXJlZCB7CiAgc3RkYnVmIC0tb3V0cHV0PTAgLS0gXAogICAgICBz ZWQgLS1iaW5hcnkgLS1yZWdleHAtZXh0ZW5kZWQgLS1leHByZXNzaW9uPSIke3NlZF9leHByfSIg fAogICAgc3RkYnVmIC0tb3V0cHV0PTAgLS0gXAogICAgICAgIGV4cGFuZCAtLXRhYnM9IiR7dGFi X3NwYWNlc30iCn0KCnByaW50ZiAnJXMnICJVbmJ1ZmZlcmVkOiIKdGltZSB7CiAgZm9yICgoIGkg
PSAwOyBpIDwgcmVwZWF0OyBpKysgKSk7IGRvCiAgICBwcmludGYgJyVzXG4nICIke3Rlc3R9IiA+ JiIke3VuYnVmZmVyZWRbMV19IgogICAgSUZTPScnIHJlYWQgLXIgbGluZSA8JiIke3VuYnVmZmVy ZWRbMF19IgogICAgcHJpbnRmICd8JXN8XG4nICIke2xpbmV9IiA+IC9kZXYvbnVsbAogIGRvbmUK fQoKZXhlYyB7dW5idWZmZXJlZFswXX08Ji0ge3VuYnVmZmVyZWRbMV19PiYtCndhaXQgIiR7dW5i dWZmZXJlZF9QSUR9Igo= --0000000000000167e706135a6a37 Content-Type: application/octet-stream; name=coproc-buffering-strace Content-Disposition: attachment; filename=coproc-buffering-strace Content-Transfer-Encoding: base64 Content-ID: X-Attachment-Id: f_ltmdgrw81 IyEvdXNyL2Jpbi9lbnYgYmFzaAoKc2V0IC1vIG5vdW5zZXQgLW8gbm9nbG9iICtvIGJyYWNlZXhw YW5kCnNob3B0IC1zIGxhc3RwaXBlCmV4cG9ydCBMQ19BTEw9J0MuVVRGLTgnCgp0YWJfc3BhY2Vz PTgKCnNlZF9leHByPSdzL1tbOmJsYW5rOl1dKyQvLycKCnRlc3Q9JCcgIFx0TGluZSB3aXRoIHRh YnNcdCB3aHk/XHQgICcKCmJ1ZmZlcl9zZXR0aW5nPSIkezF9IgoKY29wcm9jIGJ1ZmZlcl90ZXN0 IHsKICBzdGRidWYgLS1vdXRwdXQ9IiR7YnVmZmVyX3NldHRpbmd9IiAtLSBcCiAgICAgIHN0cmFj ZSAtZSAtbyBzZWQtdHJhY2UudHh0IFwKICAgICAgc2VkIC0tYmluYXJ5IC0tcmVnZXhwLWV4dGVu ZGVkIC0tZXhwcmVzc2lvbj0iJHtzZWRfZXhwcn0iIHwKICAgc3RkYnVmIC0tb3V0cHV0PSIke2J1 ZmZlcl9zZXR0aW5nfSIgLS0gXAogICAgICAgc3RyYWNlIC1lIC1vIGV4cGFuZC10cmFjZS50eHQg XAogICAgICAgZXhwYW5kIC0tdGFicz0iJHt0YWJfc3BhY2VzfSIKfQoKcHJpbnRmICclc1xuJyAi JHt0ZXN0fSIgPiYiJHtidWZmZXJfdGVzdFsxXX0iCklGUz0nJyByZWFkIC1yIGxpbmUgPCYiJHti dWZmZXJfdGVzdFswXX0iCnByaW50ZiAnfCVzfFxuJyAiJHtsaW5lLy8kJ1x0Jy9UQUJ9IgoKZXhl YyB7YnVmZmVyX3Rlc3RbMF19PCYtIHtidWZmZXJfdGVzdFsxXX0+Ji0Kd2FpdCAiJHtidWZmZXJf dGVzdF9QSUR9Igo= --0000000000000167e706135a6a37--