bug-coreutils@gnu.org mirror (unofficial)
From: "Pádraig Brady" <P@draigBrady.com>
To: Stephane Chazelas <stephane@chazelas.org>, 61300@debbugs.gnu.org
Subject: bug#61300: wc -c doesn't advance stdin position when it's a regular file
Date: Sun, 5 Feb 2023 19:59:58 +0000
Message-ID: <3fddd0c6-7d8f-631e-f37a-f2635ba0268e@draigBrady.com>
In-Reply-To: <20230205182728.5i2oi23purlzp6jj@chazelas.org>

On 05/02/2023 18:27, Stephane Chazelas wrote:
> "wc -c" without filename arguments is meant to read stdin until
> EOF and report the number of bytes it has read.
> 
> When stdin is on a regular file, GNU wc has that optimisation
> whereby it skips the reading, does a pos = lseek(0,0,SEEK_CUR)
> to find out its current position within the file, fstat(0) and
> reports st_size - pos (assuming st_size > pos).
> 
> However, it does not move the position to the end of the file.
> That means for instance that:
> 
> $ echo test > file
> $ { wc -c; wc -c; } < file
> 5
> 5
> 
> Instead of 5, then 0. Likewise:
> 
> $ { wc -c; cat; } < file
> 5
> test
> 
> So the optimisation is incomplete.
> 
> It also reports the size of the file even if it could not possibly read it
> because it's not open in read mode:
> 
> { wc -c; } 0>> file
> 5
> 
> IMO, it should only do the optimisation if:
> - fcntl(F_GETFL) shows that the file is opened in O_RDONLY or O_RDWR
> - the current checks for /proc and /sys-like filesystems pass
> - pos < st_size
> - lseek(0,st_size,SEEK_SET) is successful.
> 
> (that leaves a race window above where it could move the cursor
> backward, but I would think that can be ignored as if something
> else reads at the same time, there's not much we can expect
> anyway).

Yes I agree.
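
As a rough illustration of the conditions listed above (this is not the
attached patch), a seek-based count could be sketched like this in C.
The helper name count_by_seeking is made up for this example, and the
access-mode and /proc-style filesystem checks are left out:

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper, not the code in src/wc.c.
   Return the number of bytes between the current offset of FD and the
   end of the file, and leave the offset at the end of the file, as if
   the data had been read.  Return -1 if the shortcut does not apply,
   in which case the caller should fall back to read().  */
static off_t
count_by_seeking (int fd)
{
  struct stat st;
  if (fstat (fd, &st) != 0 || !S_ISREG (st.st_mode))
    return -1;

  off_t pos = lseek (fd, 0, SEEK_CUR);
  if (pos < 0 || st.st_size <= pos)
    return -1;                  /* At or past EOF: let read() report 0.  */

  /* Only trust the computed size if the offset can really be moved.  */
  if (lseek (fd, st.st_size, SEEK_SET) < 0)
    return -1;

  return st.st_size - pos;
}
```

A second call on the same descriptor then finds the offset at the end
of the file and returns -1, so a fallback read() would report 0 bytes.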

Adjusting would also avoid the following inconsistencies:

$ { wc -c; wc -c; } < file
5
5

$ { wc -l; wc -l; } < file
1
0

$ truncate -s $(getconf PAGESIZE) file
$ { wc -c; wc -c; } < file
4096
0

Hopefully the attached addresses this.
Note it doesn't add the constraint on the input being readable,
which I'll think a bit more about.
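
For reference, the readability constraint could be checked with
fcntl(F_GETFL), as suggested in the report.  A minimal sketch, where
fd_is_readable is a hypothetical name and not something in src/wc.c:

```c
#include <fcntl.h>
#include <stdbool.h>

/* Hypothetical helper: return true if FD was opened with read access,
   i.e. its access mode is O_RDONLY or O_RDWR.  */
static bool
fd_is_readable (int fd)
{
  int flags = fcntl (fd, F_GETFL);
  if (flags < 0)
    return false;               /* Invalid descriptor or other error.  */

  int mode = flags & O_ACCMODE;
  return mode == O_RDONLY || mode == O_RDWR;
}
```

With a check like this, the "{ wc -c; } 0>> file" case would skip the
size shortcut and fall back to read(), which fails on a write-only
descriptor.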

cheers,
Pádraig

[-- Attachment #2: wc-update-offset.patch --]
[-- Type: text/x-patch, Size: 2313 bytes --]

From 42f72ec424e7eecd6b56c5b6fca5f377ff73795b Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?P=C3=A1draig=20Brady?= <P@draigBrady.com>
Date: Sun, 5 Feb 2023 19:52:31 +0000
Subject: [PATCH] wc: ensure we update file offset

* src/wc.c (wc): Update the offset when not reading,
and do read if we can't update the offset.
* tests/misc/wc-proc.sh: Add a test case.
* NEWS: Mention the bug fix.
Fixes https://bugs.gnu.org/61300
---
 NEWS                  |  4 ++++
 src/wc.c              |  5 ++++-
 tests/misc/wc-proc.sh | 12 ++++++++++++
 3 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/NEWS b/NEWS
index b3cde4a01..1cea8cc32 100644
--- a/NEWS
+++ b/NEWS
@@ -57,6 +57,10 @@ GNU coreutils NEWS                                    -*- outline -*-
   sized files larger than SIZE_MAX.
   [bug introduced in coreutils-8.24]
 
+  `wc -c` will again correctly update the read offset of inputs.
+  Previously it deduced the size of inputs while leaving the offset unchanged.
+  [bug introduced in coreutils-8.27]
+
 ** Changes in behavior
 
   Programs now support the new Ronna (R), and Quetta (Q) SI prefixes,
diff --git a/src/wc.c b/src/wc.c
index 5f3ef6eee..de04612e9 100644
--- a/src/wc.c
+++ b/src/wc.c
@@ -446,7 +446,10 @@ wc (int fd, char const *file_x, struct fstatus *fstatus, off_t current_pos)
                  beyond the end of the file.  As in the example above.  */
 
               bytes = end_pos < current_pos ? 0 : end_pos - current_pos;
-              skip_read = true;
+              if (bytes && 0 <= lseek (fd, bytes, SEEK_CUR))
+                skip_read = true;
+              else
+                bytes = 0;
             }
           else
             {
diff --git a/tests/misc/wc-proc.sh b/tests/misc/wc-proc.sh
index 5eb43b982..2b5026405 100755
--- a/tests/misc/wc-proc.sh
+++ b/tests/misc/wc-proc.sh
@@ -42,6 +42,18 @@ cat <<\EOF > exp
 EOF
 compare exp out || fail=1
 
+# Ensure we update the offset even when not reading,
+# which wasn't the case from coreutils-8.27 to coreutils-9.1
+{ wc -c; wc -c; } < no_read >  out || fail=1
+{ wc -c; wc -c; } < do_read >> out || fail=1
+cat <<\EOF > exp
+2
+0
+1048576
+0
+EOF
+compare exp out || fail=1
+
 # Ensure we don't read too much when reading,
 # as was the case on 32 bit systems
 # from coreutils-8.24 to coreutils-9.1
-- 
2.26.2

