From: "Pádraig Brady" <P@draigBrady.com>
To: Stephane Chazelas <stephane@chazelas.org>, 61300@debbugs.gnu.org
Subject: bug#61300: wc -c doesn't advance stdin position when it's a regular file
Date: Sun, 5 Feb 2023 19:59:58 +0000
Message-ID: <3fddd0c6-7d8f-631e-f37a-f2635ba0268e@draigBrady.com>
In-Reply-To: <20230205182728.5i2oi23purlzp6jj@chazelas.org>
[-- Attachment #1: Type: text/plain, Size: 1634 bytes --]
On 05/02/2023 18:27, Stephane Chazelas wrote:
> "wc -c" without filename arguments is meant to read stdin until
> EOF and report the number of bytes it has read.
>
> When stdin is on a regular file, GNU wc has an optimisation
> whereby it skips the reading, does pos = lseek(0,0,SEEK_CUR)
> to find out its current position within the file, calls fstat(0)
> and reports st_size - pos (assuming st_size > pos).
>
> However, it does not move the position to the end of the file.
> That means for instance that:
>
> $ echo test > file
> $ { wc -c; wc -c; } < file
> 5
> 5
>
> Instead of 5, then 0. Likewise, a following cat still sees the data:
>
> $ { wc -c; cat; } < file
> 5
> test
>
> So the optimisation is incomplete.
>
> It also reports the size of the file even if it could not possibly read it
> because it's not open in read mode:
>
> $ { wc -c; } 0>> file
> 5
>
> IMO, it should only do the optimisation if:
> - fcntl(F_GETFL) shows that the file is open in O_RDONLY or O_RDWR mode
> - the current checks for /proc- and /sys-like filesystems pass
> - st_size > pos
> - lseek(0,st_size,SEEK_SET) is successful.
>
> (that leaves a race window above where it could move the cursor
> backward, but I think that can be ignored: if something
> else reads at the same time, there's not much we can expect
> anyway).
Yes, I agree.
Adjusting the offset would also avoid the following inconsistencies:
$ { wc -c; wc -c; } < file
5
5
$ { wc -l; wc -l; } < file
1
0
$ truncate -s $(getconf PAGESIZE) file
$ { wc -c; wc -c; } < file
4096
0
Hopefully the attached patch addresses this.
Note that it doesn't add the constraint on the input being readable,
which I'll think about a bit more.
cheers,
Pádraig
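For illustration only: the checks proposed in the report could look roughly like the following C sketch. It is not the coreutils implementation. The helper name count_remaining() is hypothetical, the sketch assumes a POSIX system, and it omits the /proc and /sys filesystem detection that wc performs. When any check fails, it falls back to reading until EOF, which advances the offset as a side effect.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not from coreutils): return the number of bytes
   between the current offset of FD and EOF, leaving the offset at EOF,
   or -1 on a read error.  */
static off_t
count_remaining (int fd)
{
  struct stat st;
  int flags = fcntl (fd, F_GETFL);
  int mode = flags & O_ACCMODE;

  /* Take the shortcut only for regular files opened for reading.  */
  if (0 <= flags && (mode == O_RDONLY || mode == O_RDWR)
      && fstat (fd, &st) == 0 && S_ISREG (st.st_mode))
    {
      off_t pos = lseek (fd, 0, SEEK_CUR);

      /* Skip reading only if data remains and the offset can really be
         moved to EOF.  A concurrent reader could still race with this.  */
      if (0 <= pos && pos < st.st_size
          && lseek (fd, st.st_size, SEEK_SET) == st.st_size)
        return st.st_size - pos;
    }

  /* Fallback: read until EOF, which also advances the offset.  */
  char buf[BUFSIZ];
  off_t total = 0;
  for (;;)
    {
      ssize_t n = read (fd, buf, sizeof buf);
      if (n == 0)
        return total;
      if (n < 0)
        {
          if (errno == EINTR)
            continue;
          return -1;
        }
      total += n;
    }
}
```

Called twice on a descriptor open on the 5-byte file from the report, it should return 5 and then 0, matching what { wc -c; wc -c; } < file is expected to print. On a descriptor opened with O_WRONLY | O_APPEND (as with 0>> file), it skips the shortcut and the fallback read() fails, so it returns -1 instead of reporting the file size.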
[-- Attachment #2: wc-update-offset.patch --]
[-- Type: text/x-patch, Size: 2313 bytes --]
From 42f72ec424e7eecd6b56c5b6fca5f377ff73795b Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?P=C3=A1draig=20Brady?= <P@draigBrady.com>
Date: Sun, 5 Feb 2023 19:52:31 +0000
Subject: [PATCH] wc: ensure we update file offset
* src/wc.c (wc): Update the offset when not reading,
and do read if we can't update the offset.
* tests/misc/wc-proc.sh: Add a test case.
* NEWS: Mention the bug fix.
Fixes https://bugs.gnu.org/61300
---
NEWS | 4 ++++
src/wc.c | 5 ++++-
tests/misc/wc-proc.sh | 12 ++++++++++++
3 files changed, 20 insertions(+), 1 deletion(-)
diff --git a/NEWS b/NEWS
index b3cde4a01..1cea8cc32 100644
--- a/NEWS
+++ b/NEWS
@@ -57,6 +57,10 @@ GNU coreutils NEWS -*- outline -*-
sized files larger than SIZE_MAX.
[bug introduced in coreutils-8.24]
+ `wc -c` will again correctly update the read offset of inputs.
+ Previously it deduced the size of inputs while leaving the offset unchanged.
+ [bug introduced in coreutils-8.27]
+
** Changes in behavior
Programs now support the new Ronna (R), and Quetta (Q) SI prefixes,
diff --git a/src/wc.c b/src/wc.c
index 5f3ef6eee..de04612e9 100644
--- a/src/wc.c
+++ b/src/wc.c
@@ -446,7 +446,10 @@ wc (int fd, char const *file_x, struct fstatus *fstatus, off_t current_pos)
beyond the end of the file. As in the example above. */
bytes = end_pos < current_pos ? 0 : end_pos - current_pos;
- skip_read = true;
+ if (bytes && 0 <= lseek (fd, bytes, SEEK_CUR))
+ skip_read = true;
+ else
+ bytes = 0;
}
else
{
diff --git a/tests/misc/wc-proc.sh b/tests/misc/wc-proc.sh
index 5eb43b982..2b5026405 100755
--- a/tests/misc/wc-proc.sh
+++ b/tests/misc/wc-proc.sh
@@ -42,6 +42,18 @@ cat <<\EOF > exp
EOF
compare exp out || fail=1
+# Ensure we update the offset even when not reading,
+# which wasn't the case from coreutils-8.27 to coreutils-9.1
+{ wc -c; wc -c; } < no_read > out || fail=1
+{ wc -c; wc -c; } < do_read >> out || fail=1
+cat <<\EOF > exp
+2
+0
+1048576
+0
+EOF
+compare exp out || fail=1
+
# Ensure we don't read too much when reading,
# as was the case on 32 bit systems
# from coreutils-8.24 to coreutils-9.1
--
2.26.2