This reverts commit 8ece3b3eb576a78d2e67ad4c3a80a39fa6708809.
This commit broke userspace. Bash uses ESPIPE to determine whether or
not the file should be read using "unbuffered I/O", which means reading
1 byte at a time instead of 128 bytes at a time. I used to use bash to
read through kmsg in a really quite nasty way:
while read -t 0.1 -r line 2>/dev/null || [[ $? -ne 142 ]]; do
echo "SARU $line"
done < /dev/kmsg
This will show all lines that can fit into the 128 byte buffer, and skip
lines that don't. That's pretty awful, but at least it worked.
With this change, bash now tries to do 1-byte reads, which means it
skips all the lines, which is worse than before.
Now, I don't really care very much about this, and I'm already look for
a workaround. But I did just spend an hour trying to figure out why my
scripts were broken. Either way, it makes no difference to me personally
whether this is reverted, but it might be something to consider. If you
declare that "trying to read /dev/kmsg with bash is terminally stupid
anyway," I might be inclined to agree with you. But do note that bash
uses lseek(fd, 0, SEEK_CUR)==>ESPIPE to determine whether or not it's
reading from a pipe.
Cc: Bruno Meneguele <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: Linus Torvalds <[email protected]>
Cc: Sergey Senozhatsky <[email protected]>
Cc: Petr Mladek <[email protected]>
Signed-off-by: Jason A. Donenfeld <[email protected]>
---
Documentation/ABI/testing/dev-kmsg | 5 -----
kernel/printk/printk.c | 10 ----------
2 files changed, 15 deletions(-)
diff --git a/Documentation/ABI/testing/dev-kmsg b/Documentation/ABI/testing/dev-kmsg
index 1e6c28b1942b..f307506eb54c 100644
--- a/Documentation/ABI/testing/dev-kmsg
+++ b/Documentation/ABI/testing/dev-kmsg
@@ -56,11 +56,6 @@ Description: The /dev/kmsg character device node provides userspace access
seek after the last record available at the time
the last SYSLOG_ACTION_CLEAR was issued.
- Due to the record nature of this interface with a "read all"
- behavior and the specific positions each seek operation sets,
- SEEK_CUR is not supported, returning -ESPIPE (invalid seek) to
- errno whenever requested.
-
The output format consists of a prefix carrying the syslog
prefix including priority and facility, the 64 bit message
sequence number and the monotonic timestamp in microseconds,
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 8c14835be46c..b71eaf5f5a86 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -974,16 +974,6 @@ static loff_t devkmsg_llseek(struct file *file, loff_t offset, int whence)
user->idx = log_next_idx;
user->seq = log_next_seq;
break;
- case SEEK_CUR:
- /*
- * It isn't supported due to the record nature of this
- * interface: _SET _DATA and _END point to very specific
- * record positions, while _CUR would be more useful in case
- * of a byte-based log. Because of that, return the default
- * errno value for invalid seek operation.
- */
- ret = -ESPIPE;
- break;
default:
ret = -EINVAL;
}
--
2.27.0
On Sun, Jun 21, 2020 at 8:02 PM Jason A. Donenfeld <[email protected]> wrote:
>
> This reverts commit 8ece3b3eb576a78d2e67ad4c3a80a39fa6708809.
>
> This commit broke userspace. Bash uses ESPIPE to determine whether or
> not the file should be read using "unbuffered I/O", which means reading
> 1 byte at a time instead of 128 bytes at a time.
Ack. Somewhat odd behavior, but clearly user space depended on the
legacy "return EINVAL rather than ESPIPE" behavior.
I do think there are other reasons to not return ESPIPE. The fact is,
the printk buffer _is_ seekable, it's just relative seeking that
doesn't work. You can seek to the beginning and the end and a
particular offset, just not relative.
So I kind of see why people wanted to return ESPIPE, but at the same
time it really is very misleading: ESPIPE is for pure streams that
can't lseek at all.
The fact that some silly shell internal then reacts very badly to that
may be extreme, but it may be as well as you can do it you worry about
leaving data for the next user.
I've applied the revert.
Linus
On Sun, Jun 21, 2020 at 08:50:09PM -0700, Linus Torvalds wrote:
> On Sun, Jun 21, 2020 at 8:02 PM Jason A. Donenfeld <[email protected]> wrote:
> >
> > This reverts commit 8ece3b3eb576a78d2e67ad4c3a80a39fa6708809.
> >
> > This commit broke userspace. Bash uses ESPIPE to determine whether or
> > not the file should be read using "unbuffered I/O", which means reading
> > 1 byte at a time instead of 128 bytes at a time.
>
> Ack. Somewhat odd behavior, but clearly user space depended on the
> legacy "return EINVAL rather than ESPIPE" behavior.
>
> I do think there are other reasons to not return ESPIPE. The fact is,
> the printk buffer _is_ seekable, it's just relative seeking that
> doesn't work. You can seek to the beginning and the end and a
> particular offset, just not relative.
>
> So I kind of see why people wanted to return ESPIPE, but at the same
> time it really is very misleading: ESPIPE is for pure streams that
> can't lseek at all.
That was indeed a misunderstanding of mine wrt ESPIPE meaning.
And I agree with your previous paragraph where you stated that the
buffer is only not "relative" seekable. So, ack for the revert.
However, the issue with glibc is their fd checking on dprintf using:
lseek(offset == 0, whence == SEEK_CUR)
Which, technically, isn't a relative seek operation in my opinion, thus
I'm also not sure that returning EINVAL is correct.
Would it make sense to return the next buffer index instead? Basically
the same as SEEK_END does? The first "if (offset)" in the function would
prevent any real relative move while SEEK_CUR would return a valid
address following this buffer behavior of specific points it could seek
to.
>
> The fact that some silly shell internal then reacts very badly to that
> may be extreme, but it may be as well as you can do it you worry about
> leaving data for the next user.
>
> I've applied the revert.
>
> Linus
>
--
bmeneg
PGP Key: http://bmeneg.com/pubkey.txt
On Mon, Jun 22, 2020 at 6:38 AM Bruno Meneguele <[email protected]> wrote:
>
> However, the issue with glibc is their fd checking on dprintf using:
>
> lseek(offset == 0, whence == SEEK_CUR)
>
> Which, technically, isn't a relative seek operation in my opinion, thus
> I'm also not sure that returning EINVAL is correct.
Well, I'm not sure there is a "correct". Normal file descriptors are
seekable or not, this is kind of a special one. It's not like you can
read it byte for byte anyway.
There is a "historical behavior".
> Would it make sense to return the next buffer index instead? Basically
> the same as SEEK_END does? The first "if (offset)" in the function would
> prevent any real relative move while SEEK_CUR would return a valid
> address following this buffer behavior of specific points it could seek
> to.
Maybe. At the same time, the way we don't actually return a real
position means that that's very dangerous too. We'll always return
"we're at position zero".
And we never accept byte-by-byte reads and require a "get the whole
record" model.
So I think we might as well accept "kmsg is special".
I don't have hugely strong opinions on it - I certainly agree that
"SEEK_CUR with offset zero could be a no-op", but I also don't think
there's a huge reason to try to change it, considering just _how_
special kmsg is.
Linus
On Mon, Jun 22, 2020 at 09:42:25AM -0700, Linus Torvalds wrote:
> > Would it make sense to return the next buffer index instead? Basically
> > the same as SEEK_END does? The first "if (offset)" in the function would
> > prevent any real relative move while SEEK_CUR would return a valid
> > address following this buffer behavior of specific points it could seek
> > to.
>
> Maybe. At the same time, the way we don't actually return a real
> position means that that's very dangerous too. We'll always return
> "we're at position zero".
>
> And we never accept byte-by-byte reads and require a "get the whole
> record" model.
>
> So I think we might as well accept "kmsg is special".
>
> I don't have hugely strong opinions on it - I certainly agree that
> "SEEK_CUR with offset zero could be a no-op", but I also don't think
> there's a huge reason to try to change it, considering just _how_
> special kmsg is.
Although both options are pretty fine by me too, I "fear" (not really)
we can end up stacking special behavior interfaces, forcing userspace to
keep a "table of special case files". Personally, I prefer to return
something _valid_ to userspace rather than _fail_ with special meaning.
But in any case I think it's worth adding a note in the docs just to
make sure we have somewhere to point in case they start looking.
Thanks Linus! Will wait some more in case we have other thoughts around
it before posting anything (doc patch or the other approach).
--
bmeneg
PGP Key: http://bmeneg.com/pubkey.txt
On Mon, Jun 22, 2020 at 10:10 AM Bruno Meneguele <[email protected]> wrote:
>
> Although both options are pretty fine by me too, I "fear" (not really)
> we can end up stacking special behavior interfaces, forcing userspace to
> keep a "table of special case files". Personally, I prefer to return
> something _valid_ to userspace rather than _fail_ with special meaning.
Well, what's even worse if user space has to handle two different
things just because behavior changed in different kernels..
So at some point "odd behavior" is much better than "odd behavior in
two different ways"..
Linus
On Mon, Jun 22, 2020 at 10:32:51AM -0700, Linus Torvalds wrote:
> On Mon, Jun 22, 2020 at 10:10 AM Bruno Meneguele <[email protected]> wrote:
> >
> > Although both options are pretty fine by me too, I "fear" (not really)
> > we can end up stacking special behavior interfaces, forcing userspace to
> > keep a "table of special case files". Personally, I prefer to return
> > something _valid_ to userspace rather than _fail_ with special meaning.
>
> Well, what's even worse if user space has to handle two different
> things just because behavior changed in different kernels..
>
> So at some point "odd behavior" is much better than "odd behavior in
> two different ways"..
Really well pointed!
Ok, +1 for doc-only patch it is :).
--
bmeneg
PGP Key: http://bmeneg.com/pubkey.txt
On Sun, Jun 21, 2020 at 9:02 PM Jason A. Donenfeld <[email protected]> wrote:
> This commit broke userspace. Bash uses ESPIPE to determine whether or
> not the file should be read using "unbuffered I/O", which means reading
> 1 byte at a time instead of 128 bytes at a time. I used to use bash to
> read through kmsg in a really quite nasty way:
>
> while read -t 0.1 -r line 2>/dev/null || [[ $? -ne 142 ]]; do
> echo "SARU $line"
> done < /dev/kmsg
>
> This will show all lines that can fit into the 128 byte buffer, and skip
> lines that don't. That's pretty awful, but at least it worked.
FYI, bash finally bumped its read buffer up to 4k, which actually
makes reading /dev/kmsg less awful than previously thought:
http://git.savannah.gnu.org/cgit/bash.git/commit/?id=6edcd70089d71ee8c17bf3298527054b3223be9f
This is probably too mundane to warrant an email, but in case somebody
finds this thread in the future, voila.