2021-02-16 16:19:56

by syzbot

[permalink] [raw]
Subject: WARNING in iov_iter_revert (2)

Hello,

syzbot found the following issue on:

HEAD commit: dcc0b490 Merge tag 'powerpc-5.11-8' of git://git.kernel.or..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=162f2d02d00000
kernel config: https://syzkaller.appspot.com/x/.config?x=1106b4b91e8dfab8
dashboard link: https://syzkaller.appspot.com/bug?extid=3d2c27c2b7dc2a94814d
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=1346ac60d00000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=123eae4cd00000

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: [email protected]

[U] 
------------[ cut here ]------------
WARNING: CPU: 1 PID: 8461 at lib/iov_iter.c:1090 iov_iter_revert+0x2e3/0x8e0 lib/iov_iter.c:1090
Modules linked in:
CPU: 1 PID: 8461 Comm: syz-executor778 Not tainted 5.11.0-rc7-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:iov_iter_revert+0x2e3/0x8e0 lib/iov_iter.c:1090
Code: 05 00 00 45 8b 74 24 08 48 89 de 4c 89 6d 20 49 83 c5 01 4c 89 f7 e8 bc eb b4 fd 49 39 de 72 ba e9 2c ff ff ff e8 9d e5 b4 fd <0f> 0b e9 30 ff ff ff e8 91 e5 b4 fd 48 8d 7d 18 48 b8 00 00 00 00
RSP: 0018:ffffc9000168fc30 EFLAGS: 00010293
RAX: 0000000000000000 RBX: ffffffffffffffff RCX: 0000000000000000
RDX: ffff88801b6c1bc0 RSI: ffffffff83bdef03 RDI: 0000000000000003
RBP: ffffc9000168fd68 R08: 000000007ffff000 R09: ffffffff8f86683f
R10: ffffffff83bdec5e R11: 0000000000000000 R12: 0000000000000001
R13: 0000000000000000 R14: ffffc9000168fd68 R15: ffff88801b6c1bc0
FS: 00000000016f5300(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f5d4c03d0d8 CR3: 0000000013adc000 CR4: 0000000000350ef0
Call Trace:
do_tty_write drivers/tty/tty_io.c:967 [inline]
file_tty_write.constprop.0+0x55f/0x8f0 drivers/tty/tty_io.c:1048
call_write_iter include/linux/fs.h:1901 [inline]
new_sync_write+0x426/0x650 fs/read_write.c:518
vfs_write+0x791/0xa30 fs/read_write.c:605
ksys_write+0x12d/0x250 fs/read_write.c:658
do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x43ee99
Code: 28 c3 e8 2a 14 00 00 66 2e 0f 1f 84 00 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fffc3cb0628 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000400488 RCX: 000000000043ee99
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000003
RBP: 0000000000402e80 R08: 0000000000400488 R09: 0000000000400488
R10: 0000000000400488 R11: 0000000000000246 R12: 0000000000402f10
R13: 0000000000000000 R14: 00000000004ac018 R15: 0000000000400488


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at [email protected].

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
syzbot can test patches for this issue, for details see:
https://goo.gl/tpsmEJ#testing-patches


2021-02-20 14:31:54

by syzbot

[permalink] [raw]
Subject: Re: WARNING in iov_iter_revert (2)

syzbot has bisected this issue to:

commit 9bb48c82aced07698a2d08ee0f1475a6c4f6b266
Author: Linus Torvalds <[email protected]>
Date: Tue Jan 19 19:41:16 2021 +0000

tty: implement write_iter

bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=1009a9f2d00000
start commit: dcc0b490 Merge tag 'powerpc-5.11-8' of git://git.kernel.or..
git tree: upstream
final oops: https://syzkaller.appspot.com/x/report.txt?x=1209a9f2d00000
console output: https://syzkaller.appspot.com/x/log.txt?x=1409a9f2d00000
kernel config: https://syzkaller.appspot.com/x/.config?x=1106b4b91e8dfab8
dashboard link: https://syzkaller.appspot.com/bug?extid=3d2c27c2b7dc2a94814d
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=1346ac60d00000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=123eae4cd00000

Reported-by: [email protected]
Fixes: 9bb48c82aced ("tty: implement write_iter")

For information about bisection process see: https://goo.gl/tpsmEJ#bisection

2021-02-20 17:43:00

by Al Viro

[permalink] [raw]
Subject: Re: WARNING in iov_iter_revert (2)

On Sat, Feb 20, 2021 at 08:56:40AM -0800, Linus Torvalds wrote:
> Al,
> This is the "FIXME! Have Al check this!" case in do_tty_write(). You were
> in on that whole discussion, but we never did get to that issue...
>
> There are some subtle rules about doing the iov_iter_revert(), but what's
> the best way to do this properly? Instead of doing a copy_from_iter() and
> then reverting the part that didn't fit in the buffer, doing a
> non-advancing copy and then advancing the amount that did fit, or what?
>
> I still don't have power, so this is all me on mobile with html email
> (sorry), and limited ability to really look closer.
>
> "Help me, Albi-wan Viro, you're my only hope"

Will check... BTW, when you get around to doing pulls, could you pick
the replacement (in followup) instead of the first pull request for
work.namei? Jens has caught a braino in the last commit there...

2021-02-20 17:44:43

by Al Viro

[permalink] [raw]
Subject: [git pull] work.namei stuff (v2)

Most of that pile is LOOKUP_CACHED series; the rest is a couple of
misc cleanups in the general area...

There's a minor bisect hazard in the end of series, and normally
I would've just folded the fix into the previous commit, but this branch is
shared with Jens' tree, with stuff on top of it in there, so that would've
required rebases outside of vfs.git.

NOTE: I'm less than thrilled by the "let's allow offloading pathwalks
to helper threads" push, but LOOKUP_CACHED is useful on its own.

The following changes since commit 5c8fe583cce542aa0b84adc939ce85293de36e5e:

Linux 5.11-rc1 (2020-12-27 15:30:22 -0800)

are available in the git repository at:

git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git work.namei

for you to fetch changes up to eacd9aa8cedeb412842c7b339adbaa0477fdd5ad:

fix handling of nd->depth on LOOKUP_CACHED failures in try_to_unlazy* (2021-02-20 12:33:12 -0500)

----------------------------------------------------------------
Al Viro (3):
do_tmpfile(): don't mess with finish_open()
saner calling conventions for unlazy_child()
fix handling of nd->depth on LOOKUP_CACHED failures in try_to_unlazy*

Jens Axboe (3):
fs: make unlazy_walk() error handling consistent
fs: add support for LOOKUP_CACHED
fs: expose LOOKUP_CACHED through openat2() RESOLVE_CACHED

Steven Rostedt (VMware) (1):
fs/namei.c: Remove unlikely of status being -ECHILD in lookup_fast()

fs/namei.c | 89 ++++++++++++++++++++++----------------------
fs/open.c | 6 +++
include/linux/fcntl.h | 2 +-
include/linux/namei.h | 1 +
include/uapi/linux/openat2.h | 4 ++
5 files changed, 56 insertions(+), 46 deletions(-)

2021-02-20 19:33:33

by Al Viro

[permalink] [raw]
Subject: Re: WARNING in iov_iter_revert (2)

On Sat, Feb 20, 2021 at 05:38:49PM +0000, Al Viro wrote:
> On Sat, Feb 20, 2021 at 08:56:40AM -0800, Linus Torvalds wrote:
> > Al,
> > This is the "FIXME! Have Al check this!" case in do_tty_write(). You were
> > in on that whole discussion, but we never did get to that issue...
> >
> > There are some subtle rules about doing the iov_iter_revert(), but what's
> > the best way to do this properly? Instead of doing a copy_from_iter() and
> > then reverting the part that didn't fit in the buffer, doing a
> > non-advancing copy and then advancing the amount that did fit, or what?
> >
> > I still don't have power, so this is all me on mobile with html email
> > (sorry), and limited ability to really look closer.
> >
> > "Help me, Albi-wan Viro, you're my only hope"
>
> Will check... BTW, when you get around to doing pulls, could you pick
> the replacement (in followup) instead of the first pull request for
> work.namei? Jens has caught a braino in the last commit there...

It turned out to be really amusing. What happens is write(fd, NULL, 0)
on /dev/ttyprintk, with N_GSM0710 for ldisc (== "pass the data as
is to tty->op->write()". And that's the first write since opening
that sucker, so we end up with
/* write_buf/write_cnt is protected by the atomic_write_lock mutex */
if (tty->write_cnt < chunk) {
unsigned char *buf_chunk;

if (chunk < 1024)
chunk = 1024;

buf_chunk = kmalloc(chunk, GFP_KERNEL);
if (!buf_chunk) {
ret = -ENOMEM;
goto out;
}
kfree(tty->write_buf);
tty->write_cnt = chunk;
tty->write_buf = buf_chunk;
}
doing nothing - ->write_cnt is still 0 and ->write_buf - NULL. Then
we copy 0 bytes from source to ->write_buf(), which reports that 0
bytes had been copied, TYVM. Then we call
ret = write(tty, file, tty->write_buf, size);
i.e.
ret = gsm_write(tty, file, NULL, 0);
which calls
tpk_write(tty, NULL, 0)
which does
tpk_printk(NULL, 0);
and _that_ has a very special semantics:
int i = tpk_curr;

if (buf == NULL) {
tpk_flush();
return i;
}
i.e. it *can* return a positive number that gets propagated all way
back to do_tty_write(). And then you notice that it has reports
successful write of amount other than what you'd passed and tries
to pull back. By amount passed - amount written. With iov_iter_revert()
saying that some tosser has asked it to revert by something close to
~(size_t)0.

IOW, it's not iov_iter_revert() being weird or do_tty_write() misuing it -
it's tpk_write() playing silly buggers. Note that old tree would've
gone through seriously weird contortions on the same call:
// chunk and count are 0, ->write_buf is NULL
for (;;) {
size_t size = count;
if (size > chunk)
size = chunk;
ret = -EFAULT;
if (copy_from_user(tty->write_buf, buf, size))
break;
ret = write(tty, file, tty->write_buf, size);
if (ret <= 0)
break;
written += ret;
buf += ret;
count -= ret;
if (!count)
break;
ret = -ERESTARTSYS;
if (signal_pending(current))
break;
cond_resched();
}
and we get written = ret = small positive, count = - that amount,
buf = NULL + that mount. On the next iteration size = 0 (since
chunk is still 0), with same no-op copy_from_user() of 0 bytes,
then gsm_write(tty, file, NULL, 0) and since tpk_flush() zeroes
tpk_curr we finally get 0 out of tpk_printk/tpk_write/gsm_write
and bugger off on if (ret <= 0). Then we have the value in written
returned.

So yeah, this return value *was* returned to userland. Except that
if we had done any writes before that, we'd find ->write_buf
non-NULL and the magical semantics of write(fd, NULL, 0) would
*not* have triggered - we would've gotten zero.

Do we want to preserve that weirdness of /dev/ttyprintk writes?
That's orthogonal to the iov_iter uses in there.

2021-02-20 19:45:32

by Al Viro

[permalink] [raw]
Subject: Re: WARNING in iov_iter_revert (2)

On Sat, Feb 20, 2021 at 07:29:57PM +0000, Al Viro wrote:

> And then you notice that it has reports
> successful write of amount other than what you'd passed and tries
> to pull back.

Sorry, half-edited sentence has escaped ;-/ Should be

"And there the caller notices that callback has reported a successful
write, but the amount apparently written is not the same as the
amount it had asked to write. It interprets that as a short write
and tries to pull back. Only it's actually _forward_, since we'd
asked to write 0 bytes and got a small positive number from the
callback."

2021-02-21 00:48:54

by Linus Torvalds

[permalink] [raw]
Subject: Re: WARNING in iov_iter_revert (2)

[ Let's see how long this lasts, but I've got a generator for the
laptop, and hopefully I'll be able to start doing pulls tonight, and
get "real" power tomorrow ]

On Sat, Feb 20, 2021 at 11:30 AM Al Viro <[email protected]> wrote:
>
> IOW, it's not iov_iter_revert() being weird or do_tty_write() misuing it -
> it's tpk_write() playing silly buggers.

Ok, that's actually not as bad I was was afraid it might be.

> Do we want to preserve that weirdness of /dev/ttyprintk writes?
> That's orthogonal to the iov_iter uses in there.

I don't think the ttyprintk weirdness was intentional. I'd fix that,
but in the meantime clearly we should make do_tty_write() protect
against this insanity, and do something like

--- a/drivers/tty/tty_io.c
+++ b/drivers/tty/tty_io.c
@@ -961,6 +961,9 @@ static inline ssize_t do_tty_write(
ret = write(tty, file, tty->write_buf, size);
if (ret <= 0)
break;
+ /* ttyprintk historical oddity */
+ if (ret > size)
+ break;

/* FIXME! Have Al check this! */
if (ret != size)

in there. Because right now we clearly do strange and not-so-wonderful
things if the write routine returns a bigger value than it was
passed.. Not limited to that iov_iter_revert() thing, but the whole
loop.

Comments?

Linus

2021-02-21 08:45:31

by Sabyrzhan Tasbolatov

[permalink] [raw]
Subject: Re: WARNING in iov_iter_revert (2)

> --- a/drivers/tty/tty_io.c
> +++ b/drivers/tty/tty_io.c
> @@ -961,6 +961,9 @@ static inline ssize_t do_tty_write(
> ret = write(tty, file, tty->write_buf, size);
> if (ret <= 0)
> break;
> + /* ttyprintk historical oddity */
> + if (ret > size)
> + break;
>
> /* FIXME! Have Al check this! */
> if (ret != size)
>
> in there. Because right now we clearly do strange and not-so-wonderful
> things if the write routine returns a bigger value than it was
> passed.. Not limited to that iov_iter_revert() thing, but the whole
> loop.
>
> Comments?

Just want to comment that this fix is correct (tested),
rather than what I did [1] to return abruptly
in the beginning of do_tty_write() for write(fd, NULL, 0) case.

Let me know if I can prepare a patch with Linus's fix above.

[1] https://lore.kernel.org/lkml/[email protected]
--- a/drivers/tty/tty_io.c
+++ b/drivers/tty/tty_io.c
@@ -905,6 +905,9 @@ static inline ssize_t do_tty_write(
ssize_t ret, written = 0;
unsigned int chunk;

+ if (!count)
+ return -EFAULT;
+
ret = tty_write_lock(tty, file->f_flags & O_NDELAY);
if (ret < 0)
return ret;

2021-02-21 17:22:26

by Linus Torvalds

[permalink] [raw]
Subject: Re: WARNING in iov_iter_revert (2)

On Sat, Feb 20, 2021 at 4:45 PM Linus Torvalds
<[email protected]> wrote:
>
> I don't think the ttyprintk weirdness was intentional. I'd fix that,
> but in the meantime clearly we should make do_tty_write() protect
> against this insanity, and do something like
>
> --- a/drivers/tty/tty_io.c
> +++ b/drivers/tty/tty_io.c
> @@ -961,6 +961,9 @@ static inline ssize_t do_tty_write(
> ret = write(tty, file, tty->write_buf, size);
> if (ret <= 0)
> break;
> + /* ttyprintk historical oddity */
> + if (ret > size)
> + break;

Actually, we need to move the

written += ret;

line up above this thing too, so that we get the expected insane
return value to user space.

Of course, if ttyprintk gets fixed, this is all moot, but I've applied
this patch for now (since I was doing Greg's tty pull).

I've marked it as "fixing" 9bb48c82aced ("tty: implement write_iter")
even if the real cause is much older - the WARNING is new.

Linus

PS. Just lost power again. Oh joy.

2021-02-21 17:47:27

by Linus Torvalds

[permalink] [raw]
Subject: Re: [git pull] work.namei stuff (v2)

On Sat, Feb 20, 2021 at 9:40 AM Al Viro <[email protected]> wrote:
>
> NOTE: I'm less than thrilled by the "let's allow offloading pathwalks
> to helper threads" push, but LOOKUP_CACHED is useful on its own.

Well, that is getting fixed separately, so I think this is all good.
Still helper threads, but no longer kernel helper threads.

Linus

2021-02-21 18:48:37

by pr-tracker-bot

[permalink] [raw]
Subject: Re: [git pull] work.namei stuff (v2)

The pull request you sent on Sat, 20 Feb 2021 17:40:52 +0000:

> git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git work.namei

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/c57b1f0a5f40e6d35f22a3ce61e69d73fc0b1dbc

Thank you!

--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html