2021-12-13 04:25:02

by Dae R. Jeong

[permalink] [raw]
Subject: WARNING in schedule_bh

Hello,

During fuzzing, I observed a few warnings in the floppy driver, which
seems similar with the one found by Syzkaller.
(https://syzkaller.appspot.com/bug?id=7c17d936536dc3864e5df2d79ea11cdd946f81bf).

One of the warning reports is as follow:
------------[ cut here ]------------
WARNING: CPU: 2 PID: 11682 at drivers/block/floppy.c:1000 schedule_bh drivers/block/floppy.c:1000 [inline]
WARNING: CPU: 2 PID: 11682 at drivers/block/floppy.c:1000 process_fd_request drivers/block/floppy.c:2851 [inline]
WARNING: CPU: 2 PID: 11682 at drivers/block/floppy.c:1000 fd_locked_ioctl drivers/block/floppy.c:3506 [inline]
WARNING: CPU: 2 PID: 11682 at drivers/block/floppy.c:1000 fd_ioctl+0x4825/0x4e90 drivers/block/floppy.c:3555
Modules linked in:
...
(skipped)
...
Call Trace:
<TASK>
blkdev_ioctl+0x45f/0xb20 block/ioctl.c:609
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:874 [inline]
__se_sys_ioctl+0x12c/0x1e0 fs/ioctl.c:860
__x64_sys_ioctl+0x9e/0xe0 fs/ioctl.c:860
do_syscall_x64 arch/x86/entry/common.c:51 [inline]
do_syscall_64+0x6f/0x110 arch/x86/entry/common.c:82
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x478b29
...
(skipped)
...
</TASK>
------------------------------------

A similar warning seems to occur in places where schedule_bh() is
called (e.g., floppy_queue_rq, floppy_interrupt, ...).

I am trying to understand why this happens. The below execution
scenario is my best guess (but different with the above call
trace). Since I don't fully understand the semantic of the floppy
driver, please execuse me if this is wrong.


fd_locked_ioctl(FDRESET) kworkerd floppy_interrupt
user_reset_fdc()
cont = &reset_cont;
wait_til_done(reset_fdc)
schedule_bh(reset_fdc)
wait_event(command_done)
reset_fdc()
do_floppy = reset_interrupt
/* triggering an interrupt
as stated in the comment */
handler = do_floppy // reset_interrupt
schedule_bh(handler)
reset_interrupt()
success_and_wakeup // reset_cont.redo
genric_success()
generic_done(1) // reset_cont.done
cont = &wakeup_cont
do_wakeup() // wakeup_cont.redo
reschedule_timeout()
cont = NULL
wake_up(command_done) // fd_locked_ioctl() can now resume

floppy_shutdown() // invoked by the above reschedule_timeout()
process_fd_request() // cont is NULL by reset_interrupt()
schedule_bh(redo_fd_request)
process_fd_request()
schedule_bh(redo_fd_request) <- WARNING


So, for me, concurrent execution of floppy_shutdown() and
fd_locked_ioctl() is suspicious. Could you please check the above
scenario is reasonable?


Best regards,
Dae R. Jeong.


2021-12-17 06:44:13

by Denis Efremov

[permalink] [raw]
Subject: Re: WARNING in schedule_bh

Hi,

>
> So, for me, concurrent execution of floppy_shutdown() and
> fd_locked_ioctl() is suspicious. Could you please check the above
> scenario is reasonable?
>

Thank you for the analysis. You are right and concurrent execution
of floppy_shutdown() and fd_locked_ioctl() looks suspicious. I know about
this warning more than 2 years already since I wrote syzkaller descriptions
for the floppy. I can only wonder why did it take so long for the syzbot
to find it.
However, this bug is not reproducible on real hardware. I would prefer not
to touch the code significantly unless there is a security reason for it.
The pros here are that bugs should be fixed. The cons here are that changing
the code more-or-less significantly is hard to test on real hardware
(there was a regression in UAPI of floppy not so long ago, it took almost
half of a year before it was reported and it took another couple of months
before distros released kernels with a fix); floppy driver contains
undocumented/poorly-documented hacks (e.g. O_ACCMODE
https://github.com/google/syzkaller/commit/3ea5a3451b2bfa90a3b73397273560f17d587efc#diff-07b38a9cc5b8b1eed725414265f033c41abffbbe537567799ed678dfe9c49d7a);
/dev/fd0 is accessible only by root/disk user on most of the distros nowadays;
races are highly likely not reproducible on real hardware because floppy
devices are slow; from the functional point of view the driver worked with
this bug almost 10 years (maybe more) and I doubt it's possible to
face this bug during normal workflow (without direct intention to trigger it);
I think that usage of floppies inside VMs is limited to some specific
workflows and I doubt it's really broad.
If you see an easy way to fix this issue, please send a patch.

Thanks,
Denis