Hi,
While fuzzing the 5.15.y kernel with Syzkaller, we noticed an INFO: task hung
bug in fb_deferred_io_work().

This is seen on the 5.15.149 tag, and it was introduced by the following set of commits:
2655757a3f10 fbdev: flush deferred IO before closing
15492bab7393 fbdev: Fix incorrect page mapping clearance at
fb_deferred_io_release()
87b9802ca824 fbdev: Fix invalid page access after closing deferred I/O
devices
9a95fc04261f fbdev: Rename pagelist to pagereflist for deferred I/O
186b89659c4c fbdev: Track deferred-I/O pages in pageref struct
e79b2b2aadef fbdev: defio: fix the pagelist corruption
0616b00a31d6 fbdev: Don't sort deferred-I/O pages by default
5d3aff76a316 fbdev/defio: Early-out if page is already enlisted
I then checked whether it could be reproduced on the upstream kernel as well,
and it could. Below is the call trace from the latest 6.9-rc4 based kernel. I
have enabled LOCKDEP to get locking-related information.
INFO: task kworker/0:3:1077 blocked for more than 122 seconds.
Not tainted 6.9.0-rc4+ #54
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:kworker/0:3 state:D stack:0 pid:1077 tgid:1077 ppid:2
flags:0x00004000
Workqueue: events fb_deferred_io_work
Call Trace:
<TASK>
__schedule (kernel/sched/core.c:5409 kernel/sched/core.c:6746)
? __pfx___schedule (kernel/sched/core.c:6617)
? schedule (kernel/sched/core.c:6804 kernel/sched/core.c:6837)
? __pfx___lock_release (kernel/locking/lockdep.c:5406)
? schedule (kernel/sched/core.c:6804 kernel/sched/core.c:6837)
schedule (./arch/x86/include/asm/preempt.h:84 kernel/sched/core.c:6824
kernel/sched/core.c:6838)
io_schedule (kernel/sched/core.c:9019 kernel/sched/core.c:9045)
folio_wait_bit_common (mm/filemap.c:1272)
? __pfx_folio_wait_bit_common (mm/filemap.c:1206)
? __pfx___mutex_lock (kernel/locking/mutex.c:751)
? lock_acquire.part.0 (kernel/locking/lockdep.c:467
kernel/locking/lockdep.c:5756)
? __pfx_wake_page_function (mm/filemap.c:1091)
? __pfx_lock_acquire.part.0 (kernel/locking/lockdep.c:5719)
fb_deferred_io_work (./include/linux/pagemap.h:1069
drivers/video/fbdev/core/fb_defio.c:248)
? lock_acquire (./include/trace/events/lock.h:24
kernel/locking/lockdep.c:5725)
process_one_work (kernel/workqueue.c:3254)
? __pfx_lock_acquire.part.0 (kernel/locking/lockdep.c:5719)
? __pfx_process_one_work (kernel/workqueue.c:3156)
? assign_work (kernel/workqueue.c:1210)
worker_thread (kernel/workqueue.c:3329 kernel/workqueue.c:3416)
? __kthread_parkme (kernel/kthread.c:293)
? __pfx_worker_thread (kernel/workqueue.c:3362)
kthread (kernel/kthread.c:388)
? __pfx_kthread (kernel/kthread.c:341)
ret_from_fork (arch/x86/kernel/process.c:147)
? __pfx_kthread (kernel/kthread.c:341)
ret_from_fork_asm (arch/x86/entry/entry_64.S:257)
</TASK>
INFO: task r:2664 blocked for more than 123 seconds.
Not tainted 6.9.0-rc4+ #54
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:r state:D stack:0 pid:2664 tgid:2664
ppid:2618 flags:0x00000002
Call Trace:
<TASK>
__schedule (kernel/sched/core.c:5409 kernel/sched/core.c:6746)
? __pfx___schedule (kernel/sched/core.c:6617)
? schedule (kernel/sched/core.c:6804 kernel/sched/core.c:6837)
? __pfx___lock_release (kernel/locking/lockdep.c:5406)
? schedule (kernel/sched/core.c:6804 kernel/sched/core.c:6837)
schedule (./arch/x86/include/asm/preempt.h:84 kernel/sched/core.c:6824
kernel/sched/core.c:6838)
schedule_timeout (kernel/time/timer.c:2559)
? __pfx_schedule_timeout (kernel/time/timer.c:2544)
? mark_lock.part.0 (./arch/x86/include/asm/bitops.h:227
./arch/x86/include/asm/bitops.h:239
./include/asm-generic/bitops/instrumented-non-atomic.h:142
kernel/locking/lockdep.c:228 kernel/locking/lockdep.c:4656)
? mark_held_locks (kernel/locking/lockdep.c:4274)
? lockdep_hardirqs_on_prepare.part.0 (kernel/locking/lockdep.c:4300
kernel/locking/lockdep.c:4359)
__wait_for_common (kernel/sched/completion.c:95
kernel/sched/completion.c:116)
? __pfx_schedule_timeout (kernel/time/timer.c:2544)
? __pfx___wait_for_common (kernel/sched/completion.c:110)
? start_flush_work (kernel/workqueue.c:4192)
__flush_work (kernel/workqueue.c:4207)
? __pfx___flush_work (kernel/workqueue.c:4195)
? __pfx_wq_barrier_func (kernel/workqueue.c:3742)
? flush_delayed_work (kernel/workqueue.c:4249)
? __pfx_fb_release (drivers/video/fbdev/core/fb_chrdev.c:375)
fb_deferred_io_lastclose (drivers/video/fbdev/core/fb_defio.c:321)
? __pfx_fb_release (drivers/video/fbdev/core/fb_chrdev.c:375)
fb_deferred_io_release (drivers/video/fbdev/core/fb_defio.c:333)
fb_release (drivers/video/fbdev/core/fb_chrdev.c:383)
? __pfx_fb_release (drivers/video/fbdev/core/fb_chrdev.c:375)
__fput (fs/file_table.c:422)
task_work_run (kernel/task_work.c:180 (discriminator 1))
? __pfx_task_work_run (kernel/task_work.c:148)
? do_raw_spin_unlock (./arch/x86/include/asm/paravirt.h:589
./arch/x86/include/asm/qspinlock.h:57 kernel/locking/spinlock_debug.c:142)
do_exit (kernel/exit.c:879)
? do_raw_spin_lock (./arch/x86/include/asm/atomic.h:115
./include/linux/atomic/atomic-arch-fallback.h:2170
./include/linux/atomic/atomic-instrumented.h:1302
./include/asm-generic/qspinlock.h:111 kernel/locking/spinlock_debug.c:116)
? __pfx_do_exit (kernel/exit.c:819)
? __pfx_do_raw_spin_lock (kernel/locking/spinlock_debug.c:114)
do_group_exit (kernel/exit.c:1008)
__x64_sys_exit_group (kernel/exit.c:1036)
do_syscall_64 (arch/x86/entry/common.c:52 arch/x86/entry/common.c:83)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
RIP: 0033:0x7f9cfe71e5f1
RSP: 002b:00007ffd636aec08 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
RAX: ffffffffffffffda RBX: 00007f9cfe7fc510 RCX: 00007f9cfe71e5f1
RDX: 000000000000003c RSI: 00000000000000e7 RDI: 0000000000000000
RBP: 0000000000000000 R08: ffffffffffffff80 R09: 0000000000000000
R10: 0000000000000020 R11: 0000000000000246 R12: 00007f9cfe7fc510
R13: 0000000000000000 R14: 00007f9cfe7fc9e8 R15: 00007f9cfe7fca00
</TASK>
Showing all locks held in the system:
1 lock held by khungtaskd/50:
#0: ffffffff950219c0 (rcu_read_lock){....}-{1:2}, at:
debug_show_all_locks (./include/linux/rcupdate.h:329 (discriminator 1)
./include/linux/rcupdate.h:781 (discriminator 1)
kernel/locking/lockdep.c:6614 (discriminator 1))
3 locks held by kworker/0:3/1077:
#0: ffff88810005ad48 ((wq_completion)events){+.+.}-{0:0}, at:
process_one_work (kernel/workqueue.c:3229)
#1: ffff8881080dfd98
((work_completion)(&(&info->deferred_work)->work)){+.+.}-{0:0}, at:
process_one_work (kernel/workqueue.c:3230)
#2: ffff88811416db38 (&fbdefio->lock){+.+.}-{3:3}, at:
fb_deferred_io_work (drivers/video/fbdev/core/fb_defio.c:246)
1 lock held by r/2664:
#0: ffff8881141fe078 (&fb_info->lock){+.+.}-{3:3}, at: fb_release
(drivers/video/fbdev/core/fb_chrdev.c:380)
Syzkaller initially generated a long reproducer; Vegard simplified it to the
one below.
C repro:
#define _GNU_SOURCE
#include <sys/mman.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>
int main(void)
{
	int fd = open("/dev/fb0", O_RDWR, 0);

	char *ptr = (char *) 0x20000000ul;

	mmap(ptr, 4096, PROT_WRITE, MAP_FIXED | MAP_SHARED, fd, 0xff000);
	*ptr = 0;

	mmap(ptr, 4096, PROT_WRITE, MAP_FIXED | MAP_SHARED, fd, 0);
	*ptr = 0;

	return 0;
}
This needs to be run as root; the hang is reported after about 122 seconds.
(The hung task "r" in the traces above is this reproducer binary.)
238 static void fb_deferred_io_work(struct work_struct *work)
239 {
240 	struct fb_info *info = container_of(work, struct fb_info, deferred_work.work);
241 	struct fb_deferred_io_pageref *pageref, *next;
242 	struct fb_deferred_io *fbdefio = info->fbdefio;
243 
244 	/* here we mkclean the pages, then do all deferred IO */
245 	mutex_lock(&fbdefio->lock);
246 	list_for_each_entry(pageref, &fbdefio->pagereflist, list) {
247 		struct page *cur = pageref->page;
248 		lock_page(cur);
249 		page_mkclean(cur);
250 		unlock_page(cur);
251 	}
It is stalling on lock_page() in fb_deferred_io_work(), i.e. the worker is
sleeping on a page lock that apparently never gets released.
I will be happy to try any patches and report back.
Thanks,
Harshit
On Thu, Apr 18, 2024 at 2:40 PM Harshit Mogalapalli
<[email protected]> wrote:
>
> Hi,
>
> While fuzzing 5.15.y kernel with Syzkaller, we noticed a INFO: task hung
> bug in fb_deferred_io_work()
>
>
> This is in 5.15.149 tag, and this is introduced by a set of commits:
>
> 2655757a3f10 fbdev: flush deferred IO before closing
> 15492bab7393 fbdev: Fix incorrect page mapping clearance at
> fb_deferred_io_release()
> 87b9802ca824 fbdev: Fix invalid page access after closing deferred I/O
> devices
> 9a95fc04261f fbdev: Rename pagelist to pagereflist for deferred I/O
> 186b89659c4c fbdev: Track deferred-I/O pages in pageref struct
> e79b2b2aadef fbdev: defio: fix the pagelist corruption
> 0616b00a31d6 fbdev: Don't sort deferred-I/O pages by default
> 5d3aff76a316 fbdev/defio: Early-out if page is already enlisted
>
> I then tried if it could be reproducible on upstream kernel as well, and
> it did. below is the call trace with latest 6.9-rc4 based kernel. I have
> enabled the LOCKDEP to get locking related information.
>
> INFO: task kworker/0:3:1077 blocked for more than 122 seconds.
> Not tainted 6.9.0-rc4+ #54
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:kworker/0:3 state:D stack:0 pid:1077 tgid:1077 ppid:2
> flags:0x00004000
> Workqueue: events fb_deferred_io_work
> Call Trace:
> <TASK>
> __schedule (kernel/sched/core.c:5409 kernel/sched/core.c:6746)
> ? __pfx___schedule (kernel/sched/core.c:6617)
> ? schedule (kernel/sched/core.c:6804 kernel/sched/core.c:6837)
> ? __pfx___lock_release (kernel/locking/lockdep.c:5406)
> ? schedule (kernel/sched/core.c:6804 kernel/sched/core.c:6837)
> schedule (./arch/x86/include/asm/preempt.h:84 kernel/sched/core.c:6824
> kernel/sched/core.c:6838)
> io_schedule (kernel/sched/core.c:9019 kernel/sched/core.c:9045)
> folio_wait_bit_common (mm/filemap.c:1272)
> ? __pfx_folio_wait_bit_common (mm/filemap.c:1206)
> ? __pfx___mutex_lock (kernel/locking/mutex.c:751)
> ? lock_acquire.part.0 (kernel/locking/lockdep.c:467
> kernel/locking/lockdep.c:5756)
> ? __pfx_wake_page_function (mm/filemap.c:1091)
> ? __pfx_lock_acquire.part.0 (kernel/locking/lockdep.c:5719)
> fb_deferred_io_work (./include/linux/pagemap.h:1069
> drivers/video/fbdev/core/fb_defio.c:248)
> ? lock_acquire (./include/trace/events/lock.h:24
> kernel/locking/lockdep.c:5725)
> process_one_work (kernel/workqueue.c:3254)
> ? __pfx_lock_acquire.part.0 (kernel/locking/lockdep.c:5719)
> ? __pfx_process_one_work (kernel/workqueue.c:3156)
> ? assign_work (kernel/workqueue.c:1210)
> worker_thread (kernel/workqueue.c:3329 kernel/workqueue.c:3416)
> ? __kthread_parkme (kernel/kthread.c:293)
> ? __pfx_worker_thread (kernel/workqueue.c:3362)
> kthread (kernel/kthread.c:388)
> ? __pfx_kthread (kernel/kthread.c:341)
> ret_from_fork (arch/x86/kernel/process.c:147)
> ? __pfx_kthread (kernel/kthread.c:341)
> ret_from_fork_asm (arch/x86/entry/entry_64.S:257)
> </TASK>
> INFO: task r:2664 blocked for more than 123 seconds.
> Not tainted 6.9.0-rc4+ #54
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:r state:D stack:0 pid:2664 tgid:2664
> ppid:2618 flags:0x00000002
> Call Trace:
> <TASK>
> __schedule (kernel/sched/core.c:5409 kernel/sched/core.c:6746)
> ? __pfx___schedule (kernel/sched/core.c:6617)
> ? schedule (kernel/sched/core.c:6804 kernel/sched/core.c:6837)
> ? __pfx___lock_release (kernel/locking/lockdep.c:5406)
> ? schedule (kernel/sched/core.c:6804 kernel/sched/core.c:6837)
> schedule (./arch/x86/include/asm/preempt.h:84 kernel/sched/core.c:6824
> kernel/sched/core.c:6838)
> schedule_timeout (kernel/time/timer.c:2559)
> ? __pfx_schedule_timeout (kernel/time/timer.c:2544)
> ? mark_lock.part.0 (./arch/x86/include/asm/bitops.h:227
> ./arch/x86/include/asm/bitops.h:239
> ./include/asm-generic/bitops/instrumented-non-atomic.h:142
> kernel/locking/lockdep.c:228 kernel/locking/lockdep.c:4656)
> ? mark_held_locks (kernel/locking/lockdep.c:4274)
> ? lockdep_hardirqs_on_prepare.part.0 (kernel/locking/lockdep.c:4300
> kernel/locking/lockdep.c:4359)
> __wait_for_common (kernel/sched/completion.c:95
> kernel/sched/completion.c:116)
> ? __pfx_schedule_timeout (kernel/time/timer.c:2544)
> ? __pfx___wait_for_common (kernel/sched/completion.c:110)
> ? start_flush_work (kernel/workqueue.c:4192)
> __flush_work (kernel/workqueue.c:4207)
> ? __pfx___flush_work (kernel/workqueue.c:4195)
> ? __pfx_wq_barrier_func (kernel/workqueue.c:3742)
> ? flush_delayed_work (kernel/workqueue.c:4249)
> ? __pfx_fb_release (drivers/video/fbdev/core/fb_chrdev.c:375)
> fb_deferred_io_lastclose (drivers/video/fbdev/core/fb_defio.c:321)
> ? __pfx_fb_release (drivers/video/fbdev/core/fb_chrdev.c:375)
> fb_deferred_io_release (drivers/video/fbdev/core/fb_defio.c:333)
> fb_release (drivers/video/fbdev/core/fb_chrdev.c:383)
> ? __pfx_fb_release (drivers/video/fbdev/core/fb_chrdev.c:375)
> __fput (fs/file_table.c:422)
> task_work_run (kernel/task_work.c:180 (discriminator 1))
> ? __pfx_task_work_run (kernel/task_work.c:148)
> ? do_raw_spin_unlock (./arch/x86/include/asm/paravirt.h:589
> ./arch/x86/include/asm/qspinlock.h:57 kernel/locking/spinlock_debug.c:142)
> do_exit (kernel/exit.c:879)
> ? do_raw_spin_lock (./arch/x86/include/asm/atomic.h:115
> ./include/linux/atomic/atomic-arch-fallback.h:2170
> ./include/linux/atomic/atomic-instrumented.h:1302
> ./include/asm-generic/qspinlock.h:111 kernel/locking/spinlock_debug.c:116)
> ? __pfx_do_exit (kernel/exit.c:819)
> ? __pfx_do_raw_spin_lock (kernel/locking/spinlock_debug.c:114)
> do_group_exit (kernel/exit.c:1008)
> __x64_sys_exit_group (kernel/exit.c:1036)
> do_syscall_64 (arch/x86/entry/common.c:52 arch/x86/entry/common.c:83)
> entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
> RIP: 0033:0x7f9cfe71e5f1
> RSP: 002b:00007ffd636aec08 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
> RAX: ffffffffffffffda RBX: 00007f9cfe7fc510 RCX: 00007f9cfe71e5f1
> RDX: 000000000000003c RSI: 00000000000000e7 RDI: 0000000000000000
> RBP: 0000000000000000 R08: ffffffffffffff80 R09: 0000000000000000
> R10: 0000000000000020 R11: 0000000000000246 R12: 00007f9cfe7fc510
> R13: 0000000000000000 R14: 00007f9cfe7fc9e8 R15: 00007f9cfe7fca00
> </TASK>
>
> Showing all locks held in the system:
> 1 lock held by khungtaskd/50:
> #0: ffffffff950219c0 (rcu_read_lock){....}-{1:2}, at:
> debug_show_all_locks (./include/linux/rcupdate.h:329 (discriminator 1)
> ./include/linux/rcupdate.h:781 (discriminator 1)
> kernel/locking/lockdep.c:6614 (discriminator 1))
> 3 locks held by kworker/0:3/1077:
> #0: ffff88810005ad48 ((wq_completion)events){+.+.}-{0:0}, at:
> process_one_work (kernel/workqueue.c:3229)
> #1: ffff8881080dfd98
> ((work_completion)(&(&info->deferred_work)->work)){+.+.}-{0:0}, at:
> process_one_work (kernel/workqueue.c:3230)
> #2: ffff88811416db38 (&fbdefio->lock){+.+.}-{3:3}, at:
> fb_deferred_io_work (drivers/video/fbdev/core/fb_defio.c:246)
> 1 lock held by r/2664:
> #0: ffff8881141fe078 (&fb_info->lock){+.+.}-{3:3}, at: fb_release
> (drivers/video/fbdev/core/fb_chrdev.c:380)
>
>
>
> Syzkaller initially generated a long reproducer, Vegard simplified it to
> the one below
>
> C repro:
>
> #define _GNU_SOURCE
> #include <sys/mman.h>
> #include <fcntl.h>
> #include <stdlib.h>
> #include <unistd.h>
>
> int main(void)
> {
> int fd = open("/dev/fb0", O_RDWR, 0);
>
> char *ptr = (char *) 0x20000000ul;
>
> mmap(ptr, 4096, PROT_WRITE, MAP_FIXED | MAP_SHARED, fd, 0xff000);
> *ptr = 0;
>
> mmap(ptr, 4096, PROT_WRITE, MAP_FIXED | MAP_SHARED, fd, 0);
> *ptr = 0;
>
> return 0;
> }
>
> This needs to be run as root to see the hang in 122 seconds.
>
> 238 static void fb_deferred_io_work(struct work_struct *work)
> 239 {
> 240 struct fb_info *info = container_of(work, struct fb_info,
> deferred_work.work);
> 241 struct fb_deferred_io_pageref *pageref, *next;
> 242 struct fb_deferred_io *fbdefio = info->fbdefio;
> 243
> 244 /* here we mkclean the pages, then do all deferred IO */
> 245 mutex_lock(&fbdefio->lock);
> 246 list_for_each_entry(pageref, &fbdefio->pagereflist, list) {
> 247 struct page *cur = pageref->page;
> 248 lock_page(cur);
> 249 page_mkclean(cur);
> 250 unlock_page(cur);
> 251 }
>
> It is stalling on lock_page() in fb_deferred_io_work()
>
> I will be happy to try any patches and report back.
Hi,
This sounds similar to the SUSE bug [1]. We fixed it by reverting [2]
in the SUSE kernel. The problem seems to be that flush_delayed_work()
kills the timer and re-queues the work but doesn't guarantee that it
is finished when returning. So when the device is closed, the
fb_deferred_io_work() function can still be queued and tries to access
memory that's been freed.
Patch [2] tries to solve the problem of not throwing away pending data
when closing the device. Perhaps calling cancel_delayed_work_sync()
and then follow up with a manual call to fb_deferred_io_work() would
be enough to flush the remaining data?
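
Something along these lines, maybe (completely untested, just a sketch of the
idea; I'm assuming fb_deferred_io_lastclose() is still the place where the
flush currently happens and that nothing can queue new work at that point):

	/* Stop the delayed work and wait for any running instance to finish,
	 * then run the worker once by hand so that still-pending pages are
	 * written out before the mappings are torn down.
	 */
	cancel_delayed_work_sync(&info->deferred_work);
	fb_deferred_io_work(&info->deferred_work.work);
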
-Patrik
[1] https://bugzilla.suse.com/show_bug.cgi?id=1221814
[2] 33cd6ea9c067 fbdev: flush deferred IO before closing
>
>
> Thanks,
> Harshit
On 2024-04-18 Patrik Jakobsson wrote:
> This sounds similar to the SUSE bug [1]. We fixed it by reverting [2]
> in the SUSE kernel. The problem seems to be that flush_delayed_work()
> kills the timer and re-queues the work but doesn't guarantee that it
> is finished when returning. So when the device is closed, the
> fb_deferred_io_work() function can still be queued and tries to access
> memory that's been freed.
flush_delayed_work() *does* guarantee the work is finished before
returning.
> Patch [2] tries to solve the problem of not throwing away pending data
> when closing the device. Perhaps calling cancel_delayed_work_sync()
> and then follow up with a manual call to fb_deferred_io_work() would
> be enough to flush the remaining data?
>
> -Patrik
>
> [1] https://bugzilla.suse.com/show_bug.cgi?id=1221814
> [2] 33cd6ea9c067 fbdev: flush deferred IO before closing
>
> >
> >
> > Thanks,
> > Harshit
On 2024-04-18 Harshit Mogalapalli wrote:
> While fuzzing 5.15.y kernel with Syzkaller, we noticed a INFO: task hung
> bug in fb_deferred_io_work()
Which framebuffer device are you using exactly? It is possible that
the problem is in the device driver, not in the framebuffer core.
Best regards,
Nam
On Thu, Apr 18, 2024 at 4:05 PM Nam Cao <[email protected]> wrote:
>
> On 2024-04-18 Patrik Jakobsson wrote:
> > This sounds similar to the SUSE bug [1]. We fixed it by reverting [2]
> > in the SUSE kernel. The problem seems to be that flush_delayed_work()
> > kills the timer and re-queues the work but doesn't guarantee that it
> > is finished when returning. So when the device is closed, the
> > fb_deferred_io_work() function can still be queued and tries to access
> > memory that's been freed.
>
> flush_delayed_work() *does* guarantee the work is finished before
> returning.
Right, flush_work() does guarantee that the work is finished, but it
doesn't guarantee that the queue is idle if work has been requeued
since flush started. So fb_deferred_io_work() should be able to happen
after the device is closed. Or am I missing something?
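
What I have in mind is roughly this interleaving (hypothetical, assuming a
page can still be dirtied while the close is in flight):

	close path                         mkwrite path
	----------                         ------------
	flush_delayed_work()
	  waits for the queued/running
	  fb_deferred_io_work() to finish
	                                   schedule_delayed_work()  <- requeued
	flush_delayed_work() returns
	teardown / mappings cleared
	                                   delay expires, fb_deferred_io_work()
	                                   runs against the torn-down state
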
>
> > Patch [2] tries to solve the problem of not throwing away pending data
> > when closing the device. Perhaps calling cancel_delayed_work_sync()
> > and then follow up with a manual call to fb_deferred_io_work() would
> > be enough to flush the remaining data?
> >
> > -Patrik
> >
> > [1] https://bugzilla.suse.com/show_bug.cgi?id=1221814
> > [2] 33cd6ea9c067 fbdev: flush deferred IO before closing
> >
> > >
> > >
> > > Thanks,
> > > Harshit
>
On Thu, 18 Apr 2024 16:06:52 +0200,
Nam Cao wrote:
>
> On 2024-04-18 Harshit Mogalapalli wrote:
> > While fuzzing 5.15.y kernel with Syzkaller, we noticed a INFO: task hung
> > bug in fb_deferred_io_work()
>
> Which framebuffer device are you using exactly? It is possible that
> the problem is with the device driver, not core framebuffer.
Note that it was already known that using flush_delayed_work() caused
a problem. See the thread of the fix patch:
https://lore.kernel.org/all/[email protected]/
BTW, the problem is seen with bochs drm.
Takashi
On 4/18/24 16:26, Takashi Iwai wrote:
> On Thu, 18 Apr 2024 16:06:52 +0200,
> Nam Cao wrote:
>>
>> On 2024-04-18 Harshit Mogalapalli wrote:
>>> While fuzzing 5.15.y kernel with Syzkaller, we noticed a INFO: task hung
>>> bug in fb_deferred_io_work()
>>
>> Which framebuffer device are you using exactly? It is possible that
>> the problem is with the device driver, not core framebuffer.
>
> Note that it was already known that using flush_delayed_work() caused
> a problem. See the thread of the fix patch:
> https://lore.kernel.org/all/[email protected]/
Harshit reported the hung tasks with the v5.15-stable kernel, and can even
reproduce the issue with kernel v6.9-rc4, although it has all of your patches
from that referenced mail thread applied.
So, what exactly does your statement that "it was already known that it causes
problems" mean?
Can it be fixed? Is someone looking into fixing it?
> BTW, the problem is seen with bochs drm.
Helge
On Thu, 18 Apr 2024 21:29:57 +0200,
Helge Deller wrote:
>
> On 4/18/24 16:26, Takashi Iwai wrote:
> > On Thu, 18 Apr 2024 16:06:52 +0200,
> > Nam Cao wrote:
> >>
> >> On 2024-04-18 Harshit Mogalapalli wrote:
> >>> While fuzzing 5.15.y kernel with Syzkaller, we noticed a INFO: task hung
> >>> bug in fb_deferred_io_work()
> >>
> >> Which framebuffer device are you using exactly? It is possible that
> >> the problem is with the device driver, not core framebuffer.
> >
> > Note that it was already known that using flush_delayed_work() caused
> > a problem. See the thread of the fix patch:
> > https://lore.kernel.org/all/[email protected]/
>
> Harshit reported the hung tasks with kernel v5.15-stable, and can even reproduce
> that issue with kernel v6.9-rc4 although it has all of your patches from
> that referenced mail thread applied.
> So, what does your statement that "it was already known that it causes problems" exactly mean?
> Can it be fixed? Is someone looking into fixing it?
My original fix intentionally used cancel_delayed_work_sync() because
flush_delayed_work() didn't work. We knew that it would miss some last-minute
queued changes, but that is better than a crash, so it was applied that way.
Then later on, commit 33cd6ea9c067 blindly changed the cancel_*() call to
flush_delayed_work(), and the known problem resurfaced.
Takashi
Hi Patrik,
On 18/04/24 18:44, Patrik Jakobsson wrote:
> On Thu, Apr 18, 2024 at 2:40 PM Harshit Mogalapalli
> <[email protected]> wrote:
>>
>> Hi,
>>
>> While fuzzing 5.15.y kernel with Syzkaller, we noticed a INFO: task hung
>> bug in fb_deferred_io_work()
>>
>>
>> This is in 5.15.149 tag, and this is introduced by a set of commits:
>>
>> 2655757a3f10 fbdev: flush deferred IO before closing
>> 15492bab7393 fbdev: Fix incorrect page mapping clearance at
>> fb_deferred_io_release()
>> 87b9802ca824 fbdev: Fix invalid page access after closing deferred I/O
>> devices
>> 9a95fc04261f fbdev: Rename pagelist to pagereflist for deferred I/O
>> 186b89659c4c fbdev: Track deferred-I/O pages in pageref struct
>> e79b2b2aadef fbdev: defio: fix the pagelist corruption
>> 0616b00a31d6 fbdev: Don't sort deferred-I/O pages by default
>> 5d3aff76a316 fbdev/defio: Early-out if page is already enlisted
>>
>> I then tried if it could be reproducible on upstream kernel as well, and
>> it did. below is the call trace with latest 6.9-rc4 based kernel. I have
>> enabled the LOCKDEP to get locking related information.
>>
>> INFO: task kworker/0:3:1077 blocked for more than 122 seconds.
>> Not tainted 6.9.0-rc4+ #54
>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> task:kworker/0:3 state:D stack:0 pid:1077 tgid:1077 ppid:2
>> flags:0x00004000
>> Workqueue: events fb_deferred_io_work
>> Call Trace:
>> <TASK>
>> __schedule (kernel/sched/core.c:5409 kernel/sched/core.c:6746)
>> ? __pfx___schedule (kernel/sched/core.c:6617)
>> ? schedule (kernel/sched/core.c:6804 kernel/sched/core.c:6837)
>> ? __pfx___lock_release (kernel/locking/lockdep.c:5406)
>> ? schedule (kernel/sched/core.c:6804 kernel/sched/core.c:6837)
>> schedule (./arch/x86/include/asm/preempt.h:84 kernel/sched/core.c:6824
>> kernel/sched/core.c:6838)
>> io_schedule (kernel/sched/core.c:9019 kernel/sched/core.c:9045)
>> folio_wait_bit_common (mm/filemap.c:1272)
>> ? __pfx_folio_wait_bit_common (mm/filemap.c:1206)
>> ? __pfx___mutex_lock (kernel/locking/mutex.c:751)
>> ? lock_acquire.part.0 (kernel/locking/lockdep.c:467
>> kernel/locking/lockdep.c:5756)
>> ? __pfx_wake_page_function (mm/filemap.c:1091)
>> ? __pfx_lock_acquire.part.0 (kernel/locking/lockdep.c:5719)
>> fb_deferred_io_work (./include/linux/pagemap.h:1069
>> drivers/video/fbdev/core/fb_defio.c:248)
>> ? lock_acquire (./include/trace/events/lock.h:24
>> kernel/locking/lockdep.c:5725)
>> process_one_work (kernel/workqueue.c:3254)
>> ? __pfx_lock_acquire.part.0 (kernel/locking/lockdep.c:5719)
>> ? __pfx_process_one_work (kernel/workqueue.c:3156)
>> ? assign_work (kernel/workqueue.c:1210)
>> worker_thread (kernel/workqueue.c:3329 kernel/workqueue.c:3416)
>> ? __kthread_parkme (kernel/kthread.c:293)
>> ? __pfx_worker_thread (kernel/workqueue.c:3362)
>> kthread (kernel/kthread.c:388)
>> ? __pfx_kthread (kernel/kthread.c:341)
>> ret_from_fork (arch/x86/kernel/process.c:147)
>> ? __pfx_kthread (kernel/kthread.c:341)
>> ret_from_fork_asm (arch/x86/entry/entry_64.S:257)
>> </TASK>
>> INFO: task r:2664 blocked for more than 123 seconds.
>> Not tainted 6.9.0-rc4+ #54
>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> task:r state:D stack:0 pid:2664 tgid:2664
>> ppid:2618 flags:0x00000002
>> Call Trace:
>> <TASK>
>> __schedule (kernel/sched/core.c:5409 kernel/sched/core.c:6746)
>> ? __pfx___schedule (kernel/sched/core.c:6617)
>> ? schedule (kernel/sched/core.c:6804 kernel/sched/core.c:6837)
>> ? __pfx___lock_release (kernel/locking/lockdep.c:5406)
>> ? schedule (kernel/sched/core.c:6804 kernel/sched/core.c:6837)
>> schedule (./arch/x86/include/asm/preempt.h:84 kernel/sched/core.c:6824
>> kernel/sched/core.c:6838)
>> schedule_timeout (kernel/time/timer.c:2559)
>> ? __pfx_schedule_timeout (kernel/time/timer.c:2544)
>> ? mark_lock.part.0 (./arch/x86/include/asm/bitops.h:227
>> ./arch/x86/include/asm/bitops.h:239
>> ./include/asm-generic/bitops/instrumented-non-atomic.h:142
>> kernel/locking/lockdep.c:228 kernel/locking/lockdep.c:4656)
>> ? mark_held_locks (kernel/locking/lockdep.c:4274)
>> ? lockdep_hardirqs_on_prepare.part.0 (kernel/locking/lockdep.c:4300
>> kernel/locking/lockdep.c:4359)
>> __wait_for_common (kernel/sched/completion.c:95
>> kernel/sched/completion.c:116)
>> ? __pfx_schedule_timeout (kernel/time/timer.c:2544)
>> ? __pfx___wait_for_common (kernel/sched/completion.c:110)
>> ? start_flush_work (kernel/workqueue.c:4192)
>> __flush_work (kernel/workqueue.c:4207)
>> ? __pfx___flush_work (kernel/workqueue.c:4195)
>> ? __pfx_wq_barrier_func (kernel/workqueue.c:3742)
>> ? flush_delayed_work (kernel/workqueue.c:4249)
>> ? __pfx_fb_release (drivers/video/fbdev/core/fb_chrdev.c:375)
>> fb_deferred_io_lastclose (drivers/video/fbdev/core/fb_defio.c:321)
>> ? __pfx_fb_release (drivers/video/fbdev/core/fb_chrdev.c:375)
>> fb_deferred_io_release (drivers/video/fbdev/core/fb_defio.c:333)
>> fb_release (drivers/video/fbdev/core/fb_chrdev.c:383)
>> ? __pfx_fb_release (drivers/video/fbdev/core/fb_chrdev.c:375)
>> __fput (fs/file_table.c:422)
>> task_work_run (kernel/task_work.c:180 (discriminator 1))
>> ? __pfx_task_work_run (kernel/task_work.c:148)
>> ? do_raw_spin_unlock (./arch/x86/include/asm/paravirt.h:589
>> ./arch/x86/include/asm/qspinlock.h:57 kernel/locking/spinlock_debug.c:142)
>> do_exit (kernel/exit.c:879)
>> ? do_raw_spin_lock (./arch/x86/include/asm/atomic.h:115
>> ./include/linux/atomic/atomic-arch-fallback.h:2170
>> ./include/linux/atomic/atomic-instrumented.h:1302
>> ./include/asm-generic/qspinlock.h:111 kernel/locking/spinlock_debug.c:116)
>> ? __pfx_do_exit (kernel/exit.c:819)
>> ? __pfx_do_raw_spin_lock (kernel/locking/spinlock_debug.c:114)
>> do_group_exit (kernel/exit.c:1008)
>> __x64_sys_exit_group (kernel/exit.c:1036)
>> do_syscall_64 (arch/x86/entry/common.c:52 arch/x86/entry/common.c:83)
>> entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
>> RIP: 0033:0x7f9cfe71e5f1
>> RSP: 002b:00007ffd636aec08 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
>> RAX: ffffffffffffffda RBX: 00007f9cfe7fc510 RCX: 00007f9cfe71e5f1
>> RDX: 000000000000003c RSI: 00000000000000e7 RDI: 0000000000000000
>> RBP: 0000000000000000 R08: ffffffffffffff80 R09: 0000000000000000
>> R10: 0000000000000020 R11: 0000000000000246 R12: 00007f9cfe7fc510
>> R13: 0000000000000000 R14: 00007f9cfe7fc9e8 R15: 00007f9cfe7fca00
>> </TASK>
>>
>> Showing all locks held in the system:
>> 1 lock held by khungtaskd/50:
>> #0: ffffffff950219c0 (rcu_read_lock){....}-{1:2}, at:
>> debug_show_all_locks (./include/linux/rcupdate.h:329 (discriminator 1)
>> ./include/linux/rcupdate.h:781 (discriminator 1)
>> kernel/locking/lockdep.c:6614 (discriminator 1))
>> 3 locks held by kworker/0:3/1077:
>> #0: ffff88810005ad48 ((wq_completion)events){+.+.}-{0:0}, at:
>> process_one_work (kernel/workqueue.c:3229)
>> #1: ffff8881080dfd98
>> ((work_completion)(&(&info->deferred_work)->work)){+.+.}-{0:0}, at:
>> process_one_work (kernel/workqueue.c:3230)
>> #2: ffff88811416db38 (&fbdefio->lock){+.+.}-{3:3}, at:
>> fb_deferred_io_work (drivers/video/fbdev/core/fb_defio.c:246)
>> 1 lock held by r/2664:
>> #0: ffff8881141fe078 (&fb_info->lock){+.+.}-{3:3}, at: fb_release
>> (drivers/video/fbdev/core/fb_chrdev.c:380)
>>
>>
>>
>> Syzkaller initially generated a long reproducer, Vegard simplified it to
>> the one below
>>
>> C repro:
>>
>> #define _GNU_SOURCE
>> #include <sys/mman.h>
>> #include <fcntl.h>
>> #include <stdlib.h>
>> #include <unistd.h>
>>
>> int main(void)
>> {
>> int fd = open("/dev/fb0", O_RDWR, 0);
>>
>> char *ptr = (char *) 0x20000000ul;
>>
>> mmap(ptr, 4096, PROT_WRITE, MAP_FIXED | MAP_SHARED, fd, 0xff000);
>> *ptr = 0;
>>
>> mmap(ptr, 4096, PROT_WRITE, MAP_FIXED | MAP_SHARED, fd, 0);
>> *ptr = 0;
>>
>> return 0;
>> }
>>
>> This needs to be run as root to see the hang in 122 seconds.
>>
>> 238 static void fb_deferred_io_work(struct work_struct *work)
>> 239 {
>> 240 struct fb_info *info = container_of(work, struct fb_info,
>> deferred_work.work);
>> 241 struct fb_deferred_io_pageref *pageref, *next;
>> 242 struct fb_deferred_io *fbdefio = info->fbdefio;
>> 243
>> 244 /* here we mkclean the pages, then do all deferred IO */
>> 245 mutex_lock(&fbdefio->lock);
>> 246 list_for_each_entry(pageref, &fbdefio->pagereflist, list) {
>> 247 struct page *cur = pageref->page;
>> 248 lock_page(cur);
>> 249 page_mkclean(cur);
>> 250 unlock_page(cur);
>> 251 }
>>
>> It is stalling on lock_page() in fb_deferred_io_work()
>>
>> I will be happy to try any patches and report back.
>
> Hi,
Thanks for the quick response and sharing details.
> This sounds similar to the SUSE bug [1]. We fixed it by reverting [2]
> in the SUSE kernel. The problem seems to be that flush_delayed_work()
> kills the timer and re-queues the work but doesn't guarantee that it
> is finished when returning. So when the device is closed, the
> fb_deferred_io_work() function can still be queued and tries to access
> memory that's been freed.
>
I tried reverting that commit but saw another, similar problem; here it is:
[ 618.508726] INFO: task r:2682 blocked for more than 122 seconds.
[ 618.513662] Not tainted 6.9.0-rc4+ #55
[ 618.517105] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 618.523204] task:r state:D stack:0 pid:2682
tgid:2682 ppid:2636 flags:0x00000004
[ 618.530325] Call Trace:
[ 618.532470] <TASK>
[ 618.534394] __schedule+0x787/0x1b10
[ 618.537506] ? __pfx___schedule+0x10/0x10
[ 618.540784] ? schedule+0x1e6/0x290
[ 618.543698] ? __lock_release+0x103/0x460
[ 618.546982] ? folio_wait_bit_common+0x22a/0x8b0
[ 618.550697] ? __pfx___lock_release+0x10/0x10
[ 618.554268] ? mark_lock.part.0+0x77/0x880
[ 618.557614] ? schedule+0x1e6/0x290
[ 618.560545] schedule+0xdd/0x290
[ 618.563324] io_schedule+0xc4/0x140
[ 618.566251] folio_wait_bit_common+0x32a/0x8b0
[ 618.570062] ? __pfx_folio_wait_bit_common+0x10/0x10
[ 618.574231] ? mark_lock.part.0+0x77/0x880
[ 618.577575] ? __pfx_wake_page_function+0x10/0x10
[ 618.581430] __do_fault+0x2fc/0x610
[ 618.584329] ? __thp_vma_allowable_orders+0x25e/0x9f0
[ 618.588442] do_fault+0x22e/0xb60
[ 618.591222] handle_pte_fault+0x55a/0x7e0
[ 618.594543] ? thp_vma_allowable_orders.constprop.0+0x66/0x1e0
[ 618.599204] __handle_mm_fault+0xf93/0x1580
[ 618.602652] ? __pfx___handle_mm_fault+0x10/0x10
[ 618.606460] ? find_vma+0xc5/0x150
[ 618.609324] ? __pfx_find_vma+0x10/0x10
[ 618.612497] ? lock_acquire+0x1bf/0x230
[ 618.615639] handle_mm_fault+0x288/0xa80
[ 618.618882] do_user_addr_fault+0x39a/0x1280
[ 618.622453] exc_page_fault+0xa0/0x1c0
[ 618.625530] asm_exc_page_fault+0x26/0x30
[ 618.628807] RIP: 0033:0x40118d
[ 618.631429] RSP: 002b:00007ffdae7ffa40 EFLAGS: 00010207
[ 618.635586] RAX: 0000000020000000 RBX: 0000000000000000 RCX:
00007f8c00b4d557
[ 618.641281] RDX: 0000000000000002 RSI: 0000000000001000 RDI:
0000000020000000
[ 618.646822] RBP: 00007ffdae7ffa50 R08: 0000000000000003 R09:
00000000000ff000
[ 618.652349] R10: 0000000000000011 R11: 0000000000000246 R12:
00007ffdae7ffb88
[ 618.657931] R13: 0000000000401136 R14: 00007f8c00cedaa0 R15:
0000000000403e08
[ 618.663498] </TASK>
[ 618.665808]
[ 618.665808] Showing all locks held in the system:
[ 618.671090] 1 lock held by khungtaskd/51:
[ 618.674595] #0: ffffffff950219c0 (rcu_read_lock){....}-{1:2}, at:
debug_show_all_locks+0x75/0x340
[ 618.682063] 1 lock held by r/2682:
[ 618.685101] #0: ffff888100072fe0 (&mm->mmap_lock){++++}-{3:3}, at:
lock_mm_and_find_vma+0x3a/0x6b0
[ 618.692648]
[ 618.694194] =============================================
[ 618.694194]
Note that reverting all 8 commits makes the issue go away on 5.15.y, but that
does not look like a feasible solution.
I think we can say that the issue is localized to the 8 commits listed above.
Thanks,
Harshit
> Patch [2] tries to solve the problem of not throwing away pending data
> when closing the device. Perhaps calling cancel_delayed_work_sync()
> and then follow up with a manual call to fb_deferred_io_work() would
> be enough to flush the remaining data?
>
> -Patrik
>
> [1] https://bugzilla.suse.com/show_bug.cgi?id=1221814
> [2] 33cd6ea9c067 fbdev: flush deferred IO before closing
>
>>
>>
>> Thanks,
>> Harshit
Hi Nam,
On 18/04/24 19:36, Nam Cao wrote:
> On 2024-04-18 Harshit Mogalapalli wrote:
>> While fuzzing 5.15.y kernel with Syzkaller, we noticed a INFO: task hung
>> bug in fb_deferred_io_work()
>
> Which framebuffer device are you using exactly? It is possible that
> the problem is with the device driver, not core framebuffer.
>
I am not entirely sure about the answer, but the C reproducer uses /dev/fb0:

int fd = open("/dev/fb0", O_RDWR, 0);
Thanks,
Harshit
> Best regards,
> Nam
Hi Takashi,
On 19/04/24 12:14, Takashi Iwai wrote:
> On Thu, 18 Apr 2024 21:29:57 +0200,
> Helge Deller wrote:
>>
>> On 4/18/24 16:26, Takashi Iwai wrote:
>>> On Thu, 18 Apr 2024 16:06:52 +0200,
>>> Nam Cao wrote:
>>>>
>>>> On 2024-04-18 Harshit Mogalapalli wrote:
>>>>> While fuzzing 5.15.y kernel with Syzkaller, we noticed a INFO: task hung
>>>>> bug in fb_deferred_io_work()
>>>>
>>>> Which framebuffer device are you using exactly? It is possible that
>>>> the problem is with the device driver, not core framebuffer.
>>>
>>> Note that it was already known that using flush_delayed_work() caused
>>> a problem. See the thread of the fix patch:
>>> https://lore.kernel.org/all/[email protected]/
>>
>> Harshit reported the hung tasks with kernel v5.15-stable, and can even reproduce
>> that issue with kernel v6.9-rc4 although it has all of your patches from
>> that referenced mail thread applied.
>> So, what does your statement that "it was already known that it causes problems" exactly mean?
>> Can it be fixed? Is someone looking into fixing it?
>
> My original fix was intentionally with cancel_delayed_work_sync()
> because flush_delayed_work() didn't work. We knew that it'd miss some
> last-minute queued change, but it's better than crash, so it was
> applied in that way.
>
Thanks for sharing these details.
> Then later on, the commit 33cd6ea9c067 changed cancel_*() to
> flush_delayed_work() blindly, and the known problem resurfaced again.
>
I have reverted that commit, but could still see another task hung message,
as shared in the other reply:
https://lore.kernel.org/all/[email protected]/
Thanks,
Harshit
>
> Takashi
On Fri, 19 Apr 2024 09:39:09 +0200,
Harshit Mogalapalli wrote:
>
> Hi Takashi,
>
> On 19/04/24 12:14, Takashi Iwai wrote:
> > On Thu, 18 Apr 2024 21:29:57 +0200,
> > Helge Deller wrote:
> >>
> >> On 4/18/24 16:26, Takashi Iwai wrote:
> >>> On Thu, 18 Apr 2024 16:06:52 +0200,
> >>> Nam Cao wrote:
> >>>>
> >>>> On 2024-04-18 Harshit Mogalapalli wrote:
> >>>>> While fuzzing 5.15.y kernel with Syzkaller, we noticed a INFO: task hung
> >>>>> bug in fb_deferred_io_work()
> >>>>
> >>>> Which framebuffer device are you using exactly? It is possible that
> >>>> the problem is with the device driver, not core framebuffer.
> >>>
> >>> Note that it was already known that using flush_delayed_work() caused
> >>> a problem. See the thread of the fix patch:
> >>> https://lore.kernel.org/all/[email protected]/
> >>
> >> Harshit reported the hung tasks with kernel v5.15-stable, and can even reproduce
> >> that issue with kernel v6.9-rc4 although it has all of your patches from
> >> that referenced mail thread applied.
> >> So, what does your statement that "it was already known that it causes problems" exactly mean?
> >> Can it be fixed? Is someone looking into fixing it?
> >
> > My original fix was intentionally with cancel_delayed_work_sync()
> > because flush_delayed_work() didn't work. We knew that it'd miss some
> > last-minute queued change, but it's better than crash, so it was
> > applied in that way.
> >
>
> Thanks for sharing these details.
>
> > Then later on, the commit 33cd6ea9c067 changed cancel_*() to
> > flush_delayed_work() blindly, and the known problem resurfaced again.
> >
>
> I have reverted that commit, but still could see some other task hung
> message as shared here on other reply:
>
> https://lore.kernel.org/all/[email protected]/
Yes, then it could be a different cause, I suppose.
The crash with flush_delayed_work() was a real crash, no hanging task,
IIRC.
Can you reproduce the issue with the latest Linus upstream, too?
thanks,
Takashi
On Fri, Apr 19, 2024 at 9:45 AM Takashi Iwai <[email protected]> wrote:
>
> On Fri, 19 Apr 2024 09:39:09 +0200,
> Harshit Mogalapalli wrote:
> >
> > Hi Takashi,
> >
> > On 19/04/24 12:14, Takashi Iwai wrote:
> > > On Thu, 18 Apr 2024 21:29:57 +0200,
> > > Helge Deller wrote:
> > >>
> > >> On 4/18/24 16:26, Takashi Iwai wrote:
> > >>> On Thu, 18 Apr 2024 16:06:52 +0200,
> > >>> Nam Cao wrote:
> > >>>>
> > >>>> On 2024-04-18 Harshit Mogalapalli wrote:
> > >>>>> While fuzzing 5.15.y kernel with Syzkaller, we noticed a INFO: task hung
> > >>>>> bug in fb_deferred_io_work()
> > >>>>
> > >>>> Which framebuffer device are you using exactly? It is possible that
> > >>>> the problem is with the device driver, not core framebuffer.
> > >>>
> > >>> Note that it was already known that using flush_delayed_work() caused
> > >>> a problem. See the thread of the fix patch:
> > >>> https://lore.kernel.org/all/[email protected]/
> > >>
> > >> Harshit reported the hung tasks with kernel v5.15-stable, and can even reproduce
> > >> that issue with kernel v6.9-rc4 although it has all of your patches from
> > >> that referenced mail thread applied.
> > >> So, what does your statement that "it was already known that it causes problems" exactly mean?
> > >> Can it be fixed? Is someone looking into fixing it?
> > >
> > > My original fix was intentionally with cancel_delayed_work_sync()
> > > because flush_delayed_work() didn't work. We knew that it'd miss some
> > > last-minute queued change, but it's better than crash, so it was
> > > applied in that way.
> > >
> >
> > Thanks for sharing these details.
> >
> > > Then later on, the commit 33cd6ea9c067 changed cancel_*() to
> > > flush_delayed_work() blindly, and the known problem resurfaced again.
> > >
> >
> > I have reverted that commit, but still could see some other task hung
> > message as shared here on other reply:
> >
> > https://lore.kernel.org/all/d2485cb9-277d-4b8e-9794-02f1efababc9@oraclecom/
>
> Yes, then it could be a different cause, I suppose.
> The crash with flush_delayed_work() was a real crash, no hanging task,
> IIRC.
Neither cancel_delayed_work_sync() nor flush_delayed_work() prevents new
work from being scheduled after they return. But
cancel_delayed_work_sync() at least makes sure the queue is empty, so
the problem becomes less apparent.
Could this explain what we're seeing?
>
> Can you reproduce the issue with the latest Linus upstream, too?
>
>
> thanks,
>
> Takashi
Hi Takashi,
On 19/04/24 13:15, Takashi Iwai wrote:
> On Fri, 19 Apr 2024 09:39:09 +0200,
>>> Then later on, the commit 33cd6ea9c067 changed cancel_*() to
>>> flush_delayed_work() blindly, and the known problem resurfaced again.
>>>
>>
>> I have reverted that commit, but still could see some other task hung
>> message as shared here on other reply:
>>
>> https://lore.kernel.org/all/[email protected]/
>
> Yes, then it could be a different cause, I suppose.
> The crash with flush_delayed_work() was a real crash, no hanging task,
> IIRC.
>
> Can you reproduce the issue with the latest Linus upstream, too?
>
Yes, this was reproduced on a 6.9-rc4 kernel [1]; also, reverting commit
33cd6ea9c067 on upstream still showed another issue [2].

[1] https://lore.kernel.org/all/[email protected]/
[2] https://lore.kernel.org/all/[email protected]/
Thanks,
Harshit
>
> thanks,
>
> Takashi
On 2024-04-18 Harshit Mogalapalli wrote:
> While fuzzing 5.15.y kernel with Syzkaller, we noticed a INFO: task hung
> bug in fb_deferred_io_work()
I think the problem is caused by an improper offset address calculation.
The kernel calculates the address offset as:
offset = vmf->address - vmf->vma->vm_start

Now the problem is that your C program mmaps the framebuffer at 2
different file offsets:
mmap(ptr, 4096, PROT_WRITE, MAP_FIXED | MAP_SHARED, fd, 0xff000);
mmap(ptr, 4096, PROT_WRITE, MAP_FIXED | MAP_SHARED, fd, 0);

but the kernel doesn't take these different offsets into account,
so 2 different pages are mistakenly recognized as the same page.
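(For example, with the reproducer above both faults happen at
vmf->address == 0x20000000 with vmf->vma->vm_start == 0x20000000, so offset
is computed as 0 both times, even though the two mappings start at file
offsets 0xff000 and 0 respectively.)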
Can you try the following patch?
Best regards,
Nam
diff --git a/drivers/video/fbdev/core/fb_defio.c b/drivers/video/fbdev/core/fb_defio.c
index dae96c9f61cf..d5d6cd9e8b29 100644
--- a/drivers/video/fbdev/core/fb_defio.c
+++ b/drivers/video/fbdev/core/fb_defio.c
@@ -196,7 +196,8 @@ static vm_fault_t fb_deferred_io_track_page(struct fb_info *info, unsigned long
  */
 static vm_fault_t fb_deferred_io_page_mkwrite(struct fb_info *info, struct vm_fault *vmf)
 {
-	unsigned long offset = vmf->address - vmf->vma->vm_start;
+	unsigned long offset = vmf->address - vmf->vma->vm_start
+			       + (vmf->vma->vm_pgoff << PAGE_SHIFT);
 	struct page *page = vmf->page;
 
 	file_update_time(vmf->vma->vm_file);
On 2024-04-18 Patrik Jakobsson wrote:
> On Thu, Apr 18, 2024 at 4:05 PM Nam Cao <[email protected]> wrote:
> >
> > On 2024-04-18 Patrik Jakobsson wrote:
> > > This sounds similar to the SUSE bug [1]. We fixed it by reverting [2]
> > > in the SUSE kernel. The problem seems to be that flush_delayed_work()
> > > kills the timer and re-queues the work but doesn't guarantee that it
> > > is finished when returning. So when the device is closed, the
> > > fb_deferred_io_work() function can still be queued and tries to access
> > > memory that's been freed.
> >
> > flush_delayed_work() *does* guarantee the work is finished before
> > returning.
>
> Right, flush_work() does guarantee that the work is finished, but it
> doesn't guarantee that the queue is idle if work has been requeued
> since flush started. So fb_deferred_io_work() should be able to happen
> after the device is closed. Or am I missing something?
I'm confused: how is it possible for fb_deferred_io_work() to run after
the device is closed?
The workqueue is flushed while the device is being closed. If more work is
queued during/after the flushing, then someone must be writing to the
device during/after the closing of the device. How can that happen?
Best regards,
Nam
> >
> > > Patch [2] tries to solve the problem of not throwing away pending data
> > > when closing the device. Perhaps calling cancel_delayed_work_sync()
> > > and then follow up with a manual call to fb_deferred_io_work() would
> > > be enough to flush the remaining data?
> > >
> > > -Patrik
> > >
> > > [1] https://bugzilla.suse.com/show_bug.cgi?id=1221814
> > > [2] 33cd6ea9c067 fbdev: flush deferred IO before closing
> > >
> > > >
> > > >
> > > > Thanks,
> > > > Harshit
> >
On 2024-04-19 Patrik Jakobsson wrote:
> Neither cancel_delayed_work_sync() or flush_delayed_work() prevent new
> work from being scheduled after they return.
flush_delayed_work() is called during device closing. And because no
writes are performed after the device has been closed, no new work
should be queued after flush_delayed_work().
> But
> cancel_delayed_work_sync() at least makes sure the queue is empty so
> the problem becomes less apparent.
>
> Could this explain what we're seeing?
I suspect that cancel_delayed_work_sync() is only treating the symptoms
by preventing the deferred work from running. The real bug is "someone"
giving fb_deferred_io_work() invalid pages to work with. But that's
just a blind guess.
Best regards,
Nam
Hi Nam,
+CC stable (heads up, as this is a regression affecting 5.15.y and probably
others; Greg: this was reproducible upstream, so everything is reported
w.r.t. upstream code, but it was initially found on 5.15.y)
On 19/04/24 20:29, Nam Cao wrote:
> On 2024-04-18 Harshit Mogalapalli wrote:
>> While fuzzing 5.15.y kernel with Syzkaller, we noticed a INFO: task hung
>> bug in fb_deferred_io_work()
>
> I think the problem is because of improper offset address calculation.
> The kernel calculate address offset with:
> offset = vmf->address - vmf->vma->vm_start
>
> Now the problem is that your C program mmap the framebuffer at 2
> different offsets:
> mmap(ptr, 4096, PROT_WRITE, MAP_FIXED | MAP_SHARED, fd, 0xff000);
> mmap(ptr, 4096, PROT_WRITE, MAP_FIXED | MAP_SHARED, fd, 0);
>
> but the kernel doesn't take these different offsets into account.
> So, 2 different pages are mistakenly recognized as the same page.
>
> Can you try the following patch?
This patch works well against the reproducer: neither this simplified repro
nor the longer repro which Syzkaller generated could trigger any hang with
the patch below applied.
Thanks,
Harshit
>
> Best regards,
> Nam
>
> diff --git a/drivers/video/fbdev/core/fb_defio.c b/drivers/video/fbdev/core/fb_defio.c
> index dae96c9f61cf..d5d6cd9e8b29 100644
> --- a/drivers/video/fbdev/core/fb_defio.c
> +++ b/drivers/video/fbdev/core/fb_defio.c
> @@ -196,7 +196,8 @@ static vm_fault_t fb_deferred_io_track_page(struct fb_info *info, unsigned long
> */
> static vm_fault_t fb_deferred_io_page_mkwrite(struct fb_info *info, struct vm_fault *vmf)
> {
> - unsigned long offset = vmf->address - vmf->vma->vm_start;
> + unsigned long offset = vmf->address - vmf->vma->vm_start
> + + (vmf->vma->vm_pgoff << PAGE_SHIFT);
> struct page *page = vmf->page;
>
> file_update_time(vmf->vma->vm_file);
>
>
On 2024-04-19 Harshit Mogalapalli wrote:
> +CC stable( heads up as this is a regression affecting 5.15.y and
> probably others, Greg: this was reproducible upstream so reported
> everything w.r.t upstream code but initially found on 5.15.y)
No worry about this, I will add a "Cc: <[email protected]>" tag
to the patch, and the stable folks (or their scripts) will pick it up.
> This patch works well against the reproducer, this simplified repro and
> the longer repro which syzkaller generated couldn't trigger any hang
> with the below patch applied.
Great, thanks! Can I add
Reported-and-tested-by: Harshit Mogalapalli <[email protected]>
to the patch?
Best regards,
Nam
On 19/04/24 21:53, Nam Cao wrote:
> On 2024-04-19 Harshit Mogalapalli wrote:
>> +CC stable( heads up as this is a regression affecting 5.15.y and
>> probably others, Greg: this was reproducible upstream so reported
>> everything w.r.t upstream code but initially found on 5.15.y)
>
> No worry about this, I will add a "Cc: <[email protected]>" tag
> to the patch, and the stable folks (or their scripts) will pick it up.
>
Thanks for that!
>> This patch works well against the reproducer, this simplified repro and
>> the longer repro which syzkaller generated couldn't trigger any hang
>> with the below patch applied.
>
> Great, thanks! Can I add
> Reported-and-tested-by: Harshit Mogalapalli <[email protected]>
> to the patch?
>
Sure, that would be good!
Thanks,
Harshit
> Best regards,
> Nam
On Fri, Apr 19, 2024 at 5:34 PM Nam Cao <[email protected]> wrote:
>
> On 2024-04-19 Patrik Jakobsson wrote:
> > Neither cancel_delayed_work_sync() or flush_delayed_work() prevent new
> > work from being scheduled after they return.
>
> flush_delayed_work() is called during device closing. And because no
> writes are performed after the device has been closed, no new work
> should be queued after flush_delayed_work().
Yes, nothing should write after the device is closed, but the events
are asynchronous, so in theory the order is not guaranteed. I also find
it unlikely, but I have no other theory at this point.
>
> > But
> > cancel_delayed_work_sync() at least makes sure the queue is empty so
> > the problem becomes less apparent.
> >
> > Could this explain what we're seeing?
>
> I suspect that cancel_delayed_work_sync() is only treating the symptoms
> by preventing the deferred work from running. The real bug is "someone"
> giving fb_deferred_io_work() invalid pages to work with. But that's
> just a blind guess.
Trying to figure out when the page goes away in relation to when the
work is triggered might be a good place to start.
>
> Best regards,
> Nam