2019-11-13 19:10:06

by Frederic Weisbecker

Subject: [PATCH] irq_work: Fix IRQ_WORK_BUZY bit clearing

While attempting to clear the buzy bit at the end of a work execution,
atomic_cmpxchg() expects the value of the flags with the pending bit
cleared as the old value. However we are passing by mistake the value of
the flags before we actually cleared the pending bit.

As a result, clearing the buzy bit fails and irq_work_sync() may stall:

watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [blktrace:4948]
CPU: 0 PID: 4948 Comm: blktrace Not tainted 5.4.0-rc7-00003-gfeb4a51323bab #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
RIP: 0010:irq_work_sync+0x4/0x10
Call Trace:
relay_close_buf+0x19/0x50
relay_close+0x64/0x100
blk_trace_free+0x1f/0x50
__blk_trace_remove+0x1e/0x30
blk_trace_ioctl+0x11b/0x140
blkdev_ioctl+0x6c1/0xa40
block_ioctl+0x39/0x40
do_vfs_ioctl+0xa5/0x700
ksys_ioctl+0x70/0x80
__x64_sys_ioctl+0x16/0x20
do_syscall_64+0x5b/0x1d0
entry_SYSCALL_64_after_hwframe+0x44/0xa9

So clear the appropriate bit before passing the old flags to cmpxchg().

Reported-by: kernel test robot <[email protected]>
Reported-by: Leonard Crestez <[email protected]>
Fixes: feb4a51323ba ("irq_work: Slightly simplify IRQ_WORK_PENDING clearing")
Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Paul E . McKenney <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
---
kernel/irq_work.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/kernel/irq_work.c b/kernel/irq_work.c
index 49c53f80a13a..828cc30774bc 100644
--- a/kernel/irq_work.c
+++ b/kernel/irq_work.c
@@ -158,6 +158,7 @@ static void irq_work_run_list(struct llist_head *list)
* Clear the BUSY bit and return to the free state if
* no-one else claimed it meanwhile.
*/
+ flags &= ~IRQ_WORK_PENDING;
(void)atomic_cmpxchg(&work->flags, flags, flags & ~IRQ_WORK_BUSY);
}
}
--
2.23.0


2019-11-14 15:58:10

by Leonard Crestez

Subject: Re: [PATCH] irq_work: Fix IRQ_WORK_BUZY bit clearing

On 13.11.2019 19:12, Frederic Weisbecker wrote:
> While attempting to clear the buzy bit at the end of a work execution,
> atomic_cmpxchg() expects the value of the flags with the pending bit
> cleared as the old value. However we are passing by mistake the value of
> the flags before we actually cleared the pending bit.

Busy is spelled with an S

>
> As a result, clearing the buzy bit fails and irq_work_sync() may stall:
>
> watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [blktrace:4948]
> CPU: 0 PID: 4948 Comm: blktrace Not tainted 5.4.0-rc7-00003-gfeb4a51323bab #1
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
> RIP: 0010:irq_work_sync+0x4/0x10
> Call Trace:
> relay_close_buf+0x19/0x50
> relay_close+0x64/0x100
> blk_trace_free+0x1f/0x50
> __blk_trace_remove+0x1e/0x30
> blk_trace_ioctl+0x11b/0x140
> blkdev_ioctl+0x6c1/0xa40
> block_ioctl+0x39/0x40
> do_vfs_ioctl+0xa5/0x700
> ksys_ioctl+0x70/0x80
> __x64_sys_ioctl+0x16/0x20
> do_syscall_64+0x5b/0x1d0
> entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> So clear the appropriate bit before passing the old flags to cmpxchg().
>
> Reported-by: kernel test robot <[email protected]>
> Reported-by: Leonard Crestez <[email protected]>
> Fixes: feb4a51323ba ("irq_work: Slightly simplify IRQ_WORK_PENDING clearing")
> Signed-off-by: Frederic Weisbecker <[email protected]>
> Cc: Paul E . McKenney <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Ingo Molnar <[email protected]>

Tested-by: Leonard Crestez <[email protected]>

Without this patch switching cpufreq governors hangs on arm64.

> ---
> kernel/irq_work.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/kernel/irq_work.c b/kernel/irq_work.c
> index 49c53f80a13a..828cc30774bc 100644
> --- a/kernel/irq_work.c
> +++ b/kernel/irq_work.c
> @@ -158,6 +158,7 @@ static void irq_work_run_list(struct llist_head *list)
> * Clear the BUSY bit and return to the free state if
> * no-one else claimed it meanwhile.
> */
> + flags &= ~IRQ_WORK_PENDING;
> (void)atomic_cmpxchg(&work->flags, flags, flags & ~IRQ_WORK_BUSY);
> }
> }
>

2019-11-15 08:53:11

by Naresh Kamboju

Subject: Re: [PATCH] irq_work: Fix IRQ_WORK_BUZY bit clearing

Hi Frederic,

Thanks for this fix.

On Thu, 14 Nov 2019 at 21:26, Leonard Crestez <[email protected]> wrote:
>
> On 13.11.2019 19:12, Frederic Weisbecker wrote:
> > While attempting to clear the buzy bit at the end of a work execution,
> > atomic_cmpxchg() expects the value of the flags with the pending bit
> > cleared as the old value. However we are passing by mistake the value of
> > the flags before we actually cleared the pending bit.
>
> Busy is spelled with an S
>
> >
> > As a result, clearing the buzy bit fails and irq_work_sync() may stall:
> >
> > watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [blktrace:4948]
> > CPU: 0 PID: 4948 Comm: blktrace Not tainted 5.4.0-rc7-00003-gfeb4a51323bab #1
> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
> > RIP: 0010:irq_work_sync+0x4/0x10
> > Call Trace:
> > relay_close_buf+0x19/0x50
> > relay_close+0x64/0x100
> > blk_trace_free+0x1f/0x50
> > __blk_trace_remove+0x1e/0x30
> > blk_trace_ioctl+0x11b/0x140
> > blkdev_ioctl+0x6c1/0xa40
> > block_ioctl+0x39/0x40
> > do_vfs_ioctl+0xa5/0x700
> > ksys_ioctl+0x70/0x80
> > __x64_sys_ioctl+0x16/0x20
> > do_syscall_64+0x5b/0x1d0
> > entry_SYSCALL_64_after_hwframe+0x44/0xa9
> >
> > So clear the appropriate bit before passing the old flags to cmpxchg().
> >
> > Reported-by: kernel test robot <[email protected]>
> > Reported-by: Leonard Crestez <[email protected]>
> > Fixes: feb4a51323ba ("irq_work: Slightly simplify IRQ_WORK_PENDING clearing")
> > Signed-off-by: Frederic Weisbecker <[email protected]>
> > Cc: Paul E . McKenney <[email protected]>
> > Cc: Peter Zijlstra <[email protected]>
> > Cc: Thomas Gleixner <[email protected]>
> > Cc: Ingo Molnar <[email protected]>
>
> Tested-by: Leonard Crestez <[email protected]>
>
> Without this patch switching cpufreq governors hangs on arm64.

Right.

This patch solves two problems:
1) juno-r2 now boots successfully.
2) The rcu_sched self-detected stall on CPU on x86_64 is resolved.

Tested-by: Naresh Kamboju <[email protected]>

I hope this gets merged into linux-next.

ref:
https://lkft.validation.linaro.org/scheduler/job/1010542#L260
https://lkft.validation.linaro.org/scheduler/job/1010793#L493

- Naresh

2019-11-15 09:56:33

by tip-bot2 for Jacob Pan

Subject: [tip: irq/core] irq_work: Fix IRQ_WORK_BUSY bit clearing

The following commit has been merged into the irq/core branch of tip:

Commit-ID: e9838bd51169af87ae248336d4c3fc59184a0e46
Gitweb: https://git.kernel.org/tip/e9838bd51169af87ae248336d4c3fc59184a0e46
Author: Frederic Weisbecker <[email protected]>
AuthorDate: Wed, 13 Nov 2019 18:12:01 +01:00
Committer: Thomas Gleixner <[email protected]>
CommitterDate: Fri, 15 Nov 2019 10:48:37 +01:00

irq_work: Fix IRQ_WORK_BUSY bit clearing

While attempting to clear the busy bit at the end of a work execution,
atomic_cmpxchg() expects the value of the flags with the pending bit
cleared as the old value. However by mistake the value of the flags is
passed without clearing the pending bit first.

As a result, clearing the busy bit fails and irq_work_sync() may stall:

watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [blktrace:4948]
CPU: 0 PID: 4948 Comm: blktrace Not tainted 5.4.0-rc7-00003-gfeb4a51323bab #1
RIP: 0010:irq_work_sync+0x4/0x10
Call Trace:
relay_close_buf+0x19/0x50
relay_close+0x64/0x100
blk_trace_free+0x1f/0x50
__blk_trace_remove+0x1e/0x30
blk_trace_ioctl+0x11b/0x140
blkdev_ioctl+0x6c1/0xa40
block_ioctl+0x39/0x40
do_vfs_ioctl+0xa5/0x700
ksys_ioctl+0x70/0x80
__x64_sys_ioctl+0x16/0x20
do_syscall_64+0x5b/0x1d0
entry_SYSCALL_64_after_hwframe+0x44/0xa9

So clear the appropriate bit before passing the old flags to cmpxchg().

Fixes: feb4a51323ba ("irq_work: Slightly simplify IRQ_WORK_PENDING clearing")
Reported-by: kernel test robot <[email protected]>
Reported-by: Leonard Crestez <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Leonard Crestez <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]

---
kernel/irq_work.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/kernel/irq_work.c b/kernel/irq_work.c
index 49c53f8..828cc30 100644
--- a/kernel/irq_work.c
+++ b/kernel/irq_work.c
@@ -158,6 +158,7 @@ static void irq_work_run_list(struct llist_head *list)
* Clear the BUSY bit and return to the free state if
* no-one else claimed it meanwhile.
*/
+ flags &= ~IRQ_WORK_PENDING;
(void)atomic_cmpxchg(&work->flags, flags, flags & ~IRQ_WORK_BUSY);
}
}
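
For readers following the thread, here is a minimal userspace sketch of why the cmpxchg fails without the added line. It is not the kernel code: it models work->flags with C11 atomics, the helper names finish_work() and run_once() are hypothetical, and the bit values (PENDING as bit 0, BUSY as bit 1) are an assumption about the kernel header. The key point it tries to show is that the fetch-and-clear step returns the value from before PENDING was cleared, so that value no longer matches what is stored in memory.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Assumed bit layout, for illustration only. */
#define IRQ_WORK_PENDING (1 << 0)
#define IRQ_WORK_BUSY    (1 << 1)

/*
 * Hypothetical model of the tail of irq_work_run_list().
 * Returns true if the compare-and-exchange cleared BUSY.
 */
static bool finish_work(atomic_int *work_flags, bool apply_fix)
{
    /*
     * Clear PENDING in memory. Like the kernel's atomic_fetch_andnot(),
     * atomic_fetch_and() returns the PREVIOUS value, PENDING still set.
     */
    int flags = atomic_fetch_and(work_flags, ~IRQ_WORK_PENDING);

    /* ... the work callback would run here ... */

    /* The one-line fix: make the local copy match what is in memory. */
    if (apply_fix)
        flags &= ~IRQ_WORK_PENDING;

    /* Succeeds only if *work_flags still equals flags. */
    int desired = flags & ~IRQ_WORK_BUSY;
    return atomic_compare_exchange_strong(work_flags, &flags, desired);
}

/* Claim a work item (PENDING | BUSY), finish it, return the final flags. */
static int run_once(bool apply_fix)
{
    atomic_int work_flags = IRQ_WORK_PENDING | IRQ_WORK_BUSY;

    (void)finish_work(&work_flags, apply_fix);
    return atomic_load(&work_flags);
}
```

In this model, run_once(false) leaves IRQ_WORK_BUSY set, which is the state a waiter such as irq_work_sync() would spin on, while run_once(true) returns the flags to 0. The compare-and-exchange is kept rather than a plain store so that BUSY is left alone if the work was claimed again in the meantime.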