2018-06-21 03:04:26

by Jia-Ju Bai

Subject: [BUG] mm: backing-dev: a possible sleep-in-atomic-context bug in cgwb_create()

The kernel may sleep while holding a spinlock.
The function call path (from bottom to top) in Linux-4.16.7 is:

[FUNC] schedule
lib/percpu-refcount.c, 222:
schedule in __percpu_ref_switch_mode
lib/percpu-refcount.c, 339:
__percpu_ref_switch_mode in percpu_ref_kill_and_confirm
./include/linux/percpu-refcount.h, 127:
percpu_ref_kill_and_confirm in percpu_ref_kill
mm/backing-dev.c, 545:
percpu_ref_kill in cgwb_kill
mm/backing-dev.c, 576:
cgwb_kill in cgwb_create
mm/backing-dev.c, 573:
_raw_spin_lock_irqsave in cgwb_create

This bug was found by my static analysis tool (DSAC-2) and confirmed by my
code review.

I do not know how to fix this bug correctly, so I am just reporting it.
Maybe cgwb_kill() should not be called while holding a spinlock.


Best wishes,
Jia-Ju Bai


2018-06-21 03:36:22

by Matthew Wilcox

Subject: Re: [BUG] mm: backing-dev: a possible sleep-in-atomic-context bug in cgwb_create()

On Thu, Jun 21, 2018 at 11:02:58AM +0800, Jia-Ju Bai wrote:
> The kernel may sleep while holding a spinlock.
> The function call path (from bottom to top) in Linux-4.16.7 is:
>
> [FUNC] schedule
> lib/percpu-refcount.c, 222:
> schedule in __percpu_ref_switch_mode
> lib/percpu-refcount.c, 339:
> __percpu_ref_switch_mode in percpu_ref_kill_and_confirm
> ./include/linux/percpu-refcount.h, 127:
> percpu_ref_kill_and_confirm in percpu_ref_kill
> mm/backing-dev.c, 545:
> percpu_ref_kill in cgwb_kill
> mm/backing-dev.c, 576:
> cgwb_kill in cgwb_create
> mm/backing-dev.c, 573:
> _raw_spin_lock_irqsave in cgwb_create
>
> This bug was found by my static analysis tool (DSAC-2) and confirmed by my
> code review.

I disagree with your code review.

* If the previous ATOMIC switching hasn't finished yet, wait for
* its completion. If the caller ensures that ATOMIC switching
* isn't in progress, this function can be called from any context.

I believe cgwb_kill is always called under the spinlock, so we will never
sleep because the percpu ref will never be switching to atomic mode.

This is complex and subtle, so I could be wrong.
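
The condition in the quoted comment can be illustrated with a small
userspace model. This is only a sketch under stated assumptions, not the
kernel code: the toy_* names are hypothetical, there is no real locking,
RCU or sleeping, and the "would wait" result merely stands in for the
point where the real function could block. The model assumes that a
pending ATOMIC switch is tracked by a non-NULL confirm_switch pointer in
the ref, which is cleared once the switch completes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; this is NOT the kernel's struct percpu_ref. */
struct toy_ref;
typedef void (*toy_confirm_fn)(struct toy_ref *ref);

struct toy_ref {
	toy_confirm_fn confirm_switch;	/* non-NULL while an ATOMIC switch is in flight */
	bool dead;			/* set once the ref has been killed */
};

/* Placeholder callback used when the caller passes no confirm function. */
static void toy_noop_confirm(struct toy_ref *ref)
{
	(void)ref;
}

/* True if a mode switch would first have to wait for a previous one. */
static bool toy_switch_would_wait(const struct toy_ref *ref)
{
	return ref->confirm_switch != NULL;
}

/*
 * Model of killing the ref: start an ATOMIC switch and report whether
 * the caller would have had to wait for an earlier switch to finish.
 */
static bool toy_ref_kill(struct toy_ref *ref, toy_confirm_fn confirm)
{
	bool waited = toy_switch_would_wait(ref);

	assert(!ref->dead);	/* killing twice is a caller bug in this model */
	ref->dead = true;
	ref->confirm_switch = confirm ? confirm : toy_noop_confirm;
	return waited;
}

/* Model of the deferred completion of the ATOMIC switch. */
static void toy_switch_done(struct toy_ref *ref)
{
	if (ref->confirm_switch) {
		ref->confirm_switch(ref);
		ref->confirm_switch = NULL;
	}
}
```

In this model, toy_ref_kill() on a fresh ref returns false (no wait), and
toy_switch_would_wait() reports true only between the kill and the later
toy_switch_done() call. That is the window the comment refers to when it
says the caller must ensure no ATOMIC switching is in progress.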

2018-06-22 08:51:38

by Jan Kara

Subject: Re: [BUG] mm: backing-dev: a possible sleep-in-atomic-context bug in cgwb_create()

On Wed 20-06-18 20:35:15, Matthew Wilcox wrote:
> On Thu, Jun 21, 2018 at 11:02:58AM +0800, Jia-Ju Bai wrote:
> > The kernel may sleep while holding a spinlock.
> > The function call path (from bottom to top) in Linux-4.16.7 is:
> >
> > [FUNC] schedule
> > lib/percpu-refcount.c, 222:
> > schedule in __percpu_ref_switch_mode
> > lib/percpu-refcount.c, 339:
> > __percpu_ref_switch_mode in percpu_ref_kill_and_confirm
> > ./include/linux/percpu-refcount.h, 127:
> > percpu_ref_kill_and_confirm in percpu_ref_kill
> > mm/backing-dev.c, 545:
> > percpu_ref_kill in cgwb_kill
> > mm/backing-dev.c, 576:
> > cgwb_kill in cgwb_create
> > mm/backing-dev.c, 573:
> > _raw_spin_lock_irqsave in cgwb_create
> >
> > This bug was found by my static analysis tool (DSAC-2) and confirmed by my
> > code review.
>
> I disagree with your code review.
>
> * If the previous ATOMIC switching hasn't finished yet, wait for
> * its completion. If the caller ensures that ATOMIC switching
> * isn't in progress, this function can be called from any context.
>
> I believe cgwb_kill is always called under the spinlock, so we will never
> sleep because the percpu ref will never be switching to atomic mode.

You are right that the sleep under spinlock never happens. And the reason
is that percpu_ref_kill() never results in blocking - it does call
percpu_ref_kill_and_confirm() but the 'confirm' argument is NULL and thus
even percpu_ref_kill_and_confirm() never blocks.

Honza
--
Jan Kara
SUSE Labs, CR