2014-06-05 08:39:48

by Gu Zheng

[permalink] [raw]
Subject: [PATCH] mm/mempolicy: fix sleeping function called from invalid context

When running with the kernel(3.15-rc7+), the follow bug occurs:
[ 9969.258987] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:586
[ 9969.359906] in_atomic(): 1, irqs_disabled(): 0, pid: 160655, name: python
[ 9969.441175] INFO: lockdep is turned off.
[ 9969.488184] CPU: 26 PID: 160655 Comm: python Tainted: G A 3.15.0-rc7+ #85
[ 9969.581032] Hardware name: FUJITSU-SV PRIMEQUEST 1800E/SB, BIOS PRIMEQUEST 1000 Series BIOS Version 1.39 11/16/2012
[ 9969.706052] ffffffff81a20e60 ffff8803e941fbd0 ffffffff8162f523 ffff8803e941fd18
[ 9969.795323] ffff8803e941fbe0 ffffffff8109995a ffff8803e941fc58 ffffffff81633e6c
[ 9969.884710] ffffffff811ba5dc ffff880405c6b480 ffff88041fdd90a0 0000000000002000
[ 9969.974071] Call Trace:
[ 9970.003403] [<ffffffff8162f523>] dump_stack+0x4d/0x66
[ 9970.065074] [<ffffffff8109995a>] __might_sleep+0xfa/0x130
[ 9970.130743] [<ffffffff81633e6c>] mutex_lock_nested+0x3c/0x4f0
[ 9970.200638] [<ffffffff811ba5dc>] ? kmem_cache_alloc+0x1bc/0x210
[ 9970.272610] [<ffffffff81105807>] cpuset_mems_allowed+0x27/0x140
[ 9970.344584] [<ffffffff811b1303>] ? __mpol_dup+0x63/0x150
[ 9970.409282] [<ffffffff811b1385>] __mpol_dup+0xe5/0x150
[ 9970.471897] [<ffffffff811b1303>] ? __mpol_dup+0x63/0x150
[ 9970.536585] [<ffffffff81068c86>] ? copy_process.part.23+0x606/0x1d40
[ 9970.613763] [<ffffffff810bf28d>] ? trace_hardirqs_on+0xd/0x10
[ 9970.683660] [<ffffffff810ddddf>] ? monotonic_to_bootbased+0x2f/0x50
[ 9970.759795] [<ffffffff81068cf0>] copy_process.part.23+0x670/0x1d40
[ 9970.834885] [<ffffffff8106a598>] do_fork+0xd8/0x380
[ 9970.894375] [<ffffffff81110e4c>] ? __audit_syscall_entry+0x9c/0xf0
[ 9970.969470] [<ffffffff8106a8c6>] SyS_clone+0x16/0x20
[ 9971.030011] [<ffffffff81642009>] stub_clone+0x69/0x90
[ 9971.091573] [<ffffffff81641c29>] ? system_call_fastpath+0x16/0x1b

The cause is that cpuset_mems_allowed() try to take mutex_lock(&callback_mutex)
under the rcu_read_lock(which was hold in __mpol_dup()). And in cpuset_mems_allowed(),
the access to cpuset is under rcu_read_lock, so in __mpol_dup, we can reduce the
rcu_read_lock protection region to protect the access to cpuset only in
current_cpuset_is_being_rebound(). So that we can avoid this bug.

Signed-off-by: Gu Zheng <[email protected]>
---
kernel/cpuset.c | 8 +++++++-
mm/mempolicy.c | 2 --
2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index 3d54c41..a735402 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -1188,7 +1188,13 @@ done:

int current_cpuset_is_being_rebound(void)
{
- return task_cs(current) == cpuset_being_rebound;
+ int ret;
+
+ rcu_read_lock();
+ ret = task_cs(current) == cpuset_being_rebound;
+ rcu_read_unlock();
+
+ return ret;
}

static int update_relax_domain_level(struct cpuset *cs, s64 val)
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 78e1472..b58a9d5 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -2138,7 +2138,6 @@ struct mempolicy *__mpol_dup(struct mempolicy *old)
} else
*new = *old;

- rcu_read_lock();
if (current_cpuset_is_being_rebound()) {
nodemask_t mems = cpuset_mems_allowed(current);
if (new->flags & MPOL_F_REBINDING)
@@ -2146,7 +2145,6 @@ struct mempolicy *__mpol_dup(struct mempolicy *old)
else
mpol_rebind_policy(new, &mems, MPOL_REBIND_ONCE);
}
- rcu_read_unlock();
atomic_set(&new->refcnt, 1);
return new;
}
--
1.7.7


2014-06-05 14:18:43

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH] mm/mempolicy: fix sleeping function called from invalid context

On Thu, Jun 05, 2014 at 04:28:52PM +0800, Gu Zheng wrote:
> When running with the kernel(3.15-rc7+), the follow bug occurs:
> [ 9969.258987] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:586
> [ 9969.359906] in_atomic(): 1, irqs_disabled(): 0, pid: 160655, name: python
> [ 9969.441175] INFO: lockdep is turned off.
> [ 9969.488184] CPU: 26 PID: 160655 Comm: python Tainted: G A 3.15.0-rc7+ #85
> [ 9969.581032] Hardware name: FUJITSU-SV PRIMEQUEST 1800E/SB, BIOS PRIMEQUEST 1000 Series BIOS Version 1.39 11/16/2012
> [ 9969.706052] ffffffff81a20e60 ffff8803e941fbd0 ffffffff8162f523 ffff8803e941fd18
> [ 9969.795323] ffff8803e941fbe0 ffffffff8109995a ffff8803e941fc58 ffffffff81633e6c
> [ 9969.884710] ffffffff811ba5dc ffff880405c6b480 ffff88041fdd90a0 0000000000002000
> [ 9969.974071] Call Trace:
> [ 9970.003403] [<ffffffff8162f523>] dump_stack+0x4d/0x66
> [ 9970.065074] [<ffffffff8109995a>] __might_sleep+0xfa/0x130
> [ 9970.130743] [<ffffffff81633e6c>] mutex_lock_nested+0x3c/0x4f0
> [ 9970.200638] [<ffffffff811ba5dc>] ? kmem_cache_alloc+0x1bc/0x210
> [ 9970.272610] [<ffffffff81105807>] cpuset_mems_allowed+0x27/0x140
> [ 9970.344584] [<ffffffff811b1303>] ? __mpol_dup+0x63/0x150
> [ 9970.409282] [<ffffffff811b1385>] __mpol_dup+0xe5/0x150
> [ 9970.471897] [<ffffffff811b1303>] ? __mpol_dup+0x63/0x150
> [ 9970.536585] [<ffffffff81068c86>] ? copy_process.part.23+0x606/0x1d40
> [ 9970.613763] [<ffffffff810bf28d>] ? trace_hardirqs_on+0xd/0x10
> [ 9970.683660] [<ffffffff810ddddf>] ? monotonic_to_bootbased+0x2f/0x50
> [ 9970.759795] [<ffffffff81068cf0>] copy_process.part.23+0x670/0x1d40
> [ 9970.834885] [<ffffffff8106a598>] do_fork+0xd8/0x380
> [ 9970.894375] [<ffffffff81110e4c>] ? __audit_syscall_entry+0x9c/0xf0
> [ 9970.969470] [<ffffffff8106a8c6>] SyS_clone+0x16/0x20
> [ 9971.030011] [<ffffffff81642009>] stub_clone+0x69/0x90
> [ 9971.091573] [<ffffffff81641c29>] ? system_call_fastpath+0x16/0x1b
>
> The cause is that cpuset_mems_allowed() try to take mutex_lock(&callback_mutex)
> under the rcu_read_lock(which was hold in __mpol_dup()). And in cpuset_mems_allowed(),
> the access to cpuset is under rcu_read_lock, so in __mpol_dup, we can reduce the
> rcu_read_lock protection region to protect the access to cpuset only in
> current_cpuset_is_being_rebound(). So that we can avoid this bug.
>
> Signed-off-by: Gu Zheng <[email protected]>
> ---
> kernel/cpuset.c | 8 +++++++-
> mm/mempolicy.c | 2 --
> 2 files changed, 7 insertions(+), 3 deletions(-)

<formletter>

This is not the correct way to submit patches for inclusion in the
stable kernel tree. Please read Documentation/stable_kernel_rules.txt
for how to do this properly.

</formletter>

2014-06-05 20:23:43

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] mm/mempolicy: fix sleeping function called from invalid context

On Thu, 5 Jun 2014 16:28:52 +0800 Gu Zheng <[email protected]> wrote:

> When running with the kernel(3.15-rc7+), the follow bug occurs:
> [ 9969.258987] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:586
> [ 9969.359906] in_atomic(): 1, irqs_disabled(): 0, pid: 160655, name: python
> [ 9969.441175] INFO: lockdep is turned off.
> [ 9969.488184] CPU: 26 PID: 160655 Comm: python Tainted: G A 3.15.0-rc7+ #85
> [ 9969.581032] Hardware name: FUJITSU-SV PRIMEQUEST 1800E/SB, BIOS PRIMEQUEST 1000 Series BIOS Version 1.39 11/16/2012
> [ 9969.706052] ffffffff81a20e60 ffff8803e941fbd0 ffffffff8162f523 ffff8803e941fd18
> [ 9969.795323] ffff8803e941fbe0 ffffffff8109995a ffff8803e941fc58 ffffffff81633e6c
> [ 9969.884710] ffffffff811ba5dc ffff880405c6b480 ffff88041fdd90a0 0000000000002000
> [ 9969.974071] Call Trace:
> [ 9970.003403] [<ffffffff8162f523>] dump_stack+0x4d/0x66
> [ 9970.065074] [<ffffffff8109995a>] __might_sleep+0xfa/0x130
> [ 9970.130743] [<ffffffff81633e6c>] mutex_lock_nested+0x3c/0x4f0
> [ 9970.200638] [<ffffffff811ba5dc>] ? kmem_cache_alloc+0x1bc/0x210
> [ 9970.272610] [<ffffffff81105807>] cpuset_mems_allowed+0x27/0x140
> [ 9970.344584] [<ffffffff811b1303>] ? __mpol_dup+0x63/0x150
> [ 9970.409282] [<ffffffff811b1385>] __mpol_dup+0xe5/0x150
> [ 9970.471897] [<ffffffff811b1303>] ? __mpol_dup+0x63/0x150
> [ 9970.536585] [<ffffffff81068c86>] ? copy_process.part.23+0x606/0x1d40
> [ 9970.613763] [<ffffffff810bf28d>] ? trace_hardirqs_on+0xd/0x10
> [ 9970.683660] [<ffffffff810ddddf>] ? monotonic_to_bootbased+0x2f/0x50
> [ 9970.759795] [<ffffffff81068cf0>] copy_process.part.23+0x670/0x1d40
> [ 9970.834885] [<ffffffff8106a598>] do_fork+0xd8/0x380
> [ 9970.894375] [<ffffffff81110e4c>] ? __audit_syscall_entry+0x9c/0xf0
> [ 9970.969470] [<ffffffff8106a8c6>] SyS_clone+0x16/0x20
> [ 9971.030011] [<ffffffff81642009>] stub_clone+0x69/0x90
> [ 9971.091573] [<ffffffff81641c29>] ? system_call_fastpath+0x16/0x1b
>
> The cause is that cpuset_mems_allowed() try to take mutex_lock(&callback_mutex)
> under the rcu_read_lock(which was hold in __mpol_dup()). And in cpuset_mems_allowed(),
> the access to cpuset is under rcu_read_lock, so in __mpol_dup, we can reduce the
> rcu_read_lock protection region to protect the access to cpuset only in
> current_cpuset_is_being_rebound(). So that we can avoid this bug.
>
> ...
>
> --- a/kernel/cpuset.c
> +++ b/kernel/cpuset.c
> @@ -1188,7 +1188,13 @@ done:
>
> int current_cpuset_is_being_rebound(void)
> {
> - return task_cs(current) == cpuset_being_rebound;
> + int ret;
> +
> + rcu_read_lock();
> + ret = task_cs(current) == cpuset_being_rebound;
> + rcu_read_unlock();
> +
> + return ret;
> }

Looks fishy to me. If the rcu_read_lock() stabilizes
cpuset_being_rebound then cpuset_being_rebound can change immediately
after rcu_read_unlock() and `ret' is now wrong.

Anyway. Tejun, this one is yours please ;)

2014-06-06 09:45:17

by Gu Zheng

[permalink] [raw]
Subject: Re: [PATCH] mm/mempolicy: fix sleeping function called from invalid context

Hi Greg,
On 06/05/2014 10:18 PM, Greg KH wrote:

> On Thu, Jun 05, 2014 at 04:28:52PM +0800, Gu Zheng wrote:
>> When running with the kernel(3.15-rc7+), the follow bug occurs:
>> [ 9969.258987] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:586
>> [ 9969.359906] in_atomic(): 1, irqs_disabled(): 0, pid: 160655, name: python
>> [ 9969.441175] INFO: lockdep is turned off.
>> [ 9969.488184] CPU: 26 PID: 160655 Comm: python Tainted: G A 3.15.0-rc7+ #85
>> [ 9969.581032] Hardware name: FUJITSU-SV PRIMEQUEST 1800E/SB, BIOS PRIMEQUEST 1000 Series BIOS Version 1.39 11/16/2012
>> [ 9969.706052] ffffffff81a20e60 ffff8803e941fbd0 ffffffff8162f523 ffff8803e941fd18
>> [ 9969.795323] ffff8803e941fbe0 ffffffff8109995a ffff8803e941fc58 ffffffff81633e6c
>> [ 9969.884710] ffffffff811ba5dc ffff880405c6b480 ffff88041fdd90a0 0000000000002000
>> [ 9969.974071] Call Trace:
>> [ 9970.003403] [<ffffffff8162f523>] dump_stack+0x4d/0x66
>> [ 9970.065074] [<ffffffff8109995a>] __might_sleep+0xfa/0x130
>> [ 9970.130743] [<ffffffff81633e6c>] mutex_lock_nested+0x3c/0x4f0
>> [ 9970.200638] [<ffffffff811ba5dc>] ? kmem_cache_alloc+0x1bc/0x210
>> [ 9970.272610] [<ffffffff81105807>] cpuset_mems_allowed+0x27/0x140
>> [ 9970.344584] [<ffffffff811b1303>] ? __mpol_dup+0x63/0x150
>> [ 9970.409282] [<ffffffff811b1385>] __mpol_dup+0xe5/0x150
>> [ 9970.471897] [<ffffffff811b1303>] ? __mpol_dup+0x63/0x150
>> [ 9970.536585] [<ffffffff81068c86>] ? copy_process.part.23+0x606/0x1d40
>> [ 9970.613763] [<ffffffff810bf28d>] ? trace_hardirqs_on+0xd/0x10
>> [ 9970.683660] [<ffffffff810ddddf>] ? monotonic_to_bootbased+0x2f/0x50
>> [ 9970.759795] [<ffffffff81068cf0>] copy_process.part.23+0x670/0x1d40
>> [ 9970.834885] [<ffffffff8106a598>] do_fork+0xd8/0x380
>> [ 9970.894375] [<ffffffff81110e4c>] ? __audit_syscall_entry+0x9c/0xf0
>> [ 9970.969470] [<ffffffff8106a8c6>] SyS_clone+0x16/0x20
>> [ 9971.030011] [<ffffffff81642009>] stub_clone+0x69/0x90
>> [ 9971.091573] [<ffffffff81641c29>] ? system_call_fastpath+0x16/0x1b
>>
>> The cause is that cpuset_mems_allowed() try to take mutex_lock(&callback_mutex)
>> under the rcu_read_lock(which was hold in __mpol_dup()). And in cpuset_mems_allowed(),
>> the access to cpuset is under rcu_read_lock, so in __mpol_dup, we can reduce the
>> rcu_read_lock protection region to protect the access to cpuset only in
>> current_cpuset_is_being_rebound(). So that we can avoid this bug.
>>
>> Signed-off-by: Gu Zheng <[email protected]>
>> ---
>> kernel/cpuset.c | 8 +++++++-
>> mm/mempolicy.c | 2 --
>> 2 files changed, 7 insertions(+), 3 deletions(-)
>
> <formletter>
>
> This is not the correct way to submit patches for inclusion in the
> stable kernel tree. Please read Documentation/stable_kernel_rules.txt
> for how to do this properly.

>
> </formletter>

Thanks for your reminder.
I'll follow the right rules.

Regards,
Gu

>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to [email protected]. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"[email protected]"> [email protected] </a>
> .
>

2014-06-06 10:18:33

by Gu Zheng

[permalink] [raw]
Subject: Re: [PATCH] mm/mempolicy: fix sleeping function called from invalid context

(cc'ing Li)

Hi Andrew,
Thanks for your comment.
On 06/06/2014 04:23 AM, Andrew Morton wrote:

> On Thu, 5 Jun 2014 16:28:52 +0800 Gu Zheng <[email protected]> wrote:
>
>> When running with the kernel(3.15-rc7+), the follow bug occurs:
>> [ 9969.258987] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:586
>> [ 9969.359906] in_atomic(): 1, irqs_disabled(): 0, pid: 160655, name: python
>> [ 9969.441175] INFO: lockdep is turned off.
>> [ 9969.488184] CPU: 26 PID: 160655 Comm: python Tainted: G A 3.15.0-rc7+ #85
>> [ 9969.581032] Hardware name: FUJITSU-SV PRIMEQUEST 1800E/SB, BIOS PRIMEQUEST 1000 Series BIOS Version 1.39 11/16/2012
>> [ 9969.706052] ffffffff81a20e60 ffff8803e941fbd0 ffffffff8162f523 ffff8803e941fd18
>> [ 9969.795323] ffff8803e941fbe0 ffffffff8109995a ffff8803e941fc58 ffffffff81633e6c
>> [ 9969.884710] ffffffff811ba5dc ffff880405c6b480 ffff88041fdd90a0 0000000000002000
>> [ 9969.974071] Call Trace:
>> [ 9970.003403] [<ffffffff8162f523>] dump_stack+0x4d/0x66
>> [ 9970.065074] [<ffffffff8109995a>] __might_sleep+0xfa/0x130
>> [ 9970.130743] [<ffffffff81633e6c>] mutex_lock_nested+0x3c/0x4f0
>> [ 9970.200638] [<ffffffff811ba5dc>] ? kmem_cache_alloc+0x1bc/0x210
>> [ 9970.272610] [<ffffffff81105807>] cpuset_mems_allowed+0x27/0x140
>> [ 9970.344584] [<ffffffff811b1303>] ? __mpol_dup+0x63/0x150
>> [ 9970.409282] [<ffffffff811b1385>] __mpol_dup+0xe5/0x150
>> [ 9970.471897] [<ffffffff811b1303>] ? __mpol_dup+0x63/0x150
>> [ 9970.536585] [<ffffffff81068c86>] ? copy_process.part.23+0x606/0x1d40
>> [ 9970.613763] [<ffffffff810bf28d>] ? trace_hardirqs_on+0xd/0x10
>> [ 9970.683660] [<ffffffff810ddddf>] ? monotonic_to_bootbased+0x2f/0x50
>> [ 9970.759795] [<ffffffff81068cf0>] copy_process.part.23+0x670/0x1d40
>> [ 9970.834885] [<ffffffff8106a598>] do_fork+0xd8/0x380
>> [ 9970.894375] [<ffffffff81110e4c>] ? __audit_syscall_entry+0x9c/0xf0
>> [ 9970.969470] [<ffffffff8106a8c6>] SyS_clone+0x16/0x20
>> [ 9971.030011] [<ffffffff81642009>] stub_clone+0x69/0x90
>> [ 9971.091573] [<ffffffff81641c29>] ? system_call_fastpath+0x16/0x1b
>>
>> The cause is that cpuset_mems_allowed() try to take mutex_lock(&callback_mutex)
>> under the rcu_read_lock(which was hold in __mpol_dup()). And in cpuset_mems_allowed(),
>> the access to cpuset is under rcu_read_lock, so in __mpol_dup, we can reduce the
>> rcu_read_lock protection region to protect the access to cpuset only in
>> current_cpuset_is_being_rebound(). So that we can avoid this bug.
>>
>> ...
>>
>> --- a/kernel/cpuset.c
>> +++ b/kernel/cpuset.c
>> @@ -1188,7 +1188,13 @@ done:
>>
>> int current_cpuset_is_being_rebound(void)
>> {
>> - return task_cs(current) == cpuset_being_rebound;
>> + int ret;
>> +
>> + rcu_read_lock();
>> + ret = task_cs(current) == cpuset_being_rebound;
>> + rcu_read_unlock();
>> +
>> + return ret;
>> }
>
> Looks fishy to me. If the rcu_read_lock() stabilizes
> cpuset_being_rebound then cpuset_being_rebound can change immediately
> after rcu_read_unlock() and `ret' is now wrong.

IMO, whether cpuset_being_rebound changed or not is immaterial here, we
just want to know whether the cpuset is being rebound at that point.

>
> Anyway. Tejun, this one is yours please ;)

To Tejun, Li:
Any comment? And if I misread something, please correct me?

Thanks,
Gu

> .
>

2014-06-08 22:47:16

by David Rientjes

[permalink] [raw]
Subject: Re: [PATCH] mm/mempolicy: fix sleeping function called from invalid context

On Fri, 6 Jun 2014, Gu Zheng wrote:

> >> When running with the kernel(3.15-rc7+), the follow bug occurs:
> >> [ 9969.258987] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:586
> >> [ 9969.359906] in_atomic(): 1, irqs_disabled(): 0, pid: 160655, name: python
> >> [ 9969.441175] INFO: lockdep is turned off.
> >> [ 9969.488184] CPU: 26 PID: 160655 Comm: python Tainted: G A 3.15.0-rc7+ #85
> >> [ 9969.581032] Hardware name: FUJITSU-SV PRIMEQUEST 1800E/SB, BIOS PRIMEQUEST 1000 Series BIOS Version 1.39 11/16/2012
> >> [ 9969.706052] ffffffff81a20e60 ffff8803e941fbd0 ffffffff8162f523 ffff8803e941fd18
> >> [ 9969.795323] ffff8803e941fbe0 ffffffff8109995a ffff8803e941fc58 ffffffff81633e6c
> >> [ 9969.884710] ffffffff811ba5dc ffff880405c6b480 ffff88041fdd90a0 0000000000002000
> >> [ 9969.974071] Call Trace:
> >> [ 9970.003403] [<ffffffff8162f523>] dump_stack+0x4d/0x66
> >> [ 9970.065074] [<ffffffff8109995a>] __might_sleep+0xfa/0x130
> >> [ 9970.130743] [<ffffffff81633e6c>] mutex_lock_nested+0x3c/0x4f0
> >> [ 9970.200638] [<ffffffff811ba5dc>] ? kmem_cache_alloc+0x1bc/0x210
> >> [ 9970.272610] [<ffffffff81105807>] cpuset_mems_allowed+0x27/0x140
> >> [ 9970.344584] [<ffffffff811b1303>] ? __mpol_dup+0x63/0x150
> >> [ 9970.409282] [<ffffffff811b1385>] __mpol_dup+0xe5/0x150
> >> [ 9970.471897] [<ffffffff811b1303>] ? __mpol_dup+0x63/0x150
> >> [ 9970.536585] [<ffffffff81068c86>] ? copy_process.part.23+0x606/0x1d40
> >> [ 9970.613763] [<ffffffff810bf28d>] ? trace_hardirqs_on+0xd/0x10
> >> [ 9970.683660] [<ffffffff810ddddf>] ? monotonic_to_bootbased+0x2f/0x50
> >> [ 9970.759795] [<ffffffff81068cf0>] copy_process.part.23+0x670/0x1d40
> >> [ 9970.834885] [<ffffffff8106a598>] do_fork+0xd8/0x380
> >> [ 9970.894375] [<ffffffff81110e4c>] ? __audit_syscall_entry+0x9c/0xf0
> >> [ 9970.969470] [<ffffffff8106a8c6>] SyS_clone+0x16/0x20
> >> [ 9971.030011] [<ffffffff81642009>] stub_clone+0x69/0x90
> >> [ 9971.091573] [<ffffffff81641c29>] ? system_call_fastpath+0x16/0x1b
> >>
> >> The cause is that cpuset_mems_allowed() try to take mutex_lock(&callback_mutex)
> >> under the rcu_read_lock(which was hold in __mpol_dup()). And in cpuset_mems_allowed(),
> >> the access to cpuset is under rcu_read_lock, so in __mpol_dup, we can reduce the
> >> rcu_read_lock protection region to protect the access to cpuset only in
> >> current_cpuset_is_being_rebound(). So that we can avoid this bug.
> >>
> >> ...
> >>
> >> --- a/kernel/cpuset.c
> >> +++ b/kernel/cpuset.c
> >> @@ -1188,7 +1188,13 @@ done:
> >>
> >> int current_cpuset_is_being_rebound(void)
> >> {
> >> - return task_cs(current) == cpuset_being_rebound;
> >> + int ret;
> >> +
> >> + rcu_read_lock();
> >> + ret = task_cs(current) == cpuset_being_rebound;
> >> + rcu_read_unlock();
> >> +
> >> + return ret;
> >> }
> >
> > Looks fishy to me. If the rcu_read_lock() stabilizes
> > cpuset_being_rebound then cpuset_being_rebound can change immediately
> > after rcu_read_unlock() and `ret' is now wrong.
>
> IMO, whether cpuset_being_rebound changed or not is immaterial here, we
> just want to know whether the cpuset is being rebound at that point.
>

I think your patch addresses the problem that you're reporting but misses
the larger problem with cpuset.mems rebinding on fork(). When the
forker's task_struct is duplicated (which includes ->mems_allowed) and it
races with an update to cpuset_being_rebound in update_tasks_nodemask()
then the task's mems_allowed doesn't get updated.

2014-06-09 08:59:52

by Gu Zheng

[permalink] [raw]
Subject: Re: [PATCH] mm/mempolicy: fix sleeping function called from invalid context

Hi David,

On 06/09/2014 06:47 AM, David Rientjes wrote:

> On Fri, 6 Jun 2014, Gu Zheng wrote:
>
>>>> When running with the kernel(3.15-rc7+), the follow bug occurs:
>>>> [ 9969.258987] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:586
>>>> [ 9969.359906] in_atomic(): 1, irqs_disabled(): 0, pid: 160655, name: python
>>>> [ 9969.441175] INFO: lockdep is turned off.
>>>> [ 9969.488184] CPU: 26 PID: 160655 Comm: python Tainted: G A 3.15.0-rc7+ #85
>>>> [ 9969.581032] Hardware name: FUJITSU-SV PRIMEQUEST 1800E/SB, BIOS PRIMEQUEST 1000 Series BIOS Version 1.39 11/16/2012
>>>> [ 9969.706052] ffffffff81a20e60 ffff8803e941fbd0 ffffffff8162f523 ffff8803e941fd18
>>>> [ 9969.795323] ffff8803e941fbe0 ffffffff8109995a ffff8803e941fc58 ffffffff81633e6c
>>>> [ 9969.884710] ffffffff811ba5dc ffff880405c6b480 ffff88041fdd90a0 0000000000002000
>>>> [ 9969.974071] Call Trace:
>>>> [ 9970.003403] [<ffffffff8162f523>] dump_stack+0x4d/0x66
>>>> [ 9970.065074] [<ffffffff8109995a>] __might_sleep+0xfa/0x130
>>>> [ 9970.130743] [<ffffffff81633e6c>] mutex_lock_nested+0x3c/0x4f0
>>>> [ 9970.200638] [<ffffffff811ba5dc>] ? kmem_cache_alloc+0x1bc/0x210
>>>> [ 9970.272610] [<ffffffff81105807>] cpuset_mems_allowed+0x27/0x140
>>>> [ 9970.344584] [<ffffffff811b1303>] ? __mpol_dup+0x63/0x150
>>>> [ 9970.409282] [<ffffffff811b1385>] __mpol_dup+0xe5/0x150
>>>> [ 9970.471897] [<ffffffff811b1303>] ? __mpol_dup+0x63/0x150
>>>> [ 9970.536585] [<ffffffff81068c86>] ? copy_process.part.23+0x606/0x1d40
>>>> [ 9970.613763] [<ffffffff810bf28d>] ? trace_hardirqs_on+0xd/0x10
>>>> [ 9970.683660] [<ffffffff810ddddf>] ? monotonic_to_bootbased+0x2f/0x50
>>>> [ 9970.759795] [<ffffffff81068cf0>] copy_process.part.23+0x670/0x1d40
>>>> [ 9970.834885] [<ffffffff8106a598>] do_fork+0xd8/0x380
>>>> [ 9970.894375] [<ffffffff81110e4c>] ? __audit_syscall_entry+0x9c/0xf0
>>>> [ 9970.969470] [<ffffffff8106a8c6>] SyS_clone+0x16/0x20
>>>> [ 9971.030011] [<ffffffff81642009>] stub_clone+0x69/0x90
>>>> [ 9971.091573] [<ffffffff81641c29>] ? system_call_fastpath+0x16/0x1b
>>>>
>>>> The cause is that cpuset_mems_allowed() try to take mutex_lock(&callback_mutex)
>>>> under the rcu_read_lock(which was hold in __mpol_dup()). And in cpuset_mems_allowed(),
>>>> the access to cpuset is under rcu_read_lock, so in __mpol_dup, we can reduce the
>>>> rcu_read_lock protection region to protect the access to cpuset only in
>>>> current_cpuset_is_being_rebound(). So that we can avoid this bug.
>>>>
>>>> ...
>>>>
>>>> --- a/kernel/cpuset.c
>>>> +++ b/kernel/cpuset.c
>>>> @@ -1188,7 +1188,13 @@ done:
>>>>
>>>> int current_cpuset_is_being_rebound(void)
>>>> {
>>>> - return task_cs(current) == cpuset_being_rebound;
>>>> + int ret;
>>>> +
>>>> + rcu_read_lock();
>>>> + ret = task_cs(current) == cpuset_being_rebound;
>>>> + rcu_read_unlock();
>>>> +
>>>> + return ret;
>>>> }
>>>
>>> Looks fishy to me. If the rcu_read_lock() stabilizes
>>> cpuset_being_rebound then cpuset_being_rebound can change immediately
>>> after rcu_read_unlock() and `ret' is now wrong.
>>
>> IMO, whether cpuset_being_rebound changed or not is immaterial here, we
>> just want to know whether the cpuset is being rebound at that point.
>>
>
> I think your patch addresses the problem that you're reporting but misses
> the larger problem with cpuset.mems rebinding on fork(). When the
> forker's task_struct is duplicated (which includes ->mems_allowed) and it
> races with an update to cpuset_being_rebound in update_tasks_nodemask()
> then the task's mems_allowed doesn't get updated.

Yes, you are right, this patch just wants to address the bug reported above.
The race condition you mentioned above inherently exists there, but it is yet
another issue, the rcu lock here makes no sense to it, and I think we need
additional sync-mechanisms if want to fix it.
But thinking more, though the current implementation has flaw, but I worry
about the negative effect if we really want to fix it. Or maybe the fear
is unnecessary.:)

Thanks,
Gu

> .
>

2014-06-09 09:14:07

by David Rientjes

[permalink] [raw]
Subject: Re: [PATCH] mm/mempolicy: fix sleeping function called from invalid context

On Mon, 9 Jun 2014, Gu Zheng wrote:

> > I think your patch addresses the problem that you're reporting but misses
> > the larger problem with cpuset.mems rebinding on fork(). When the
> > forker's task_struct is duplicated (which includes ->mems_allowed) and it
> > races with an update to cpuset_being_rebound in update_tasks_nodemask()
> > then the task's mems_allowed doesn't get updated.
>
> Yes, you are right, this patch just wants to address the bug reported above.
> The race condition you mentioned above inherently exists there, but it is yet
> another issue, the rcu lock here makes no sense to it, and I think we need
> additional sync-mechanisms if want to fix it.

Yes, the rcu lock is not providing protection for any critical section
here that requires (1) the forker's cpuset to be stored in
cpuset_being_rebound or (2) the forked thread's cpuset to be rebound by
the cpuset nodemask update, and no race involving the two.

> But thinking more, though the current implementation has flaw, but I worry
> about the negative effect if we really want to fix it. Or maybe the fear
> is unnecessary.:)
>

It needs to be slightly rewritten to work properly without negatively
impacting the latency of fork(). Do you have the cycles to do it?

2014-06-09 10:09:08

by Gu Zheng

[permalink] [raw]
Subject: Re: [PATCH] mm/mempolicy: fix sleeping function called from invalid context

Hi David,

On 06/09/2014 05:13 PM, David Rientjes wrote:

> On Mon, 9 Jun 2014, Gu Zheng wrote:
>
>>> I think your patch addresses the problem that you're reporting but misses
>>> the larger problem with cpuset.mems rebinding on fork(). When the
>>> forker's task_struct is duplicated (which includes ->mems_allowed) and it
>>> races with an update to cpuset_being_rebound in update_tasks_nodemask()
>>> then the task's mems_allowed doesn't get updated.
>>
>> Yes, you are right, this patch just wants to address the bug reported above.
>> The race condition you mentioned above inherently exists there, but it is yet
>> another issue, the rcu lock here makes no sense to it, and I think we need
>> additional sync-mechanisms if want to fix it.
>
> Yes, the rcu lock is not providing protection for any critical section
> here that requires (1) the forker's cpuset to be stored in
> cpuset_being_rebound or (2) the forked thread's cpuset to be rebound by
> the cpuset nodemask update, and no race involving the two.
>
>> But thinking more, though the current implementation has flaw, but I worry
>> about the negative effect if we really want to fix it. Or maybe the fear
>> is unnecessary.:)
>>
>
> It needs to be slightly rewritten to work properly without negatively
> impacting the latency of fork(). Do you have the cycles to do it?
>

To be honest, I'm busy with other schedule. And if you(or other
guys) have proper proposal, please go ahead.

To Tejun, Li and Andrew:
Any comment? Or could you apply this *bug fix* first?

Regards,
Gu

2014-06-10 03:00:54

by Zefan Li

[permalink] [raw]
Subject: Re: [PATCH] mm/mempolicy: fix sleeping function called from invalid context

On 2014/6/9 17:13, David Rientjes wrote:
> On Mon, 9 Jun 2014, Gu Zheng wrote:
>
>>> I think your patch addresses the problem that you're reporting but misses
>>> the larger problem with cpuset.mems rebinding on fork(). When the
>>> forker's task_struct is duplicated (which includes ->mems_allowed) and it
>>> races with an update to cpuset_being_rebound in update_tasks_nodemask()
>>> then the task's mems_allowed doesn't get updated.
>>
>> Yes, you are right, this patch just wants to address the bug reported above.
>> The race condition you mentioned above inherently exists there, but it is yet
>> another issue, the rcu lock here makes no sense to it, and I think we need
>> additional sync-mechanisms if want to fix it.
>
> Yes, the rcu lock is not providing protection for any critical section
> here that requires (1) the forker's cpuset to be stored in
> cpuset_being_rebound or (2) the forked thread's cpuset to be rebound by
> the cpuset nodemask update, and no race involving the two.
>

Yes, this is a long-standing issue. Besides the race you described, the child
task's mems_allowed can be wrong if the cpuset's nodemask changes before the
child has been added to the cgroup's tasklist.

I remember Tejun once said he wanted to disallow task migration between
cgroups during fork, and that should fix this problem.

>> But thinking more, though the current implementation has flaw, but I worry
>> about the negative effect if we really want to fix it. Or maybe the fear
>> is unnecessary.:)
>>
>
> It needs to be slightly rewritten to work properly without negatively
> impacting the latency of fork(). Do you have the cycles to do it?
>

Sounds you have other idea?

2014-06-10 22:16:15

by David Rientjes

[permalink] [raw]
Subject: Re: [PATCH] mm/mempolicy: fix sleeping function called from invalid context

On Tue, 10 Jun 2014, Li Zefan wrote:

> > Yes, the rcu lock is not providing protection for any critical section
> > here that requires (1) the forker's cpuset to be stored in
> > cpuset_being_rebound or (2) the forked thread's cpuset to be rebound by
> > the cpuset nodemask update, and no race involving the two.
> >
>
> Yes, this is a long-standing issue. Besides the race you described, the child
> task's mems_allowed can be wrong if the cpuset's nodemask changes before the
> child has been added to the cgroup's tasklist.
>
> I remember Tejun once said he wanted to disallow task migration between
> cgroups during fork, and that should fix this problem.
>

Ok, I don't want to fix it in cpusets if cgroups will eventually prevent
it, so I need an understanding of the long term plan. Will cgroups
continue to allow migration during fork(), Tejun?

> > It needs to be slightly rewritten to work properly without negatively
> > impacting the latency of fork(). Do you have the cycles to do it?
> >
>
> Sounds you have other idea?
>

It wouldn't be too difficult with a cgroup post fork callback into the
cpuset code to rebind the nodemask if it has changed, but with my above
concern those might be yanked out eventually :)

2014-06-20 21:01:43

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH] mm/mempolicy: fix sleeping function called from invalid context

Hello, Li.

Sorry about the long delay.

On Tue, Jun 10, 2014 at 10:58:45AM +0800, Li Zefan wrote:
> Yes, this is a long-standing issue. Besides the race you described, the child
> task's mems_allowed can be wrong if the cpuset's nodemask changes before the
> child has been added to the cgroup's tasklist.
>
> I remember Tejun once said he wanted to disallow task migration between
> cgroups during fork, and that should fix this problem.

I'm having trouble remembering but yeah enforcing stricter behavior
across fork could be beneficial. Hmmm... the problem with making
forks exclusive against migrations is that we'll end up adding more
locking to the fork path which isn't too nice.

Hmmm... other controllers (cgroup_freezer) can reliably synchronize
the child's state to the cgroup it belongs to. Why can't cpuset? Is
there something fundamentally missing in the cgroup API?

> > It needs to be slightly rewritten to work properly without negatively
> > impacting the latency of fork(). Do you have the cycles to do it?
> >
>
> Sounds you have other idea?

I don't think the suggested patch breaks anything more than it was
broken before and we should probably apply it for the time being. Li?

Thanks.

--
tejun

2014-06-24 02:28:34

by Zefan Li

[permalink] [raw]
Subject: Re: [PATCH] mm/mempolicy: fix sleeping function called from invalid context

On 2014/6/21 5:01, Tejun Heo wrote:
> Hello, Li.
>
> Sorry about the long delay.
>
> On Tue, Jun 10, 2014 at 10:58:45AM +0800, Li Zefan wrote:
>> Yes, this is a long-standing issue. Besides the race you described, the child
>> task's mems_allowed can be wrong if the cpuset's nodemask changes before the
>> child has been added to the cgroup's tasklist.
>>
>> I remember Tejun once said he wanted to disallow task migration between
>> cgroups during fork, and that should fix this problem.
>
> I'm having trouble remembering but yeah enforcing stricter behavior
> across fork could be beneficial. Hmmm... the problem with making
> forks exclusive against migrations is that we'll end up adding more
> locking to the fork path which isn't too nice.
>
> Hmmm... other controllers (cgroup_freezer) can reliably synchronize
> the child's state to the cgroup it belongs to. Why can't cpuset? Is
> there something fundamentally missing in the cgroup API?
>

cgroup_freezer uses the fork callback. We can also do this for cpuset as
suggested by David, which adds a little bit overhead to the fork path.

David, care to send out a patch?

>>> It needs to be slightly rewritten to work properly without negatively
>>> impacting the latency of fork(). Do you have the cycles to do it?
>>>
>>
>> Sounds you have other idea?
>
> I don't think the suggested patch breaks anything more than it was
> broken before and we should probably apply it for the time being. Li?
>

Yeah, we should apply Gu Zheng's patch any way.

2014-06-24 20:58:38

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH] mm/mempolicy: fix sleeping function called from invalid context

Hello,

On Tue, Jun 24, 2014 at 10:28:12AM +0800, Li Zefan wrote:
> > I don't think the suggested patch breaks anything more than it was
> > broken before and we should probably apply it for the time being. Li?
>
> Yeah, we should apply Gu Zheng's patch any way.

Gu Zheng, can you please respin the patch with updated explanation on
the temporary nature of the change. I'll apply it once Li acks it.

Thanks.

--
tejun

2014-06-25 01:08:32

by Gu Zheng

[permalink] [raw]
Subject: Re: [PATCH] mm/mempolicy: fix sleeping function called from invalid context

Hi Tejun,
On 06/25/2014 04:58 AM, Tejun Heo wrote:

> Hello,
>
> On Tue, Jun 24, 2014 at 10:28:12AM +0800, Li Zefan wrote:
>>> I don't think the suggested patch breaks anything more than it was
>>> broken before and we should probably apply it for the time being. Li?
>>
>> Yeah, we should apply Gu Zheng's patch any way.
>
> Gu Zheng, can you please respin the patch with updated explanation on
> the temporary nature of the change. I'll apply it once Li acks it.

OK, I'll resend it soon.

Regards,
Gu

>
> Thanks.
>