Message-ID: <539574F1.2060701@cn.fujitsu.com>
Date: Mon, 9 Jun 2014 16:48:49 +0800
From: Gu Zheng
To: David Rientjes
CC: Andrew Morton, linux-kernel, Tejun Heo, Cgroups, Li Zefan
Subject: Re: [PATCH] mm/mempolicy: fix sleeping function called from invalid context
References: <53902A44.50005@cn.fujitsu.com>
 <20140605132339.ddf6df4a0cf5c14d17eb8691@linux-foundation.org>
 <539192F1.7050308@cn.fujitsu.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi David,

On 06/09/2014 06:47 AM, David Rientjes wrote:
> On Fri, 6 Jun 2014, Gu Zheng wrote:
>
>>>> When running with the kernel(3.15-rc7+), the follow bug occurs:
>>>> [ 9969.258987] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:586
>>>> [ 9969.359906] in_atomic(): 1, irqs_disabled(): 0, pid: 160655, name: python
>>>> [ 9969.441175] INFO: lockdep is turned off.
>>>> [ 9969.488184] CPU: 26 PID: 160655 Comm: python Tainted: G A 3.15.0-rc7+ #85
>>>> [ 9969.581032] Hardware name: FUJITSU-SV PRIMEQUEST 1800E/SB, BIOS PRIMEQUEST 1000 Series BIOS Version 1.39 11/16/2012
>>>> [ 9969.706052] ffffffff81a20e60 ffff8803e941fbd0 ffffffff8162f523 ffff8803e941fd18
>>>> [ 9969.795323] ffff8803e941fbe0 ffffffff8109995a ffff8803e941fc58 ffffffff81633e6c
>>>> [ 9969.884710] ffffffff811ba5dc ffff880405c6b480 ffff88041fdd90a0 0000000000002000
>>>> [ 9969.974071] Call Trace:
>>>> [ 9970.003403] [] dump_stack+0x4d/0x66
>>>> [ 9970.065074] [] __might_sleep+0xfa/0x130
>>>> [ 9970.130743] [] mutex_lock_nested+0x3c/0x4f0
>>>> [ 9970.200638] [] ? kmem_cache_alloc+0x1bc/0x210
>>>> [ 9970.272610] [] cpuset_mems_allowed+0x27/0x140
>>>> [ 9970.344584] [] ? __mpol_dup+0x63/0x150
>>>> [ 9970.409282] [] __mpol_dup+0xe5/0x150
>>>> [ 9970.471897] [] ? __mpol_dup+0x63/0x150
>>>> [ 9970.536585] [] ? copy_process.part.23+0x606/0x1d40
>>>> [ 9970.613763] [] ? trace_hardirqs_on+0xd/0x10
>>>> [ 9970.683660] [] ? monotonic_to_bootbased+0x2f/0x50
>>>> [ 9970.759795] [] copy_process.part.23+0x670/0x1d40
>>>> [ 9970.834885] [] do_fork+0xd8/0x380
>>>> [ 9970.894375] [] ? __audit_syscall_entry+0x9c/0xf0
>>>> [ 9970.969470] [] SyS_clone+0x16/0x20
>>>> [ 9971.030011] [] stub_clone+0x69/0x90
>>>> [ 9971.091573] [] ? system_call_fastpath+0x16/0x1b
>>>>
>>>> The cause is that cpuset_mems_allowed() try to take mutex_lock(&callback_mutex)
>>>> under the rcu_read_lock(which was hold in __mpol_dup()). And in cpuset_mems_allowed(),
>>>> the access to cpuset is under rcu_read_lock, so in __mpol_dup, we can reduce the
>>>> rcu_read_lock protection region to protect the access to cpuset only in
>>>> current_cpuset_is_being_rebound(). So that we can avoid this bug.
>>>>
>>>> ...
>>>>
>>>> --- a/kernel/cpuset.c
>>>> +++ b/kernel/cpuset.c
>>>> @@ -1188,7 +1188,13 @@ done:
>>>>
>>>>  int current_cpuset_is_being_rebound(void)
>>>>  {
>>>> -	return task_cs(current) == cpuset_being_rebound;
>>>> +	int ret;
>>>> +
>>>> +	rcu_read_lock();
>>>> +	ret = task_cs(current) == cpuset_being_rebound;
>>>> +	rcu_read_unlock();
>>>> +
>>>> +	return ret;
>>>>  }
>>>
>>> Looks fishy to me. If the rcu_read_lock() stabilizes
>>> cpuset_being_rebound then cpuset_being_rebound can change immediately
>>> after rcu_read_unlock() and `ret' is now wrong.
>>
>> IMO, whether cpuset_being_rebound changed or not is immaterial here, we
>> just want to know whether the cpuset is being rebound at that point.
>>
>
> I think your patch addresses the problem that you're reporting but misses
> the larger problem with cpuset.mems rebinding on fork(). When the
> forker's task_struct is duplicated (which includes ->mems_allowed) and it
> races with an update to cpuset_being_rebound in update_tasks_nodemask()
> then the task's mems_allowed doesn't get updated.

Yes, you are right; this patch only aims to fix the bug reported above.
The race you describe does exist, but it is a separate issue: the RCU
read lock here does nothing to prevent it, and I think we would need an
additional synchronization mechanism to fix it.

On further thought, although the current implementation is flawed, I
worry about the negative effects of fixing it. Or maybe that fear is
unnecessary. :)

Thanks,
Gu
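
For readers following the thread, the locking problem in the report can be
modelled in user space. The sketch below is not kernel code and is only an
illustration under stated assumptions: a plain counter stands in for the RCU
read-side nesting depth, a flag stands in for the __might_sleep() warning, and
every model_* name is hypothetical. It contrasts holding the read-side section
across a call that may sleep (the reported bug) with holding it only around the
pointer comparison (the shape of the quoted patch).

```c
#include <stdbool.h>

/* User-space model only. All model_* names are hypothetical stand-ins. */
int  model_rcu_depth;   /* > 0 means "inside a read-side section" */
bool model_bug_seen;    /* set when a sleeping call runs inside one */

void model_rcu_read_lock(void)   { model_rcu_depth++; }
void model_rcu_read_unlock(void) { model_rcu_depth--; }

/* Stand-in for __might_sleep(): record a violation instead of printing. */
void model_might_sleep(void)
{
	if (model_rcu_depth > 0)
		model_bug_seen = true;
}

/* Stand-in for cpuset_mems_allowed(), which takes a mutex and may sleep. */
void model_mems_allowed(void)
{
	model_might_sleep();
}

/* Patched shape: the read-side section covers only the comparison. */
int model_is_being_rebound(const void *task_cs, const void *being_rebound)
{
	int ret;

	model_rcu_read_lock();
	ret = (task_cs == being_rebound);
	model_rcu_read_unlock();

	return ret;
}

/* Before: the sleeping call happens inside the read-side section. */
bool model_dup_before(const void *task_cs, const void *being_rebound)
{
	model_bug_seen = false;
	model_rcu_read_lock();
	if (task_cs == being_rebound)
		model_mems_allowed();
	model_rcu_read_unlock();
	return model_bug_seen;
}

/* After: the sleeping call happens outside any read-side section. */
bool model_dup_after(const void *task_cs, const void *being_rebound)
{
	model_bug_seen = false;
	if (model_is_being_rebound(task_cs, being_rebound))
		model_mems_allowed();
	return model_bug_seen;
}
```

Calling model_dup_before() with two equal pointers reports a violation, while
model_dup_after() with the same arguments does not. The model says nothing
about the fork()/rebind race raised in the quoted reply: the comparison result
can still be stale once the read-side section ends.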