Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757010AbaF3Tri (ORCPT ); Mon, 30 Jun 2014 15:47:38 -0400 Received: from mail-qg0-f50.google.com ([209.85.192.50]:58850 "EHLO mail-qg0-f50.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750836AbaF3Trg (ORCPT ); Mon, 30 Jun 2014 15:47:36 -0400 Date: Mon, 30 Jun 2014 15:47:32 -0400 From: Tejun Heo To: Li Zefan Cc: LKML , Cgroups Subject: [PATCH cgroup/for-3.16-fixes] cpuset: break kernfs active protection in cpuset_write_resmask() Message-ID: <20140630194732.GF14807@htj.dyndns.org> References: <53B0DD8D.5060805@huawei.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <53B0DD8D.5060805@huawei.com> User-Agent: Mutt/1.5.23 (2014-03-12) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hey, Li. Can you please test this patch and ack it? Thanks. ------ 8< ------ Writing to either "cpuset.cpus" or "cpuset.mems" file flushes cpuset_hotplug_work so that cpu or memory hotunplug doesn't end up migrating tasks off a cpuset after new resources are added to it. As cpuset_hotplug_work calls into cgroup core via cgroup_transfer_tasks(), this flushing adds the dependency to cgroup core locking from cpuset_write_resmak(). This used to be okay because cgroup interface files were protected by a different mutex; however, 8353da1f91f1 ("cgroup: remove cgroup_tree_mutex") simplified the cgroup core locking and this dependency became a deadlock hazard - cgroup file removal performed under cgroup core lock tries to drain on-going file operation which is trying to flush cpuset_hotplug_work blocked on the same cgroup core lock. The locking simplification was done because kernfs added an a lot easier way to deal with circular dependencies involving kernfs active protection. Let's use the same strategy in cpuset and break active protection in cpuset_write_resmask(). While it isn't the prettiest, this is a very rare, likely unique, situation which also goes away on the unified hierarchy. The commands to trigger the deadlock warning without the patch and the lockdep output follow. localhost:/ # mount -t cgroup -o cpuset xxx /cpuset localhost:/ # mkdir /cpuset/tmp localhost:/ # echo 1 > /cpuset/tmp/cpuset.cpus localhost:/ # echo 0 > cpuset/tmp/cpuset.mems localhost:/ # echo $$ > /cpuset/tmp/tasks localhost:/ # echo 0 > /sys/devices/system/cpu/cpu1/online ====================================================== [ INFO: possible circular locking dependency detected ] 3.16.0-rc1-0.1-default+ #7 Not tainted ------------------------------------------------------- kworker/1:0/32649 is trying to acquire lock: (cgroup_mutex){+.+.+.}, at: [] cgroup_transfer_tasks+0x37/0x150 but task is already holding lock: (cpuset_hotplug_work){+.+...}, at: [] process_one_work+0x192/0x520 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 (cpuset_hotplug_work){+.+...}: ... -> #1 (s_active#175){++++.+}: ... -> #0 (cgroup_mutex){+.+.+.}: ... other info that might help us debug this: Chain exists of: cgroup_mutex --> s_active#175 --> cpuset_hotplug_work Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(cpuset_hotplug_work); lock(s_active#175); lock(cpuset_hotplug_work); lock(cgroup_mutex); *** DEADLOCK *** 2 locks held by kworker/1:0/32649: #0: ("events"){.+.+.+}, at: [] process_one_work+0x192/0x520 #1: (cpuset_hotplug_work){+.+...}, at: [] process_one_work+0x192/0x520 stack backtrace: CPU: 1 PID: 32649 Comm: kworker/1:0 Not tainted 3.16.0-rc1-0.1-default+ #7 ... Call Trace: [] dump_stack+0x72/0x8a [] print_circular_bug+0x10f/0x120 [] check_prev_add+0x43e/0x4b0 [] validate_chain+0x656/0x7c0 [] __lock_acquire+0x382/0x660 [] lock_acquire+0xf9/0x170 [] mutex_lock_nested+0x6f/0x380 [] cgroup_transfer_tasks+0x37/0x150 [] hotplug_update_tasks_insane+0x110/0x1d0 [] cpuset_hotplug_update_tasks+0x13d/0x180 [] cpuset_hotplug_workfn+0x18c/0x630 [] process_one_work+0x254/0x520 [] worker_thread+0x13d/0x3d0 [] kthread+0xf8/0x100 [] ret_from_fork+0x7c/0xb0 Signed-off-by: Tejun Heo Reported-by: Li Zefan --- kernel/cpuset.c | 12 ++++++++++++ 1 file changed, 12 insertions(+) --- a/kernel/cpuset.c +++ b/kernel/cpuset.c @@ -1617,7 +1617,17 @@ static ssize_t cpuset_write_resmask(stru * resources, wait for the previously scheduled operations before * proceeding, so that we don't end up keep removing tasks added * after execution capability is restored. + * + * cpuset_hotplug_work calls back into cgroup core via + * cgroup_transfer_tasks() and waiting for it from a cgroupfs + * operation like this one can lead to a deadlock through kernfs + * active_ref protection. Let's break the protection. Losing the + * protection is okay as we check whether @cs is online after + * grabbing cpuset_mutex anyway. This only happens on the legacy + * hierarchies. */ + css_get(&cs->css); + kernfs_break_active_protection(of->kn); flush_work(&cpuset_hotplug_work); mutex_lock(&cpuset_mutex); @@ -1645,6 +1655,8 @@ static ssize_t cpuset_write_resmask(stru free_trial_cpuset(trialcs); out_unlock: mutex_unlock(&cpuset_mutex); + kernfs_unbreak_active_protection(of->kn); + css_put(&cs->css); return retval ?: nbytes; } -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/