From: Prateek Sood
To: peterz@infradead.org, tj@kernel.org, lizefan@huawei.com, mingo@kernel.org, longman@redhat.com, boqun.feng@gmail.com, tglx@linutronix.de
Cc: Prateek Sood, cgroups@vger.kernel.org, sramana@codeaurora.org, linux-kernel@vger.kernel.org
Subject: [PATCH] cgroup/cpuset: remove circular dependency deadlock
Date: Thu, 26 Oct 2017 17:22:02 +0530
Message-Id: <1509018722-30359-1-git-send-email-prsood@codeaurora.org>
In-Reply-To: <20171025093041.GO3165@worktop.lehotels.local>
References: <20171025093041.GO3165@worktop.lehotels.local>
X-Mailing-List: linux-kernel@vger.kernel.org

Remove a circular dependency deadlock that occurs when a CPU is being
hotplugged while userspace triggers an update of the cgroup and cpuset
configuration.
Process A => kthreadd => Process B => Process C => Process A

Process A
cpu_subsys_offline();
  cpu_down();
    _cpu_down();
      percpu_down_write(&cpu_hotplug_lock); //held
      cpuhp_invoke_callback();
        workqueue_offline_cpu();
          wq_update_unbound_numa();
            kthread_create_on_node();
              wake_up_process(); //wakeup kthreadd
          flush_work();
            wait_for_completion();

kthreadd
kthreadd();
  kernel_thread();
    do_fork();
      copy_process();
        percpu_down_read(&cgroup_threadgroup_rwsem);
          __rwsem_down_read_failed_common(); //waiting

Process B
kernfs_fop_write();
  cgroup_file_write();
    cgroup_procs_write();
      percpu_down_write(&cgroup_threadgroup_rwsem); //held
      cgroup_attach_task();
        cgroup_migrate();
          cgroup_migrate_execute();
            cpuset_can_attach();
              mutex_lock(&cpuset_mutex); //waiting

Process C
kernfs_fop_write();
  cgroup_file_write();
    cpuset_write_resmask();
      mutex_lock(&cpuset_mutex); //held
      update_cpumask();
        update_cpumasks_hier();
          rebuild_sched_domains_locked();
            get_online_cpus();
              percpu_down_read(&cpu_hotplug_lock); //waiting

Eliminate the deadlock by reversing the locking order of cpuset_mutex and
cpu_hotplug_lock, so that cpu_hotplug_lock is acquired before cpuset_mutex
whenever both are needed.
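The cycle described above can be checked mechanically. The following is a
minimal userspace model, not kernel code: each task records which resources
it holds and the single resource it is blocked on, and a depth-first search
over the resulting wait-for edges (task -> holder of the resource it waits
for) reports whether a cycle exists. The names enum res, struct task_state,
deadlocked(), scenario_before() and scenario_after() are hypothetical. The
model assumes each percpu rwsem can be treated as exclusive, which is accurate
for the two waits in the trace because each blocked reader is waiting on a
writer, and it represents the completion Process A waits for as a
pseudo-resource, KTHREAD_DONE, owned by kthreadd.

```c
#include <assert.h>
#include <stdbool.h>

/* Resources from the trace. KTHREAD_DONE is a modeling assumption: it
 * stands for the completion Process A waits on, held by kthreadd. */
enum res { CPU_HOTPLUG_LOCK, THREADGROUP_RWSEM, CPUSET_MUTEX, KTHREAD_DONE };
enum task { PROC_A, KTHREADD, PROC_B, PROC_C, NTASKS };
#define NONE (-1)

struct task_state {
	unsigned int holds;	/* bitmask of enum res held by the task */
	int waits_on;		/* enum res the task is blocked on, or NONE */
};

/* Return the task that holds resource r, or NONE if it is free. */
static int owner(const struct task_state *t, int r)
{
	for (int i = 0; i < NTASKS; i++)
		if (t[i].holds & (1u << r))
			return i;
	return NONE;
}

/* Follow the wait-for edge from task i (i -> owner of what i waits on).
 * color: 0 = unvisited, 1 = on the current DFS path, 2 = done. */
static bool dfs(const struct task_state *t, int i, int *color)
{
	assert(i >= 0 && i < NTASKS);
	color[i] = 1;
	if (t[i].waits_on != NONE) {
		int j = owner(t, t[i].waits_on);

		if (j != NONE && color[j] == 1)
			return true;		/* back edge: wait-for cycle */
		if (j != NONE && color[j] == 0 && dfs(t, j, color))
			return true;
	}
	color[i] = 2;
	return false;
}

/* True if the wait-for graph of the given tasks contains a cycle. */
bool deadlocked(const struct task_state *t)
{
	int color[NTASKS] = { 0 };

	for (int i = 0; i < NTASKS; i++)
		if (color[i] == 0 && dfs(t, i, color))
			return true;
	return false;
}

/* Lock order before the patch, as in the trace. */
void scenario_before(struct task_state *t)
{
	t[PROC_A]   = (struct task_state){ 1u << CPU_HOTPLUG_LOCK,  KTHREAD_DONE };
	t[KTHREADD] = (struct task_state){ 1u << KTHREAD_DONE,      THREADGROUP_RWSEM };
	t[PROC_B]   = (struct task_state){ 1u << THREADGROUP_RWSEM, CPUSET_MUTEX };
	t[PROC_C]   = (struct task_state){ 1u << CPUSET_MUTEX,      CPU_HOTPLUG_LOCK };
}

/* Reversed order: Process C blocks in cpus_read_lock() before it
 * takes cpuset_mutex, so it holds nothing while waiting. */
void scenario_after(struct task_state *t)
{
	scenario_before(t);
	t[PROC_C] = (struct task_state){ 0, CPU_HOTPLUG_LOCK };
}
```

With scenario_before(), deadlocked() finds the A -> kthreadd -> B -> C -> A
cycle. scenario_after() changes only Process C: with cpus_read_lock() taken
first, C blocks on cpu_hotplug_lock before acquiring cpuset_mutex, so Process B
is no longer blocked and the cycle disappears.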
Signed-off-by: Prateek Sood
---
 include/linux/cpuset.h |  6 -----
 kernel/cgroup/cpuset.c | 70 ++++++++++++++++++++++++++------------------------
 kernel/power/process.c |  2 --
 kernel/sched/core.c    |  1 -
 4 files changed, 36 insertions(+), 43 deletions(-)

diff --git a/include/linux/cpuset.h b/include/linux/cpuset.h
index a1e6a33..e74655d 100644
--- a/include/linux/cpuset.h
+++ b/include/linux/cpuset.h
@@ -51,9 +51,7 @@ static inline void cpuset_dec(void)
 
 extern int cpuset_init(void);
 extern void cpuset_init_smp(void);
-extern void cpuset_force_rebuild(void);
 extern void cpuset_update_active_cpus(void);
-extern void cpuset_wait_for_hotplug(void);
 extern void cpuset_cpus_allowed(struct task_struct *p, struct cpumask *mask);
 extern void cpuset_cpus_allowed_fallback(struct task_struct *p);
 extern nodemask_t cpuset_mems_allowed(struct task_struct *p);
@@ -166,15 +164,11 @@ static inline void set_mems_allowed(nodemask_t nodemask)
 static inline int cpuset_init(void) { return 0; }
 static inline void cpuset_init_smp(void) {}
 
-static inline void cpuset_force_rebuild(void) { }
-
 static inline void cpuset_update_active_cpus(void)
 {
 	partition_sched_domains(1, NULL, NULL);
 }
 
-static inline void cpuset_wait_for_hotplug(void) { }
-
 static inline void cpuset_cpus_allowed(struct task_struct *p,
 				       struct cpumask *mask)
 {
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 4657e29..a8213c2 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -826,16 +826,14 @@ static int generate_sched_domains(cpumask_var_t **domains,
  * 'cpus' is removed, then call this routine to rebuild the
  * scheduler's dynamic sched domains.
  *
- * Call with cpuset_mutex held. Takes get_online_cpus().
  */
-static void rebuild_sched_domains_locked(void)
+static void rebuild_sched_domains_cpuslocked(void)
 {
 	struct sched_domain_attr *attr;
 	cpumask_var_t *doms;
 	int ndoms;
 
 	lockdep_assert_held(&cpuset_mutex);
-	get_online_cpus();
 
 	/*
 	 * We have raced with CPU hotplug. Don't do anything to avoid
@@ -843,27 +841,27 @@ static void rebuild_sched_domains_locked(void)
 	 * Anyways, hotplug work item will rebuild sched domains.
 	 */
 	if (!cpumask_equal(top_cpuset.effective_cpus, cpu_active_mask))
-		goto out;
+		return;
 
 	/* Generate domain masks and attrs */
 	ndoms = generate_sched_domains(&doms, &attr);
 
 	/* Have scheduler rebuild the domains */
 	partition_sched_domains(ndoms, doms, attr);
-out:
-	put_online_cpus();
 }
 #else /* !CONFIG_SMP */
-static void rebuild_sched_domains_locked(void)
+static void rebuild_sched_domains_cpuslocked(void)
 {
 }
 #endif /* CONFIG_SMP */
 
 void rebuild_sched_domains(void)
 {
+	cpus_read_lock();
 	mutex_lock(&cpuset_mutex);
-	rebuild_sched_domains_locked();
+	rebuild_sched_domains_cpuslocked();
 	mutex_unlock(&cpuset_mutex);
+	cpus_read_unlock();
 }
 
 /**
@@ -949,7 +947,7 @@ static void update_cpumasks_hier(struct cpuset *cs, struct cpumask *new_cpus)
 	rcu_read_unlock();
 
 	if (need_rebuild_sched_domains)
-		rebuild_sched_domains_locked();
+		rebuild_sched_domains_cpuslocked();
 }
 
 /**
@@ -1281,7 +1279,7 @@ static int update_relax_domain_level(struct cpuset *cs, s64 val)
 		cs->relax_domain_level = val;
 		if (!cpumask_empty(cs->cpus_allowed) &&
 		    is_sched_load_balance(cs))
-			rebuild_sched_domains_locked();
+			rebuild_sched_domains_cpuslocked();
 	}
 
 	return 0;
@@ -1314,7 +1312,6 @@ static void update_tasks_flags(struct cpuset *cs)
  *
  * Call with cpuset_mutex held.
  */
-
 static int update_flag(cpuset_flagbits_t bit, struct cpuset *cs,
 		       int turning_on)
 {
@@ -1347,7 +1344,7 @@ static int update_flag(cpuset_flagbits_t bit, struct cpuset *cs,
 	spin_unlock_irq(&callback_lock);
 
 	if (!cpumask_empty(trialcs->cpus_allowed) && balance_flag_changed)
-		rebuild_sched_domains_locked();
+		rebuild_sched_domains_cpuslocked();
 
 	if (spread_flag_changed)
 		update_tasks_flags(cs);
@@ -1615,6 +1612,7 @@ static int cpuset_write_u64(struct cgroup_subsys_state *css, struct cftype *cft,
 	cpuset_filetype_t type = cft->private;
 	int retval = 0;
 
+	cpus_read_lock();
 	mutex_lock(&cpuset_mutex);
 	if (!is_cpuset_online(cs)) {
 		retval = -ENODEV;
@@ -1652,6 +1650,7 @@ static int cpuset_write_u64(struct cgroup_subsys_state *css, struct cftype *cft,
 	}
 out_unlock:
 	mutex_unlock(&cpuset_mutex);
+	cpus_read_unlock();
 	return retval;
 }
 
@@ -1662,6 +1661,7 @@ static int cpuset_write_s64(struct cgroup_subsys_state *css, struct cftype *cft,
 	cpuset_filetype_t type = cft->private;
 	int retval = -ENODEV;
 
+	cpus_read_lock();
 	mutex_lock(&cpuset_mutex);
 	if (!is_cpuset_online(cs))
 		goto out_unlock;
@@ -1676,6 +1676,7 @@ static int cpuset_write_s64(struct cgroup_subsys_state *css, struct cftype *cft,
 	}
 out_unlock:
 	mutex_unlock(&cpuset_mutex);
+	cpus_read_unlock();
 	return retval;
 }
 
@@ -1714,6 +1715,7 @@ static ssize_t cpuset_write_resmask(struct kernfs_open_file *of,
 	kernfs_break_active_protection(of->kn);
 	flush_work(&cpuset_hotplug_work);
 
+	cpus_read_lock();
 	mutex_lock(&cpuset_mutex);
 	if (!is_cpuset_online(cs))
 		goto out_unlock;
@@ -1739,6 +1741,7 @@ static ssize_t cpuset_write_resmask(struct kernfs_open_file *of,
 	free_trial_cpuset(trialcs);
 out_unlock:
 	mutex_unlock(&cpuset_mutex);
+	cpus_read_unlock();
 	kernfs_unbreak_active_protection(of->kn);
 	css_put(&cs->css);
 	flush_workqueue(cpuset_migrate_mm_wq);
@@ -2039,13 +2042,14 @@ static int cpuset_css_online(struct cgroup_subsys_state *css)
 /*
  * If the cpuset being removed has its flag 'sched_load_balance'
  * enabled, then simulate turning sched_load_balance off, which
- * will call rebuild_sched_domains_locked().
+ * will call rebuild_sched_domains_cpuslocked().
  */
 
 static void cpuset_css_offline(struct cgroup_subsys_state *css)
 {
 	struct cpuset *cs = css_cs(css);
 
+	cpus_read_lock();
 	mutex_lock(&cpuset_mutex);
 
 	if (is_sched_load_balance(cs))
@@ -2055,6 +2059,7 @@ static void cpuset_css_offline(struct cgroup_subsys_state *css)
 	clear_bit(CS_ONLINE, &cs->flags);
 
 	mutex_unlock(&cpuset_mutex);
+	cpus_read_unlock();
 }
 
 static void cpuset_css_free(struct cgroup_subsys_state *css)
@@ -2275,15 +2280,8 @@ static void cpuset_hotplug_update_tasks(struct cpuset *cs)
 	mutex_unlock(&cpuset_mutex);
 }
 
-static bool force_rebuild;
-
-void cpuset_force_rebuild(void)
-{
-	force_rebuild = true;
-}
-
 /**
- * cpuset_hotplug_workfn - handle CPU/memory hotunplug for a cpuset
+ * cpuset_hotplug - handle CPU/memory hotunplug for a cpuset
  *
  * This function is called after either CPU or memory configuration has
  * changed and updates cpuset accordingly. The top_cpuset is always
@@ -2298,7 +2296,7 @@ void cpuset_force_rebuild(void)
  * Note that CPU offlining during suspend is ignored. We don't modify
  * cpusets across suspend/resume cycles at all.
  */
-static void cpuset_hotplug_workfn(struct work_struct *work)
+static void cpuset_hotplug(bool use_cpu_hp_lock)
 {
 	static cpumask_t new_cpus;
 	static nodemask_t new_mems;
@@ -2356,25 +2354,29 @@ static void cpuset_hotplug_workfn(struct work_struct *work)
 	}
 
 	/* rebuild sched domains if cpus_allowed has changed */
-	if (cpus_updated || force_rebuild) {
-		force_rebuild = false;
-		rebuild_sched_domains();
+	if (cpus_updated) {
+		if (use_cpu_hp_lock)
+			rebuild_sched_domains();
+		else {
+			/* When called during cpu hotplug, cpu_hotplug_lock
+			 * is held by the calling thread, not by
+			 * cpuhp_thread_fun.
+			 */
+			mutex_lock(&cpuset_mutex);
+			rebuild_sched_domains_cpuslocked();
+			mutex_unlock(&cpuset_mutex);
+		}
 	}
 }
 
-void cpuset_update_active_cpus(void)
+static void cpuset_hotplug_workfn(struct work_struct *work)
 {
-	/*
-	 * We're inside cpu hotplug critical region which usually nests
-	 * inside cgroup synchronization. Bounce actual hotplug processing
-	 * to a work item to avoid reverse locking order.
-	 */
-	schedule_work(&cpuset_hotplug_work);
+	cpuset_hotplug(true);
 }
 
-void cpuset_wait_for_hotplug(void)
+void cpuset_update_active_cpus(void)
 {
-	flush_work(&cpuset_hotplug_work);
+	cpuset_hotplug(false);
 }
 
 /*
diff --git a/kernel/power/process.c b/kernel/power/process.c
index 50f25cb..28772b405 100644
--- a/kernel/power/process.c
+++ b/kernel/power/process.c
@@ -203,8 +203,6 @@ void thaw_processes(void)
 	__usermodehelper_set_disable_depth(UMH_FREEZING);
 	thaw_workqueues();
 
-	cpuset_wait_for_hotplug();
-
 	read_lock(&tasklist_lock);
 	for_each_process_thread(g, p) {
 		/* No other threads should have PF_SUSPEND_TASK set */
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index d17c5da..25b8717 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5590,7 +5590,6 @@ static void cpuset_cpu_active(void)
 		 * restore the original sched domains by considering the
 		 * cpuset configurations.
 		 */
-		cpuset_force_rebuild();
 	}
 	cpuset_update_active_cpus();
 }
-- 
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.,
is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.
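The patch settles on one ordering: an unlocked entry point such as
rebuild_sched_domains() takes cpus_read_lock() and then cpuset_mutex before
calling the *_cpuslocked() variant, while the hotplug path, which already
holds cpu_hotplug_lock, takes only cpuset_mutex. A minimal userspace sketch of
that convention follows, assuming a pthread_rwlock_t as a stand-in for
cpu_hotplug_lock and a pthread_mutex_t for cpuset_mutex. Every function name
in it (rebuild(), rebuild_cpuslocked(), hotplug_update(), run_concurrently())
is hypothetical and none of it is kernel API.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins: outer lock ~ cpu_hotplug_lock,
 * inner lock ~ cpuset_mutex. */
static pthread_rwlock_t hotplug_lock = PTHREAD_RWLOCK_INITIALIZER;
static pthread_mutex_t cpuset_mutex = PTHREAD_MUTEX_INITIALIZER;
static long rebuilds;		/* protected by cpuset_mutex */

/* Caller holds hotplug_lock (read or write) and cpuset_mutex. */
static void rebuild_cpuslocked(void)
{
	/* Rough lockdep_assert_held(): only checks that someone holds it. */
	assert(pthread_mutex_trylock(&cpuset_mutex) == EBUSY);
	rebuilds++;
}

/* Unlocked entry point: outer lock first, then inner lock. */
static void rebuild(void)
{
	pthread_rwlock_rdlock(&hotplug_lock);
	pthread_mutex_lock(&cpuset_mutex);
	rebuild_cpuslocked();
	pthread_mutex_unlock(&cpuset_mutex);
	pthread_rwlock_unlock(&hotplug_lock);
}

/* Hotplug path: caller already holds hotplug_lock for writing. */
static void hotplug_update(void)
{
	pthread_mutex_lock(&cpuset_mutex);
	rebuild_cpuslocked();
	pthread_mutex_unlock(&cpuset_mutex);
}

static void *hotplug_thread(void *arg)
{
	long n = *(const long *)arg;

	for (long i = 0; i < n; i++) {
		pthread_rwlock_wrlock(&hotplug_lock);	/* like _cpu_down() */
		hotplug_update();
		pthread_rwlock_unlock(&hotplug_lock);
	}
	return NULL;
}

static void *writer_thread(void *arg)
{
	long n = *(const long *)arg;

	for (long i = 0; i < n; i++)
		rebuild();	/* like a write to a cpuset file */
	return NULL;
}

/* Run both paths concurrently; returns the number of rebuilds, or -1. */
long run_concurrently(long n)
{
	pthread_t a, b;

	rebuilds = 0;
	if (pthread_create(&a, NULL, hotplug_thread, &n) != 0)
		return -1;
	if (pthread_create(&b, NULL, writer_thread, &n) != 0) {
		pthread_join(a, NULL);
		return -1;
	}
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return rebuilds;
}
```

run_concurrently(n) runs a hotplug-like writer and a cpuset-write-like reader
in parallel; because both take the outer lock before the inner one, it returns
2 * n instead of hanging. The trylock-based assert is only a rough stand-in for
lockdep_assert_held(): it checks that some thread holds the mutex, not that the
caller does.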