Received: by 10.223.164.221 with SMTP id h29csp2518783wrb; Mon, 9 Oct 2017 06:28:34 -0700 (PDT) X-Google-Smtp-Source: AOwi7QCaC6YvHDh2uoO6zlTG/i/29dhglwuj5bNRQAoOW0H0lVX2d+C8bxYxHTRikdWbgHD5waMi X-Received: by 10.98.110.73 with SMTP id j70mr167778pfc.146.1507555714068; Mon, 09 Oct 2017 06:28:34 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1507555714; cv=none; d=google.com; s=arc-20160816; b=Cp0DDywAXlEXYa55RSPX2tgKPPOueg2PRyjjXLicw9/9eGleqeiGAENY2FTdp+yjqj ek2Q3BPU34SVRQv1/w9dPakZNQJdjU3XFhg/I2MV1BA92iAbYjDl+Zf1ccZAA1BnWSL7 ty421IMYEGouv1dqlDLjTnI5D3m0tcC6Pt5bJn7N+gwjfLCbw0lwyAtCBojoR+zZOK9F wJfyvXQ+9JThuHtjXrb6URZjZDgrlUN9XcYJ1SBi2R5arx6XnGGo2zx0lOlD23gYj4XB ZMjVkoaHtZi+tlCmO3brQjTuZx5/vvl2Cg7hJwwCM6aIUdAU2wKKo93bqOcTSBtsuDHB M4HQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding :content-language:in-reply-to:mime-version:user-agent:date :message-id:references:cc:to:subject:from:dmarc-filter :dkim-signature:dkim-signature:arc-authentication-results; bh=ZUAizRiUaXEgjI9xKSOtK5ems3oa5AEpQ5s8vom/Q/s=; b=qfV3x5jyDXu9avmhB5ddvK66nfEnh00CWtyU7VlLNtcO2wVYrfm26d5uUM+ICBufMk GnJxseZe//KNGo1czQ6fEwWTwgAcspsd3NXT+df/EFVZ+6tlE0Te+ehGwh6GSHJWNMFF 59IdIfm8PsisI700zsH80fTE1YoyWJqA2eesCEztkziiT22pa4zWPr0VZ7tcBZJ6as8n Ttmr3TaQHjQQ0R/aR4OldheL6p8od4qu31aOsHCgRRRlRytYsHtUG3s92HrCUDA3220s jpVdae8I4QTFBIkJ62K6zBhE8IxtYDC/n3XnpCpkEyrHv33CMVBTl7MrJsavpsLgSdsX C01A== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@codeaurora.org header.s=default header.b=kda47twM; dkim=pass header.i=@codeaurora.org header.s=default header.b=VhOu0UTQ; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id t8si3129469pgr.730.2017.10.09.06.28.20; Mon, 09 Oct 2017 06:28:34 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@codeaurora.org header.s=default header.b=kda47twM; dkim=pass header.i=@codeaurora.org header.s=default header.b=VhOu0UTQ; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755489AbdJIN2C (ORCPT + 99 others); Mon, 9 Oct 2017 09:28:02 -0400 Received: from smtp.codeaurora.org ([198.145.29.96]:40050 "EHLO smtp.codeaurora.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755272AbdJIN16 (ORCPT ); Mon, 9 Oct 2017 09:27:58 -0400 Received: by smtp.codeaurora.org (Postfix, from userid 1000) id 46A5F60780; Mon, 9 Oct 2017 13:27:58 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=codeaurora.org; s=default; t=1507555678; bh=hkxtn+CfNgakYDtx4lIaLFE2Lhs3kFiJkDuczuonZjE=; h=From:Subject:To:Cc:References:Date:In-Reply-To:From; b=kda47twMSBy0qCmkxiZX+fFZ9Ktlj8Gvm7uFVrDz8DbRFjrw2mCdFq3Z9GiG9GNW4 3VPA6TxOOBiVRNaUeDAG0+8fgseLFPZoVgZd9qfwbcGI/gZIaDI3Z2QF+dt3eJFkxK punQLXijMtP0UlWShFNPgZ5yN+CITykyp/tZHrHE= X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on pdx-caf-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.8 required=2.0 tests=ALL_TRUSTED,BAYES_00, DKIM_SIGNED,T_DKIM_INVALID autolearn=no autolearn_force=no version=3.4.0 Received: from [10.204.79.19] (blr-c-bdr-fw-01_globalnat_allzones-outside.qualcomm.com [103.229.19.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) (Authenticated sender: prsood@smtp.codeaurora.org) by smtp.codeaurora.org (Postfix) with ESMTPSA id A2A3E60780; Mon, 9 Oct 2017 13:27:54 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=codeaurora.org; s=default; t=1507555677; bh=hkxtn+CfNgakYDtx4lIaLFE2Lhs3kFiJkDuczuonZjE=; h=From:Subject:To:Cc:References:Date:In-Reply-To:From; b=VhOu0UTQweHG5iYKpgv515hkRErmURkASWVbelw+vcSakSAKn26zm14ygPyj3GDwE N0BTOobxO5BMRWHpcIegnaGxZ1+28GyantOo+R3hkDBOMrWe9zMpYQD2HSyU56WY6H ggC29dAM32FJ6J/Efz9pO3Z4H4H5ln/0mpRUhh5o= DMARC-Filter: OpenDMARC Filter v1.3.2 smtp.codeaurora.org A2A3E60780 Authentication-Results: pdx-caf-mail.web.codeaurora.org; dmarc=none (p=none dis=none) header.from=codeaurora.org Authentication-Results: pdx-caf-mail.web.codeaurora.org; spf=none smtp.mailfrom=prsood@codeaurora.org From: Prateek Sood Subject: Re: [PATCH] cgroup/cpuset: remove circular dependency deadlock To: Peter Zijlstra Cc: tj@kernel.org, lizefan@huawei.com, cgroups@vger.kernel.org, mingo@kernel.org, longman@redhat.com, boqun.feng@gmail.com, tglx@linutronix.de, linux-kernel@vger.kernel.org, sramana@codeaurora.org References: <1504792583-10424-1-git-send-email-prsood@codeaurora.org> <20170907175107.GG17526@worktop.programming.kicks-ass.net> Message-ID: <4668d1ec-dc43-8a9c-4f94-a421683d3c17@codeaurora.org> Date: Mon, 9 Oct 2017 18:57:46 +0530 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.2.1 MIME-Version: 1.0 In-Reply-To: <20170907175107.GG17526@worktop.programming.kicks-ass.net> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 09/07/2017 11:21 PM, Peter Zijlstra wrote: > On Thu, Sep 07, 2017 at 07:26:23PM +0530, Prateek Sood wrote: >> Remove circular dependency deadlock in a scenario where hotplug of CPU is >> being done while there is updation in cgroup and cpuset triggered from >> userspace. > > You've forgotten to mention your solution to the deadlock, namely > inverting cpuset_mutex and cpu_hotplug_lock. > >> Signed-off-by: Prateek Sood >> --- >> kernel/cgroup/cpuset.c | 32 +++++++++++++++++++------------- >> 1 file changed, 19 insertions(+), 13 deletions(-) >> >> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c >> index 2f4039b..60dc0ac 100644 >> --- a/kernel/cgroup/cpuset.c >> +++ b/kernel/cgroup/cpuset.c >> @@ -816,16 +816,15 @@ static int generate_sched_domains(cpumask_var_t **domains, >> * 'cpus' is removed, then call this routine to rebuild the >> * scheduler's dynamic sched domains. >> * >> - * Call with cpuset_mutex held. Takes get_online_cpus(). >> */ >> -static void rebuild_sched_domains_locked(void) >> +static void rebuild_sched_domains_cpuslocked(void) >> { >> struct sched_domain_attr *attr; >> cpumask_var_t *doms; >> int ndoms; >> >> + lockdep_assert_cpus_held(); >> lockdep_assert_held(&cpuset_mutex); >> - get_online_cpus(); >> >> /* >> * We have raced with CPU hotplug. Don't do anything to avoid >> @@ -833,27 +832,27 @@ static void rebuild_sched_domains_locked(void) >> * Anyways, hotplug work item will rebuild sched domains. >> */ >> if (!cpumask_equal(top_cpuset.effective_cpus, cpu_active_mask)) >> - goto out; >> + return; >> >> /* Generate domain masks and attrs */ >> ndoms = generate_sched_domains(&doms, &attr); >> >> /* Have scheduler rebuild the domains */ >> partition_sched_domains(ndoms, doms, attr); >> -out: >> - put_online_cpus(); >> } >> #else /* !CONFIG_SMP */ >> -static void rebuild_sched_domains_locked(void) >> +static void rebuild_sched_domains_cpuslocked(void) >> { >> } >> #endif /* CONFIG_SMP */ >> >> void rebuild_sched_domains(void) >> { >> + get_online_cpus(); >> mutex_lock(&cpuset_mutex); >> - rebuild_sched_domains_locked(); >> + rebuild_sched_domains_cpuslocked(); >> mutex_unlock(&cpuset_mutex); >> + put_online_cpus(); >> } > > But if you invert these locks, the need for cpuset_hotplug_workfn() goes > away, at least for the CPU part, and we can make in synchronous again. > Yay!! > > Also, I think new code should use cpus_read_lock() instead of > get_online_cpus(). > Thanks for the review comments Peter. For patch related to circular deadlock, I will send an updated version. The callback making a call to cpuset_hotplug_workfn()in hotplug path are [CPUHP_AP_ACTIVE] = { .name = "sched:active", .startup.single = sched_cpu_activate, .teardown.single = sched_cpu_deactivate, }, if we make cpuset_hotplug_workfn() synchronous, deadlock might happen: _cpu_down() cpus_write_lock() //held cpuhp_kick_ap_work() cpuhp_kick_ap() __cpuhp_kick_ap() wake_up_process() //cpuhp_thread_fun wait_for_ap_thread() //wait for complete from cpuhp_thread_fun() cpuhp_thread_fun() cpuhp_invoke_callback() sched_cpu_deactivate() cpuset_cpu_inactive() cpuset_update_active_cpus() cpuset_hotplug_work() rebuild_sched_domains() cpus_read_lock() //waiting as acquired in _cpu_down() -- Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc., is a member of Code Aurora Forum, a Linux Foundation Collaborative Project From 1577935939548195325@xxx Fri Sep 08 02:16:18 +0000 2017 X-GM-THRID: 1577859769769316492 X-Gmail-Labels: Inbox,Category Forums