From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, "Peter Zijlstra (Intel)", Matt Fleming, Linus Torvalds, Michal Hocko, Mike Galbraith, Thomas Gleixner, Ingo Molnar, Sasha Levin
Subject: [PATCH 4.16 159/279] stop_machine, sched: Fix migrate_swap() vs. active_balance() deadlock
Date: Mon, 18 Jun 2018 10:12:24 +0200
Message-Id: <20180618080615.404029993@linuxfoundation.org>
In-Reply-To: <20180618080608.851973560@linuxfoundation.org>
References: <20180618080608.851973560@linuxfoundation.org>
X-Mailing-List: linux-kernel@vger.kernel.org

4.16-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Peter Zijlstra

[ Upstream commit 0b26351b910fb8fe6a056f8a1bbccabe50c0e19f ]

Matt reported the following deadlock:

CPU0                                    CPU1

schedule(.prev=migrate/0)
  pick_next_task()                        ...
    idle_balance()                          migrate_swap()
      active_balance()                        stop_two_cpus()
                                                spin_lock(stopper0->lock)
                                                spin_lock(stopper1->lock)
                                                ttwu(migrate/0)
                                                  smp_cond_load_acquire() -- waits for schedule()
        stop_one_cpu(1)
          spin_lock(stopper1->lock) -- waits for stopper lock

Fix this deadlock by taking the wakeups out from under stopper->lock.
This allows the active_balance() to queue the stop work and finish the
context switch, which in turn allows the wakeup from migrate_swap() to
observe the context and complete the wakeup.

Signed-off-by: Peter Zijlstra (Intel)
Reported-by: Matt Fleming
Signed-off-by: Peter Zijlstra (Intel)
Acked-by: Matt Fleming
Cc: Linus Torvalds
Cc: Michal Hocko
Cc: Mike Galbraith
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Link: http://lkml.kernel.org/r/20180420095005.GH4064@hirez.programming.kicks-ass.net
Signed-off-by: Ingo Molnar
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman
---
 kernel/stop_machine.c |   19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -21,6 +21,7 @@
 #include
 #include
 #include
+#include

 /*
  * Structure to determine completion condition and record errors.  May
@@ -65,27 +66,31 @@ static void cpu_stop_signal_done(struct
 }

 static void __cpu_stop_queue_work(struct cpu_stopper *stopper,
-					struct cpu_stop_work *work)
+					struct cpu_stop_work *work,
+					struct wake_q_head *wakeq)
 {
 	list_add_tail(&work->list, &stopper->works);
-	wake_up_process(stopper->thread);
+	wake_q_add(wakeq, stopper->thread);
 }

 /* queue @work to @stopper.  if offline, @work is completed immediately */
 static bool cpu_stop_queue_work(unsigned int cpu, struct cpu_stop_work *work)
 {
 	struct cpu_stopper *stopper = &per_cpu(cpu_stopper, cpu);
+	DEFINE_WAKE_Q(wakeq);
 	unsigned long flags;
 	bool enabled;

 	spin_lock_irqsave(&stopper->lock, flags);
 	enabled = stopper->enabled;
 	if (enabled)
-		__cpu_stop_queue_work(stopper, work);
+		__cpu_stop_queue_work(stopper, work, &wakeq);
 	else if (work->done)
 		cpu_stop_signal_done(work->done);
 	spin_unlock_irqrestore(&stopper->lock, flags);

+	wake_up_q(&wakeq);
+
 	return enabled;
 }

@@ -229,6 +234,7 @@ static int cpu_stop_queue_two_works(int
 {
 	struct cpu_stopper *stopper1 = per_cpu_ptr(&cpu_stopper, cpu1);
 	struct cpu_stopper *stopper2 = per_cpu_ptr(&cpu_stopper, cpu2);
+	DEFINE_WAKE_Q(wakeq);
 	int err;
 retry:
 	spin_lock_irq(&stopper1->lock);
@@ -252,8 +258,8 @@ retry:
 		goto unlock;

 	err = 0;
-	__cpu_stop_queue_work(stopper1, work1);
-	__cpu_stop_queue_work(stopper2, work2);
+	__cpu_stop_queue_work(stopper1, work1, &wakeq);
+	__cpu_stop_queue_work(stopper2, work2, &wakeq);
 unlock:
 	spin_unlock(&stopper2->lock);
 	spin_unlock_irq(&stopper1->lock);
@@ -263,6 +269,9 @@ unlock:
 		cpu_relax();
 		goto retry;
 	}
+
+	wake_up_q(&wakeq);
+
 	return err;
 }
 /**
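
For readers unfamiliar with the pattern, the change above boils down to
"record the wakeup while holding the lock, deliver it after dropping the
lock".  The following is only a minimal userspace sketch of that idea and
is not kernel code: struct toy_stopper, struct toy_wake_q and every toy_*
helper are hypothetical names invented for illustration, and the "lock"
is a plain flag rather than a real spinlock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_WAKEUPS 4

/* Hypothetical stand-in for a stopper: a lock flag plus counters. */
struct toy_stopper {
	bool locked;            /* models stopper->lock being held */
	int queued;             /* work items queued so far */
	int wakeups;            /* wakeups delivered so far */
	int wakeups_under_lock; /* wakeups delivered while the lock was held */
};

/* Hypothetical stand-in for a wake queue: wakeups recorded for later. */
struct toy_wake_q {
	struct toy_stopper *pending[TOY_MAX_WAKEUPS];
	size_t count;
};

static void toy_lock(struct toy_stopper *s)
{
	assert(!s->locked);
	s->locked = true;
}

static void toy_unlock(struct toy_stopper *s)
{
	assert(s->locked);
	s->locked = false;
}

/* Deliver a wakeup and note whether the lock was held at that moment. */
static void toy_wake(struct toy_stopper *s)
{
	s->wakeups++;
	if (s->locked)
		s->wakeups_under_lock++;
}

/* Shape of the old code: the wakeup is issued with the lock held. */
static void toy_queue_wake_under_lock(struct toy_stopper *s)
{
	toy_lock(s);
	s->queued++;
	toy_wake(s);
	toy_unlock(s);
}

/* Shape of the new code: only record the wakeup under the lock, then
 * deliver everything that was recorded once the lock is dropped. */
static void toy_queue_deferred_wake(struct toy_stopper *s)
{
	struct toy_wake_q q = { .count = 0 };
	size_t i;

	toy_lock(s);
	s->queued++;
	q.pending[q.count++] = s;
	toy_unlock(s);

	for (i = 0; i < q.count; i++)
		toy_wake(q.pending[i]);
}
```

In this toy model the second variant never delivers a wakeup while the
lock flag is set, which is the property the patch relies on: a waker that
has to wait for the target to finish its context switch no longer does so
while holding the stopper lock.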