Date: Fri, 20 Apr 2018 11:50:05 +0200
From: Peter Zijlstra
To: Matt Fleming
Cc: Ingo Molnar, linux-kernel@vger.kernel.org, Michal Hocko, Mike Galbraith
Subject: Re: cpu stopper threads and load balancing leads to deadlock
Message-ID: <20180420095005.GH4064@hirez.programming.kicks-ass.net>
References: <20180417142119.GA4511@codeblueprint.co.uk>
In-Reply-To: <20180417142119.GA4511@codeblueprint.co.uk>

On Tue, Apr 17, 2018 at 03:21:19PM +0100, Matt Fleming wrote:
> Hi guys,
>
> We've seen a bug in one of our SLE kernels where the cpu stopper
> thread ("migration/15") is entering idle balance. This then triggers
> active load balance.
>
> At the same time, a task on another CPU triggers a page fault and NUMA
> balancing kicks in to try and migrate the task closer to the NUMA node
> for that page (we're inside stop_two_cpus()). This faulting task is
> spinning in try_to_wake_up() (inside smp_cond_load_acquire(&p->on_cpu,
> !VAL)), waiting for "migration/15" to context switch.
>
> Unfortunately, because "migration/15" is doing active load balance
> it's spinning waiting for the NUMA-page-faulting CPU's stopper lock,
> which is already held (since it's inside stop_two_cpus()).
>
> Deadlock ensues.

So if I read that right, something like the following happens:

CPU0                                    CPU1

schedule(.prev=migrate/0)
  pick_next_task                          ...
    idle_balance                            migrate_swap()
      active_balance                          stop_two_cpus()
                                                spin_lock(stopper0->lock)
                                                spin_lock(stopper1->lock)
                                                ttwu(migrate/0)
                                                  smp_cond_load_acquire() -- waits for schedule()
        stop_one_cpu(1)
          spin_lock(stopper1->lock) -- waits for stopper lock


Fix _this_ deadlock by taking out the wakeups from under stopper->lock.

I'm not entirely sure there isn't more dragons here, but this particular
one seems fixable by doing that.

Is there any way you can reproduce/test this?

Maybe-signed-off-by: Peter Zijlstra (Intel)
---
 kernel/stop_machine.c | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index b7591261652d..64c0291b579c 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -21,6 +21,7 @@
 #include
 #include
 #include
+#include
 
 /*
  * Structure to determine completion condition and record errors.  May
@@ -65,27 +66,31 @@ static void cpu_stop_signal_done(struct cpu_stop_done *done)
 }
 
 static void __cpu_stop_queue_work(struct cpu_stopper *stopper,
-					struct cpu_stop_work *work)
+					struct cpu_stop_work *work,
+					struct wake_q_head *wakeq)
 {
 	list_add_tail(&work->list, &stopper->works);
-	wake_up_process(stopper->thread);
+	wake_q_add(wakeq, stopper->thread);
 }
 
 /* queue @work to @stopper.  if offline, @work is completed immediately */
 static bool cpu_stop_queue_work(unsigned int cpu, struct cpu_stop_work *work)
 {
 	struct cpu_stopper *stopper = &per_cpu(cpu_stopper, cpu);
+	DEFINE_WAKE_Q(wakeq);
 	unsigned long flags;
 	bool enabled;
 
 	spin_lock_irqsave(&stopper->lock, flags);
 	enabled = stopper->enabled;
 	if (enabled)
-		__cpu_stop_queue_work(stopper, work);
+		__cpu_stop_queue_work(stopper, work, &wakeq);
 	else if (work->done)
 		cpu_stop_signal_done(work->done);
 	spin_unlock_irqrestore(&stopper->lock, flags);
 
+	wake_up_q(&wakeq);
+
 	return enabled;
 }
 
@@ -229,6 +234,7 @@ static int cpu_stop_queue_two_works(int cpu1, struct cpu_stop_work *work1,
 {
 	struct cpu_stopper *stopper1 = per_cpu_ptr(&cpu_stopper, cpu1);
 	struct cpu_stopper *stopper2 = per_cpu_ptr(&cpu_stopper, cpu2);
+	DEFINE_WAKE_Q(wakeq);
 	int err;
 retry:
 	spin_lock_irq(&stopper1->lock);
@@ -252,8 +258,8 @@ static int cpu_stop_queue_two_works(int cpu1, struct cpu_stop_work *work1,
 			goto unlock;
 
 	err = 0;
-	__cpu_stop_queue_work(stopper1, work1);
-	__cpu_stop_queue_work(stopper2, work2);
+	__cpu_stop_queue_work(stopper1, work1, &wakeq);
+	__cpu_stop_queue_work(stopper2, work2, &wakeq);
 unlock:
 	spin_unlock(&stopper2->lock);
 	spin_unlock_irq(&stopper1->lock);
@@ -263,6 +269,9 @@ static int cpu_stop_queue_two_works(int cpu1, struct cpu_stop_work *work1,
 		cpu_relax();
 		goto retry;
 	}
+
+	wake_up_q(&wakeq);
+
 	return err;
 }
 
 /**
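
For readers who want the shape of the fix without the kernel context, here is a
minimal single-threaded sketch of the same ordering: record the wakeup while
the lock is held, issue it only after the lock is dropped. Everything prefixed
toy_ is hypothetical and exists only for this illustration; it is not the
kernel's wake_q API, the bool flag merely stands in for stopper->lock, and
nothing here reproduces the actual race.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_DEFERRED 4

/* Toy stand-in for a cpu stopper (hypothetical, not a kernel structure). */
struct toy_stopper {
	bool locked;	/* models stopper->lock being held */
	int queued;	/* models the length of stopper->works */
	int wakeups;	/* number of wakeups delivered to the thread */
};

/* Toy wake queue: remembers which stoppers to wake once the lock is gone. */
struct toy_wake_q {
	struct toy_stopper *pending[TOY_MAX_DEFERRED];
	size_t n;
};

static void toy_lock(struct toy_stopper *s)   { s->locked = true; }
static void toy_unlock(struct toy_stopper *s) { s->locked = false; }

/* Record a wakeup. Safe under the lock: nobody is woken here. */
static bool toy_wake_q_add(struct toy_wake_q *q, struct toy_stopper *s)
{
	if (q->n == TOY_MAX_DEFERRED)
		return false;	/* queue full, wakeup not recorded */
	q->pending[q->n++] = s;
	return true;
}

/* Deliver every recorded wakeup; must run with the lock released. */
static void toy_wake_up_q(struct toy_wake_q *q)
{
	for (size_t i = 0; i < q->n; i++) {
		struct toy_stopper *s = q->pending[i];

		assert(!s->locked);	/* the property the patch is after */
		s->wakeups++;
	}
	q->n = 0;
}

/* Queue one work item: mutate under the lock, wake after unlocking. */
static void toy_queue_work(struct toy_stopper *s)
{
	struct toy_wake_q q = { .n = 0 };

	toy_lock(s);
	s->queued++;
	toy_wake_q_add(&q, s);	/* deferred, not delivered */
	toy_unlock(s);

	toy_wake_up_q(&q);	/* delivered with no lock held */
}
```

Calling toy_queue_work() on a zero-initialised struct toy_stopper leaves it
with one queued item, one delivered wakeup and the lock released. Moving the
toy_wake_up_q() call above toy_unlock() trips the assert, which is the
ordering the patch is meant to rule out.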