Date: Thu, 3 May 2018 02:24:55 -0700
From: tip-bot for Peter Zijlstra
Cc: mingo@kernel.org, mhocko@suse.com, umgwanakikbuti@gmail.com, hpa@zytor.com, torvalds@linux-foundation.org, linux-kernel@vger.kernel.org, peterz@infradead.org, tglx@linutronix.de, matt@codeblueprint.co.uk
Reply-To: tglx@linutronix.de, matt@codeblueprint.co.uk, peterz@infradead.org, torvalds@linux-foundation.org, linux-kernel@vger.kernel.org, umgwanakikbuti@gmail.com, hpa@zytor.com, mhocko@suse.com, mingo@kernel.org
In-Reply-To: <20180420095005.GH4064@hirez.programming.kicks-ass.net>
References: <20180420095005.GH4064@hirez.programming.kicks-ass.net>
To: linux-tip-commits@vger.kernel.org
Subject: [tip:sched/urgent] stop_machine, sched: Fix migrate_swap() vs. active_balance() deadlock
X-Mailing-List: linux-kernel@vger.kernel.org

Commit-ID:  0b26351b910fb8fe6a056f8a1bbccabe50c0e19f
Gitweb:     https://git.kernel.org/tip/0b26351b910fb8fe6a056f8a1bbccabe50c0e19f
Author:     Peter Zijlstra
AuthorDate: Fri, 20 Apr 2018 11:50:05 +0200
Committer:  Ingo Molnar
CommitDate: Thu, 3 May 2018 07:38:03 +0200

stop_machine, sched: Fix migrate_swap() vs. active_balance() deadlock

Matt reported the following deadlock:

  CPU0                              CPU1

  schedule(.prev=migrate/0)
    pick_next_task()                  ...
      idle_balance()                    migrate_swap()
        active_balance()                  stop_two_cpus()
                                            spin_lock(stopper0->lock)
                                            spin_lock(stopper1->lock)
                                            ttwu(migrate/0)
                                              smp_cond_load_acquire() -- waits for schedule()
          stop_one_cpu(1)
            spin_lock(stopper1->lock) -- waits for stopper lock

Fix this deadlock by taking the wakeups out from under stopper->lock.
This allows the active_balance() to queue the stop work and finish the
context switch, which in turn allows the wakeup from migrate_swap() to
observe the context and complete the wakeup.

Signed-off-by: Peter Zijlstra (Intel)
Reported-by: Matt Fleming
Signed-off-by: Peter Zijlstra (Intel)
Acked-by: Matt Fleming
Cc: Linus Torvalds
Cc: Michal Hocko
Cc: Mike Galbraith
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Link: http://lkml.kernel.org/r/20180420095005.GH4064@hirez.programming.kicks-ass.net
Signed-off-by: Ingo Molnar
---
 kernel/stop_machine.c | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index b7591261652d..64c0291b579c 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -21,6 +21,7 @@
 #include
 #include
 #include
+#include
 
 /*
  * Structure to determine completion condition and record errors.  May
@@ -65,27 +66,31 @@ static void cpu_stop_signal_done(struct cpu_stop_done *done)
 }
 
 static void __cpu_stop_queue_work(struct cpu_stopper *stopper,
-					struct cpu_stop_work *work)
+					struct cpu_stop_work *work,
+					struct wake_q_head *wakeq)
 {
 	list_add_tail(&work->list, &stopper->works);
-	wake_up_process(stopper->thread);
+	wake_q_add(wakeq, stopper->thread);
 }
 
 /* queue @work to @stopper.  if offline, @work is completed immediately */
 static bool cpu_stop_queue_work(unsigned int cpu, struct cpu_stop_work *work)
 {
 	struct cpu_stopper *stopper = &per_cpu(cpu_stopper, cpu);
+	DEFINE_WAKE_Q(wakeq);
 	unsigned long flags;
 	bool enabled;
 
 	spin_lock_irqsave(&stopper->lock, flags);
 	enabled = stopper->enabled;
 	if (enabled)
-		__cpu_stop_queue_work(stopper, work);
+		__cpu_stop_queue_work(stopper, work, &wakeq);
 	else if (work->done)
 		cpu_stop_signal_done(work->done);
 	spin_unlock_irqrestore(&stopper->lock, flags);
 
+	wake_up_q(&wakeq);
+
 	return enabled;
 }
 
@@ -229,6 +234,7 @@ static int cpu_stop_queue_two_works(int cpu1, struct cpu_stop_work *work1,
 {
 	struct cpu_stopper *stopper1 = per_cpu_ptr(&cpu_stopper, cpu1);
 	struct cpu_stopper *stopper2 = per_cpu_ptr(&cpu_stopper, cpu2);
+	DEFINE_WAKE_Q(wakeq);
 	int err;
 retry:
 	spin_lock_irq(&stopper1->lock);
@@ -252,8 +258,8 @@ retry:
 		goto unlock;
 
 	err = 0;
-	__cpu_stop_queue_work(stopper1, work1);
-	__cpu_stop_queue_work(stopper2, work2);
+	__cpu_stop_queue_work(stopper1, work1, &wakeq);
+	__cpu_stop_queue_work(stopper2, work2, &wakeq);
 unlock:
 	spin_unlock(&stopper2->lock);
 	spin_unlock_irq(&stopper1->lock);
@@ -263,6 +269,9 @@ unlock:
 		cpu_relax();
 		goto retry;
 	}
+
+	wake_up_q(&wakeq);
+
 	return err;
 }
 /**
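
For readers who want to try the idea outside the kernel, here is a rough
userspace sketch of the pattern the patch applies: record the wakeup while
holding the lock, and deliver it only after the lock has been dropped. It
is an analogy under stated assumptions, not the kernel implementation.
Every demo_* name is hypothetical, a pthread mutex stands in for
stopper->lock, and a condition variable stands in for the stopper thread
wakeup.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_WAKE_MAX 8

/* Hypothetical stand-in for a per-CPU stopper. */
struct demo_stopper {
	pthread_mutex_t lock;     /* plays the role of stopper->lock */
	pthread_cond_t  wake;     /* what the worker thread would sleep on */
	int             queued;   /* work items queued, protected by lock */
	atomic_int      signals;  /* wakeups delivered, for observation only */
	bool            enabled;
};

/* Hypothetical stand-in for a wake queue: targets to wake later. */
struct demo_wake_q {
	struct demo_stopper *target[DEMO_WAKE_MAX];
	size_t n;
};

/* Record a wakeup. Runs under the lock, so it must not block. */
static void demo_wake_q_add(struct demo_wake_q *q, struct demo_stopper *s)
{
	assert(q->n < DEMO_WAKE_MAX);
	q->target[q->n++] = s;
}

/* Deliver every recorded wakeup. Call only after dropping the lock. */
static void demo_wake_up_q(struct demo_wake_q *q)
{
	for (size_t i = 0; i < q->n; i++) {
		atomic_fetch_add(&q->target[i]->signals, 1);
		pthread_cond_signal(&q->target[i]->wake);
	}
	q->n = 0;
}

/* Queue one work item; returns false if the stopper is disabled. */
static bool demo_queue_work(struct demo_stopper *s)
{
	struct demo_wake_q q = { .n = 0 };
	bool enabled;

	pthread_mutex_lock(&s->lock);
	enabled = s->enabled;
	if (enabled) {
		s->queued++;
		demo_wake_q_add(&q, s);  /* only note the wakeup here */
	}
	pthread_mutex_unlock(&s->lock);

	demo_wake_up_q(&q);              /* wake with no lock held */
	return enabled;
}
```

A worker thread in this sketch would wait on `wake` in a loop that rechecks
`queued` under `lock`. Because `queued` is updated before the lock is
released, signalling after the unlock does not lose wakeups.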