Date: Mon, 30 Jul 2018 10:12:43 -0700
From: Sodagudi Prasad
To: Thomas Gleixner
Cc: Peter Zijlstra, Sebastian Andrzej Siewior, isaacm@codeaurora.org,
    matt@codeblueprint.co.uk, mingo@kernel.org, linux-kernel@vger.kernel.org,
    gregkh@linuxfoundation.org, pkondeti@codeaurora.org, stable@vger.kernel.org
Subject: Re: [PATCH] stop_machine: Disable preemption after queueing stopper threads
In-Reply-To:
References: <1531856129-9871-1-git-send-email-isaacm@codeaurora.org>
    <20180724062350.nlem2suuy5wlxpts@linutronix.de>
    <20180730112140.GH2494@hirez.programming.kicks-ass.net>
Message-ID: <109d0e70606ccd34861a80525d6d11aa@codeaurora.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On 2018-07-30 05:41, Thomas Gleixner wrote:
> On Mon, 30 Jul 2018, Peter Zijlstra wrote:
>
>> On Mon, Jul 30, 2018 at 12:20:57PM +0200, Thomas Gleixner wrote:
>> > On Tue, 24 Jul 2018, Sebastian Andrzej Siewior wrote:
>> > > On 2018-07-23 18:13:48 [-0700], isaacm@codeaurora.org wrote:
>> > > > Hi all,
>> > > Hi,
>> > >
>> > > > Are there any comments about this patch?
>> > >
>> > > I haven't look in detail at this but your new preempt_disable() makes
>> > > things unbalanced for the err != 0 case.
>> >
>> > It doesn't but that code is really an unreadable pile of ...
>>
>> ---
>> Subject: stop_machine: Reflow cpu_stop_queue_two_works()
>>
>> The code flow in cpu_stop_queue_two_works() is a little arcane; fix
>> this by lifting the preempt_disable() to the top to create more natural
>> nesting wrt the spinlocks and make the wake_up_q() and preempt_enable()
>> unconditional at the end.
>>
>> Furthermore, enable preemption in the -EDEADLK case, such that we
>> spin-wait with preemption enabled.
>>
>> Suggested-by: Thomas Gleixner
>> Signed-off-by: Peter Zijlstra (Intel)
>

Hi Peter/Thomas,

How about including the change below as well? Currently there is no way
to tell whether the migration threads have completed their work or are
stuck. When we observed this issue, the only symptom was a workqueue
lockup. It would be better to wait with a timeout here and trigger a
BUG_ON() if it expires.

--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -290,6 +290,7 @@ int stop_two_cpus(unsigned int cpu1, unsigned int cpu2, cpu_stop_fn_t fn, void *
 	struct cpu_stop_done done;
 	struct cpu_stop_work work1, work2;
 	struct multi_stop_data msdata;
+	int ret;

 	msdata = (struct multi_stop_data){
 		.fn = fn,
@@ -312,7 +313,10 @@ int stop_two_cpus(unsigned int cpu1, unsigned int cpu2, cpu_stop_fn_t fn, void *
 	if (cpu_stop_queue_two_works(cpu1, &work1, cpu2, &work2))
 		return -ENOENT;

-	wait_for_completion(&done.completion);
+	ret = wait_for_completion_timeout(&done.completion, msecs_to_jiffies(1000));
+	if (!ret)
+		BUG_ON(1);
+

> Thanks for cleaning that up!
>
> Reviewed-by: Thomas Gleixner

--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
Linux Foundation Collaborative Project