Subject: Re: [PATCH v1] kthread/smpboot: Serialize kthread parking against wakeup
From: "Kohli, Gaurav"
To: Peter Zijlstra
Cc: tglx@linutronix.de, mpe@ellerman.id.au, mingo@kernel.org,
    bigeasy@linutronix.de, linux-kernel@vger.kernel.org,
    linux-arm-msm@vger.kernel.org, Neeraj Upadhyay, Will Deacon,
    Oleg Nesterov
References: <20180426085719.GW4129@hirez.programming.kicks-ass.net>
    <4d3f68f8-e599-6b27-a2e8-9e96b401d57a@codeaurora.org>
    <20180430111744.GE4082@hirez.programming.kicks-ass.net>
    <3af3365b-4e3f-e388-8e90-45a3bd4120fd@codeaurora.org>
    <20180501101845.GE12217@hirez.programming.kicks-ass.net>
    <20180501113132.GF12217@hirez.programming.kicks-ass.net>
    <745d762d-9ab3-0749-9b87-9bb03d913071@codeaurora.org>
    <20180501131904.GG12217@hirez.programming.kicks-ass.net>
    <9b289790-9b3a-73bd-7166-bf39f32cefd8@codeaurora.org>
    <20180502082011.GB12180@hirez.programming.kicks-ass.net>
    <830d7225-af90-a55a-991a-bb2023d538f1@codeaurora.org>
Date: Mon, 7 May 2018 16:39:28 +0530
In-Reply-To: <830d7225-af90-a55a-991a-bb2023d538f1@codeaurora.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On 5/2/2018 3:43 PM, Kohli, Gaurav wrote:
>
>
> On 5/2/2018 1:50 PM, Peter Zijlstra wrote:
>> On Wed, May 02, 2018 at 10:45:52AM +0530, Kohli, Gaurav wrote:
>>> On 5/1/2018 6:49 PM, Peter Zijlstra wrote:
>>
>>>>    - complete(&kthread->parked), which we can do inside schedule(); this
>>>>      solves the problem because then kthread_park() will not return early
>>>>      and the task really is blocked.
>>>
>>> I think complete will not help, as problem is like below :
>>>
>>> Control Thread                                CPUHP thread
>>>
>>>                           cpuhp_thread_fun
>>>                           Wake control thread
>>>                           complete(&st->done);
>>>
>>> takedown_cpu
>>> kthread_park
>>> set_bit(KTHREAD_SHOULD_PARK
>>>
>>>                          Here cpuhp is looping,
>>>                     //success case
>>>                          Generally when issue is not
>>>                          coming
>>>                          it schedule out by below :
>>>
>>> ht->thread_should_run(td->cpu
>>>                           scheduler
>>>                     //failure case
>>>                     before schedule
>>>                     loop check
>>>                     (kthread_should_park()
>>>                          enter here as PARKED set
>>>
>>> wake_up_process(k)
>>
>> If k has TASK_PARKED, then wake_up_process() which uses TASK_NORMAL will
>> no-op, because:
>>
>>     TASK_PARKED & TASK_NORMAL == 0
>>
>>>                     __kthread_parkme
>>>                      complete(&self->parked);
>>> SETS RUNNING
>>>                                  schedule
>>
>> But suppose, you do get that store, and we get to schedule with
>> TASK_RUNNING, then schedule will no-op and we'll go around the loop and
>> not complete.
>>
>> See also:
>> lkml.kernel.org/r/20180430111744.GE4082@hirez.programming.kicks-ass.net
>>
>> Either TASK_RUNNING gets set before we do schedule() and we go around
>> again, re-set TASK_PARKED, resched the condition and re-call schedule(),
>> or we schedule() first and ttwu() will not issue the TASK_RUNNING store.
>>
>> In either case, we'll eventually hit schedule() with TASK_PARKED. Then,
>> and only then will the complete() happen.
>>
>>> wait_for_completion(&kthread->parked);
>>
>> The point is, we'll only ever complete ^ that completion when we've
>> scheduled out the task in TASK_PARKED state. If the task didn't get
>> parked, no completion.
>
> Thanks for the detailed explanation, yes in all cases unpark will
> observe parked state only.
>>
>>
>> And that is the reason I like this approach above the others. It
>> guarantees the task really is parked when we ask for it. We don't have
>> to deal with the task still running and getting migrated to another CPU
>> nonsense.
>>
>

Hi Peter,

We have tested with the new patch and are still seeing the same issue.
These dumps do not have debug traces, but from code review there still
seems to be a race. Can you please check it once:

Controller Thread                 CPUHP Thread

takedown_cpu
kthread_park
kthread_parkme
Set KTHREAD_SHOULD_PARK
                                  smpboot_thread_fn
                                  set Task interruptible

wake_up_process
                                  Kthread_parkme
                                  SET TASK_PARKED
                                  schedule
                                  raw_spin_lock(&rq->lock)
                                  context_switch
                                  finish_lock_switch
                                  Case TASK_PARKED
                                  kthread_park_complete

                                  SET TASK_INTERRUPTIBLE

We are also seeing the same warning during unpark of the cpuhp thread
from the controller:

	if (!wait_task_inactive(p, state)) {
		WARN_ON(1);
		return;
	}

[ 325.065893] [] kthread_unpark+0x80/0xd8
[ 325.065902] [] bringup_cpu+0xa0/0x12c
[ 325.065910] [] cpuhp_invoke_callback+0xb4/0x5c8
[ 325.065917] [] cpuhp_up_callbacks+0x3c/0x154
[ 325.065924] [] _cpu_up+0x134/0x208
[ 325.065931] [] do_cpu_up+0x168/0x1a0
[ 325.065938] [] cpu_up+0x24/0x30
[ 325.065948] [] cpu_subsys_online+0x20/0x2c
[ 325.065956] [] device_online+0x70/0xb4
[ 325.065962] [] online_store+0xd0/0xdc
[ 325.065971] [] dev_attr_store+0x40/0x54
[ 325.065982] [] sysfs_kf_write+0x5c/0x74
[ 325.065988] [] kernfs_fop_write+0xcc/0x1ec
[ 325.065999] [] vfs_write+0xb4/0x1d0
[ 325.066006] [] SyS_write+0x60/0xc0
[ 325.066014] [] el0_svc_naked+0x24/0x28

After this, the same crash occurred:

[ 325.521307] [] smpboot_thread_fn+0x26c/0x2c8
[ 325.527295] [] kthread+0xf4/0x108

I will add more ftrace debugging to check exactly what is going on.

Regards
Gaurav

-- 
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center,
Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.
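
For reference, the state-mask argument quoted above (wake_up_process()
uses TASK_NORMAL, so it should not wake a task sitting in TASK_PARKED)
can be checked in user space. This is only a minimal sketch, under two
assumptions: the state bit values are the ones I believe
include/linux/sched.h carried around v4.17 (please check your own tree),
and wakeup_matches() is a hypothetical helper that mirrors only the mask
test at the start of try_to_wake_up(), not the real function or its
locking.

```c
#include <stdbool.h>

/*
 * ASSUMPTION: task state bits as in include/linux/sched.h around v4.17.
 * These values have changed between kernel versions; verify locally.
 */
#define TASK_RUNNING          0x0000
#define TASK_INTERRUPTIBLE    0x0001
#define TASK_UNINTERRUPTIBLE  0x0002
#define TASK_PARKED           0x0040
#define TASK_NORMAL           (TASK_INTERRUPTIBLE | TASK_UNINTERRUPTIBLE)

/*
 * Hypothetical helper (not a kernel function): returns true when a wakeup
 * issued with wake_mask would be allowed to proceed for a task currently
 * in task_state. A wakeup whose mask does not intersect the task state
 * is a no-op.
 */
static bool wakeup_matches(unsigned int task_state, unsigned int wake_mask)
{
	return (task_state & wake_mask) != 0;
}
```

Note that the quoted expression is shorthand: written literally in C,
TASK_PARKED & TASK_NORMAL == 0 would parse as
TASK_PARKED & (TASK_NORMAL == 0), which is why the helper parenthesizes
the mask test. With the values above, a TASK_NORMAL wakeup matches a task
in TASK_INTERRUPTIBLE but not one in TASK_PARKED; only a wakeup using the
TASK_PARKED mask matches a parked task.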