Subject: Re: [PATCH v1] kthread/smpboot: Serialize kthread parking against wakeup
From: "Kohli, Gaurav"
To: Peter Zijlstra
Cc: tglx@linutronix.de, mpe@ellerman.id.au, mingo@kernel.org,
 bigeasy@linutronix.de, linux-kernel@vger.kernel.org,
 linux-arm-msm@vger.kernel.org, Neeraj Upadhyay, Will Deacon, Oleg Nesterov
References: <20180426085719.GW4129@hirez.programming.kicks-ass.net>
 <4d3f68f8-e599-6b27-a2e8-9e96b401d57a@codeaurora.org>
 <20180430111744.GE4082@hirez.programming.kicks-ass.net>
 <3af3365b-4e3f-e388-8e90-45a3bd4120fd@codeaurora.org>
 <20180501101845.GE12217@hirez.programming.kicks-ass.net>
 <20180501113132.GF12217@hirez.programming.kicks-ass.net>
 <745d762d-9ab3-0749-9b87-9bb03d913071@codeaurora.org>
 <20180501131904.GG12217@hirez.programming.kicks-ass.net>
 <9b289790-9b3a-73bd-7166-bf39f32cefd8@codeaurora.org>
 <20180502082011.GB12180@hirez.programming.kicks-ass.net>
 <830d7225-af90-a55a-991a-bb2023d538f1@codeaurora.org>
 <55221a5b-dd52-3359-f582-86830dd9f205@codeaurora.org>
Date: Tue, 5 Jun 2018 16:43:45 +0530
In-Reply-To:
 <55221a5b-dd52-3359-f582-86830dd9f205@codeaurora.org>

Hi Peter,

As mentioned in my last mail, we are still seeing the issue with the
latest approach. Below is the race we suspect, as mentioned earlier:

Controller Thread                               CPUHP Thread
takedown_cpu
kthread_park
kthread_parkme
Set KTHREAD_SHOULD_PARK
                                                smpboot_thread_fn
                                                set Task interruptible

wake_up_process
if (!(p->state & state))
        goto out;
                                                Kthread_parkme
                                                SET TASK_PARKED
                                                schedule
                                                raw_spin_lock(&rq->lock)
ttwu_remote
waiting for __task_rq_lock
                                                context_switch
                                                finish_lock_switch

                                                Case TASK_PARKED
                                                kthread_park_complete

SET Running

So it seems the issue is still there with the latest fix,
"kthread, sched/wait: Fix kthread_parkme() completion issue".

Regards
Gaurav

On 5/7/2018 4:53 PM, Kohli, Gaurav wrote:
> Corrected the formatting, Sorry for spam.
>
>>
>> HI Peter,
>>
>> We have tested with new patch and still seeing same issue, in this
>> dumps we don't have debug traces, but seems there still exist race
>> from code review , Can you please check it once:
>>
>> Controller Thread                               CPUHP Thread
>> takedown_cpu
>> kthread_park
>> kthread_parkme
>> Set KTHREAD_SHOULD_PARK
>>                                                 smpboot_thread_fn
>>                                                 set Task interruptible
>>
>> wake_up_process
>>
>>                                                 Kthread_parkme
>>                                                 SET TASK_PARKED
>>                                                 schedule
>>                                                 raw_spin_lock(&rq->lock)
>>
>>                                                 context_switch
>>
>>                                                 finish_lock_switch
>>
>>                                                 Case TASK_PARKED
>>                                                 kthread_park_complete
>>
>> SET TASK_INTERRUPTIBLE
>>
>> And also seeing the same warning during unpark of cpuhp from controller:
>>         if (!wait_task_inactive(p, state)) {
>>                 WARN_ON(1);
>>                 return;
>>         }
>> [  325.065893] [] kthread_unpark+0x80/0xd8
>> [  325.065902] [] bringup_cpu+0xa0/0x12c
>> [  325.065910] [] cpuhp_invoke_callback+0xb4/0x5c8
>> [  325.065917] [] cpuhp_up_callbacks+0x3c/0x154
>> [  325.065924] [] _cpu_up+0x134/0x208
>> [  325.065931] [] do_cpu_up+0x168/0x1a0
>> [  325.065938] [] cpu_up+0x24/0x30
>> [  325.065948] [] cpu_subsys_online+0x20/0x2c
>> [  325.065956] [] device_online+0x70/0xb4
>> [  325.065962] [] online_store+0xd0/0xdc
>> [  325.065971] [] dev_attr_store+0x40/0x54
>> [  325.065982] [] sysfs_kf_write+0x5c/0x74
>> [  325.065988] [] kernfs_fop_write+0xcc/0x1ec
>> [  325.065999] [] vfs_write+0xb4/0x1d0
>> [  325.066006] [] SyS_write+0x60/0xc0
>> [  325.066014] [] el0_svc_naked+0x24/0x28
>>
>> And after this same crash occured:
>> [  325.521307] [] smpboot_thread_fn+0x26c/0x2c8
>> [  325.527295] [] kthread+0xf4/0x108
>>
>> I will put more debug ftraces to check what is going on exactly.
>>
>> Regards
>> Gaurav
>>
>

-- 
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.
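
For readers following the thread, the interleaving described at the top of this mail can be replayed in a small user-space model. This is only a sketch under assumptions: it is not kernel code, the `Task` class and `replay_race()` function are hypothetical names invented for illustration, and the step ordering is one reading of the two-column listing. It shows the suspected outcome, where park completion is signalled while the task is in TASK_PARKED and the delayed wakeup then puts it back to running.

```python
# User-space model of the suspected interleaving; NOT kernel code.
# State names mirror the kernel's, but Task and replay_race() are
# hypothetical stand-ins used only to illustrate the ordering.

TASK_RUNNING = "TASK_RUNNING"
TASK_INTERRUPTIBLE = "TASK_INTERRUPTIBLE"
TASK_PARKED = "TASK_PARKED"


class Task:
    def __init__(self):
        self.state = TASK_RUNNING
        self.should_park = False      # models KTHREAD_SHOULD_PARK
        self.park_completed = False   # models kthread_park_complete()


def replay_race():
    """Replay the reported ordering step by step and return the task."""
    t = Task()

    # Controller: kthread_park() sets KTHREAD_SHOULD_PARK.
    t.should_park = True

    # CPUHP thread: smpboot_thread_fn() marks itself interruptible.
    t.state = TASK_INTERRUPTIBLE

    # Controller: wake_up_process() samples the state. The check
    # "if (!(p->state & state)) goto out;" passes because the task is
    # still interruptible, so the wakeup is committed at this point.
    wakeup_committed = (t.state == TASK_INTERRUPTIBLE)

    # CPUHP thread: kthread_parkme() sets TASK_PARKED and calls schedule().
    if t.should_park:
        t.state = TASK_PARKED

    # CPUHP CPU: after the context switch the TASK_PARKED case runs and
    # park completion is signalled to the controller.
    if t.state == TASK_PARKED:
        t.park_completed = True

    # Controller: the wakeup that was waiting for the rq lock finishes
    # and sets the task running, based on the stale state it sampled.
    if wakeup_committed:
        t.state = TASK_RUNNING

    return t


if __name__ == "__main__":
    task = replay_race()
    print("park_completed:", task.park_completed)
    print("final state:", task.state)
```

In this model the controller sees the park as complete although the task ends up runnable, which would be consistent with the wait_task_inactive() warning during unpark quoted above. The model is sequential by construction, so it demonstrates the ordering only, not the locking that decides whether that ordering can really occur.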