Subject: Re: [PATCH v1] kthread/smpboot: Serialize kthread parking against wakeup
To: Peter Zijlstra
Cc: tglx@linutronix.de, mpe@ellerman.id.au, mingo@kernel.org,
 bigeasy@linutronix.de, linux-kernel@vger.kernel.org,
 linux-arm-msm@vger.kernel.org, Neeraj Upadhyay, Will Deacon, Oleg Nesterov
References: <1524645199-5596-1-git-send-email-gkohli@codeaurora.org>
 <20180425200917.GZ4082@hirez.programming.kicks-ass.net>
 <20180426084131.GV4129@hirez.programming.kicks-ass.net>
 <20180426085719.GW4129@hirez.programming.kicks-ass.net>
 <4d3f68f8-e599-6b27-a2e8-9e96b401d57a@codeaurora.org>
 <20180430111744.GE4082@hirez.programming.kicks-ass.net>
 <3af3365b-4e3f-e388-8e90-45a3bd4120fd@codeaurora.org>
 <20180501101845.GE12217@hirez.programming.kicks-ass.net>
 <20180501113132.GF12217@hirez.programming.kicks-ass.net>
From: "Kohli, Gaurav"
Message-ID: <745d762d-9ab3-0749-9b87-9bb03d913071@codeaurora.org>
Date: Tue, 1 May 2018 17:16:16 +0530
In-Reply-To: <20180501113132.GF12217@hirez.programming.kicks-ass.net>

On 5/1/2018 5:01 PM, Peter Zijlstra wrote:
> On Tue, May 01, 2018 at 04:10:53PM +0530, Kohli, Gaurav wrote:
>> Yes with loop, it will reset TASK_PARKED but that is not happening in the
>> dumps we have seen.
>
> But was that with or without the fixed wait-loop? I don't care about
> stuff you might have seen with the current code, that is clearly broken.
>
>>> takedown_cpu() can proceed beyond smpboot_park_threads() and kill the
>>> CPU before any of the threads are parked -- per having the complete()
>>> before hitting schedule().
>>>
>>> And, afaict, that is harmless. When we go offline, sched_cpu_dying() ->
>>> migrate_tasks() will migrate any still runnable threads off the cpu.
>>> But because at this point the thread must be in the PARKED wait-loop, it
>>> will hit schedule() and go to sleep eventually.
>>>
>>> Also note that kthread_unpark() does __kthread_bind() to rebind the
>>> threads.
>>>
>>> Aaaah... I think I've spotted a problem there. We clear SHOULD_PARK
>>> before we rebind, so if the thread lost the first PARKED store,
>>> does the completion, gets migrated, cycles through the loop and now
>>> observes !SHOULD_PARK and bails the wait-loop, then __kthread_bind()
>>> will forever wait.
>>>
>>
>> So during next unpark
>> __kthread_unpark -> __kthread_bind -> wait_task_inactive (this got failed,
>> as current state is running so failed on below call:
>
> Aah, yes, I seem to have mis-remembered how wait_task_inactive() works.
> And it is indeed still a problem..
>
> Let me ponder what the best solution is, it's a bit of a mess.
>

Sure, Thanks a lot.

-- 
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc. is a
member of the Code Aurora Forum, a Linux Foundation Collaborative Project.