Date: Thu, 4 Feb 2021 15:30:40 +0000
From: Qais Yousef
To: Valentin Schneider
Cc: linux-kernel@vger.kernel.org, tglx@linutronix.de, mingo@kernel.org,
	bigeasy@linutronix.de, swood@redhat.com, peterz@infradead.org,
	juri.lelli@redhat.com, vincent.guittot@linaro.org,
	dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com,
	mgorman@suse.de, bristot@redhat.com, vincent.donnefort@arm.com,
	tj@kernel.org
Subject: Re: [RFC PATCH] sched/core: Fix premature p->migration_pending completion
Message-ID: <20210204153040.qqkoa5sjztqeskoc@e107158-lin>
References: <20210127193035.13789-1-valentin.schneider@arm.com>
	<20210203172344.uzq2iod4g46ffame@e107158-lin>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
X-Mailing-List: linux-kernel@vger.kernel.org

On 02/03/21 18:59, Valentin Schneider wrote:
> On 03/02/21 17:23, Qais Yousef wrote:
> > On 01/27/21 19:30, Valentin Schneider wrote:
> >> Fiddling some more with a TLA+ model of set_cpus_allowed_ptr() & friends
> >> unearthed one more outstanding issue. This doesn't even involve
> >> migrate_disable(), but rather affinity changes and execution of the stopper
> >> racing with each other.
> >>
> >> My own interpretation of the (lengthy) TLA+ splat (note the potential for
> >> errors at each level) is:
> >>
> >>   Initial conditions:
> >>     victim.cpus_mask = {CPU0, CPU1}
> >>
> >>   CPU0                               CPU1                               CPU
> >>
> >>   switch_to(victim)
> >>                                      set_cpus_allowed(victim, {CPU1})
> >>                                        kick CPU0 migration_cpu_stop({.dest_cpu = CPU1})
> >>   switch_to(stopper/0)
> >>                                                                         // e.g. CFS load balance
> >>                                                                         move_queued_task(CPU0, victim, CPU1);
> >>                                      switch_to(victim)
> >>                                      set_cpus_allowed(victim, {CPU0});
> >>                                        task_rq_unlock();
> >>   migration_cpu_stop(dest_cpu=CPU1)
> >
> > This migration stop is due to set_cpus_allowed(victim, {CPU1}), right?
> >
>
> Right
>
> >>     task_rq(p) != rq && pending
> >>       kick CPU1 migration_cpu_stop({.dest_cpu = CPU1})
> >>
> >>                                      switch_to(stopper/1)
> >>                                      migration_cpu_stop(dest_cpu=CPU1)
> >
> > And this migration stop is due to set_cpus_allowed(victim, {CPU0}), right?
> >
>
> Nein! This is a retriggering of the "current" stopper (triggered by
> set_cpus_allowed(victim, {CPU1})), see the tail of that
>
>	else if (dest_cpu < 0 || pending)
>
> branch in migration_cpu_stop(), is what I'm trying to hint at with that
>
>   task_rq(p) != rq && pending

Okay I see. But AFAIU, the work will be queued in order. So we should first
handle the set_cpus_allowed_ptr(victim, {CPU0}) before the retrigger, no?

So I see migration_cpu_stop() running 3 times

	1. because of set_cpus_allowed(victim, {CPU1}) on CPU0
	2. because of set_cpus_allowed(victim, {CPU0}) on CPU1
	3. because of retrigger of '1' on CPU0

Thanks

--
Qais Yousef
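[Editorial sketch] The queueing-order assumption in the reply above ("the work will be queued in order") can be illustrated with a toy model. This is not kernel code: it only mirrors the idea that a CPU's stopper thread runs queued works in strict FIFO order, so the work queued on CPU1 by set_cpus_allowed_ptr(victim, {CPU0}) would run before the later retriggered work. The `cpu1_stopper` name and the work labels are hypothetical.

```python
# Toy FIFO model of one CPU's stopper queue (illustrative only, not kernel
# code): works are appended at the tail and the stopper thread pops from
# the head, so they execute in queueing order.
from collections import deque

cpu1_stopper = deque()  # stop works pending on CPU1's (hypothetical) stopper

# Queued first, by set_cpus_allowed_ptr(victim, {CPU0}) targeting CPU1:
cpu1_stopper.append("migration_cpu_stop(dest_cpu=CPU0)")
# Queued later, when CPU0's stopper retriggers the earlier {CPU1} change:
cpu1_stopper.append("migration_cpu_stop(dest_cpu=CPU1) [retrigger]")

# FIFO execution: the {CPU0} work is handled before the retrigger.
run_order = [cpu1_stopper.popleft() for _ in range(len(cpu1_stopper))]
```

Under this FIFO assumption, `run_order` lists the `dest_cpu=CPU0` work first, which is the ordering Qais's three-step breakdown relies on.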