From: Valentin Schneider
To: Qais Yousef
Cc: linux-kernel@vger.kernel.org, tglx@linutronix.de, mingo@kernel.org,
    bigeasy@linutronix.de, swood@redhat.com, peterz@infradead.org,
    juri.lelli@redhat.com, vincent.guittot@linaro.org,
    dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com,
    mgorman@suse.de, bristot@redhat.com, vincent.donnefort@arm.com,
    tj@kernel.org
Subject: Re: [RFC PATCH] sched/core: Fix premature p->migration_pending completion
In-Reply-To: <20210203172344.uzq2iod4g46ffame@e107158-lin>
References: <20210127193035.13789-1-valentin.schneider@arm.com>
 <20210203172344.uzq2iod4g46ffame@e107158-lin>
Date: Wed, 03 Feb 2021 18:59:08 +0000
On 03/02/21 17:23, Qais Yousef wrote:
> On 01/27/21 19:30, Valentin Schneider wrote:
>> Fiddling some more with a TLA+ model of set_cpus_allowed_ptr() & friends
>> unearthed one more outstanding issue. This doesn't even involve
>> migrate_disable(), but rather affinity changes and execution of the stopper
>> racing with each other.
>>
>> My own interpretation of the (lengthy) TLA+ splat (note the potential for
>> errors at each level) is:
>>
>> Initial conditions:
>>
>>   victim.cpus_mask = {CPU0, CPU1}
>>
>>   CPU0                               CPU1                       CPU<N>
>>
>>   switch_to(victim)
>>                                      set_cpus_allowed(victim, {CPU1})
>>                                        kick CPU0 migration_cpu_stop({.dest_cpu = CPU1})
>>
>>   switch_to(stopper/0)
>>                                                                 // e.g. CFS load balance
>>                                                                 move_queued_task(CPU0, victim, CPU1);
>>                                      switch_to(victim)
>>                                      set_cpus_allowed(victim, {CPU0});
>>                                        task_rq_unlock();
>>   migration_cpu_stop(dest_cpu=CPU1)
>
> This migration stop is due to set_cpus_allowed(victim, {CPU1}), right?
>

Right

>>     task_rq(p) != rq && pending
>>       kick CPU1 migration_cpu_stop({.dest_cpu = CPU1})
>>
>>                                      switch_to(stopper/1)
>>                                      migration_cpu_stop(dest_cpu=CPU1)
>
> And this migration stop is due to set_cpus_allowed(victim, {CPU0}), right?
>

Nein! This is a retriggering of the "current" stopper, i.e. the one kicked
by set_cpus_allowed(victim, {CPU1}). See the tail of the

  else if (dest_cpu < 0 || pending)

branch in migration_cpu_stop(); that is what I'm trying to hint at with the

  task_rq(p) != rq && pending

line in the diagram.

> If I didn't miss something, then dest_cpu should be CPU0 too, not CPU1, and
> the task should be moved back to CPU0 as expected?
>
> Thanks
>
> --
> Qais Yousef
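
For reference, the retriggering being discussed looks roughly like the
following. This is a paraphrased pseudocode sketch of the relevant branch of
migration_cpu_stop() as of this thread (v5.11-era), not the verbatim kernel
source; names other than migration_cpu_stop() / stop_one_cpu_nowait() are
illustrative:

```
/* Pseudocode sketch: inside migration_cpu_stop(), with task_rq locked */
if (task_rq(p) == rq) {
        /* Task is still on this rq: migrate it to arg->dest_cpu and/or
         * complete the pending affinity change. */
} else if (dest_cpu < 0 || pending) {
        /*
         * The task was moved off this rq (e.g. by CFS load balance)
         * before this stopper got to run. If an affinity change is
         * still pending, re-queue this same stop work on the task's
         * new CPU, carrying the original dest_cpu along; this is the
         * "task_rq(p) != rq && pending" step in the diagram above.
         */
        stop_one_cpu_nowait(task_cpu(p), migration_cpu_stop, arg, ...);
}
```

In the traced scenario the re-queued stopper still carries dest_cpu=CPU1 from
the first affinity change, which is why the second stop on CPU1 is not the one
belonging to set_cpus_allowed(victim, {CPU0}).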