From: Valentin Schneider
To: Qais Yousef
Cc: linux-kernel@vger.kernel.org, tglx@linutronix.de, mingo@kernel.org,
    bigeasy@linutronix.de, swood@redhat.com, peterz@infradead.org,
    juri.lelli@redhat.com, vincent.guittot@linaro.org,
    dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com,
    mgorman@suse.de, bristot@redhat.com, vincent.donnefort@arm.com,
    tj@kernel.org
Subject: Re: [RFC PATCH] sched/core: Fix premature p->migration_pending completion
In-Reply-To: <20210204153040.qqkoa5sjztqeskoc@e107158-lin>
References: <20210127193035.13789-1-valentin.schneider@arm.com>
 <20210203172344.uzq2iod4g46ffame@e107158-lin>
 <20210204153040.qqkoa5sjztqeskoc@e107158-lin>
Date: Fri, 05 Feb 2021 11:02:27 +0000

On 04/02/21 15:30, Qais Yousef wrote:
> On 02/03/21 18:59, Valentin Schneider wrote:
>> On 03/02/21 17:23, Qais Yousef wrote:
>> > On 01/27/21 19:30, Valentin Schneider wrote:
>> >> Initial conditions:
>> >>   victim.cpus_mask = {CPU0, CPU1}
>> >>
>> >>   CPU0                CPU1                CPU<N>
>> >>
>> >>   switch_to(victim)
>> >>                                           set_cpus_allowed(victim, {CPU1})
>> >>                                             kick CPU0 migration_cpu_stop({.dest_cpu = CPU1})
>> >>   switch_to(stopper/0)
>> >>                       // e.g. CFS load balance
>> >>                       move_queued_task(CPU0, victim, CPU1);
>> >>                       switch_to(victim)
>> >>                                           set_cpus_allowed(victim, {CPU0});
>> >>                                             task_rq_unlock();
>> >>   migration_cpu_stop(dest_cpu=CPU1)
>> >
>> > This migration stop is due to set_cpus_allowed(victim, {CPU1}), right?
>> >
>>
>> Right
>>
>> >>     task_rq(p) != rq && pending
>> >>       kick CPU1 migration_cpu_stop({.dest_cpu = CPU1})
>> >>
>> >>                       switch_to(stopper/1)
>> >>                       migration_cpu_stop(dest_cpu=CPU1)
>> >
>> > And this migration stop is due to set_cpus_allowed(victim, {CPU0}), right?
>> >
>>
>> Nein! This is a retriggering of the "current" stopper (triggered by
>> set_cpus_allowed(victim, {CPU1})), see the tail of that
>>
>>   else if (dest_cpu < 0 || pending)
>>
>> branch in migration_cpu_stop(), which is what I'm trying to hint at with
>> that
>>
>>   task_rq(p) != rq && pending
>
> Okay I see. But AFAIU, the work will be queued in order. So we should first
> handle the set_cpus_allowed_ptr(victim, {CPU0}) before the retrigger, no?
>
> So I see migration_cpu_stop() running 3 times
>
> 	1. because of set_cpus_allowed(victim, {CPU1}) on CPU0
> 	2. because of set_cpus_allowed(victim, {CPU0}) on CPU1
> 	3. because of retrigger of '1' on CPU0
>

On that 'CPU<N>' lane, I intentionally included task_rq_unlock() but not
'kick CPU1 migration_cpu_stop({.dest_cpu = CPU0})'. IOW, there is nothing
in that trace that queues a stopper work for 2. - it *will* happen at some
point, but the harm will already have been done.

The migrate_task_to() example is potentially worse, because it doesn't
rely on which stopper work gets enqueued first - only that an extra
affinity change happens before the first stopper work grabs the pi_lock
and completes.
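
For reference, the tail of that branch looked roughly like this at the
time (a simplified sketch from memory of ~v5.11-rc kernel/sched/core.c,
with the migrate_enable() early-outs elided - not a verbatim quote):

	} else if (dest_cpu < 0 || pending) {
		/*
		 * The task moved before the stopper got to run. We hold
		 * p->pi_lock, so its allowed mask is stable: if it already
		 * sits somewhere allowed, the pending can complete here.
		 */
		if (pending && cpumask_test_cpu(task_cpu(p), p->cpus_ptr)) {
			p->migration_pending = NULL;
			complete = true;
			goto out;
		}

		/*
		 * Otherwise, chase the task: requeue this same stopper
		 * work on whatever CPU the task is on now.
		 */
		task_rq_unlock(rq, p, &rf);
		stop_one_cpu_nowait(task_cpu(p), migration_cpu_stop,
				    &pending->arg, &pending->stop_work);
		return 0;
	}

That stop_one_cpu_nowait() is the 'kick CPU1' step in the trace: it
requeues the *first* SCA's stopper work on the task's new rq, and in the
trace above it does so before set_cpus_allowed(victim, {CPU0}) has queued
any work of its own - which is why 3. can run ahead of 2.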