From: Mathieu Desnoyers
To: "levi . yun"
Cc: linux-kernel@vger.kernel.org, Mathieu Desnoyers, stable@vger.kernel.org,
 Catalin Marinas, Mark Rutland, Will Deacon, Peter Zijlstra, Aaron Lu
Subject: [RFC PATCH] sched: Add missing memory barrier in switch_mm_cid
Date: Wed, 6 Mar 2024 10:24:43 -0500
Message-Id: <20240306152443.6340-1-mathieu.desnoyers@efficios.com>
X-Mailer: git-send-email 2.39.2

Many architectures' switch_mm() (e.g. arm64) do not have an smp_mb()
which the core scheduler code has depended upon since commit:

  commit 223baf9d17f25 ("sched: Fix performance regression introduced
  by mm_cid")

If switch_mm() doesn't call smp_mb(), sched_mm_cid_remote_clear() can
unset the actively used cid when it fails to observe the active task
after it sets lazy_put.

There *is* a memory barrier between the store to rq->curr and _return
to userspace_ (as required by membarrier), but the rseq mm_cid has
stricter requirements: the barrier needs to be issued between the store
to rq->curr and switch_mm_cid(), which happens earlier than:

- spin_unlock(),
- switch_to().

So it's fine when the architecture switch_mm() happens to have that
barrier already, but less so when the architecture only provides the
full barrier in switch_to() or spin_unlock(). It is a bug in the rseq
switch_mm_cid() implementation. All architectures that don't have
memory barriers in switch_mm(), but rather have the full barrier either
in finish_lock_switch() or switch_to(), have them too late for the
needs of switch_mm_cid().

Introduce a new smp_mb__after_switch_mm(), defined as smp_mb() in the
generic barrier.h header, and use it in switch_mm_cid() for scheduler
transitions where switch_mm() is expected to provide a memory barrier.
Architectures can override smp_mb__after_switch_mm() if their
switch_mm() implementation provides an implicit memory barrier.
Override it with a no-op on x86, which implicitly provides this memory
barrier by writing to CR3.

Reported-by: levi.yun
Signed-off-by: Mathieu Desnoyers
Fixes: 223baf9d17f2 ("sched: Fix performance regression introduced by mm_cid")
Cc: <stable@vger.kernel.org> # 6.4.x
Cc: Mathieu Desnoyers
Cc: Catalin Marinas
Cc: Mark Rutland
Cc: Will Deacon
Cc: Peter Zijlstra
Cc: Aaron Lu
---
 arch/x86/include/asm/barrier.h |  3 +++
 include/asm-generic/barrier.h  |  8 ++++++++
 kernel/sched/sched.h           | 19 +++++++++++++------
 3 files changed, 24 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
index 35389b2af88e..0d5e54201eb2 100644
--- a/arch/x86/include/asm/barrier.h
+++ b/arch/x86/include/asm/barrier.h
@@ -79,6 +79,9 @@ do { \
 #define __smp_mb__before_atomic()	do { } while (0)
 #define __smp_mb__after_atomic()	do { } while (0)
 
+/* Writing to CR3 provides a full memory barrier in switch_mm(). */
+#define smp_mb__after_switch_mm()	do { } while (0)
+
 #include <asm-generic/barrier.h>
 
 /*
diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h
index 961f4d88f9ef..5a6c94d7a598 100644
--- a/include/asm-generic/barrier.h
+++ b/include/asm-generic/barrier.h
@@ -296,5 +296,13 @@ do { \
 #define io_stop_wc() do { } while (0)
 #endif
 
+/*
+ * Architectures that guarantee an implicit smp_mb() in switch_mm()
+ * can override smp_mb__after_switch_mm.
+ */
+#ifndef smp_mb__after_switch_mm
+#define smp_mb__after_switch_mm()	smp_mb()
+#endif
+
 #endif /* !__ASSEMBLY__ */
 #endif /* __ASM_GENERIC_BARRIER_H */
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 2e5a95486a42..638ebd355912 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -79,6 +79,8 @@
 # include <asm/paravirt_api_clock.h>
 #endif
 
+#include <asm/barrier.h>
+
 #include "cpupri.h"
 #include "cpudeadline.h"
@@ -3481,13 +3483,18 @@ static inline void switch_mm_cid(struct rq *rq,
		 * between rq->curr store and load of {prev,next}->mm->pcpu_cid[cpu].
		 * Provide it here.
		 */
-		if (!prev->mm)				// from kernel
+		if (!prev->mm) {			// from kernel
			smp_mb();
-		/*
-		 * user -> user transition guarantees a memory barrier through
-		 * switch_mm() when current->mm changes. If current->mm is
-		 * unchanged, no barrier is needed.
-		 */
+		} else {				// from user
+			/*
+			 * user -> user transition relies on an implicit
+			 * memory barrier in switch_mm() when current->mm
+			 * changes. If the architecture switch_mm() does not
+			 * have an implicit memory barrier, it is emitted here.
+			 * If current->mm is unchanged, no barrier is needed.
+			 */
+			smp_mb__after_switch_mm();
+		}
	}
	if (prev->mm_cid_active) {
		mm_cid_snapshot_time(rq, prev->mm);
-- 
2.39.2