From: Mark Brown
Date: Wed, 29 May 2024 20:46:23 +0100
Subject: [PATCH v6] arm64/fpsimd: Suppress SVE access traps when loading
 FPSIMD state
X-Mailing-List: linux-kernel@vger.kernel.org
Message-Id: <20240529-arm64-sve-trap-mitigation-v6-1-c2037be6aced@kernel.org>
To: Catalin Marinas, Will Deacon
Cc: Benjamin Herrenschmidt, Dave Martin,
 linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org,
 Mark Brown

When we are in a syscall we take the opportunity to discard the SVE state,
saving only the FPSIMD subset of the register state. When we reload the
state from memory we reenable SVE access traps, stopping tracking SVE until
the task takes another SVE access trap. This means that for a task which is
actively using SVE many blocking system calls will have the additional
overhead of a SVE access trap.

As SVE deployment is progressing we are seeing much wider use of the SVE
instruction set, including performance optimised implementations of
operations like memset() and memcpy(), which mean that even tasks which are
not obviously floating point based can end up with substantial SVE usage.

It does not, however, make sense to just unconditionally use the full SVE
register state all the time. It is larger than the FPSIMD register state,
so there is overhead saving and restoring it on context switch, and our
requirement to flush the register state not shared with FPSIMD on syscall
also creates a noticeable overhead on system calls.

I did some instrumentation which counted the number of SVE access traps
and the number of times we loaded FPSIMD only register state for each task.
Testing with Debian Bookworm, this showed that during boot the overwhelming
majority of tasks triggered another SVE access trap more than 50% of the
time after loading FPSIMD only state, with a substantial number near 100%,
though some programs had a very small number of SVE accesses, most likely
from startup. There were few tasks in the range 5-45%; most tasks either
used SVE frequently or used it only a tiny proportion of times. As expected,
older distributions which do not have the SVE performance work available
showed no SVE usage in general applications.

This indicates that there should be some useful benefit from reducing the
number of SVE access traps for blocking system calls like we did for non
blocking system calls in commit 8c845e273104 ("arm64/sve: Leave SVE enabled
on syscall if we don't context switch").

Let's do this with a timeout: when we take a SVE access trap we record a
jiffies value after which we'll reenable SVE traps, then check this
whenever we load a FPSIMD only floating point state from memory. If the
time has passed then we reenable traps, otherwise we leave traps disabled
and flush the non-shared register state like we would on trap.

The timeout is currently set to a second; I pulled this number out of thin
air so there is doubtless some room for tuning.

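As a rough illustration of the timeout policy described above, here is a
minimal userspace sketch in C. It is not the kernel code: struct model_task,
MODEL_HZ and the model_*() functions are hypothetical stand-ins for TIF_SVE,
thread.sve_timeout, HZ and the trap and load paths. The comparison uses the
same wraparound-safe signed-difference idea as the kernel's time_after().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical tick rate standing in for the kernel's HZ. */
#define MODEL_HZ 250UL

/* Hypothetical per-task state: models TIF_SVE and thread.sve_timeout. */
struct model_task {
	bool sve_enabled;          /* models TIF_SVE */
	unsigned long sve_timeout; /* tick after which traps are reenabled */
};

/* Wraparound-safe "a is after b", same idea as the kernel's time_after(). */
static bool model_time_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* Models an SVE access trap: enable SVE and arm a one second timeout. */
static void model_sve_trap(struct model_task *t, unsigned long now)
{
	assert(!t->sve_enabled); /* an access should not trap while enabled */
	t->sve_enabled = true;
	t->sve_timeout = now + MODEL_HZ;
}

/*
 * Models loading FPSIMD only state. Returns true if SVE stays enabled
 * (the real code would then flush the non-shared register state) and
 * false if traps are reenabled or SVE was not enabled to begin with.
 */
static bool model_load_fpsimd_only(struct model_task *t, unsigned long now)
{
	if (!t->sve_enabled)
		return false;
	if (model_time_after(now, t->sve_timeout)) {
		t->sve_enabled = false;
		return false;
	}
	return true;
}
```

In this model a task that keeps trapping rearms the timeout each time, while
a task that stops using SVE pays for at most roughly one second of extra
state tracking before traps are reenabled on its next FPSIMD only load.
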
This means that for a task which is actively using SVE the number of SVE
access traps will be substantially reduced but applications which use SVE
only very infrequently will avoid the overheads associated with tracking
SVE state after a second. The extra cost from additional tracking of SVE
state only occurs when a task is preempted so short running tasks should be
minimally affected.

There should be no functional change resulting from this, it is purely a
performance optimisation.

Signed-off-by: Mark Brown
---
Changes in v6:
- Rebase onto v6.10-rc1.
- Link to v5: https://lore.kernel.org/r/20240405-arm64-sve-trap-mitigation-v5-1-126fe2515ef1@kernel.org

Changes in v5:
- Rebase onto v6.9-rc1.
- Use a timeout rather than number of state loads to decide when to
  reenable traps.
- Link to v4: https://lore.kernel.org/r/20240122-arm64-sve-trap-mitigation-v4-1-54e0d78a3ae9@kernel.org

Changes in v4:
- Rebase onto v6.8-rc1.
- Link to v3: https://lore.kernel.org/r/20231113-arm64-sve-trap-mitigation-v3-1-4779c9382483@kernel.org

Changes in v3:
- Rebase onto v6.7-rc1.
- Link to v2: https://lore.kernel.org/r/20230913-arm64-sve-trap-mitigation-v2-1-1bdeff382171@kernel.org

Changes in v2:
- Rebase onto v6.6-rc1.
- Link to v1: https://lore.kernel.org/r/20230807-arm64-sve-trap-mitigation-v1-1-d92eed1d2855@kernel.org
---
 arch/arm64/include/asm/processor.h |  1 +
 arch/arm64/kernel/fpsimd.c         | 42 ++++++++++++++++++++++++++++++++------
 2 files changed, 37 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
index f77371232d8c..7a6ed0551291 100644
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -164,6 +164,7 @@ struct thread_struct {
 	unsigned int		fpsimd_cpu;
 	void			*sve_state;	/* SVE registers, if any */
 	void			*sme_state;	/* ZA and ZT state, if any */
+	unsigned long		sve_timeout;	/* jiffies to drop TIF_SVE */
 	unsigned int		vl[ARM64_VEC_MAX];	/* vector length */
 	unsigned int		vl_onexec[ARM64_VEC_MAX]; /* vl after next exec */
 	unsigned long		fault_address;	/* fault info */
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index 82e8a6017382..4741e4fb612a 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -354,6 +354,7 @@ static void task_fpsimd_load(void)
 {
 	bool restore_sve_regs = false;
 	bool restore_ffr;
+	unsigned long sve_vq_minus_one;
 
 	WARN_ON(!system_supports_fpsimd());
 	WARN_ON(preemptible());
@@ -365,18 +366,12 @@ static void task_fpsimd_load(void)
 	if (system_supports_sve() || system_supports_sme()) {
 		switch (current->thread.fp_type) {
 		case FP_STATE_FPSIMD:
-			/* Stop tracking SVE for this task until next use. */
-			if (test_and_clear_thread_flag(TIF_SVE))
-				sve_user_disable();
 			break;
 		case FP_STATE_SVE:
 			if (!thread_sm_enabled(&current->thread) &&
 			    !WARN_ON_ONCE(!test_and_set_thread_flag(TIF_SVE)))
 				sve_user_enable();
 
-			if (test_thread_flag(TIF_SVE))
-				sve_set_vq(sve_vq_from_vl(task_get_sve_vl(current)) - 1);
-
 			restore_sve_regs = true;
 			restore_ffr = true;
 			break;
@@ -395,6 +390,15 @@ static void task_fpsimd_load(void)
 		}
 	}
 
+	/*
+	 * If SVE has been enabled we may keep it enabled even if
+	 * loading only FPSIMD state, so always set the VL.
+	 */
+	if (system_supports_sve() && test_thread_flag(TIF_SVE)) {
+		sve_vq_minus_one = sve_vq_from_vl(task_get_sve_vl(current)) - 1;
+		sve_set_vq(sve_vq_minus_one);
+	}
+
 	/* Restore SME, override SVE register configuration if needed */
 	if (system_supports_sme()) {
 		unsigned long sme_vl = task_get_sme_vl(current);
@@ -421,6 +425,25 @@ static void task_fpsimd_load(void)
 	} else {
 		WARN_ON_ONCE(current->thread.fp_type != FP_STATE_FPSIMD);
 		fpsimd_load_state(&current->thread.uw.fpsimd_state);
+
+		/*
+		 * If the task had been using SVE we keep it enabled
+		 * when loading FPSIMD only state for a period to
+		 * minimise overhead for tasks actively using SVE,
+		 * disabling it periodically to ensure that tasks that
+		 * use SVE intermittently do eventually avoid the
+		 * overhead of carrying SVE state.  The timeout is
+		 * initialised when we take a SVE trap in
+		 * do_sve_acc().
+		 */
+		if (system_supports_sve() && test_thread_flag(TIF_SVE)) {
+			if (time_after(jiffies, current->thread.sve_timeout)) {
+				clear_thread_flag(TIF_SVE);
+				sve_user_disable();
+			} else {
+				sve_flush_live(true, sve_vq_minus_one);
+			}
+		}
 	}
 }
 
@@ -1397,6 +1420,13 @@ void do_sve_acc(unsigned long esr, struct pt_regs *regs)
 
 	get_cpu_fpsimd_context();
 
+	/*
+	 * We will keep SVE enabled when loading FPSIMD only state for
+	 * the next second to minimise traps when userspace is
+	 * actively using SVE.
+	 */
+	current->thread.sve_timeout = jiffies + HZ;
+
 	if (test_and_set_thread_flag(TIF_SVE))
 		WARN_ON(1); /* SVE access shouldn't have trapped */
 
---
base-commit: 1613e604df0cd359cf2a7fbd9be7a0bcfacfabd0
change-id: 20230807-arm64-sve-trap-mitigation-2e7e2663c849

Best regards,
-- 
Mark Brown
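
For readers unfamiliar with the sve_vq_minus_one value used in the patch
above: an SVE vector length is a multiple of 128 bits (16 bytes), counted in
quadwords, and the length field programmed into the control register holds
the quadword count minus one. A minimal userspace sketch of that arithmetic
follows. The model_*() names and MODEL_SVE_VQ_BYTES are hypothetical and
this is not the kernel's implementation.

```c
#include <assert.h>

/* Bytes per SVE quadword (128 bits). */
#define MODEL_SVE_VQ_BYTES 16U

/* Vector length in bytes -> number of 128-bit quadwords. */
static unsigned int model_vq_from_vl(unsigned int vl)
{
	/* A valid vector length is a non-zero multiple of 16 bytes. */
	assert(vl >= MODEL_SVE_VQ_BYTES && vl % MODEL_SVE_VQ_BYTES == 0);
	return vl / MODEL_SVE_VQ_BYTES;
}

/* Value for the length field: quadword count minus one. */
static unsigned int model_vq_minus_one(unsigned int vl)
{
	return model_vq_from_vl(vl) - 1;
}
```

So a 16 byte vector length maps to a field value of 0 and a 256 byte vector
length maps to 15 in this model.
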