Date: Mon, 06 May 2024 17:12:53 +0100
Message-ID: <86y18mq5q2.wl-maz@kernel.org>
From:
 Marc Zyngier
To: Sergio Lopez Pascual
Cc: Eric Curtin, Will Deacon, Hector Martin, Catalin Marinas,
 Mark Rutland, Zayd Qumsieh, Justin Lu, Ryan Houdek, Mark Brown,
 Ard Biesheuvel, Mateusz Guzik, Anshuman Khandual, Oliver Upton,
 Miguel Luis, Joey Gouly, Christoph Paasch, Kees Cook, Sami Tolvanen,
 Baoquan He, Joel Granados, Dawei Li, Andrew Morton, Florent Revest,
 David Hildenbrand, Stefan Roesch, Andy Chiu, Josh Triplett,
 Oleg Nesterov, Helge Deller, Zev Weiss, Ondrej Mosnacek, Miguel Ojeda,
 linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org,
 Asahi Linux
Subject: Re: [PATCH 0/4] arm64: Support the TSO memory model
References: <20240411-tso-v1-0-754f11abfbff@marcan.st>
 <20240411132853.GA26481@willie-the-truck>
 <28ab55b3-e699-4487-b332-f1f20a6b22a1@marcan.st>
 <20240419165809.GA4020@willie-the-truck>

On Mon, 06 May 2024 12:21:40 +0100, Sergio Lopez Pascual wrote:
>
> Eric Curtin writes:
>
> > On Fri, 19 Apr 2024 at 18:08, Will Deacon wrote:
> >>
> >> On Thu, Apr 11, 2024 at 11:19:13PM +0900, Hector Martin wrote:
> >> > On 2024/04/11 22:28, Will Deacon wrote:
> >> > > * Some binaries in a distribution exhibit instability which goes away
> >> > >   in TSO mode, so a taskset-like program is used to run them with TSO
> >> > >   enabled.
> >> >
> >> > Since the flag is cleared on execve, this third one isn't generally
> >> > possible as far as I know.
> >>
> >> Ah ok, I'd missed that. Thanks.
> >>
> >> > > In all these cases, we end up with native arm64 applications that will
> >> > > either fail to load or will crash in subtle ways on CPUs without the TSO
> >> > > feature. Assuming that the application cannot be fixed, a better
> >> > > approach would be to recompile using stronger instructions (e.g.
> >> > > LDAR/STLR) so that at least the resulting binary is portable. Now, it's
> >> > > true that some existing CPUs are TSO by design (this is a perfectly
> >> > > valid implementation of the arm64 memory model), but I think there's a
> >> > > big difference between quietly providing more ordering guarantees than
> >> > > software may be relying on and providing a mechanism to discover,
> >> > > request and ultimately rely upon the stronger behaviour.
> >> >
> >> > The problem is "just" using stronger instructions is much more
> >> > expensive, as emulators have demonstrated. If TSO didn't serve a
> >> > practical purpose I wouldn't be submitting this, but it does.
> >> > This is basically non-negotiable for x86 emulation; if this is rejected
> >> > upstream, it will forever live as a downstream patch used by the entire
> >> > gaming-on-Mac-Linux ecosystem (and this is an ecosystem we are very
> >> > explicitly targeting, given our efforts with microVMs for 4K page size
> >> > support and the upcoming Vulkan drivers).
>
> In addition to the use case Hector exposed here, there's another,
> potentially larger one, which is running x86_64 containers on aarch64
> systems, using a combination of both Virtualization and emulation.
>
> In this scenario, both not being able to use TSO for emulation
> and having to enable it all the time for the whole VM have a very large
> impact on performance (~25% on some workloads).

Well, there is always a price to pay somewhere, and this is the usual
trade-off between performance and maintainability.

> I understand the concern about the risk of userspace fragmentation, but
> I was wondering if we could minimize it to an acceptable level by
> narrowing down the context. For instance, since both use cases we're
> bringing to the table imply the use of Virtualization, we should be able
> to restrict PR_SET_MEM_MODEL to only be accepted when running on EL1
> (and not in nVHE nor pKVM), returning EINVAL otherwise. This would
> heavily discourage users from relying on this feature for native
> applications that can run on arbitrary contexts, hence drastically
> reducing the fragmentation risk.

As I explained in another sub-thread[1], I am not prepared to allow
non-architectural state to be exposed to a guest. I'm also not prepared
to make significant ABI differences between VHE, nVHE, hVHE, with or
without pKVM, because the job of the kernel is to abstract those
differences.

> We would still need a way to ensure the trap gets to the VMM and for
> the VMM to operate on the impdef ACTLR_EL12, but that should be dealt on
> a different series.

The VMM can't use ACTLR_EL12, by the very definition of this register
(the clue is in the name). You'd have to proxy the write in the kernel
and context-switch it, which means adding non-architectural state to
KVM, breaking VM migration and adding more kludges to the existing
Apple-specific host crap.

Also, let's realise that we are talking about making significant
changes to the arm64 ABI for a platform that is still not fully
supported in the upstream kernel. I have the feeling that changing the
memory model dynamically may not be of the utmost priority until then.

Thanks,

	M.

[1] https://lore.kernel.org/all/867cgcqrb9.wl-maz@kernel.org

-- 
Without deviation from the norm, progress is not possible.