Date: Thu, 9 May 2024 12:13:31 +0100
From: Catalin Marinas
To: Ard Biesheuvel
Cc: Alex Bennée, Will Deacon, Hector Martin, Marc Zyngier,
 Mark Rutland, Zayd Qumsieh, Justin Lu, Ryan Houdek, Mark Brown,
 Mateusz Guzik, Anshuman Khandual, Oliver Upton, Miguel Luis,
 Joey Gouly, Christoph Paasch, Kees Cook, Sami Tolvanen, Baoquan He,
 Joel Granados, Dawei Li, Andrew Morton, Florent Revest,
 David Hildenbrand, Stefan Roesch, Andy Chiu, Josh Triplett,
 Oleg Nesterov, Helge Deller, Zev Weiss, Ondrej Mosnacek, Miguel Ojeda,
 linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org,
 Asahi Linux
Subject: Re: [PATCH 0/4] arm64: Support the TSO memory model
References: <20240411-tso-v1-0-754f11abfbff@marcan.st>
 <20240411132853.GA26481@willie-the-truck>
 <87seythqct.fsf@draig.linaro.org>

On Tue, May 07, 2024 at 04:52:30PM +0200, Ard Biesheuvel wrote:
> On Tue, 7 May 2024 at
> 12:24, Alex Bennée wrote:
> > I think the main use case here is for emulation. When we run x86-on-arm
> > in QEMU we do currently insert lots of extra barrier instructions on
> > every load and store. If we can probe and set a TSO mode I can assure
> > you we'll do the right thing ;-)
>
> Without a public specification of what TSO mode actually entails,
> deciding which of those barriers can be dropped is not going to be as
> straight-forward as you make it out to be.
>
> Apple's TSO mode is vertically integrated with Rosetta, which means
> that TSO mode provides whatever Rosetta needs to run x86 code
> correctly, and that it could mean different things on different
> generations of the micro-architecture. And whether Apple's TSO is the
> same as Fujitsu's is anyone's guess afaik.

Indeed. Apart from using impdef registers, that's what I think is the
second biggest problem with this feature (and the corresponding
patches). We don't know the precise memory model, and we can't tell
whether this TSO bit is stored in the TLB. If it is, is it per
ASID/VMID? The other problem Marc raised is what the memory model is
between two CPUs where only one has the TSO bit set. Does it only break
the TSO model or is there a chance that it also breaks the default
relaxed model? What other TSO flavours are out there, and how do they
compare with the Apple one?

> Running a game and seeing it perform better is great, but it is not
> the kind of rigor we usually attempt to apply when adding support for
> architectural features. Hopefully, there will be some architectural
> support for this in the future, but without any spec that defines the
> memory model it implements, I am not convinced we should merge this.

There is FEAT_LRCPC (available on Apple Silicon from M2 onwards). Rather
than having a big knob to turn TSO on or off, this feature introduces
instructions that permit a code generator to get the TSO semantics in a
more efficient way (e.g.
using LDAPR+STLR instead of the stricter LDAR+STLR; not sure how well
these are implemented on Apple Silicon). There are further improvements
in FEAT_LRCPC{2,3} (with the latter adding support for SIMD but not
available in hardware yet). So the direction from Arm is pretty clear:
it acknowledges that there is a need for such TSO emulation, but not by
way of undocumented impdef registers.

As for whether more is needed here, I guess people working on emulators
could reach out to Arm or CPU vendors with suggestions (the path to the
architects is not straightforward, usually legal has a say, but it's
doable, there are formal channels already).

I see the impdef hardware TSO options as temporary until CPU
implementations catch up to the architected FEAT_LRCPC*. Given the
problems already stated in this thread, I think such hacks should be
carried downstream and (hopefully) will eventually vanish. Maybe those
TSO knobs currently make an emulation faster than FEAT_LRCPC*, but
that's feedback to go to the microarchitects on the implementation (or
to the architects on what other instructions should be covered).

-- 
Catalin