Date: Tue, 5 Dec 2023 11:48:57 +0000
From: Catalin Marinas
To: ankita@nvidia.com
Cc: jgg@nvidia.com, maz@kernel.org, oliver.upton@linux.dev,
	suzuki.poulose@arm.com, yuzenghui@huawei.com, will@kernel.org,
	ardb@kernel.org, akpm@linux-foundation.org, gshan@redhat.com,
	aniketa@nvidia.com, cjia@nvidia.com, kwankhede@nvidia.com,
	targupta@nvidia.com, vsethi@nvidia.com, acurrid@nvidia.com,
	apopple@nvidia.com, jhubbard@nvidia.com, danw@nvidia.com,
	mochs@nvidia.com, kvmarm@lists.linux.dev, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org
Subject: Re: [PATCH v2 1/1] KVM: arm64: allow the VM to select DEVICE_* and
	NORMAL_NC for IO memory
References: <20231205033015.10044-1-ankita@nvidia.com>
In-Reply-To: <20231205033015.10044-1-ankita@nvidia.com>

On Tue, Dec 05, 2023 at 09:00:15AM +0530, ankita@nvidia.com wrote:
> From: Ankit Agrawal
>
> Currently, KVM for ARM64 maps at stage 2 memory that is considered device
> (i.e. it is not RAM) with DEVICE_nGnRE memory attributes; this setting
> overrides (as per the ARM architecture [1]) any device MMIO mapping
> present at stage 1, resulting in a set-up whereby a guest operating
> system cannot determine device MMIO mapping memory attributes on its
> own but it is always overridden by the KVM stage 2 default.
>
> This set-up does not allow guest operating systems to select device
> memory attributes independently from KVM stage-2 mappings
> (refer to [1], "Combining stage 1 and stage 2 memory type attributes"),
> which turns out to be an issue in that guest operating systems
> (e.g. Linux) may request to map devices MMIO regions with memory
> attributes that guarantee better performance (e.g. gathering
> attribute - that for some devices can generate larger PCIe memory
> writes TLPs) and specific operations (e.g. unaligned transactions)
> such as the NormalNC memory type.
>
> The default device stage 2 mapping was chosen in KVM for ARM64 since
> it was considered safer (i.e. it would not allow guests to trigger
> uncontained failures ultimately crashing the machine) but this
> turned out to be asynchronous (SError) defeating the purpose.
>
> Failures containability is a property of the platform and is independent
> from the memory type used for MMIO device memory mappings.
>
> Actually, DEVICE_nGnRE memory type is even more problematic than
> Normal-NC memory type in terms of faults containability in that e.g.
> aborts triggered on DEVICE_nGnRE loads cannot be made, architecturally,
> synchronous (i.e. that would imply that the processor should issue at
> most 1 load transaction at a time - it cannot pipeline them - otherwise
> the synchronous abort semantics would break the no-speculation attribute
> attached to DEVICE_XXX memory).
>
> This means that regardless of the combined stage1+stage2 mappings a
> platform is safe if and only if device transactions cannot trigger
> uncontained failures and that in turn relies on platform capabilities
> and the device type being assigned (i.e. PCIe AER/DPC error containment
> and RAS architecture[3]); therefore the default KVM device stage 2
> memory attributes play no role in making device assignment safer
> for a given platform (if the platform design adheres to design
> guidelines outlined in [3]) and therefore can be relaxed.
>
> For all these reasons, relax the KVM stage 2 device memory attributes
> from DEVICE_nGnRE to Normal-NC. Add a new kvm_pgtable_prot flag for
> Normal-NC.
>
> The Normal-NC was chosen over a different Normal memory type default
> at stage-2 (e.g. Normal Write-through) to avoid cache allocation/snooping.
>
> Relaxing S2 KVM device MMIO mappings to Normal-NC is not expected to
> trigger any issue on guest device reclaim use cases either (i.e. device
> MMIO unmap followed by a device reset) at least for PCIe devices, in that
> in PCIe a device reset is architected and carried out through PCI config
> space transactions that are naturally ordered with respect to MMIO
> transactions according to the PCI ordering rules.
>
> Having Normal-NC S2 default puts guests in control (thanks to
> stage1+stage2 combined memory attributes rules [1]) of device MMIO
> regions memory mappings, according to the rules described in [1]
> and summarized here ([(S1) - stage1], [(S2) - stage 2]):
>
> S1         | S2         | Result
> NORMAL-WB  | NORMAL-NC  | NORMAL-NC
> NORMAL-WT  | NORMAL-NC  | NORMAL-NC
> NORMAL-NC  | NORMAL-NC  | NORMAL-NC
> DEVICE     | NORMAL-NC  | DEVICE
>
> It is worth noting that currently, to map devices MMIO space to user
> space in a device pass-through use case the VFIO framework applies memory
> attributes derived from pgprot_noncached() settings applied to VMAs, which
> result in device-nGnRnE memory attributes for the stage-1 VMM mappings.
>
> This means that a userspace mapping for device MMIO space carried
> out with the current VFIO framework and a guest OS mapping for the same
> MMIO space may result in a mismatched alias as described in [2].
>
> Defaulting KVM device stage-2 mappings to Normal-NC attributes does not
> change anything in this respect, in that the mismatched aliases would
> only affect (refer to [2] for a detailed explanation) ordering between
> the userspace and GuestOS mappings resulting stream of transactions
> (i.e. it does not cause loss of property for either stream of
> transactions on its own), which is harmless given that the userspace
> and GuestOS access to the device is carried out through independent
> transactions streams.
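
For reference, the S1/S2 table quoted above can be modelled as "the more
restrictive of the two stages wins". The following is only a minimal sketch
under that assumption: the enum and function names are hypothetical (they are
not kernel identifiers), all DEVICE_* sub-types are collapsed into one value,
and it does not attempt to cover the full combining rules in [1].

```c
#include <assert.h>

/*
 * Hypothetical names, for illustration only. Values are ordered from
 * most restrictive (lowest) to least restrictive (highest).
 */
enum mem_type {
	MEM_DEVICE,	/* any DEVICE_* type; sub-attributes not modelled */
	MEM_NORMAL_NC,
	MEM_NORMAL_WT,
	MEM_NORMAL_WB,
};

/* Simplified combining rule: the more restrictive stage wins. */
enum mem_type combine_s1_s2(enum mem_type s1, enum mem_type s2)
{
	return s1 < s2 ? s1 : s2;
}

/* Checks the four rows of the quoted table, all with S2 = Normal-NC. */
void check_quoted_table(void)
{
	assert(combine_s1_s2(MEM_NORMAL_WB, MEM_NORMAL_NC) == MEM_NORMAL_NC);
	assert(combine_s1_s2(MEM_NORMAL_WT, MEM_NORMAL_NC) == MEM_NORMAL_NC);
	assert(combine_s1_s2(MEM_NORMAL_NC, MEM_NORMAL_NC) == MEM_NORMAL_NC);
	assert(combine_s1_s2(MEM_DEVICE, MEM_NORMAL_NC) == MEM_DEVICE);
}
```

With a Normal-NC stage 2 the guest's stage 1 choice can only make the result
more restrictive, which is the sense in which the quoted text says the guest
is put in control.
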
>
> [1] section D8.5 - DDI0487_I_a_a-profile_architecture_reference_manual.pdf
> [2] section B2.8 - DDI0487_I_a_a-profile_architecture_reference_manual.pdf
> [3] sections 1.7.7.3/1.8.5.2/appendix C - DEN0029H_SBSA_7.1.pdf
>
> Applied over next-20231201
>
> History
> =======
> v1 -> v2
> - Updated commit log to the one posted by
>   Lorenzo Pieralisi (Thanks!)
> - Added new flag to represent the NORMAL_NC setting. Updated
>   stage2_set_prot_attr() to handle new flag.
>
> v1 Link:
> https://lore.kernel.org/all/20230907181459.18145-3-ankita@nvidia.com/
>
> Signed-off-by: Ankit Agrawal
> Suggested-by: Jason Gunthorpe
> Acked-by: Catalin Marinas

In the light of having to keep this relaxation only for PCIe devices, I
will withdraw my Ack until we come to a conclusion. Thanks.

--
Catalin