Date: Tue, 5 Dec 2023 16:22:33 +0000
From: Catalin Marinas
To: Jason Gunthorpe
Cc: Marc Zyngier, ankita@nvidia.com, Shameerali Kolothum Thodi,
 oliver.upton@linux.dev, suzuki.poulose@arm.com, yuzenghui@huawei.com,
 will@kernel.org, ardb@kernel.org, akpm@linux-foundation.org,
 gshan@redhat.com, aniketa@nvidia.com, cjia@nvidia.com,
 kwankhede@nvidia.com, targupta@nvidia.com, vsethi@nvidia.com,
 acurrid@nvidia.com, apopple@nvidia.com, jhubbard@nvidia.com,
 danw@nvidia.com, mochs@nvidia.com, kvmarm@lists.linux.dev,
 kvm@vger.kernel.org, lpieralisi@kernel.org,
 linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org
Subject: Re: [PATCH v2 1/1] KVM: arm64: allow the VM to select DEVICE_* and NORMAL_NC for IO memory
In-Reply-To: <20231205130517.GD2692119@nvidia.com>
References: <20231205033015.10044-1-ankita@nvidia.com> <86fs0hatt3.wl-maz@kernel.org> <20231205130517.GD2692119@nvidia.com>

On Tue, Dec 05, 2023 at 09:05:17AM -0400, Jason Gunthorpe wrote:
> On Tue, Dec 05, 2023 at 11:40:47AM +0000, Catalin Marinas wrote:
> > > - Will had unanswered questions in another part of the thread:
> > >
> > > https://lore.kernel.org/all/20231013092954.GB13524@willie-the-truck/
> > >
> > > Can someone please help concluding it?
> >
> > Is this about reclaiming the device? I think we concluded that we can't
> > generalise this beyond PCIe, though not sure there was any formal
> > statement to that thread. The other point Will had was around stating
> > in the commit message why we only relax this to Normal NC. I haven't
> > checked the commit message yet, it needs careful reading ;).
>
> Not quite, we said reclaiming is VFIO's problem and if VFIO can't
> reliably reclaim a device it shouldn't create it in the first place.
>
> Again, I think a lot of this is trying to take VFIO problems into KVM.
>
> VFIO devices should not exist if they pose a harm to the system.
> If VFIO decided to create the devices anyhow (e.g. admin override or
> something) then it is not KVM's job to do any further enforcement.

Yeah, I made this argument in the past. But it's a fair question to ask
since the Arm world is different from x86. Just reusing an existing
driver in a different context may break its expectations. Does a Normal
NC access complete by the time a TLBI (for Stage 2) and a DSB (DVMSync)
have completed? It does reach some point of serialisation with
subsequent accesses to the same address, but I'm not sure how it is
ordered with an access to a different location, like the config space
used for reset. Maybe it's not a problem at all, or it is safe only for
PCIe, but it would be good to get to the bottom of this.

Device-nGnRnE has some stronger rules around end-point completion and
that's what vfio-pci uses. KVM, however, went for the slightly more
relaxed nGnRE variant which, at least per the Arm ARM, doesn't have
these guarantees.

> Remember, the feedback we got from the CPU architects was that even
> DEVICE_* will experience an uncontained failure if the device triggers
> an error response in shipping ARM IP.
>
> The reason PCIe is safe is because the PCI bridge does not generate
> errors in the first place!

That's an argument to restrict this feature to PCIe. It's really about
having fewer arguments over the behaviour of other devices. Marc did
raise another issue with the GIC VCPU interface (does this even have a
vma in the host VMM?). That's a class of devices where the mapping is
context-switched, so the TLBI+DSB rules don't help.

> Thus, the way a platform device can actually be safe is if it too
> never generates errors in the first place! Obviously this approach
> works just as well with NORMAL_NC.
>
> If a platform device does generate errors then we shouldn't expect
> containment at all, and the memory type has no bearing on the
> safety.
> The correct answer is to block these platform devices from
> VFIO/KVM/etc because they can trigger uncontained failures.

Assuming the error containment is sorted, there are two other issues
with other types of devices:

1. Ordering guarantees on reclaim or context switch
2. Unaligned accesses

On (2), I think PCIe is fairly clear on how the TLPs are generated, so I
wouldn't expect additional errors here. But I have no idea what AMBA/AXI
does here in general. Perhaps it's fine; I don't think we looked into it
as the focus was mostly on PCIe.

So, I think it would be easier to get this patch upstream if we limit
the change to PCIe devices for now. We may relax this further in the
future. Do you actually have a need for non-PCIe devices to support WC
in the guest, or is it more about the complexity of the logic to detect
whether it's actually a PCIe BAR we are mapping into the guest? (I can
see some Arm GPU folk asking for this but those devices are not easily
virtualisable).

-- 
Catalin