Date: Thu, 16 Mar 2023 16:09:44 -0700 (PDT)
From: Stefano Stabellini
To: Juergen Gross
cc: Alex Deucher, Jan Beulich, Stefano Stabellini, Honglei Huang,
    amd-gfx@lists.freedesktop.org,
    dri-devel@lists.freedesktop.org, linux-kernel@vger.kernel.org,
    Stewart Hildebrand, Oleksandr Tyshchenko, Huang Rui, Chen Jiqian,
    Xenia Ragiadakou, Alex Deucher, xen-devel@lists.xenproject.org,
    Boris Ostrovsky, Julia Zhang, Christian König, Roger Pau Monné
Subject: Re: [RFC PATCH 1/5] x86/xen: disable swiotlb for xen pvh
References: <20230312120157.452859-1-ray.huang@amd.com>
    <20230312120157.452859-2-ray.huang@amd.com>
    <5e22a45d-6f12-da9b-94f6-3112a30e8574@suse.com>
User-Agent: Alpine 2.22 (DEB 394 2020-01-19)
MIME-Version: 1.0
Content-Type: multipart/mixed; BOUNDARY="8323329-549136083-1679007957=:3359"
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

This message is in MIME format. The first part should be readable text,
while the remaining parts are likely unreadable without MIME-aware tools.

--8323329-549136083-1679007957=:3359
Content-Type: text/plain; CHARSET=UTF-8
Content-Transfer-Encoding: 8BIT

On Thu, 16 Mar 2023, Juergen Gross wrote:
> On 16.03.23 14:53, Alex Deucher wrote:
> > On Thu, Mar 16, 2023 at 9:48 AM Juergen Gross wrote:
> > >
> > > On 16.03.23 14:45, Alex Deucher wrote:
> > > > On Thu, Mar 16, 2023 at 3:50 AM Jan Beulich wrote:
> > > > >
> > > > > On 16.03.2023 00:25, Stefano Stabellini wrote:
> > > > > > On Wed, 15 Mar 2023, Jan Beulich wrote:
> > > > > > > On 15.03.2023 01:52, Stefano Stabellini wrote:
> > > > > > > > On Mon, 13 Mar 2023, Jan Beulich wrote:
> > > > > > > > > On 12.03.2023 13:01, Huang Rui wrote:
> > > > > > > > > > Xen PVH is the paravirtualized mode and takes advantage of hardware
> > > > > > > > > > virtualization support when possible. It will using the hardware IOMMU
> > > > > > > > > > support instead of xen-swiotlb, so disable swiotlb if current domain is
> > > > > > > > > > Xen PVH.
> > > > > > > > > >
> > > > > > > > > But the kernel has no way (yet) to drive the IOMMU, so how can it get
> > > > > > > > > away without resorting to swiotlb in certain cases (like I/O to an
> > > > > > > > > address-restricted device)?
> > > > > > > >
> > > > > > > > I think Ray meant that, thanks to the IOMMU setup by Xen, there is no
> > > > > > > > need for swiotlb-xen in Dom0. Address translations are done by the IOMMU
> > > > > > > > so we can use guest physical addresses instead of machine addresses for
> > > > > > > > DMA. This is a similar case to Dom0 on ARM when the IOMMU is available
> > > > > > > > (see include/xen/arm/swiotlb-xen.h:xen_swiotlb_detect, the corresponding
> > > > > > > > case is XENFEAT_not_direct_mapped).
> > > > > > >
> > > > > > > But how does Xen using an IOMMU help with, as said, address-restricted
> > > > > > > devices? They may still need e.g. a 32-bit address to be programmed in,
> > > > > > > and if the kernel has memory beyond the 4G boundary not all I/O buffers
> > > > > > > may fulfill this requirement.
> > > > > >
> > > > > > In short, it is going to work as long as Linux has guest physical
> > > > > > addresses (not machine addresses, those could be anything) lower than
> > > > > > 4GB.
> > > > > >
> > > > > > If the address-restricted device does DMA via an IOMMU, then the device
> > > > > > gets programmed by Linux using its guest physical addresses (not machine
> > > > > > addresses).
> > > > > >
> > > > > > The 32-bit restriction would be applied by Linux to its choice of guest
> > > > > > physical address to use to program the device, the same way it does on
> > > > > > native. The device would be fine as it always uses Linux-provided <4GB
> > > > > > addresses.
> > > > > > After the IOMMU translation (pagetable setup by Xen), we
> > > > > > could get any address, including >4GB addresses, and that is expected to
> > > > > > work.
> > > > >
> > > > > I understand that's the "normal" way of working. But whatever the swiotlb
> > > > > is used for in baremetal Linux, that would similarly require its use in
> > > > > PVH (or HVM) aiui. So unconditionally disabling it in PVH would look to
> > > > > me like an incomplete attempt to disable its use altogether on x86. What
> > > > > difference of PVH vs baremetal am I missing here?
> > > >
> > > > swiotlb is not usable for GPUs even on bare metal. They often have
> > > > hundreds or megs or even gigs of memory mapped on the device at any
> > > > given time. Also, AMD GPUs support 44-48 bit DMA masks (depending on
> > > > the chip family).
> > >
> > > But the swiotlb isn't per device, but system global.
> >
> > Sure, but if the swiotlb is in use, then you can't really use the GPU.
> > So you get to pick one.
>
> The swiotlb is used only for buffers which are not within the DMA mask of a
> device (see dma_direct_map_page()). So an AMD GPU supporting a 44 bit DMA mask
> won't use the swiotlb unless you have a buffer above guest physical address of
> 16TB (so basically never).
>
> Disabling swiotlb in such a guest would OTOH mean, that a device with only
> 32 bit DMA mask passed through to this guest couldn't work with buffers
> above 4GB.
>
> I don't think this is acceptable.

From the Xen subsystem in Linux point of view, the only thing we need to
do is to make sure *not* to enable swiotlb_xen (yes "swiotlb_xen", not
the global swiotlb) on PVH because it is not needed anyway.

I think we should leave the global "swiotlb" setting alone. The global
swiotlb is not relevant to Xen anyway, and surely baremetal Linux has to
have a way to deal with swiotlb/GPU incompatibilities.

We just have to avoid making things worse on Xen, and for that we just
need to avoid unconditionally enabling swiotlb-xen. If the Xen subsystem
doesn't enable swiotlb_xen/swiotlb, and no other subsystem enables
swiotlb, then we have a good Linux configuration capable of handling the
GPU properly.

Alex, please correct me if I am wrong. How is x86_swiotlb_enable set to
false on native (non-Xen) x86?
--8323329-549136083-1679007957=:3359--
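
For reference, the DMA-mask arithmetic from Juergen's reply quoted above
(a 44 bit mask reaches 16TB, a 32 bit mask reaches 4GB) can be sketched
as below. This is only an illustration in Python, not the kernel code
path: the helper names dma_limit and needs_bounce and the 5 GiB buffer
address are hypothetical, and the sketch assumes the decision reduces to
comparing the end of the buffer against the mask, which is a
simplification of what dma_direct_map_page() does.

```python
GIB = 1 << 30
TIB = 1 << 40


def dma_limit(mask_bits: int) -> int:
    """Highest address reachable with a DMA mask of mask_bits bits."""
    return (1 << mask_bits) - 1


def needs_bounce(addr: int, size: int, mask_bits: int) -> bool:
    """True if the buffer [addr, addr + size) extends beyond the DMA mask.

    Simplified model (assumption): only such buffers would have to go
    through a bounce buffer like the swiotlb.
    """
    return addr + size - 1 > dma_limit(mask_bits)


# Hypothetical 4 KiB buffer at a guest physical address above 4 GiB.
buf = 5 * GIB

print(needs_bounce(buf, 4096, 32))  # device with a 32 bit DMA mask
print(needs_bounce(buf, 4096, 44))  # GPU with a 44 bit DMA mask
```

Under this simplified model the first call prints True and the second
prints False: the 32 bit device cannot reach the buffer directly, while
the 44 bit GPU can reach anything below 16 TiB without bouncing.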