Received: by 2002:a05:6358:3188:b0:123:57c1:9b43 with SMTP id q8csp16139817rwd; Mon, 26 Jun 2023 06:19:59 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ7FrA1P2NvUc+erIgr9nwlK01VBX+XTfM7cSUUI63/bygz/ObFpQkw7/BUtBem1W7R9x0nY X-Received: by 2002:a17:902:e88e:b0:1b3:97a7:c106 with SMTP id w14-20020a170902e88e00b001b397a7c106mr5541997plg.52.1687785599130; Mon, 26 Jun 2023 06:19:59 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1687785599; cv=none; d=google.com; s=arc-20160816; b=XuCHUzxPvVCJZTRkiaVoklcl2VZEZHoNkM2/vCQdLUIb4/BghO7fd+iBwHOHaQe3CR TcIWlDE8wAD4Uq9LsXVKynWY2rMTGuUHXGREoMVb5ZjVea2jrSuFR4TT/7HaJ/0sQFXL CTE4l0+5WbM8wJyKoLefqdXVUHKW4OotgSjGmTkY08Dq6dciXc7Tzzgj/ZjXTkLX456I xIvBmNDjVBkkLp0ZKaPa12yWpP12SGC73xNUIvwlStAgwjDBucThnj7GNdP8zKzhCMPS kVRmSppS51g/w/reOVv9b7AE4eMSfuF8e8PMi1mXCQJ3bdvkiVcKHYaw4h0ySvJJjJ+j iFYQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:content-language :in-reply-to:mime-version:user-agent:date:message-id:from:references :cc:to:subject; bh=v38zVIpTgqy/WZKd6Z1+jVylsrF4QE4tPV8XXUZkVwA=; fh=SAq2rPaIRYUp555vjrtDI7JOKN4vIB0TTEqzI/XXG78=; b=XaXApHRdXApAUzP6BEXWDEYHzCMRldOPBcDBdd3BPYfcZrm7wnTQ/d8p9r29T76Tlq q2Ut/KsPBQ9hDT71oIko/tKBvP4UjxYg+oFTYaw4mnVSPvfgQBsO5PZkHCH1C4FeJdFa EvK5dS0sNX7k8nk9USLCdnIz+TyQ2h7s+vUmOP+CwPlobz1ThvG0u1dvWzoELQyueeR6 OHFK1rROZnUte/QnDFqrgXzhs928wHhxtKQb3s9CfZJzVs3QVZxfiw+3lvJ8iokFdZBH 37OcHhGUFQ6TOTJX/74LA6+gunebguQH+BS6RFIhKWL3+uJFUB/qoMgNX6Gi8UBNzoxo bu9Q== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=huawei.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id w2-20020a170902d70200b001ab19e023a1si4725015ply.375.2023.06.26.06.19.47; Mon, 26 Jun 2023 06:19:59 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=huawei.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229973AbjFZND2 (ORCPT + 99 others); Mon, 26 Jun 2023 09:03:28 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52582 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230400AbjFZNDH (ORCPT ); Mon, 26 Jun 2023 09:03:07 -0400 Received: from szxga02-in.huawei.com (szxga02-in.huawei.com [45.249.212.188]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C93D8E79 for ; Mon, 26 Jun 2023 06:02:49 -0700 (PDT) Received: from kwepemm600007.china.huawei.com (unknown [172.30.72.54]) by szxga02-in.huawei.com (SkyGuard) with ESMTP id 4QqSfT2nG9zTm1s; Mon, 26 Jun 2023 21:01:53 +0800 (CST) Received: from [10.174.185.179] (10.174.185.179) by kwepemm600007.china.huawei.com (7.193.23.208) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.27; Mon, 26 Jun 2023 21:02:41 +0800 Subject: Re: [RESEND PATCH v8 04/11] bus: platform, amba, fsl-mc, PCI: Add device DMA ownership management To: Lu Baolu CC: Joerg Roedel , Kevin Tian , Jason Gunthorpe , , References: <20220418005000.897664-1-baolu.lu@linux.intel.com> <20220418005000.897664-5-baolu.lu@linux.intel.com> From: Zenghui Yu Message-ID: <6472f254-c3c4-8610-4a37-8d9dfdd54ce8@huawei.com> Date: Mon, 26 Jun 2023 21:02:40 +0800 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:68.0) Gecko/20100101 Thunderbird/68.9.0 MIME-Version: 1.0 In-Reply-To: <20220418005000.897664-5-baolu.lu@linux.intel.com> Content-Type: text/plain; charset="utf-8"; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit X-Originating-IP: [10.174.185.179] X-ClientProxiedBy: dggems705-chm.china.huawei.com (10.3.19.182) To kwepemm600007.china.huawei.com (7.193.23.208) X-CFilter-Loop: Reflected X-Spam-Status: No, score=-4.3 required=5.0 tests=BAYES_00,NICE_REPLY_A, RCVD_IN_DNSWL_MED,RCVD_IN_MSPIKE_H5,RCVD_IN_MSPIKE_WL,SPF_HELO_NONE, SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 2022/4/18 8:49, Lu Baolu wrote: > The devices on platform/amba/fsl-mc/PCI buses could be bound to drivers > with the device DMA managed by kernel drivers or user-space applications. > Unfortunately, multiple devices may be placed in the same IOMMU group > because they cannot be isolated from each other. The DMA on these devices > must either be entirely under kernel control or userspace control, never > a mixture. Otherwise the driver integrity is not guaranteed because they > could access each other through the peer-to-peer accesses which by-pass > the IOMMU protection. > > This checks and sets the default DMA mode during driver binding, and > cleanups during driver unbinding. In the default mode, the device DMA is > managed by the device driver which handles DMA operations through the > kernel DMA APIs (see Documentation/core-api/dma-api.rst). > > For cases where the devices are assigned for userspace control through the > userspace driver framework(i.e. VFIO), the drivers(for example, vfio_pci/ > vfio_platfrom etc.) may set a new flag (driver_managed_dma) to skip this > default setting in the assumption that the drivers know what they are > doing with the device DMA. > > Calling iommu_device_use_default_domain() before {of,acpi}_dma_configure > is currently a problem. As things stand, the IOMMU driver ignored the > initial iommu_probe_device() call when the device was added, since at > that point it had no fwspec yet. In this situation, > {of,acpi}_iommu_configure() are retriggering iommu_probe_device() after > the IOMMU driver has seen the firmware data via .of_xlate to learn that > it actually responsible for the given device. As the result, before > that gets fixed, iommu_use_default_domain() goes at the end, and calls > arch_teardown_dma_ops() if it fails. > > Cc: Greg Kroah-Hartman > Cc: Bjorn Helgaas > Cc: Stuart Yoder > Cc: Laurentiu Tudor > Signed-off-by: Lu Baolu > Reviewed-by: Greg Kroah-Hartman > Reviewed-by: Jason Gunthorpe > Reviewed-by: Robin Murphy > Tested-by: Eric Auger [...] > diff --git a/include/linux/pci.h b/include/linux/pci.h > index 60adf42460ab..b933d2b08d4d 100644 > --- a/include/linux/pci.h > +++ b/include/linux/pci.h > @@ -895,6 +895,13 @@ struct module; > * created once it is bound to the driver. > * @driver: Driver model structure. > * @dynids: List of dynamically added device IDs. > + * @driver_managed_dma: Device driver doesn't use kernel DMA API for DMA. > + * For most device drivers, no need to care about this flag > + * as long as all DMAs are handled through the kernel DMA API. > + * For some special ones, for example VFIO drivers, they know > + * how to manage the DMA themselves and set this flag so that > + * the IOMMU layer will allow them to setup and manage their > + * own I/O address space. > */ > struct pci_driver { > struct list_head node; > @@ -913,6 +920,7 @@ struct pci_driver { > const struct attribute_group **dev_groups; > struct device_driver driver; > struct pci_dynids dynids; > + bool driver_managed_dma; > }; > > static inline struct pci_driver *to_pci_driver(struct device_driver *drv) [...] > diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c > index 4ceeb75fc899..f83f7fbac68f 100644 > --- a/drivers/pci/pci-driver.c > +++ b/drivers/pci/pci-driver.c > @@ -20,6 +20,7 @@ > #include > #include > #include > +#include > #include "pci.h" > #include "pcie/portdrv.h" > > @@ -1601,6 +1602,7 @@ static int pci_bus_num_vf(struct device *dev) > */ > static int pci_dma_configure(struct device *dev) > { > + struct pci_driver *driver = to_pci_driver(dev->driver); > struct device *bridge; > int ret = 0; > > @@ -1616,9 +1618,24 @@ static int pci_dma_configure(struct device *dev) > } > > pci_put_host_bridge_device(bridge); > + > + if (!ret && !driver->driver_managed_dma) { > + ret = iommu_device_use_default_domain(dev); > + if (ret) > + arch_teardown_dma_ops(dev); > + } > + > return ret; > } > > +static void pci_dma_cleanup(struct device *dev) > +{ > + struct pci_driver *driver = to_pci_driver(dev->driver); > + > + if (!driver->driver_managed_dma) > + iommu_device_unuse_default_domain(dev); > +} > + > struct bus_type pci_bus_type = { > .name = "pci", > .match = pci_bus_match, > @@ -1632,6 +1649,7 @@ struct bus_type pci_bus_type = { > .pm = PCI_PM_OPS_PTR, > .num_vf = pci_bus_num_vf, > .dma_configure = pci_dma_configure, > + .dma_cleanup = pci_dma_cleanup, > }; > EXPORT_SYMBOL(pci_bus_type); I (somehow forgot to delete DEBUG_TEST_DRIVER_REMOVE in my .config, and) failed to start the guest with an assigned PCI device, with something like: | qemu-system-aarch64: -device vfio-pci,host=0000:03:00.1,id=hostdev0,bus=pci.8,addr=0x0: vfio 0000:03:00.1: group 45 is not viable | Please ensure all devices within the iommu_group are bound to their vfio bus driver. It looks like on device probe, with DEBUG_TEST_DRIVER_REMOVE, .dma_configure() will be executed *twice* via the really_probe()/re_probe path, and *no* .dma_cleanup() will be executed. The resulting dev::iommu_group::owner_cnt is 2, which will confuse the later iommu_group_dma_owner_claimed() call from VFIO on guest startup. I can locally workaround the problem by deleting the DEBUG option or performing a .dma_cleanup() during test_remove'ing, but it'd be great if you can take a look. Thanks, Zenghui