From: Lu Baolu
To: Jason Gunthorpe, Kevin Tian, Joerg Roedel, Will Deacon,
    Robin Murphy, Jean-Philippe Brucker, Nicolin Chen, Yi Liu,
    Jacob Pan, Joel Granados
Cc: iommu@lists.linux.dev, virtualization@lists.linux-foundation.org,
    linux-kernel@vger.kernel.org, Lu Baolu
Subject: [PATCH v7 00/10] IOMMUFD: Deliver IO page faults to user space
Date: Sun, 16 Jun 2024 14:11:45 +0800
Message-Id: <20240616061155.169343-1-baolu.lu@linux.intel.com>

This series implements delivery of IO page faults to user space through
the IOMMUFD framework. One feasible use case is nested translation.
Nested translation is a hardware feature that supports two-stage
translation tables for the IOMMU. The second-stage translation table is
managed by the host VMM, while the first-stage translation table is
owned by user space. This allows user space to control the IOMMU
mappings for its devices.

When an IO page fault occurs on the first-stage translation table, the
IOMMU hardware can deliver the page fault to user space through the
IOMMUFD framework. User space can then handle the page fault and respond
to the device top-down through IOMMUFD. This allows user space to
implement its own IO page fault handling policies.

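The deliver-then-respond flow described above can be sketched in user
space roughly as follows. This is only an illustration and not the uAPI
added by this series: the demo_* structures, constants and functions are
hypothetical stand-ins, the address-limit policy is invented for the
example, and a socketpair plays the role of the kernel side so that the
sketch can run without IOMMU hardware.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical, simplified records; NOT the layout defined by this series. */
struct demo_pgfault {
	uint32_t dev_id;	/* device that raised the fault */
	uint32_t cookie;	/* identifies the fault when responding */
	uint64_t addr;		/* faulting IO virtual address */
};

struct demo_page_response {
	uint32_t cookie;	/* copied from the fault being answered */
	uint32_t code;		/* DEMO_RESP_SUCCESS or DEMO_RESP_INVALID */
};

enum { DEMO_RESP_SUCCESS = 0, DEMO_RESP_INVALID = 1 };

/* User-space policy (invented): addresses below 'limit' can be resolved. */
uint32_t demo_policy(const struct demo_pgfault *f, uint64_t limit)
{
	return f->addr < limit ? DEMO_RESP_SUCCESS : DEMO_RESP_INVALID;
}

/* Read one fault from the FD, apply the policy, write the response back. */
int demo_handle_one_fault(int fault_fd, uint64_t limit)
{
	struct demo_pgfault fault;
	struct demo_page_response resp;

	assert(fault_fd >= 0);
	if (read(fault_fd, &fault, sizeof(fault)) != (ssize_t)sizeof(fault))
		return -1;
	resp.cookie = fault.cookie;
	resp.code = demo_policy(&fault, limit);
	if (write(fault_fd, &resp, sizeof(resp)) != (ssize_t)sizeof(resp))
		return -1;
	return 0;
}

/*
 * Stand-in for the kernel side: inject one fault on sv[0], let the
 * handler serve it on sv[1], and return the response code (or -1).
 */
int demo_roundtrip(uint64_t addr, uint64_t limit)
{
	struct demo_pgfault fault = { .dev_id = 1, .cookie = 42, .addr = addr };
	struct demo_page_response resp;
	int sv[2];
	int ret = -1;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return -1;
	if (write(sv[0], &fault, sizeof(fault)) == (ssize_t)sizeof(fault) &&
	    demo_handle_one_fault(sv[1], limit) == 0 &&
	    read(sv[0], &resp, sizeof(resp)) == (ssize_t)sizeof(resp) &&
	    resp.cookie == fault.cookie)
		ret = (int)resp.code;
	close(sv[0]);
	close(sv[1]);
	return ret;
}
```

With these stand-ins, demo_roundtrip(0x1000, 0x10000) yields
DEMO_RESP_SUCCESS and demo_roundtrip(0x20000, 0x10000) yields
DEMO_RESP_INVALID. The point is only that the handling policy lives
entirely in user space, behind a single read/respond loop.
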
A user space application that is capable of handling IO page faults
should allocate a fault object and bind it to any domain whose faults it
is willing to handle. Once the fault object has been allocated
successfully, the application retrieves page faults by reading from the
returned file descriptor (FD) and responds to them by writing to it.

The iommu selftest framework has been updated to test the IO page fault
delivery and response functionality.

The series and related patches are available on GitHub:
https://github.com/LuBaolu/intel-iommu/commits/iommufd-io-pgfault-delivery-v7

Change log:

v7:
 - Move the setting of handle.domain into the helpers.
 - Return value of copy_to_user() should be converted to -EFAULT.
 - Add more checks on a fetched dma handle in the sva bind path.
 - Add a constant flag in iommu_ops to replace
   IOMMU_CAP_USER_IOASID_TABLE.
 - Simplify iommu_hwpt_pgfault and iommu_hwpt_page_response by removing
   some unnecessary fields.
 - Address the wrong order between dec users and ctx_put in the fault FD
   release path.
 - Miscellaneous cleanup.

v6: https://lore.kernel.org/linux-iommu/20240527040517.38561-1-baolu.lu@linux.intel.com/
 - Refine the attach handle code by shifting the handle allocation to
   the caller. The caller will then provide the allocated handle to the
   domain attachment interfaces.
 - Add reference counter in iommufd_fault_iopf_enable/disable() helpers.
 - Fix the return values of fault FD's read/write fops.
 - Add IOMMU_CAP_USER_IOASID_TABLE capability and check it before roll
   back getting attach_handle to RID.
 - Move the iopf respond queue from iommufd device to iommufd fault.
 - Disallow PRI enablement on SR-IOV VF devices.
 - Miscellaneous cleanup.

v5: https://lore.kernel.org/linux-iommu/20240430145710.68112-1-baolu.lu@linux.intel.com/
 - Remove attach handle reference count from the core. Drivers will now
   synchronize their use of handles and domain attach/detach.
 - Automatically respond to all outstanding faults in hwpt detach or
   replace paths.
 - Support getting a domain-type specific attach handle.
 - Reorganize the series by changing the patch order.
 - Miscellaneous cleanup.

v4: https://lore.kernel.org/linux-iommu/20240403011519.78512-1-baolu.lu@linux.intel.com/
 - Add the iommu domain attachment handle to replace the iopf-specific
   domain attachment interfaces introduced in the previous v3.
 - Replace the iommu_sva with iommu domain attachment handle.
 - Refine some fields in the fault and response message encoding
   according to feedback collected during v3 review period.
 - Refine and fix some problems in the fault FD implementation.
 - Miscellaneous cleanup.

v3: https://lore.kernel.org/linux-iommu/20240122073903.24406-1-baolu.lu@linux.intel.com/
 - Add iopf domain attach/detach/replace interfaces to manage the
   reference counters of hwpt and device, ensuring that both can only be
   destroyed after all outstanding IOPFs have been responded to.
 - Relocate the fault handling file descriptor from hwpt to a fault
   object to enable a single fault handling object to be utilized across
   multiple domains.
 - Miscellaneous cleanup and performance improvements.

v2: https://lore.kernel.org/linux-iommu/20231026024930.382898-1-baolu.lu@linux.intel.com/
 - Move all iommu refactoring patches into a separate series and discuss
   it in a different thread. The latest patch series [v6] is available at
   https://lore.kernel.org/linux-iommu/20230928042734.16134-1-baolu.lu@linux.intel.com/
 - We discussed the timeout of the pending page fault messages. We
   agreed that we shouldn't apply any timeout policy for the page fault
   handling in user space.
   https://lore.kernel.org/linux-iommu/20230616113232.GA84678@myrica/
 - Jason suggested that we adopt a simple file descriptor interface for
   reading and responding to I/O page requests, so that user space
   applications can improve performance using io_uring.
   https://lore.kernel.org/linux-iommu/ZJWjD1ajeem6pK3I@ziepe.ca/

v1: https://lore.kernel.org/linux-iommu/20230530053724.232765-1-baolu.lu@linux.intel.com/

Lu Baolu (10):
  iommu: Introduce domain attachment handle
  iommu: Remove sva handle list
  iommu: Add attach handle to struct iopf_group
  iommu: Extend domain attach group with handle support
  iommufd: Add fault and response message definitions
  iommufd: Add iommufd fault object
  iommufd: Fault-capable hwpt attach/detach/replace
  iommufd: Associate fault object with iommufd_hw_pgtable
  iommufd/selftest: Add IOPF support for mock device
  iommufd/selftest: Add coverage for IOPF test

 include/linux/iommu.h                         |   41 +-
 drivers/iommu/iommu-priv.h                    |   11 +
 drivers/iommu/iommufd/iommufd_private.h       |   80 ++++
 drivers/iommu/iommufd/iommufd_test.h          |    8 +
 include/uapi/linux/iommufd.h                  |  109 +++++
 tools/testing/selftests/iommu/iommufd_utils.h |   80 +++-
 drivers/dma/idxd/init.c                       |    2 +-
 drivers/iommu/io-pgfault.c                    |   63 +--
 drivers/iommu/iommu-sva.c                     |   42 +-
 drivers/iommu/iommu.c                         |  184 ++++++--
 drivers/iommu/iommufd/device.c                |    7 +-
 drivers/iommu/iommufd/fault.c                 |  433 ++++++++++++++++++
 drivers/iommu/iommufd/hw_pagetable.c          |   41 +-
 drivers/iommu/iommufd/main.c                  |    6 +
 drivers/iommu/iommufd/selftest.c              |   64 +++
 tools/testing/selftests/iommu/iommufd.c       |   18 +
 .../selftests/iommu/iommufd_fail_nth.c        |    2 +-
 drivers/iommu/iommufd/Makefile                |    1 +
 18 files changed, 1074 insertions(+), 118 deletions(-)
 create mode 100644 drivers/iommu/iommufd/fault.c

-- 
2.34.1