Date: Mon, 26 Jun 2023 11:10:22 +0800
Subject: Re: [RFC PATCHES 00/17] IOMMUFD: Deliver IO page faults to user space
From: Baolu Lu
To: Nicolin Chen
Cc: baolu.lu@linux.intel.com, Jason Gunthorpe, Kevin Tian, Joerg Roedel,
 Will Deacon, Robin Murphy, Jean-Philippe Brucker, Yi Liu, Jacob Pan,
 iommu@lists.linux.dev, linux-kselftest@vger.kernel.org,
 virtualization@lists.linux-foundation.org, linux-kernel@vger.kernel.org
References: <20230530053724.232765-1-baolu.lu@linux.intel.com>

On 6/26/23 3:21 AM, Nicolin Chen wrote:
> On Sun, Jun 25, 2023 at 02:30:46PM +0800, Baolu Lu wrote:
>> External email: Use caution opening links or attachments
>>
>> On 2023/5/31 2:50, Nicolin Chen wrote:
>>> Hi Baolu,
>>>
>>> On Tue, May 30, 2023 at 01:37:07PM +0800, Lu Baolu wrote:
>>>
>>>> This series implements the functionality of delivering IO page
>>>> faults to user space through the IOMMUFD framework.
>>>> The use case is nested
>>>> translation, where modern IOMMU hardware supports two-stage
>>>> translation tables. The second-stage translation table is managed
>>>> by the host VMM, while the first-stage translation table is owned
>>>> by user space. Hence, any IO page fault that occurs on the
>>>> first-stage page table should be delivered to user space and
>>>> handled there. User space should then respond with the page fault
>>>> handling result to the device, top-down, through the IOMMUFD
>>>> response uAPI.
>>>>
>>>> User space indicates its capability of handling IO page faults by
>>>> setting the user HWPT allocation flag
>>>> IOMMU_HWPT_ALLOC_FLAGS_IOPF_CAPABLE. IOMMUFD will then set up its
>>>> infrastructure for page fault delivery. Together with the
>>>> iopf-capable flag, user space should also provide an eventfd on
>>>> which it will listen for any bottom-up page fault messages.
>>>>
>>>> On a successful return of the allocation of an iopf-capable HWPT,
>>>> a fault fd will be returned. User space can open and read fault
>>>> messages from it once the eventfd is signaled.
>>>
>>> I think that, whether the guest has an IOPF capability or not,
>>> the host should always forward any stage-1 fault/error back to
>>> the guest. Yet, the implementation of this series builds on the
>>> IOPF framework, which doesn't report IOMMU_FAULT_DMA_UNRECOV.
>>>
>>> And I have my doubts about using the IOPF framework with that
>>> IOMMU_PAGE_RESP_ASYNC flag: the point of using the IOPF framework
>>> is its bottom-half workqueue, because a page response could take
>>> a long cycle. But adding that flag feels like we don't really
>>> need the bottom-half workqueue, i.e. losing the point of using
>>> the IOPF framework, IMHO.
>>>
>>> Combining the two facts above, I wonder if we really need to
>>> go through the IOPF framework; can't we just register a user
>>> fault handler in the iommufd directly upon a valid event_fd?
>>
>> Agreed. We should avoid the workqueue in the sva iopf framework.
>> Perhaps we could go ahead with the code below? It will be registered
>> to the device with iommu_register_device_fault_handler() in the
>> IOMMU_DEV_FEAT_IOPF enabling path. Un-registering happens in the
>> disable path, of course.
>
> Well, for a virtualization use case, I still think it should
> be registered in iommufd.

Emm.. you suggest that iommufd calls
iommu_register_device_fault_handler() to register its own page fault
handler, right?

I have a different opinion. iommu_register_device_fault_handler() is
called to register a fault handler for a device, so it should be called
or initiated by a device driver. The iommufd only needs to install a
per-domain io page fault handler.

I am considering a use case on the Intel platform; perhaps it's similar
on other platforms. An SIOV-capable device can support host SVA and
assigning mediated devices to user space at the same time. Both host
SVA and mediated devices require IOPF, so there will be multiple places
where a page fault handler needs to be registered.

> Having a device without an IOPF/PRI
> capability, a guest OS should receive some faults too, if that
> device causes a translation failure.

Yes. DMA faults are also a consideration. But I would like to have them
supported in a separate series. As I explained in the previous reply,
we also need to consider the software nested translation case.

> And for a vSVA use case, the IOMMU_DEV_FEAT_IOPF feature only
> gets enabled in the guest VM right? How could the host enable
> the IOMMU_DEV_FEAT_IOPF to trigger this handler?

As mentioned above, this should be initiated by the kernel device
driver, vfio or a possible mediated device driver.

Best regards,
baolu