Subject: Re: [RFC PATCH v1 0/4] vfio: Add IOPF support for VFIO passthrough
To: Alex Williamson
CC: Cornelia Huck, Jean-Philippe Brucker, Eric Auger, Lu Baolu, Kevin Tian
References: <20210125090402.1429-1-lushenming@huawei.com> <20210129155730.3a1d49c5@omen.home.shazbot.org>
From: Shenming Lu
Message-ID: <44a8b643-6920-b2b5-a593-2942b5ea4ee7@huawei.com>
Date: Sat, 30 Jan 2021 17:30:58 +0800
In-Reply-To: <20210129155730.3a1d49c5@omen.home.shazbot.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On 2021/1/30 6:57, Alex Williamson wrote:
> On Mon, 25 Jan 2021 17:03:58 +0800
> Shenming Lu wrote:
>
>> Hi,
>>
>> The static pinning and mapping problem in VFIO and possible solutions
>> have been discussed a lot [1, 2]. One of the solutions is to add I/O
>> page fault support for VFIO devices. Different from those relatively
>> complicated software approaches such as presenting a vIOMMU that provides
>> the DMA buffer information (might include para-virtualized optimizations),
>> IOPF mainly depends on the hardware faulting capability, such as the PCIe
>> PRI extension or Arm SMMU stall model. What's more, the IOPF support in
>> the IOMMU driver is being implemented in SVA [3]. So should we consider
>> adding IOPF support for VFIO passthrough based on the IOPF part of SVA at
>> present?
>>
>> We have implemented a basic demo only for one stage of translation (GPA
>> -> HPA in virtualization, note that it can be configured at either stage),
>> and tested on Hisilicon Kunpeng920 board. The nested mode is more complicated
>> since VFIO only handles the second stage page faults (same as the non-nested
>> case), while the first stage page faults need to be further delivered to
>> the guest, which is being implemented in [4] on ARM. My thought on this
>> is to report the page faults to VFIO regardless of the stage in which they
>> occurred (try to carry the stage information), and handle them according to
>> the configured mode in VFIO. Or the IOMMU driver might evolve to support more...
>>
>> Might TODO:
>> - Optimize the faulting path, and measure the performance (it might still
>> be a big issue).
>> - Add support for PRI.
>> - Add an MMU notifier to avoid pinning.
>> - Add support for the nested mode.
>> ...
>>
>> Any comments and suggestions are very welcome. :-)
>
> I expect performance to be pretty bad here, the lookup involved per
> fault is excessive.

We might consider prepinning more pages per fault as a further optimization.

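To make the prepinning idea above a bit more concrete, here is a toy userspace model in Python. It is only a sketch and not kernel code: the names `PAGE_SIZE`, `prepin_range` and `count_faults`, the 4 KiB page size, and the aligned-window policy are all assumptions made for illustration. It pins an aligned window of pages around each faulting IOVA and counts how many faults a given access pattern would take.

```python
PAGE_SIZE = 4096  # assumed page size, for illustration only


def prepin_range(fault_iova, window_pages, region_start, region_end):
    """Return the [start, end) byte range to pin for one fault.

    The window is aligned down to a multiple of window_pages pages and
    clamped to the DMA mapping [region_start, region_end).
    """
    if not region_start <= fault_iova < region_end:
        raise ValueError("fault address is outside the mapping")
    window_bytes = window_pages * PAGE_SIZE
    start = (fault_iova // window_bytes) * window_bytes
    end = start + window_bytes
    return max(start, region_start), min(end, region_end)


def count_faults(iovas, window_pages, region_start, region_end):
    """Count faults for a sequence of accesses (page-aligned region assumed)."""
    pinned = set()  # page frame numbers that are already pinned
    faults = 0
    for iova in iovas:
        if iova // PAGE_SIZE in pinned:
            continue  # already pinned and mapped, no fault
        faults += 1
        start, end = prepin_range(iova, window_pages, region_start, region_end)
        pinned.update(range(start // PAGE_SIZE, end // PAGE_SIZE))
    return faults
```

In this model, a sequential walk over 64 pages takes 64 faults with a one-page window and 4 faults with a 16-page window. The trade-off is that larger windows pin memory the device may never touch, which works against the goal of avoiding static pinning.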
> There are cases where a user is not going to be
> willing to have a slow ramp up of performance for their devices as they
> fault in pages, so we might need to consider making this
> configurable through the vfio interface.

Yeah, makes sense. I will try to implement this: maybe add an ioctl called
VFIO_IOMMU_ENABLE_IOPF for the Type1 VFIO IOMMU...

> Our page mapping also only
> grows here, should mappings expire or do we need a least recently
> mapped tracker to avoid exceeding the user's locked memory limit? How
> does a user know what to set for a locked memory limit?

Yeah, we can add an LRU (mapped) tracker to release pages when a memory
limit is exceeded, maybe with a thread that checks this periodically. As for
the memory limit, maybe we could let the user choose a level (10% (default),
30%, 50%, 70%, or unlimited, relative to the total user memory, i.e. the
mapping size) via the VFIO_IOMMU_ENABLE_IOPF ioctl...

> The behavior
> here would lead to cases where an idle system might be ok, but as soon
> as load increases with more inflight DMA, we start seeing
> "unpredictable" I/O faults from the user perspective.

"Unpredictable" I/O faults? We might see more problems after more testing...

Thanks,
Shenming

> Seems like there
> are lots of outstanding considerations and I'd also like to hear from
> the SVA folks about how this meshes with their work. Thanks,
>
> Alex
>
> .
>
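As a rough illustration of the least-recently-mapped tracker and percentage limit discussed above, here is a toy userspace model in Python. It is a sketch under stated assumptions, not a proposed implementation: the class `MappedPageTracker`, its `map_page` method and the `limit_percent` parameter are hypothetical names, and eviction here happens synchronously at map time rather than in the periodic thread suggested in the thread.

```python
from collections import OrderedDict


class MappedPageTracker:
    """Toy model: track mapped pages and evict the oldest past a limit."""

    def __init__(self, total_pages, limit_percent=10):
        # limit_percent=None models the "unlimited" level.
        if limit_percent is None:
            self.limit = None
        else:
            self.limit = max(1, total_pages * limit_percent // 100)
        self.pages = OrderedDict()  # pfn -> None, least recently mapped first

    def map_page(self, pfn):
        """Record pfn as mapped and return the list of pfns evicted."""
        if pfn in self.pages:
            # Already tracked: refresh its position and evict nothing.
            self.pages.move_to_end(pfn)
            return []
        self.pages[pfn] = None
        evicted = []
        while self.limit is not None and len(self.pages) > self.limit:
            old_pfn, _ = self.pages.popitem(last=False)
            evicted.append(old_pfn)  # a real version would unmap and unpin here
        return evicted
```

With `total_pages=100` and the default 10% level, the tracker holds at most 10 pages and evicts the oldest one when an 11th is mapped. Note that an evicted page may still be the target of in-flight DMA, in which case the device would simply fault on it again; this is one way the "unpredictable" faults Alex mentions could show up under load.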