From: "Song Bao Hua (Barry Song)"
To: Jason Gunthorpe, David Hildenbrand
Cc: "Wangzhou (B)", linux-kernel@vger.kernel.org,
    iommu@lists.linux-foundation.org, linux-mm@kvack.org,
"linux-arm-kernel@lists.infradead.org" , "linux-api@vger.kernel.org" , Andrew Morton , Alexander Viro , "gregkh@linuxfoundation.org" , "kevin.tian@intel.com" , "jean-philippe@linaro.org" , "eric.auger@redhat.com" , "Liguozhu (Kenneth)" , "zhangfei.gao@linaro.org" , "chensihang (A)" Subject: RE: [RFC PATCH v3 1/2] mempinfd: Add new syscall to provide memory pin Thread-Topic: [RFC PATCH v3 1/2] mempinfd: Add new syscall to provide memory pin Thread-Index: AQHW/SrsWWMRpilf2UC1Pz29QqsBVqpNZGQAgACtCgCAAKKukA== Date: Mon, 8 Feb 2021 20:35:31 +0000 Message-ID: <0dca000a6cd34d8183062466ba7d6eaf@hisilicon.com> References: <1612685884-19514-1-git-send-email-wangzhou1@hisilicon.com> <1612685884-19514-2-git-send-email-wangzhou1@hisilicon.com> <20210208183348.GV4718@ziepe.ca> In-Reply-To: <20210208183348.GV4718@ziepe.ca> Accept-Language: en-GB, en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [10.126.200.92] Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 8BIT MIME-Version: 1.0 X-CFilter-Loop: Reflected Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org > -----Original Message----- > From: Jason Gunthorpe [mailto:jgg@ziepe.ca] > Sent: Tuesday, February 9, 2021 7:34 AM > To: David Hildenbrand > Cc: Wangzhou (B) ; linux-kernel@vger.kernel.org; > iommu@lists.linux-foundation.org; linux-mm@kvack.org; > linux-arm-kernel@lists.infradead.org; linux-api@vger.kernel.org; Andrew > Morton ; Alexander Viro ; > gregkh@linuxfoundation.org; Song Bao Hua (Barry Song) > ; kevin.tian@intel.com; > jean-philippe@linaro.org; eric.auger@redhat.com; Liguozhu (Kenneth) > ; zhangfei.gao@linaro.org; chensihang (A) > > Subject: Re: [RFC PATCH v3 1/2] mempinfd: Add new syscall to provide memory > pin > > On Mon, Feb 08, 2021 at 09:14:28AM +0100, David Hildenbrand wrote: > > > People are constantly struggling with the effects of long term pinnings > > under user space control, like we already have with vfio and RDMA. > > > > And here we are, adding yet another, easier way to mess with core MM in the > > same way. This feels like a step backwards to me. > > Yes, this seems like a very poor candidate to be a system call in this > format. Much too narrow, poorly specified, and possibly security > implications to allow any process whatsoever to pin memory. > > I keep encouraging people to explore a standard shared SVA interface > that can cover all these topics (and no, uaccel is not that > interface), that seems much more natural. > > I still haven't seen an explanation why DMA is so special here, > migration and so forth jitter the CPU too, environments that care > about jitter have to turn this stuff off. This paper has a good explanation: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7482091 mainly because page fault can go directly to the CPU and we have many CPUs. But IO Page Faults go a different way, thus mean much higher latency 3-80x slower than page fault: events in hardware queue -> Interrupts -> cpu processing page fault -> return events to iommu/device -> continue I/O. Copied from the paper: If the IOMMU's page table walker fails to find the desired translation in the page table, it sends an ATS response to the GPU notifying it of this failure. This in turn corresponds to a page fault. In response, the GPU sends another request to the IOMMU called a Peripheral Page Request (PPR). The IOMMU places this request in a memory-mapped queue and raises an interrupt on the CPU. 
Copied from the paper:

If the IOMMU's page table walker fails to find the desired translation
in the page table, it sends an ATS response to the GPU notifying it of
this failure. This in turn corresponds to a page fault. In response, the
GPU sends another request to the IOMMU called a Peripheral Page Request
(PPR). The IOMMU places this request in a memory-mapped queue and raises
an interrupt on the CPU. Multiple PPR requests can be queued before the
CPU is interrupted. The OS must have a suitable IOMMU driver to process
this interrupt and the queued PPR requests. In Linux, while in an
interrupt context, the driver pulls PPR requests from the queue and
places them in a work-queue for later processing. Presumably this design
decision was made to minimize the time spent executing in an interrupt
context, where lower priority interrupts would be disabled. At a later
time, an OS worker-thread calls back into the driver to process page
fault requests in the work-queue. Once the requests are serviced, the
driver notifies the IOMMU. In turn, the IOMMU notifies the GPU. The GPU
then sends another ATS request to retry the translation for the original
faulting address.

Comparison with CPU: On the CPU, a hardware exception is raised on a
page fault, which immediately switches to the OS. In most cases in
Linux, this routine services the page fault directly, instead of queuing
it for later processing. Contrast this with a page fault from an
accelerator, where the IOMMU has to interrupt the CPU to request service
on its behalf, and also note the several back-and-forth messages between
the accelerator, the IOMMU, and the CPU. Furthermore, page faults on the
CPU are generally handled one at a time on the CPU, while for the GPU
they are batched by the IOMMU and OS work-queue mechanism.

>
> Jason

Thanks
Barry
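PS: for anyone following along, this is roughly what a "long term pin"
means at the kernel level today, in the style of what vfio and RDMA
already do and what mempinfd would expose more widely. A minimal sketch
with a made-up entry point, sva_pin_example(); pin_user_pages_fast(),
unpin_user_pages() and FOLL_LONGTERM are the real interfaces:

#include <linux/mm.h>
#include <linux/slab.h>

/* Sketch only; error handling trimmed, name is hypothetical. */
static int sva_pin_example(unsigned long uaddr, int npages)
{
        struct page **pages;
        int pinned;

        pages = kvcalloc(npages, sizeof(*pages), GFP_KERNEL);
        if (!pages)
                return -ENOMEM;

        /*
         * FOLL_LONGTERM tells the MM this pin may be held indefinitely,
         * so the pages are first migrated out of CMA/ZONE_MOVABLE;
         * this is the core-MM cost being debated in this thread.
         */
        pinned = pin_user_pages_fast(uaddr, npages,
                                     FOLL_WRITE | FOLL_LONGTERM, pages);
        if (pinned < 0) {
                kvfree(pages);
                return pinned;
        }

        /* ... map the pinned pages into the device/IOMMU here ... */

        unpin_user_pages(pages, pinned); /* release when the device is done */
        kvfree(pages);
        return 0;
}

Whether any unprivileged process should be able to trigger this path,
and how the pins get accounted, is exactly the objection raised above.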