From: "Song Bao Hua (Barry Song)"
To: Greg Kroah-Hartman
CC: "Wangzhou (B)", Zhangfei Gao, Arnd Bergmann, "linux-accelerators@lists.ozlabs.org", "linux-kernel@vger.kernel.org", "chensihang (A)"
Subject: RE: [PATCH] uacce: Add uacce_ctrl misc device
Thread-Topic: [PATCH] uacce: Add uacce_ctrl misc device
Thread-Index:
 AQHW79ZyYq1sk039MU+9J2rOLgF+UaoxTpyAgACJurD//5BAgIAAiKPA
Date: Thu, 21 Jan 2021 11:52:57 +0000
Message-ID: <4ebea7d714ed4c5a8cee9291101b0a9b@hisilicon.com>
References: <1611220154-90232-1-git-send-email-wangzhou1@hisilicon.com>
X-Mailing-List: linux-kernel@vger.kernel.org

> -----Original Message-----
> From: Greg Kroah-Hartman [mailto:gregkh@linuxfoundation.org]
> Sent: Friday, January 22, 2021 12:19 AM
> To: Song Bao Hua (Barry Song)
> Cc: Wangzhou (B); Zhangfei Gao; Arnd Bergmann;
> linux-accelerators@lists.ozlabs.org; linux-kernel@vger.kernel.org;
> chensihang (A)
> Subject: Re: [PATCH] uacce: Add uacce_ctrl misc device
>
> On Thu, Jan 21, 2021 at 10:18:24AM +0000, Song Bao Hua (Barry Song) wrote:
> >
> > > -----Original Message-----
> > > From: Greg Kroah-Hartman [mailto:gregkh@linuxfoundation.org]
> > > Sent: Thursday, January 21, 2021 10:46 PM
> > > To: Wangzhou (B)
> > > Cc: Zhangfei Gao; Arnd Bergmann;
> > > linux-accelerators@lists.ozlabs.org; linux-kernel@vger.kernel.org;
> > > chensihang (A)
> > > Subject: Re: [PATCH] uacce: Add uacce_ctrl misc device
> > >
> > > On Thu, Jan 21, 2021 at 05:09:14PM +0800, Zhou Wang wrote:
> > > > When IO page fault happens, DMA performance will be affected. Pin user page
> > > > can avoid IO page fault, this patch introduces a new char device named
> > > > /dev/uacce_ctrl to help to maintain pin/unpin pages. User space can do
> > > > pin/unpin pages by ioctls of an open file of /dev/uacce_ctrl, all pinned
> > > > pages under one file will be unpinned in file release process.
> > >
> > > Also, what are you really trying to do here?
> > > If you need to mess with
> > > memory pages, why can't the existing memory apis work properly for you?
> > > Please work with the linux-mm developers to resolve the issue using the
> > > standard apis and not creating a one-off char device node for this type
> > > of thing.
> >
> > Basically the purpose is implementing a pinned memory pool for userspace
> > DMA to achieve better performance by removing io page fault.
>
> And what could possibly go wrong with that :)

I think we resolved this concern when uacce was merged :-)
Uacce is based on SVA, so devices access userspace memory under strict
permission control.

>
> > I really like this can be done in generic mm code. Unfortunately there is no
> > this standard API in kernel to support userspace pin. Right now, various
> > subsystems depend on the ioctl of /dev/ to implement the pin, for example,
> > v4l2, gpu, infiniband, media etc.
> >
> > I feel it is extremely hard to sell a standard mpin() API like mlock()
> > for this stage as mm could hardly buy this. And it will require
> > huge changes in kernel.
>
> Why? This is what mlock() is for, why can't you use it?

mlock() only guarantees that memory won't be swapped out; it doesn't
guarantee that the memory won't move. alloc_pages() can trigger memory
compaction, and CMA, NUMA balancing, huge pages etc. can all move
mlock()-ed pages. We would still see many I/O page faults in an
mlock()-ed area.

>
> > We need a way to manage what pages are pinned by process and ensure the
> > pages can be unpinned while the process is killed abnormally. otherwise,
> > memory gets leaked.
>
> Can't mlock() handle that? It works on the process that called it.
>
> > file_operations release() is a good entry for this kind of things. In
> > this way, we don't have to maintain the pinned page set in task_struct
> > and unpin them during exit().
> >
> > If there is anything to make it better by doing this in a driver.
> > I would believe we could have a generic misc driver for pin like
> > vmw_balloon.c for balloon. The driver doesn't have to bind with uacce.
> >
> > In this way, the pinned memory pool implementation in userspace doesn't
> > need to depend on a specific uacce driver any more.
>
> Please work with the mm developers to get them to agree with this type
> of thing, as well as the dma developers, both of which you didn't cc: on
> this patch :(

Yep.

>
> Remember, you are creating a new api for Linux that goes around existing
> syscalls, but is in reality, a new syscall, so why not just make it a
> new syscall?

The difficulty with a new syscall would be how to record which pages are
pinned for a process. For mlock() this is much easier, because it changes
the VMA; we can hardly change the VMA for a pin.
On the other hand, if the implementation is done in a driver, with
file_operations, we can record the pinned pages in the private data of an
opened file.

>
> thanks,
>
> greg k-h

Thanks
Barry
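
To illustrate the last point, here is a minimal userspace sketch of the bookkeeping idea only: each open file carries its own list of pinned ranges, and a release step drains that list however the file came to be closed. It is not the uacce_ctrl patch and it calls no kernel API; struct ctrl_file, struct pin_record and the ctrl_*() helpers are hypothetical names made up for this example, and the "pin" here just records an address range rather than pinning anything.

```c
/*
 * Userspace model only. All names here (ctrl_file, pin_record, ctrl_*)
 * are hypothetical; this is not the uacce_ctrl patch and it does not
 * pin real memory.
 */
#include <assert.h>
#include <stdlib.h>

/* One recorded range, as a driver might remember it per open file. */
struct pin_record {
	unsigned long addr;
	unsigned long size;
	struct pin_record *next;
};

/* Stand-in for the private data attached to an open file. */
struct ctrl_file {
	struct pin_record *pins;	/* ranges pinned through this file */
	unsigned long pinned_bytes;	/* running total, for accounting */
};

/* Models open(): start with an empty pinned set (NULL on allocation failure). */
static struct ctrl_file *ctrl_open(void)
{
	return calloc(1, sizeof(struct ctrl_file));
}

/* Models a "pin" ioctl: remember the range so release can find it later. */
static int ctrl_pin(struct ctrl_file *f, unsigned long addr, unsigned long size)
{
	struct pin_record *r = malloc(sizeof(*r));

	if (!r)
		return -1;
	r->addr = addr;
	r->size = size;
	r->next = f->pins;
	f->pins = r;
	f->pinned_bytes += size;
	return 0;
}

/* Models an "unpin" ioctl: drop the record whose start address matches. */
static int ctrl_unpin(struct ctrl_file *f, unsigned long addr)
{
	struct pin_record **pp;

	for (pp = &f->pins; *pp; pp = &(*pp)->next) {
		if ((*pp)->addr == addr) {
			struct pin_record *r = *pp;

			*pp = r->next;
			f->pinned_bytes -= r->size;
			free(r);
			return 0;
		}
	}
	return -1;	/* nothing recorded at that address */
}

/*
 * Models release(): drains whatever is still recorded for this file and
 * returns how many ranges had been left pinned.
 */
static unsigned int ctrl_release(struct ctrl_file *f)
{
	unsigned int released = 0;

	while (f->pins) {
		struct pin_record *r = f->pins;

		f->pins = r->next;
		f->pinned_bytes -= r->size;
		free(r);
		released++;
	}
	assert(f->pinned_bytes == 0);
	free(f);
	return released;
}
```

In this model, a caller that opens a file, pins two ranges, unpins one and then closes the file gets the remaining range cleaned up by ctrl_release() without having to enumerate it, which is the property the file-based approach is after for processes that die abnormally.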