From: "Song Bao Hua (Barry Song)"
To: Matthew Wilcox
CC: "Wangzhou (B)", "linux-kernel@vger.kernel.org",
    "iommu@lists.linux-foundation.org", "linux-mm@kvack.org",
    "linux-arm-kernel@lists.infradead.org", "linux-api@vger.kernel.org",
    Andrew Morton, Alexander Viro, "gregkh@linuxfoundation.org",
    "jgg@ziepe.ca", "kevin.tian@intel.com", "jean-philippe@linaro.org",
    "eric.auger@redhat.com", "Liguozhu (Kenneth)",
    "zhangfei.gao@linaro.org", "chensihang (A)"
Subject: RE: [RFC PATCH v3 1/2] mempinfd: Add new syscall to provide memory pin
Date: Mon, 8 Feb 2021 02:27:06 +0000
References: <1612685884-19514-1-git-send-email-wangzhou1@hisilicon.com>
    <1612685884-19514-2-git-send-email-wangzhou1@hisilicon.com>
    <20210207213409.GL308988@casper.infradead.org>
    <20210208013056.GM308988@casper.infradead.org>
In-Reply-To: <20210208013056.GM308988@casper.infradead.org>
X-Mailing-List: linux-kernel@vger.kernel.org

> -----Original Message-----
> From: owner-linux-mm@kvack.org [mailto:owner-linux-mm@kvack.org] On Behalf Of
> Matthew Wilcox
> Sent: Monday, February 8, 2021 2:31 PM
> To: Song Bao Hua (Barry Song)
> Cc: Wangzhou (B); linux-kernel@vger.kernel.org;
> iommu@lists.linux-foundation.org; linux-mm@kvack.org;
> linux-arm-kernel@lists.infradead.org; linux-api@vger.kernel.org; Andrew
> Morton; Alexander Viro;
> gregkh@linuxfoundation.org; jgg@ziepe.ca; kevin.tian@intel.com;
> jean-philippe@linaro.org; eric.auger@redhat.com; Liguozhu (Kenneth);
> zhangfei.gao@linaro.org; chensihang (A)
> Subject: Re: [RFC PATCH v3 1/2] mempinfd: Add new syscall to provide memory
> pin
>
> On Sun, Feb 07, 2021 at 10:24:28PM +0000, Song Bao Hua (Barry Song) wrote:
> > > > In high-performance I/O cases, accelerators might want to perform
> > > > I/O on a memory without IO page faults which can result in dramatically
> > > > increased latency. Current memory related APIs could not achieve this
> > > > requirement, e.g.
> > > > mlock can only avoid memory to swap to backup device,
> > > > page migration can still trigger IO page fault.
> > >
> > > Well ... we have two requirements. The application wants to not take
> > > page faults. The system wants to move the application to a different
> > > NUMA node in order to optimise overall performance. Why should the
> > > application's desires take precedence over the kernel's desires? And why
> > > should it be done this way rather than by the sysadmin using numactl to
> > > lock the application to a particular node?
> >
> > NUMA balancer is just one of many reasons for page migration. Even one
> > simple alloc_pages() can cause memory migration in just single NUMA
> > node or UMA system.
> >
> > The other reasons for page migration include but are not limited to:
> > * memory move due to CMA
> > * memory move due to huge pages creation
> >
> > Hardly we can ask users to disable the COMPACTION, CMA and Huge Page
> > in the whole system.
>
> You're dodging the question. Should the CMA allocation fail because
> another application is using SVA?
>
> I would say no.

I would say no as well.

While the IOMMU is enabled, CMA has almost only one user: the IOMMU
driver. Other drivers rely on the IOMMU to use non-contiguous memory,
even though they still call dma_alloc_coherent().

In the IOMMU driver, dma_alloc_coherent() is called during
initialization and there are no new allocations afterwards, so it
wouldn't have a runtime impact on SVA performance. Even if there are
new allocations, CMA will fall back to general alloc_pages(), and IOMMU
drivers mostly allocate small amounts of memory for command queues.

So I would say general compound pages and huge pages, especially
transparent huge pages, are bigger concerns than CMA for internal page
migration within one NUMA node.

Unlike CMA, general alloc_pages() can get memory by moving pages other
than those pinned.

And there is no guarantee that we can always bind the memory of SVA
applications to a single NUMA node, so NUMA balancing is still a
concern.

But I agree we need a way to make CMA succeed while the userspace pages
are pinned. Since pinning is already widespread across drivers, I
assume there is a way to handle this. Otherwise, APIs like
V4L2_MEMORY_USERPTR[1] could make CMA fail, as there is no guarantee
that userspace will allocate unmovable memory and no guarantee that the
fallback path, alloc_pages(), can succeed when allocating large amounts
of memory.

Will investigate more.

> The application using SVA should take the one-time
> performance hit from having its memory moved around.

Sometimes I also feel SVA is doomed to suffer a performance impact from
page migration. But we are still trying to extend its use cases to
high-performance I/O.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/media/v4l2-core/videobuf-dma-sg.c

Thanks
Barry
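
As a rough userspace illustration of the mlock limitation quoted at the
top of this thread, the sketch below locks one anonymous page and then
asks mincore() whether it is resident. The helper name
locked_page_is_resident() is made up for this example and is not part
of the mempinfd patch. The point is only that mlock() promises
residency; nothing in this interface promises that the physical page
stays where it is.

```c
#define _DEFAULT_SOURCE	/* for mincore() and MAP_ANONYMOUS on glibc */
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper for illustration only.
 * Returns 1 if an mlock()ed page is resident, 0 if it is not,
 * and -1 if any of the system calls failed.
 */
int locked_page_is_resident(void)
{
	long page = sysconf(_SC_PAGESIZE);
	unsigned char vec = 0;
	size_t len;
	char *buf;
	int resident;

	if (page <= 0)
		return -1;
	len = (size_t)page;

	buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return -1;

	/* mlock() faults the page in and keeps it from being swapped out. */
	if (mlock(buf, len) != 0) {
		perror("mlock");
		munmap(buf, len);
		return -1;
	}

	/* mincore() reports residency only: bit 0 of vec is set if resident. */
	if (mincore(buf, len, &vec) != 0) {
		perror("mincore");
		munlock(buf, len);
		munmap(buf, len);
		return -1;
	}
	resident = vec & 1;

	/*
	 * Note what was NOT checked: the physical location of the page.
	 * A locked page can still be migrated (compaction, NUMA balancing,
	 * CMA), which is the case the thread is concerned with.
	 */
	munlock(buf, len);
	munmap(buf, len);
	return resident;
}
```

Calling the helper from a small main() and printing its return value is
enough to try it. It may return -1 where RLIMIT_MEMLOCK or the sandbox
does not allow locking even a single page.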