From: "xuxiaoyang (C)"
To: Alex Williamson
Cc: "xuxiaoyang (C)"
Subject: [PATCH v2] vfio iommu type1: Improve vfio_iommu_type1_pin_pages performance
Date: Sat, 21 Nov 2020 15:58:37 +0800
Message-ID: <60d22fc6-88d6-c7c2-90bd-1e8eccb1fdcc@huawei.com>
vfio_pin_pages() accepts an array of unrelated iova pfns and processes
each one to return its physical pfn. When handling a large array of
contiguous iovas, vfio_iommu_type1_pin_pages() is very inefficient
because it pins pages one at a time. Instead, we can split the iova pfn
array into contiguous ranges and pin each range in a single operation.
For example, the iova pfn array {1,5,6,7,9} is divided into three
groups, {1}, {5,6,7} and {9}; when processing {5,6,7}, the number of
calls to pin_user_pages_remote() is reduced from three to one. For a
single page, or a large array of discontiguous iovas, we still use
vfio_pin_page_external(), so that the refactoring does not cost
performance in those cases. (A standalone sketch of this grouping
appears after the patch.)

Signed-off-by: Xiaoyang Xu
---
v1 -> v2:
 * make vfio_iommu_type1_pin_contiguous_pages use vfio_pin_page_external
   to pin a single page when npage == 1
 * make vfio_pin_contiguous_pages_external use npage to mark consecutive
   pages dirty with a single bitmap_set(); simplify the unwind logic
 * remove unnecessary checks in vfio_get_contiguous_pages_length, put
   the least costly check first, and replace vfio_iova_get_vfio_pfn
   with vfio_find_vpfn

 drivers/vfio/vfio_iommu_type1.c | 231 ++++++++++++++++++++++++++++----
 1 file changed, 204 insertions(+), 27 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 67e827638995..080727b531c6 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -628,6 +628,196 @@ static int vfio_unpin_page_external(struct vfio_dma *dma, dma_addr_t iova,
 	return unlocked;
 }
 
+static int contiguous_vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr,
+				    int prot, long npage, unsigned long *phys_pfn)
+{
+	struct page **pages = NULL;
+	unsigned int flags = 0;
+	int i, ret;
+
+	pages = kvmalloc_array(npage, sizeof(struct page *), GFP_KERNEL);
+	if (!pages)
+		return -ENOMEM;
+
+	if (prot & IOMMU_WRITE)
+		flags |= FOLL_WRITE;
+
+	mmap_read_lock(mm);
+	ret = pin_user_pages_remote(mm, vaddr, npage, flags | FOLL_LONGTERM,
+				    pages, NULL, NULL);
+	mmap_read_unlock(mm);
+
+	for (i = 0; i < ret; i++)
+		*(phys_pfn + i) = page_to_pfn(pages[i]);
+
+	kvfree(pages);
+
+	return ret;
+}
+
+static int vfio_pin_contiguous_pages_external(struct vfio_iommu *iommu,
+					      struct vfio_dma *dma,
+					      unsigned long *user_pfn,
+					      int npage, unsigned long *phys_pfn,
+					      bool do_accounting)
+{
+	int ret, i, j, lock_acct = 0;
+	unsigned long remote_vaddr;
+	dma_addr_t iova;
+	struct mm_struct *mm;
+	struct vfio_pfn *vpfn;
+
+	mm = get_task_mm(dma->task);
+	if (!mm)
+		return -ENODEV;
+
+	iova = user_pfn[0] << PAGE_SHIFT;
+	remote_vaddr = dma->vaddr + iova - dma->iova;
+	ret = contiguous_vaddr_get_pfn(mm, remote_vaddr, dma->prot,
+				       npage, phys_pfn);
+	mmput(mm);
+	if (ret <= 0)
+		return ret;
+
+	npage = ret;
+	for (i = 0; i < npage; i++) {
+		iova = user_pfn[i] << PAGE_SHIFT;
+		ret = vfio_add_to_pfn_list(dma, iova, phys_pfn[i]);
+		if (ret)
+			goto unwind;
+
+		if (!is_invalid_reserved_pfn(phys_pfn[i]))
+			lock_acct++;
+	}
+
+	if (do_accounting) {
+		ret = vfio_lock_acct(dma, lock_acct, true);
+		if (ret) {
+			if (ret == -ENOMEM)
+				pr_warn("%s: Task %s (%d) RLIMIT_MEMLOCK (%ld) exceeded\n",
+					__func__, dma->task->comm, task_pid_nr(dma->task),
+					task_rlimit(dma->task, RLIMIT_MEMLOCK));
+			goto unwind;
+		}
+	}
+
+	if (iommu->dirty_page_tracking) {
+		unsigned long pgshift = __ffs(iommu->pgsize_bitmap);
+
+		/*
+		 * Bitmap populated with the smallest supported page
+		 * size
+		 */
+		bitmap_set(dma->bitmap,
+			   ((user_pfn[0] << PAGE_SHIFT) - dma->iova) >> pgshift, npage);
+	}
+
+	return i;
+unwind:
+	for (j = 0; j < npage; j++) {
+		if (j < i) {
+			iova = user_pfn[j] << PAGE_SHIFT;
+			vpfn = vfio_find_vpfn(dma, iova);
+			vfio_iova_put_vfio_pfn(dma, vpfn);
+		} else {
+			put_pfn(phys_pfn[j], dma->prot);
+		}
+
+		phys_pfn[j] = 0;
+	}
+
+	return ret;
+}
+
+static int vfio_iommu_type1_pin_contiguous_pages(struct vfio_iommu *iommu,
+						 struct vfio_dma *dma,
+						 unsigned long *user_pfn,
+						 int npage, unsigned long *phys_pfn,
+						 bool do_accounting)
+{
+	int ret = 0, i, j;
+	unsigned long remote_vaddr;
+	dma_addr_t iova;
+
+	if (npage == 1)
+		goto pin_single_page;
+
+	ret = vfio_pin_contiguous_pages_external(iommu, dma, user_pfn, npage,
+						 phys_pfn, do_accounting);
+	if (ret == npage)
+		return ret;
+
+	if (ret < 0)
+		ret = 0;
+
+pin_single_page:
+	for (i = ret; i < npage; i++) {
+		iova = user_pfn[i] << PAGE_SHIFT;
+		remote_vaddr = dma->vaddr + iova - dma->iova;
+
+		ret = vfio_pin_page_external(dma, remote_vaddr, &phys_pfn[i],
+					     do_accounting);
+		if (ret)
+			goto pin_unwind;
+
+		ret = vfio_add_to_pfn_list(dma, iova, phys_pfn[i]);
+		if (ret) {
+			if (put_pfn(phys_pfn[i], dma->prot) && do_accounting)
+				vfio_lock_acct(dma, -1, true);
+			goto pin_unwind;
+		}
+
+		if (iommu->dirty_page_tracking) {
+			unsigned long pgshift = __ffs(iommu->pgsize_bitmap);
+
+			/*
+			 * Bitmap populated with the smallest supported page
+			 * size
+			 */
+			bitmap_set(dma->bitmap,
+				   (iova - dma->iova) >> pgshift, 1);
+		}
+	}
+
+	return i;
+
+pin_unwind:
+	phys_pfn[i] = 0;
+	for (j = 0; j < i; j++) {
+		iova = user_pfn[j] << PAGE_SHIFT;
+		vfio_unpin_page_external(dma, iova, do_accounting);
+		phys_pfn[j] = 0;
+	}
+
+	return ret;
+}
+
+static int vfio_get_contiguous_pages_length(struct vfio_dma *dma,
+					    unsigned long *user_pfn, int npage)
+{
+	int i;
+	dma_addr_t iova = user_pfn[0] << PAGE_SHIFT;
+	struct vfio_pfn *vpfn;
+
+	if (npage <= 1)
+		return npage;
+
+	for (i = 1; i < npage; i++) {
+		if (user_pfn[i] != user_pfn[0] + i)
+			break;
+
+		iova = user_pfn[i] << PAGE_SHIFT;
+		if (iova >= dma->iova + dma->size ||
+		    iova + PAGE_SIZE <= dma->iova)
+			break;
+
+		vpfn = vfio_find_vpfn(dma, iova);
+		if (vpfn)
+			break;
+	}
+	return i;
+}
+
 static int vfio_iommu_type1_pin_pages(void *iommu_data,
 				      struct iommu_group *iommu_group,
 				      unsigned long *user_pfn,
@@ -637,9 +827,9 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,
 	struct vfio_iommu *iommu = iommu_data;
 	struct vfio_group *group;
 	int i, j, ret;
-	unsigned long remote_vaddr;
 	struct vfio_dma *dma;
 	bool do_accounting;
+	int contiguous_npage;
 
 	if (!iommu || !user_pfn || !phys_pfn)
 		return -EINVAL;
@@ -663,7 +853,7 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,
 	 */
 	do_accounting = !IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu);
 
-	for (i = 0; i < npage; i++) {
+	for (i = 0; i < npage; i += contiguous_npage) {
 		dma_addr_t iova;
 		struct vfio_pfn *vpfn;
 
@@ -682,31 +872,18 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,
 		vpfn = vfio_iova_get_vfio_pfn(dma, iova);
 		if (vpfn) {
 			phys_pfn[i] = vpfn->pfn;
-			continue;
-		}
-
-		remote_vaddr = dma->vaddr + (iova - dma->iova);
-		ret = vfio_pin_page_external(dma, remote_vaddr, &phys_pfn[i],
-					     do_accounting);
-		if (ret)
-			goto pin_unwind;
-
-		ret = vfio_add_to_pfn_list(dma, iova, phys_pfn[i]);
-		if (ret) {
-			if (put_pfn(phys_pfn[i], dma->prot) && do_accounting)
-				vfio_lock_acct(dma, -1, true);
-			goto pin_unwind;
-		}
-
-		if (iommu->dirty_page_tracking) {
-			unsigned long pgshift = __ffs(iommu->pgsize_bitmap);
-
-			/*
-			 * Bitmap populated with the smallest supported page
-			 * size
-			 */
-			bitmap_set(dma->bitmap,
-				   (iova - dma->iova) >> pgshift, 1);
+			contiguous_npage = 1;
+		} else {
+			ret = vfio_get_contiguous_pages_length(dma,
+					&user_pfn[i], npage - i);
+			if (ret < 0)
+				goto pin_unwind;
+
+			ret = vfio_iommu_type1_pin_contiguous_pages(iommu,
+					dma, &user_pfn[i], ret, &phys_pfn[i], do_accounting);
+			if (ret < 0)
+				goto pin_unwind;
+			contiguous_npage = ret;
 		}
 	}
 	ret = i;
-- 
2.19.1
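
[Editorial note, not part of the patch: below is a minimal standalone
userspace sketch of the grouping idea the commit message describes.
split_contiguous() is a hypothetical helper that mirrors only the
pfn-adjacency check of vfio_get_contiguous_pages_length(); the real
kernel function additionally bounds the run by the vfio_dma iova range
and stops at already-pinned pfns via vfio_find_vpfn().]

#include <stdio.h>

/* Return the length of the leading run of consecutive pfns. */
static int split_contiguous(const unsigned long *pfn, int npage)
{
	int i;

	for (i = 1; i < npage; i++)
		if (pfn[i] != pfn[0] + i)
			break;
	return i;
}

int main(void)
{
	unsigned long pfn[] = { 1, 5, 6, 7, 9 };
	int npage = sizeof(pfn) / sizeof(pfn[0]);
	int i = 0;

	/* Walk the array group by group, as the patched loop does. */
	while (i < npage) {
		int n = split_contiguous(&pfn[i], npage - i);

		printf("group of %d starting at pfn %lu\n", n, pfn[i]);
		i += n;	/* one pin_user_pages_remote() call per group */
	}
	return 0;
}

Compiled with any C compiler, this prints groups of length 1, 3 and 1
for {1,5,6,7,9}, matching the {1}, {5,6,7}, {9} example in the commit
message; in the patched kernel path each such group corresponds to a
single pin_user_pages_remote() call.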