From: Leon Romanovsky
To: Christoph Hellwig, Doug Ledford, Jason Gunthorpe
Cc: Maor Gottlieb, linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org
Subject: [PATCH rdma-next 4/4] RDMA/umem: Move to allocate SG table from pages
Date: Thu, 3 Sep 2020 15:18:53 +0300
Message-Id: <20200903121853.1145976-5-leon@kernel.org>
X-Mailer: git-send-email 2.26.2
In-Reply-To: <20200903121853.1145976-1-leon@kernel.org>
References: <20200903121853.1145976-1-leon@kernel.org>

From: Maor Gottlieb

Remove the implementation of ib_umem_add_sg_table and instead call
sg_alloc_table_append, which already has the logic to merge contiguous
pages.

Besides removing duplicated functionality, this reduces the memory
consumption of the SG table significantly. Prior to this patch, the SG
table was allocated in advance with one entry per page, regardless of
whether contiguous pages could share an entry. On a system using 2MB
huge pages, the table therefore held 512x more entries than needed.
E.g. for a 100GB memory registration:

	        Number of entries    Size
	Before  26214400             600.0MB
	After   51200                1.2MB
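The before/after sizes can be sanity-checked as follows (the 24-byte
per-entry size and 4KB base page size used here are assumptions for
illustration, chosen because they reproduce the numbers in the table):

	100GB / 4KB pages = 26214400 entries; 26214400 * 24B ~= 600.0MB
	100GB / 2MB runs  =    51200 entries;    51200 * 24B ~=   1.2MB

i.e. merging contiguous pages shrinks the table by the 512x ratio
between the 2MB huge page size and the 4KB base page size.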
Signed-off-by: Maor Gottlieb
Signed-off-by: Leon Romanovsky
---
 drivers/infiniband/core/umem.c | 93 +++++----------------------------
 1 file changed, 14 insertions(+), 79 deletions(-)

diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index be889e99cfac..9eb946f665ec 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -62,73 +62,6 @@ static void __ib_umem_release(struct ib_device *dev, struct ib_umem *umem, int d
 	sg_free_table(&umem->sg_head);
 }
 
-/* ib_umem_add_sg_table - Add N contiguous pages to scatter table
- *
- * sg: current scatterlist entry
- * page_list: array of npage struct page pointers
- * npages: number of pages in page_list
- * max_seg_sz: maximum segment size in bytes
- * nents: [out] number of entries in the scatterlist
- *
- * Return new end of scatterlist
- */
-static struct scatterlist *ib_umem_add_sg_table(struct scatterlist *sg,
-						struct page **page_list,
-						unsigned long npages,
-						unsigned int max_seg_sz,
-						int *nents)
-{
-	unsigned long first_pfn;
-	unsigned long i = 0;
-	bool update_cur_sg = false;
-	bool first = !sg_page(sg);
-
-	/* Check if new page_list is contiguous with end of previous page_list.
-	 * sg->length here is a multiple of PAGE_SIZE and sg->offset is 0.
-	 */
-	if (!first && (page_to_pfn(sg_page(sg)) + (sg->length >> PAGE_SHIFT) ==
-		       page_to_pfn(page_list[0])))
-		update_cur_sg = true;
-
-	while (i != npages) {
-		unsigned long len;
-		struct page *first_page = page_list[i];
-
-		first_pfn = page_to_pfn(first_page);
-
-		/* Compute the number of contiguous pages we have starting
-		 * at i
-		 */
-		for (len = 0; i != npages &&
-			      first_pfn + len == page_to_pfn(page_list[i]) &&
-			      len < (max_seg_sz >> PAGE_SHIFT);
-		     len++)
-			i++;
-
-		/* Squash N contiguous pages from page_list into current sge */
-		if (update_cur_sg) {
-			if ((max_seg_sz - sg->length) >= (len << PAGE_SHIFT)) {
-				sg_set_page(sg, sg_page(sg),
-					    sg->length + (len << PAGE_SHIFT),
-					    0);
-				update_cur_sg = false;
-				continue;
-			}
-			update_cur_sg = false;
-		}
-
-		/* Squash N contiguous pages into next sge or first sge */
-		if (!first)
-			sg = sg_next(sg);
-
-		(*nents)++;
-		sg_set_page(sg, first_page, len << PAGE_SHIFT, 0);
-		first = false;
-	}
-
-	return sg;
-}
-
 /**
  * ib_umem_find_best_pgsz - Find best HW page size to use for this MR
  *
@@ -205,7 +138,8 @@ static struct ib_umem *__ib_umem_get(struct ib_device *device,
 	struct mm_struct *mm;
 	unsigned long npages;
 	int ret;
-	struct scatterlist *sg;
+	struct scatterlist *sg = NULL;
+	struct sg_append append = {};
 	unsigned int gup_flags = FOLL_WRITE;
 
 	/*
@@ -255,15 +189,9 @@ static struct ib_umem *__ib_umem_get(struct ib_device *device,
 
 	cur_base = addr & PAGE_MASK;
 
-	ret = sg_alloc_table(&umem->sg_head, npages, GFP_KERNEL);
-	if (ret)
-		goto vma;
-
 	if (!umem->writable)
 		gup_flags |= FOLL_FORCE;
 
-	sg = umem->sg_head.sgl;
-
 	while (npages) {
 		cond_resched();
 		ret = pin_user_pages_fast(cur_base,
@@ -276,10 +204,18 @@ static struct ib_umem *__ib_umem_get(struct ib_device *device,
 
 		cur_base += ret * PAGE_SIZE;
 		npages -= ret;
-
-		sg = ib_umem_add_sg_table(sg, page_list, ret,
-			dma_get_max_seg_size(device->dma_device),
-			&umem->sg_nents);
+		append.left_pages = npages;
+		append.prv = sg;
+		sg = sg_alloc_table_append(&umem->sg_head, page_list, ret, 0,
+					   ret << PAGE_SHIFT,
+					   dma_get_max_seg_size(device->dma_device),
+					   GFP_KERNEL, &append);
+		umem->sg_nents = umem->sg_head.nents;
+		if (IS_ERR(sg)) {
+			unpin_user_pages_dirty_lock(page_list, ret, 0);
+			ret = PTR_ERR(sg);
+			goto umem_release;
+		}
 	}
 
 	sg_mark_end(sg);
 
@@ -301,7 +237,6 @@ static struct ib_umem *__ib_umem_get(struct ib_device *device,
 
 umem_release:
 	__ib_umem_release(device, umem, 0);
-vma:
 	atomic64_sub(ib_umem_num_pages(umem), &mm->pinned_vm);
 out:
 	free_page((unsigned long) page_list);
-- 
2.26.2
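As an illustrative aside, the contiguous-page coalescing that the
commit message credits to sg_alloc_table_append can be sketched in
plain userspace C as below. Everything in the sketch (the names, the
PAGE_SHIFT value, and the 128MB segment cap) is a hypothetical
stand-in for demonstration, not the kernel interface:

	/* Illustrative userspace sketch of contiguous-page coalescing:
	 * fold runs of adjacent page frames into single segments, capped
	 * at a maximum segment size.  Hypothetical names; not kernel API.
	 */
	#include <stdio.h>

	#define PAGE_SHIFT	12		/* assume 4KB pages */
	#define MAX_SEG_SZ	(1UL << 27)	/* assume a 128MB segment cap */

	struct seg {
		unsigned long first_pfn;
		unsigned long len;		/* segment length in bytes */
	};

	/* Merge contiguous pfns into segments; returns the segment count. */
	static unsigned long coalesce(const unsigned long *pfns,
				      unsigned long npages, struct seg *segs)
	{
		unsigned long nsegs = 0, i = 0;

		while (i < npages) {
			unsigned long first = pfns[i];
			unsigned long run = 1;

			/* Extend the run while frames stay contiguous and
			 * the segment stays under the size cap.
			 */
			while (i + run < npages &&
			       pfns[i + run] == first + run &&
			       ((run + 1) << PAGE_SHIFT) <= MAX_SEG_SZ)
				run++;

			segs[nsegs].first_pfn = first;
			segs[nsegs].len = run << PAGE_SHIFT;
			nsegs++;
			i += run;
		}
		return nsegs;
	}

	int main(void)
	{
		/* One 2MB huge page (512 contiguous 4KB frames) plus one
		 * stray, non-contiguous page.
		 */
		unsigned long pfns[513];
		struct seg segs[513];
		unsigned long i, n;

		for (i = 0; i < 512; i++)
			pfns[i] = 1000 + i;
		pfns[512] = 9999;

		n = coalesce(pfns, 513, segs);
		printf("513 pages -> %lu segments\n", n);	/* prints 2 */
		return 0;
	}

Compiled and run, this prints "513 pages -> 2 segments": the 512
contiguous frames of one 2MB huge page collapse into a single segment,
which is the same 512x entry reduction the table above reports.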