From: Mike Rapoport
To: Andrew Morton
Cc: Alexander Viro, Andy Lutomirski, Arnd Bergmann, Borislav Petkov,
        Catalin Marinas, Christopher Lameter, Dan Williams, Dave Hansen,
        David Hildenbrand, Elena Reshetova, "H. Peter Anvin", Ingo Molnar,
        James Bottomley, "Kirill A. Shutemov", Matthew Wilcox, Mark Rutland,
        Mike Rapoport, Mike Rapoport, Michael Kerrisk, Palmer Dabbelt,
        Paul Walmsley, Peter Zijlstra, Rick Edgecombe, Roman Gushchin,
        Shakeel Butt, Shuah Khan, Thomas Gleixner, Tycho Andersen,
        Will Deacon, linux-api@vger.kernel.org, linux-arch@vger.kernel.org,
        linux-arm-kernel@lists.infradead.org, linux-fsdevel@vger.kernel.org,
        linux-mm@kvack.org, linux-kernel@vger.kernel.org,
        linux-kselftest@vger.kernel.org, linux-nvdimm@lists.01.org,
        linux-riscv@lists.infradead.org, x86@kernel.org,
        Hagen Paul Pfeifer, Palmer Dabbelt
Subject: [PATCH v16 07/11] secretmem: use PMD-size pages to amortize direct map fragmentation
Date: Thu, 21 Jan 2021 14:27:19 +0200
Message-Id: <20210121122723.3446-8-rppt@kernel.org>
X-Mailer: git-send-email 2.28.0
In-Reply-To: <20210121122723.3446-1-rppt@kernel.org>
References: <20210121122723.3446-1-rppt@kernel.org>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

From: Mike Rapoport

Removing a PAGE_SIZE page from the direct map every time such a page is
allocated for a secret memory mapping will cause severe fragmentation of
the direct map. This fragmentation can be reduced by using PMD-size pages
as a pool for small pages for secret memory mappings.

Add a gen_pool per secretmem inode and lazily populate this pool with
PMD-size pages.

As pages allocated by secretmem become unmovable, use CMA to back large
page caches so that the page allocator won't be surprised by a failing
attempt to migrate these pages.

The CMA area used by secretmem is controlled by the "secretmem=" kernel
parameter. This allows explicit control over the memory available for
secretmem and provides an upper hard limit for secretmem consumption.

Signed-off-by: Mike Rapoport
Cc: Alexander Viro
Cc: Andy Lutomirski
Cc: Arnd Bergmann
Cc: Borislav Petkov
Cc: Catalin Marinas
Cc: Christopher Lameter
Cc: Dan Williams
Cc: Dave Hansen
Cc: David Hildenbrand
Cc: Elena Reshetova
Cc: Hagen Paul Pfeifer
Cc: "H. Peter Anvin"
Cc: Ingo Molnar
Cc: James Bottomley
Cc: "Kirill A. Shutemov"
Cc: Mark Rutland
Cc: Matthew Wilcox
Cc: Michael Kerrisk
Cc: Palmer Dabbelt
Cc: Palmer Dabbelt
Cc: Paul Walmsley
Cc: Peter Zijlstra
Cc: Rick Edgecombe
Cc: Roman Gushchin
Cc: Shakeel Butt
Cc: Shuah Khan
Cc: Thomas Gleixner
Cc: Tycho Andersen
Cc: Will Deacon
---
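For reference, below is a minimal, hypothetical userspace sketch of the
path this patch optimizes; it is not part of the patch. It assumes the
memfd_secret() syscall wired up by the rest of this series, and the
fallback syscall number (447) is only an illustrative assumption, not a
value defined here.

/*
 * Hypothetical example, not part of the patch: allocate a few secret
 * pages and touch them so the fault path draws PAGE_SIZE pages from a
 * PMD-size CMA chunk instead of fragmenting the direct map.
 */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef __NR_memfd_secret
#define __NR_memfd_secret 447   /* assumed number, see note above */
#endif

int main(void)
{
        size_t len = 4 * 4096;  /* a few PAGE_SIZE pages from one chunk */
        char *p;
        int fd;

        fd = syscall(__NR_memfd_secret, 0);
        if (fd < 0) {
                perror("memfd_secret");
                return 1;
        }

        /* size the mapping; pages are allocated lazily, at fault time */
        if (ftruncate(fd, len) < 0) {
                perror("ftruncate");
                return 1;
        }

        p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED) {
                perror("mmap");
                return 1;
        }

        /*
         * Each write faults in one PAGE_SIZE page; with this patch the
         * pages come from a per-inode gen_pool backed by PMD-size CMA
         * chunks rather than being dropped from the direct map one page
         * at a time.
         */
        memset(p, 0xa5, len);

        munmap(p, len);
        close(fd);
        return 0;
}

With this series, the kernel would be booted with something like
"secretmem=1G" to reserve the CMA area that backs the per-inode pools;
without that reservation the memfd_secret() call above fails with ENOMEM
per the check added in this patch.
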
 mm/Kconfig     |   2 +
 mm/secretmem.c | 175 +++++++++++++++++++++++++++++++++++++++++--------
 2 files changed, 150 insertions(+), 27 deletions(-)

diff --git a/mm/Kconfig b/mm/Kconfig
index 5f8243442f66..ec35bf406439 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -874,5 +874,7 @@ config KMAP_LOCAL
 
 config SECRETMEM
         def_bool ARCH_HAS_SET_DIRECT_MAP && !EMBEDDED
+        select GENERIC_ALLOCATOR
+        select CMA
 
 endmenu
diff --git a/mm/secretmem.c b/mm/secretmem.c
index 904351d12c33..469211c7cc3a 100644
--- a/mm/secretmem.c
+++ b/mm/secretmem.c
@@ -7,12 +7,15 @@
 
 #include
 #include
+#include
 #include
 #include
 #include
 #include
 #include
+#include
 #include
+#include
 #include
 #include
 #include
@@ -35,24 +38,94 @@
 #define SECRETMEM_FLAGS_MASK        SECRETMEM_MODE_MASK
 
 struct secretmem_ctx {
+        struct gen_pool *pool;
         unsigned int mode;
 };
 
-static struct page *secretmem_alloc_page(gfp_t gfp)
+static struct cma *secretmem_cma;
+
+static int secretmem_pool_increase(struct secretmem_ctx *ctx, gfp_t gfp)
 {
+        unsigned long nr_pages = (1 << PMD_PAGE_ORDER);
+        struct gen_pool *pool = ctx->pool;
+        unsigned long addr;
+        struct page *page;
+        int i, err;
+
+        page = cma_alloc(secretmem_cma, nr_pages, PMD_SIZE, gfp & __GFP_NOWARN);
+        if (!page)
+                return -ENOMEM;
+
         /*
-         * FIXME: use a cache of large pages to reduce the direct map
-         * fragmentation
+         * clear the data left from the previous user before dropping the
+         * pages from the direct map
          */
-        return alloc_page(gfp | __GFP_ZERO);
+        for (i = 0; i < nr_pages; i++)
+                clear_highpage(page + i);
+
+        err = set_direct_map_invalid_noflush(page, nr_pages);
+        if (err)
+                goto err_cma_release;
+
+        addr = (unsigned long)page_address(page);
+        err = gen_pool_add(pool, addr, PMD_SIZE, NUMA_NO_NODE);
+        if (err)
+                goto err_set_direct_map;
+
+        flush_tlb_kernel_range(addr, addr + PMD_SIZE);
+
+        return 0;
+
+err_set_direct_map:
+        /*
+         * If a split of PUD-size page was required, it already happened
+         * when we marked the pages invalid which guarantees that this call
+         * won't fail
+         */
+        set_direct_map_default_noflush(page, nr_pages);
+err_cma_release:
+        cma_release(secretmem_cma, page, nr_pages);
+        return err;
+}
+
+static void secretmem_free_page(struct secretmem_ctx *ctx, struct page *page)
+{
+        unsigned long addr = (unsigned long)page_address(page);
+        struct gen_pool *pool = ctx->pool;
+
+        gen_pool_free(pool, addr, PAGE_SIZE);
+}
+
+static struct page *secretmem_alloc_page(struct secretmem_ctx *ctx,
+                                         gfp_t gfp)
+{
+        struct gen_pool *pool = ctx->pool;
+        unsigned long addr;
+        struct page *page;
+        int err;
+
+        if (gen_pool_avail(pool) < PAGE_SIZE) {
+                err = secretmem_pool_increase(ctx, gfp);
+                if (err)
+                        return NULL;
+        }
+
+        addr = gen_pool_alloc(pool, PAGE_SIZE);
+        if (!addr)
+                return NULL;
+
+        page = virt_to_page(addr);
+        get_page(page);
+
+        return page;
 }
 
 static vm_fault_t secretmem_fault(struct vm_fault *vmf)
 {
+        struct secretmem_ctx *ctx = vmf->vma->vm_file->private_data;
         struct address_space *mapping = vmf->vma->vm_file->f_mapping;
         struct inode *inode = file_inode(vmf->vma->vm_file);
         pgoff_t offset = vmf->pgoff;
-        unsigned long addr;
         struct page *page;
         int err;
 
@@ -62,40 +135,25 @@ static vm_fault_t secretmem_fault(struct vm_fault *vmf)
 retry:
         page = find_lock_page(mapping, offset);
         if (!page) {
-                page = secretmem_alloc_page(vmf->gfp_mask);
+                page = secretmem_alloc_page(ctx, vmf->gfp_mask);
                 if (!page)
                         return VM_FAULT_OOM;
 
-                err = set_direct_map_invalid_noflush(page, 1);
-                if (err) {
-                        put_page(page);
-                        return vmf_error(err);
-                }
-
                 __SetPageUptodate(page);
                 err = add_to_page_cache(page, mapping, offset, vmf->gfp_mask);
                 if (unlikely(err)) {
+                        secretmem_free_page(ctx, page);
                         put_page(page);
                         if (err == -EEXIST)
                                 goto retry;
-                        goto err_restore_direct_map;
+                        return vmf_error(err);
                 }
 
-                addr = (unsigned long)page_address(page);
-                flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
+                set_page_private(page, (unsigned long)ctx);
         }
 
         vmf->page = page;
         return VM_FAULT_LOCKED;
-
-err_restore_direct_map:
-        /*
-         * If a split of large page was required, it already happened
-         * when we marked the page invalid which guarantees that this call
-         * won't fail
-         */
-        set_direct_map_default_noflush(page, 1);
-        return vmf_error(err);
 }
 
 static const struct vm_operations_struct secretmem_vm_ops = {
@@ -141,8 +199,9 @@ static int secretmem_migratepage(struct address_space *mapping,
 
 static void secretmem_freepage(struct page *page)
 {
-        set_direct_map_default_noflush(page, 1);
-        clear_highpage(page);
+        struct secretmem_ctx *ctx = (struct secretmem_ctx *)page_private(page);
+
+        secretmem_free_page(ctx, page);
 }
 
 static const struct address_space_operations secretmem_aops = {
@@ -177,13 +236,18 @@ static struct file *secretmem_file_create(unsigned long flags)
         if (!ctx)
                 goto err_free_inode;
 
+        ctx->pool = gen_pool_create(PAGE_SHIFT, NUMA_NO_NODE);
+        if (!ctx->pool)
+                goto err_free_ctx;
+
         file = alloc_file_pseudo(inode, secretmem_mnt, "secretmem",
                                  O_RDWR, &secretmem_fops);
         if (IS_ERR(file))
-                goto err_free_ctx;
+                goto err_free_pool;
 
         mapping_set_unevictable(inode->i_mapping);
 
+        inode->i_private = ctx;
         inode->i_mapping->private_data = ctx;
         inode->i_mapping->a_ops = &secretmem_aops;
 
@@ -197,6 +261,8 @@ static struct file *secretmem_file_create(unsigned long flags)
 
         return file;
 
+err_free_pool:
+        gen_pool_destroy(ctx->pool);
 err_free_ctx:
         kfree(ctx);
 err_free_inode:
@@ -215,6 +281,9 @@ SYSCALL_DEFINE1(memfd_secret, unsigned long, flags)
         if (flags & ~(SECRETMEM_FLAGS_MASK | O_CLOEXEC))
                 return -EINVAL;
 
+        if (!secretmem_cma)
+                return -ENOMEM;
+
         fd = get_unused_fd_flags(flags & O_CLOEXEC);
         if (fd < 0)
                 return fd;
@@ -235,11 +304,37 @@ SYSCALL_DEFINE1(memfd_secret, unsigned long, flags)
         return err;
 }
 
+static void secretmem_cleanup_chunk(struct gen_pool *pool,
+                                    struct gen_pool_chunk *chunk, void *data)
+{
+        unsigned long start = chunk->start_addr;
+        unsigned long end = chunk->end_addr;
+        struct page *page = virt_to_page(start);
+        unsigned long nr_pages = (end - start + 1) / PAGE_SIZE;
+        int i;
+
+        set_direct_map_default_noflush(page, nr_pages);
+
+        for (i = 0; i < nr_pages; i++)
+                clear_highpage(page + i);
+
+        cma_release(secretmem_cma, page, nr_pages);
+}
+
+static void secretmem_cleanup_pool(struct secretmem_ctx *ctx)
+{
+        struct gen_pool *pool = ctx->pool;
+
+        gen_pool_for_each_chunk(pool, secretmem_cleanup_chunk, ctx);
+        gen_pool_destroy(pool);
+}
+
 static void secretmem_evict_inode(struct inode *inode)
 {
         struct secretmem_ctx *ctx = inode->i_private;
 
         truncate_inode_pages_final(&inode->i_data);
+        secretmem_cleanup_pool(ctx);
         clear_inode(inode);
         kfree(ctx);
 }
@@ -276,3 +371,29 @@ static int secretmem_init(void)
         return ret;
 }
 fs_initcall(secretmem_init);
+
+static int __init secretmem_setup(char *str)
+{
+        phys_addr_t align = PMD_SIZE;
+        unsigned long reserved_size;
+        int err;
+
+        reserved_size = memparse(str, NULL);
+        if (!reserved_size)
+                return 0;
+
+        if (reserved_size * 2 > PUD_SIZE)
+                align = PUD_SIZE;
+
+        err = cma_declare_contiguous(0, reserved_size, 0, align, 0, false,
+                                     "secretmem", &secretmem_cma);
+        if (err) {
+                pr_err("failed to create CMA: %d\n", err);
+                return err;
+        }
+
+        pr_info("reserved %luM\n", reserved_size >> 20);
+
+        return 0;
+}
+__setup("secretmem=", secretmem_setup);
-- 
2.28.0