Subject: Re: [PATCH v5 2/9] mm: Add an apply_to_pfn_range interface
To: Christoph Hellwig
Cc: dri-devel@lists.freedesktop.org, linux-graphics-maintainer@vmware.com,
    pv-drivers@vmware.com, linux-kernel@vger.kernel.org, nadav.amit@gmail.com,
    Thomas Hellstrom, Andrew Morton, Matthew Wilcox, Will Deacon,
    Peter Zijlstra, Rik van Riel, Minchan Kim, Michal Hocko, Huang Ying,
    Souptick Joarder, Jérôme Glisse, linux-mm@kvack.org, Ralph Campbell
References: <20190612064243.55340-1-thellstrom@vmwopensource.org>
    <20190612064243.55340-3-thellstrom@vmwopensource.org>
    <20190612121604.GB719@infradead.org>
From: Thomas Hellström (VMware)
Organization: VMware Inc.
Message-ID: <8f5a5b25-e21f-43f2-a4dd-a50debfd1287@vmwopensource.org>
Date: Wed, 12 Jun 2019 14:26:45 +0200
In-Reply-To: <20190612121604.GB719@infradead.org>

On 6/12/19 2:16 PM, Christoph Hellwig wrote:
> On Wed, Jun 12, 2019 at 08:42:36AM +0200, Thomas Hellström (VMware) wrote:
>> From: Thomas Hellstrom
>>
>> This is basically apply_to_page_range with added functionality:
>> Allocating missing parts of the page table becomes optional, which
>> means that the function can be guaranteed not to error if allocation
>> is disabled. Also, the closure struct and callback function are passed
>> differently, more in line with how things are done elsewhere.
>>
>> Finally, we keep apply_to_page_range as a wrapper around
>> apply_to_pfn_range.
>>
>> The reason for not using the page-walk code is that we want to perform
>> the page-walk on vmas pointing to an address space without requiring the
>> mmap_sem to be held, rather than on vmas belonging to a process with the
>> mmap_sem held.
>>
>> Notable changes since RFC:
>> Don't export apply_to_pfn_range.
>>
>> Cc: Andrew Morton
>> Cc: Matthew Wilcox
>> Cc: Will Deacon
>> Cc: Peter Zijlstra
>> Cc: Rik van Riel
>> Cc: Minchan Kim
>> Cc: Michal Hocko
>> Cc: Huang Ying
>> Cc: Souptick Joarder
>> Cc: "Jérôme Glisse"
>> Cc: linux-mm@kvack.org
>> Cc: linux-kernel@vger.kernel.org
>>
>> Signed-off-by: Thomas Hellstrom
>> Reviewed-by: Ralph Campbell #v1
>> ---
>>  include/linux/mm.h |  10 ++++
>>  mm/memory.c        | 135 ++++++++++++++++++++++++++++++++++-----------
>>  2 files changed, 113 insertions(+), 32 deletions(-)
>>
>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>> index 0e8834ac32b7..3d06ce2a64af 100644
>> --- a/include/linux/mm.h
>> +++ b/include/linux/mm.h
>> @@ -2675,6 +2675,16 @@ typedef int (*pte_fn_t)(pte_t *pte, pgtable_t token, unsigned long addr,
>>  extern int apply_to_page_range(struct mm_struct *mm, unsigned long address,
>>  			       unsigned long size, pte_fn_t fn, void *data);
>>
>> +struct pfn_range_apply;
>> +typedef int (*pter_fn_t)(pte_t *pte, pgtable_t token, unsigned long addr,
>> +			 struct pfn_range_apply *closure);
>> +struct pfn_range_apply {
>> +	struct mm_struct *mm;
>> +	pter_fn_t ptefn;
>> +	unsigned int alloc;
>> +};
>> +extern int apply_to_pfn_range(struct pfn_range_apply *closure,
>> +			      unsigned long address, unsigned long size);
>>
>>  #ifdef CONFIG_PAGE_POISONING
>>  extern bool page_poisoning_enabled(void);
>> diff --git a/mm/memory.c b/mm/memory.c
>> index 168f546af1ad..462aa47f8878 100644
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -2032,18 +2032,17 @@ int vm_iomap_memory(struct vm_area_struct *vma, phys_addr_t start, unsigned long
>>  }
>>  EXPORT_SYMBOL(vm_iomap_memory);
>>
>> -static int apply_to_pte_range(struct mm_struct *mm, pmd_t *pmd,
>> -			      unsigned long addr, unsigned long end,
>> -			      pte_fn_t fn, void *data)
>> +static int apply_to_pte_range(struct pfn_range_apply *closure, pmd_t *pmd,
>> +			      unsigned long addr, unsigned long end)
>>  {
>>  	pte_t *pte;
>>  	int err;
>>  	pgtable_t token;
>>  	spinlock_t *uninitialized_var(ptl);
>>
>> -	pte = (mm == &init_mm) ?
>> +	pte = (closure->mm == &init_mm) ?
>>  		pte_alloc_kernel(pmd, addr) :
>> -		pte_alloc_map_lock(mm, pmd, addr, &ptl);
>> +		pte_alloc_map_lock(closure->mm, pmd, addr, &ptl);
>>  	if (!pte)
>>  		return -ENOMEM;
>>
>> @@ -2054,86 +2053,109 @@ static int apply_to_pte_range(struct mm_struct *mm, pmd_t *pmd,
>>  	token = pmd_pgtable(*pmd);
>>
>>  	do {
>> -		err = fn(pte++, token, addr, data);
>> +		err = closure->ptefn(pte++, token, addr, closure);
>>  		if (err)
>>  			break;
>>  	} while (addr += PAGE_SIZE, addr != end);
>>
>>  	arch_leave_lazy_mmu_mode();
>>
>> -	if (mm != &init_mm)
>> +	if (closure->mm != &init_mm)
>>  		pte_unmap_unlock(pte-1, ptl);
>>  	return err;
>>  }
>>
>> -static int apply_to_pmd_range(struct mm_struct *mm, pud_t *pud,
>> -			      unsigned long addr, unsigned long end,
>> -			      pte_fn_t fn, void *data)
>> +static int apply_to_pmd_range(struct pfn_range_apply *closure, pud_t *pud,
>> +			      unsigned long addr, unsigned long end)
>>  {
>>  	pmd_t *pmd;
>>  	unsigned long next;
>> -	int err;
>> +	int err = 0;
>>
>>  	BUG_ON(pud_huge(*pud));
>>
>> -	pmd = pmd_alloc(mm, pud, addr);
>> +	pmd = pmd_alloc(closure->mm, pud, addr);
>>  	if (!pmd)
>>  		return -ENOMEM;
>> +
>>  	do {
>>  		next = pmd_addr_end(addr, end);
>> -		err = apply_to_pte_range(mm, pmd, addr, next, fn, data);
>> +		if (!closure->alloc && pmd_none_or_clear_bad(pmd))
>> +			continue;
>> +		err = apply_to_pte_range(closure, pmd, addr, next);
>>  		if (err)
>>  			break;
>>  	} while (pmd++, addr = next, addr != end);
>>  	return err;
>>  }
>>
>> -static int apply_to_pud_range(struct mm_struct *mm, p4d_t *p4d,
>> -			      unsigned long addr, unsigned long end,
>> -			      pte_fn_t fn, void *data)
>> +static int apply_to_pud_range(struct pfn_range_apply *closure, p4d_t *p4d,
>> +			      unsigned long addr, unsigned long end)
>>  {
>>  	pud_t *pud;
>>  	unsigned long next;
>> -	int err;
>> +	int err = 0;
>>
>> -	pud = pud_alloc(mm, p4d, addr);
>> +	pud = pud_alloc(closure->mm, p4d, addr);
>>  	if (!pud)
>>  		return -ENOMEM;
>> +
>>  	do {
>>  		next = pud_addr_end(addr, end);
>> -		err = apply_to_pmd_range(mm, pud, addr, next, fn, data);
>> +		if (!closure->alloc && pud_none_or_clear_bad(pud))
>> +			continue;
>> +		err = apply_to_pmd_range(closure, pud, addr, next);
>>  		if (err)
>>  			break;
>>  	} while (pud++, addr = next, addr != end);
>>  	return err;
>>  }
>>
>> -static int apply_to_p4d_range(struct mm_struct *mm, pgd_t *pgd,
>> -			      unsigned long addr, unsigned long end,
>> -			      pte_fn_t fn, void *data)
>> +static int apply_to_p4d_range(struct pfn_range_apply *closure, pgd_t *pgd,
>> +			      unsigned long addr, unsigned long end)
>>  {
>>  	p4d_t *p4d;
>>  	unsigned long next;
>> -	int err;
>> +	int err = 0;
>>
>> -	p4d = p4d_alloc(mm, pgd, addr);
>> +	p4d = p4d_alloc(closure->mm, pgd, addr);
>>  	if (!p4d)
>>  		return -ENOMEM;
>> +
>>  	do {
>>  		next = p4d_addr_end(addr, end);
>> -		err = apply_to_pud_range(mm, p4d, addr, next, fn, data);
>> +		if (!closure->alloc && p4d_none_or_clear_bad(p4d))
>> +			continue;
>> +		err = apply_to_pud_range(closure, p4d, addr, next);
>>  		if (err)
>>  			break;
>>  	} while (p4d++, addr = next, addr != end);
>>  	return err;
>>  }
>>
>> -/*
>> - * Scan a region of virtual memory, filling in page tables as necessary
>> - * and calling a provided function on each leaf page table.
>> +/**
>> + * apply_to_pfn_range - Scan a region of virtual memory, calling a provided
>> + * function on each leaf page table entry
>> + * @closure: Details about how to scan and what function to apply
>> + * @addr: Start virtual address
>> + * @size: Size of the region
>> + *
>> + * If @closure->alloc is set to 1, the function will fill in the page table
>> + * as necessary. Otherwise it will skip non-present parts.
>> + * Note: The caller must ensure that the range does not contain huge pages.
>> + * The caller must also assure that the proper mmu_notifier functions are
>> + * called before and after the call to apply_to_pfn_range.
>> + *
>> + * WARNING: Do not use this function unless you know exactly what you are
>> + * doing. It is lacking support for huge pages and transparent huge pages.
>> + *
>> + * Return: Zero on success. If the provided function returns a non-zero status,
>> + * the page table walk will terminate and that status will be returned.
>> + * If @closure->alloc is set to 1, then this function may also return memory
>> + * allocation errors arising from allocating page table memory.
>>   */
>> -int apply_to_page_range(struct mm_struct *mm, unsigned long addr,
>> -			unsigned long size, pte_fn_t fn, void *data)
>> +int apply_to_pfn_range(struct pfn_range_apply *closure,
>> +		       unsigned long addr, unsigned long size)
>>  {
>>  	pgd_t *pgd;
>>  	unsigned long next;
>> @@ -2143,16 +2165,65 @@ int apply_to_page_range(struct mm_struct *mm, unsigned long addr,
>>  	if (WARN_ON(addr >= end))
>>  		return -EINVAL;
>>
>> -	pgd = pgd_offset(mm, addr);
>> +	pgd = pgd_offset(closure->mm, addr);
>>  	do {
>>  		next = pgd_addr_end(addr, end);
>> -		err = apply_to_p4d_range(mm, pgd, addr, next, fn, data);
>> +		if (!closure->alloc && pgd_none_or_clear_bad(pgd))
>> +			continue;
>> +		err = apply_to_p4d_range(closure, pgd, addr, next);
>>  		if (err)
>>  			break;
>>  	} while (pgd++, addr = next, addr != end);
>>
>>  	return err;
>>  }
>> +
>> +/**
>> + * struct page_range_apply - Closure structure for apply_to_page_range()
>> + * @pter: The base closure structure we derive from
>> + * @fn: The leaf pte function to call
>> + * @data: The leaf pte function closure
>> + */
>> +struct page_range_apply {
>> +	struct pfn_range_apply pter;
>> +	pte_fn_t fn;
>> +	void *data;
>> +};
>> +
>> +/*
>> + * Callback wrapper to enable use of apply_to_pfn_range for
>> + * the apply_to_page_range interface
>> + */
>> +static int apply_to_page_range_wrapper(pte_t *pte, pgtable_t token,
>> +				       unsigned long addr,
>> +				       struct pfn_range_apply *pter)
>> +{
>> +	struct page_range_apply *pra =
>> +		container_of(pter, typeof(*pra), pter);
>> +
>> +	return pra->fn(pte, token, addr, pra->data);
>> +}
>> +
>> +/*
>> + * Scan a region of virtual memory, filling in page tables as necessary
>> + * and calling a provided function on each leaf page table.
>> + *
>> + * WARNING: Do not use this function unless you know exactly what you are
>> + * doing. It is lacking support for huge pages and transparent huge pages.
>> + */
>> +int apply_to_page_range(struct mm_struct *mm, unsigned long addr,
>> +			unsigned long size, pte_fn_t fn, void *data)
>> +{
>> +	struct page_range_apply pra = {
>> +		.pter = {.mm = mm,
>> +			 .alloc = 1,
>> +			 .ptefn = apply_to_page_range_wrapper },
>> +		.fn = fn,
>> +		.data = data
>> +	};
>> +
>> +	return apply_to_pfn_range(&pra.pter, addr, size);
>> +}
>>
>>  EXPORT_SYMBOL_GPL(apply_to_page_range);
>
> Actually - did you look into converting our two hand full of
> apply_to_page_range callers to your new scheme? It seems like that
> might actually not be to bad and avoid various layers of wrappers.

Yes, I had that in mind once this landed and got some serious testing.

/Thomas