Subject: Re: [PATCHv2] mm/gup: Cache dev_pagemap while pinning pages
From: Dan Williams
Date: Thu, 11 Oct 2018 16:24:02 -0700
To: Keith Busch
Cc: Linux MM, Linux Kernel Mailing List, "Kirill A. Shutemov",
    Dave Hansen, Andrew Morton
In-Reply-To: <20181011175542.13045-1-keith.busch@intel.com>
References: <20181011175542.13045-1-keith.busch@intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Oct 11, 2018 at 11:00 AM Keith Busch wrote:
>
> Getting pages from ZONE_DEVICE memory needs to check the backing device's
> live-ness, which is tracked in the device's dev_pagemap metadata. This
> metadata is stored in a radix tree and looking it up adds measurable
> software overhead.
>
> This patch avoids repeating this relatively costly operation when
> dev_pagemap is used by caching the last dev_pagemap while getting user
> pages. The gup_benchmark kernel self test reports this reduces time to
> get user pages to as low as 1/3 of the previous time.
>
> Cc: Kirill Shutemov
> Cc: Dave Hansen
> Cc: Dan Williams
> Signed-off-by: Keith Busch

Other than the two comments below, this looks good to me:

Reviewed-by: Dan Williams

[..]
> diff --git a/mm/gup.c b/mm/gup.c
> index 1abc8b4afff6..d2700dff6f66 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
[..]
> @@ -431,7 +430,22 @@ struct page *follow_page_mask(struct vm_area_struct *vma,
>                 return no_page_table(vma, flags);
>         }
>
> -       return follow_p4d_mask(vma, address, pgd, flags, page_mask);
> +       return follow_p4d_mask(vma, address, pgd, flags, ctx);
> +}
> +
> +struct page *follow_page(struct vm_area_struct *vma, unsigned long address,
> +                        unsigned int foll_flags)
> +{
> +       struct page *page;
> +       struct follow_page_context ctx = {
> +               .pgmap = NULL,
> +               .page_mask = 0,
> +       };

You don't need to initialize all the members. It is defined that if you
initialize at least one member, all remaining members are set to zero, so
you should be able to write "= { 0 }".

> +
> +       page = follow_page_mask(vma, address, foll_flags, &ctx);
> +       if (ctx.pgmap)
> +               put_dev_pagemap(ctx.pgmap);
> +       return page;
> }
>
> static int get_gate_page(struct mm_struct *mm, unsigned long address,
> @@ -659,9 +673,9 @@ static long __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
>                 unsigned int gup_flags, struct page **pages,
>                 struct vm_area_struct **vmas, int *nonblocking)
> {
> -       long i = 0;
> -       unsigned int page_mask;
> +       long ret = 0, i = 0;
>         struct vm_area_struct *vma = NULL;
> +       struct follow_page_context ctx = {};

Does this have defined behavior? I would feel better with "= { 0 }" to be
explicit.

>
>         if (!nr_pages)
>                 return 0;
> @@ -691,12 +705,14 @@ static long __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
>                                 pages ? &pages[i] : NULL);
>                         if (ret)
>                                 return i ? : ret;
> -                       page_mask = 0;
> +                       ctx.page_mask = 0;
>                         goto next_page;
>                 }
>
> -               if (!vma || check_vma_flags(vma, gup_flags))
> -                       return i ? : -EFAULT;
> +               if (!vma || check_vma_flags(vma, gup_flags)) {
> +                       ret = -EFAULT;
> +                       goto out;
> +               }
>                 if (is_vm_hugetlb_page(vma)) {
>                         i = follow_hugetlb_page(mm, vma, pages, vmas,
>                                         &start, &nr_pages, i,
> @@ -709,23 +725,26 @@
>                  * If we have a pending SIGKILL, don't keep faulting pages and
>                  * potentially allocating memory.
>                  */
> -               if (unlikely(fatal_signal_pending(current)))
> -                       return i ? i : -ERESTARTSYS;
> +               if (unlikely(fatal_signal_pending(current))) {
> +                       ret = -ERESTARTSYS;
> +                       goto out;
> +               }
>                 cond_resched();
> -               page = follow_page_mask(vma, start, foll_flags, &page_mask);
> +
> +               page = follow_page_mask(vma, start, foll_flags, &ctx);
>                 if (!page) {
> -                       int ret;
>                         ret = faultin_page(tsk, vma, start, &foll_flags,
>                                         nonblocking);
>                         switch (ret) {
>                         case 0:
>                                 goto retry;
> +                       case -EBUSY:
> +                               ret = 0;
> +                               /* FALLTHRU */
>                         case -EFAULT:
>                         case -ENOMEM:
>                         case -EHWPOISON:
> -                               return i ? i : ret;
> -                       case -EBUSY:
> -                               return i;
> +                               goto out;
>                         case -ENOENT:
>                                 goto next_page;
>                         }
> @@ -737,27 +756,31 @@
>                          */
>                         goto next_page;
>                 } else if (IS_ERR(page)) {
> -                       return i ? i : PTR_ERR(page);
> +                       ret = PTR_ERR(page);
> +                       goto out;
>                 }
>                 if (pages) {
>                         pages[i] = page;
>                         flush_anon_page(vma, page, start);
>                         flush_dcache_page(page);
> -                       page_mask = 0;
> +                       ctx.page_mask = 0;
>                 }
> next_page:
>                 if (vmas) {
>                         vmas[i] = vma;
> -                       page_mask = 0;
> +                       ctx.page_mask = 0;
>                 }
> -               page_increm = 1 + (~(start >> PAGE_SHIFT) & page_mask);
> +               page_increm = 1 + (~(start >> PAGE_SHIFT) & ctx.page_mask);
>                 if (page_increm > nr_pages)
>                         page_increm = nr_pages;
>                 i += page_increm;
>                 start += page_increm * PAGE_SIZE;
>                 nr_pages -= page_increm;
>         } while (nr_pages);
> -       return i;
> +out:
> +       if (ctx.pgmap)
> +               put_dev_pagemap(ctx.pgmap);
> +       return i ? i : ret;
> }
>
> static bool vma_permits_fault(struct vm_area_struct *vma,
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 533f9b00147d..d2b510fe5156 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -852,11 +852,10 @@ static void touch_pmd(struct vm_area_struct *vma, unsigned long addr,
> }
>
> struct page *follow_devmap_pmd(struct vm_area_struct *vma, unsigned long addr,
> -               pmd_t *pmd, int flags)
> +               pmd_t *pmd, int flags, struct dev_pagemap **pgmap)
> {
>         unsigned long pfn = pmd_pfn(*pmd);
>         struct mm_struct *mm = vma->vm_mm;
> -       struct dev_pagemap *pgmap;
>         struct page *page;
>
>         assert_spin_locked(pmd_lockptr(mm, pmd));
> @@ -886,12 +885,11 @@ struct page *follow_devmap_pmd(struct vm_area_struct *vma,
>                 return ERR_PTR(-EEXIST);
>
>         pfn += (addr & ~PMD_MASK) >> PAGE_SHIFT;
> -       pgmap = get_dev_pagemap(pfn, NULL);
> -       if (!pgmap)
> +       *pgmap = get_dev_pagemap(pfn, *pgmap);
> +       if (!*pgmap)
>                 return ERR_PTR(-EFAULT);
>         page = pfn_to_page(pfn);
>         get_page(page);
> -       put_dev_pagemap(pgmap);
>
>         return page;
> }
> @@ -1000,11 +998,10 @@ static void touch_pud(struct vm_area_struct *vma, unsigned long addr,
> }
>
> struct page *follow_devmap_pud(struct vm_area_struct *vma, unsigned long addr,
> -               pud_t *pud, int flags)
> +               pud_t *pud, int flags, struct dev_pagemap **pgmap)
> {
>         unsigned long pfn = pud_pfn(*pud);
>         struct mm_struct *mm = vma->vm_mm;
> -       struct dev_pagemap *pgmap;
>         struct page *page;
>
>         assert_spin_locked(pud_lockptr(mm, pud));
> @@ -1028,12 +1025,11 @@ struct page *follow_devmap_pud(struct vm_area_struct *vma,
>                 return ERR_PTR(-EEXIST);
>
>         pfn += (addr & ~PUD_MASK) >> PAGE_SHIFT;
> -       pgmap = get_dev_pagemap(pfn, NULL);
> -       if (!pgmap)
> +       *pgmap = get_dev_pagemap(pfn, *pgmap);
> +       if (!*pgmap)
>                 return ERR_PTR(-EFAULT);
>         page = pfn_to_page(pfn);
>         get_page(page);
> -       put_dev_pagemap(pgmap);
>
>         return page;
> }
> diff --git a/mm/nommu.c b/mm/nommu.c
> index e4aac33216ae..749276beb109 100644
> --- a/mm/nommu.c
> +++ b/mm/nommu.c
> @@ -1709,11 +1709,9 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
>         return ret;
> }
>
> -struct page *follow_page_mask(struct vm_area_struct *vma,
> -               unsigned long address, unsigned int flags,
> -               unsigned int *page_mask)
> +struct page *follow_page(struct vm_area_struct *vma, unsigned long address,
> +               unsigned int foll_flags)
> {
> -       *page_mask = 0;
>         return NULL;
> }
>
> --
> 2.14.4
>