From: Dan Williams
Date: Fri, 13 Dec 2019 10:13:13 -0800
Subject: Re: [PATCH v5 1/2] mm: make dev_pagemap_mapping_shift() externally visible
To: Sean Christopherson
Cc: Barret Rhoden, Paolo Bonzini, David Hildenbrand, Dave Jiang, Alexander Duyck, linux-nvdimm, X86 ML, KVM list, Linux Kernel Mailing List, "Zeng, Jason"
References: <20191212182238.46535-1-brho@google.com> <20191212182238.46535-2-brho@google.com> <20191213174702.GB31552@linux.intel.com>
In-Reply-To: <20191213174702.GB31552@linux.intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Dec 13, 2019 at 9:47 AM Sean Christopherson wrote:
>
> On Thu, Dec 12, 2019 at 01:22:37PM -0500, Barret Rhoden wrote:
> > KVM has a use case for determining the size of a dax mapping.
> >
> > The KVM code has easy access to the address and the mm, and
> > dev_pagemap_mapping_shift() needs only those parameters. It was
> > deriving them from page and vma. This commit changes those parameters
> > from (page, vma) to (address, mm).
> >
> > Signed-off-by: Barret Rhoden
> > Reviewed-by: David Hildenbrand
> > Acked-by: Dan Williams
> > ---
> >  include/linux/mm.h  |  3 +++
> >  mm/memory-failure.c | 38 +++----------------------------
> >  mm/util.c           | 34 ++++++++++++++++++++++++++++++++++
> >  3 files changed, 40 insertions(+), 35 deletions(-)
> >
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index a2adf95b3f9c..bfd1882dd5c6 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -1013,6 +1013,9 @@ static inline bool is_pci_p2pdma_page(const struct page *page)
> >  #define page_ref_zero_or_close_to_overflow(page) \
> >  	((unsigned int) page_ref_count(page) + 127u <= 127u)
> >
> > +unsigned long dev_pagemap_mapping_shift(unsigned long address,
> > +					 struct mm_struct *mm);
> > +
> >  static inline void get_page(struct page *page)
> >  {
> >  	page = compound_head(page);
> > diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> > index 3151c87dff73..bafa464c8290 100644
> > --- a/mm/memory-failure.c
> > +++ b/mm/memory-failure.c
> > @@ -261,40 +261,6 @@ void shake_page(struct page *p, int access)
> >  }
> >  EXPORT_SYMBOL_GPL(shake_page);
> >
> > -static unsigned long dev_pagemap_mapping_shift(struct page *page,
> > -		struct vm_area_struct *vma)
> > -{
> > -	unsigned long address = vma_address(page, vma);
> > -	pgd_t *pgd;
> > -	p4d_t *p4d;
> > -	pud_t *pud;
> > -	pmd_t *pmd;
> > -	pte_t *pte;
> > -
> > -	pgd = pgd_offset(vma->vm_mm, address);
> > -	if (!pgd_present(*pgd))
> > -		return 0;
> > -	p4d = p4d_offset(pgd, address);
> > -	if (!p4d_present(*p4d))
> > -		return 0;
> > -	pud = pud_offset(p4d, address);
> > -	if (!pud_present(*pud))
> > -		return 0;
> > -	if (pud_devmap(*pud))
> > -		return PUD_SHIFT;
> > -	pmd = pmd_offset(pud, address);
> > -	if (!pmd_present(*pmd))
> > -		return 0;
> > -	if (pmd_devmap(*pmd))
> > -		return PMD_SHIFT;
> > -	pte = pte_offset_map(pmd, address);
> > -	if (!pte_present(*pte))
> > -		return 0;
> > -	if (pte_devmap(*pte))
> > -		return PAGE_SHIFT;
> > -	return 0;
> > -}
> > -
> >  /*
> >   * Failure handling: if we can't find or can't kill a process there's
> >   * not much we can do. We just print a message and ignore otherwise.
> > @@ -324,7 +290,9 @@ static void add_to_kill(struct task_struct *tsk, struct page *p,
> >  	}
> >  	tk->addr = page_address_in_vma(p, vma);
> >  	if (is_zone_device_page(p))
> > -		tk->size_shift = dev_pagemap_mapping_shift(p, vma);
> > +		tk->size_shift =
> > +			dev_pagemap_mapping_shift(vma_address(page, vma),
> > +						  vma->vm_mm);
> >  	else
> >  		tk->size_shift = compound_order(compound_head(p)) + PAGE_SHIFT;
> >
> > diff --git a/mm/util.c b/mm/util.c
> > index 3ad6db9a722e..59984e6b40ab 100644
> > --- a/mm/util.c
> > +++ b/mm/util.c
> > @@ -901,3 +901,37 @@ int memcmp_pages(struct page *page1, struct page *page2)
> >  	kunmap_atomic(addr1);
> >  	return ret;
> >  }
> > +
> > +unsigned long dev_pagemap_mapping_shift(unsigned long address,
> > +					 struct mm_struct *mm)
> > +{
> > +	pgd_t *pgd;
> > +	p4d_t *p4d;
> > +	pud_t *pud;
> > +	pmd_t *pmd;
> > +	pte_t *pte;
> > +
> > +	pgd = pgd_offset(mm, address);
> > +	if (!pgd_present(*pgd))
> > +		return 0;
> > +	p4d = p4d_offset(pgd, address);
> > +	if (!p4d_present(*p4d))
> > +		return 0;
> > +	pud = pud_offset(p4d, address);
> > +	if (!pud_present(*pud))
> > +		return 0;
> > +	if (pud_devmap(*pud))
> > +		return PUD_SHIFT;
> > +	pmd = pmd_offset(pud, address);
> > +	if (!pmd_present(*pmd))
> > +		return 0;
> > +	if (pmd_devmap(*pmd))
> > +		return PMD_SHIFT;
> > +	pte = pte_offset_map(pmd, address);
> > +	if (!pte_present(*pte))
> > +		return 0;
> > +	if (pte_devmap(*pte))
> > +		return PAGE_SHIFT;
> > +	return 0;
> > +}
> > +EXPORT_SYMBOL_GPL(dev_pagemap_mapping_shift);
>
> This is basically a rehash of lookup_address_in_pgd(), and doesn't provide
> exactly what KVM needs. E.g. KVM works with levels instead of shifts, and
> it would be nice to provide the pte so that KVM can sanity check that the
> pfn from this walk matches the pfn it plans on mapping.
>
> Instead of exporting dev_pagemap_mapping_shift(), what about replacing it
> with a patch to introduce lookup_address_mm() and export that?
>
> dev_pagemap_mapping_shift() could then wrap the new helper (if you want),
> and KVM could do lookup_address_mm() for querying the size of ZONE_DEVICE
> pages.

All of the above sounds great to me. Should have looked that much harder
when implementing dev_pagemap_mapping_shift() originally.
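For readers outside mm/, the decision sequence of the walk in the quoted patch can be modeled in plain userspace C. This is a hedged sketch, not kernel code: `struct model_entry`, `struct model_walk` and `model_mapping_shift()` are hypothetical stand-ins for the pgd_t/p4d_t/pud_t/pmd_t/pte_t entries and the pXd_present()/pXd_devmap() tests, and the shift values assume x86-64 with 4 KiB pages (PAGE_SHIFT 12, PMD_SHIFT 21, PUD_SHIFT 30).

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed x86-64 values with 4 KiB pages (the kernel gets these from asm headers). */
#define MODEL_PAGE_SHIFT	12
#define MODEL_PMD_SHIFT		21
#define MODEL_PUD_SHIFT		30

/* Hypothetical stand-in for one entry: what pXd_present()/pXd_devmap() report. */
struct model_entry {
	bool present;
	bool devmap;
};

/* Hypothetical snapshot of the entries visited for a single address. */
struct model_walk {
	struct model_entry pgd, p4d, pud, pmd, pte;
};

/*
 * Same decision sequence as dev_pagemap_mapping_shift() in the patch:
 * a non-present entry ends the walk with 0, a devmap entry at the pud
 * or pmd level reports a huge mapping, and a devmap pte reports a base
 * page. Anything else is not a ZONE_DEVICE mapping and yields 0.
 */
static unsigned long model_mapping_shift(const struct model_walk *w)
{
	if (!w->pgd.present)
		return 0;
	if (!w->p4d.present)
		return 0;
	if (!w->pud.present)
		return 0;
	if (w->pud.devmap)
		return MODEL_PUD_SHIFT;
	if (!w->pmd.present)
		return 0;
	if (w->pmd.devmap)
		return MODEL_PMD_SHIFT;
	if (!w->pte.present)
		return 0;
	if (w->pte.devmap)
		return MODEL_PAGE_SHIFT;
	return 0;
}
```

Note that the result only depends on the entries reached from (address, mm), which is why the patch can drop the (page, vma) parameters once the caller computes the address itself.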
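Sean's alternative, a lookup that hands back the leaf entry together with a level, can be sketched the same way. Again this is a userspace model under stated assumptions: `model_lookup_address()` is a hypothetical stand-in for the proposed lookup_address_mm() (its final signature is not given in this thread), the (entry pointer, level) shape follows x86's lookup_address_in_pgd(), the levels are numbered like x86's PG_LEVEL_4K/2M/1G, and the pgd/p4d levels are folded away for brevity. It shows the three pieces discussed: a level-returning lookup, a shift-returning wrapper in the spirit of dev_pagemap_mapping_shift(), and a KVM-style check that the leaf covers the pfn about to be mapped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical levels, numbered like x86's PG_LEVEL_4K/2M/1G. */
enum model_level {
	MODEL_LEVEL_NONE = 0,
	MODEL_LEVEL_4K = 1,
	MODEL_LEVEL_2M = 2,
	MODEL_LEVEL_1G = 3,
};

/* Assumed x86-64 geometry: 4 KiB base pages, 9 index bits per level. */
#define MODEL_PAGE_SHIFT	12
#define MODEL_BITS_PER_LEVEL	9

/* Hypothetical stand-in for one page-table entry. */
struct model_entry {
	bool present;		/* like pXd_present() */
	bool leaf;		/* huge mapping at the pud/pmd level */
	bool devmap;		/* like pXd_devmap(): ZONE_DEVICE memory */
	unsigned long pfn;	/* first page frame covered by a leaf */
};

/* Entries visited for one address (pgd/p4d folded away for brevity). */
struct model_walk {
	struct model_entry pud, pmd, pte;
};

/*
 * lookup_address_mm()-style helper (hypothetical): return the leaf entry
 * and report its level, or return NULL with MODEL_LEVEL_NONE.
 */
static const struct model_entry *
model_lookup_address(const struct model_walk *w, enum model_level *level)
{
	*level = MODEL_LEVEL_NONE;
	if (!w->pud.present)
		return NULL;
	if (w->pud.leaf) {
		*level = MODEL_LEVEL_1G;
		return &w->pud;
	}
	if (!w->pmd.present)
		return NULL;
	if (w->pmd.leaf) {
		*level = MODEL_LEVEL_2M;
		return &w->pmd;
	}
	if (!w->pte.present)
		return NULL;
	*level = MODEL_LEVEL_4K;
	return &w->pte;
}

/* Level to shift under the assumed geometry: 4K -> 12, 2M -> 21, 1G -> 30. */
static unsigned int model_level_to_shift(enum model_level level)
{
	assert(level != MODEL_LEVEL_NONE);
	return MODEL_PAGE_SHIFT + (level - 1) * MODEL_BITS_PER_LEVEL;
}

/* dev_pagemap_mapping_shift() rebuilt as a wrapper: 0 unless devmap. */
static unsigned long model_mapping_shift(const struct model_walk *w)
{
	enum model_level level;
	const struct model_entry *e = model_lookup_address(w, &level);

	if (!e || !e->devmap)
		return 0;
	return model_level_to_shift(level);
}

/* KVM-style sanity check: does the leaf cover the pfn to be mapped? */
static bool model_pfn_matches(const struct model_walk *w, unsigned long pfn)
{
	enum model_level level;
	const struct model_entry *e = model_lookup_address(w, &level);
	unsigned long npages;

	if (!e)
		return false;
	npages = 1UL << (model_level_to_shift(level) - MODEL_PAGE_SHIFT);
	return pfn >= e->pfn && pfn < e->pfn + npages;
}
```

With the level as the primary result, the shift becomes a derived value (12 + 9 * (level - 1) under the x86-64 assumption), so a single level-based helper can serve both the memory-failure path and KVM, and returning the entry itself is what makes the pfn cross-check possible.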