References: <20191018101248.33727-1-steven.price@arm.com>
 <20191018101248.33727-15-steven.price@arm.com>
In-Reply-To: <20191018101248.33727-15-steven.price@arm.com>
From: Zong Li
Date: Thu, 24 Oct 2019 21:07:20 +0800
Subject: Re: [PATCH v12 14/22] mm: pagewalk: Add 'depth' parameter to pte_hole
To: Steven Price
Cc: linux-mm@kvack.org, Andy Lutomirski, Ard Biesheuvel, Arnd Bergmann,
    Borislav Petkov, Catalin Marinas, Dave Hansen, Ingo Molnar, James Morse,
    Jérôme Glisse, Peter Zijlstra, Thomas Gleixner, Will Deacon, x86@kernel.org,
    "H. Peter Anvin", linux-arm-kernel@lists.infradead.org,
    Linux Kernel Mailing List, Mark Rutland, "Liang, Kan", Andrew Morton

On Sat, Oct 19, 2019 at 4:13 PM Steven Price wrote:
>
> The pte_hole() callback is called at multiple levels of the page tables.
> Code dumping the kernel page tables needs to know at what depth the
> missing entry is. Add this as an extra parameter to pte_hole(). When the
> depth isn't known (e.g. processing a vma) then -1 is passed.
>
> The depth that is reported is the actual level where the entry is
> missing (ignoring any folding that is in place), i.e. any levels where
> PTRS_PER_P?D is set to 1 are ignored.
>
> Note that depth starts at 0 for a PGD so that PUD/PMD/PTE retain their
> natural numbers as levels 2/3/4.
>
> Signed-off-by: Steven Price
> ---
>  fs/proc/task_mmu.c       |  4 ++--
>  include/linux/pagewalk.h |  7 +++++--
>  mm/hmm.c                 |  8 ++++----
>  mm/migrate.c             |  5 +++--
>  mm/mincore.c             |  1 +
>  mm/pagewalk.c            | 31 +++++++++++++++++++++++++------
>  6 files changed, 40 insertions(+), 16 deletions(-)
>
> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> index 9442631fd4af..3ba9ae83bff5 100644
> --- a/fs/proc/task_mmu.c
> +++ b/fs/proc/task_mmu.c
> @@ -505,7 +505,7 @@ static void smaps_account(struct mem_size_stats *mss, struct page *page,
>
>  #ifdef CONFIG_SHMEM
>  static int smaps_pte_hole(unsigned long addr, unsigned long end,
> -                         struct mm_walk *walk)
> +                         __always_unused int depth, struct mm_walk *walk)
>  {
>         struct mem_size_stats *mss = walk->private;
>
> @@ -1282,7 +1282,7 @@ static int add_to_pagemap(unsigned long addr, pagemap_entry_t *pme,
>  }
>
>  static int pagemap_pte_hole(unsigned long start, unsigned long end,
> -                           struct mm_walk *walk)
> +                           __always_unused int depth, struct mm_walk *walk)
>  {
>         struct pagemapread *pm = walk->private;
>         unsigned long addr = start;
> diff --git a/include/linux/pagewalk.h b/include/linux/pagewalk.h
> index df424197a25a..90466d60f87a 100644
> --- a/include/linux/pagewalk.h
> +++ b/include/linux/pagewalk.h
> @@ -17,7 +17,10 @@ struct mm_walk;
>   *                     split_huge_page() instead of handling it explicitly.
>   * @pte_entry:         if set, called for each non-empty PTE (lowest-level)
>   *                     entry
> - * @pte_hole:          if set, called for each hole at all levels
> + * @pte_hole:          if set, called for each hole at all levels,
> + *                     depth is -1 if not known, 0:PGD, 1:P4D, 2:PUD, 3:PMD
> + *                     4:PTE. Any folded depths (where PTRS_PER_P?D is equal
> + *                     to 1) are skipped.
>   * @hugetlb_entry:     if set, called for each hugetlb entry
>   * @test_walk:         caller specific callback function to determine whether
>   *                     we walk over the current vma or not. Returning 0 means
> @@ -45,7 +48,7 @@ struct mm_walk_ops {
>         int (*pte_entry)(pte_t *pte, unsigned long addr,
>                          unsigned long next, struct mm_walk *walk);
>         int (*pte_hole)(unsigned long addr, unsigned long next,
> -                       struct mm_walk *walk);
> +                       int depth, struct mm_walk *walk);
>         int (*hugetlb_entry)(pte_t *pte, unsigned long hmask,
>                              unsigned long addr, unsigned long next,
>                              struct mm_walk *walk);
> diff --git a/mm/hmm.c b/mm/hmm.c
> index 902f5fa6bf93..df3d531c8f2d 100644
> --- a/mm/hmm.c
> +++ b/mm/hmm.c
> @@ -376,7 +376,7 @@ static void hmm_range_need_fault(const struct hmm_vma_walk *hmm_vma_walk,
>  }
>
>  static int hmm_vma_walk_hole(unsigned long addr, unsigned long end,
> -                            struct mm_walk *walk)
> +                            __always_unused int depth, struct mm_walk *walk)
>  {
>         struct hmm_vma_walk *hmm_vma_walk = walk->private;
>         struct hmm_range *range = hmm_vma_walk->range;
> @@ -564,7 +564,7 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,
>  again:
>         pmd = READ_ONCE(*pmdp);
>         if (pmd_none(pmd))
> -               return hmm_vma_walk_hole(start, end, walk);
> +               return hmm_vma_walk_hole(start, end, -1, walk);
>
>         if (thp_migration_supported() && is_pmd_migration_entry(pmd)) {
>                 bool fault, write_fault;
> @@ -666,7 +666,7 @@ static int hmm_vma_walk_pud(pud_t *pudp, unsigned long start, unsigned long end,
>  again:
>         pud = READ_ONCE(*pudp);
>         if (pud_none(pud))
> -               return hmm_vma_walk_hole(start, end, walk);
> +               return hmm_vma_walk_hole(start, end, -1, walk);
>
>         if (pud_huge(pud) && pud_devmap(pud)) {
>                 unsigned long i, npages, pfn;
> @@ -674,7 +674,7 @@ static int hmm_vma_walk_pud(pud_t *pudp, unsigned long start, unsigned long end,
>                 bool fault, write_fault;
>
>                 if (!pud_present(pud))
> -                       return hmm_vma_walk_hole(start, end, walk);
> +                       return hmm_vma_walk_hole(start, end, -1, walk);
>
>                 i = (addr - range->start) >> PAGE_SHIFT;
>                 npages = (end - addr) >> PAGE_SHIFT;
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 4fe45d1428c8..435258df9a36 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -2123,6 +2123,7 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
>  #ifdef CONFIG_DEVICE_PRIVATE
>  static int migrate_vma_collect_hole(unsigned long start,
>                                     unsigned long end,
> +                                   __always_unused int depth,
>                                     struct mm_walk *walk)
>  {
>         struct migrate_vma *migrate = walk->private;
> @@ -2167,7 +2168,7 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
>
>  again:
>         if (pmd_none(*pmdp))
> -               return migrate_vma_collect_hole(start, end, walk);
> +               return migrate_vma_collect_hole(start, end, -1, walk);
>
>         if (pmd_trans_huge(*pmdp)) {
>                 struct page *page;
> @@ -2200,7 +2201,7 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
>                         return migrate_vma_collect_skip(start, end,
>                                                         walk);
>                 if (pmd_none(*pmdp))
> -                       return migrate_vma_collect_hole(start, end,
> +                       return migrate_vma_collect_hole(start, end, -1,
>                                                         walk);
>                 }
>         }
> diff --git a/mm/mincore.c b/mm/mincore.c
> index 49b6fa2f6aa1..0e6dd9948f1a 100644
> --- a/mm/mincore.c
> +++ b/mm/mincore.c
> @@ -112,6 +112,7 @@ static int __mincore_unmapped_range(unsigned long addr, unsigned long end,
>  }
>
>  static int mincore_unmapped_range(unsigned long addr, unsigned long end,
> +                                  __always_unused int depth,
>                                    struct mm_walk *walk)
>  {
>         walk->private += __mincore_unmapped_range(addr, end,
> diff --git a/mm/pagewalk.c b/mm/pagewalk.c
> index 43acffefd43f..b67400dc1def 100644
> --- a/mm/pagewalk.c
> +++ b/mm/pagewalk.c
> @@ -4,6 +4,22 @@
>  #include
>  #include
>
> +/*
> + * We want to know the real level where an entry is located ignoring any
> + * folding of levels which may be happening. For example if p4d is folded then
> + * a missing entry found at level 1 (p4d) is actually at level 0 (pgd).
> + */
> +static int real_depth(int depth)
> +{
> +       if (depth == 3 && PTRS_PER_PMD == 1)
> +               depth = 2;
> +       if (depth == 2 && PTRS_PER_PUD == 1)
> +               depth = 1;
> +       if (depth == 1 && PTRS_PER_P4D == 1)
> +               depth = 0;
> +       return depth;
> +}
> +
>  static int walk_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
>                           struct mm_walk *walk)
>  {
> @@ -33,6 +49,7 @@ static int walk_pmd_range(pud_t *pud, unsigned long addr, unsigned long end,
>         unsigned long next;
>         const struct mm_walk_ops *ops = walk->ops;
>         int err = 0;
> +       int depth = real_depth(3);
>
>         if (ops->test_pmd) {
>                 err = ops->test_pmd(addr, end, pmd_offset(pud, 0UL), walk);
> @@ -48,7 +65,7 @@ static int walk_pmd_range(pud_t *pud, unsigned long addr, unsigned long end,
>                 next = pmd_addr_end(addr, end);
>                 if (pmd_none(*pmd)) {
>                         if (ops->pte_hole)
> -                               err = ops->pte_hole(addr, next, walk);
> +                               err = ops->pte_hole(addr, next, depth, walk);
>                         if (err)
>                                 break;
>                         continue;
> @@ -92,6 +109,7 @@ static int walk_pud_range(p4d_t *p4d, unsigned long addr, unsigned long end,
>         unsigned long next;
>         const struct mm_walk_ops *ops = walk->ops;
>         int err = 0;
> +       int depth = real_depth(2);
>
>         if (ops->test_pud) {
>                 err = ops->test_pud(addr, end, pud_offset(p4d, 0UL), walk);
> @@ -107,7 +125,7 @@ static int walk_pud_range(p4d_t *p4d, unsigned long addr, unsigned long end,
>                 next = pud_addr_end(addr, end);
>                 if (pud_none(*pud)) {
>                         if (ops->pte_hole)
> -                               err = ops->pte_hole(addr, next, walk);
> +                               err = ops->pte_hole(addr, next, depth, walk);
>                         if (err)
>                                 break;
>                         continue;
> @@ -143,6 +161,7 @@ static int walk_p4d_range(pgd_t *pgd, unsigned long addr, unsigned long end,
>         unsigned long next;
>         const struct mm_walk_ops *ops = walk->ops;
>         int err = 0;
> +       int depth = real_depth(1);
>
>         if (ops->test_p4d) {
>                 err = ops->test_p4d(addr, end, p4d_offset(pgd, 0UL), walk);
> @@ -157,7 +176,7 @@ static int walk_p4d_range(pgd_t *pgd, unsigned long addr, unsigned long end,
>                 next = p4d_addr_end(addr, end);
>                 if (p4d_none_or_clear_bad(p4d)) {
>                         if (ops->pte_hole)
> -                               err = ops->pte_hole(addr, next, walk);
> +                               err = ops->pte_hole(addr, next, depth, walk);
>                         if (err)
>                                 break;
>                         continue;
> @@ -189,7 +208,7 @@ static int walk_pgd_range(unsigned long addr, unsigned long end,
>                 next = pgd_addr_end(addr, end);
>                 if (pgd_none_or_clear_bad(pgd)) {
>                         if (ops->pte_hole)
> -                               err = ops->pte_hole(addr, next, walk);
> +                               err = ops->pte_hole(addr, next, 0, walk);
>                         if (err)
>                                 break;
>                         continue;
> @@ -236,7 +255,7 @@ static int walk_hugetlb_range(unsigned long addr, unsigned long end,
>                 if (pte)
>                         err = ops->hugetlb_entry(pte, hmask, addr, next, walk);
>                 else if (ops->pte_hole)
> -                       err = ops->pte_hole(addr, next, walk);
> +                       err = ops->pte_hole(addr, next, -1, walk);
>
>                 if (err)
>                         break;
> @@ -280,7 +299,7 @@ static int walk_page_test(unsigned long start, unsigned long end,
>         if (vma->vm_flags & VM_PFNMAP) {
>                 int err = 1;
>                 if (ops->pte_hole)
> -                       err = ops->pte_hole(start, end, walk);
> +                       err = ops->pte_hole(start, end, -1, walk);
>                 return err ? err : 1;
>         }
>         return 0;
> --
> 2.20.1
>

Looks good to me.

Tested-by: Zong Li
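As a side note for anyone wiring a new walker on top of this interface: a
minimal pte_hole callback that consumes the new depth argument could look like
the sketch below. Only the callback signature and the depth convention
(-1 unknown, 0:PGD, 1:P4D, 2:PUD, 3:PMD, 4:PTE) come from the patch above; the
hole_stats bookkeeping and all of the names are invented purely for
illustration.

/*
 * Hypothetical example only: count page-table holes per level using the
 * depth value passed to pte_hole(). The struct and function names are
 * made up; only the pte_hole() signature and the -1/0..4 depth
 * convention come from the patch quoted above.
 */
#include <linux/pagewalk.h>

struct hole_stats {
        unsigned long holes[5];         /* indexed by depth: PGD..PTE */
        unsigned long unknown;          /* holes reported with depth == -1 */
};

static int hole_stats_pte_hole(unsigned long addr, unsigned long next,
                               int depth, struct mm_walk *walk)
{
        struct hole_stats *stats = walk->private;

        if (depth < 0)
                stats->unknown++;       /* e.g. VM_PFNMAP or hugetlb holes */
        else
                stats->holes[depth]++;

        return 0;                       /* keep walking */
}

static const struct mm_walk_ops hole_stats_ops = {
        .pte_hole = hole_stats_pte_hole,
};

Such a walker would be registered in the usual way, e.g.
walk_page_range(mm, start, end, &hole_stats_ops, &stats), and it relies on the
folded-level skipping done by real_depth() so that a folded p4d/pud/pmd is
never reported at its folded level.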