Subject: Re: [PATCH v16 11/25] mm: pagewalk: Add p4d_entry() and pgd_entry()
From: Thomas Hellström (VMware)
To: Steven Price
Cc: Andy Lutomirski, Ard Biesheuvel, Arnd Bergmann, Borislav Petkov, Catalin Marinas, Dave Hansen, Ingo Molnar, James Morse, Jérôme Glisse, Peter Zijlstra, Thomas Gleixner, Will Deacon, x86@kernel.org, "H. Peter Anvin", linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, Mark Rutland, "Liang, Kan", Zong Li, Andrew Morton, linux-mm@kvack.org
References: <20191206135316.47703-1-steven.price@arm.com> <20191206135316.47703-12-steven.price@arm.com> <13280f9e-6f03-e1fd-659a-31462ba185b0@shipmail.org>
Organization: VMware Inc.
Message-ID: <7fd20e9f-822a-897d-218e-bddf135fd33d@shipmail.org>
Date: Thu, 12 Dec 2019 12:33:41 +0100
In-Reply-To: <13280f9e-6f03-e1fd-659a-31462ba185b0@shipmail.org>
List-ID: linux-kernel@vger.kernel.org

On 12/12/19 12:23 PM, Thomas Hellström (VMware) wrote:
> On 12/6/19 2:53 PM, Steven Price wrote:
>> pgd_entry() and pud_entry() were removed by commit 0b1fbfe50006c410
>> ("mm/pagewalk: remove pgd_entry() and pud_entry()") because there were
>> no users. We're about to add users so reintroduce them, along with
>> p4d_entry() as we now have 5 levels of tables.
>>
>> Note that commit a00cc7d9dd93d66a ("mm, x86: add support for
>> PUD-sized transparent hugepages") already re-added pud_entry() but with
>> different semantics to the other callbacks. Since there have never
>> been upstream users of this, revert the semantics back to match the
>> other callbacks. This means pud_entry() is called for all entries, not
>> just transparent huge pages.
>
> In another thread we were discussing a means of rerunning the level
> (in case of a race), or continuing after a level, based on the return
> value of the callback. The change was fairly invasive.
>
>> Tested-by: Zong Li
>> Signed-off-by: Steven Price
>> ---
>>   include/linux/pagewalk.h | 19 +++++++++++++------
>>   mm/pagewalk.c            | 27 ++++++++++++++++-----------
>>   2 files changed, 29 insertions(+), 17 deletions(-)
>>
>> diff --git a/include/linux/pagewalk.h b/include/linux/pagewalk.h
>> index 6ec82e92c87f..06790f23957f 100644
>> --- a/include/linux/pagewalk.h
>> +++ b/include/linux/pagewalk.h
>> @@ -8,15 +8,15 @@ struct mm_walk;
>>
>>   /**
>>    * mm_walk_ops - callbacks for walk_page_range
>> - * @pud_entry:        if set, called for each non-empty PUD (2nd-level) entry
>> - *            this handler should only handle pud_trans_huge() puds.
>> - *            the pmd_entry or pte_entry callbacks will be used for
>> - *            regular PUDs.
>> - * @pmd_entry:        if set, called for each non-empty PMD (3rd-level) entry
>> + * @pgd_entry:        if set, called for each non-empty PGD (top-level) entry
>> + * @p4d_entry:        if set, called for each non-empty P4D entry
>> + * @pud_entry:        if set, called for each non-empty PUD entry
>> + * @pmd_entry:        if set, called for each non-empty PMD entry
>>    *            this handler is required to be able to handle
>>    *            pmd_trans_huge() pmds.  They may simply choose to
>>    *            split_huge_page() instead of handling it explicitly.
>> - * @pte_entry:        if set, called for each non-empty PTE (4th-level) entry
>> + * @pte_entry:        if set, called for each non-empty PTE (lowest-level)
>> + *            entry
>>    * @pte_hole:        if set, called for each hole at all levels
>>    * @hugetlb_entry:    if set, called for each hugetlb entry
>>    * @test_walk:        caller specific callback function to determine whether
>> @@ -27,8 +27,15 @@ struct mm_walk;
>>    * @pre_vma:            if set, called before starting walk on a non-null vma.
>>    * @post_vma:           if set, called after a walk on a non-null vma, provided
>>    *                      that @pre_vma and the vma walk succeeded.
>> + *
>> + * p?d_entry callbacks are called even if those levels are folded on a
>> + * particular architecture/configuration.
>>    */
>>   struct mm_walk_ops {
>> +    int (*pgd_entry)(pgd_t *pgd, unsigned long addr,
>> +             unsigned long next, struct mm_walk *walk);
>> +    int (*p4d_entry)(p4d_t *p4d, unsigned long addr,
>> +             unsigned long next, struct mm_walk *walk);
>>       int (*pud_entry)(pud_t *pud, unsigned long addr,
>>                unsigned long next, struct mm_walk *walk);
>>       int (*pmd_entry)(pmd_t *pmd, unsigned long addr,
>> diff --git a/mm/pagewalk.c b/mm/pagewalk.c
>> index ea0b9e606ad1..c089786e7a7f 100644
>> --- a/mm/pagewalk.c
>> +++ b/mm/pagewalk.c
>> @@ -94,15 +94,9 @@ static int walk_pud_range(p4d_t *p4d, unsigned long addr, unsigned long end,
>>           }
>>
>>           if (ops->pud_entry) {
>> -            spinlock_t *ptl = pud_trans_huge_lock(pud, walk->vma);
>> -
>> -            if (ptl) {
>> -                err = ops->pud_entry(pud, addr, next, walk);
>> -                spin_unlock(ptl);
>> -                if (err)
>> -                    break;
>> -                continue;
>> -            }
>> +            err = ops->pud_entry(pud, addr, next, walk);
>> +            if (err)
>> +                break;
>
> Actually, there are two current users of pud_entry(): one in hmm.c and,
> since 5.5-rc1, one in mapping_dirty_helpers.c. The latter is
> unproblematic and needs no attention, but the one in hmm.c is probably
> largely untested and seems to assume it is called outside the spinlock.
>
> The problem with the current patch is that the hmm pud_entry will also
> traverse pmds, so that will now be done twice.
>
> /Thomas
>