References: <20230712060144.3006358-1-fengwei.yin@intel.com> <20230712060144.3006358-4-fengwei.yin@intel.com>
In-Reply-To: <20230712060144.3006358-4-fengwei.yin@intel.com>
From: Yu Zhao
Date: Wed, 12 Jul 2023 00:31:54 -0600
Subject: Re: [RFC PATCH v2 3/3] mm: mlock: update mlock_pte_range to handle large folio
To: Yin Fengwei
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, willy@infradead.org, david@redhat.com, ryan.roberts@arm.com, shy828301@gmail.com

On Wed, Jul 12, 2023 at 12:02 AM Yin Fengwei wrote:
>
> Current kernel only lock base size folio during mlock syscall.
> Add large folio support with following rules:
>   - Only mlock large folio when it's in VM_LOCKED VMA range
>
>   - If there is cow folio, mlock the cow folio as cow folio
>     is also in VM_LOCKED VMA range.
>
>   - munlock will apply to the large folio which is in VMA range
>     or cross the VMA boundary.
>
> The last rule is used to handle the case that the large folio is
> mlocked, later the VMA is split in the middle of large folio
> and this large folio become cross VMA boundary.
>
> Signed-off-by: Yin Fengwei
> ---
>  mm/mlock.c | 104 ++++++++++++++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 99 insertions(+), 5 deletions(-)
>
> diff --git a/mm/mlock.c b/mm/mlock.c
> index 0a0c996c5c214..f49e079066870 100644
> --- a/mm/mlock.c
> +++ b/mm/mlock.c
> @@ -305,6 +305,95 @@ void munlock_folio(struct folio *folio)
>         local_unlock(&mlock_fbatch.lock);
>  }
>
> +static inline bool should_mlock_folio(struct folio *folio,
> +                                       struct vm_area_struct *vma)
> +{
> +       if (vma->vm_flags & VM_LOCKED)
> +               return (!folio_test_large(folio) ||
> +                               folio_within_vma(folio, vma));
> +
> +       /*
> +        * For unlock, allow munlock large folio which is partially
> +        * mapped to VMA. As it's possible that large folio is
> +        * mlocked and VMA is split later.
> +        *
> +        * During memory pressure, such kind of large folio can
> +        * be split. And the pages are not in VM_LOCKed VMA
> +        * can be reclaimed.
> +        */
> +
> +       return true;

Looks good, or just

should_mlock_folio() // or whatever name you see fit, can_mlock_folio()?
{
    return !(vma->vm_flags & VM_LOCKED) || folio_within_vma();
}

> +}
> +
> +static inline unsigned int get_folio_mlock_step(struct folio *folio,
> +                       pte_t pte, unsigned long addr, unsigned long end)
> +{
> +       unsigned int nr;
> +
> +       nr = folio_pfn(folio) + folio_nr_pages(folio) - pte_pfn(pte);
> +       return min_t(unsigned int, nr, (end - addr) >> PAGE_SHIFT);
> +}
> +
> +void mlock_folio_range(struct folio *folio, struct vm_area_struct *vma,
> +               pte_t *pte, unsigned long addr, unsigned int nr)
> +{
> +       struct folio *cow_folio;
> +       unsigned int step = 1;
> +
> +       mlock_folio(folio);
> +       if (nr == 1)
> +               return;
> +
> +       for (; nr > 0; pte += step, addr += (step << PAGE_SHIFT), nr -= step) {
> +               pte_t ptent;
> +
> +               step = 1;
> +               ptent = ptep_get(pte);
> +
> +               if (!pte_present(ptent))
> +                       continue;
> +
> +               cow_folio = vm_normal_folio(vma, addr, ptent);
> +               if (!cow_folio || cow_folio == folio) {
> +                       continue;
> +               }
> +
> +               mlock_folio(cow_folio);
> +               step = get_folio_mlock_step(folio, ptent,
> +                               addr, addr + (nr << PAGE_SHIFT));
> +       }
> +}
> +
> +void munlock_folio_range(struct folio *folio, struct vm_area_struct *vma,
> +               pte_t *pte, unsigned long addr, unsigned int nr)
> +{
> +       struct folio *cow_folio;
> +       unsigned int step = 1;
> +
> +       munlock_folio(folio);
> +       if (nr == 1)
> +               return;
> +
> +       for (; nr > 0; pte += step, addr += (step << PAGE_SHIFT), nr -= step) {
> +               pte_t ptent;
> +
> +               step = 1;
> +               ptent = ptep_get(pte);
> +
> +               if (!pte_present(ptent))
> +                       continue;
> +
> +               cow_folio = vm_normal_folio(vma, addr, ptent);
> +               if (!cow_folio || cow_folio == folio) {
> +                       continue;
> +               }
> +
> +               munlock_folio(cow_folio);
> +               step = get_folio_mlock_step(folio, ptent,
> +                               addr, addr + (nr << PAGE_SHIFT));
> +       }
> +}

I'll finish the above later.
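
On the shorter should_mlock_folio() form suggested earlier in this reply: a minimal userspace sketch of why the two forms can agree. It is not kernel code. The model_* types, MODEL_VM_LOCKED, MODEL_PAGE_SIZE and model_check_equivalent() are hypothetical stand-ins, and the range check only assumes what folio_within_vma() is expected to do (the real helper comes from elsewhere in this series and may differ). The key assumption is that a base-size folio mapped inside the VMA always passes the range check, which makes the folio_test_large() test redundant.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins, not the kernel's types or flags. */
#define MODEL_VM_LOCKED 0x1u
#define MODEL_PAGE_SIZE 4096UL

struct model_vma {
    unsigned long vm_start;   /* inclusive */
    unsigned long vm_end;     /* exclusive */
    unsigned int vm_flags;
};

struct model_folio {
    unsigned long start;      /* address where the folio's first page is mapped */
    unsigned long nr_pages;
};

static bool model_folio_test_large(const struct model_folio *f)
{
    return f->nr_pages > 1;
}

/* Assumed behaviour: true when every page of the folio maps inside the VMA. */
static bool model_folio_within_vma(const struct model_folio *f,
                                   const struct model_vma *vma)
{
    return f->start >= vma->vm_start &&
           f->start + f->nr_pages * MODEL_PAGE_SIZE <= vma->vm_end;
}

/* The form in the patch. */
static bool should_mlock_patch(const struct model_folio *f,
                               const struct model_vma *vma)
{
    if (vma->vm_flags & MODEL_VM_LOCKED)
        return !model_folio_test_large(f) || model_folio_within_vma(f, vma);
    return true;
}

/* The shorter form from the review comment. */
static bool should_mlock_short(const struct model_folio *f,
                               const struct model_vma *vma)
{
    return !(vma->vm_flags & MODEL_VM_LOCKED) || model_folio_within_vma(f, vma);
}

/* Returns the shared answer and asserts that both forms agree on it. */
static bool model_check_equivalent(const struct model_folio *f,
                                   const struct model_vma *vma)
{
    bool a = should_mlock_patch(f, vma);
    bool b = should_mlock_short(f, vma);

    assert(a == b);
    return a;
}
```

In this model the two forms differ only if a base-size folio could fail the range check while being walked in that VMA, which the model rules out by construction.
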

>  static int mlock_pte_range(pmd_t *pmd, unsigned long addr,
>                            unsigned long end, struct mm_walk *walk)
>
> @@ -314,6 +403,7 @@ static int mlock_pte_range(pmd_t *pmd, unsigned long addr,
>         pte_t *start_pte, *pte;
>         pte_t ptent;
>         struct folio *folio;
> +       unsigned int step = 1;
>
>         ptl = pmd_trans_huge_lock(pmd, vma);
>         if (ptl) {
> @@ -329,24 +419,28 @@ static int mlock_pte_range(pmd_t *pmd, unsigned long addr,
>                 goto out;
>         }
>
> -       start_pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
> +       pte = start_pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
>         if (!start_pte) {
>                 walk->action = ACTION_AGAIN;
>                 return 0;
>         }
> -       for (pte = start_pte; addr != end; pte++, addr += PAGE_SIZE) {
> +
> +       for (; addr != end; pte += step, addr += (step << PAGE_SHIFT)) {
> +               step = 1;
>                 ptent = ptep_get(pte);
>                 if (!pte_present(ptent))
>                         continue;
>                 folio = vm_normal_folio(vma, addr, ptent);
>                 if (!folio || folio_is_zone_device(folio))
>                         continue;
> -               if (folio_test_large(folio))
> +               if (!should_mlock_folio(folio, vma))
>                         continue;
> +
> +               step = get_folio_mlock_step(folio, ptent, addr, end);
>                 if (vma->vm_flags & VM_LOCKED)
> -                       mlock_folio(folio);
> +                       mlock_folio_range(folio, vma, pte, addr, step);
>                 else
> -                       munlock_folio(folio);
> +                       munlock_folio_range(folio, vma, pte, addr, step);
>         }
>         pte_unmap(start_pte);
>  out:

Looks good.
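
For reference, the step arithmetic that the reworked loop depends on can be modelled in userspace as below. This is only a sketch of the computation in get_folio_mlock_step(): model_mlock_step() and MODEL_PAGE_SHIFT are hypothetical names, plain integers stand in for the folio and the PTE, and a 4 KiB page size is assumed.

```c
#include <assert.h>

#define MODEL_PAGE_SHIFT 12   /* assumed 4 KiB pages */

/*
 * Number of PTEs the walk may skip in one go: the pages left in the folio
 * starting at the current PTE, clamped to the pages left before 'end'.
 *
 * folio_first_pfn, folio_nr: first PFN and page count of the folio
 * pte_pfn:                   PFN the current PTE maps (must lie in the folio)
 * addr, end:                 current address and end of the walked range
 */
static unsigned int model_mlock_step(unsigned long folio_first_pfn,
                                     unsigned long folio_nr,
                                     unsigned long pte_pfn,
                                     unsigned long addr, unsigned long end)
{
    unsigned long left_in_folio, left_in_range;

    assert(pte_pfn >= folio_first_pfn && pte_pfn < folio_first_pfn + folio_nr);
    assert(addr < end);

    left_in_folio = folio_first_pfn + folio_nr - pte_pfn;
    left_in_range = (end - addr) >> MODEL_PAGE_SHIFT;

    return (unsigned int)(left_in_folio < left_in_range ?
                          left_in_folio : left_in_range);
}
```

For a 16-page folio entered at its fifth page, 12 pages remain, so the step is 12 unless fewer pages are left before end. Because of the clamp, addr + (step << PAGE_SHIFT) never passes end, which is what lets the loop keep its addr != end condition with page-aligned addresses.
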