From: Yang Shi
Date: Fri, 29 Sep 2023 12:07:35 -0700
Subject: Re: [RFC PATCH 2/2] mm/khugepaged: Remove compound_pagelist
To: "Vishal Moola (Oracle)"
Cc: linux-mm@kvack.org, akpm@linux-foundation.org, linux-kernel@vger.kernel.org
References: <20230922193639.10158-1-vishal.moola@gmail.com> <20230922193639.10158-3-vishal.moola@gmail.com>

On Tue, Sep 26, 2023 at 3:07 PM Yang Shi wrote:
>
> On Fri, Sep 22, 2023 at 9:33 PM Vishal Moola (Oracle) wrote:
> >
> > Currently, khugepaged builds a compound_pagelist while scanning, which
> > is used to properly account for compound pages. We can now account
> > for a compound page as a singular folio instead, so remove this list.
> >
> > Large folios are guaranteed to have consecutive ptes and addresses, so
> > once the first pte of a large folio is found skip over the rest.
>
> The address space may just map a partial folio, for example, in the
> extreme case the HUGE_PMD size range may have HUGE_PMD_NR folios with
> mapping one subpage from each folio per PTE. So assuming the PTE
> mapped folio is mapped consecutively may be wrong.
>
> Please refer to collapse_compound_extreme() in
> tools/testing/selftests/mm/khugepaged.c.
>
> >
> > This helps convert khugepaged to use folios. It removes 3 compound_head
> > calls in __collapse_huge_page_copy_succeeded(), and removes 980 bytes of
> > kernel text.
> >
> > Signed-off-by: Vishal Moola (Oracle)
> > ---
> >  mm/khugepaged.c | 76 ++++++++++++-------------------------------------
> >  1 file changed, 18 insertions(+), 58 deletions(-)
> >
> > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> > index f46a7a7c489f..b6c7d55a8231 100644
> > --- a/mm/khugepaged.c
> > +++ b/mm/khugepaged.c
> > @@ -498,10 +498,9 @@ static void release_pte_page(struct page *page)
> >         release_pte_folio(page_folio(page));
> >  }
> >
> > -static void release_pte_pages(pte_t *pte, pte_t *_pte,
> > -               struct list_head *compound_pagelist)
> > +static void release_pte_folios(pte_t *pte, pte_t *_pte)
> >  {
> > -       struct folio *folio, *tmp;
> > +       struct folio *folio;
> >
> >         while (--_pte >= pte) {
> >                 pte_t pteval = ptep_get(_pte);
> > @@ -514,12 +513,7 @@ static void release_pte_pages(pte_t *pte, pte_t *_pte,
> >                         continue;
> >                 folio = pfn_folio(pfn);
> >                 if (folio_test_large(folio))
> > -                       continue;
> > -               release_pte_folio(folio);
> > -       }
> > -
> > -       list_for_each_entry_safe(folio, tmp, compound_pagelist, lru) {
> > -               list_del(&folio->lru);
> > +                       _pte -= folio_nr_pages(folio) - 1;
> >                 release_pte_folio(folio);
> >         }
> >  }
> > @@ -538,8 +532,7 @@ static bool is_refcount_suitable(struct page *page)
> >  static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
> >                                         unsigned long address,
> >                                         pte_t *pte,
> > -                                       struct collapse_control *cc,
> > -                                       struct list_head *compound_pagelist)
> > +                                       struct collapse_control *cc)
> >  {
> >         struct folio *folio = NULL;
> >         pte_t *_pte;
> > @@ -588,19 +581,6 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
> >                         }
> >                 }
> >
> > -               if (folio_test_large(folio)) {
> > -                       struct folio *f;
> > -
> > -                       /*
> > -                        * Check if we have dealt with the compound page
> > -                        * already
> > -                        */
> > -                       list_for_each_entry(f, compound_pagelist, lru) {
> > -                               if (folio == f)
> > -                                       goto next;
> > -                       }
> > -               }
> > -
> >                 /*
> >                  * We can do it before isolate_lru_page because the
> >                  * page can't be freed from under us. NOTE: PG_lock
> > @@ -644,9 +624,6 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
> >                 VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
> >                 VM_BUG_ON_FOLIO(folio_test_lru(folio), folio);
> >
> > -               if (folio_test_large(folio))
> > -                       list_add_tail(&folio->lru, compound_pagelist);
> > -next:
> >                 /*
> >                  * If collapse was initiated by khugepaged, check that there is
> >                  * enough young pte to justify collapsing the page
> > @@ -660,6 +637,10 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
> >                 if (pte_write(pteval))
> >                         writable = true;
> >
> > +               if (folio_test_large(folio)) {
> > +                       _pte += folio_nr_pages(folio) - 1;
> > +                       address += folio_size(folio) - PAGE_SIZE;
> > +               }
> >         }
> >
> >         if (unlikely(!writable)) {
> > @@ -673,7 +654,7 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
> >                 return result;
> >         }
> >  out:
> > -       release_pte_pages(pte, _pte, compound_pagelist);
> > +       release_pte_folios(pte, _pte);
> >         trace_mm_collapse_huge_page_isolate(&folio->page, none_or_zero,
> >                                             referenced, writable, result);
> >         return result;
> > @@ -682,11 +663,9 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
> >  static void __collapse_huge_page_copy_succeeded(pte_t *pte,
> >                                                 struct vm_area_struct *vma,
> >                                                 unsigned long address,
> > -                                               spinlock_t *ptl,
> > -                                               struct list_head *compound_pagelist)
> > +                                               spinlock_t *ptl)
> >  {
> >         struct page *src_page;
> > -       struct page *tmp;
> >         pte_t *_pte;
> >         pte_t pteval;
> >
> > @@ -706,8 +685,7 @@ static void __collapse_huge_page_copy_succeeded(pte_t *pte,
> >                         }
> >                 } else {
> >                         src_page = pte_page(pteval);
> > -                       if (!PageCompound(src_page))
> > -                               release_pte_page(src_page);
> > +                       release_pte_page(src_page);

This line is problematic too. It may cause a double unlock, if I read it
correctly. The loop scans every mapped subpage of the same folio, so
release_pte_page() is called for the same folio multiple times.

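To make the double-unlock concern concrete, here is a toy userspace model, not kernel code. The names toy_folio, toy_release() and toy_copy_succeeded() are hypothetical stand-ins, and the model assumes the isolate step locked each folio exactly once. It only counts how many times a release would hit a folio that is already unlocked when the release runs once per PTE:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct folio: tracks only the lock state. */
struct toy_folio {
	int nr_pages;	/* number of pages in the folio */
	int locked;	/* 1 while held locked by the (assumed) isolate step */
};

/*
 * Toy analogue of release_pte_folio(): unlock the folio.
 * Returns 1 if the folio was already unlocked (a double unlock), else 0.
 */
static int toy_release(struct toy_folio *folio)
{
	if (!folio->locked)
		return 1;
	folio->locked = 0;
	return 0;
}

/*
 * Walk npte PTE slots, where pte_to_folio[i] is the folio mapped by PTE i
 * (NULL for an empty slot), and release once per PTE, as the patched loop
 * appears to do. Returns the number of double unlocks observed.
 */
static int toy_copy_succeeded(struct toy_folio **pte_to_folio, size_t npte)
{
	int double_unlocks = 0;

	assert(pte_to_folio != NULL || npte == 0);
	for (size_t i = 0; i < npte; i++) {
		if (pte_to_folio[i] == NULL)
			continue;
		double_unlocks += toy_release(pte_to_folio[i]);
	}
	return double_unlocks;
}
```

With one 4-page folio mapped by four consecutive PTEs, the first call unlocks it and the remaining three are counted as double unlocks. A table of order-0 folios yields none, which is why the old !PageCompound() check mattered.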
> >                         /*
> >                          * ptl mostly unnecessary, but preempt has to
> >                          * be disabled to update the per-cpu stats
> > @@ -720,23 +698,12 @@ static void __collapse_huge_page_copy_succeeded(pte_t *pte,
> >                         free_page_and_swap_cache(src_page);
> >                 }
> >         }
> > -
> > -       list_for_each_entry_safe(src_page, tmp, compound_pagelist, lru) {
> > -               list_del(&src_page->lru);
> > -               mod_node_page_state(page_pgdat(src_page),
> > -                                   NR_ISOLATED_ANON + page_is_file_lru(src_page),
> > -                                   -compound_nr(src_page));
> > -               unlock_page(src_page);
> > -               free_swap_cache(src_page);
> > -               putback_lru_page(src_page);
> > -       }
> >  }
> >
> >  static void __collapse_huge_page_copy_failed(pte_t *pte,
> >                                              pmd_t *pmd,
> >                                              pmd_t orig_pmd,
> > -                                            struct vm_area_struct *vma,
> > -                                            struct list_head *compound_pagelist)
> > +                                            struct vm_area_struct *vma)
> >  {
> >         spinlock_t *pmd_ptl;
> >
> > @@ -753,7 +720,7 @@ static void __collapse_huge_page_copy_failed(pte_t *pte,
> >          * Release both raw and compound pages isolated
> >          * in __collapse_huge_page_isolate.
> >          */
> > -       release_pte_pages(pte, pte + HPAGE_PMD_NR, compound_pagelist);
> > +       release_pte_folios(pte, pte + HPAGE_PMD_NR);
> >  }
> >
> >  /*
> > @@ -769,7 +736,6 @@ static void __collapse_huge_page_copy_failed(pte_t *pte,
> >   * @vma: the original raw pages' virtual memory area
> >   * @address: starting address to copy
> >   * @ptl: lock on raw pages' PTEs
> > - * @compound_pagelist: list that stores compound pages
> >   */
> >  static int __collapse_huge_page_copy(pte_t *pte,
> >                                      struct page *page,
> > @@ -777,8 +743,7 @@ static int __collapse_huge_page_copy(pte_t *pte,
> >                                      pmd_t orig_pmd,
> >                                      struct vm_area_struct *vma,
> >                                      unsigned long address,
> > -                                    spinlock_t *ptl,
> > -                                    struct list_head *compound_pagelist)
> > +                                    spinlock_t *ptl)
> >  {
> >         struct page *src_page;
> >         pte_t *_pte;
> > @@ -804,11 +769,9 @@ static int __collapse_huge_page_copy(pte_t *pte,
> >         }
> >
> >         if (likely(result == SCAN_SUCCEED))
> > -               __collapse_huge_page_copy_succeeded(pte, vma, address, ptl,
> > -                                                   compound_pagelist);
> > +               __collapse_huge_page_copy_succeeded(pte, vma, address, ptl);
> >         else
> > -               __collapse_huge_page_copy_failed(pte, pmd, orig_pmd, vma,
> > -                                                compound_pagelist);
> > +               __collapse_huge_page_copy_failed(pte, pmd, orig_pmd, vma);
> >
> >         return result;
> >  }
> > @@ -1081,7 +1044,6 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
> >                               int referenced, int unmapped,
> >                               struct collapse_control *cc)
> >  {
> > -       LIST_HEAD(compound_pagelist);
> >         pmd_t *pmd, _pmd;
> >         pte_t *pte;
> >         pgtable_t pgtable;
> > @@ -1168,8 +1130,7 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
> >
> >         pte = pte_offset_map_lock(mm, &_pmd, address, &pte_ptl);
> >         if (pte) {
> > -               result = __collapse_huge_page_isolate(vma, address, pte, cc,
> > -                                                     &compound_pagelist);
> > +               result = __collapse_huge_page_isolate(vma, address, pte, cc);
> >                 spin_unlock(pte_ptl);
> >         } else {
> >                 result = SCAN_PMD_NULL;
> > @@ -1198,8 +1159,7 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
> >         anon_vma_unlock_write(vma->anon_vma);
> >
> >         result = __collapse_huge_page_copy(pte, hpage, pmd, _pmd,
> > -                                          vma, address, pte_ptl,
> > -                                          &compound_pagelist);
> > +                                          vma, address, pte_ptl);
> >         pte_unmap(pte);
> >         if (unlikely(result != SCAN_SUCCEED))
> >                 goto out_up_write;
> > --
> > 2.40.1
> >
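
For the partial-mapping point raised against the commit message, a second toy userspace sketch may help. It is again not kernel code: toy_pte, fill_extreme() and visit_with_skip() are hypothetical names, and the layout only loosely mimics what collapse_compound_extreme() sets up, with every PTE mapping one subpage of a different large folio. It counts how many PTEs a loop examines if it jumps ahead by folio_nr_pages - 1 after each large folio:

```c
#include <assert.h>
#include <stddef.h>

/* Toy PTE: records which folio it maps and how many pages that folio has. */
struct toy_pte {
	int folio_id;
	int folio_nr_pages;
};

/*
 * Fill the table so that every PTE maps one subpage of a *different*
 * folio of nr_pages pages (the "extreme" partial-mapping layout).
 */
static void fill_extreme(struct toy_pte *pte, size_t npte, int nr_pages)
{
	assert(nr_pages >= 1);
	for (size_t i = 0; i < npte; i++) {
		pte[i].folio_id = (int)i;
		pte[i].folio_nr_pages = nr_pages;
	}
}

/*
 * Count the PTEs examined by a loop that, like the patched isolate loop,
 * skips folio_nr_pages - 1 entries after each large folio it finds.
 */
static size_t visit_with_skip(const struct toy_pte *pte, size_t npte)
{
	size_t visited = 0;

	for (size_t i = 0; i < npte; i++) {
		visited++;
		if (pte[i].folio_nr_pages > 1)
			i += (size_t)pte[i].folio_nr_pages - 1;
	}
	return visited;
}
```

With 16 PTEs each mapping a distinct 4-page folio, the skipping loop examines only 4 of them, so 12 folios are never looked at. With order-0 folios all 16 are visited. The skip is only safe when the whole folio is known to be mapped by consecutive PTEs.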