From: Yu Zhao
Date: Thu, 13 Jul 2023 21:23:21 -0600
Subject: Re: [RFC PATCH] madvise: make madvise_cold_or_pageout_pte_range() support large folio
To: "Yin, Fengwei"
Cc: Minchan Kim, linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, willy@infradead.org, david@redhat.com, ryan.roberts@arm.com, shy828301@gmail.com
References: <20230713150558.200545-1-fengwei.yin@intel.com> <8547495c-9051-faab-a47d-1962f2e0b1da@intel.com>
In-Reply-To: <8547495c-9051-faab-a47d-1962f2e0b1da@intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Jul 13, 2023 at 9:10 PM Yin, Fengwei wrote:
>
>
>
> On 7/14/2023 10:08 AM, Yu Zhao wrote:
> > On Thu, Jul 13, 2023 at 9:06 AM Yin Fengwei wrote:
> >>
> >> Current madvise_cold_or_pageout_pte_range() has two problems for
> >> large folio support:
> >>   - Using folio_mapcount() with large folio prevent large folio from
> >>     picking up.
> >>   - If large folio is in the range requested, shouldn't split it
> >>     in madvise_cold_or_pageout_pte_range().
> >>
> >> Fix them by:
> >>   - Use folio_estimated_sharers() with large folio
> >>   - If large folio is in the range requested, don't split it. Leave
> >>     to page reclaim phase.
> >>
> >> For large folio cross boundaries of requested range, skip it if it's
> >> page cache. Try to split it if it's anonymous folio. If splitting
> >> fails, skip it.
> >
> > For now, we may not want to change the existing semantic (heuristic).
> > IOW, we may want to stick to the "only owner" condition:
> >
> > -  if (folio_mapcount(folio) != 1)
> > +  if (folio_entire_mapcount(folio) ||
> > +      (any_page_within_range_has_mapcount > 1))
> >
> > +Minchan Kim
> The folio_estimated_sharers() was discussed here:
> https://lore.kernel.org/linux-mm/20230118232219.27038-6-vishal.moola@gmail.com/
> https://lore.kernel.org/linux-mm/20230124012210.13963-2-vishal.moola@gmail.com/
>
> Yes. It's accurate to check each page of large folio. But it may be over killed in
> some cases (And I think madvise is one of the cases not necessary to be accurate.
> So folio_estimated_sharers() is enough. Correct me if I am wrong).

I see. Then it's possible this is also what the original commit wants
to do -- Minchan, could you clarify?

Regardless, I think we can have the following fix, potentially cc'ing stable:

-  if (folio_mapcount(folio) != 1)
+  if (folio_estimated_sharers(folio) != 1)

Sounds good?

> > Also there is an existing bug here: the later commit 07e8c82b5eff8
> > ("madvise: convert madvise_cold_or_pageout_pte_range() to use folios")
> > is incorrect for sure; the original commit 9c276cc65a58f ("mm:
> > introduce MADV_COLD") seems incorrect too.
> >
> > +Vishal Moola (Oracle)
> >
> > The "any_page_within_range_has_mapcount" test above seems to be the
> > only correct to meet condition claimed by the comments, before or
> > after the folio conversion, assuming here a THP page means the
> > compound page without PMD mappings (PMD-split). Otherwise the test is
> > always false (if it's also PMD mapped somewhere else).
> >
> >   /*
> >    * Creating a THP page is expensive so split it only if we
> >    * are sure it's worth. Split it if we are only owner.
> >    */
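
To make the three checks discussed in this thread easier to compare, here
is a minimal user-space sketch in C. It is only a model under stated
assumptions, not kernel code: struct model_folio and every model_*
function are hypothetical names invented for illustration, and the
counting rules are a simplified reading of what folio_mapcount(),
folio_estimated_sharers() and the proposed
"any_page_within_range_has_mapcount" test are described as doing above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model of a large folio; not a kernel structure. */
#define MODEL_MAX_PAGES 16

struct model_folio {
	size_t nr_pages;                    /* number of subpages in the folio */
	int entire_mapcount;                /* mappings of the whole folio (PMD-style) */
	int page_mapcount[MODEL_MAX_PAGES]; /* PTE mappings of each subpage */
};

/* Assumed model of the total count: every mapping of every subpage is added up. */
static int model_folio_mapcount(const struct model_folio *f)
{
	int total = f->entire_mapcount;

	for (size_t i = 0; i < f->nr_pages; i++)
		total += f->page_mapcount[i];
	return total;
}

/* Assumed model of the estimate: only the first subpage is sampled. */
static int model_estimated_sharers(const struct model_folio *f)
{
	return f->entire_mapcount + f->page_mapcount[0];
}

/*
 * Model of the "only owner" test sketched in the quoted pseudo-diff:
 * skip the folio if it is mapped as a whole anywhere, or if any subpage
 * in [start, end) is mapped more than once.
 */
static bool model_skip_not_only_owner(const struct model_folio *f,
				      size_t start, size_t end)
{
	assert(start <= end && end <= f->nr_pages);

	if (f->entire_mapcount)
		return true;
	for (size_t i = start; i < end; i++)
		if (f->page_mapcount[i] > 1)
			return true;
	return false;
}
```

In this model, a four-page folio that a single process maps once per
page has a total count of 4, so a "!= 1" test on the total skips it even
though there is only one owner, while the first-page estimate is 1. If
only the third page gains a second mapping, the estimate still reads 1
and only the per-page test over a range covering that page reports the
sharing, which is the accuracy versus cost trade-off debated above.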