From: Yu Zhao
Date: Thu, 1 Sep 2022 19:28:31 -0600
Subject: Re: [PATCH v14 07/14] mm: multi-gen LRU: exploit locality in rmap
To: Nadav Amit
Cc: Andrew Morton, Andi Kleen, Aneesh Kumar, Catalin Marinas, Dave Hansen,
    Hillf Danton, Jens Axboe, Johannes Weiner, Jonathan Corbet,
    Linus Torvalds, Matthew Wilcox, Mel Gorman, Michael Larabel,
    Michal Hocko, Mike Rapoport, Peter Zijlstra, Tejun Heo, Vlastimil Babka,
    Will Deacon, Linux ARM, "open list:DOCUMENTATION", LKML, Linux MM,
    X86 ML, Kernel Page Reclaim v2, Barry Song, Brian Geffon,
    Jan Alexander Steffens, Oleksandr Natalenko, Steven Barrett,
    Suleiman Souhlal, Daniel Byrne, Donald Carr, Holger Hoffstätte,
    Konstantin Kharlamov, Shuang Zhai, Sofia Trinh, Vaibhav Jain
References: <20220815071332.627393-1-yuzhao@google.com> <20220815071332.627393-8-yuzhao@google.com> <0F7CF2A7-F671-4196-B8FD-F35E9556391B@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Sep 1, 2022 at 7:17 PM Yu Zhao wrote:
>
> On Thu, Sep 1, 2022 at 3:18 AM Nadav Amit wrote:
> >
> >
> >
> > > On Aug 15, 2022, at 12:13 AM, Yu Zhao wrote:
> > >
> > > Searching the rmap for PTEs mapping each page on an LRU list (to test
> > > and clear the accessed bit) can be expensive because pages from
> > > different VMAs (PA space) are not cache friendly to the rmap (VA
> > > space). For workloads mostly using mapped pages, searching the rmap
> > > can incur the highest CPU cost in the reclaim path.
> >
> > Impressive work. Thanks.
> > Sorry if my feedback is not timely.
> >
> > Just one minor point for thought, that can be left for a later cleanup.
> >
> > >
> > > +	for (i = 0, addr = start; addr != end; i++, addr += PAGE_SIZE) {
> > > +		unsigned long pfn;
> > > +
> > > +		pfn = get_pte_pfn(pte[i], pvmw->vma, addr);
> > > +		if (pfn == -1)
> > > +			continue;
> > > +
> > > +		if (!pte_young(pte[i]))
> > > +			continue;
> > > +
> > > +		folio = get_pfn_folio(pfn, memcg, pgdat);
> > > +		if (!folio)
> > > +			continue;
> > > +
> > > +		if (!ptep_test_and_clear_young(pvmw->vma, addr, pte + i))
> > > +			continue;
> > > +
> >
> > You have already checked that the PTE is old (not young) so this check
> > seems redundant.
>
> You are right, for x86, which belongs to category 1: hardware and
> OS share the same paging data structure.
>
> > I do not see a way in which the access-bit can be cleared
> > since you hold the ptl.
>
> There is also category 2: the OS paging data structure is a shadow of what
> hardware actually uses, e.g., POWER9 radix.
>
> To make both categories work, the general rule is that the OS paging
> data structure must be more strict, i.e., it can have A/D bits set
> while the hardware paging data structure may not. The opposite is not
> allowed, even for the A bit, because the A bit can also be used to
> determine whether a TLB flush is required. The Linux kernel doesn't do
> this but there are other OSes that do.
>
> For prefaulted PTEs, we generally mark them young unless
> arch_wants_old_prefaulted_pte() returns true (currently only ARMv8.2+
> do). On POWER9, we'd see those PTEs pass the first check but fail the
> second.

This is because the first check (non-atomic) is allowed to fetch from
the OS paging data structure (which is more strict), while the second
check (atomic) must fetch from the hardware paging data structure
(which does not have the A bit set because those PTEs are prefaulted).

> > IOW, there is no need for the "if" and "continue".
> >
> > Makes me also wonder whether having a separate ptep_clear_young() can
> > slightly help, since anyhow the access-bit is more of an estimation,
> > and having a separate ptep_clear_young() can enable optimizations.
> >
> > On x86, for instance, if the PTE is dirty, we may be able to clear the
> > access-bit without an atomic operation, which should be faster.
>
> Agreed.