From: "Henry Huang" <henry.hj@antgroup.com>
To: yuzhao@google.com
Cc: "Henry Huang", "谈鉴锋", "朱辉(茶水)"
Subject: Re: [RFC v2] mm: Multi-Gen LRU: fix use mm/page_idle/bitmap
Date: Fri, 15 Dec 2023 20:44:17 +0800
Message-ID: <20231215124423.88878-1-henry.hj@antgroup.com>

On Fri, Dec 15, 2023 at 15:23, Yu Zhao wrote:
> Regarding the change itself, it'd cause a slight regression to other
> use cases (details below).
>
> > @@ -3355,6 +3359,7 @@ static bool walk_pte_range(pmd_t *pmd, unsigned long start, unsigned long end,
> >  		unsigned long pfn;
> >  		struct folio *folio;
> >  		pte_t ptent = ptep_get(pte + i);
> > +		bool is_pte_young;
> >
> >  		total++;
> >  		walk->mm_stats[MM_LEAF_TOTAL]++;
> > @@ -3363,16 +3368,20 @@ static bool walk_pte_range(pmd_t *pmd, unsigned long start, unsigned long end,
> >  		if (pfn == -1)
> >  			continue;
> >
> > -		if (!pte_young(ptent)) {
> > -			walk->mm_stats[MM_LEAF_OLD]++;
>
> Most overhead from page table scanning normally comes from
> get_pfn_folio() because it almost always causes a cache miss.
> This is like a pointer dereference, whereas scanning PTEs is like
> streaming an array (bad vs good cache performance).
>
> pte_young() is here to avoid an unnecessary cache miss from
> get_pfn_folio(). Also see the first comment in get_pfn_folio(). It
> should be easy to verify the regression -- FlameGraph from the
> memcached benchmark in the original commit message should do it.
>
> Would a tracepoint here work for you?
>
> > +		is_pte_young = !!pte_young(ptent);
> > +		folio = get_pfn_folio(pfn, memcg, pgdat, walk->can_swap, is_pte_young);
> > +		if (!folio) {
> > +			if (!is_pte_young)
> > +				walk->mm_stats[MM_LEAF_OLD]++;
> >  			continue;
> >  		}
> >
> > -		folio = get_pfn_folio(pfn, memcg, pgdat, walk->can_swap);
> > -		if (!folio)
> > +		if (!folio_test_clear_young(folio) && !is_pte_young) {
> > +			walk->mm_stats[MM_LEAF_OLD]++;
> >  			continue;
> > +		}
> >
> > -		if (!ptep_test_and_clear_young(args->vma, addr, pte + i))
> > +		if (is_pte_young && !ptep_test_and_clear_young(args->vma, addr, pte + i))
> >  			VM_WARN_ON_ONCE(true);
> >
> >  		young++;

Thanks for replying.

To avoid both of the following:
1. the conflict between page_idle/bitmap and the mglru scan;
2. the performance regression in the mglru page-table scan if
   get_pfn_folio() is called for every pte,

we have a new idea:
1. Add a new API under /sys/kernel/mm/page_idle that supports marking the
   idle flag only, without walking the rmap or clearing pte young.
2. Use the mglru proactive scan to clear the page idle flag.

Workflow:

        t1                              t2
   mark pages idle          mglru scan and check pages idle

This makes it easy to know whether a page was accessed during t1~t2.

The code changes would look like this: we clear the idle flag in
get_pfn_folio(), while walk_pte_range() still follows the original design.

static struct folio *get_pfn_folio(unsigned long pfn, struct mem_cgroup *memcg,
				   struct pglist_data *pgdat, bool can_swap,
				   bool clear_idle)
{
	struct folio *folio;

	/* try to avoid unnecessary memory loads */
	if (pfn < pgdat->node_start_pfn || pfn >= pgdat_end_pfn(pgdat))
		return NULL;

	folio = pfn_folio(pfn);
+
+	if (clear_idle && folio_test_idle(folio))
+		folio_clear_idle(folio);
+
	if (folio_nid(folio) != pgdat->node_id)
		return NULL;

	if (folio_memcg_rcu(folio) != memcg)
		return NULL;

	/* file VMAs can contain anon pages from COW */
	if (!folio_is_file_lru(folio) && !can_swap)
		return NULL;

	return folio;
}

static bool walk_pte_range(pmd_t *pmd, unsigned long start, unsigned long end,
			   struct mm_walk *args)
{
	...
	for (i = pte_index(start), addr = start; addr != end; i++, addr += PAGE_SIZE) {
		unsigned long pfn;
		struct folio *folio;
		pte_t ptent = ptep_get(pte + i);

		total++;
		walk->mm_stats[MM_LEAF_TOTAL]++;

		pfn = get_pte_pfn(ptent, args->vma, addr);
		if (pfn == -1)
			continue;

		if (!pte_young(ptent)) {
			walk->mm_stats[MM_LEAF_OLD]++;
			continue;
		}

+		folio = get_pfn_folio(pfn, memcg, pgdat, walk->can_swap, true);
-		folio = get_pfn_folio(pfn, memcg, pgdat, walk->can_swap);
		if (!folio)
			continue;
	...
}

Is this a good idea or not?

--
2.43.0
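
[Editor's note: for illustration, a rough userspace sketch of how the t1/t2
workflow above could be driven. It assumes a hypothetical "mark idle flag
only" file, named mark_idle_only here purely for the sketch (the proposal
does not name it), next to the existing /sys/kernel/mm/page_idle/bitmap, and
it reuses the documented /sys/kernel/debug/lru_gen aging command to trigger
the proactive scan. How the surviving idle bits would be read back is also
not settled by the proposal; the sketch simply reads the existing bitmap
file.]

/*
 * Sketch of the proposed t1/t2 idle-tracking workflow, with minimal
 * error handling. MARK_ONLY is hypothetical; BITMAP and LRU_GEN are
 * existing interfaces.
 */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define BITMAP    "/sys/kernel/mm/page_idle/bitmap"
#define MARK_ONLY "/sys/kernel/mm/page_idle/mark_idle_only"	/* hypothetical */
#define LRU_GEN   "/sys/kernel/debug/lru_gen"

/* t1: set the idle flag for 64 pages starting at first_pfn (must be a
 * multiple of 64, since the bitmap only accepts aligned 8-byte writes),
 * without an rmap walk or clearing pte young. */
static void mark_idle(uint64_t first_pfn)
{
	uint64_t bits = ~0ULL;
	int fd = open(MARK_ONLY, O_WRONLY);

	if (fd < 0)
		return;
	if (pwrite(fd, &bits, sizeof(bits), first_pfn / 64 * 8) != sizeof(bits))
		perror("pwrite");
	close(fd);
}

/* t2: ask MGLRU to age the given memcg/node; the proposed hook in
 * get_pfn_folio() would clear the idle flag of any page the scan visits. */
static void run_aging(int memcg_id, int node_id, int max_gen)
{
	char cmd[64];
	int fd = open(LRU_GEN, O_WRONLY);

	if (fd < 0)
		return;
	/* "+ memcg_id node_id max_gen [can_swap [force_scan]]" */
	snprintf(cmd, sizeof(cmd), "+ %d %d %d 1 1\n", memcg_id, node_id, max_gen);
	if (write(fd, cmd, strlen(cmd)) < 0)
		perror("write");
	close(fd);
}

/* after t2: a bit that is still set means the scan never saw the page
 * through a young pte, i.e. it stayed idle during t1~t2. */
static uint64_t read_idle(uint64_t first_pfn)
{
	uint64_t bits = 0;
	int fd = open(BITMAP, O_RDONLY);

	if (fd < 0)
		return 0;
	if (pread(fd, &bits, sizeof(bits), first_pfn / 64 * 8) != sizeof(bits))
		perror("pread");
	close(fd);
	return bits;
}

[A monitoring tool would call mark_idle() over the PFNs of interest at t1,
run_aging() at t2, and then treat any bit still set in read_idle() as a page
that stayed idle over the interval.]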