From: Ryan Roberts
To: Andrew Morton, David Hildenbrand, Matthew Wilcox, Huang Ying,
	Gao Xiang, Yu Zhao, Yang Shi, Michal Hocko, Kefeng Wang,
	Barry Song <21cnbao@gmail.com>, Chris Li
Cc: Ryan Roberts, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH v4 6/6] mm: madvise: Avoid split during MADV_PAGEOUT and MADV_COLD
Date: Mon, 11 Mar 2024 15:00:58 +0000
Message-Id: <20240311150058.1122862-7-ryan.roberts@arm.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20240311150058.1122862-1-ryan.roberts@arm.com>
References: <20240311150058.1122862-1-ryan.roberts@arm.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

Rework madvise_cold_or_pageout_pte_range() to avoid splitting any large
folio that is fully and contiguously mapped in the pageout/cold vm
range. This change means that large folios will be maintained all the
way to swap storage. This both improves performance during swap-out, by
eliding the cost of splitting the folio, and sets us up nicely for
maintaining the large folio when it is swapped back in (to be covered
in a separate series).

Folios that are not fully mapped in the target range are still split,
but note that the behavior is changed so that if the split fails for
any reason (folio locked, shared, etc) we now leave the folio as is,
advance to the next pte in the range, and continue work on the
subsequent folios. Previously any failure of this sort would cause the
entire operation to give up, and no folios mapped at higher addresses
were paged out or made cold. Given that large folios are becoming more
common, the old behavior would likely have led to wasted opportunities.

While we are at it, change the code that clears young from the ptes to
use ptep_test_and_clear_young(), which is more efficient than
get_and_clear/modify/set, especially for contpte mappings on arm64,
where the old approach would require unfolding/refolding and the new
approach can be done in place.

Signed-off-by: Ryan Roberts
---
 mm/madvise.c | 89 ++++++++++++++++++++++++++++++----------------------
 1 file changed, 51 insertions(+), 38 deletions(-)
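[Editor's illustrative note, not part of the patch.] The core idea of
the first change is: count how many consecutive ptes map consecutive
pages of the same folio, and only attempt a split when that count falls
short of folio_nr_pages(); otherwise advance the loop by the whole
batch. Below is a minimal standalone userspace model of that check,
assuming simplified stand-in types (a "pte" is just a page frame
number). The real folio_pte_batch() is more involved - it also masks
per-pte dirty/soft-dirty state via the FPB_IGNORE_* flags seen in the
diff - so treat this strictly as a sketch of the control flow.

/*
 * Userspace model of the batch check, for illustration only.
 * pte_t here is a stand-in holding only the mapped pfn.
 */
#include <stdio.h>

typedef unsigned long pte_t;

/* Count consecutive ptes mapping consecutive pages, up to max_nr. */
static int pte_batch(const pte_t *pte, int max_nr)
{
	int nr = 1;

	while (nr < max_nr && pte[nr] == pte[nr - 1] + 1)
		nr++;
	return nr;
}

int main(void)
{
	pte_t ptes[] = { 100, 101, 102, 103, 200 };	/* 4 contiguous, then a gap */
	int max_nr = sizeof(ptes) / sizeof(ptes[0]);
	int folio_nr_pages = 4;		/* pretend the folio spans 4 pages */
	int nr = pte_batch(ptes, max_nr);

	if (nr < folio_nr_pages)
		printf("partially mapped (%d/%d): try to split\n", nr, folio_nr_pages);
	else
		printf("fully mapped (%d ptes): keep whole, advance by nr\n", nr);
	return 0;
}

In the patch, the returned nr plays the same role: it drives both the
split decision and the loop advance (pte += nr, addr += nr * PAGE_SIZE).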
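[Editor's illustrative note, not part of the patch.] The young-clearing
change can be modelled the same way: clear just the young bit in place
and return its old value, instead of reading, zapping, modifying and
rewriting the whole entry. The sketch below uses a made-up bit position
and stand-in types; the kernel's ptep_test_and_clear_young() is a
per-architecture primitive, which on arm64 can operate on contpte
mappings without unfolding them.

/*
 * Userspace model of the test-and-clear-young pattern, for
 * illustration only. PTE_YOUNG's bit position is invented.
 */
#include <stdbool.h>
#include <stdio.h>

#define PTE_YOUNG (1UL << 5)	/* illustrative bit position only */

typedef unsigned long pte_t;

/* Clear only the young bit, in place; report whether it was set. */
static bool test_and_clear_young(pte_t *pte)
{
	bool young = (*pte & PTE_YOUNG) != 0;

	*pte &= ~PTE_YOUNG;
	return young;
}

int main(void)
{
	pte_t ptes[] = { PTE_YOUNG, 0, PTE_YOUNG };

	/*
	 * Mirrors the patch's inner loop: only ptes whose young bit
	 * was actually set need a TLB entry flush.
	 */
	for (int i = 0; i < 3; i++) {
		if (test_and_clear_young(&ptes[i]))
			printf("pte %d was young: flush its TLB entry\n", i);
	}
	return 0;
}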
diff --git a/mm/madvise.c b/mm/madvise.c
index 547dcd1f7a39..56c7ba7bd558 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -336,6 +336,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
 	LIST_HEAD(folio_list);
 	bool pageout_anon_only_filter;
 	unsigned int batch_count = 0;
+	int nr;
 
 	if (fatal_signal_pending(current))
 		return -EINTR;
@@ -423,7 +424,8 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
 		return 0;
 	flush_tlb_batched_pending(mm);
 	arch_enter_lazy_mmu_mode();
-	for (; addr < end; pte++, addr += PAGE_SIZE) {
+	for (; addr < end; pte += nr, addr += nr * PAGE_SIZE) {
+		nr = 1;
 		ptent = ptep_get(pte);
 
 		if (++batch_count == SWAP_CLUSTER_MAX) {
@@ -447,55 +449,66 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
 			continue;
 
 		/*
-		 * Creating a THP page is expensive so split it only if we
-		 * are sure it's worth. Split it if we are only owner.
+		 * If we encounter a large folio, only split it if it is not
+		 * fully mapped within the range we are operating on. Otherwise
+		 * leave it as is so that it can be swapped out whole. If we
+		 * fail to split a folio, leave it in place and advance to the
+		 * next pte in the range.
 		 */
 		if (folio_test_large(folio)) {
-			int err;
-
-			if (folio_estimated_sharers(folio) > 1)
-				break;
-			if (pageout_anon_only_filter && !folio_test_anon(folio))
-				break;
-			if (!folio_trylock(folio))
-				break;
-			folio_get(folio);
-			arch_leave_lazy_mmu_mode();
-			pte_unmap_unlock(start_pte, ptl);
-			start_pte = NULL;
-			err = split_folio(folio);
-			folio_unlock(folio);
-			folio_put(folio);
-			if (err)
-				break;
-			start_pte = pte =
-				pte_offset_map_lock(mm, pmd, addr, &ptl);
-			if (!start_pte)
-				break;
-			arch_enter_lazy_mmu_mode();
-			pte--;
-			addr -= PAGE_SIZE;
-			continue;
+			const fpb_t fpb_flags = FPB_IGNORE_DIRTY |
+						FPB_IGNORE_SOFT_DIRTY;
+			int max_nr = (end - addr) / PAGE_SIZE;
+
+			nr = folio_pte_batch(folio, addr, pte, ptent, max_nr,
+					     fpb_flags, NULL);
+
+			if (nr < folio_nr_pages(folio)) {
+				int err;
+
+				if (folio_estimated_sharers(folio) > 1)
+					continue;
+				if (pageout_anon_only_filter && !folio_test_anon(folio))
+					continue;
+				if (!folio_trylock(folio))
+					continue;
+				folio_get(folio);
+				arch_leave_lazy_mmu_mode();
+				pte_unmap_unlock(start_pte, ptl);
+				start_pte = NULL;
+				err = split_folio(folio);
+				folio_unlock(folio);
+				folio_put(folio);
+				if (err)
+					continue;
+				start_pte = pte =
+					pte_offset_map_lock(mm, pmd, addr, &ptl);
+				if (!start_pte)
+					break;
+				arch_enter_lazy_mmu_mode();
+				nr = 0;
+				continue;
+			}
 		}
 
 		/*
 		 * Do not interfere with other mappings of this folio and
-		 * non-LRU folio.
+		 * non-LRU folio. If we have a large folio at this point, we
+		 * know it is fully mapped so if its mapcount is the same as its
+		 * number of pages, it must be exclusive.
 		 */
-		if (!folio_test_lru(folio) || folio_mapcount(folio) != 1)
+		if (!folio_test_lru(folio) ||
+		    folio_mapcount(folio) != folio_nr_pages(folio))
 			continue;
 
 		if (pageout_anon_only_filter && !folio_test_anon(folio))
 			continue;
 
-		VM_BUG_ON_FOLIO(folio_test_large(folio), folio);
-
-		if (!pageout && pte_young(ptent)) {
-			ptent = ptep_get_and_clear_full(mm, addr, pte,
-							tlb->fullmm);
-			ptent = pte_mkold(ptent);
-			set_pte_at(mm, addr, pte, ptent);
-			tlb_remove_tlb_entry(tlb, pte, addr);
+		if (!pageout) {
+			for (; nr != 0; nr--, pte++, addr += PAGE_SIZE) {
+				if (ptep_test_and_clear_young(vma, addr, pte))
+					tlb_remove_tlb_entry(tlb, pte, addr);
+			}
 		}
 
 		/*
-- 
2.25.1