Date: Wed, 28 Feb 2024 14:24:37 +0000
From: Ryan Roberts
To: Matthew Wilcox
Cc: David Hildenbrand, Andrew Morton, Huang Ying, Gao Xiang, Yu Zhao,
 Yang Shi, Michal Hocko, Kefeng Wang, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org
Subject: Re: [PATCH v3 1/4] mm: swap: Remove CLUSTER_FLAG_HUGE from swap_cluster_info:flags
References: <20231025144546.577640-1-ryan.roberts@arm.com>
 <20231025144546.577640-2-ryan.roberts@arm.com>
 <6541e29b-f25a-48b8-a553-fd8febe85e5a@redhat.com>
 <2934125a-f2e2-417c-a9f9-3cb1e074a44f@redhat.com>
 <049818ca-e656-44e4-b336-934992c16028@arm.com>

On 28/02/2024 13:33, Matthew Wilcox wrote:
> On Wed, Feb 28, 2024 at 09:37:06AM +0000, Ryan Roberts wrote:
>> Fundamentally, we would like to be able to figure out the size of the swap slot
>> from the swap entry. Today swap supports 2 sizes; PAGE_SIZE and PMD_SIZE. For
>> PMD_SIZE, it always uses a full cluster, so can easily add a flag to the cluster
>> to mark it as PMD_SIZE.
>>
>> Going forwards, we want to support all sizes (power-of-2). Most of the time, a
>> cluster will contain only one size of THPs, but this is not the case when a THP
>> in the swapcache gets split or when an order-0 slot gets stolen. We expect these
>> cases to be rare.
>>
>> 1) Keep the size of the smallest swap entry in the cluster header. Most of the
>> time it will be the full size of the swap entry, but sometimes it will cover
>> only a portion. In the latter case you may see a false negative for
>> swap_page_trans_huge_swapped() meaning we take the slow path, but that is rare.
>> There is one wrinkle: currently the HUGE flag is cleared in put_swap_folio(). We
>> wouldn't want to do the equivalent in the new scheme (i.e. set the whole cluster
>> to order-0). I think that is safe, but haven't completely convinced myself yet.
>>
>> 2) allocate 4 bits per (small) swap slot to hold the order. This will give
>> precise information and is conceptually simpler to understand, but will cost
>> more memory (half as much as the initial swap_map[] again).
>>
>> I still prefer to avoid this at all if we can (and would like to hear Huang's
>> thoughts). But if it's a choice between 1 and 2, I prefer 1 - I'll do some
>> prototyping.
>
> I can't quite bring myself to look up the encoding of swap entries
> but as long as we're willing to restrict ourselves to naturally aligning
> the clusters, there's an encoding (which I believe I invented) that lets
> us encode arbitrary power-of-two sizes with a single bit.
>
> I describe it here:
> https://kernelnewbies.org/MatthewWilcox/NaturallyAlignedOrder
>
> Let me know if it's not clear.

Ahh yes, I'm familiar with this encoding scheme from other settings. Although
I've previously thought of it as having a bit to indicate whether the scheme is
enabled or not, and if it is enabled then the encoded PFN is:

  PFNe = PFNd | (1 << (log2(n) - 1))

Where n is the power-of-2 page count. Same thing, I think.

I think we would have to steal a bit from the offset to make this work, and it
looks like the size of that is bottlenecked on the arch's swp_entry PTE
representation. Looks like there is a MIPS config that only has 17 bits for
offset to begin with, so I doubt we would be able to spare a bit here? Although
it looks possible that there are some unused low bits that could be used...
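
For illustration only, here is a minimal userspace sketch of the "enable bit plus PFNe" form of the encoding given by the formula above. It is not kernel code and not the swp_entry layout. The struct and function names (napot_pfn, napot_encode, napot_order, napot_base) are made up for this example, and it assumes the decoded PFN is naturally aligned to the page count, so the lowest set bit of PFNe is always the marker bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container: the "scheme enabled" bit plus the PFN field. */
struct napot_pfn {
	bool enabled;	/* false: plain single-page (order-0) PFN */
	uint64_t pfn;	/* PFNe when enabled, PFNd otherwise */
};

/*
 * Encode pfnd, which must be naturally aligned to n = (1 << order) pages.
 * For order > 0 this is PFNe = PFNd | (1 << (log2(n) - 1)).
 */
static struct napot_pfn napot_encode(uint64_t pfnd, unsigned int order)
{
	struct napot_pfn e = { .enabled = order > 0, .pfn = pfnd };

	assert(order < 64);
	/* Natural alignment: the low 'order' bits of pfnd must be zero. */
	assert((pfnd & ((UINT64_C(1) << order) - 1)) == 0);

	if (order > 0)
		e.pfn |= UINT64_C(1) << (order - 1);
	return e;
}

/* Recover the order: position of the lowest set bit, plus one. */
static unsigned int napot_order(struct napot_pfn e)
{
	unsigned int order = 1;
	uint64_t v = e.pfn;

	if (!e.enabled)
		return 0;

	assert(v != 0);	/* the marker bit is always set when enabled */
	while (!(v & 1)) {
		v >>= 1;
		order++;
	}
	return order;
}

/* Recover PFNd by clearing the marker (lowest set) bit. */
static uint64_t napot_base(struct napot_pfn e)
{
	if (!e.enabled)
		return e.pfn;
	return e.pfn & (e.pfn - 1);
}
```

With these helpers, encoding PFN 0x1230 at order 4 (n = 16 pages) sets bit 3 and gives PFNe = 0x1238. The decoder then finds the lowest set bit at position 3, so it reports order 4 and base 0x1230. The sketch keeps the enable bit as a separate field, which is exactly the extra bit that would have to be found somewhere in the swap entry.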