Message-ID: <531c6702-1389-42c5-9cdd-062989d40133@arm.com>
Date: Wed, 28 Feb 2024 14:59:28 +0000
X-Mailing-List: linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3 1/4] mm: swap: Remove CLUSTER_FLAG_HUGE from swap_cluster_info:flags
From: Ryan Roberts
To: Matthew Wilcox
Cc: David Hildenbrand, Andrew Morton, Huang Ying, Gao Xiang, Yu Zhao, Yang Shi, Michal Hocko, Kefeng Wang, linux-kernel@vger.kernel.org, linux-mm@kvack.org
References: <20231025144546.577640-1-ryan.roberts@arm.com> <20231025144546.577640-2-ryan.roberts@arm.com>
 <6541e29b-f25a-48b8-a553-fd8febe85e5a@redhat.com>
 <2934125a-f2e2-417c-a9f9-3cb1e074a44f@redhat.com>
 <049818ca-e656-44e4-b336-934992c16028@arm.com>
In-Reply-To:
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit

On 28/02/2024 14:24, Ryan Roberts wrote:
> On 28/02/2024 13:33, Matthew Wilcox wrote:
>> On Wed, Feb 28, 2024 at 09:37:06AM +0000, Ryan Roberts wrote:
>>> Fundamentally, we would like to be able to figure out the size of the swap slot
>>> from the swap entry. Today swap supports 2 sizes: PAGE_SIZE and PMD_SIZE. For
>>> PMD_SIZE, it always uses a full cluster, so we can easily add a flag to the
>>> cluster to mark it as PMD_SIZE.
>>>
>>> Going forwards, we want to support all sizes (power-of-2). Most of the time, a
>>> cluster will contain only one size of THPs, but this is not the case when a THP
>>> in the swapcache gets split or when an order-0 slot gets stolen. We expect these
>>> cases to be rare.
>>>
>>> 1) Keep the size of the smallest swap entry in the cluster header. Most of the
>>> time it will be the full size of the swap entry, but sometimes it will cover
>>> only a portion. In the latter case you may see a false negative for
>>> swap_page_trans_huge_swapped() meaning we take the slow path, but that is rare.
>>> There is one wrinkle: currently the HUGE flag is cleared in put_swap_folio(). We
>>> wouldn't want to do the equivalent in the new scheme (i.e. set the whole cluster
>>> to order-0). I think that is safe, but haven't completely convinced myself yet.
>>>
>>> 2) Allocate 4 bits per (small) swap slot to hold the order. This will give
>>> precise information and is conceptually simpler to understand, but will cost
>>> more memory (half as much as the initial swap_map[] again).
>>>
>>> I still prefer to avoid this at all if we can (and would like to hear Huang's
>>> thoughts). But if it's a choice between 1 and 2, I prefer 1 - I'll do some
>>> prototyping.
>>
>> I can't quite bring myself to look up the encoding of swap entries
>> but as long as we're willing to restrict ourselves to naturally aligning
>> the clusters, there's an encoding (which I believe I invented) that lets
>> us encode arbitrary power-of-two sizes with a single bit.
>>
>> I describe it here:
>> https://kernelnewbies.org/MatthewWilcox/NaturallyAlignedOrder
>>
>> Let me know if it's not clear.
>
> Ahh yes, I'm familiar with this encoding scheme from other settings. Although
> I've previously thought of it as having a bit to indicate whether the scheme is
> enabled or not, and if it is enabled then the encoded PFN is:
>
>   PFNe = PFNd | (1 << (log2(n) - 1))
>
> Where n is the power-of-2 page count.
>
> Same thing, I think.
>
> I think we would have to steal a bit from the offset to make this work, and it
> looks like the size of that is bottlenecked on the arch's swp_entry PTE
> representation. Looks like there is a MIPS config that only has 17 bits for
> offset to begin with, so I doubt we would be able to spare a bit here? Although
> it looks possible that there are some unused low bits that could be used...
>

I think the other problem with this is that it won't tell us which slot in the
"swap slot block" each entry is targeting?
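
For reference, a minimal sketch of the flag-plus-PFN form of the encoding
quoted above, as standalone C rather than kernel code. The names (struct nao,
nao_encode(), nao_order(), nao_base()) are made up for illustration and are not
kernel APIs. It assumes a 64-bit PFN field and that the block is naturally
aligned, i.e. the low log2(n) bits of PFNd are zero:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical encoded form: one "scheme enabled" flag plus the PFN field. */
struct nao {
	bool multi;	/* set when the block covers more than one page */
	uint64_t pfn;	/* PFNe when multi is set, the plain PFN otherwise */
};

/* Index of the lowest set bit; v must be non-zero. */
static inline unsigned int nao_lowest_set_bit(uint64_t v)
{
	unsigned int n = 0;

	while (!(v & 1)) {
		v >>= 1;
		n++;
	}
	return n;
}

/* PFNe = PFNd | (1 << (log2(n) - 1)), where order == log2(n). */
static inline struct nao nao_encode(uint64_t pfn, unsigned int order)
{
	struct nao e = { .multi = order > 0, .pfn = pfn };

	if (order > 0) {
		/* Natural alignment: the low `order` bits must be clear. */
		assert((pfn & ((UINT64_C(1) << order) - 1)) == 0);
		e.pfn |= UINT64_C(1) << (order - 1);
	}
	return e;
}

/* The marker is the lowest set bit of PFNe, at position order - 1. */
static inline unsigned int nao_order(struct nao e)
{
	return e.multi ? nao_lowest_set_bit(e.pfn) + 1 : 0;
}

/* Clearing the lowest set bit recovers the base PFN of the block. */
static inline uint64_t nao_base(struct nao e)
{
	return e.multi ? (e.pfn & (e.pfn - 1)) : e.pfn;
}
```

With this sketch, nao_encode(0x1000, 4) stores 0x1008 and decodes back to
order 4 and base 0x1000. Decoding only yields the base and order of the block,
because the low bits are consumed by the marker; nothing in this form records
which slot within the block an individual entry refers to.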