From: Chris Li
Date: Sun, 28 Apr 2024 22:50:33 -0700
Subject: Re: [PATCH 0/8] mm/swap: optimize swap cache search space
To: Kairui Song
Cc: "Huang, Ying", Matthew Wilcox, linux-mm@kvack.org, Andrew Morton, Barry Song, Ryan Roberts, Neil Brown, Minchan Kim, Hugh Dickins, David Hildenbrand, Yosry Ahmed, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org

On Sun, Apr 28, 2024 at 10:37 AM Kairui Song wrote:
>
> On Sat, Apr 27, 2024 at 7:16 AM Chris Li wrote:
> >
> > Hi Ying,
> >
> > On Tue, Apr 23, 2024 at 7:26 PM Huang, Ying wrote:
> > >
> > > Hi, Matthew,
> > >
> > > Matthew Wilcox writes:
> > >
> > > > On Mon, Apr 22, 2024 at 03:54:58PM +0800, Huang, Ying wrote:
> > > >> Is it possible to add "start_offset" support in xarray, so "index"
> > > >> will subtract "start_offset" before looking up / inserting?
> > > >
> > > > We kind of have that with XA_FLAGS_ZERO_BUSY which is used for
> > > > XA_FLAGS_ALLOC1. But that's just one bit for the entry at 0. We could
> > > > generalise it, but then we'd have to store that somewhere and there's
> > > > no obvious good place to store it that wouldn't enlarge struct xarray,
> > > > which I'd be reluctant to do.
> > > >
> > > >> Is it possible to use multiple range locks to protect one xarray to
> > > >> improve the lock scalability? This is why we have multiple "struct
> > > >> address_space" for one swap device. And, we may have same lock
> > > >> contention issue for large files too.
> > > >
> > > > It's something I've considered. The issue is search marks. If we delete
> > > > an entry, we may have to walk all the way up the xarray clearing bits as
> > > > we go and I'd rather not grab a lock at each level. There's a convenient
> > > > 4 byte hole between nr_values and parent where we could put it.
> > > >
> > > > Oh, another issue is that we use i_pages.xa_lock to synchronise
> > > > address_space.nrpages, so I'm not sure that a per-node lock will help.
> > >
> > > Thanks for looking at this.
> > >
> > > > But I'm conscious that there are workloads which show contention on
> > > > xa_lock as their limiting factor, so I'm open to ideas to improve all
> > > > these things.
> > >
> > > I have no idea so far because my very limited knowledge about xarray.
> >
> > For the swap file usage, I have been considering an idea to remove the
> > index part of the xarray from swap cache. Swap cache is different from
> > file cache in a few aspects.
> > For one if we want to have a folio equivalent of "large swap entry".
> > Then the natural alignment of those swap offset on does not make
> > sense. Ideally we should be able to write the folio to un-aligned swap
> > file locations.
> >
>
> Hi Chris,
>
> This sound interesting, I have a few questions though...
>
> Are you suggesting we handle swap on file and swap on device
> differently? Swap on file is much less frequently used than swap on
> device I think.

That is not what I have in mind. The swap struct idea does not
distinguish between swap files and swap devices. BTW, I sometimes use
swap on a file because I did not allocate a swap partition in advance.

>
> And what do you mean "index part of the xarray"?
> If we need a cache,
> xarray still seems one of the best choices to hold the content.

We still need to look up swap file offset -> folio. However, if we
allocate a "struct swap" for each swap offset, then the folio lookup can
be as simple as getting the swap_struct by offset, then an atomic read
of swap_struct->folio.

I am not sure how you came to the conclusion that xarray is one of the
"best choices". It is one choice, but it has its drawbacks. Take the
natural alignment requirement of xarray: for example, 2M large swap
entries need to be written to 2M-aligned offsets, which is an
unnecessary restriction. If we allocate the "struct swap" ourselves, we
have more flexibility.

> > The other aspect for swap files is that, we already have different
> > data structures organized around swap offset, swap_map and
> > swap_cgroup. If we group the swap related data structure together. We
> > can add a pointer to a union of folio or a shadow swap entry. We can
> > use atomic updates on the swap struct member or breakdown the access
> > lock by ranges just like swap cluster does.
> >
> > I want to discuss those ideas in the upcoming LSF/MM meet up as well.
>
> Looking forward to it!

Thanks, I will post more when I make more progress on that.

>
> Oh, and BTW I'm also trying to breakdown the swap address space range
> (from 64M to 16M, SWAP_ADDRESS_SPACE_SHIFT from 14 to
> 12). It's a simple approach, but the coupling and increased memory
> usage of address_space structure makes the performance go into
> regression (about -2% for worst real world workload). I found this

Yes, that sounds plausible.

> part very performance sensitive, so basically I'm not making much
> progress for the future items I mentioned in this cover letter. New
> ideas could be very helpful!
>

The swap_struct idea is very different from what you are trying to do in
this series. It is more related to my LSF/MM topic on the swap back end
overhaul, which is a longer-term and bigger undertaking.

Chris
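To make the per-offset "struct swap" lookup a bit more concrete, here is a rough userspace sketch, not kernel code. All names (struct swap_slot, struct swap_device, swap_slot_folio() and friends) are hypothetical, struct folio is a stand-in, and C11 atomics stand in for the kernel's own primitives. The slot word holds either a folio pointer or a shadow value, using the same low-bit tagging that xa_mk_value() uses for xarray value entries. Writers are assumed to be serialized per range (for example by a cluster lock, not shown), while lookups are a single lockless load.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's struct folio. */
struct folio {
	unsigned long nr_pages;	/* number of swap slots this folio covers */
};

/*
 * Hypothetical per-offset descriptor. The word holds either a folio
 * pointer (low bit clear) or a shadow value (low bit set), tagged the
 * same way xa_mk_value() tags xarray value entries. 0 means empty.
 */
struct swap_slot {
	_Atomic uintptr_t val;
};

struct swap_device {
	unsigned long nr_slots;
	struct swap_slot *slots;	/* indexed directly by swap offset */
};

static int swap_device_init(struct swap_device *si, unsigned long nr)
{
	si->nr_slots = nr;
	si->slots = malloc(nr * sizeof(*si->slots));
	if (!si->slots)
		return -1;
	for (unsigned long i = 0; i < nr; i++)
		atomic_init(&si->slots[i].val, 0);
	return 0;
}

/* Lookup: index by offset plus one acquire load; no tree walk, no lock. */
static struct folio *swap_slot_folio(struct swap_device *si, unsigned long off)
{
	uintptr_t v;

	if (off >= si->nr_slots)
		return NULL;
	v = atomic_load_explicit(&si->slots[off].val, memory_order_acquire);
	if (v & 1)	/* shadow entry: the folio is not cached */
		return NULL;
	return (struct folio *)v;	/* NULL if the slot is empty */
}

/*
 * Point nr_pages consecutive slots at one folio. off does not have to be
 * a multiple of nr_pages, unlike an xarray multi-index entry.
 * Assumes the caller serializes writers for this range (lock not shown).
 */
static int swap_slot_add_folio(struct swap_device *si, unsigned long off,
			       struct folio *folio)
{
	if (off > si->nr_slots || folio->nr_pages > si->nr_slots - off)
		return -1;
	for (unsigned long i = 0; i < folio->nr_pages; i++)
		atomic_store_explicit(&si->slots[off + i].val,
				      (uintptr_t)folio, memory_order_release);
	return 0;
}

/* Replace a slot's content with a tagged shadow value (e.g. on eviction). */
static void swap_slot_set_shadow(struct swap_device *si, unsigned long off,
				 uintptr_t shadow)
{
	if (off < si->nr_slots)
		atomic_store_explicit(&si->slots[off].val,
				      (shadow << 1) | 1, memory_order_release);
}
```

With these helpers, a 4-page folio added at offset 3 is found from any of offsets 3 to 6, whereas an order-2 xarray multi-index entry would have to start at a multiple of 4. Folio lifetime (the swap cache today relies on RCU plus a reference count) and merging in swap_map/swap_cgroup are left out of the sketch.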