From: "Huang, Ying"
To: Ryan Roberts
Cc: Barry Song <21cnbao@gmail.com>, Matthew Wilcox, Chuanhua Han, Barry Song
Subject: Re: [RFC PATCH v3 5/5] mm: support large folios swapin as a whole
In-Reply-To: (Ryan Roberts's message of "Tue, 19 Mar 2024 12:19:27 +0000")
References: <20240304081348.197341-1-21cnbao@gmail.com>
 <20240304081348.197341-6-21cnbao@gmail.com>
 <87wmq3yji6.fsf@yhuang6-desk2.ccr.corp.intel.com>
 <87sf0rx3d6.fsf@yhuang6-desk2.ccr.corp.intel.com>
 <87jzm0wblq.fsf@yhuang6-desk2.ccr.corp.intel.com>
 <9ec62266-26f1-46b6-8bb7-9917d04ed04e@arm.com>
 <87jzlyvar3.fsf@yhuang6-desk2.ccr.corp.intel.com>
Date: Wed, 20 Mar 2024 10:18:10 +0800
Message-ID: <87zfutsl25.fsf@yhuang6-desk2.ccr.corp.intel.com>
User-Agent: Gnus/5.13 (Gnus v5.13)
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset=ascii

Ryan Roberts writes:

> On 19/03/2024 09:20, Huang, Ying wrote:
>> Ryan Roberts writes:
>>
>>>>>> I agree phones are not the only platform. But Rome wasn't built in
>>>>>> a day. I can only get started on hardware that I can easily reach
>>>>>> and have enough hardware/test resources for. So we may take the
>>>>>> first step, which can be applied to a real product and improve its
>>>>>> performance, and then, step by step, broaden it and make it widely
>>>>>> useful to the various areas I can't reach :-)
>>>>>
>>>>> We must guarantee that the normal swap path runs correctly and has
>>>>> no performance regression while developing the SWP_SYNCHRONOUS_IO
>>>>> optimization, so we have to put some effort into testing the normal
>>>>> path anyway.
>>>>>
>>>>>> So probably we can start with a sysfs "enable" entry defaulting to
>>>>>> "n", or with a maximum swap-in order, as Ryan suggested [1]:
>>>>>>
>>>>>> "
>>>>>> So in the common case, swap-in will pull in the same size of folio
>>>>>> as was swapped-out. Is that definitely the right policy for all
>>>>>> folio sizes? Certainly it makes sense for "small" large folios
>>>>>> (e.g. up to 64K IMHO). But I'm not sure it makes sense for 2M THP;
>>>>>> as the size increases, the chances of actually needing all of the
>>>>>> folio reduce, so chances are we are wasting IO. There are similar
>>>>>> arguments for CoW, where we currently copy 1 page per fault - it
>>>>>> probably makes sense to copy the whole folio up to a certain size.
>>>>>> "
>>>
>>> I thought about this a bit more. No clear conclusions, but I hoped this
>>> might help the discussion around policy:
>>>
>>> The decision about the size of the THP is made at first fault, with
>>> some help from user space, and in future we might make decisions to
>>> split based on munmap/mremap/etc hints.
>>> In an ideal world, the fact that we have had to swap the THP out at
>>> some point in its lifetime should not impact its size. It's just being
>>> moved around in the system, and the reason for our original decision
>>> should still hold.
>>>
>>> So from that PoV, it would be good to swap in to the same size that
>>> was swapped out.
>>
>> Sorry, I don't agree with this. It's better to swap in and swap out in
>> the smallest size if the page is only accessed seldom, to avoid wasting
>> memory.
>
> If we want to optimize only for memory consumption, I'm sure there are
> many things we would do differently. We need to find a balance between
> memory and performance. The benefits of folios are well documented and
> the kernel is heading in the direction of managing memory in
> variable-sized blocks. So I don't think it's as simple as saying we
> should always swap in the smallest possible amount of memory.

That statement is conditional: "if the page is only accessed seldom". In
that case, the page that was swapped in will be swapped out again soon,
and the adjacent pages in the same large folio will not be accessed in
the meantime.

So I suggest creating an algorithm that decides the swap-in order
automatically, based on swap readahead information (a rough sketch of
such a heuristic is appended at the end of this mail). It can detect the
situation above via a reduced swap readahead window size. And if the
page is accessed over quite a long time, and the adjacent pages in the
same large folio are accessed too, the swap readahead window will grow
and a larger swap-in order will be used.

> You also said we should swap *out* in the smallest size possible. Have I
> misunderstood you? I thought the case for swapping out a whole folio
> without splitting was well established and non-controversial?

That is conditional too.

>>
>>> But we only kind-of keep that information around, via the swap entry
>>> contiguity and alignment. With that scheme it is possible that
>>> multiple virtually adjacent but not physically contiguous folios get
>>> swapped out to adjacent swap slot ranges and then swapped back in as a
>>> single, larger folio. This is not ideal, and I think it would be
>>> valuable to try to maintain the original folio size information with
>>> the swap slot. One way to do this would be to store, in the cluster,
>>> the original order for which the cluster was allocated. Then we at
>>> least know that a given swap slot is either for a folio of that order
>>> or an order-0 folio (due to cluster exhaustion/scanning). Can we steal
>>> a bit from swap_map to determine which case it is? Or are there better
>>> approaches?
>>
>> [snip]

--
Best Regards,
Huang, Ying
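
For illustration only, a minimal userspace sketch of the readahead-driven
order decision described above. Every name in it (swapin_order_hint,
ra_window, swapout_order, max_order) is hypothetical; it does not model
how the kernel would actually obtain the readahead window or the order
recorded at swap-out time, it only shows the shape of the decision.

/*
 * Hypothetical sketch, not existing kernel code.
 *
 * Idea: a shrinking readahead window means the adjacent pages were not
 * useful, so fall back towards order-0 swap-in; a growing window means
 * the surrounding pages are being used, so allow a larger order, capped
 * by the order the folio had at swap-out time and by an administrative
 * limit (e.g. a sysfs-style maximum swap-in order).
 */
#include <stdio.h>

static unsigned int ilog2u(unsigned int x)
{
	unsigned int order = 0;

	while (x > 1) {		/* round down to the nearest power of two */
		x >>= 1;
		order++;
	}
	return order;
}

static unsigned int swapin_order_hint(unsigned int ra_window,
				      unsigned int swapout_order,
				      unsigned int max_order)
{
	unsigned int order;

	if (ra_window <= 1)
		return 0;		/* cold page: swap in a single page */

	order = ilog2u(ra_window);
	if (order > swapout_order)	/* never exceed the original folio order */
		order = swapout_order;
	if (order > max_order)		/* honour the administrative cap */
		order = max_order;
	return order;
}

int main(void)
{
	/* window collapsed to 1 page, folio was order-4: seldom accessed */
	printf("cold:   order %u\n", swapin_order_hint(1, 4, 9));
	/* window grew to 16 pages, folio was order-4: refault whole folio */
	printf("warm:   order %u\n", swapin_order_hint(16, 4, 9));
	/* window of 512 pages, but the admin cap limits us to order-4 */
	printf("capped: order %u\n", swapin_order_hint(512, 9, 4));
	return 0;
}

The point of the sketch is only the decision shape: a collapsed readahead
window forces order-0 swap-in for seldom-accessed pages, while a grown
window allows swapping in up to the original folio order, subject to a
cap such as the maximum swap-in order knob mentioned earlier in the
thread.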