Subject: Re: kswapd0: page allocation failure: order:0, mode:0x820(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0 (Kernel v6.5.9, 32bit ppc)
From: Chengming Zhou
Date: Thu, 6 Jun 2024 10:49:14 +0800
To: Yosry Ahmed, Erhard Furtner
Cc: Yu Zhao, linux-mm@kvack.org, linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, Johannes Weiner, Nhat Pham, Sergey Senozhatsky, Minchan Kim, "Vlastimil Babka (SUSE)"
References: <20240508202111.768b7a4d@yea> <20240515224524.1c8befbe@yea> <20240602200332.3e531ff1@yea> <20240604001304.5420284f@yea> <20240604134458.3ae4396a@yea> <20240604231019.18e2f373@yea> <20240606010431.2b33318c@yea>

On 2024/6/6 07:41, Yosry Ahmed wrote:
> On Wed, Jun 5, 2024 at 4:04 PM Erhard Furtner wrote:
>>
>> On Tue, 4 Jun 2024 20:03:27 -0700
>> Yosry Ahmed wrote:
>>
>>> Could you check if the attached patch helps? It basically changes the
>>> number of zpools from 32 to min(32, nr_cpus).
>>
>> Thanks! The patch does not fix the issue but it helps.
>>
>> Means I still get to see the 'kswapd0: page allocation failure' in the
>> dmesg, a 'stress-ng-vm: page allocation failure' later on, another
>> kswapd0 error later on, etc. _but_ the machine keeps running the
>> workload, stays usable via VNC and I get no hard crash any longer.
>>
>> Without patch kswapd0 error and hard crash (need to power-cycle) <3min.
>> With patch several kswapd0 errors but running for 2 hrs now. I double
>> checked this to be sure.
>
> Thanks for trying this out. This is interesting, so even two zpools is
> too much fragmentation for your use case.
>
> I think there are multiple ways to go forward here:
> (a) Make the number of zpools a config option, leave the default as
> 32, but allow special use cases to set it to 1 or similar. This is
> probably not preferable because it is not clear to users how to set
> it, but the idea is that no one will have to set it except special use
> cases such as Erhard's (who will want to set it to 1 in this case).
>
> (b) Make the number of zpools scale linearly with the number of CPUs.
> Maybe something like nr_cpus/4 or nr_cpus/8. The problem with this
> approach is that with a large number of CPUs, too many zpools will
> start having diminishing returns. Fragmentation will keep increasing,
> while the scalability/concurrency gains will diminish.
>
> (c) Make the number of zpools scale logarithmically with the number of
> CPUs. Maybe something like 4log2(nr_cpus). This will keep the number
> of zpools from increasing too much and close to the status quo. The
> problem is that at a small number of CPUs (e.g. 2), 4log2(nr_cpus)
> will actually give a nr_zpools > nr_cpus. So we will need to come up
> with a more fancy magic equation (e.g. 4log2(nr_cpus/4)).
>
> (d) Make the number of zpools scale linearly with memory. This makes
> more sense than scaling with CPUs because increasing the number of
> zpools increases fragmentation, so it makes sense to limit it by the
> available memory. This is also more consistent with other magic
> numbers we have (e.g. SWAP_ADDRESS_SPACE_SHIFT).
>
> The problem is that unlike zswap trees, the zswap pool is not
> connected to the swapfile size, so we don't have an indication for how
> much memory will be in the zswap pool. We can scale the number of
> zpools with the entire memory on the machine during boot, but this
> seems like it would be difficult to figure out, and will not take into
> consideration memory hotplugging and the zswap global limit changing.
>
> (e) A creative mix of the above.
>
> (f) Something else (probably simpler).
>
> I am personally leaning toward (c), but I want to hear the opinions of
> other people here. Yu, Vlastimil, Johannes, Nhat? Anyone else?
>
> In the long-term, I think we may want to address the lock contention
> in zsmalloc itself instead of zswap spawning multiple zpools.
>

Agree, I think we should try to improve the locking scalability of
zsmalloc. I have some thoughts to share, no code or test data yet:

1. First, we can change the pool global lock to a per-class lock, which
   is more fine-grained (see the rough sketch below).

2. Actually, we only need to take the per-zspage lock when we malloc/free,
   and only need to take the class lock when its fullness changes.

3. If this is not enough, we can have fewer fullness groups, so the class
   lock needs to be taken less often. (will need some test data)

More comments are welcome.

Thanks!
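To make points 1 and 2 a bit more concrete, here is a minimal, untested
sketch of what a per-class lock could look like. The struct and function
names (sketch_size_class, sketch_fix_fullness_group) and the fullness-group
count are illustrative placeholders, not the actual zsmalloc code:

/*
 * Illustrative sketch only, not actual zsmalloc code: each size class
 * carries its own spinlock, so malloc/free paths in different classes
 * no longer serialize on a single pool-wide lock.
 */
#include <linux/spinlock.h>
#include <linux/list.h>

#define SKETCH_NR_FULLNESS_GROUPS 4	/* made-up value for the example */

struct sketch_size_class {
	spinlock_t lock;	/* protects only this class's fullness lists */
	struct list_head fullness_list[SKETCH_NR_FULLNESS_GROUPS];
	int size;
};

/*
 * Point 2: the class lock is only needed when a zspage moves between
 * fullness groups, not for every object malloc/free.
 */
static void sketch_fix_fullness_group(struct sketch_size_class *class,
				      struct list_head *zspage_link,
				      int new_group)
{
	spin_lock(&class->lock);
	list_move(zspage_link, &class->fullness_list[new_group]);
	spin_unlock(&class->lock);
}

A real change would of course also have to sort out lock ordering against
the per-zspage lock and the migration/compaction paths.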
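And just for reference on the zswap side, a toy illustration of the kind of
logarithmic formula discussed in option (c) above; the constants and the
helper name are made up for the example, this is not a proposal:

/*
 * Toy example only: cap the zpool count at 32, grow it roughly
 * logarithmically with the CPU count, and never exceed nr_cpus.
 */
#include <linux/log2.h>
#include <linux/cpumask.h>
#include <linux/minmax.h>

static unsigned int sketch_nr_zpools(void)
{
	unsigned int cpus = num_possible_cpus();
	unsigned int n = 4 * ilog2(max(cpus / 4, 1U));

	/* e.g. 2 CPUs -> 1 zpool, 32 CPUs -> 12, 1024 CPUs -> 32 */
	return clamp(n, 1U, min(32U, cpus));
}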