X-Mailing-List: linux-kernel@vger.kernel.org
References: <20240508202111.768b7a4d@yea> <20240515224524.1c8befbe@yea> <20240602200332.3e531ff1@yea> <20240604001304.5420284f@yea> <20240604134458.3ae4396a@yea>
From: Yosry Ahmed
Date: Tue, 4 Jun 2024 11:01:39 -0700
Subject: Re: kswapd0: page allocation failure: order:0, mode:0x820(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0 (Kernel v6.5.9, 32bit ppc)
To: Yu Zhao
Cc: Erhard Furtner,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, Johannes Weiner, Nhat Pham, Chengming Zhou, Sergey Senozhatsky, Minchan Kim
Content-Type: text/plain; charset="UTF-8"

On Tue, Jun 4, 2024 at 10:54 AM Yu Zhao wrote:
>
> On Tue, Jun 4, 2024 at 11:34 AM Yosry Ahmed wrote:
> >
> > On Tue, Jun 4, 2024 at 10:19 AM Yu Zhao wrote:
> > >
> > > On Tue, Jun 4, 2024 at 10:12 AM Yosry Ahmed wrote:
> > > >
> > > > On Tue, Jun 4, 2024 at 4:45 AM Erhard Furtner wrote:
> > > > >
> > > > > On Mon, 3 Jun 2024 16:24:02 -0700
> > > > > Yosry Ahmed wrote:
> > > > >
> > > > > > Thanks for bisecting. Taking a look at the thread, it seems like you
> > > > > > have a very limited area of memory to allocate kernel memory from. One
> > > > > > possible reason why that commit can cause an issue is because we will
> > > > > > have multiple instances of the zsmalloc slab caches 'zspage' and
> > > > > > 'zs_handle', which may contribute to fragmentation in slab memory.
> > > > > >
> > > > > > Do you have /proc/slabinfo from a good and a bad run by any chance?
> > > > > >
> > > > > > Also, could you check if the attached patch helps? It makes sure that
> > > > > > even when we use multiple zsmalloc zpools, we will use a single slab
> > > > > > cache of each type.
> > > > >
> > > > > Thanks for looking into this! I got you 'cat /proc/slabinfo' from a good HEAD, from a bad HEAD and from the bad HEAD + your patch applied.
> > > > >
> > > > > Good was 6be3601517d90b728095d70c14f3a04b9adcb166, bad was b8cf32dc6e8c75b712cbf638e0fd210101c22f17 which I got both from my bisect.log. I got the slabinfo shortly after boot and a 2nd time shortly before the OOM or the kswapd0: page allocation failure happens. I terminated the workload (stress-ng --vm 2 --vm-bytes 1930M --verify -v) manually shortly before the 2 GiB RAM exhausted and got the slabinfo then.
> > > > >
> > > > > The patch applied to git b8cf32dc6e8c75b712cbf638e0fd210101c22f17 unfortunately didn't make a difference, I got the kswapd0: page allocation failure nevertheless.
> > > >
> > > > Thanks for trying this out. The patch reduces the amount of wasted
> > > > memory due to the 'zs_handle' and 'zspage' caches by an order of
> > > > magnitude, but it was a small number to begin with (~250K).
> > > >
> > > > I cannot think of other reasons why having multiple zsmalloc pools
> > > > will end up using more memory in the 0.25GB zone that the kernel
> > > > allocations can be made from.
> > > >
> > > > The number of zpools can be made configurable or determined at runtime
> > > > by the size of the machine, but I don't want to do this without
> > > > understanding the problem here first. Adding other zswap and zsmalloc
> > > > folks in case they have any ideas.
> > >
> > > Hi Erhard,
> > >
> > > If it's not too much trouble, could you "grep nr_zspages /proc/vmstat"
> > > on kernels before and after the bad commit? It'd be great if you could
> > > run the grep command right before the OOM kills.
> > >
> > > The overall internal fragmentation of multiple zsmalloc pools might be
> > > higher than a single one. I suspect this might be the cause.
> >
> > I thought about the internal fragmentation of pools, but zsmalloc
> > should have access to highmem, and if I understand correctly the
> > problem here is that we are running out of space in the DMA zone when
> > making kernel allocations.
> >
> > Do you suspect zsmalloc is allocating memory from the DMA zone
> > initially, even though it has access to highmem?
>
> There was a lot of user memory in the DMA zone. So at a point the
> highmem zone was full and allocation fallback happened.
>
> The problem with zone fallback is that recent allocations go into
> lower zones, meaning they are further back on the LRU list. This
> applies to both user memory and zsmalloc memory -- the latter has a
> writeback LRU. On top of this, neither the zswap shrinker nor the
> zsmalloc shrinker (compaction) is zone aware. So page reclaim might
> have trouble hitting the right target zone.

I see what you mean. In this case, yeah, I think the internal
fragmentation in the zsmalloc pools may be the reason behind the
problem.

How many CPUs does this machine have? I am wondering if 32 is
overkill for small machines; perhaps the number of pools should be
min(nr_cpus, 32)?

Alternatively, the number of pools should scale with the memory size
in some way, such that we only increase fragmentation when it's
tolerable.

>
> We can't really tell how zspages are distributed across zones, but the
> overall number might be helpful. It'd be great if someone could make
> nr_zspages per zone :)
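
To make the sizing idea above concrete, here is a rough userspace sketch in Python (illustration only, not kernel code and not a proposed patch). It assumes the intent is that small machines get fewer pools, so the count is the smaller of the CPU count and 32. The memory-based variant uses a made-up budget of one pool per 256 MiB of RAM. The names pools_by_cpus, pools_by_memory, MAX_POOLS and MIB_PER_POOL are all hypothetical.

```python
import os

MAX_POOLS = 32        # pool count discussed in this thread
MIB_PER_POOL = 256    # hypothetical budget: one pool per 256 MiB of RAM


def pools_by_cpus(nr_cpus: int, max_pools: int = MAX_POOLS) -> int:
    """Cap the pool count at the CPU count, never going below one pool."""
    return max(1, min(nr_cpus, max_pools))


def pools_by_memory(total_mib: int, max_pools: int = MAX_POOLS,
                    mib_per_pool: int = MIB_PER_POOL) -> int:
    """Scale the pool count with RAM size, clamped to [1, max_pools]."""
    return max(1, min(total_mib // mib_per_pool, max_pools))


if __name__ == "__main__":
    cpus = os.cpu_count() or 1
    print(f"{cpus} CPUs -> {pools_by_cpus(cpus)} pools")
    print(f"2048 MiB -> {pools_by_memory(2048)} pools")
```

Under that assumed budget, a 2 GiB machine like the one in this report would end up with 8 pools from the memory-based variant; the right divisor would have to come from measuring fragmentation, not from this sketch.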