Received: by 2002:a25:d783:0:0:0:0:0 with SMTP id o125csp409023ybg; Thu, 19 Mar 2020 01:58:54 -0700 (PDT) X-Google-Smtp-Source: ADFU+vvtaht4DJ0Rx0owGSTS/lZ326Iaq5dbA2tpq4XgalHiZQCGDLzAFK1o/PLEuGlfD5cO6LyA X-Received: by 2002:a05:6808:2ce:: with SMTP id a14mr1545124oid.175.1584608334593; Thu, 19 Mar 2020 01:58:54 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1584608334; cv=none; d=google.com; s=arc-20160816; b=jYF/arBubzym1YQhc82BZZa3Sh/mSEeV+STk6sNM6kH0IVJQzD4pBmMJE+1zM6DaSg dsrlto/d9Y7Q/t70xCj0JnaSZGWN73I022Tb3tf84X4q4vdVThqD8zt08YqhdVoeJn3I Dik04NSj8otjI3UWicC8k4UbF7d+CqhBlcmBvFZqugJb3TBjTE3X59wxvSPX1AyRwJhO uDD27EuS27YhRUYERtg774pVF37Np+l1/+FjiBEHf1xo5MXR+FLBgyfV15SOU6Cjj6qG AlnCCCeWSGQJUPtNQ28zkTQNUd1cjuAp3JCKb6uGLKXswDJB5hhoaMl1fFUcnwNYl3YC MkSg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:cc:to:subject :message-id:date:from:in-reply-to:references:mime-version :dkim-signature; bh=5WXZVzgoGeWeHjXvOqVQJ1mJgWoy087Hb8hKgACB9rY=; b=GoPZd3GcK5U1zYND6ncDg2JYa4kQna9ZvEr9gJRNj+J5Lwa+zkXwCIdtGA9yXH6LFc shr6+FjMvuvvprL9YsVcqMc+7Ox8pA3/h0f2iuK9xv0ItECqTJG8Ozy8DSJY9ZoIowTt WWRP0FvD0WCLgZQNLsEXzjn0A4Q4n7ki6QgLc0/KK7WCViUQOis8fgCy2P/L5QSp/nkC FxPv00JQ2FITSZ+6oUEZgCv39JJdjiTu/R8eQC4dexpvG2Tb1G0HrnHG6WBJUHEH6p2E M1JSNkgDmTC5K4bNckdsW558jTJCMNuXFfZP88sHJ9oGxFxXhBmlFJYT1ysVN0RKENQC pvHw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=J+kDhC6A; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id y17si934193oto.70.2020.03.19.01.58.41; Thu, 19 Mar 2020 01:58:54 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=J+kDhC6A; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726589AbgCSI6M (ORCPT + 99 others); Thu, 19 Mar 2020 04:58:12 -0400 Received: from mail-qt1-f195.google.com ([209.85.160.195]:38498 "EHLO mail-qt1-f195.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725767AbgCSI6L (ORCPT ); Thu, 19 Mar 2020 04:58:11 -0400 Received: by mail-qt1-f195.google.com with SMTP id z12so1127683qtq.5 for ; Thu, 19 Mar 2020 01:58:10 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc:content-transfer-encoding; bh=5WXZVzgoGeWeHjXvOqVQJ1mJgWoy087Hb8hKgACB9rY=; b=J+kDhC6A90eyhRiUK8pvjOktXN744EyDTo7d7nq+ma7ldoaUrHSrqFWyZqc4iOw56Z 6xtzkos9fpAij/e2YFmgBGbdbufUodJ2n82nQclswLXrHIECJzQtCG859XzAgT1/Pn8V VvRSQrxoQjhJHFhLUIHry0Iqd/dYsot1Fx0pPzO3m6n4LosjSCEScnHjPrP4M/Y5KStQ 7KYjAjCUJYuuX2QsyBZjaB/iprp1HBJDJUHjT4WfWFWUxXWJj8qkQ8unHaVG/qfMWb9F /w2IwvKwXLKy3nRJ3dMjCbB9NKZ1U9yC4Y9y/1nJqFM6WIrOOjYuuheNIpQaVXRhItpu DHbA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc:content-transfer-encoding; bh=5WXZVzgoGeWeHjXvOqVQJ1mJgWoy087Hb8hKgACB9rY=; b=QhPJuAaVc7JbdWMmEkdqPKMfKoHxfudR7frY/u9FAMeVBdYutwMufQeUKxzhhq3vTB Q0wR0XFZut5R2a2+YIkoPy8GPj/4c+/wdTQF4pcn1rkPDVdpVboheA3nRufBq5Vk9p8r jGYsUBOjAL2oT0MMqlQPoGAhWS0aAZXsd3qjWLChQH9Rmt2elsqbBwdBf1bWhtfJRQu7 7fqxiIJXDqWdPSHk2OP3gta2Mkw0SQG2sfulNnoX5zgk8Ybz6H4MHNCPxUo2ZVRvE8bS 2DCIVKFVZlZyP+J3iFq7f3jkIRtGTC21pl6p2dq33+lz/wLdc5boMG4bwkHrKu5tmdVN pLRA== X-Gm-Message-State: ANhLgQ2GyQZBQBXtXUeUXGI3z55vbj0y4vEY0whIpej7bnuTnvQqhhGr 78+NFP4RjDtfdaUSgbN98Er24InwB4jbj+8L+jk= X-Received: by 2002:ac8:708f:: with SMTP id y15mr1814514qto.35.1584608289921; Thu, 19 Mar 2020 01:58:09 -0700 (PDT) MIME-Version: 1.0 References: <1584502378-12609-1-git-send-email-iamjoonsoo.kim@lge.com> <1584502378-12609-2-git-send-email-iamjoonsoo.kim@lge.com> In-Reply-To: From: Joonsoo Kim Date: Thu, 19 Mar 2020 17:57:58 +0900 Message-ID: Subject: Re: [PATCH v2 1/2] mm/page_alloc: use ac->high_zoneidx for classzone_idx To: David Rientjes Cc: Andrew Morton , Linux Memory Management List , LKML , Johannes Weiner , Michal Hocko , Minchan Kim , Vlastimil Babka , Mel Gorman , kernel-team@lge.com, Ye Xiaolong , Joonsoo Kim Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org 2020=EB=85=84 3=EC=9B=94 19=EC=9D=BC (=EB=AA=A9) =EC=98=A4=EC=A0=84 6:29, D= avid Rientjes =EB=8B=98=EC=9D=B4 =EC=9E=91=EC=84=B1: > > On Wed, 18 Mar 2020, js1304@gmail.com wrote: > > > From: Joonsoo Kim > > > > Currently, we use the zone index of preferred_zone which represents > > the best matching zone for allocation, as classzone_idx. It has a probl= em > > on NUMA system with ZONE_MOVABLE. > > > > Hi Joonsoo, Hello, David. > More specifically, it has a problem on NUMA systems when the lowmem > reserve protection exists for some zones on a node that do not exist on > other nodes, right? Right. > In other words, to make sure I understand correctly, if your node 1 had a > ZONE_MOVABLE than this would not have happened. If that's true, it might > be helpful to call out that ZONE_MOVABLE itself is not necessarily a > problem, but a system where one node has ZONE_NORMAL and ZONE_MOVABLE and > another only has ZONE_NORMAL is the problem. Okay. I will try to re-write the commit message as you suggested. > > In NUMA system, it can be possible that each node has different populat= ed > > zones. For example, node 0 could have DMA/DMA32/NORMAL/MOVABLE zone and > > node 1 could have only NORMAL zone. In this setup, allocation request > > initiated on node 0 and the one on node 1 would have different > > classzone_idx, 3 and 2, respectively, since their preferred_zones are > > different. If they are handled by only their own node, there is no prob= lem. > > I'd say "If the allocation is local" rather than "If they are handled by > only their own node". I will replace it with yours. Thanks for correcting. > > However, if they are somtimes handled by the remote node due to memory > > shortage, the problem would happen. > > > > In the following setup, allocation initiated on node 1 will have some > > precedence than allocation initiated on node 0 when former allocation i= s > > processed on node 0 due to not enough memory on node 1. They will have > > different lowmem reserve due to their different classzone_idx thus > > an watermark bars are also different. > > > > root@ubuntu:/sys/devices/system/memory# cat /proc/zoneinfo > > Node 0, zone DMA > > per-node stats > > ... > > pages free 3965 > > min 5 > > low 8 > > high 11 > > spanned 4095 > > present 3998 > > managed 3977 > > protection: (0, 2961, 4928, 5440) > > ... > > Node 0, zone DMA32 > > pages free 757955 > > min 1129 > > low 1887 > > high 2645 > > spanned 1044480 > > present 782303 > > managed 758116 > > protection: (0, 0, 1967, 2479) > > ... > > Node 0, zone Normal > > pages free 459806 > > min 750 > > low 1253 > > high 1756 > > spanned 524288 > > present 524288 > > managed 503620 > > protection: (0, 0, 0, 4096) > > ... > > Node 0, zone Movable > > pages free 130759 > > min 195 > > low 326 > > high 457 > > spanned 1966079 > > present 131072 > > managed 131072 > > protection: (0, 0, 0, 0) > > ... > > Node 1, zone DMA > > pages free 0 > > min 0 > > low 0 > > high 0 > > spanned 0 > > present 0 > > managed 0 > > protection: (0, 0, 1006, 1006) > > Node 1, zone DMA32 > > pages free 0 > > min 0 > > low 0 > > high 0 > > spanned 0 > > present 0 > > managed 0 > > protection: (0, 0, 1006, 1006) > > Node 1, zone Normal > > per-node stats > > ... > > pages free 233277 > > min 383 > > low 640 > > high 897 > > spanned 262144 > > present 262144 > > managed 257744 > > protection: (0, 0, 0, 0) > > ... > > Node 1, zone Movable > > pages free 0 > > min 0 > > low 0 > > high 0 > > spanned 262144 > > present 0 > > managed 0 > > protection: (0, 0, 0, 0) > > > > min watermark for NORMAL zone on node 0 > > allocation initiated on node 0: 750 + 4096 =3D 4846 > > allocation initiated on node 1: 750 + 0 =3D 750 > > > > This watermark difference could cause too many numa_miss allocation > > in some situation and then performance could be downgraded. > > > > Recently, there was a regression report about this problem on CMA patch= es > > since CMA memory are placed in ZONE_MOVABLE by those patches. I checked > > that problem is disappeared with this fix that uses high_zoneidx > > for classzone_idx. > > > > http://lkml.kernel.org/r/20180102063528.GG30397@yexl-desktop > > > > Using high_zoneidx for classzone_idx is more consistent way than previo= us > > approach because system's memory layout doesn't affect anything to it. > > With this patch, both classzone_idx on above example will be 3 so will > > have the same min watermark. > > > > allocation initiated on node 0: 750 + 4096 =3D 4846 > > allocation initiated on node 1: 750 + 4096 =3D 4846 > > > > Alternatively, I assume that this could also be fixed by changing the > value of the lowmem protection on the node without managed pages in the > upper zone to be the max protection from the lowest zones? In your > example, node 1 ZONE_NORMAL would then be (0, 0, 0, 4096). No, if lowmem_reserve of node 0 ZONE_NORMAL is (0, 0, 4096, 4096), min watermark of the allocation initiated on node 1 is 750 + 4096(classzone_idx 2) when allocation is tried on node 0 ZONE_NORMAL and issue would be gone. So, I think that it cannot be fixed by your alternative. > > One could wonder if there is a side effect that allocation initiated on > > node 1 will use higher bar when allocation is handled on node 1 since > > classzone_idx could be higher than before. It will not happen because > > the zone without managed page doesn't contributes lowmem_reserve at all= . > > > > Reported-by: Ye Xiaolong > > Tested-by: Ye Xiaolong > > Signed-off-by: Joonsoo Kim > > Curious: is this only an issue when vm.numa_zonelist_order is set to Node= ? Do you mean "/proc/sys/vm/numa_zonelist_order"? It looks like it's gone now= . Thanks.