Received: by 2002:a25:6193:0:0:0:0:0 with SMTP id v141csp3297287ybb; Sun, 22 Mar 2020 20:51:57 -0700 (PDT) X-Google-Smtp-Source: ADFU+vtm4ACiiQJGya7xqw3NcBxQweMTTnC9pz0XHkZHZgyAOQv0m6tyu5oTRLGuiC8C+wXAMD65 X-Received: by 2002:aca:4fc3:: with SMTP id d186mr15254186oib.171.1584935517478; Sun, 22 Mar 2020 20:51:57 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1584935517; cv=none; d=google.com; s=arc-20160816; b=mDAw5ZNmxhQVLQBo1Cvul2jIENp1Jq3txSyv8cBfHigQiAr0yfZ4n6Uaif5m6x7vW6 MCwmM918iAwMlewbupikq1Co9FhggvJEuBa6IKqBD+E7O+yJ9nmrpUh8iNcSQQBRVg74 65TR6d/9oGNRdaTpvcrJkkzG8j7c42CFo8iFfCXh3XYXXExNor0x645OnIR3tp7Hysy6 QMWIpoNs1p0gU9DmVuFAXSnlu0mAk6BQhe9jtVm+dZxj440gXGeM+5yX4TenEbLtk0Iw 2RYeB2iUcsMzA5naznBrH22hPteTxGtWHp2ERxXh0nV/DEKi8M4af+aaS1Z8nEMPV3Jw GsVA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:cc:to:subject :message-id:date:from:in-reply-to:references:mime-version :dkim-signature; bh=6INYt4gGhpcoFn705pKJj9o8HPNDr8E2DVtk1eeKAJQ=; b=OwNZCb3xgkq8XUB99b/rBWNbVr6mj54Z4iNPXAcJnSJp0uCZ5Me75l6BUNc3Sy4FDQ wuZrcoeBt79wRSTJ07vbptBkgIvnou9IU71ad3Dx/TcA8IqfccV1whirt1SeCHO3/Bd4 DeiQzOytfvtdRRFcak7AUPIUJLczWaz5yVfnkELmTNURJBeRyIlYnM5k5h628nTAg9u4 DrlVQzKw5CdjqTq4y3gyougs/QKYu5NDT9fdsDRYBwqrrOSukS+iUxz63LAC8dzG4ksQ tA4l0KPPTo8KUz4bZUmbdlMeiiLxe2OZ/+J+rzyNTa7KDrVBhmTWJmDl3jKLKbqFH01s XrEQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=PIg5w6jz; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id g19si7499232otn.219.2020.03.22.20.51.44; Sun, 22 Mar 2020 20:51:57 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=PIg5w6jz; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727102AbgCWDuv (ORCPT + 99 others); Sun, 22 Mar 2020 23:50:51 -0400 Received: from mail-qk1-f196.google.com ([209.85.222.196]:36723 "EHLO mail-qk1-f196.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726983AbgCWDuv (ORCPT ); Sun, 22 Mar 2020 23:50:51 -0400 Received: by mail-qk1-f196.google.com with SMTP id d11so13979846qko.3 for ; Sun, 22 Mar 2020 20:50:49 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc:content-transfer-encoding; bh=6INYt4gGhpcoFn705pKJj9o8HPNDr8E2DVtk1eeKAJQ=; b=PIg5w6jzjqlwC2zGBgV01xRGV1zVFSK8wau6VhPDQT6HSNq653JjVBe9KQ/1QBIGus n+mN86UBBgu5RbxQ7DEI+3z9l3W8auqXtQct8bXnOrPZkXjdpqOlWePJPH69v65KVgod EuUX99twNQpzFignvuSb6NMjRxULtYAQghdmXRZXXh+eL5DP0/F+qIjEXKuy+0eBgWzD d3tRWwEzCIa4OHQVBpXsZLv8TXdq2Gr5ayy0YOo6TGhas0sFpNwymcK7nXN9mDuJz897 lcyYFoUO7DYMzPqbWT4kAY92SEnOtQ4sLPnjwG3uhvGqJ4wbKjbubGh0owbQLR1GlxNH g3Mg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc:content-transfer-encoding; bh=6INYt4gGhpcoFn705pKJj9o8HPNDr8E2DVtk1eeKAJQ=; b=kv9wZTTs6xtC7UfVXUCzZE/flUWdiU3LqObRaWLEUPBUVpCOqss9T4X/9vryJMEJ66 7Oe0eUa6dbGpa/JEdoKIubMp+vemHqyqpdcqorjWOQTQhYIYJr8Te76Vvsz2zijdOUTd P3VeT0M4gEh5liLlvL86swhEXcPu6oX9wDLRFkdauw0wYPkeqgw6p5a3sVIYBnQ/DAiK 9sRKxvURjcPKYM8M4dZGytEWHadBpidL34xlAZ3rYil0BDV6gHvegIEVBI9UkGiPqv6T hyWvVDKIxVy4AWJwOVH3Pw86ptPpe8D/WVAEh+AvLR9tM8MXRjZ5Y6TV9T3CYAtXNsLr 8UEA== X-Gm-Message-State: ANhLgQ2vzPXXphLgHVyCbLSt1wdNkxtrBrwQcNPDPuv8Z6s6nUTmGrV/ HNtF/6CKb0PrSFGCkeLc4oalY8TT5AxZaKWj/Kg= X-Received: by 2002:a37:b4c1:: with SMTP id d184mr19779707qkf.452.1584935449102; Sun, 22 Mar 2020 20:50:49 -0700 (PDT) MIME-Version: 1.0 References: <1584693135-4396-1-git-send-email-iamjoonsoo.kim@lge.com> <1584693135-4396-2-git-send-email-iamjoonsoo.kim@lge.com> <20200320103000.GB3039@MiWiFi-R3L-srv> In-Reply-To: <20200320103000.GB3039@MiWiFi-R3L-srv> From: Joonsoo Kim Date: Mon, 23 Mar 2020 12:50:38 +0900 Message-ID: Subject: Re: [PATCH v3 1/2] mm/page_alloc: use ac->high_zoneidx for classzone_idx To: Baoquan He Cc: Andrew Morton , Linux Memory Management List , LKML , Johannes Weiner , Michal Hocko , Minchan Kim , Vlastimil Babka , Mel Gorman , kernel-team@lge.com, Ye Xiaolong , David Rientjes , Joonsoo Kim Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hello, Baoquan. 2020=EB=85=84 3=EC=9B=94 20=EC=9D=BC (=EA=B8=88) =EC=98=A4=ED=9B=84 7:30, B= aoquan He =EB=8B=98=EC=9D=B4 =EC=9E=91=EC=84=B1: > > > On 03/20/20 at 05:32pm, js1304@gmail.com wrote: > > From: Joonsoo Kim > > > > Currently, we use the zone index of preferred_zone which represents > > the best matching zone for allocation, as classzone_idx. It has > > a problem on NUMA systems when the lowmem reserve protection exists > > for some zones on a node that do not exist on other nodes. > > > > In NUMA system, it can be possible that each node has different populat= ed > > zones. For example, node 0 could have DMA/DMA32/NORMAL/MOVABLE zone and > > node 1 could have only NORMAL zone. In this setup, allocation request > > initiated on node 0 and the one on node 1 would have different > > classzone_idx, 3 and 2, respectively, since their preferred_zones are > > different. If the allocation is local, there is no problem. However, > > if it is handled by the remote node due to memory shortage, the problem > > would happen. > > Hi Joonsoo, > > Not sure if adding one sentence into above paragraph would be make it > easier to understand. Assume you are only talking about the high_zoneidx > is MOVABLE_ZONE with calculation of gfp_zone(gfp_mask), since any other > case doesn't have this problem. Please correct me if I am wrong. You're right. This example is for the allocation request with gfp_zone(gfp_mask), MOVABLE_ZONE. > In NUMA system, it can be possible that each node has different populated > zones. For example, node 0 could have DMA/DMA32/NORMAL/MOVABLE zone and > node 1 could have only NORMAL zone. In this setup, if we get high_zoneidx > as 3 (namely MOVABLE zone), with gfp_zone(gfp_mask), allocation request > initiated on node 0 and the one on node 1 would have different > classzone_idx, 3 and 2, respectively, since their preferred_zones are > different. If the allocation is local, there is no problem. However, > if it is handled by the remote node due to memory shortage, the problem > would happen. I'm okay with your change but I try again to be better. Please check the following rewritten commit message and please let me know if it is better than before. Thanks. ------------------------>8------------------------------- From 3842d99d54c02c25cc6f6077fd0e97acabc10917 Mon Sep 17 00:00:00 2001 From: Joonsoo Kim Date: Fri, 4 May 2018 09:55:21 +0900 Subject: [PATCH] mm/page_alloc: use ac->high_zoneidx for classzone_idx Currently, we use classzone_idx to calculate lowmem reserve proetection for an allocation request. This classzone_idx causes a problem on NUMA systems when the lowmem reserve protection exists for some zones on a node that do not exist on other nodes. Before further explanation, I should first clarify how to compute the classzone_idx and the high_zoneidx. - ac->high_zoneidx is computed via the arcane gfp_zone(gfp_mask) and represents the index of the highest zone the allocation can use - classzone_idx was supposed to be the index of the highest zone on the local node that the allocation can use, that is actually available in the system Think about following example. Node 0 has 4 populated zone, DMA/DMA32/NORMAL/MOVABLE. Node 1 has 1 populated zone, NORMAL. Some zones, such as MOVABLE, doesn't exist on node 1 and this makes following difference. Assume that there is an allocation request whose gfp_zone(gfp_mask) is the zone, MOVABLE. Then, it's high_zoneidx is 3. If this allocation is initiated on node 0, it's classzone_idx is 3 since actually available/usable zone on local (node 0) is MOVABLE. If this allocation is initiated on node 1, it's classzone_idx is 2 since actually available/usable zone on local (node 1) is NORMAL. You can see that classzone_idx of the allocation request are different according to their starting node, even if their high_zoneidx is the same. Think more about these two allocation requests. If they are processed on local, there is no problem. However, if allocation is initiated on node 1 are processed on remote, in this example, at the NORMAL zone on node 0, due to memory shortage, problem occurs. Their different classzone_idx leads to different lowmem reserve and then different min watermark. See the following example. root@ubuntu:/sys/devices/system/memory# cat /proc/zoneinfo Node 0, zone DMA per-node stats ... pages free 3965 min 5 low 8 high 11 spanned 4095 present 3998 managed 3977 protection: (0, 2961, 4928, 5440) ... Node 0, zone DMA32 pages free 757955 min 1129 low 1887 high 2645 spanned 1044480 present 782303 managed 758116 protection: (0, 0, 1967, 2479) ... Node 0, zone Normal pages free 459806 min 750 low 1253 high 1756 spanned 524288 present 524288 managed 503620 protection: (0, 0, 0, 4096) ... Node 0, zone Movable pages free 130759 min 195 low 326 high 457 spanned 1966079 present 131072 managed 131072 protection: (0, 0, 0, 0) ... Node 1, zone DMA pages free 0 min 0 low 0 high 0 spanned 0 present 0 managed 0 protection: (0, 0, 1006, 1006) Node 1, zone DMA32 pages free 0 min 0 low 0 high 0 spanned 0 present 0 managed 0 protection: (0, 0, 1006, 1006) Node 1, zone Normal per-node stats ... pages free 233277 min 383 low 640 high 897 spanned 262144 present 262144 managed 257744 protection: (0, 0, 0, 0) ... Node 1, zone Movable pages free 0 min 0 low 0 high 0 spanned 262144 present 0 managed 0 protection: (0, 0, 0, 0) - static min watermark for the NORMAL zone on node 0 is 750. - lowmem reserve for the request with classzone idx 3 at the NORMAL on node 0 is 4096. - lowmem reserve for the request with classzone idx 2 at the NORMAL on node 0 is 0. So, overall min watermark is: allocation initiated on node 0 (classzone_idx 3): 750 + 4096 =3D 4846 allocation initiated on node 1 (classzone_idx 2): 750 + 0 =3D 750 allocation initiated on node 1 will have some precedence than allocation initiated on node 0 because min watermark of the former allocation is lower than the other. So, allocation initiated on node 1 could succeed on node 0 when allocation initiated on node 0 could not, and, this could cause too many numa_miss allocation. Then, performance could be downgraded. Recently, there was a regression report about this problem on CMA patches since CMA memory are placed in ZONE_MOVABLE by those patches. I checked that problem is disappeared with this fix that uses high_zoneidx for classzone_idx. http://lkml.kernel.org/r/20180102063528.GG30397@yexl-desktop Using high_zoneidx for classzone_idx is more consistent way than previous approach because system's memory layout doesn't affect anything to it. With this patch, both classzone_idx on above example will be 3 so will have the same min watermark. allocation initiated on node 0: 750 + 4096 =3D 4846 allocation initiated on node 1: 750 + 4096 =3D 4846 One could wonder if there is a side effect that allocation initiated on node 1 will use higher bar when allocation is handled on local since classzone_idx could be higher than before. It will not happen because the zone without managed page doesn't contributes lowmem_reserve at all. Reported-by: Ye Xiaolong Tested-by: Ye Xiaolong Acked-by: Vlastimil Babka Signed-off-by: Joonsoo Kim --- mm/internal.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mm/internal.h b/mm/internal.h index c39c895..aebaa33 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -119,7 +119,7 @@ struct alloc_context { bool spread_dirty_pages; }; -#define ac_classzone_idx(ac) zonelist_zone_idx(ac->preferred_zoneref) +#define ac_classzone_idx(ac) (ac->high_zoneidx) /* * Locate the struct page for both the matching buddy in our --=20 2.7.4