Received: by 2002:a25:e7d8:0:0:0:0:0 with SMTP id e207csp169662ybh; Tue, 17 Mar 2020 20:34:10 -0700 (PDT) X-Google-Smtp-Source: ADFU+vsIM4rHeeUqDsmdYjR0EcvE3TXJsXZzk4oLFpuW3LfQS00s8lX8LAxBzyFQ38qdKk7dHexR X-Received: by 2002:a05:6830:2014:: with SMTP id e20mr2185836otp.317.1584502449969; Tue, 17 Mar 2020 20:34:09 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1584502449; cv=none; d=google.com; s=arc-20160816; b=fz3wof3Fecp3Jdj8TOSxEjO2egz70C+Pe9DG7OyfbCcAvTje2gll3iXvB/Ut3N3DIV vpO1zPxDTscptCcRMcGKIUeqR4SJorwGq5O3gwZpuDQ734I0WW57xapF++9poThdY/Cf 2YMglsw7uA969Dpgh5AwQUNLy7Uk+5/OEF3cMpYEgz0AgADYvyG5dQCU5jBcuAYtsyDb QBfox61amnSp2oq+iq4Vpv3PiXxit5659bhrCekEH4qTWlYUVPozlSwF2mfex9ZX1Nk8 tiv3oK0NH9tZ3kOuJlqeUbq33ot/eQBgSfMwuCZ6qaWEI/JNjmdmCRPwTdrVs1ZjHUKV 7O9A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:references:in-reply-to:message-id:date :subject:cc:to:from:dkim-signature; bh=b6do2FdM++ZNF8atrZaZPrVTraaxJfH9p4SJ3Cchx9M=; b=hhX+ouoCG6H3Jf46yxbPhll8tx7qmfA2nYoimvVHRPO7PcS1I1derF0RydryLGYv3q 4qs9pevvK0dugWF0FIkVK7bhkzLj4EOT7K9LEcpn3DQhmVmMsZxD/6rNwmShE9JF58Xv iAikgxKJKckzpyWC4qXiwX487UXbevTnW53q7oO6Z5VMHskeqo1Ycz6/F6MwqaoARs/D yZhquK36Zb6Qhh0rxSGvd5AW3ItHKmhAo21Ah4omLARwdyQ5IIiSjj5jyT/pl8CXgE+k 2RSqZT2uUlHITjVVYcMHz9yHG4Uw9J6qp6757eSvXTqWPNafc6o7Ob5skcaqQSSHVJDc bZHw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=bNnUA5nj; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id b204si2617548oif.69.2020.03.17.20.33.57; Tue, 17 Mar 2020 20:34:09 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=bNnUA5nj; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726869AbgCRDdd (ORCPT + 99 others); Tue, 17 Mar 2020 23:33:33 -0400 Received: from mail-pf1-f196.google.com ([209.85.210.196]:38857 "EHLO mail-pf1-f196.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726229AbgCRDdc (ORCPT ); Tue, 17 Mar 2020 23:33:32 -0400 Received: by mail-pf1-f196.google.com with SMTP id z5so13120647pfn.5 for ; Tue, 17 Mar 2020 20:33:30 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=b6do2FdM++ZNF8atrZaZPrVTraaxJfH9p4SJ3Cchx9M=; b=bNnUA5njuPgjKCIrC9rnivOwgdkNOnKofd4X93noFXuJftyr3thd65QJ+M83ojvcvY t7m4G3TkleHv5rQNJ6cDoA5Uc+2T64eOYiKfF4XtUOd0Goj9OjsVCh1ZH0zzqB2c6bPr raDe4tTZnDh3mIr/cLWvmIo0JLYVEK5uhZqg60MnYwxZH4H2cWC2eRTFXLVkdvxBwlC8 m2WT2v99W1/ovHvxDEHNG20Qz3NdO0Ki0aJ9v0pWoQxd+ZnUFmyaCXQD2gXRjYcft8b/ /uVcQC96CLnaHEbKx+pCd0S7XuvBhCweZP53nF0Shubh5LS8BfSZtBOY/gCL9fvuRCdo wnFg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=b6do2FdM++ZNF8atrZaZPrVTraaxJfH9p4SJ3Cchx9M=; b=K+V5tSFmR2NiBPcUvqNsFppZcuGJZ0hq/UmWprZgDUSbekCoB/8eTS2Ek8nsolAxnx D0eLjMDTbB5En3nS/eYg1gltERCf2teIHXXR4JwkzX21Mq6RWg0FCw5aBXyffsnHdZwk sTWzJNjKmOt5jrwt2DrF73dg4LK+PScVk1TziFmL5r/nLX/AsQ5LgXSoecoItViB99JG XV8qbf09eMGwyMgpEL6RM+39Iin3DO8JQ3ju+XBAXkaEZXE++CnafvORTDV52q1ZInsV HiJQLnlTo1i6yS4OGN+b9TsRUQQbyCgvtpPxvE7FtWtUDLGFDfrG7w7PCuoh0CrK0oeT MZAg== X-Gm-Message-State: ANhLgQ2FhcWbkArHNXfg68ciBbhJMJ7S3Ij0M4litCI5jzt15MUjQqRE OXBTiTxw0y07734eTLsoTXo= X-Received: by 2002:a62:2684:: with SMTP id m126mr2019555pfm.153.1584502409939; Tue, 17 Mar 2020 20:33:29 -0700 (PDT) Received: from localhost.localdomain ([114.206.198.176]) by smtp.gmail.com with ESMTPSA id ep7sm679641pjb.24.2020.03.17.20.33.26 (version=TLS1_2 cipher=ECDHE-RSA-AES128-SHA bits=128/128); Tue, 17 Mar 2020 20:33:29 -0700 (PDT) From: js1304@gmail.com X-Google-Original-From: iamjoonsoo.kim@lge.com To: Andrew Morton Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Johannes Weiner , Michal Hocko , Minchan Kim , Vlastimil Babka , Mel Gorman , kernel-team@lge.com, Ye Xiaolong , Joonsoo Kim Subject: [PATCH v2 1/2] mm/page_alloc: use ac->high_zoneidx for classzone_idx Date: Wed, 18 Mar 2020 12:32:57 +0900 Message-Id: <1584502378-12609-2-git-send-email-iamjoonsoo.kim@lge.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1584502378-12609-1-git-send-email-iamjoonsoo.kim@lge.com> References: <1584502378-12609-1-git-send-email-iamjoonsoo.kim@lge.com> Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: Joonsoo Kim Currently, we use the zone index of preferred_zone which represents the best matching zone for allocation, as classzone_idx. It has a problem on NUMA system with ZONE_MOVABLE. In NUMA system, it can be possible that each node has different populated zones. For example, node 0 could have DMA/DMA32/NORMAL/MOVABLE zone and node 1 could have only NORMAL zone. In this setup, allocation request initiated on node 0 and the one on node 1 would have different classzone_idx, 3 and 2, respectively, since their preferred_zones are different. If they are handled by only their own node, there is no problem. However, if they are somtimes handled by the remote node due to memory shortage, the problem would happen. In the following setup, allocation initiated on node 1 will have some precedence than allocation initiated on node 0 when former allocation is processed on node 0 due to not enough memory on node 1. They will have different lowmem reserve due to their different classzone_idx thus an watermark bars are also different. root@ubuntu:/sys/devices/system/memory# cat /proc/zoneinfo Node 0, zone DMA per-node stats ... pages free 3965 min 5 low 8 high 11 spanned 4095 present 3998 managed 3977 protection: (0, 2961, 4928, 5440) ... Node 0, zone DMA32 pages free 757955 min 1129 low 1887 high 2645 spanned 1044480 present 782303 managed 758116 protection: (0, 0, 1967, 2479) ... Node 0, zone Normal pages free 459806 min 750 low 1253 high 1756 spanned 524288 present 524288 managed 503620 protection: (0, 0, 0, 4096) ... Node 0, zone Movable pages free 130759 min 195 low 326 high 457 spanned 1966079 present 131072 managed 131072 protection: (0, 0, 0, 0) ... Node 1, zone DMA pages free 0 min 0 low 0 high 0 spanned 0 present 0 managed 0 protection: (0, 0, 1006, 1006) Node 1, zone DMA32 pages free 0 min 0 low 0 high 0 spanned 0 present 0 managed 0 protection: (0, 0, 1006, 1006) Node 1, zone Normal per-node stats ... pages free 233277 min 383 low 640 high 897 spanned 262144 present 262144 managed 257744 protection: (0, 0, 0, 0) ... Node 1, zone Movable pages free 0 min 0 low 0 high 0 spanned 262144 present 0 managed 0 protection: (0, 0, 0, 0) min watermark for NORMAL zone on node 0 allocation initiated on node 0: 750 + 4096 = 4846 allocation initiated on node 1: 750 + 0 = 750 This watermark difference could cause too many numa_miss allocation in some situation and then performance could be downgraded. Recently, there was a regression report about this problem on CMA patches since CMA memory are placed in ZONE_MOVABLE by those patches. I checked that problem is disappeared with this fix that uses high_zoneidx for classzone_idx. http://lkml.kernel.org/r/20180102063528.GG30397@yexl-desktop Using high_zoneidx for classzone_idx is more consistent way than previous approach because system's memory layout doesn't affect anything to it. With this patch, both classzone_idx on above example will be 3 so will have the same min watermark. allocation initiated on node 0: 750 + 4096 = 4846 allocation initiated on node 1: 750 + 4096 = 4846 One could wonder if there is a side effect that allocation initiated on node 1 will use higher bar when allocation is handled on node 1 since classzone_idx could be higher than before. It will not happen because the zone without managed page doesn't contributes lowmem_reserve at all. Reported-by: Ye Xiaolong Tested-by: Ye Xiaolong Signed-off-by: Joonsoo Kim --- mm/internal.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mm/internal.h b/mm/internal.h index c39c895..aebaa33 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -119,7 +119,7 @@ struct alloc_context { bool spread_dirty_pages; }; -#define ac_classzone_idx(ac) zonelist_zone_idx(ac->preferred_zoneref) +#define ac_classzone_idx(ac) (ac->high_zoneidx) /* * Locate the struct page for both the matching buddy in our -- 2.7.4