Subject: Re: [PATCH] mm/page_alloc: use ac->high_zoneidx for classzone_idx
To: js1304@gmail.com, Andrew Morton
Cc: Mel Gorman, Michal Hocko, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org, Johannes Weiner, Minchan Kim,
    Ye Xiaolong, Joonsoo Kim
References: <1525408246-14768-1-git-send-email-iamjoonsoo.kim@lge.com>
From: Vlastimil Babka
Message-ID: <8b06973c-ef82-17d2-a83d-454368de75e6@suse.cz>
Date: Fri, 4 May 2018 09:03:02 +0200
In-Reply-To: <1525408246-14768-1-git-send-email-iamjoonsoo.kim@lge.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 05/04/2018 06:30 AM, js1304@gmail.com wrote:
> From: Joonsoo Kim
>
> Currently, we use the zone index
> of preferred_zone, which represents
> the best matching zone for allocation, as classzone_idx. It has a problem
> on NUMA systems with ZONE_MOVABLE.
>
> In a NUMA system, it is possible that each node has different populated
> zones. For example, node 0 could have DMA/DMA32/NORMAL/MOVABLE zones and
> node 1 could have only a NORMAL zone. In this setup, an allocation request
> initiated on node 0 and one initiated on node 1 would have different
> classzone_idx, 3 and 2, respectively, since their preferred_zones are
> different. If they are handled only by their own node, there is no problem.
> However, if they are sometimes handled by the remote node, the problem
> would happen.
>
> In the following setup, an allocation initiated on node 1 will take
> precedence over an allocation initiated on node 0 when the former is
> processed on node 0 due to not enough memory on node 1. They will have
> different lowmem reserves due to their different classzone_idx, thus
> their watermark bars are also different.
> ...
>
> min watermark for NORMAL zone on node 0
> allocation initiated on node 0: 750 + 4096 = 4846
> allocation initiated on node 1: 750 + 0 = 750
>
> This watermark difference could cause too many numa_miss allocations
> in some situations, and then performance could be degraded.
>
> Recently, there was a regression report about this problem on CMA patches
> since CMA memory is placed in ZONE_MOVABLE by those patches. I checked
> that the problem disappears with this fix that uses high_zoneidx
> for classzone_idx.
>
> http://lkml.kernel.org/r/20180102063528.GG30397@yexl-desktop
>
> Using high_zoneidx for classzone_idx is a more consistent approach than the
> previous one because the system's memory layout does not affect it.

So to summarize;
- ac->high_zoneidx is computed via the arcane gfp_zone(gfp_mask) and
represents the highest zone the allocation can use
- classzone_idx was supposed to be the highest zone that the allocation
can use, that is actually available in the system. Somehow that became
the highest zone that is available on the preferred node (in the default
node-order zonelist), which causes the watermark inconsistencies you
mention.

I don't see a problem with your change. I would be worried about
inflated reserves when e.g. ZONE_MOVABLE doesn't exist, but that doesn't
seem to be the case. My laptop has empty ZONE_MOVABLE and the
ZONE_NORMAL protection for movable is 0.

But there had to be some reason for classzone_idx to be like this and
not simple high_zoneidx. Maybe Mel remembers? Maybe it was important
then, but is not anymore? Sigh, it seems to be pre-git.

> With this patch, both classzone_idx on above example will be 3 so will
> have the same min watermark.
>
> allocation initiated on node 0: 750 + 4096 = 4846
> allocation initiated on node 1: 750 + 4096 = 4846
>
> Reported-by: Ye Xiaolong
> Tested-by: Ye Xiaolong
> Signed-off-by: Joonsoo Kim
> ---
>  mm/internal.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/internal.h b/mm/internal.h
> index 228dd66..e1d7376 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -123,7 +123,7 @@ struct alloc_context {
>  	bool spread_dirty_pages;
>  };
>
> -#define ac_classzone_idx(ac) zonelist_zone_idx(ac->preferred_zoneref)
> +#define ac_classzone_idx(ac) (ac->high_zoneidx)
>
>  /*
>   * Locate the struct page for both the matching buddy in our
>
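
For readers following the arithmetic, here is a minimal sketch in C of how
the two bars in the quoted example come about. It is not kernel code:
struct toy_zone, effective_min() and node0_normal are hypothetical names,
the zone numbering (NORMAL = 2, MOVABLE = 3) is inferred from the
classzone_idx values in the patch description, and only the figures 750, 0
and 4096 come from the mail. Reserve entries the mail does not mention are
left at zero as placeholders.

```c
#include <assert.h>

/* Zone numbering assumed from the mail's example: NORMAL = 2, MOVABLE = 3. */
enum zone_idx { ZONE_DMA, ZONE_DMA32, ZONE_NORMAL, ZONE_MOVABLE, NR_ZONES };

/* Hypothetical, heavily simplified stand-in for a zone's watermark data. */
struct toy_zone {
	long min_wmark;                /* min watermark, in pages */
	long lowmem_reserve[NR_ZONES]; /* indexed by the request's classzone_idx */
};

/*
 * Bar a request has to clear in this zone: the min watermark plus the
 * lowmem reserve selected by the request's classzone_idx.
 */
static long effective_min(const struct toy_zone *z, int classzone_idx)
{
	assert(classzone_idx >= 0 && classzone_idx < NR_ZONES);
	return z->min_wmark + z->lowmem_reserve[classzone_idx];
}

/*
 * Node 0's NORMAL zone with the figures quoted in the mail. Entries for
 * DMA and DMA32 are not given there and stay zero (placeholder values).
 */
static const struct toy_zone node0_normal = {
	.min_wmark = 750,
	.lowmem_reserve = { [ZONE_NORMAL] = 0, [ZONE_MOVABLE] = 4096 },
};
```

Under these assumptions, effective_min(&node0_normal, ZONE_NORMAL) gives the
750 bar that a node 1 request saw before the patch, and
effective_min(&node0_normal, ZONE_MOVABLE) gives the 4846 bar that both
requests in the example see once classzone_idx is ac->high_zoneidx.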