Date: Tue, 8 May 2018 16:13:09 -0700
From: Andrew Morton
To: Mel Gorman
Cc: Vlastimil Babka, js1304@gmail.com, Michal Hocko, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Johannes Weiner, Minchan Kim, Ye Xiaolong, Joonsoo Kim, Andrea Arcangeli
Subject: Re: [PATCH] mm/page_alloc: use ac->high_zoneidx for classzone_idx
Message-Id: <20180508161309.f8ef0a4962b1721863902e60@linux-foundation.org>
In-Reply-To: <20180504103322.2nbadmnehwdxxaso@suse.de>
References: <1525408246-14768-1-git-send-email-iamjoonsoo.kim@lge.com> <8b06973c-ef82-17d2-a83d-454368de75e6@suse.cz> <20180504103322.2nbadmnehwdxxaso@suse.de>

On Fri, 4 May 2018 11:33:22 +0100 Mel Gorman wrote:

> On Fri,
> May 04, 2018 at 09:03:02AM +0200, Vlastimil Babka wrote:
> > > min watermark for NORMAL zone on node 0
> > > allocation initiated on node 0: 750 + 4096 = 4846
> > > allocation initiated on node 1: 750 + 0 = 750
> > >
> > > This watermark difference could cause too many numa_miss allocations
> > > in some situations, and performance could then be degraded.
> > >
> > > Recently, there was a regression report about this problem on CMA patches
> > > since CMA memory is placed in ZONE_MOVABLE by those patches. I checked
> > > that the problem disappears with this fix, which uses high_zoneidx
> > > for classzone_idx.
> > >
> > > http://lkml.kernel.org/r/20180102063528.GG30397@yexl-desktop
> > >
> > > Using high_zoneidx for classzone_idx is a more consistent approach than
> > > the previous one because the system's memory layout does not affect it.
> >
> > So to summarize:
> > - ac->high_zoneidx is computed via the arcane gfp_zone(gfp_mask) and
> >   represents the highest zone the allocation can use
>
> It's arcane but it was simply a fast-path calculation. A much older
> definition would be easier to understand but it was slower.
>
> > - classzone_idx was supposed to be the highest zone that the allocation
> >   can use, that is actually available in the system. Somehow that became
> >   the highest zone that is available on the preferred node (in the default
> >   node-order zonelist), which causes the watermark inconsistencies you
> >   mention.
> >
>
> I think it *always* was the index of the first preferred zone of a
> zonelist. The treatment of classzone has changed a lot over the years and
> I didn't do a historical check but the general intent was always "protect
> some pages in lower zones". This was particularly important for 32-bit
> and highmem albeit that is less of a concern today.
> When it transferred to NUMA, I don't think it ever was seriously considered
> if it should change as the critical node was likely to be node 0 with all
> the zones and the remote nodes all used the highest zone. CMA/MOVABLE
> changed that slightly by allowing the possibility of node0 having a
> "higher" zone than every other node. When MOVABLE was introduced, it wasn't
> much of a problem as the purpose of MOVABLE was for systems that dynamically
> needed to allocate hugetlbfs later in the runtime but for CMA, it was a lot
> more critical for ordinary usage so this is primarily a CMA thing.
>
> > I don't see a problem with your change. I would be worried about
> > inflated reserves when e.g. ZONE_MOVABLE doesn't exist, but that doesn't
> > seem to be the case. My laptop has empty ZONE_MOVABLE and the
> > ZONE_NORMAL protection for movable is 0.
> >
> > But there had to be some reason for classzone_idx to be like this and
> > not simple high_zoneidx. Maybe Mel remembers? Maybe it was important
> > then, but is not anymore? Sigh, it seems to be pre-git.
> >
>
> classzone predates my involvement with Linux but I would be less concerned
> about what the original intent was and instead ensure that classzone index
> is consistent, sane and potentially renamed while preserving the intent of
> "reserve pages in lower zones when an allocation request can use higher
> zones". While historically the critical intent was to preserve Normal and
> to a lesser extent DMA on 32-bit systems, there still should be some care
> of DMA32 so we should not lose that.
>
> With the patch, the allocator looks like it would be fine as just
> reservations change. I think it's unlikely that CMA usage will result
> in lowmem starvation. Compaction becomes a bit weird as classzone index
> has no special meaning versus highmem and I think it'll be very easy to
> forget. Similarly, vmscan can reclaim pages from remote nodes and zones
> that are higher than the original request.
> That is not likely to be a problem but it's a change in behaviour and easy
> to miss.
>
> Fundamentally, I find it extremely weird we now have two variables that are
> essentially the same thing. They should be collapsed into one variable,
> renamed and documented on what the index means for page allocator,
> compaction, vmscan and the special casing around CMA.

You're all so young ;)

classzone was Andrea. Perhaps he can shed some light upon the questions
which have been raised?
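
For readers following the numbers quoted at the top of the thread, the watermark arithmetic can be reproduced with a small model. This is only a sketch, not kernel code. It assumes an x86-64-like zone order without HIGHMEM, a per-zone reserve table for node 0's NORMAL zone whose values are picked purely to match the quoted sums (750 + 4096 and 750 + 0), and a layout where node 0 has a MOVABLE zone while node 1 tops out at NORMAL. The names `effective_min`, `classzone_before` and `classzone_after` are hypothetical helpers made up for this illustration.

```python
from enum import IntEnum


class Zone(IntEnum):
    """Zone indices in ascending order (assumed x86-64-like, no HIGHMEM)."""
    DMA = 0
    DMA32 = 1
    NORMAL = 2
    MOVABLE = 3


# Assumed figures for node 0's NORMAL zone, chosen only to reproduce the
# arithmetic quoted in the thread (750 + 4096 and 750 + 0).
MIN_WATERMARK = 750
LOWMEM_RESERVE = {Zone.DMA: 0, Zone.DMA32: 0, Zone.NORMAL: 0, Zone.MOVABLE: 4096}


def effective_min(classzone_idx: Zone) -> int:
    """Min watermark plus the reserve selected by classzone_idx."""
    return MIN_WATERMARK + LOWMEM_RESERVE[classzone_idx]


def classzone_before(first_preferred_zone: Zone, high_zoneidx: Zone) -> Zone:
    """Behaviour described in the thread: index of the first preferred zone."""
    return first_preferred_zone


def classzone_after(first_preferred_zone: Zone, high_zoneidx: Zone) -> Zone:
    """Proposed behaviour: use high_zoneidx, independent of memory layout."""
    return high_zoneidx


# A request that may use MOVABLE. Assumption: node 0 has a MOVABLE zone and
# node 1 does not, so the first preferred zone differs per initiating node.
HIGH_ZONEIDX = Zone.MOVABLE
FIRST_PREFERRED = {"node 0": Zone.MOVABLE, "node 1": Zone.NORMAL}

for node, preferred in FIRST_PREFERRED.items():
    before = effective_min(classzone_before(preferred, HIGH_ZONEIDX))
    after = effective_min(classzone_after(preferred, HIGH_ZONEIDX))
    print(f"initiated on {node}: before={before} after={after}")
```

Under these assumptions the model prints 4846 versus 750 for the two nodes before the change and 4846 for both afterwards, which is the inconsistency the patch description points at. It says nothing about the compaction or vmscan side effects raised above.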