Date: Fri, 4 May 2018 11:33:22 +0100
From: Mel Gorman
To: Vlastimil Babka
Cc: js1304@gmail.com, Andrew Morton, Michal Hocko, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org, Johannes Weiner, Minchan Kim, Ye Xiaolong,
    Joonsoo Kim
Subject: Re: [PATCH] mm/page_alloc: use ac->high_zoneidx for classzone_idx
Message-ID: <20180504103322.2nbadmnehwdxxaso@suse.de>
References: <1525408246-14768-1-git-send-email-iamjoonsoo.kim@lge.com>
    <8b06973c-ef82-17d2-a83d-454368de75e6@suse.cz>
In-Reply-To: <8b06973c-ef82-17d2-a83d-454368de75e6@suse.cz>

On Fri, May 04, 2018 at 09:03:02AM +0200, Vlastimil Babka wrote:
> > min watermark for NORMAL zone on node 0
> > allocation initiated on node 0: 750 + 4096 = 4846
> > allocation initiated on node 1: 750 + 0 = 750
> >
> > This watermark difference could cause too many numa_miss allocation
> > in some situation and then performance could be downgraded.
> >
> > Recently, there was a regression report about this problem on CMA patches
> > since CMA memory are placed in ZONE_MOVABLE by those patches. I checked
> > that problem is disappeared with this fix that uses high_zoneidx
> > for classzone_idx.
> >
> > http://lkml.kernel.org/r/20180102063528.GG30397@yexl-desktop
> >
> > Using high_zoneidx for classzone_idx is more consistent way than previous
> > approach because system's memory layout doesn't affect anything to it.
>
> So to summarize;
> - ac->high_zoneidx is computed via the arcane gfp_zone(gfp_mask) and
>   represents the highest zone the allocation can use

It's arcane but it was simply a fast-path calculation. A much older
definition would be easier to understand but it was slower.

> - classzone_idx was supposed to be the highest zone that the allocation
>   can use, that is actually available in the system. Somehow that became
>   the highest zone that is available on the preferred node (in the default
>   node-order zonelist), which causes the watermark inconsistencies you
>   mention.
>

I think it *always* was the index of the first preferred zone of a
zonelist. The treatment of classzone has changed a lot over the years and
I didn't do a historical check but the general intent was always "protect
some pages in lower zones". This was particularly important for 32-bit and
highmem albeit that is less of a concern today. When it transferred to
NUMA, I don't think it ever was seriously considered if it should change
as the critical node was likely to be node 0 with all the zones and the
remote nodes all used the highest zone. CMA/MOVABLE changed that slightly
by allowing the possibility of node0 having a "higher" zone than every
other node.
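
As a rough illustration of the quoted arithmetic, here is a minimal Python
sketch rather than kernel code. Assumptions made for the example: node 0
has the DMA, DMA32, NORMAL and MOVABLE zones while node 1 has only NORMAL,
the request is one that may use MOVABLE, and "first preferred zone" is
modelled as the highest zone on the initiating node that the request may
use. Every name below is hypothetical; only the numbers 750 and 4096 come
from the quoted example.

```python
from enum import IntEnum


class Zone(IntEnum):
    DMA = 0
    DMA32 = 1
    NORMAL = 2
    MOVABLE = 3


# Assumed layout: node 0 has every zone, node 1 only has NORMAL.
NODE0_ZONES = {Zone.DMA, Zone.DMA32, Zone.NORMAL, Zone.MOVABLE}
NODE1_ZONES = {Zone.NORMAL}

# Node 0's NORMAL zone, using the numbers from the quoted example:
# min watermark of 750 pages, 4096 pages reserved against requests
# whose class zone is MOVABLE. The other entries are placeholders.
NODE0_NORMAL_MIN = 750
NODE0_NORMAL_RESERVE = [0, 0, 0, 4096]  # indexed by Zone


def effective_min(min_wmark, lowmem_reserve, classzone_idx):
    """Pages that must stay free: min watermark plus the class-zone reserve."""
    return min_wmark + lowmem_reserve[classzone_idx]


def first_preferred_zone(node_zones, high_zoneidx):
    """Simplified old behaviour: highest usable zone on the initiating node.

    Assumes the node has at least one zone at or below high_zoneidx.
    """
    return max(z for z in node_zones if z <= high_zoneidx)


high_zoneidx = Zone.MOVABLE  # the request may use MOVABLE

# Old behaviour: classzone_idx depends on which node initiates the request.
old_node0 = effective_min(NODE0_NORMAL_MIN, NODE0_NORMAL_RESERVE,
                          first_preferred_zone(NODE0_ZONES, high_zoneidx))
old_node1 = effective_min(NODE0_NORMAL_MIN, NODE0_NORMAL_RESERVE,
                          first_preferred_zone(NODE1_ZONES, high_zoneidx))

# Patched behaviour: classzone_idx is high_zoneidx regardless of the node.
patched = effective_min(NODE0_NORMAL_MIN, NODE0_NORMAL_RESERVE, high_zoneidx)

print(old_node0, old_node1, patched)
```

Under these assumptions the old scheme gives 4846 pages for a request
initiated on node 0 and 750 for one initiated on node 1, both checked
against the same NORMAL zone on node 0, while using high_zoneidx gives
4846 in both cases.
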
When MOVABLE was introduced, it wasn't much of a problem as the purpose of
MOVABLE was for systems that dynamically needed to allocate hugetlbfs
later in the runtime but for CMA, it was a lot more critical for ordinary
usage so this is primarily a CMA thing.

> I don't see a problem with your change. I would be worried about
> inflated reserves when e.g. ZONE_MOVABLE doesn't exist, but that doesn't
> seem to be the case. My laptop has empty ZONE_MOVABLE and the
> ZONE_NORMAL protection for movable is 0.
>
> But there had to be some reason for classzone_idx to be like this and
> not simple high_zoneidx. Maybe Mel remembers? Maybe it was important
> then, but is not anymore? Sigh, it seems to be pre-git.
>

classzone predates my involvement with Linux but I would be less concerned
about what the original intent was and instead ensure that classzone index
is consistent, sane and potentially renamed while preserving the intent of
"reserve pages in lower zones when an allocation request can use higher
zones". While historically the critical intent was to preserve Normal and
to a lesser extent DMA on 32-bit systems, there still should be some care
of DMA32 so we should not lose that.

With the patch, the allocator looks like it would be fine as just
reservations change. I think it's unlikely that CMA usage will result in
lowmem starvation. Compaction becomes a bit weird as classzone index has
no special meaning versus highmem and I think it'll be very easy to
forget. Similarly, vmscan can reclaim pages from remote nodes and zones
that are higher than the original request. That is not likely to be a
problem but it's a change in behaviour and easy to miss.

Fundamentally, I find it extremely weird we now have two variables that
are essentially the same thing. They should be collapsed into one
variable, renamed and documented on what the index means for page
allocator, compaction, vmscan and the special casing around CMA.

-- 
Mel Gorman
SUSE Labs