From: Joonsoo Kim
Date: Tue, 8 May 2018 10:00:59 +0900
Subject: Re: [PATCH] mm/page_alloc: use ac->high_zoneidx for classzone_idx
To: Mel Gorman
Cc: Vlastimil Babka, Andrew Morton, Michal Hocko, Linux Memory Management List, LKML, Johannes Weiner, Minchan Kim, Ye Xiaolong, Joonsoo Kim
In-Reply-To: <20180504103322.2nbadmnehwdxxaso@suse.de>
References: <1525408246-14768-1-git-send-email-iamjoonsoo.kim@lge.com> <8b06973c-ef82-17d2-a83d-454368de75e6@suse.cz> <20180504103322.2nbadmnehwdxxaso@suse.de>
X-Mailing-List: linux-kernel@vger.kernel.org

Hello, Mel.

Thanks for the valuable input!

2018-05-04 19:33 GMT+09:00 Mel Gorman:
> On Fri, May 04, 2018 at 09:03:02AM +0200, Vlastimil Babka wrote:
>> > min watermark for NORMAL zone on node 0
>> > allocation initiated on node 0: 750 + 4096 = 4846
>> > allocation initiated on node 1: 750 + 0 = 750
>> >
>> > This watermark difference could cause too many numa_miss allocations
>> > in some situations, and performance could then degrade.
>> >
>> > Recently, there was a regression report about this problem on the CMA
>> > patches, since those patches place CMA memory in ZONE_MOVABLE. I checked
>> > that the problem disappears with this fix, which uses high_zoneidx
>> > for classzone_idx.
>> >
>> > http://lkml.kernel.org/r/20180102063528.GG30397@yexl-desktop
>> >
>> > Using high_zoneidx for classzone_idx is a more consistent approach than
>> > the previous one because the system's memory layout doesn't affect it.
>>
>> So to summarize:
>> - ac->high_zoneidx is computed via the arcane gfp_zone(gfp_mask) and
>> represents the highest zone the allocation can use
>
> It's arcane but it was simply a fast-path calculation. A much older
> definition would be easier to understand but it was slower.
>
>> - classzone_idx was supposed to be the highest zone that the allocation
>> can use, that is actually available in the system. Somehow that became
>> the highest zone that is available on the preferred node (in the default
>> node-order zonelist), which causes the watermark inconsistencies you
>> mention.
>>
>
> I think it *always* was the index of the first preferred zone of a
> zonelist. The treatment of classzone has changed a lot over the years and
> I didn't do a historical check but the general intent was always "protect
> some pages in lower zones". This was particularly important for 32-bit
> and highmem albeit that is less of a concern today. When it transferred to
> NUMA, I don't think it ever was seriously considered if it should change
> as the critical node was likely to be node 0 with all the zones and the
> remote nodes all used the highest zone. CMA/MOVABLE changed that slightly
> by allowing the possibility of node0 having a "higher" zone than every

I think this problem is related not only to protecting lowmem (the zones
below NORMAL) but also to node balance. In fact, the problem reported by
the zeroday bot is caused by node1 having a "higher" zone. In that case,
node0's lowmem is well protected, but node balance is broken because
node1's NORMAL memory cannot be protected from allocations initiated on
a remote node.

> other node.
> When MOVABLE was introduced, it wasn't much of a problem as
> the purpose of MOVABLE was for systems that dynamically needed to allocate
> hugetlbfs later in the runtime but for CMA, it was a lot more critical
> for ordinary usage so this is primarily a CMA thing.

I'm not sure that it's primarily a CMA thing. There is another critical
setup for this problem: memory hotplug. If someone hot-plugs new memory
into the MOVABLE zone, a "higher" zone is created on that specific node
and this problem happens. I have confirmed this with QEMU.

>> I don't see a problem with your change. I would be worried about
>> inflated reserves when e.g. ZONE_MOVABLE doesn't exist, but that doesn't
>> seem to be the case. My laptop has empty ZONE_MOVABLE and the
>> ZONE_NORMAL protection for movable is 0.
>>
>> But there had to be some reason for classzone_idx to be like this and
>> not simple high_zoneidx. Maybe Mel remembers? Maybe it was important
>> then, but is not anymore? Sigh, it seems to be pre-git.
>>
>
> classzone predates my involvement with Linux but I would be less concerned
> about what the original intent was and instead ensure that classzone index
> is consistent, sane and potentially renamed while preserving the intent of
> "reserve pages in lower zones when an allocation request can use higher
> zones". While historically the critical intent was to preserve Normal and
> to a lesser extent DMA on 32-bit systems, there still should be some care
> of DMA32 so we should not lose that.

Agreed!

> With the patch, the allocator looks like it would be fine as just
> reservations change. I think it's unlikely that CMA usage will result
> in lowmem starvation. Compaction becomes a bit weird as classzone index
> has no special meaning versus highmem and I think it'll be very easy to
> forget. Similarly, vmscan can reclaim pages from remote nodes and zones
> that are higher than the original request.
> That is not likely to be a problem but it's a change in behaviour and
> easy to miss.
>
> Fundamentally, I find it extremely weird we now have two variables that are
> essentially the same thing. They should be collapsed into one variable,
> renamed and documented on what the index means for page allocator,
> compaction, vmscan and the special casing around CMA.

Agreed! I will update this patch to reflect your comment. If anyone has
an idea for renaming this variable, please let me know.

Thanks.