Date: Mon, 7 May 2018 23:25:01 +0200
From: David Sterba
Reply-To: dsterba@suse.cz
To: Matthew Wilcox
Cc: Huaisheng HS1 Ye, Michal Hocko, "akpm@linux-foundation.org",
	"linux-mm@kvack.org", "vbabka@suse.cz", "mgorman@techsingularity.net",
	"pasha.tatashin@oracle.com", "alexander.levin@verizon.com",
	"hannes@cmpxchg.org", "penguin-kernel@I-love.SAKURA.ne.jp",
	"colyli@suse.de", NingTing Cheng, "linux-kernel@vger.kernel.org"
Subject: Re: [External] Re: [PATCH 2/3] include/linux/gfp.h: use unsigned int in gfp_zone
Message-ID: <20180507212500.bdphwfhk55w6vlbb@twin.jikos.cz>
In-Reply-To: <20180507184410.GA12361@bombadil.infradead.org>
References: <1525416729-108201-1-git-send-email-yehs1@lenovo.com>
	<1525416729-108201-3-git-send-email-yehs1@lenovo.com>
	<20180504133533.GR4535@dhcp22.suse.cz>
	<20180504154004.GB29829@bombadil.infradead.org>
	<20180506134814.GB7362@bombadil.infradead.org>
	<20180506185532.GA13604@bombadil.infradead.org>
	<20180507184410.GA12361@bombadil.infradead.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, May 07, 2018 at 11:44:10AM -0700, Matthew Wilcox wrote:
> On Mon, May 07, 2018 at 05:16:50PM +0000, Huaisheng HS1 Ye wrote:
> > I hope it couldn't cause problem, but based on my analyzation it has the potential to go wrong if users still use the flags as usual, which are __GFP_DMA, __GFP_DMA32 and __GFP_HIGHMEM.
> > Let me take an example with my testing platform, these logics are much abstract, an example will be helpful.
> >
> > There is a two sockets X86_64 server, No HIGHMEM and it has 16 + 16GB memories.
> > Its zone types shall be like this below,
> >
> > ZONE_DMA		0	0b0000
> > ZONE_DMA32		1	0b0001
> > ZONE_NORMAL		2	0b0010
> > (OPT_ZONE_HIGHMEM)	2	0b0010
> > ZONE_MOVABLE		3	0b0011
> > ZONE_DEVICE		4	0b0100 (virtual zone)
> > __MAX_NR_ZONES	5
> >
> > __GFP_DMA	= ZONE_DMA ^ ZONE_NORMAL = 0b0010
> > __GFP_DMA32	= ZONE_DMA32 ^ ZONE_NORMAL = 0b0011
> > __GFP_HIGHMEM	= OPT_ZONE_HIGHMEM ^ ZONE_NORMAL = 0b0000
> > __GFP_MOVABLE	= ZONE_MOVABLE ^ ZONE_NORMAL | ___GFP_MOVABLE = 0b1001
> >
> > Eg.
> > If a driver uses flags like this below,
> > Step 1:
> > gfp_mask | __GFP_DMA32;
> > (0b 0000 | 0b 0011 = 0b 0011)
> > gfp_mask's low four bits shall equal to 0011, assuming no __GFP_MOVABLE
> >
> > Step 2:
> > gfp_mask & ~__GFP_DMA;
> > (0b 0011 & ~0b0010 = 0b0001)
> > gfp_mask's low four bits shall equal to 0001 now, then when it enter gfp_zone(),
> >
> > return ((__force int)flags & ___GFP_ZONE_MASK) ^ ZONE_NORMAL;
> > (0b0001 ^ 0b0010 = 0b0011)
> > You know 0011 means that ZONE_MOVABLE will be returned.
> > In this case, error can be found, because gfp_mask needs to get ZONE_DMA32 originally.
> > But with existing GFP_ZONE_TABLE/BAD, it is correct. Because the bits are way of 0x1, 0x2, 0x4, 0x8
>
> Yes, I understand your point here.  My point was that this was already a bug;
> the caller shouldn't simply be clearing __GFP_DMA; they really mean to clear
> all of the GFP_ZONE bits so that they allocate from ZONE_NORMAL.  And for
> that, they should be using ~GFP_ZONEMASK
>
> Unless they already know, of course.  For example, this one in
> arch/x86/mm/pgtable.c is fine:
>
>         if (strcmp(arg, "nohigh") == 0)
>                 __userpte_alloc_gfp &= ~__GFP_HIGHMEM;
>
> because it knows that __userpte_alloc_gfp can only have __GFP_HIGHMEM set.
>
> But something like btrfs should almost certainly be using ~GFP_ZONEMASK.

Agreed, the direct use of __GFP_DMA32 was added in 3ba7ab220e8918176c6f
to substitute GFP_NOFS, so the allocation flags are less restrictive but
still acceptable for allocation from slab.

The requirement from btrfs is to avoid highmem; the 'must be acceptable
for slab' requirement is more MM internal and should have been hidden
under some opaque flag mask. There was no strong need for that at the
time.