Date: Thu, 24 May 2018 08:18:18 -0700
From: Matthew Wilcox
To: Michal Hocko
Cc: Huaisheng Ye, akpm@linux-foundation.org, linux-mm@kvack.org,
    vbabka@suse.cz, mgorman@techsingularity.net, kstewart@linuxfoundation.org,
    alexander.levin@verizon.com, gregkh@linuxfoundation.org, colyli@suse.de,
    chengnt@lenovo.com, hehy1@lenovo.com, linux-kernel@vger.kernel.org,
    iommu@lists.linux-foundation.org, xen-devel@lists.xenproject.org,
    linux-btrfs@vger.kernel.org, Huaisheng Ye
Subject: Re: [RFC PATCH v2 00/12] get rid of GFP_ZONE_TABLE/BAD
Message-ID: <20180524151818.GA21245@bombadil.infradead.org>
In-Reply-To: <20180524122323.GH20441@dhcp22.suse.cz>

On Thu, May 24, 2018 at 02:23:23PM +0200, Michal Hocko wrote:
> > If we had eight ZONEs, we could offer:
>
> No, please no more zones. What we have is quite a maintenance burden on
> its own. Ideally we should only have lowmem, highmem and special/device
> zones for directly kernel accessible memory, the one that the kernel
> cannot or must not use and completely special memory managed out of
> the page allocator. All the remaining constraints should better be
> implemented on top.

I believe you when you say that they're a maintenance pain. Is that
maintenance pain because they're so specialised? i.e., if we had more,
could we solve our pain by making them more generic?

> > ZONE_16M        // 24 bit
> > ZONE_256M       // 28 bit
> > ZONE_LOWMEM     // CONFIG_32BIT only
> > ZONE_4G         // 32 bit
> > ZONE_64G        // 36 bit
> > ZONE_1T         // 40 bit
> > ZONE_ALL        // everything larger
> > ZONE_MOVABLE    // movable allocations; no physical address guarantees
> >
> > #ifdef CONFIG_64BIT
> > #define ZONE_NORMAL     ZONE_ALL
> > #else
> > #define ZONE_NORMAL     ZONE_LOWMEM
> > #endif
> >
> > This would cover most driver DMA mask allocations; we could tweak the
> > offered zones based on analysis of what people need.
>
> But those already do have a proper API, IIUC. So do we really need to
> make our GFP_*/Zone API more complicated than it already is?

I don't want to change the driver API (setting the DMA mask, etc), but
we don't actually have a good API to the page allocator for the
implementation of dma_alloc_foo() to request pages. More or less,
architectures do:

        if (mask < 4GB)
                alloc_page(GFP_DMA)
        else if (mask < 16EB)   /* i.e. anything short of a full 64-bit mask */
                alloc_page(GFP_DMA32)
        else
                alloc_page(GFP_HIGHMEM)

It more-or-less sucks that devices with 28-bit DMA limits are forced to
allocate from the low 16MB when they're perfectly capable of using the
low 256MB. Sure, my proposal doesn't help 27- or 26-bit DMA mask
devices, but those are pretty rare.

I'm sure you don't need reminding what a mess vmalloc_32 is, and the
implementation of saa7146_vmalloc_build_pgtable() just hurts.

> > #define GFP_HIGHUSER            (GFP_USER | ZONE_ALL)
> > #define GFP_HIGHUSER_MOVABLE    (GFP_USER | ZONE_MOVABLE)
> >
> > One other thing I want to see is that fallback from zones happens from
> > highest to lowest normally (i.e., if you fail to allocate in 1T, then you
> > try to allocate from 64G), but movable allocations happen from lowest
> > to highest. So ZONE_16M ends up full of page cache pages, which are
> > readily evictable for the rare occasions when we need to allocate memory
> > below 16MB.
> >
> > I'm sure there are lots of good reasons why this won't work, which is
> > why I've been hesitant to propose it before now.
>
> I am worried you are playing with a can of worms...

Yes. Me too.
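A minimal user-space sketch in C of the mask-to-zone selection discussed above, assuming a simplified model rather than real kernel code. `legacy_gfp_for_mask()` follows the "more or less" pseudocode for today's architectures and returns hypothetical labels (`LEGACY_GFP_*`), not real `gfp_t` values; real architecture code differs in detail. `proposed_zone_for_mask()` picks the highest proposed zone a device can fully address. The zone enum reuses the proposal's names but is hypothetical; ZONE_LOWMEM and ZONE_MOVABLE are left out because they carry no fixed address width. `DMA_BIT_MASK()` is defined locally with the same shape as the kernel macro.

```c
#include <assert.h>
#include <stdint.h>

/* Same shape as the kernel's DMA_BIT_MASK(); defined here for user space. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Hypothetical labels for what today's "more or less" logic selects. */
enum legacy_gfp { LEGACY_GFP_DMA, LEGACY_GFP_DMA32, LEGACY_GFP_HIGHMEM };

/* Hypothetical zones from the proposal that have a fixed address width. */
enum proposed_zone {
	ZONE_16M, ZONE_256M, ZONE_4G, ZONE_64G, ZONE_1T, ZONE_ALL,
	NR_PROPOSED_ZONES
};

/* Addressing width, in bits, that each proposed zone guarantees. */
static const unsigned int zone_bits[] = { 24, 28, 32, 36, 40, 64 };
static_assert(sizeof(zone_bits) / sizeof(zone_bits[0]) == NR_PROPOSED_ZONES,
	      "one width per proposed zone");

/* Today: anything short of a full 32-bit mask drops to GFP_DMA. */
static enum legacy_gfp legacy_gfp_for_mask(uint64_t mask)
{
	if (mask < DMA_BIT_MASK(32))
		return LEGACY_GFP_DMA;		/* low 16MB on x86 */
	else if (mask < DMA_BIT_MASK(64))
		return LEGACY_GFP_DMA32;	/* low 4GB */
	else
		return LEGACY_GFP_HIGHMEM;	/* anywhere */
}

/*
 * Proposal (sketch): pick the highest zone the device can fully address;
 * with highest-to-lowest fallback this also covers every zone below it.
 * Returns -1 if the mask is narrower than 24 bits.
 */
static int proposed_zone_for_mask(uint64_t mask)
{
	int best = -1;

	for (int z = 0; z < NR_PROPOSED_ZONES; z++)
		if (mask >= DMA_BIT_MASK(zone_bits[z]))
			best = z;
	return best;
}
```

With a 28-bit mask, the first function lands in GFP_DMA (the low 16MB) while the second picks ZONE_256M; a 27-bit mask still falls to ZONE_16M, which matches the caveat that the proposal doesn't help 27- or 26-bit devices.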
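The proposed fallback policy (ordinary allocations walk from the highest permitted zone down, movable allocations walk from the lowest zone up) can be modelled with a toy allocator. This is a hypothetical sketch, assuming simple per-zone free-page counters; `fallback_order()` and `alloc_one()` are illustrative names, not page allocator functions, and the zone enum borrows the proposal's names without ZONE_LOWMEM or ZONE_MOVABLE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical zones from the proposal, lowest physical addresses first. */
enum proposed_zone {
	ZONE_16M, ZONE_256M, ZONE_4G, ZONE_64G, ZONE_1T, ZONE_ALL,
	NR_PROPOSED_ZONES
};

/*
 * Write the zones to try into @order and return how many there are.
 * Ordinary allocations start at @highest and walk down to ZONE_16M;
 * movable allocations start at ZONE_16M and walk up to @highest.
 */
static size_t fallback_order(enum proposed_zone highest, bool movable,
			     enum proposed_zone order[NR_PROPOSED_ZONES])
{
	size_t n = 0;

	assert(highest < NR_PROPOSED_ZONES);
	if (movable) {
		for (int z = ZONE_16M; z <= (int)highest; z++)
			order[n++] = (enum proposed_zone)z;
	} else {
		for (int z = (int)highest; z >= ZONE_16M; z--)
			order[n++] = (enum proposed_zone)z;
	}
	return n;
}

/*
 * Toy allocator over per-zone free-page counters (an assumption, not the
 * real buddy allocator). Returns the zone the page came from, or -1.
 */
static int alloc_one(long free_pages[NR_PROPOSED_ZONES],
		     enum proposed_zone highest, bool movable)
{
	enum proposed_zone order[NR_PROPOSED_ZONES];
	size_t n = fallback_order(highest, movable, order);

	for (size_t i = 0; i < n; i++) {
		if (free_pages[order[i]] > 0) {
			free_pages[order[i]]--;
			return (int)order[i];
		}
	}
	return -1;
}
```

With one free page per zone, two movable allocations take ZONE_16M and then ZONE_256M, while an ordinary allocation capped at ZONE_1T takes ZONE_1T and, once that is empty, falls back to ZONE_64G. That is the behaviour described in the mail: the low zones fill with easily evicted movable pages, and ordinary allocations only reach them last.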