Date: Fri, 16 Feb 2018 09:09:55 -0800
From: Matthew Wilcox
To: Christopher Lameter
Cc: Michal Hocko, David Rientjes, Andrew Morton, Jonathan Corbet,
	Vlastimil Babka, Mel Gorman, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, linux-doc@vger.kernel.org
Subject: Re: [patch 1/2] mm, page_alloc: extend kernelcore and movablecore for percent
Message-ID: <20180216170955.GA17591@bombadil.infradead.org>
References: <20180214095911.GB28460@dhcp22.suse.cz>
	<20180215144525.GG7275@dhcp22.suse.cz>
	<20180215151129.GB12360@bombadil.infradead.org>
	<20180215204817.GB22948@bombadil.infradead.org>
	<20180216160116.GA24395@bombadil.infradead.org>

On Fri, Feb 16, 2018 at 10:08:28AM -0600, Christopher Lameter wrote:
> On Fri, 16 Feb 2018, Matthew Wilcox wrote:
> > I don't understand this response. I'm not suggesting mixing objects
> > of different sizes within the same page. The vast majority of slabs
> > use order-0 pages, a few use order-1 pages and larger sizes are almost
> > unheard of. I'm suggesting the slab have its own private arena of pages
> > that it uses for allocating pages to slabs; when an entire page comes
> > free in a slab, it is returned to the arena. When the arena is empty,
> > slab requests another arena from the page allocator.
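
To make "arena" a little more concrete, here's a rough userspace toy of the
flow I have in mind; none of these names are real kernel interfaces, they're
made up purely for illustration:

/* Rough userspace toy of the "private arena" idea above: a 2MB chunk
 * carved into order-0 pages that are handed to the slab and returned
 * to the arena when a slab page empties.  Not kernel code. */
#define _POSIX_C_SOURCE 200112L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define ARENA_SIZE      (2UL * 1024 * 1024)     /* one 2MB arena */
#define SMALL_PAGE      4096UL                  /* order-0 page */
#define PAGES_PER_ARENA (ARENA_SIZE / SMALL_PAGE)

struct arena {
        void *base;                             /* the 2MB chunk itself */
        unsigned char in_use[PAGES_PER_ARENA];
        unsigned int free_pages;
};

/* Stand-in for asking the page allocator for a fresh 2MB arena. */
static struct arena *arena_create(void)
{
        struct arena *a = calloc(1, sizeof(*a));

        if (!a || posix_memalign(&a->base, ARENA_SIZE, ARENA_SIZE)) {
                free(a);
                return NULL;
        }
        a->free_pages = PAGES_PER_ARENA;
        return a;
}

/* Hand one order-0 page to the slab.  NULL means this arena is
 * exhausted and the caller should ask for another arena. */
static void *arena_alloc_page(struct arena *a)
{
        unsigned long i;

        for (i = 0; i < PAGES_PER_ARENA; i++) {
                if (!a->in_use[i]) {
                        a->in_use[i] = 1;
                        a->free_pages--;
                        return (char *)a->base + i * SMALL_PAGE;
                }
        }
        return NULL;
}

/* A slab page became entirely free: it goes back to the arena,
 * not to the buddy allocator, so slab pages stay clustered. */
static void arena_free_page(struct arena *a, void *page)
{
        unsigned long i = ((char *)page - (char *)a->base) / SMALL_PAGE;

        a->in_use[i] = 0;
        a->free_pages++;
}

int main(void)
{
        struct arena *a = arena_create();
        void *p;

        if (!a)
                return 1;
        p = arena_alloc_page(a);
        printf("arena %p, slab page %p, %u pages left\n",
               a->base, p, a->free_pages);
        arena_free_page(a, p);
        return 0;
}
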

> This just shifts the fragmentation problem because the 2M page cannot be
> released until all 4k or 8k pages within that 2M page are freed. How is
> that different from the page allocator which cannot coalesce a 2M page
> until all fragments have been released?

I'm not proposing releasing this 2MB page, unless it naturally frees up.
I'm saying that by restricting allocations to be within this 2MB page,
we prevent allocating from the adjacent 2MB page.

The workload I'm thinking of looks like this ... maybe the result of
running 'file' on every inode in a directory:

	do {
		Allocate an inode
		Allocate a page of pagecache
	} while (lots of times);

Naively, we allocate a page for the inode slab, then 3-6 pages for page
cache (depending on the filesystem), then we allocate another page for
the inode slab, then another 3-6 pages of page cache, and so on. So the
pages end up looking like this:

	IPPPPPIP|PPPPIPPP|PPIPPPPP|IPPPPPIP|...

Now we need an order-3 allocation. We can't get there just by releasing
page cache pages because there are inode slab pages in there, so we need
to shrink the inode caches as well. I'm proposing:

	IIIIII00|PPPPPPPP|PPPPPPPP|PPPPPPPP|PP...

and we can get our order-3 allocation just by releasing page cache pages.

> The kernelcore already does something similar by limiting the
> general unmovable allocs to a section of memory.

Right! But Michal's unhappy about kernelcore (see the beginning of this
thread), and so I'm proposing an alternative.

> Maybe what we should do is raise the lowest allocation size instead and
> allocate 2^x groups of pages to certain purposes?
>
> I.e. have a base allocation size of 16k and if the alloc was a page cache
> page then use the remainder for the neighboring pages.

Yes, there are a lot of ideas like this floating around; I know Kirill's
interested in this kind of thing not just for THP but also for
faultaround.
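
If it helps, here's a tiny userspace simulation of those two diagrams
(again purely illustrative, nothing here is a kernel interface): it
interleaves inode and page cache allocations either naively or with the
inode slab pages confined to their own aligned region, then checks whether
an aligned order-3 block could be produced by dropping page cache alone.

/* Toy simulation of the two layouts above: 'I' = inode slab page,
 * 'P' = page cache page, '0' = free page.  Userspace illustration only. */
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define NR_PAGES     32
#define ORDER3_PAGES 8          /* an order-3 block is 8 contiguous pages */

/* Can some aligned order-3 block be freed by dropping only page cache? */
static bool order3_from_page_cache_only(const char *mem)
{
        int block, i;

        for (block = 0; block + ORDER3_PAGES <= NR_PAGES; block += ORDER3_PAGES) {
                bool ok = true;

                for (i = 0; i < ORDER3_PAGES; i++)
                        if (mem[block + i] == 'I')
                                ok = false;
                if (ok)
                        return true;
        }
        return false;
}

int main(void)
{
        char naive[NR_PAGES + 1], arena[NR_PAGES + 1];
        int next_free = 0;                      /* naive: next free page, any type */
        int next_inode = 0;                     /* arena: inode pages live in pages 0-7 */
        int next_pcache = ORDER3_PAGES;         /* arena: page cache starts after them */

        memset(naive, '0', NR_PAGES);
        memset(arena, '0', NR_PAGES);
        naive[NR_PAGES] = arena[NR_PAGES] = '\0';

        /* "file every inode": one inode, then a few page cache pages, repeat */
        while (next_free < NR_PAGES - 4) {
                int j;

                naive[next_free++] = 'I';
                if (next_inode < ORDER3_PAGES)
                        arena[next_inode++] = 'I';
                for (j = 0; j < 4 && next_free < NR_PAGES; j++)
                        naive[next_free++] = 'P';
                for (j = 0; j < 4 && next_pcache < NR_PAGES; j++)
                        arena[next_pcache++] = 'P';
        }

        printf("naive: %s  order-3 from page cache alone: %s\n",
               naive, order3_from_page_cache_only(naive) ? "yes" : "no");
        printf("arena: %s  order-3 from page cache alone: %s\n",
               arena, order3_from_page_cache_only(arena) ? "yes" : "no");
        return 0;
}

Run as-is, the naive map ends up with an inode page in every aligned
8-page block, while the arena map leaves whole aligned blocks that are
nothing but page cache; only in the second case can reclaim produce the
order-3 block without touching the inode slab.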