Date: Thu, 10 Sep 2020 09:32:13 +0200
From: Michal Hocko
To: David Hildenbrand
Cc: Rik van Riel, Zi Yan, Roman Gushchin, "Kirill A. Shutemov",
	Matthew Wilcox, Shakeel Butt, Yang Shi, David Nellans,
	Vlastimil Babka, Mel Gorman, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
Shutemov" , Matthew Wilcox , Shakeel Butt , Yang Shi , David Nellans , linux-kernel@vger.kernel.org, Vlastimil Babka , Mel Gorman Subject: Re: [RFC PATCH 00/16] 1GB THP support on x86_64 Message-ID: <20200910073213.GC28354@dhcp22.suse.cz> References: <20200902180628.4052244-1-zi.yan@sent.com> <20200903142300.bjq2um5y5nwocvar@box> <20200903163020.GG60440@carbon.dhcp.thefacebook.com> <8e677ead-206d-08dd-d73e-569bd3803e3b@redhat.com> <7E20392E-5ED7-4C22-9555-F3BAABF3CBE9@nvidia.com> <20200908143503.GE26850@dhcp22.suse.cz> <7ed82cb06074b30c2956638082c515fb179f69a3.camel@surriel.com> <20200909070445.GA7348@dhcp22.suse.cz> <054d02f3b34d9946905929ff268b685c91494b3e.camel@surriel.com> <6135d2c5-2a74-6ca8-4b3b-8ceb25c0d4b1@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <6135d2c5-2a74-6ca8-4b3b-8ceb25c0d4b1@redhat.com> Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org [Cc Vlastimil and Mel - the whole email thread starts http://lkml.kernel.org/r/20200902180628.4052244-1-zi.yan@sent.com but this particular subthread has diverged a bit and you might find it interesting] On Wed 09-09-20 15:43:55, David Hildenbrand wrote: > On 09.09.20 15:19, Rik van Riel wrote: > > On Wed, 2020-09-09 at 09:04 +0200, Michal Hocko wrote: > >> On Tue 08-09-20 10:41:10, Rik van Riel wrote: > >>> On Tue, 2020-09-08 at 16:35 +0200, Michal Hocko wrote: > >>> > >>>> A global knob is insufficient. 1G pages will become a very > >>>> precious > >>>> resource as it requires a pre-allocation (reservation). So it > >>>> really > >>>> has > >>>> to be an opt-in and the question is whether there is also some > >>>> sort > >>>> of > >>>> access control needed. > >>> > >>> The 1GB pages do not require that much in the way of > >>> pre-allocation. The memory can be obtained through CMA, > >>> which means it can be used for movable 4kB and 2MB > >>> allocations when not > >>> being used for 1GB pages. > >> > >> That CMA has to be pre-reserved, right? That requires a > >> configuration. > > > > To some extent, yes. > > > > However, because that pool can be used for movable > > 4kB and 2MB > > pages as well as for 1GB pages, it would be easy to just set > > the size of that pool to eg. 1/3 or even 1/2 of memory for every > > system. > > > > It isn't like the pool needs to be the exact right size. We > > just need to avoid the "highmem problem" of having too little > > memory for kernel allocations. > > > > I am not sure I like the trend towards CMA that we are seeing, reserving > huge buffers for specific users (and eventually even doing it > automatically). > > What we actually want is ZONE_MOVABLE with relaxed guarantees, such that > anybody who requires large, unmovable allocations can use it. > > I once played with the idea of having ZONE_PREFER_MOVABLE, which > a) Is the primary choice for movable allocations > b) Is allowed to contain unmovable allocations (esp., gigantic pages) > c) Is the fallback for ZONE_NORMAL for unmovable allocations, instead of > running out of memory I might be missing something but how can this work longterm? Or put in another words why would this work any better than existing fragmentation avoidance techniques that page allocator implements already - movability grouping etc. Please note that I am not deeply familiar with those but my high level understanding is that we already try hard to not mix movable and unmovable objects in same page blocks as much as we can. 
My suspicion is that a separate zone would work in a similar fashion.
As long as there is a lot of free memory, the zone will be effectively
MOVABLE. Something similar applies to the Normal zone while unmovable
allocations are in the minority. But once the Normal zone fills up with
unmovable objects, they start overflowing into ZONE_PREFER_MOVABLE, and
that will resemble page block stealing when unmovable objects start
spreading over movable page blocks. Again, my level of expertise in the
page allocator is quite low, so all of the above might simply be
wrong...

> If someone messes up the zone ratio, issues known from zone
> imbalances are avoided - large allocations simply become less likely
> to succeed. In contrast to ZONE_MOVABLE, memory offlining is not
> guaranteed to work.

-- 
Michal Hocko
SUSE Labs