Date: Wed, 10 Mar 2021 16:58:58 +0100
From: Oscar Salvador <osalvador@suse.de>
To: Mike Kravetz <mike.kravetz@oracle.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	David Hildenbrand, Michal Hocko, Zi Yan, David Rientjes,
	Andrew Morton
Subject: Re: [RFC PATCH 0/3] hugetlb: add demote/split page functionality
Message-ID: <20210310155843.GA14328@linux>
In-Reply-To: <20210309001855.142453-1-mike.kravetz@oracle.com>
References: <20210309001855.142453-1-mike.kravetz@oracle.com>

On Mon, Mar 08, 2021 at 04:18:52PM -0800, Mike Kravetz wrote:
> The concurrent use of multiple hugetlb page sizes on a single system
> is becoming more common. One of the reasons is better TLB support for
> gigantic page sizes on x86 hardware.
> In addition, hugetlb pages are being used to back VMs in hosting
> environments.
>
> When using hugetlb pages to back VMs in such environments, it is
> sometimes desirable to preallocate hugetlb pools. This avoids the delay
> and uncertainty of allocating hugetlb pages at VM startup. In addition,
> preallocating huge pages minimizes the issue of memory fragmentation
> that increases the longer the system is up and running.
>
> In such environments, a combination of larger and smaller hugetlb pages
> is preallocated in anticipation of backing VMs of various sizes. Over
> time, the preallocated pool of smaller hugetlb pages may become
> depleted while larger hugetlb pages still remain. In such situations,
> it may be desirable to convert larger hugetlb pages to smaller hugetlb
> pages.

Hi Mike,

The use case sounds neat.

> Converting larger to smaller hugetlb pages can be accomplished today by
> first freeing the larger page to the buddy allocator and then
> allocating the smaller pages. However, there are two issues with this
> approach:
> 1) This process can take quite some time, especially if allocation of
>    the smaller pages is not immediate and requires migration/compaction.
> 2) There is no guarantee that the total size of smaller pages allocated
>    will match the size of the larger page which was freed. This is
>    because the area freed by the larger page could quickly be
>    fragmented.
>
> To address these issues, introduce the concept of hugetlb page
> demotion. Demotion provides a means of splitting a hugetlb page 'in
> place' into pages of a smaller size. For example, on x86 one 1G page
> can be demoted to 512 2M pages. Page demotion is controlled via sysfs
> files.
> - demote_size   Read-only target page size for demotion

What about those archs where we have more than two hugetlb sizes?
IIRC, in powerpc you can have that, right? If so, would it make sense
for demote_size to be writable so you can pick the size?

> - demote        Writable number of hugetlb pages to be demoted

Below you mention that, due to reservations, the number of demoted
pages can be less than what the admin specified.
Would it make sense to have a place where someone can check how many
pages actually got demoted? Or will this follow nr_hugepages' scheme
and always reflect the current number of demoted pages?

> Only hugetlb pages which are free at the time of the request can be
> demoted. Demotion does not add to the complexity of surplus pages.
> Demotion also honors reserved huge pages. Therefore, when a value is
> written to the sysfs demote file, that value is only the maximum
> number of pages which will be demoted. It is possible fewer will
> actually be demoted.
>
> If demote_size is PAGESIZE, demote will simply free pages to the buddy
> allocator.

Wrt. the vmemmap discussion with David: I also think we could compute
how many vmemmap pages we are going to need to re-shape the vmemmap
layout and allocate those upfront. And I think that approach would
just be simpler.

I plan to have a look at the patches later today or tomorrow.

Thanks

-- 
Oscar Salvador
SUSE L3
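
For the 1G-to-2M case from the cover letter, here is a back-of-the-envelope
userspace sketch of the upfront vmemmap accounting Oscar suggests above.
The struct page size and, in particular, KEPT_PER_HPAGE (how many vmemmap
pages a hugetlb page still retains once its tail vmemmap has been freed)
are illustrative assumptions, not values taken from this series or from
the vmemmap-freeing patches being discussed:

	#include <stdio.h>

	#define PAGE_SIZE       4096UL  /* x86-64 base page size */
	#define STRUCT_PAGE_SZ    64UL  /* sizeof(struct page), typical config */
	#define KEPT_PER_HPAGE     1UL  /* ASSUMPTION: vmemmap pages a hugetlb
	                                 * page keeps after vmemmap freeing */

	/* vmemmap pages spanned by one hugetlb page of the given order */
	static unsigned long vmemmap_pages(unsigned int hpage_order)
	{
		return ((1UL << hpage_order) * STRUCT_PAGE_SZ) / PAGE_SIZE;
	}

	int main(void)
	{
		unsigned int src_order = 18;    /* 1G page = 2^18 base pages */
		unsigned int dst_order = 9;     /* 2M page = 2^9 base pages */
		unsigned long nr_dst = 1UL << (src_order - dst_order); /* 512 */

		unsigned long full = vmemmap_pages(src_order);  /* 4096 */
		unsigned long have = KEPT_PER_HPAGE;
		unsigned long need = nr_dst * KEPT_PER_HPAGE;

		printf("1G page spans %lu vmemmap pages when fully mapped\n",
		       full);
		printf("demote 1G -> %lu x 2M: allocate %lu vmemmap pages "
		       "upfront\n", nr_dst, need - have);
		return 0;
	}

Under those assumptions, demoting one 1G page into 512 2M pages requires
511 additional vmemmap pages; computing that quantity and allocating it
before attempting the split is exactly the upfront approach suggested
above, so the demotion can fail cleanly if the allocation does not
succeed.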