Subject: Re: [PATCH v2 3/4] mm: add find_alloc_contig_pages() interface
To: Reinette Chatre, Vlastimil Babka, linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-api@vger.kernel.org
Cc: Michal Hocko, Christopher Lameter, Guy Shattah, Anshuman Khandual, Michal Nazarewicz, David Nellans, Laura Abbott, Pavel Machek, Dave Hansen, Andrew Morton
References: <20180503232935.22539-1-mike.kravetz@oracle.com> <20180503232935.22539-4-mike.kravetz@oracle.com> <57dfd52c-22a5-5546-f8f3-848f21710cc1@oracle.com>
From: Mike Kravetz
Message-ID: <652bb498-8393-4738-a987-9bed31786261@oracle.com>
Date: Tue, 22 May 2018 13:35:49 -0700

On 05/22/2018 09:41 AM, Reinette Chatre wrote:
> On 5/21/2018 4:48 PM, Mike Kravetz
> wrote:
>> On 05/21/2018 01:54 AM, Vlastimil Babka wrote:
>>> On 05/04/2018 01:29 AM, Mike Kravetz wrote:
>>>> +/**
>>>> + * find_alloc_contig_pages() -- attempt to find and allocate a contiguous
>>>> + *                              range of pages
>>>> + * @nr_pages:  number of pages to find/allocate
>>>> + * @gfp:       gfp mask used to limit search as well as during compaction
>>>> + * @nid:       target node
>>>> + * @nodemask:  mask of other possible nodes
>>>> + *
>>>> + * Pages can be freed with a call to free_contig_pages(), or by manually
>>>> + * calling __free_page() for each page allocated.
>>>> + *
>>>> + * Return: pointer to 'order' pages on success, or NULL if not successful.
>>>> + */
>>>> +struct page *find_alloc_contig_pages(unsigned long nr_pages, gfp_t gfp,
>>>> +                                     int nid, nodemask_t *nodemask)
>>>> +{
>>>> +        unsigned long i, alloc_order, order_pages;
>>>> +        struct page *pages;
>>>> +
>>>> +        /*
>>>> +         * Underlying allocators perform page order sized allocations.
>>>> +         */
>>>> +        alloc_order = get_count_order(nr_pages);
>>>
>>> So if takes arbitrary nr_pages but convert it to order anyway? I think
>>> that's rather suboptimal and wasteful... e.g. a range could be skipped
>>> because some of the pages added by rounding cannot be migrated away.
>>
>> Yes. My idea with this series was to use existing allocators which are
>> all order based. Let me think about how to do allocation for arbitrary
>> number of allocations.
>> - For less than MAX_ORDER size we rely on the buddy allocator, so we are
>>   pretty much stuck with order sized allocation. However, allocations of
>>   this size are not really interesting as you can call existing routines
>>   directly.
>> - For sizes greater than MAX_ORDER, we know that the allocation size will
>>   be at least pageblock sized. So, the isolate/migrate scheme can still
>>   be used for full pageblocks. We can then use direct migration for the
>>   remaining pages. This does complicate things a bit.
>>
>> I'm guessing that most (?all?) allocations will be order based.
>> The use cases I am aware of (hugetlbfs, Intel Cache Pseudo-Locking, RDMA)
>> are all order based. However, as commented in previous version taking
>> arbitrary nr_pages makes interface more future proof.
>>
>
> I noticed this Cache Pseudo-Locking statement and would like to clarify.
> I have not been following this thread in detail so I would like to
> apologize first if my comments are out of context.
>
> Currently the Cache Pseudo-Locking allocations are order based because I
> assumed it was required by the allocator. The contiguous regions needed
> by Cache Pseudo-Locking will not always be order based - instead it is
> based on the granularity of the cache allocation. One example is a
> platform with 55MB L3 cache that can be divided into 20 equal portions.
> To support Cache Pseudo-Locking on this platform we need to be able to
> allocate contiguous regions at increments of 2816KB (the size of each
> portion). In support of this example platform regions needed would thus
> be 2816KB, 5632KB, 8448KB, etc.

Thank you, Reinette. I was not aware of these details. Yours is the most
concrete new use case. This certainly makes more of a case for
arbitrary-sized allocations.

--
Mike Kravetz
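
To put rough numbers on the rounding concern for the 2816KB example, here is a
small userspace sketch. It is not part of the patch. It assumes 4KB base pages,
and count_order(), portions_to_pages(), rounding_waste() and
print_rounding_table() are hypothetical helpers written for illustration;
count_order() only mimics the round-up-to-power-of-two behaviour of the
kernel's get_count_order().

```c
#include <assert.h>
#include <stdio.h>

#define PAGE_SIZE_KB 4UL	/* assumption: 4KB base pages */

/*
 * Userspace stand-in for get_count_order(): smallest order such that
 * (1UL << order) >= count.
 */
unsigned int count_order(unsigned long count)
{
	unsigned int order = 0;

	assert(count > 0);
	while ((1UL << order) < count)
		order++;
	return order;
}

/* Pages needed to cover 'portions' cache portions of 'portion_kb' each. */
unsigned long portions_to_pages(unsigned long portions,
				unsigned long portion_kb)
{
	return (portions * portion_kb + PAGE_SIZE_KB - 1) / PAGE_SIZE_KB;
}

/* Extra pages pulled in when nr_pages is rounded up to a full order. */
unsigned long rounding_waste(unsigned long nr_pages)
{
	return (1UL << count_order(nr_pages)) - nr_pages;
}

/* Print requested pages, resulting order and wasted pages per region size. */
void print_rounding_table(unsigned long portion_kb,
			  unsigned long max_portions)
{
	unsigned long n;

	for (n = 1; n <= max_portions; n++) {
		unsigned long pages = portions_to_pages(n, portion_kb);

		printf("%luKB: %lu pages, order %u, %lu pages extra\n",
		       n * portion_kb, pages, count_order(pages),
		       rounding_waste(pages));
	}
}
```

Calling print_rounding_table(2816, 3) covers the three region sizes Reinette
lists. Under the 4KB assumption, a single 2816KB portion is 704 pages and
rounds up to order 10 (1024 pages), so an order based request has to find 320
more contiguous pages than the caller needs.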