Subject: Re: [PATCH v2 3/4] mm: add find_alloc_contig_pages() interface
To: Mike Kravetz, Vlastimil Babka, linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-api@vger.kernel.org
Cc: Michal Hocko, Christopher Lameter, Guy Shattah, Anshuman Khandual, Michal Nazarewicz, David Nellans, Laura Abbott, Pavel Machek, Dave Hansen, Andrew Morton
References: <20180503232935.22539-1-mike.kravetz@oracle.com> <20180503232935.22539-4-mike.kravetz@oracle.com> <57dfd52c-22a5-5546-f8f3-848f21710cc1@oracle.com>
From: Reinette Chatre
Date: Tue, 22 May 2018 09:41:09 -0700
In-Reply-To: <57dfd52c-22a5-5546-f8f3-848f21710cc1@oracle.com>

On 5/21/2018 4:48 PM, Mike Kravetz wrote:
> On 05/21/2018 01:54 AM, Vlastimil Babka wrote:
>> On 05/04/2018 01:29 AM, Mike Kravetz wrote:
>>> +/**
>>> + * find_alloc_contig_pages() -- attempt to find and allocate a contiguous
>>> + *                              range of pages
>>> + * @nr_pages:   number of pages to find/allocate
>>> + * @gfp:        gfp mask used to limit search as well as during compaction
>>> + * @nid:        target node
>>> + * @nodemask:   mask of other possible nodes
>>> + *
>>> + * Pages can be freed with a call to free_contig_pages(), or by manually
>>> + * calling __free_page() for each page allocated.
>>> + *
>>> + * Return: pointer to 'order' pages on success, or NULL if not successful.
>>> + */
>>> +struct page *find_alloc_contig_pages(unsigned long nr_pages, gfp_t gfp,
>>> +                                     int nid, nodemask_t *nodemask)
>>> +{
>>> +        unsigned long i, alloc_order, order_pages;
>>> +        struct page *pages;
>>> +
>>> +        /*
>>> +         * Underlying allocators perform page order sized allocations.
>>> +         */
>>> +        alloc_order = get_count_order(nr_pages);
>>
>> So if takes arbitrary nr_pages but convert it to order anyway? I think
>> that's rather suboptimal and wasteful... e.g. a range could be skipped
>> because some of the pages added by rounding cannot be migrated away.
>
> Yes.  My idea with this series was to use existing allocators which are
> all order based.  Let me think about how to do allocation for arbitrary
> number of allocations.
> - For less than MAX_ORDER size we rely on the buddy allocator, so we are
>   pretty much stuck with order sized allocation.
>   However, allocations of this size are not really interesting as you can
>   call existing routines directly.
> - For sizes greater than MAX_ORDER, we know that the allocation size will
>   be at least pageblock sized.  So, the isolate/migrate scheme can still
>   be used for full pageblocks.  We can then use direct migration for the
>   remaining pages.  This does complicate things a bit.
>
> I'm guessing that most (?all?) allocations will be order based.  The use
> cases I am aware of (hugetlbfs, Intel Cache Pseudo-Locking, RDMA) are all
> order based.  However, as commented in previous version taking arbitrary
> nr_pages makes interface more future proof.
>

I noticed this Cache Pseudo-Locking statement and would like to clarify.
I have not been following this thread in detail, so I apologize in
advance if my comments are out of context.

Currently the Cache Pseudo-Locking allocations are order based because I
assumed this was required by the allocator. The contiguous regions needed
by Cache Pseudo-Locking will not always be order based; instead, their
size is based on the granularity of the cache allocation. One example is
a platform with a 55MB L3 cache that can be divided into 20 equal
portions. To support Cache Pseudo-Locking on this platform we need to be
able to allocate contiguous regions in increments of 2816KB (the size of
each portion). For this example platform, the regions needed would thus
be 2816KB, 5632KB, 8448KB, etc.

Regards,

Reinette
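
To make the arithmetic in the example above concrete, here is a minimal userspace sketch of what rounding a request up to a page order costs for the 2816KB portions. It assumes 4KB base pages, and the helpers count_order(), portions_to_pages() and order_rounding_waste() are hypothetical names invented for illustration; count_order() only mimics what the kernel's get_count_order() does for non-zero counts and is not the kernel implementation.

```c
/* Assumption: 4 KB base pages (e.g. x86); not stated in the thread. */
#define PAGE_SIZE_KB 4UL

/*
 * Hypothetical userspace stand-in for get_count_order() with n > 0:
 * the smallest order such that (1UL << order) >= n.
 */
static unsigned int count_order(unsigned long n)
{
	unsigned int order = 0;

	while ((1UL << order) < n)
		order++;
	return order;
}

/* Pages needed for 'portions' cache portions of 'portion_kb' KB each. */
static unsigned long portions_to_pages(unsigned long portions,
				       unsigned long portion_kb)
{
	return portions * portion_kb / PAGE_SIZE_KB;
}

/* Extra pages allocated when the request is rounded up to a full order. */
static unsigned long order_rounding_waste(unsigned long nr_pages)
{
	return (1UL << count_order(nr_pages)) - nr_pages;
}
```

Under these assumptions one 2816KB portion is 704 pages, which rounds up to order 10 (1024 pages), so 320 pages (1280KB) are allocated beyond what was requested; two portions (1408 pages) round up to order 11 and leave 640 pages unused.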