Received: by 2002:ad5:474a:0:0:0:0:0 with SMTP id i10csp4318126imu; Mon, 7 Jan 2019 20:53:04 -0800 (PST) X-Google-Smtp-Source: ALg8bN7HAAET0iSYejbWc5U2Ycd1dCe4ulQ90R8T1nt7rrew2y+ZkyCRmKFIvTxOA2wIVlOolKbR X-Received: by 2002:a17:902:654a:: with SMTP id d10mr293643pln.324.1546923184265; Mon, 07 Jan 2019 20:53:04 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1546923184; cv=none; d=google.com; s=arc-20160816; b=yd5UexML+OWziV/F+omOIj6BWus1D05xYfgt0LV3wp0g6H/D2pq6n5vN5LPIiSUCc4 MlzDKYIdMVjFanwmEYBZvvbi9kUMdcIKXoQWJpTjVKoFgZC8aFNXpCIeAmXTm5DtH1IS mQBnWrd6vxOYilBdbiN5Hj1ujgzcD8dk94OyOZgLWCVWHUdYJETPbPqtJy0P13iCF2E+ WTllcTzd7dwRbLlCBUIWx6z/OnY3YnfAhzzoFt2x5rugkkx8T0HHqsv+NBp2K39dY3YV 33uOuIHP3chN1lFFVq/xA9/9bMpH01CGXGvdjxN7oQNnqQDddp2wdRD4gaJvuAiGCs/l 69vQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:message-id:content-transfer-encoding :mime-version:references:in-reply-to:date:subject:cc:to:from; bh=UECP7dPdbfNeRQlhnzQBcvG7nikbIao7kBehTqzCUmg=; b=YLXjedVd99hjy6mC9rex6GYbJ+QkCuKWkFfyhfpMFS2eaQlIxII/y7KqSkOdE2ETz9 8C9BetPHNhHvuWEaC7T/HJRdlIQU0vt8L5h0lr9lcedcYA4cJpMra4uBYmflUeg0olIv oGGLJszzPwTkQ7/vIdjic7mz606L3Ty9rG96HgLu83eL9PyqXvXlRHVAnJVUzXkiv/dk OYnAzOYeclU2/g0rUV8B1GWTK7kqcy8EiGuwWWY0J6uzIgHDgo/PkEBhGr8sGoF5f3s4 TcsGSzmSEXAqXnefTsaK9TZBV8jlsXonoj3u7aJxRFxhlYfDbSbxLHrwprVYjXJJtaWd V6CA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=ibm.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id f5si5715927pfn.259.2019.01.07.20.52.49; Mon, 07 Jan 2019 20:53:04 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=ibm.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727707AbfAHEvh (ORCPT + 99 others); Mon, 7 Jan 2019 23:51:37 -0500 Received: from mx0b-001b2d01.pphosted.com ([148.163.158.5]:54222 "EHLO mx0a-001b2d01.pphosted.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1727684AbfAHEvf (ORCPT ); Mon, 7 Jan 2019 23:51:35 -0500 Received: from pps.filterd (m0098417.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.16.0.22/8.16.0.22) with SMTP id x084msPa049213 for ; Mon, 7 Jan 2019 23:51:34 -0500 Received: from e13.ny.us.ibm.com (e13.ny.us.ibm.com [129.33.205.203]) by mx0a-001b2d01.pphosted.com with ESMTP id 2pvk82n5kv-1 (version=TLSv1.2 cipher=AES256-GCM-SHA384 bits=256 verify=NOT) for ; Mon, 07 Jan 2019 23:51:34 -0500 Received: from localhost by e13.ny.us.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Tue, 8 Jan 2019 04:51:33 -0000 Received: from b01cxnp23033.gho.pok.ibm.com (9.57.198.28) by e13.ny.us.ibm.com (146.89.104.200) with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted; (version=TLSv1/SSLv3 cipher=AES256-GCM-SHA384 bits=256/256) Tue, 8 Jan 2019 04:51:29 -0000 Received: from b01ledav003.gho.pok.ibm.com (b01ledav003.gho.pok.ibm.com [9.57.199.108]) by b01cxnp23033.gho.pok.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id x084pTBg24903736 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=FAIL); Tue, 8 Jan 2019 04:51:29 GMT Received: from b01ledav003.gho.pok.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id EF82FB2066; Tue, 8 Jan 2019 04:51:28 +0000 (GMT) Received: from b01ledav003.gho.pok.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id A9057B2065; Tue, 8 Jan 2019 04:51:25 +0000 (GMT) Received: from skywalker.ibmuc.com (unknown [9.85.75.199]) by b01ledav003.gho.pok.ibm.com (Postfix) with ESMTP; Tue, 8 Jan 2019 04:51:25 +0000 (GMT) From: "Aneesh Kumar K.V" To: akpm@linux-foundation.org, Michal Hocko , Alexey Kardashevskiy , David Gibson , Andrea Arcangeli , mpe@ellerman.id.au Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, "Aneesh Kumar K.V" Subject: [PATCH V6 2/4] mm: Add get_user_pages_cma_migrate Date: Tue, 8 Jan 2019 10:21:08 +0530 X-Mailer: git-send-email 2.20.1 In-Reply-To: <20190108045110.28597-1-aneesh.kumar@linux.ibm.com> References: <20190108045110.28597-1-aneesh.kumar@linux.ibm.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-TM-AS-GCONF: 00 x-cbid: 19010804-0064-0000-0000-000003923B01 X-IBM-SpamModules-Scores: X-IBM-SpamModules-Versions: BY=3.00010364; HX=3.00000242; KW=3.00000007; PH=3.00000004; SC=3.00000273; SDB=6.01143337; UDB=6.00595222; IPR=6.00923595; MB=3.00025025; MTD=3.00000008; XFM=3.00000015; UTC=2019-01-08 04:51:32 X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused x-cbparentid: 19010804-0065-0000-0000-00003BF2E2E0 Message-Id: <20190108045110.28597-3-aneesh.kumar@linux.ibm.com> X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:,, definitions=2019-01-08_03:,, signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 priorityscore=1501 malwarescore=0 suspectscore=0 phishscore=0 bulkscore=0 spamscore=0 clxscore=1015 lowpriorityscore=0 mlxscore=0 impostorscore=0 mlxlogscore=999 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1810050000 definitions=main-1901080036 Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org This helper does a get_user_pages_fast making sure we migrate pages found in the CMA area before taking page reference. This makes sure that we don't keep non-movable pages (due to page reference count) in the CMA area. This will be used by ppc64 in a later patch to avoid pinning pages in the CMA region. ppc64 uses CMA region for allocation of hardware page table (hash page table) and not able to migrate pages out of CMA region results in page table allocation failures. One case where we hit this easy is when a guest using VFIO passthrough device. VFIO locks all the guests memory and if the guest memory is backed by CMA region, it becomes unmovable resulting in fragmenting the CMA and possibly preventing other guest from allocation a large enough hash page table. NOTE: We allocate new page without using __GFP_THISNODE Signed-off-by: Aneesh Kumar K.V --- include/linux/hugetlb.h | 2 + include/linux/migrate.h | 3 + mm/hugetlb.c | 4 +- mm/migrate.c | 149 ++++++++++++++++++++++++++++++++++++++++ 4 files changed, 156 insertions(+), 2 deletions(-) diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 087fd5f48c91..1eed0cdaec0e 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -371,6 +371,8 @@ struct page *alloc_huge_page_nodemask(struct hstate *h, int preferred_nid, nodemask_t *nmask); struct page *alloc_huge_page_vma(struct hstate *h, struct vm_area_struct *vma, unsigned long address); +struct page *alloc_migrate_huge_page(struct hstate *h, gfp_t gfp_mask, + int nid, nodemask_t *nmask); int huge_add_to_page_cache(struct page *page, struct address_space *mapping, pgoff_t idx); diff --git a/include/linux/migrate.h b/include/linux/migrate.h index e13d9bf2f9a5..bc83e12a06e9 100644 --- a/include/linux/migrate.h +++ b/include/linux/migrate.h @@ -285,6 +285,9 @@ static inline int migrate_vma(const struct migrate_vma_ops *ops, } #endif /* IS_ENABLED(CONFIG_MIGRATE_VMA_HELPER) */ +extern int get_user_pages_cma_migrate(unsigned long start, int nr_pages, int write, + struct page **pages); + #endif /* CONFIG_MIGRATION */ #endif /* _LINUX_MIGRATE_H */ diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 745088810965..fc4afaec1055 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -1586,8 +1586,8 @@ static struct page *alloc_surplus_huge_page(struct hstate *h, gfp_t gfp_mask, return page; } -static struct page *alloc_migrate_huge_page(struct hstate *h, gfp_t gfp_mask, - int nid, nodemask_t *nmask) +struct page *alloc_migrate_huge_page(struct hstate *h, gfp_t gfp_mask, + int nid, nodemask_t *nmask) { struct page *page; diff --git a/mm/migrate.c b/mm/migrate.c index ccf8966caf6f..5e21c7aee942 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -2982,3 +2982,152 @@ int migrate_vma(const struct migrate_vma_ops *ops, } EXPORT_SYMBOL(migrate_vma); #endif /* defined(MIGRATE_VMA_HELPER) */ + +static struct page *new_non_cma_page(struct page *page, unsigned long private) +{ + /* + * We want to make sure we allocate the new page from the same node + * as the source page. + */ + int nid = page_to_nid(page); + /* + * Trying to allocate a page for migration. Ignore allocation + * failure warnings. We don't force __GFP_THISNODE here because + * this node here is the node where we have CMA reservation and + * in some case these nodes will have really less non movable + * allocation memory. + */ + gfp_t gfp_mask = GFP_USER | __GFP_NOWARN; + + if (PageHighMem(page)) + gfp_mask |= __GFP_HIGHMEM; + +#ifdef CONFIG_HUGETLB_PAGE + if (PageHuge(page)) { + struct hstate *h = page_hstate(page); + /* + * We don't want to dequeue from the pool because pool pages will + * mostly be from the CMA region. + */ + return alloc_migrate_huge_page(h, gfp_mask, nid, NULL); + } +#endif + if (PageTransHuge(page)) { + struct page *thp; + /* + * ignore allocation failure warnings + */ + gfp_t thp_gfpmask = GFP_TRANSHUGE | __GFP_NOWARN; + + /* + * Remove the movable mask so that we don't allocate from + * CMA area again. + */ + thp_gfpmask &= ~__GFP_MOVABLE; + thp = __alloc_pages_node(nid, thp_gfpmask, HPAGE_PMD_ORDER); + if (!thp) + return NULL; + prep_transhuge_page(thp); + return thp; + } + + return __alloc_pages_node(nid, gfp_mask, 0); +} + +/** + * get_user_pages_cma_migrate() - pin user pages in memory by migrating pages in CMA region + * @start: starting user address + * @nr_pages: number of pages from start to pin + * @write: whether pages will be written to + * @pages: array that receives pointers to the pages pinned. + * Should be at least nr_pages long. + * + * Attempt to pin user pages in memory without taking mm->mmap_sem. + * If not successful, it will fall back to taking the lock and + * calling get_user_pages(). + * + * If the pinned pages are backed by CMA region, we migrate those pages out, + * allocating new pages from non-CMA region. This helps in avoiding keeping + * pages pinned in the CMA region for a long time thereby resulting in + * CMA allocation failures. + * + * Returns number of pages pinned. This may be fewer than the number + * requested. If nr_pages is 0 or negative, returns 0. If no pages + * were pinned, returns -errno. + */ + +int get_user_pages_cma_migrate(unsigned long start, int nr_pages, int write, + struct page **pages) +{ + int i, ret; + unsigned long flags; + bool drain_allow = true; + bool migrate_allow = true; + LIST_HEAD(cma_page_list); + +get_user_again: + /* + * If get_user_pages ends up allocating pages, make sure we don't + * allocate from CMA region so that we can avoid the migration below. + */ + flags = memalloc_nocma_save(); + ret = get_user_pages_fast(start, nr_pages, write, pages); + memalloc_nocma_restore(flags); + if (ret <= 0) + return ret; + + for (i = 0; i < ret; ++i) { + /* + * If we get a page from the CMA zone, since we are going to + * be pinning these entries, we might as well move them out + * of the CMA zone if possible. + */ + if (is_migrate_cma_page(pages[i]) && migrate_allow) { + + struct page *head = compound_head(pages[i]); + + if (PageHuge(head)) { + isolate_huge_page(head, &cma_page_list); + } else { + if (!PageLRU(head) && drain_allow) { + lru_add_drain_all(); + drain_allow = false; + } + + if (!isolate_lru_page(head)) { + list_add_tail(&head->lru, &cma_page_list); + mod_node_page_state(page_pgdat(head), + NR_ISOLATED_ANON + + page_is_file_cache(head), + hpage_nr_pages(head)); + } + } + } + } + if (!list_empty(&cma_page_list)) { + /* + * drop the above get_user_pages reference. + */ + for (i = 0; i < ret; ++i) + put_page(pages[i]); + + if (migrate_pages(&cma_page_list, new_non_cma_page, + NULL, 0, MIGRATE_SYNC, MR_CONTIG_RANGE)) { + /* + * some of the pages failed migration. Do get_user_pages + * without migration. + */ + migrate_allow = false; + + if (!list_empty(&cma_page_list)) + putback_movable_pages(&cma_page_list); + } + /* + * We did migrate all the pages, Try to get the page references again + * migrating any new CMA pages which we failed to isolate earlier. + */ + drain_allow = true; + goto get_user_again; + } + return ret; +} -- 2.20.1