Date: Wed, 8 Jul 2020 16:16:02 +0900
From: Joonsoo Kim
To: Vlastimil Babka
Cc: Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    kernel-team@lge.com, Christoph Hellwig, Roman Gushchin, Mike Kravetz,
    Naoya Horiguchi, Michal Hocko
Subject: Re: [PATCH v4 04/11] mm/hugetlb: make hugetlb migration callback CMA aware
Message-ID: <20200708071602.GB16543@js1304-desktop>
References: <1594107889-32228-1-git-send-email-iamjoonsoo.kim@lge.com>
 <1594107889-32228-5-git-send-email-iamjoonsoo.kim@lge.com>

On Tue, Jul 07, 2020 at 01:22:31PM +0200, Vlastimil Babka wrote:
> On 7/7/20 9:44 AM, js1304@gmail.com wrote:
> > From: Joonsoo Kim
> > 
> > new_non_cma_page() in gup.c which try to allocate migration target page
> > requires to allocate the new page that is not on the CMA area.
> > new_non_cma_page() implements it by removing __GFP_MOVABLE flag. This way
> > works well for THP page or normal page but not for hugetlb page.
> > 
> > hugetlb page allocation process consists of two steps. First is dequeing
> > from the pool. Second is, if there is no available page on the queue,
> > allocating from the page allocator.
> > 
> > new_non_cma_page() can control allocation from the page allocator by
> > specifying correct gfp flag. However, dequeing cannot be controlled until
> > now, so, new_non_cma_page() skips dequeing completely. It is a suboptimal
> > since new_non_cma_page() cannot utilize hugetlb pages on the queue so this
> > patch tries to fix this situation.
> > 
> > This patch makes the deque function on hugetlb CMA aware and skip CMA
> > pages if newly added skip_cma argument is passed as true.
> 
> Hmm, can't you instead change dequeue_huge_page_node_exact() to test the PF_
> flag and avoid adding bool skip_cma everywhere?

Okay! Please check the following patch.

> 
> I think that's what Michal suggested [1] except he said "the code already does
> by memalloc_nocma_{save,restore} API". It needs extending a bit though, AFAICS.
> __gup_longterm_locked() indeed does the save/restore, but restore comes before
> check_and_migrate_cma_pages() and thus new_non_cma_page() is called, so an
> adjustment is needed there, but that's all?
> 
> Hm the adjustment should be also done because save/restore is done around
> __get_user_pages_locked(), but check_and_migrate_cma_pages() also calls
> __get_user_pages_locked(), and that call not being between nocma save and
> restore is thus also a correctness issue?

I simply call memalloc_nocma_{save,restore} in new_non_cma_page(). It
should not cause any problem.

------------------>8-------------------
From bcfc57e3c6f2df1ad2940308b89d740cd3f0fba8 Mon Sep 17 00:00:00 2001
From: Joonsoo Kim
Date: Wed, 8 Jul 2020 14:39:26 +0900
Subject: [PATCH] mm/hugetlb: make hugetlb migration callback CMA aware

new_non_cma_page() in gup.c, which tries to allocate a migration target
page, needs to allocate a new page that is not on the CMA area.
new_non_cma_page() implements this by removing the __GFP_MOVABLE flag.
This works well for THP pages and normal pages, but not for hugetlb
pages.

hugetlb page allocation consists of two steps. The first is dequeuing
from the pool; the second, if no page is available on the queue, is
allocating from the page allocator.

new_non_cma_page() can control allocation from the page allocator by
specifying the correct gfp flags. However, dequeuing could not be
controlled until now, so new_non_cma_page() skips dequeuing completely.
This is suboptimal, since new_non_cma_page() cannot utilize hugetlb
pages already on the queue, so this patch tries to fix this situation.

This patch makes new_non_cma_page() use memalloc_nocma_{save,restore} to
exclude CMA memory rather than manually clearing __GFP_MOVABLE, and it
also makes the hugetlb dequeue function CMA aware. In the dequeue
function, CMA memory is skipped if the PF_MEMALLOC_NOCMA flag is set by
memalloc_nocma_{save,restore}.
Acked-by: Mike Kravetz
Signed-off-by: Joonsoo Kim
---
 include/linux/hugetlb.h |  2 --
 mm/gup.c                | 32 +++++++++++++++-----------------
 mm/hugetlb.c            | 11 +++++++++--
 3 files changed, 24 insertions(+), 21 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index bb93e95..34a10e5 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -509,8 +509,6 @@ struct page *alloc_huge_page_nodemask(struct hstate *h, int preferred_nid,
                                 nodemask_t *nmask, gfp_t gfp_mask);
 struct page *alloc_huge_page_vma(struct hstate *h, struct vm_area_struct *vma,
                                 unsigned long address);
-struct page *alloc_migrate_huge_page(struct hstate *h, gfp_t gfp_mask,
-                                int nid, nodemask_t *nmask);
 int huge_add_to_page_cache(struct page *page, struct address_space *mapping,
                         pgoff_t idx);
 
diff --git a/mm/gup.c b/mm/gup.c
index 5daadae..79142a9 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1623,6 +1623,8 @@ static struct page *new_non_cma_page(struct page *page, unsigned long private)
          * allocation memory.
          */
         gfp_t gfp_mask = GFP_USER | __GFP_NOWARN;
+        unsigned int flags = memalloc_nocma_save();
+        struct page *new_page = NULL;
 
         if (PageHighMem(page))
                 gfp_mask |= __GFP_HIGHMEM;
@@ -1630,33 +1632,29 @@ static struct page *new_non_cma_page(struct page *page, unsigned long private)
 #ifdef CONFIG_HUGETLB_PAGE
         if (PageHuge(page)) {
                 struct hstate *h = page_hstate(page);
-                /*
-                 * We don't want to dequeue from the pool because pool pages will
-                 * mostly be from the CMA region.
-                 */
-                return alloc_migrate_huge_page(h, gfp_mask, nid, NULL);
+
+                new_page = alloc_huge_page_nodemask(h, nid, NULL, gfp_mask);
+                goto out;
         }
 #endif
+
         if (PageTransHuge(page)) {
-                struct page *thp;
                 /*
                  * ignore allocation failure warnings
                  */
                 gfp_t thp_gfpmask = GFP_TRANSHUGE | __GFP_NOWARN;
 
-                /*
-                 * Remove the movable mask so that we don't allocate from
-                 * CMA area again.
-                 */
-                thp_gfpmask &= ~__GFP_MOVABLE;
-                thp = __alloc_pages_node(nid, thp_gfpmask, HPAGE_PMD_ORDER);
-                if (!thp)
-                        return NULL;
-                prep_transhuge_page(thp);
-                return thp;
+                new_page = __alloc_pages_node(nid, thp_gfpmask, HPAGE_PMD_ORDER);
+                if (new_page)
+                        prep_transhuge_page(new_page);
+                goto out;
         }
 
-        return __alloc_pages_node(nid, gfp_mask, 0);
+        new_page = __alloc_pages_node(nid, gfp_mask, 0);
+
+out:
+        memalloc_nocma_restore(flags);
+        return new_page;
 }
 
 static long check_and_migrate_cma_pages(struct task_struct *tsk,
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 3245aa0..514e29c 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -29,6 +29,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 
@@ -1036,10 +1037,16 @@ static void enqueue_huge_page(struct hstate *h, struct page *page)
 static struct page *dequeue_huge_page_node_exact(struct hstate *h, int nid)
 {
         struct page *page;
+        bool nocma = !!(READ_ONCE(current->flags) & PF_MEMALLOC_NOCMA);
+
+        list_for_each_entry(page, &h->hugepage_freelists[nid], lru) {
+                if (nocma && is_migrate_cma_page(page))
+                        continue;
 
-        list_for_each_entry(page, &h->hugepage_freelists[nid], lru)
                 if (!PageHWPoison(page))
                         break;
+        }
+
         /*
          * if 'non-isolated free hugepage' not found on the list,
          * the allocation fails.
@@ -1928,7 +1935,7 @@ static struct page *alloc_surplus_huge_page(struct hstate *h, gfp_t gfp_mask,
         return page;
 }
 
-struct page *alloc_migrate_huge_page(struct hstate *h, gfp_t gfp_mask,
+static struct page *alloc_migrate_huge_page(struct hstate *h, gfp_t gfp_mask,
                 int nid, nodemask_t *nmask)
 {
         struct page *page;
-- 
2.7.4
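
For reference, below is a minimal sketch of the nocma scope pattern the patch
relies on. It is not part of the patch; the helper name
alloc_target_page_nocma() is made up for illustration, while
memalloc_nocma_save(), memalloc_nocma_restore() and __alloc_pages_node() are
the existing kernel APIs used above.

/*
 * Illustrative sketch only -- not part of the patch above.
 */
#include <linux/gfp.h>
#include <linux/sched/mm.h>

static struct page *alloc_target_page_nocma(int nid, gfp_t gfp_mask)
{
	/* Set PF_MEMALLOC_NOCMA for the current task, remembering the old state. */
	unsigned int flags = memalloc_nocma_save();
	struct page *page;

	/*
	 * Any allocation in this scope sees PF_MEMALLOC_NOCMA, so both the
	 * hugetlb pool dequeue (after this patch) and the page allocator
	 * avoid handing back CMA pages.
	 */
	page = __alloc_pages_node(nid, gfp_mask, 0);

	/* Restore the previous flag state before returning. */
	memalloc_nocma_restore(flags);
	return page;
}

Keeping the CMA exclusion in the task flag, as Vlastimil suggested, means
callers such as new_non_cma_page() do not need to thread an extra skip_cma
argument through the hugetlb dequeue path.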