Received: by 2002:a25:e74b:0:0:0:0:0 with SMTP id e72csp2102856ybh; Fri, 24 Jul 2020 04:36:01 -0700 (PDT) X-Google-Smtp-Source: ABdhPJyGgmVqcfNjoAtnzJ7AoWjlZSLnPKIM2qrPojibDOipxypLJzGxLAH1ZMSEOVdPHAdpzbEk X-Received: by 2002:a05:6402:30b2:: with SMTP id df18mr8679143edb.229.1595590561072; Fri, 24 Jul 2020 04:36:01 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1595590561; cv=none; d=google.com; s=arc-20160816; b=ocB6iV0WOa8lgx66XRv7wDeBkVL9RkoG62Yv0aW+M0MHW7QtB5is9MBCyiWkQSZ2dm 0t0K/K0silNSy6YicfUt5xOyFINeDndsYTzU6RxSyzw7rFrA9NDzfSEcDPzef4m2B7zP yUx66AjnByvqo45Dv3qxvN+gcm3/0h0X/G5SbFusR102jbfwGp2lYoUNAV7pWApAZiq4 +oIuC2daO8T6AZckv2zz78eMPIMe7a1fpnsaZtE++wqCn4+pAi6nfyljESRHUDbmE1E7 rAS/PTgBRGci5HKCjmt+BNHReohrDC4mhgILvNdZvwIAb0ng3h9TYClV5nOagui9o6JI SzuQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:in-reply-to:content-disposition :mime-version:references:message-id:subject:cc:to:from:date; bh=M110HA4nMMKBLYGy/DI8jTdSDD0jpS2wTCc48wWnBmY=; b=TYCYou7xtG0NT03GQINrqm0z7j/Hn5Y4gKb2jwePRH3eKfMTqH1vrr11cgMYeYPglQ cFCmy19F+RCgnyTackwRiqOTbtlwl3NB1aQtjFQnmFmnfKLLl4O1VerCHk049X/+bOo1 LdbzmQvt+f6fv6CwZ+HElpoqpXfm5IjB1j0xkOXgp5exn0Jk93soT1TerHJGPYHxy4iy Z2FzyE+M7mcCUcbJCZb2Hnz7kWb2V5MAeDP/gBpShxFGjVUYgwZFAKdzh+zgU3upYlA4 SyA0NQaCXKgVPUO2CdieuEkcIbd9sHyGHi0FBPfbu/J2gcxTdAiyRKBtQjLDdWwxmA+C LDuQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id y3si385010ede.295.2020.07.24.04.35.38; Fri, 24 Jul 2020 04:36:01 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726493AbgGXLeU (ORCPT + 99 others); Fri, 24 Jul 2020 07:34:20 -0400 Received: from mail-ed1-f66.google.com ([209.85.208.66]:44480 "EHLO mail-ed1-f66.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726280AbgGXLeU (ORCPT ); Fri, 24 Jul 2020 07:34:20 -0400 Received: by mail-ed1-f66.google.com with SMTP id by13so6768393edb.11 for ; Fri, 24 Jul 2020 04:34:18 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=M110HA4nMMKBLYGy/DI8jTdSDD0jpS2wTCc48wWnBmY=; b=jEft8tzJS/Z8/qtfwUz1DIv5+73cMtTQtIKn2Fa3V3NVjNiCJXBs35xWBZgtLyGdV8 Tr6M8xtWETXMlndVdUwWvvv5AFzodDIa54JNhXsXcZY9JZxi9Mvw+I9FAATA4ALwSO7B wy7E3qL56Ja29TRA8db5PIjRG8cqIOYyAR3QrBftlqKRqqmgejSaBccedIQNGrEbmzDh lCiDmhnwy3lJoUxiNPdrnduUJ580mN0DBtJCeRERcYzQr1PzVlQto5RvNIa7eMTC5I4Z zzSbmdsmj3SR6MxjhqEBFv0QMGRzsHHH2++4cyJfA+0YvZ+gZ56aGtlaV7fS+MjbFW1w I55A== X-Gm-Message-State: AOAM533mxTLsNQD31skzCnF6WAiVvo7dkQaOj/PlAZISrhqfBXPjgVlr xZMIqxQz5TYz/S4vytd6ZrY= X-Received: by 2002:a05:6402:2d7:: with SMTP id b23mr8635662edx.145.1595590458024; Fri, 24 Jul 2020 04:34:18 -0700 (PDT) Received: from localhost (ip-37-188-169-187.eurotel.cz. [37.188.169.187]) by smtp.gmail.com with ESMTPSA id q21sm496212ejr.75.2020.07.24.04.34.16 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 24 Jul 2020 04:34:17 -0700 (PDT) Date: Fri, 24 Jul 2020 13:34:15 +0200 From: Michal Hocko To: Muchun Song Cc: mike.kravetz@oracle.com, akpm@linux-foundation.org, rientjes@google.com, mgorman@suse.de, walken@google.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Jianchao Guo Subject: Re: [PATCH v2] mm/hugetlb: add mempolicy check in the reservation routine Message-ID: <20200724113415.GG4061@dhcp22.suse.cz> References: <20200724100306.33457-1-songmuchun@bytedance.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20200724100306.33457-1-songmuchun@bytedance.com> Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri 24-07-20 18:03:06, Muchun Song wrote: > In the reservation routine, we only check whether the cpuset meets > the memory allocation requirements. But we ignore the mempolicy of > MPOL_BIND case. If someone mmap hugetlb succeeds, but the subsequent > memory allocation may fail due to mempolicy restrictions and receives > the SIGBUS signal. This can be reproduced by the follow steps. > > 1) Compile the test case. > cd tools/testing/selftests/vm/ > gcc map_hugetlb.c -o map_hugetlb > > 2) Pre-allocate huge pages. Suppose there are 2 numa nodes in the > system. Each node will pre-allocate one huge page. > echo 2 > /proc/sys/vm/nr_hugepages > > 3) Run test case(mmap 4MB). We receive the SIGBUS signal. > numactl --membind=0 ./map_hugetlb 4 > > With this patch applied, the mmap will fail in the step 3) and throw > "mmap: Cannot allocate memory". > > Reported-by: Jianchao Guo > Signed-off-by: Muchun Song > --- > > changelog in v2: > 1) Reuse policy_nodemask(). > > include/linux/mempolicy.h | 1 + > mm/hugetlb.c | 19 ++++++++++++++++--- > mm/mempolicy.c | 2 +- > 3 files changed, 18 insertions(+), 4 deletions(-) > > diff --git a/include/linux/mempolicy.h b/include/linux/mempolicy.h > index ea9c15b60a96..6b9640f1c990 100644 > --- a/include/linux/mempolicy.h > +++ b/include/linux/mempolicy.h > @@ -152,6 +152,7 @@ extern int huge_node(struct vm_area_struct *vma, > extern bool init_nodemask_of_mempolicy(nodemask_t *mask); > extern bool mempolicy_nodemask_intersects(struct task_struct *tsk, > const nodemask_t *mask); > +extern nodemask_t *policy_nodemask(gfp_t gfp, struct mempolicy *policy); > extern unsigned int mempolicy_slab_node(void); > > extern enum zone_type policy_zone; > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > index 589c330df4db..a753fe8591b4 100644 > --- a/mm/hugetlb.c > +++ b/mm/hugetlb.c > @@ -3463,12 +3463,25 @@ static int __init default_hugepagesz_setup(char *s) > } > __setup("default_hugepagesz=", default_hugepagesz_setup); > > -static unsigned int cpuset_mems_nr(unsigned int *array) > +static unsigned int allowed_mems_nr(struct hstate *h) > { > int node; > unsigned int nr = 0; > + struct mempolicy *mpol = get_task_policy(current); > + nodemask_t *mpol_allowed, *mems_allowed, nodemask; > + unsigned int *array = h->free_huge_pages_node; > + gfp_t gfp_mask = htlb_alloc_mask(h); > + > + mpol_allowed = policy_nodemask(gfp_mask, mpol); > + if (mpol_allowed) { > + nodes_and(nodemask, cpuset_current_mems_allowed, > + *mpol_allowed); > + mems_allowed = &nodemask; > + } else { > + mems_allowed = &cpuset_current_mems_allowed; > + } I believe you can simplify this and use a similar pattern as the page allocator. Something like for_each_node_mask(node, mpol_allowed) { if (node_isset(node, &cpuset_current_mems_allowed)) nr += array[node]; } There shouldn't be any need to allocate a potentially large nodemask on the stack. -- Michal Hocko SUSE Labs