From: Naoya Horiguchi
To: Mike Kravetz
CC: Hillf Danton, Andrew Morton, David Rientjes, Dave Hansen, Mel Gorman,
    Joonsoo Kim, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    Naoya Horiguchi
Subject: Re: [PATCH v1] mm: hugetlb: fix hugepage memory leak caused by wrong reserve count
Date: Tue, 24 Nov 2015 05:32:59 +0000
Message-ID: <20151124053258.GA27211@hori1.linux.bs1.fc.nec.co.jp>
References: <1448004017-23679-1-git-send-email-n-horiguchi@ah.jp.nec.com>
    <050201d12369$167a0a10$436e1e30$@alibaba-inc.com>
    <564F9702.5070007@oracle.com>
In-Reply-To: <564F9702.5070007@oracle.com>

On Fri, Nov 20, 2015 at 01:56:18PM -0800, Mike Kravetz wrote:
> On 11/19/2015 11:57 PM, Hillf Danton wrote:
> >>
> >> When dequeue_huge_page_vma() in alloc_huge_page() fails, we fall back to
> >> alloc_buddy_huge_page() to directly create a hugepage from the buddy allocator.
> >> In that case, however, if alloc_buddy_huge_page() succeeds we don't decrement
> >> h->resv_huge_pages, which means that successful hugetlb_fault() returns without
> >> releasing the reserve count. As a result, subsequent hugetlb_fault() might fail
> >> even though there are still free hugepages.
> >>
> >> This patch simply adds decrementing code on that code path.
>
> In general, I agree with the patch. If we allocate a huge page via the
> buddy allocator and that page will be used to satisfy a reservation, then
> we need to decrement the reservation count.
>
> As Hillf mentions, this code is not exactly the same in linux-next.
> Specifically, there is the new call to take the memory policy of the
> vma into account when calling the buddy allocator. I do not think
> this impacts your proposed change but you may want to test with that
> in place.
>
> >>
> >> I reproduced this problem when testing v4.3 kernel in the following situation:
> >> - the test machine/VM is a NUMA system,
> >> - hugepage overcommitting is enabled,
> >> - most of hugepages are allocated and there's only one free hugepage
> >>   which is on node 0 (for example),
> >> - another program, which calls set_mempolicy(MPOL_BIND) to bind itself to
> >>   node 1, tries to allocate a hugepage,
>
> I am curious about this scenario. When this second program attempts to
> allocate the page, I assume it creates a reservation first. Is this
> reservation before or after setting mempolicy? If the mempolicy was set
> first, I would have expected the reservation to allocate a page on
> node 1 to satisfy the reservation.

My test called set_mempolicy() first and then mmap(), but reordering the
two calls made no difference, because hugetlb reservation is currently
not NUMA-aware.
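
To make the accounting concrete, here is a minimal userspace sketch of the
scenario above. It is a toy model under simplifying assumptions, not kernel
code: struct toy_hstate, toy_reserve(), toy_fault() and toy_run_scenario()
are hypothetical names, the pool is reduced to a few counters, and the
apply_fix flag only stands in for the decrement the patch adds on the
buddy-allocator path. The reserve counter is global while free pages are
tracked per node, which is the "not NUMA-aware" point.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_NODES 2

/* Toy stand-in for the counters in struct hstate; not the kernel's layout. */
struct toy_hstate {
	long free_per_node[TOY_NR_NODES];
	long free_huge_pages;
	long resv_huge_pages;          /* global: reservations are not per node */
	long surplus_huge_pages;
	long nr_overcommit_huge_pages;
};

/* mmap()-time reservation: claims one page from the global pool. */
bool toy_reserve(struct toy_hstate *h)
{
	if (h->free_huge_pages - h->resv_huge_pages <= 0)
		return false;
	h->resv_huge_pages++;
	return true;
}

/*
 * Fault-time allocation restricted to one node (as with MPOL_BIND).
 * has_reserve: the mapping holds a reservation.
 * apply_fix:   also decrement the reserve count on the buddy fallback.
 */
bool toy_fault(struct toy_hstate *h, int node, bool has_reserve, bool apply_fix)
{
	/* Unreserved faults must not steal pages that back reservations. */
	bool may_dequeue = has_reserve ||
		h->free_huge_pages - h->resv_huge_pages > 0;

	if (may_dequeue && h->free_per_node[node] > 0) {
		h->free_per_node[node]--;
		h->free_huge_pages--;
		if (has_reserve)
			h->resv_huge_pages--;
		return true;
	}

	/* Fallback: take a surplus page straight from the buddy allocator. */
	if (h->surplus_huge_pages >= h->nr_overcommit_huge_pages)
		return false;
	h->surplus_huge_pages++;
	if (has_reserve && apply_fix)
		h->resv_huge_pages--;
	assert(h->resv_huge_pages >= 0);
	return true;
}

/* Returns whether a later unreserved fault on node 0 succeeds. */
bool toy_run_scenario(bool apply_fix)
{
	struct toy_hstate h = {
		.free_per_node = { 1, 0 },   /* the only free page is on node 0 */
		.free_huge_pages = 1,
		.nr_overcommit_huge_pages = 1,
	};

	if (!toy_reserve(&h))                       /* mmap() by the bound task */
		return false;
	if (!toy_fault(&h, 1, true, apply_fix))     /* fault bound to node 1 */
		return false;
	return toy_fault(&h, 0, false, apply_fix);  /* later fault on node 0 */
}
```

In this model toy_run_scenario(false) leaves resv_huge_pages at 1 after the
node-1 fault, so the later fault is refused although node 0 still has a free
page; toy_run_scenario(true) releases the reserve and the later fault
succeeds.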

Thanks,
Naoya Horiguchi