Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754464AbcJMP0I (ORCPT ); Thu, 13 Oct 2016 11:26:08 -0400
Received: from aserp1040.oracle.com ([141.146.126.69]:17052 "EHLO aserp1040.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754237AbcJMPZ7 (ORCPT ); Thu, 13 Oct 2016 11:25:59 -0400
Subject: Re: [bug/regression] libhugetlbfs testsuite failures and OOMs eventually kill my system
To: Jan Stancek, linux-mm@kvack.org, linux-kernel@vger.kernel.org
References: <57FF7BB4.1070202@redhat.com>
Cc: hillf.zj@alibaba-inc.com, dave.hansen@linux.intel.com, kirill.shutemov@linux.intel.com, mhocko@suse.cz, n-horiguchi@ah.jp.nec.com, aneesh.kumar@linux.vnet.ibm.com, iamjoonsoo.kim@lge.com
From: Mike Kravetz
Message-ID: <277142fc-330d-76c7-1f03-a1c8ac0cf336@oracle.com>
Date: Thu, 13 Oct 2016 08:24:31 -0700
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.3.0
MIME-Version: 1.0
In-Reply-To: <57FF7BB4.1070202@redhat.com>
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-Source-IP: userv0021.oracle.com [156.151.31.71]
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 1997
Lines: 63

On 10/13/2016 05:19 AM, Jan Stancek wrote:
> Hi,
>
> I'm running into ENOMEM failures with libhugetlbfs testsuite [1] on
> a power8 lpar system running 4.8 or latest git [2]. Repeated runs of
> this suite trigger multiple OOMs, that eventually kill entire system,
> it usually takes 3-5 runs:
>
> * Total System Memory......: 18024 MB
> * Shared Mem Max Mapping...: 320 MB
> * System Huge Page Size....: 16 MB
> * Available Huge Pages.....: 20
> * Total size of Huge Pages.: 320 MB
> * Remaining System Memory..: 17704 MB
> * Huge Page User Group.....: hugepages (1001)
>
> I see this only on ppc (BE/LE), x86_64 seems unaffected and successfully
> ran the tests for ~12 hours.
>
> Bisect has identified following patch as culprit:
>   commit 67961f9db8c477026ea20ce05761bde6f8bf85b0
>   Author: Mike Kravetz
>   Date: Wed Jun 8 15:33:42 2016 -0700
>     mm/hugetlb: fix huge page reserve accounting for private mappings
>

Thanks Jan, I'll take a look.

>
> Following patch (made with my limited insight) applied to
> latest git [2] fixes the problem for me:
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index ec49d9e..7261583 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1876,7 +1876,7 @@ static long __vma_reservation_common(struct hstate *h,
>  		 * return value of this routine is the opposite of the
>  		 * value returned from reserve map manipulation routines above.
>  		 */
> -		if (ret)
> +		if (ret >= 0)
>  			return 0;
>  		else
>  			return 1;
>

Do note that this code is only executed if this condition is true:

	else if (is_vma_resv_set(vma, HPAGE_RESV_OWNER) && ret >= 0) {

So, we would always return 0. This always tells the calling code
that a reservation exists.

-- 
Mike Kravetz

> Regards,
> Jan
>
> [1] https://github.com/libhugetlbfs/libhugetlbfs
> [2] v4.8-14230-gb67be92
>
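
To make the point about the guard concrete, here is a small userspace
sketch. It is not kernel code: the names tail_original(), tail_patched(),
branch_taken() and patched_can_return_one() are made up for illustration,
and only the two if/else forms and the guard condition are taken from the
text above.

```c
#include <stdbool.h>

/*
 * Standalone model, not kernel code (all names here are hypothetical).
 * `ret` stands for the value returned by the reserve map routines and
 * `owner` stands for is_vma_resv_set(vma, HPAGE_RESV_OWNER).
 */

/* Tail of the branch before the proposed change. */
long tail_original(long ret)
{
	if (ret)
		return 0;
	else
		return 1;
}

/* Tail of the branch with the proposed change applied. */
long tail_patched(long ret)
{
	if (ret >= 0)
		return 0;
	else
		return 1;
}

/* The guard quoted above: the branch is entered only when this holds. */
bool branch_taken(bool owner, long ret)
{
	return owner && ret >= 0;
}

/*
 * Checks every ret in [0, max_ret], i.e. values for which the branch is
 * taken, and reports whether the patched tail ever returns 1.
 */
bool patched_can_return_one(long max_ret)
{
	for (long ret = 0; ret <= max_ret; ret++) {
		if (branch_taken(true, ret) && tail_patched(ret) != 0)
			return true;
	}
	return false;
}
```

In this model tail_original() still distinguishes ret == 0 from ret > 0
inside the branch, while tail_patched() returns 0 for every value that
can reach it, since the guard already excludes ret < 0.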