From: Jan Stancek
Subject: [bug/regression] libhugetlbfs testsuite failures and OOMs eventually kill my system
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: mike.kravetz@oracle.com, hillf.zj@alibaba-inc.com, dave.hansen@linux.intel.com, kirill.shutemov@linux.intel.com, mhocko@suse.cz, n-horiguchi@ah.jp.nec.com, aneesh.kumar@linux.vnet.ibm.com, iamjoonsoo.kim@lge.com
Message-ID: <57FF7BB4.1070202@redhat.com>
Date: Thu, 13 Oct 2016 14:19:00 +0200

Hi,

I'm running into ENOMEM failures with the libhugetlbfs testsuite [1] on
a power8 lpar system running 4.8 or latest git [2]. Repeated runs of
this suite trigger multiple OOMs that eventually kill the entire system;
it usually takes 3-5 runs:

 * Total System Memory......:  18024 MB
 * Shared Mem Max Mapping...:    320 MB
 * System Huge Page Size....:     16 MB
 * Available Huge Pages.....:     20
 * Total size of Huge Pages.:    320 MB
 * Remaining System Memory..:  17704 MB
 * Huge Page User Group.....:  hugepages (1001)

I see this only on ppc (BE/LE); x86_64 seems unaffected and successfully
ran the tests for ~12 hours.

Bisect has identified the following patch as the culprit:

commit 67961f9db8c477026ea20ce05761bde6f8bf85b0
Author: Mike Kravetz
Date:   Wed Jun 8 15:33:42 2016 -0700

    mm/hugetlb: fix huge page reserve accounting for private mappings

The following patch (made with my limited insight), applied to latest
git [2], fixes the problem for me:

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index ec49d9e..7261583 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1876,7 +1876,7 @@ static long __vma_reservation_common(struct hstate *h,
 		 * return value of this routine is the opposite of the
 		 * value returned from reserve map manipulation routines above.
 		 */
-		if (ret)
+		if (ret >= 0)
 			return 0;
 		else
 			return 1;

Regards,
Jan

[1] https://github.com/libhugetlbfs/libhugetlbfs
[2] v4.8-14230-gb67be92