Subject: Re: [bug/regression] libhugetlbfs testsuite failures and OOMs eventually kill my system
To: Mike Kravetz, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: hillf.zj@alibaba-inc.com, dave.hansen@linux.intel.com, kirill.shutemov@linux.intel.com, mhocko@suse.cz, n-horiguchi@ah.jp.nec.com, aneesh.kumar@linux.vnet.ibm.com, iamjoonsoo.kim@lge.com
From: Jan Stancek
Date: Fri, 14 Oct 2016 10:48:34 +0200
Message-ID: <58009BE2.5010805@redhat.com>
References: <57FF7BB4.1070202@redhat.com> <277142fc-330d-76c7-1f03-a1c8ac0cf336@oracle.com>

On 10/14/2016 01:26 AM, Mike Kravetz wrote:
>
> Hi Jan,
>
> Any chance you can get the contents of /sys/kernel/mm/hugepages
> before and after the first run of libhugetlbfs testsuite on Power?
> Perhaps a script like:
>
> cd /sys/kernel/mm/hugepages
> for f in hugepages-*/*; do
>         n=`cat $f`;
>         echo -e "$n\t$f";
> done
>
> Just want to make sure the numbers look as they should.
>

Hi Mike,

Numbers are below. I have also isolated a single testcase from the
"func" group of tests: corrupt-by-cow-opt [1]. This test stops working
if I run it 19 times (with 20 hugepages). If I disable this test, the
"func" group tests all pass repeatedly.

[1] https://github.com/libhugetlbfs/libhugetlbfs/blob/master/tests/corrupt-by-cow-opt.c

Regards,
Jan

Kernel is v4.8-14230-gb67be92, with a reboot between each run.

1) Only func tests
System boot
After setup:
20 hugepages-16384kB/free_hugepages
20 hugepages-16384kB/nr_hugepages
20 hugepages-16384kB/nr_hugepages_mempolicy
0 hugepages-16384kB/nr_overcommit_hugepages
0 hugepages-16384kB/resv_hugepages
0 hugepages-16384kB/surplus_hugepages
0 hugepages-16777216kB/free_hugepages
0 hugepages-16777216kB/nr_hugepages
0 hugepages-16777216kB/nr_hugepages_mempolicy
0 hugepages-16777216kB/nr_overcommit_hugepages
0 hugepages-16777216kB/resv_hugepages
0 hugepages-16777216kB/surplus_hugepages

After func tests:
********** TEST SUMMARY
*                      16M
*                      32-bit 64-bit
*     Total testcases:     0     85
*             Skipped:     0      0
*                PASS:     0     81
*                FAIL:     0      4
*    Killed by signal:     0      0
*   Bad configuration:     0      0
*       Expected FAIL:     0      0
*     Unexpected PASS:     0      0
* Strange test result:     0      0

26 hugepages-16384kB/free_hugepages
26 hugepages-16384kB/nr_hugepages
26 hugepages-16384kB/nr_hugepages_mempolicy
0 hugepages-16384kB/nr_overcommit_hugepages
1 hugepages-16384kB/resv_hugepages
0 hugepages-16384kB/surplus_hugepages
0 hugepages-16777216kB/free_hugepages
0 hugepages-16777216kB/nr_hugepages
0 hugepages-16777216kB/nr_hugepages_mempolicy
0 hugepages-16777216kB/nr_overcommit_hugepages
0 hugepages-16777216kB/resv_hugepages
0 hugepages-16777216kB/surplus_hugepages

After test cleanup:
umount -a -t hugetlbfs
hugeadm --pool-pages-max ${HPSIZE}:0

1 hugepages-16384kB/free_hugepages
1 hugepages-16384kB/nr_hugepages
1 hugepages-16384kB/nr_hugepages_mempolicy
0 hugepages-16384kB/nr_overcommit_hugepages
1 hugepages-16384kB/resv_hugepages
1 hugepages-16384kB/surplus_hugepages
0 hugepages-16777216kB/free_hugepages
0 hugepages-16777216kB/nr_hugepages
0 hugepages-16777216kB/nr_hugepages_mempolicy
0 hugepages-16777216kB/nr_overcommit_hugepages
0 hugepages-16777216kB/resv_hugepages
0 hugepages-16777216kB/surplus_hugepages

---

2) Only stress tests
System boot
After setup:
20 hugepages-16384kB/free_hugepages
20 hugepages-16384kB/nr_hugepages
20 hugepages-16384kB/nr_hugepages_mempolicy
0 hugepages-16384kB/nr_overcommit_hugepages
0 hugepages-16384kB/resv_hugepages
0 hugepages-16384kB/surplus_hugepages
0 hugepages-16777216kB/free_hugepages
0 hugepages-16777216kB/nr_hugepages
0 hugepages-16777216kB/nr_hugepages_mempolicy
0 hugepages-16777216kB/nr_overcommit_hugepages
0 hugepages-16777216kB/resv_hugepages
0 hugepages-16777216kB/surplus_hugepages

After stress tests:
20 hugepages-16384kB/free_hugepages
20 hugepages-16384kB/nr_hugepages
20 hugepages-16384kB/nr_hugepages_mempolicy
0 hugepages-16384kB/nr_overcommit_hugepages
17 hugepages-16384kB/resv_hugepages
0 hugepages-16384kB/surplus_hugepages
0 hugepages-16777216kB/free_hugepages
0 hugepages-16777216kB/nr_hugepages
0 hugepages-16777216kB/nr_hugepages_mempolicy
0 hugepages-16777216kB/nr_overcommit_hugepages
0 hugepages-16777216kB/resv_hugepages
0 hugepages-16777216kB/surplus_hugepages

After cleanup:
17 hugepages-16384kB/free_hugepages
17 hugepages-16384kB/nr_hugepages
17 hugepages-16384kB/nr_hugepages_mempolicy
0 hugepages-16384kB/nr_overcommit_hugepages
17 hugepages-16384kB/resv_hugepages
17 hugepages-16384kB/surplus_hugepages
0 hugepages-16777216kB/free_hugepages
0 hugepages-16777216kB/nr_hugepages
0 hugepages-16777216kB/nr_hugepages_mempolicy
0 hugepages-16777216kB/nr_overcommit_hugepages
0 hugepages-16777216kB/resv_hugepages
0 hugepages-16777216kB/surplus_hugepages

---

3) only corrupt-by-cow-opt
System boot
After setup:
20 hugepages-16384kB/free_hugepages
20 hugepages-16384kB/nr_hugepages
20 hugepages-16384kB/nr_hugepages_mempolicy
0 hugepages-16384kB/nr_overcommit_hugepages
0 hugepages-16384kB/resv_hugepages
0 hugepages-16384kB/surplus_hugepages
0 hugepages-16777216kB/free_hugepages
0 hugepages-16777216kB/nr_hugepages
0 hugepages-16777216kB/nr_hugepages_mempolicy
0 hugepages-16777216kB/nr_overcommit_hugepages
0 hugepages-16777216kB/resv_hugepages
0 hugepages-16777216kB/surplus_hugepages

libhugetlbfs-2.18# env LD_LIBRARY_PATH=./obj64 ./tests/obj64/corrupt-by-cow-opt; /root/grab.sh
Starting testcase "./tests/obj64/corrupt-by-cow-opt", pid 3298
Write s to 0x3effff000000 via shared mapping
Write p to 0x3effff000000 via private mapping
Read s from 0x3effff000000 via shared mapping
PASS
20 hugepages-16384kB/free_hugepages
20 hugepages-16384kB/nr_hugepages
20 hugepages-16384kB/nr_hugepages_mempolicy
0 hugepages-16384kB/nr_overcommit_hugepages
1 hugepages-16384kB/resv_hugepages
0 hugepages-16384kB/surplus_hugepages
0 hugepages-16777216kB/free_hugepages
0 hugepages-16777216kB/nr_hugepages
0 hugepages-16777216kB/nr_hugepages_mempolicy
0 hugepages-16777216kB/nr_overcommit_hugepages
0 hugepages-16777216kB/resv_hugepages
0 hugepages-16777216kB/surplus_hugepages

# env LD_LIBRARY_PATH=./obj64 ./tests/obj64/corrupt-by-cow-opt; /root/grab.sh
Starting testcase "./tests/obj64/corrupt-by-cow-opt", pid 3312
Write s to 0x3effff000000 via shared mapping
Write p to 0x3effff000000 via private mapping
Read s from 0x3effff000000 via shared mapping
PASS
20 hugepages-16384kB/free_hugepages
20 hugepages-16384kB/nr_hugepages
20 hugepages-16384kB/nr_hugepages_mempolicy
0 hugepages-16384kB/nr_overcommit_hugepages
2 hugepages-16384kB/resv_hugepages
0 hugepages-16384kB/surplus_hugepages
0 hugepages-16777216kB/free_hugepages
0 hugepages-16777216kB/nr_hugepages
0 hugepages-16777216kB/nr_hugepages_mempolicy
0 hugepages-16777216kB/nr_overcommit_hugepages
0 hugepages-16777216kB/resv_hugepages
0 hugepages-16777216kB/surplus_hugepages

(... output cut from ~17 iterations ...)

# env LD_LIBRARY_PATH=./obj64 ./tests/obj64/corrupt-by-cow-opt; /root/grab.sh
Starting testcase "./tests/obj64/corrupt-by-cow-opt", pid 3686
Write s to 0x3effff000000 via shared mapping
Bus error
20 hugepages-16384kB/free_hugepages
20 hugepages-16384kB/nr_hugepages
20 hugepages-16384kB/nr_hugepages_mempolicy
0 hugepages-16384kB/nr_overcommit_hugepages
19 hugepages-16384kB/resv_hugepages
0 hugepages-16384kB/surplus_hugepages
0 hugepages-16777216kB/free_hugepages
0 hugepages-16777216kB/nr_hugepages
0 hugepages-16777216kB/nr_hugepages_mempolicy
0 hugepages-16777216kB/nr_overcommit_hugepages
0 hugepages-16777216kB/resv_hugepages
0 hugepages-16777216kB/surplus_hugepages

# env LD_LIBRARY_PATH=./obj64 ./tests/obj64/corrupt-by-cow-opt; /root/grab.sh
Starting testcase "./tests/obj64/corrupt-by-cow-opt", pid 3700
Write s to 0x3effff000000 via shared mapping
FAIL mmap() 2: Cannot allocate memory
20 hugepages-16384kB/free_hugepages
20 hugepages-16384kB/nr_hugepages
20 hugepages-16384kB/nr_hugepages_mempolicy
0 hugepages-16384kB/nr_overcommit_hugepages
19 hugepages-16384kB/resv_hugepages
0 hugepages-16384kB/surplus_hugepages
0 hugepages-16777216kB/free_hugepages
0 hugepages-16777216kB/nr_hugepages
0 hugepages-16777216kB/nr_hugepages_mempolicy
0 hugepages-16777216kB/nr_overcommit_hugepages
0 hugepages-16777216kB/resv_hugepages
0 hugepages-16777216kB/surplus_hugepages
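
For repeating run 3) without typing the command by hand, the loop could
be sketched roughly as below. This is only an illustration under
assumptions: /root/grab.sh is not shown in this mail, so snapshot_counters
is a guess at an equivalent based on the quoted script, and the names
HUGEPAGES_DIR, snapshot_counters and repeat_until_failure are made up
for the example. It assumes the hugepage pool is already set up.

```shell
#!/bin/sh
# Sketch only. HUGEPAGES_DIR, snapshot_counters and repeat_until_failure
# are hypothetical names, not part of libhugetlbfs or the kernel.

HUGEPAGES_DIR=${HUGEPAGES_DIR:-/sys/kernel/mm/hugepages}

# Print "<value> <relative path>" for every counter file under the
# given directory (default: $HUGEPAGES_DIR), one per line.
snapshot_counters() {
    (
        cd "${1:-$HUGEPAGES_DIR}" || exit 1
        for f in hugepages-*/*; do
            [ -f "$f" ] || continue
            printf '%s %s\n' "$(cat "$f")" "$f"
        done
    )
}

# Usage: repeat_until_failure DIR POOL MAX COMMAND [ARGS...]
# Runs COMMAND up to MAX times. After each run it prints the exit status
# and the current resv_hugepages value of DIR/POOL. It stops at the first
# nonzero exit status and returns that status.
repeat_until_failure() {
    dir=$1
    pool=$2
    max=$3
    shift 3
    i=1
    while [ "$i" -le "$max" ]; do
        status=0
        "$@" || status=$?
        printf 'run %s: exit=%s resv_hugepages=%s\n' \
            "$i" "$status" "$(cat "$dir/$pool/resv_hugepages")"
        [ "$status" -eq 0 ] || return "$status"
        i=$((i + 1))
    done
    return 0
}
```

With the functions loaded into a shell in the libhugetlbfs-2.18
directory, something like the following would rerun the testcase and
show whether resv_hugepages keeps growing by one per run:

    repeat_until_failure /sys/kernel/mm/hugepages hugepages-16384kB 25 \
        env LD_LIBRARY_PATH=./obj64 ./tests/obj64/corrupt-by-cow-opt
    snapshot_counters

The limit of 25 is arbitrary; the loop ends earlier on the first
failing run.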