Received: by 2002:a25:ef43:0:0:0:0:0 with SMTP id w3csp1046767ybm; Wed, 27 May 2020 14:49:43 -0700 (PDT) X-Google-Smtp-Source: ABdhPJx3ZosFSK1wIwwPh3Kf8o+kKbN5MDgmdJELWLzERd2GSmPSv7THOIt+HP3ymRYRIvV7EcAm X-Received: by 2002:a05:6402:14d5:: with SMTP id f21mr149693edx.327.1590616183034; Wed, 27 May 2020 14:49:43 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1590616183; cv=none; d=google.com; s=arc-20160816; b=x3uJBAsQtHm3VfSolkiQIeu30yzupWwdHk+Tk1Ogy++2PsWCtlPXryxrUTRGm1jMlD R2ryqqcFohTJMdDDUFtfcW7YoB79VBSJiPWj9KMWmur/MAy38VPWbSyoyAIbVEAG+mEv iy0igfoa+NgCq8URkWOWVkijRkVphVYYLi4mvThtrGQed9Mq6ZdoajACXkkRo0OrTAuP 0qpK+IfINKy5q0dHeIaAkbU2DQKEYaV6SGpSDcRTUhrcHZl/jmWIK1A+DDfxoG5xw9EJ 4NIjkpTA30Y+wmGRmsZHCMJQURtOG69Fd/DVYlgTRcIjxEXuacegzjQVhM0964Bz3Dam bKMA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:in-reply-to:content-disposition :mime-version:references:message-id:subject:cc:to:from:date :dkim-signature; bh=WMJebq3Hh+s5Z0QcM5uJG07mcNLMXzjCS2MDwWHQC0Y=; b=zYcox5sLztMEQfjx3SmXMvinT6f9WngMdWhQCtXl6iSNppP1gpPHyLYBddjKpv87AL CndA/EqWheWBU6VCXXP3n8/ODBLZB1fXQfgLnXp+5DxbyG+n0mcZkN96njDstybPf5f7 mluFYmGRaKf/behZtBAghDhTE+pQhIlBfb7lECtOrgxvh8Vjlsr00hsPTeWDTLXSP0X7 /keZYjPHR6+gsUSm86lj172ByV/Z+V7yzI+UEx0Tl63PtEioV4vixYZPogAhGfcXnDHV +77OnbZy68Ub+zx5ovGmM6r635RhLJfOF9//PrgBoE8rEby/NSHAEyeIUOo8j+7gZ63L vQzQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@lca.pw header.s=google header.b=p0ZiIJkQ; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id q8si2456029edn.307.2020.05.27.14.49.18; Wed, 27 May 2020 14:49:43 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@lca.pw header.s=google header.b=p0ZiIJkQ; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2387724AbgE0Pm4 (ORCPT + 98 others); Wed, 27 May 2020 11:42:56 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55896 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726807AbgE0Pm4 (ORCPT ); Wed, 27 May 2020 11:42:56 -0400 Received: from mail-qt1-x844.google.com (mail-qt1-x844.google.com [IPv6:2607:f8b0:4864:20::844]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DF37CC05BD1E for ; Wed, 27 May 2020 08:42:55 -0700 (PDT) Received: by mail-qt1-x844.google.com with SMTP id k22so4736827qtm.6 for ; Wed, 27 May 2020 08:42:55 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=lca.pw; s=google; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=WMJebq3Hh+s5Z0QcM5uJG07mcNLMXzjCS2MDwWHQC0Y=; b=p0ZiIJkQ9li1YIAfuDCce5BfJ7GfHKeSACkEUvfXFMK5lw5KQYdKs3u5LHtlJrbr+R lZIs3ubmr7kwcRRGqBrpQjZ0RvErNYIygHlSSHiQEcE1bhAE/KF5S6BE4yxjIB6NFYQH AnaN7FgUQNFGig5A1wglkSBCxlvNrH5aWbaacEjXmUL+lc+LFaxvMi2YqVf+HsOuBMbw upN+UL7iFcamlIXpTXdE2gR+0q5a1MPYR4mACNqDpnGMYaQz7FckZ3W/zh9cAuRsBViV 257hyauo2qrviPoZL6BYKvqdmc/2yq+tu63Jgo7TAFyFB4oqtehbMIOMcODcMyMz8lJD As5g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=WMJebq3Hh+s5Z0QcM5uJG07mcNLMXzjCS2MDwWHQC0Y=; b=ckf6V3ItNBWu6Yr32O+AM6B+ogddpGZnlBFsWtWJyYIzZ9l0nJzTdB3h57eN/89WEt 73EPXcK20cs3EwvPkrffhe7qlF7PUOYXJCuftMvgGOahjLM/IICH3uCNun7i3vfZ96fG z1Trbj0Tjzwt0VktWFiY3NTRj/lFle9Q3LhmHvI9EEifoafBRwJ5mWGmR69TcgmqrKv9 OP0UDV8aLbvsholbzjz4VOGsJUopfVXAU5AR9HZ0cGoIEkGspTH1V5bZ0ITPmG1vVzA6 0BtuYOH99GuqOXOd+MqNzS6DUO20ttNrazvE2ZB/hXH/46Eyj/eDKq+w7PvcR0bPO0ki loIQ== X-Gm-Message-State: AOAM5310Kgbfe/21aWSImGbP+oMgHJcuEi0rq0TIz4LaCz3osF/pd2jj 5sAmmkkDmfMozjrFf/cowzByzw== X-Received: by 2002:ac8:82b:: with SMTP id u40mr4830976qth.369.1590594174986; Wed, 27 May 2020 08:42:54 -0700 (PDT) Received: from lca.pw (pool-71-184-117-43.bstnma.fios.verizon.net. [71.184.117.43]) by smtp.gmail.com with ESMTPSA id v1sm2496825qkb.19.2020.05.27.08.42.52 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 27 May 2020 08:42:53 -0700 (PDT) Date: Wed, 27 May 2020 11:42:50 -0400 From: Qian Cai To: Feng Tang Cc: Andrew Morton , Michal Hocko , Johannes Weiner , Stephen Rothwell , Matthew Wilcox , Mel Gorman , Kees Cook , Luis Chamberlain , Iurii Zaikin , andi.kleen@intel.com, tim.c.chen@intel.com, dave.hansen@intel.com, ying.huang@intel.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH 0/3] make vm_committed_as_batch aware of vm overcommit policy Message-ID: <20200527154250.GB741@lca.pw> References: <1588922717-63697-1-git-send-email-feng.tang@intel.com> <20200521212726.GC6367@ovpn-112-192.phx2.redhat.com> <20200526181459.GD991@lca.pw> <20200527014647.GB93879@shbuild999.sh.intel.com> <20200527022539.GK991@lca.pw> <20200527104606.GE93879@shbuild999.sh.intel.com> <20200527120549.GA741@lca.pw> <20200527133332.GA20232@shbuild999.sh.intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20200527133332.GA20232@shbuild999.sh.intel.com> Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, May 27, 2020 at 09:33:32PM +0800, Feng Tang wrote: > Hi Qian, > > On Wed, May 27, 2020 at 08:05:49AM -0400, Qian Cai wrote: > > On Wed, May 27, 2020 at 06:46:06PM +0800, Feng Tang wrote: > > > Hi Qian, > > > > > > On Tue, May 26, 2020 at 10:25:39PM -0400, Qian Cai wrote: > > > > > > > > [1] https://lkml.org/lkml/2020/3/5/57 > > > > > > > > > > > > > > Reverted this series fixed a warning under memory pressue. > > > > > > > > > > > > Andrew, Stephen, can you drop this series? > > > > > > > > > > > > > > > > > > > > [ 3319.257898] LTP: starting oom01 > > > > > > > [ 3319.284417] ------------[ cut here ]------------ > > > > > > > [ 3319.284439] memory commitment underflow > > > > > > > > > > Thanks for the catch! > > > > > > > > > > Could you share the info about the platform, like the CPU numbers > > > > > and RAM size, and what's the mmap test size of your test program. > > > > > It would be great if you can point me the link to the test program. > > > > > > > > I have been reproduced this on both AMD and Intel. The test just > > > > allocating memory and swapping. > > > > > > > > https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/mem/oom/oom01.c > > > > https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/mem/tunable/overcommit_memory.c > > > > > > > > It might be better to run the whole LTP mm tests if none of the above > > > > triggers it for you which has quite a few memory pressurers. > > > > > > > > /opt/ltp/runltp -f mm > > > > > > Thanks for sharing. I tried to reproduce this on 2 server plaforms, > > > but can't reproduce it, and they are still under testing. > > > > > > Meanwhile, could you help to try the below patch, which is based on > > > Andi's suggestion and have some debug info. The warning is a little > > > strange, as the condition is > > > > > > (percpu_counter_read(&vm_committed_as) < > > > -(s64)vm_committed_as_batch * num_online_cpus()) > > > > > > while for your platform (48 CPU + 128 GB RAM), the > > > '-(s64)vm_committed_as_batch * num_online_cpus()' > > > is a s64 value: '-32G', which makes the condition hard to be true, > > > and when it is, it could be triggered by some magic for s32/s64 > > > operations around the percpu-counter. > > > > Here is the information on AMD and powerpc below affected by this. It > > could need a bit patient to reproduce, but our usual daily CI would > > trigger it eventually after a few tries. > > > > # git clone https://github.com/cailca/linux-mm.git > > # cd linux-mm > > # ./compile.sh > > # systemctl reboot > > # ./test.sh > > I just downloaded it, and it failed on my desktop machine as it failed > in 'yum' and 'grub2' setup. The difficulty for me to reproduce is the > test platforms are behind the 0day framework, and I can hardly setup > external test suits, though I have been trying for all day today :) I tried your debug patch and it did not even compile on linux-next (where the issue was happened) and I am running out of time today. It probably need to reproduce on large systems as it did not happen on one of our small s390 system here.