Date: Thu, 9 Jul 2020 12:55:54 +0800
From: Feng Tang
To: "Huang, Ying", Andi Kleen, Qian Cai, Andrew Morton, Michal Hocko
Cc: Dennis Zhou,
    Tejun Heo, Christoph Lameter, kernel test robot, Johannes Weiner,
    Matthew Wilcox, Mel Gorman, Kees Cook, Luis Chamberlain, Iurii Zaikin,
    tim.c.chen@intel.com, dave.hansen@intel.com, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org, lkp@lists.01.org
Subject: Re: [mm] 4e2c82a409: ltp.overcommit_memory01.fail
Message-ID: <20200709045554.GA56190@shbuild999.sh.intel.com>
In-Reply-To: <20200707054120.GC21741@shbuild999.sh.intel.com>

On Tue, Jul 07, 2020 at 01:41:20PM +0800, Feng Tang wrote:
> On Tue, Jul 07, 2020 at 12:00:09PM +0800, Huang, Ying wrote:
> > Feng Tang writes:
> >
> > > On Mon, Jul 06, 2020 at 06:34:34AM -0700, Andi Kleen wrote:
> > >> >  	ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
> > >> > -	if (ret == 0 && write)
> > >> > +	if (ret == 0 && write) {
> > >> > +		if (sysctl_overcommit_memory == OVERCOMMIT_NEVER)
> > >> > +			schedule_on_each_cpu(sync_overcommit_as);
> > >>
> > >> The schedule_on_each_cpu is not atomic, so the problem could still
> > >> happen in that window.
> > >>
> > >> I think it may be ok if it eventually resolves, but it certainly needs
> > >> a comment explaining it. Can you do some stress testing, toggling the
> > >> policy all the time on different CPUs while running the test on
> > >> other CPUs, and see if the test fails?
> > >
> > > For the raw test case reported by 0day, this patch passed 200 runs.
> > > I will read the ltp code and try stress testing it as you suggested.
> > >
> > >> The other alternative would be to define some intermediate state for
> > >> the sysctl variable and only switch to never once schedule_on_each_cpu
> > >> has returned. But that's more complexity.
> > >
> > > One thought I had is to put this schedule_on_each_cpu() before
> > > proc_dointvec_minmax(), to do the sync before sysctl_overcommit_memory
> > > is actually changed. But the window still exists, as the batch is
> > > still the larger one.
> >
> > Can we change the batch first, then sync the global counter, and
> > finally change the overcommit policy?
>
> These reorderings are really head-scratching :)
>
> I thought about this before, when Qian Cai first reported the warning
> message, as the kernel had this check:
>
> 	VM_WARN_ONCE(percpu_counter_read(&vm_committed_as) <
> 		     -(s64)vm_committed_as_batch * num_online_cpus(),
> 		     "memory commitment underflow");
>
> If the batch is decreased first, the warning would be easier/earlier to
> trigger, so I didn't bring this up while handling the warning message.
>
> But it might work now, as the warning has been removed.

I tested the reordered approach, and the test passed 100 runs.

The new order when changing the policy to OVERCOMMIT_NEVER is:
  1. re-compute the batch (to the smaller one)
  2. do the on_each_cpu sync
  3. actually change the policy to NEVER
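For context, the "sync" step folds each CPU's local delta of
vm_committed_as back into the global count, so the global counter is
accurate before the smaller batch takes effect. Roughly, it amounts to
the following -- a simplified sketch of the helpers, not the exact diff:

	/* worker run on each CPU via schedule_on_each_cpu() */
	static void sync_overcommit_as(struct work_struct *dummy)
	{
		percpu_counter_sync(&vm_committed_as);
	}

	/* fold this CPU's local delta into the global count, under fbc->lock */
	void percpu_counter_sync(struct percpu_counter *fbc)
	{
		unsigned long flags;
		s64 count;

		raw_spin_lock_irqsave(&fbc->lock, flags);
		count = __this_cpu_read(*fbc->counters);
		fbc->count += count;
		__this_cpu_sub(*fbc->counters, count);
		raw_spin_unlock_irqrestore(&fbc->lock, flags);
	}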
This ordering addresses one of the earlier concerns: after the sync has
finished on cpuX, but before the sync on all CPUs has completed, there
was a window in which the percpu counter could be enlarged again.

IIRC, Andi had a concern about the read-side cost while the sync is
running. My understanding is that most readers (malloc/free/map/unmap)
use percpu_counter_read_positive(), which is a fast path that takes no
lock.

As for the problem itself, I agree with Michal's point that there is
usually no sane use case that changes the overcommit policy frequently.

The logic is mainly in overcommit_policy_handler(), built on top of the
previous sync fix. Please help review it, thanks!

int overcommit_policy_handler(struct ctl_table *table, int write, void *buffer,
		size_t *lenp, loff_t *ppos)
{
	int ret;

	if (write) {
		int new_policy;
		struct ctl_table t;

		t = *table;
		t.data = &new_policy;
		ret = proc_dointvec_minmax(&t, write, buffer, lenp, ppos);
		if (ret)
			return ret;

		mm_compute_batch(new_policy);
		if (new_policy == OVERCOMMIT_NEVER)
			schedule_on_each_cpu(sync_overcommit_as);
		sysctl_overcommit_memory = new_policy;
	} else {
		ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
	}

	return ret;
}

- Feng
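PS: on the read-side cost mentioned above, percpu_counter_read_positive()
takes no lock at all. Roughly (a simplified sketch of the helper in
include/linux/percpu_counter.h), it is:

	static inline s64 percpu_counter_read_positive(struct percpu_counter *fbc)
	{
		/* read the shared count once; lockless, may be slightly stale */
		s64 ret = READ_ONCE(fbc->count);

		if (ret >= 0)
			return ret;
		return 0;	/* clamp a transient negative value */
	}

A reader may see a value that is stale by up to the unfolded per-CPU
deltas, which is exactly the accuracy/overhead tradeoff the batch size
controls.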