Date: Wed, 12 Apr 2017 16:25:32 -0500 (CDT)
From: Christoph Lameter
To: Vlastimil Babka
cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, Li Zefan, Michal Hocko, Mel Gorman, David Rientjes, Hugh Dickins, Andrea Arcangeli, Anshuman Khandual, "Kirill A. Shutemov", linux-api@vger.kernel.org
Subject: Re: [RFC 1/6] mm, page_alloc: fix more premature OOM due to race with cpuset update
References: <20170411140609.3787-1-vbabka@suse.cz> <20170411140609.3787-2-vbabka@suse.cz>

On Tue, 11 Apr 2017, Vlastimil Babka wrote:

> > The fallback was only intended for a cpuset on which boundaries are not enforced
> > in critical conditions (softwall). A hardwall cpuset (CS_MEM_HARDWALL)
> > should fail the allocation.
>
> Hmm just to clarify - I'm talking about ignoring the *mempolicy's* nodemask on
> the basis of cpuset having higher priority, while you seem to be talking about
> ignoring a (softwall) cpuset nodemask, right? man set_mempolicy says "... if
> required nodemask contains no nodes that are allowed by the process's current
> cpuset context, the memory policy reverts to local allocation" which does come
> down to ignoring mempolicy's nodemask.

I am talking of allocating outside of the current allowed nodes
(determined by mempolicy -- MPOL_BIND is the only concern as far as I can
tell -- as well as the current cpuset). One can violate the cpuset if it's not
a hardwall but the MPOL_BIND node restriction cannot be violated.

Those allocations are also not allowed if the allocation was for a user
space page even if this is a softwall cpuset.

> >> This patch fixes the issue by having __alloc_pages_slowpath() check for empty
> >> intersection of cpuset and ac->nodemask before OOM or allocation failure. If
> >> it's indeed empty, the nodemask is ignored and allocation retried, which mimics
> >> node_zonelist(). This works fine, because almost all callers of
> >
> > Well that would need to be subject to the hardwall flag. Allocation needs
> > to fail for a hardwall cpuset.
>
> They still do, if no hardwall cpuset node can satisfy the allocation with
> mempolicy ignored.

If the memory policy is MPOL_BIND then allocations outside of the given
nodes should fail. They can violate the cpuset boundaries only if they are
kernel allocations and we are not in a hardwall cpuset.

That was at least my understanding when working on this code years ago.
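
The rules stated in the reply above can be sketched as a small userspace C
model. This is only an illustration under stated assumptions, not kernel code:
the names nodemask_model_t, struct alloc_ctx_model, node_allowed_model() and
model_self_check() are hypothetical, nodes are modelled as bits in a 64-bit
word, and only the three rules from this message are encoded (MPOL_BIND is
never violated; nodes inside the cpuset are fine; nodes outside the cpuset are
usable only for kernel allocations in a softwall cpuset).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model type: bit N set means NUMA node N is in the mask. */
typedef uint64_t nodemask_model_t;

/* Hypothetical allocation context; these are not kernel structures. */
struct alloc_ctx_model {
    bool has_bind_policy;          /* task has an MPOL_BIND memory policy */
    nodemask_model_t bind_nodes;   /* nodes named by that MPOL_BIND policy */
    nodemask_model_t cpuset_nodes; /* nodes allowed by the current cpuset */
    bool cpuset_hardwall;          /* models a CS_MEM_HARDWALL cpuset */
    bool user_page;                /* allocation backs a user space page */
};

/* Return true if the model permits allocating from 'node' (0..63). */
bool node_allowed_model(const struct alloc_ctx_model *c, int node)
{
    nodemask_model_t bit = (nodemask_model_t)1 << node;

    /* Rule 1: the MPOL_BIND node restriction cannot be violated. */
    if (c->has_bind_policy && !(c->bind_nodes & bit))
        return false;

    /* Rule 2: a node inside the cpuset is always acceptable. */
    if (c->cpuset_nodes & bit)
        return true;

    /*
     * Rule 3: outside the cpuset, only kernel allocations may proceed,
     * and only when the cpuset is not a hardwall.
     */
    return !c->user_page && !c->cpuset_hardwall;
}

/* Usage example: MPOL_BIND on nodes 0-1, cpuset restricted to node 0. */
void model_self_check(void)
{
    struct alloc_ctx_model c = {
        .has_bind_policy = true,
        .bind_nodes = 0x3,
        .cpuset_nodes = 0x1,
        .cpuset_hardwall = false,
        .user_page = false,
    };

    assert(node_allowed_model(&c, 0));  /* inside both masks */
    assert(node_allowed_model(&c, 1));  /* kernel alloc, softwall cpuset */
    assert(!node_allowed_model(&c, 2)); /* outside the MPOL_BIND nodes */

    c.user_page = true;
    assert(!node_allowed_model(&c, 1)); /* user page must stay in cpuset */

    c.user_page = false;
    c.cpuset_hardwall = true;
    assert(!node_allowed_model(&c, 1)); /* hardwall forbids leaving cpuset */
}
```

In this model an empty intersection of bind_nodes and cpuset_nodes leaves a
user allocation with no permitted node at all, which is the situation the
quoted patch description is concerned with.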