Subject: Re: [RFC 1/6] mm, page_alloc: fix more premature OOM due to race with cpuset update
From: Vlastimil Babka
To: Christoph Lameter
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
    Li Zefan, Michal Hocko, Mel Gorman, David Rientjes, Hugh Dickins,
    Andrea Arcangeli, Anshuman Khandual, "Kirill A. Shutemov",
    linux-api@vger.kernel.org
Date: Tue, 11 Apr 2017 21:00:21 +0200
References: <20170411140609.3787-1-vbabka@suse.cz> <20170411140609.3787-2-vbabka@suse.cz>

+CC linux-api

On 11.4.2017 19:24, Christoph Lameter wrote:
> On Tue, 11 Apr 2017, Vlastimil Babka wrote:
>
>> The root of the problem is that the cpuset's mems_allowed and mempolicy's
>> nodemask can temporarily have no intersection, thus get_page_from_freelist()
>> cannot find any usable zone. The current semantic for empty intersection is to
>> ignore mempolicy's nodemask and honour cpuset restrictions. This is checked in
>> node_zonelist(), but the racy update can happen after we already passed the
>
> The fallback was only intended for a cpuset on which boundaries are not enforced
> in critical conditions (softwall). A hardwall cpuset (CS_MEM_HARDWALL)
> should fail the allocation.

Hmm, just to clarify - I'm talking about ignoring the *mempolicy's* nodemask on
the basis of cpuset having higher priority, while you seem to be talking about
ignoring a (softwall) cpuset nodemask, right? man set_mempolicy says "... if
required nodemask contains no nodes that are allowed by the process's current
cpuset context, the memory policy reverts to local allocation" which does come
down to ignoring mempolicy's nodemask.

>> This patch fixes the issue by having __alloc_pages_slowpath() check for empty
>> intersection of cpuset and ac->nodemask before OOM or allocation failure. If
>> it's indeed empty, the nodemask is ignored and allocation retried, which mimics
>> node_zonelist(). This works fine, because almost all callers of
>
> Well that would need to be subject to the hardwall flag. Allocation needs
> to fail for a hardwall cpuset.

They still do, if no hardwall cpuset node can satisfy the allocation with
mempolicy ignored.
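
The semantic discussed above can be sketched in userspace C. This is only an illustrative model, not the kernel code: nodemasks are plain 64-bit bitmaps, and the names nodemask_model_t, effective_nodemask() and can_allocate() are hypothetical. It shows that ignoring the mempolicy's nodemask on an empty intersection never lets the allocation leave the cpuset.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: bit n set means NUMA node n is in the mask. */
typedef uint64_t nodemask_model_t;

/*
 * Nodes an allocation may use in this model.
 * If the cpuset and the mempolicy share at least one node, only the
 * shared nodes are eligible. If the intersection is empty, the
 * mempolicy's nodemask is ignored and the cpuset's mask is used alone.
 */
static inline nodemask_model_t
effective_nodemask(nodemask_model_t cpuset_mems_allowed,
                   nodemask_model_t mempolicy_nodes)
{
    nodemask_model_t both = cpuset_mems_allowed & mempolicy_nodes;

    return both ? both : cpuset_mems_allowed;
}

/*
 * The allocation succeeds only if some eligible node has free memory.
 * The result of effective_nodemask() is always a subset of the cpuset,
 * so an allocation still fails when no cpuset node can satisfy it.
 */
static inline bool
can_allocate(nodemask_model_t cpuset_mems_allowed,
             nodemask_model_t mempolicy_nodes,
             nodemask_model_t nodes_with_free_memory)
{
    nodemask_model_t eligible =
        effective_nodemask(cpuset_mems_allowed, mempolicy_nodes);

    return (eligible & nodes_with_free_memory) != 0;
}
```

For example, with a cpuset of nodes {0,1} and a mempolicy of nodes {2,3}, the intersection is empty, so the model falls back to nodes {0,1}. If only nodes {2,3} have free memory, can_allocate() still returns false, which corresponds to the hardwall case failing.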