Date: Tue, 11 Apr 2017 12:24:25 -0500 (CDT)
From: Christoph Lameter
To: Vlastimil Babka
cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
    Li Zefan, Michal Hocko, Mel Gorman, David Rientjes, Hugh Dickins,
    Andrea Arcangeli, Anshuman Khandual, "Kirill A. Shutemov"
Subject: Re: [RFC 1/6] mm, page_alloc: fix more premature OOM due to race with cpuset update
In-Reply-To: <20170411140609.3787-2-vbabka@suse.cz>
References: <20170411140609.3787-1-vbabka@suse.cz> <20170411140609.3787-2-vbabka@suse.cz>

On Tue, 11 Apr 2017, Vlastimil Babka wrote:

> The root of the problem is that the cpuset's mems_allowed and mempolicy's
> nodemask can temporarily have no intersection, thus get_page_from_freelist()
> cannot find any usable zone.
> The current semantic for empty intersection is to
> ignore mempolicy's nodemask and honour cpuset restrictions. This is checked in
> node_zonelist(), but the racy update can happen after we already passed the

The fallback was only intended for a cpuset on which boundaries are not
enforced in critical conditions (softwall). A hardwall cpuset
(CS_MEM_HARDWALL) should fail the allocation.

> This patch fixes the issue by having __alloc_pages_slowpath() check for empty
> intersection of cpuset and ac->nodemask before OOM or allocation failure. If
> it's indeed empty, the nodemask is ignored and allocation retried, which mimics
> node_zonelist(). This works fine, because almost all callers of

Well that would need to be subject to the hardwall flag. Allocation needs
to fail for a hardwall cpuset.
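
To make the softwall/hardwall distinction above concrete, here is a minimal
userspace sketch of the decision being argued for. It is not kernel code and
not the patch under review: the type nodemask_model_t, the enum, and the
function decide_on_nodemask() are hypothetical names invented for
illustration, and a 64-bit integer stands in for a real nodemask. Only the
behaviour (ignore the mempolicy nodemask on empty intersection for a softwall
cpuset, fail for a hardwall one) is taken from the discussion.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: one bit per NUMA node, at most 64 nodes. */
typedef uint64_t nodemask_model_t;

/* Hypothetical outcomes of the empty-intersection check. */
enum alloc_decision {
	ALLOC_USE_INTERSECTION,	/* cpuset and mempolicy overlap: use the overlap */
	ALLOC_IGNORE_NODEMASK,	/* softwall: drop the mempolicy nodemask, retry */
	ALLOC_FAIL		/* hardwall: no usable node, fail the allocation */
};

/*
 * Decide which nodes an allocation may use.
 *
 * mems_allowed:    nodes permitted by the cpuset
 * policy_nodemask: nodes requested by the mempolicy
 * mem_hardwall:    models the cpuset having CS_MEM_HARDWALL set
 * effective:       out parameter, the nodes to allocate from (0 on failure)
 */
static enum alloc_decision
decide_on_nodemask(nodemask_model_t mems_allowed,
		   nodemask_model_t policy_nodemask,
		   bool mem_hardwall,
		   nodemask_model_t *effective)
{
	nodemask_model_t both = mems_allowed & policy_nodemask;

	if (both != 0) {
		/* Normal case: honour both restrictions. */
		*effective = both;
		return ALLOC_USE_INTERSECTION;
	}

	if (mem_hardwall) {
		/* Empty intersection on a hardwall cpuset: do not fall back. */
		*effective = 0;
		return ALLOC_FAIL;
	}

	/* Empty intersection on a softwall cpuset: cpuset nodes only. */
	*effective = mems_allowed;
	return ALLOC_IGNORE_NODEMASK;
}
```

With mems_allowed = 0x3 and policy_nodemask = 0xC the intersection is empty,
so the sketch returns ALLOC_IGNORE_NODEMASK with effective = 0x3 when
mem_hardwall is false, and ALLOC_FAIL when it is true.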