From: "Aneesh Kumar K.V"
To: Vlastimil Babka, Michal Hocko, Andrew Morton
Cc: Mel Gorman, David Rientjes, Anshuman Khandual, linux-mm@kvack.org, LKML, Michal Hocko
Subject: Re: [PATCH] mm, mempolicy: clean up __GFP_THISNODE confusion in policy_zonelist
References: <20161013125958.32155-1-mhocko@kernel.org> <877f92ue91.fsf@linux.vnet.ibm.com>
Date: Fri, 21 Oct 2016 17:55:20 +0530
Message-Id: <874m45vqhb.fsf@linux.vnet.ibm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Vlastimil Babka writes:

> On 10/21/2016 01:34 PM, Aneesh Kumar K.V wrote:
>> Michal Hocko writes:
>>>
>>
>> For both MPOL_PREFERRED and MPOL_INTERLEAVE we pick the zonelist of a
>> node other than the currently running node. Why don't we do that for
>> MPOL_BIND? I.e., if the current node is not part of the policy node mask,
>> why are we not picking the first node from the policy node mask for
>> MPOL_BIND?
>
> For MPOL_PREFERRED and MPOL_INTERLEAVE we have an explicit node preference,
> so it makes sense that the nodes in the zonelist we pick are ordered by the
> distance from that node, regardless of the current node.
>
> For MPOL_BIND, we don't have preferences but restrictions. If the current CPU is
> on a node within the restriction, then great. If it's not, finding a node
> according to distance from the current CPU is probably less arbitrary than by
> distance from the node that happens to have the lowest id in the node mask?

I agree. This is related to the changes we are working on in this part of
the kernel. We are looking at adding support for coherent device memory.
By default we don't want to allocate memory from the coherent device node,
but we are also looking at a user space interface that can be used to force
such allocations.

For now, to keep allocations from hitting the coherent device, we build the
zonelists such that zones from the coherent device node are not present in
any other node's zonelist. We looked at using MPOL_BIND as the user space
interface to force allocation from the coherent device node, but that
breaks because of the MPOL_BIND behavior you describe above: the zonelist
is taken from the current node, and that zonelist does not contain the
coherent node's zones.

From what you are suggesting above, I guess the right approach is to add
the coherent node's zones to every node's zonelist and make sure the
default node mask used for allocation (N_MEMORY) doesn't include the
coherent device node?

-aneesh