Date: Tue, 2 Oct 2018 23:00:05 +0530
From: Srikar Dronamraju
To: Mel Gorman
Cc: Peter Zijlstra, Ingo Molnar, Jirka Hladky, Rik van Riel, LKML, Linux-MM
Subject: Re: [PATCH 2/2] mm, numa: Migrate pages to local nodes quicker early in the lifetime of a task
References: <20181001100525.29789-1-mgorman@techsingularity.net> <20181001100525.29789-3-mgorman@techsingularity.net> <20181002124149.GB4593@linux.vnet.ibm.com> <20181002135459.GA7003@techsingularity.net>
In-Reply-To: <20181002135459.GA7003@techsingularity.net>
Message-Id: <20181002173005.GD4593@linux.vnet.ibm.com>

> > This does have issues when used with workloads that access more shared
> > faults than private faults.
>
> Not as such. It can have issues on workloads where memory is initialised
> by one thread, then additional threads are created and access the same
> memory. They are not necessarily shared once buffers are handed over. In
> such a case, migrating quickly is the right thing to do. If the pages are
> truly shared then there may be some unnecessary migrations early in the
> lifetime of the task, but it'll settle down quickly enough.
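To make sure I am reading that pattern right: something like the sketch
below is the case that benefits from migrating quickly. This is an
illustrative userspace program made up for this mail, not any particular
benchmark; the buffer size and thread count are arbitrary.

	/*
	 * One thread initialises a buffer (so first-touch places all
	 * pages near it), then worker threads do the real accesses.
	 * NUMA balancing sees the workers' hinting faults and can
	 * migrate the pages toward the nodes they run on.
	 */
	#include <pthread.h>
	#include <stdlib.h>
	#include <string.h>

	#define NWORKERS	4
	#define BUFSZ		(64UL << 20)	/* 64MB, arbitrary */

	static char *buf;

	static void *worker(void *arg)
	{
		unsigned long i, sum = 0;

		/* Page-granular reads of memory initialised elsewhere. */
		for (i = 0; i < BUFSZ; i += 4096)
			sum += buf[i];
		return (void *)sum;
	}

	int main(void)
	{
		pthread_t t[NWORKERS];
		int i;

		buf = malloc(BUFSZ);
		memset(buf, 1, BUFSZ);	/* init thread touches every page */

		for (i = 0; i < NWORKERS; i++)
			pthread_create(&t[i], NULL, worker, NULL);
		for (i = 0; i < NWORKERS; i++)
			pthread_join(t[i], NULL);
		return 0;
	}

(builds with cc -O2 -pthread)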
Do you have a workload recommendation to try for shared fault accesses?
I will try to get a DayTrader run in a day or two. There, JVM and DB
threads act on the same memory, so I presume it might show some insights.

> Is it just numa01 that was affected for you? I ask because that
> particular workload is an adverse workload on any machine with more
> than 2 sockets and your machine description says it has 4 nodes. What
> it is testing is quite specific to 2-node machines.

Agreed. Some variations of numa01.sh, where I have one process with as
many threads as there are CPUs, do regress, but not as much as numa01:

./numa03.sh Real:   484.84    555.51    518.59     22.91   -5.84277%
./numa03.sh Sys:     44.41     64.40     53.24      6.65  -11.3824%
./numa03.sh User: 51328.77  59429.39  55366.62   2744.39   -9.47912%

> > SPECJbb did show some small losses and gains.
>
> That almost always shows small gains and losses so that's not too
> surprising.

Okay.

> > Our numa grouping is not fast enough. It can sometimes take several
> > iterations before all the tasks belonging to the same group end up
> > being part of the group. With the current check we end up spreading
> > memory faster than we should, hence hurting the chance of early
> > consolidation.
> >
> > Can we restrict it to something like this?
> >
> > 	if (p->numa_scan_seq >= MIN && p->numa_scan_seq <= MIN + 4 &&
> > 	    cpupid_match_pid(p, last_cpupid))
> > 		return true;
> >
> > meaning, we have run at least MIN scans, and we find this task to be
> > the one most likely using this page.
>
> What's MIN? Assuming it's any type of delay, note that this will regress
> STREAM again because it's very sensitive to the starting state.

I was thinking of MIN as 3, to give things a chance to settle, but that
might not help STREAM as you pointed out. Do you have a hint on which
commit made STREAM regress?

If we want to prioritise STREAM-like workloads (i.e. private faults),
one simpler fix could be to change the check from:

	if (!cpupid_pid_unset(last_cpupid) &&
	    cpupid_to_nid(last_cpupid) != dst_nid)
		return false;

to:

	if (!cpupid_pid_unset(last_cpupid) &&
	    cpupid_to_nid(last_cpupid) == dst_nid)
		return true;

i.e. if the group's tasks have likely consolidated on a node, or the
task was moved to a different node but the accesses were private, just
move the memory. The drawback, though, is that we keep pulling memory
every time the task moves across nodes (which your fix probably limits
to some extent for long-running tasks).
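For clarity, this is roughly where the flipped test would sit, against
my reading of should_numa_migrate_memory() in kernel/sched/fair.c (an
untested sketch of the idea, not a proper patch):

	/*
	 * Sketch only: accept the migration outright when the node of
	 * the last faulting task already matches the destination,
	 * rather than refusing when it differs; on a mismatch we would
	 * fall through to the existing numa_group checks.
	 */
	this_cpupid = cpu_pid_to_cpupid(dst_cpu, current->pid);
	last_cpupid = page_cpupid_xchg_last(page, this_cpupid);
	if (!cpupid_pid_unset(last_cpupid) &&
	    cpupid_to_nid(last_cpupid) == dst_nid)
		return true;

--
Thanks and Regards
Srikar Dronamraju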