Subject: Re: [PATCH] mm: numa: Do not trap faults on shared data section pages.
From: Henry Willard
Date: Mon, 22 Jan 2018 16:41:11 -0800
To: Christopher Lameter
Cc: Mel Gorman, akpm@linux-foundation.org, kstewart@linuxfoundation.org, zi.yan@cs.rutgers.edu, pombredanne@nexb.com, aarcange@redhat.com, gregkh@linuxfoundation.org, aneesh.kumar@linux.vnet.ibm.com, kirill.shutemov@linux.intel.com, jglisse@redhat.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org

> On Jan 19, 2018, at 6:12 PM, Christopher Lameter wrote:
>
> On Thu, 18 Jan 2018, Henry Willard wrote:
>
>> If MPOL_MF_LAZY were allowed and specified things would not work
>> correctly. change_pte_range() is unaware of and can’t honor the
>> difference between MPOL_MF_MOVE_ALL and MPOL_MF_MOVE.
>
> Not sure how that relates to what I said earlier...

Sorry. My point was only that CAP_SYS_NICE is not relevant to this patch.

>
>>
>> For the case of auto numa balancing, it may be undesirable for shared
>> pages to be migrated whether they are also copy-on-write or not. The
>> copy-on-write test was added to restrict the effect of the patch to the
>> specific situation we observed. Perhaps I should remove it, I don’t
>> understand why it would be desirable to modify the behavior via sysfs.
>
> I think the most common case of shared pages occurs for pages that contain
> code. In that case a page may be mapped into hundreds if not thousands of
> processes. In particular that is often the case for basic system libraries
> like the c library which may actually be mapped into every binary that is
> running.

That is true, but auto numa balancing skips these and similar pages
before it calls change_prot_numa(). They don’t even have to be actually
shared to be skipped.

>
> It is very difficult and expensive to unmap these pages from all the
> processes in order to migrate them. So some sort of limit would be useful
> to avoid unnecessary migration attempts. One example would be to forbid
> migrating pages that are mapped in more than 5 processes. Some sysctl knob
> would be useful here to set the boundary.
>
> Your patch addresses a special case here by forbidding migration of any
> page mapped by more than a single process (mapcount != 1).

The current patch skips pages that are in copy-on-write VMAs and are
still shared. These include pages in the program’s data segment that
are writable but have not been written to.
Once the pages are modified, they are no longer shared and can be
migrated. The problem is that in some cases the pages are never
modified and remain shared.

Prior to commit 4b10e7d562c90d0a72f324832c26653947a07381,
change_prot_numa() called change_prot_numa_range(), which tested for
(page_mapcount(page) != 1) and bailed out for any shared page. This
patch is more selective. A simple test for whether a page is shared
seems to be common.

>
> That would mean f.e. that the complete migration of a set of processes
> that rely on sharing data via a memory segment is impossible because those
> shared pages can never be moved.
>
> By setting the limit higher that migration would still be possible.
>
> Maybe we can set that limit by default at 5 and allow a higher setting
> if users have applications that require a higher mapcount? F.e. a
> common construct is a shepherd task and N worker threads. If those
> tasks each have their own address space and only communicate via
> a shared data segment then one may want to set the limit higher than N
> in order to allow the migration of the group of processes.

This example would be unaffected by this patch, because the patch does
not affect explicitly shared memory. A process with the necessary
capabilities is still able to migrate all pages.

Henry
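The three skip rules discussed in this exchange (the pre-4b10e7d562c9 mapcount test, the copy-on-write test in this patch, and the mapcount limit proposed in the quoted text) can be compared in a small user-space model. This is a sketch under stated assumptions, not kernel code: the flag constants are local stand-ins, is_cow_mapping() only mirrors the kernel helper of that name, and skip_any_shared(), skip_shared_cow(), skip_over_limit() and max_migrate_mapcount are hypothetical names. The default limit of 5 comes from the quoted proposal.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-ins for the kernel's vm_flags bits (illustration only). */
#define VM_SHARED   0x00000008UL
#define VM_MAYWRITE 0x00000020UL

/* Mirrors the kernel's is_cow_mapping(): a private mapping that may be written. */
static bool is_cow_mapping(unsigned long vm_flags)
{
	return (vm_flags & (VM_SHARED | VM_MAYWRITE)) == VM_MAYWRITE;
}

/* Behaviour before commit 4b10e7d562c9, as described in the reply:
 * skip every page mapped by more than one process. (Hypothetical name.) */
static bool skip_any_shared(int mapcount)
{
	assert(mapcount >= 1);	/* the model only covers mapped pages */
	return mapcount != 1;
}

/* This patch, as described in the reply: skip only pages in copy-on-write
 * VMAs that are still shared; explicitly shared memory is untouched.
 * (Hypothetical name.) */
static bool skip_shared_cow(unsigned long vm_flags, int mapcount)
{
	assert(mapcount >= 1);
	return is_cow_mapping(vm_flags) && mapcount != 1;
}

/* The quoted proposal: a tunable limit (hypothetical name), default 5. */
static int max_migrate_mapcount = 5;

static bool skip_over_limit(int mapcount)
{
	assert(mapcount >= 1);
	return mapcount > max_migrate_mapcount;
}
```

For example, a never-written data-segment page mapped by two processes running the same program (private writable VMA, mapcount 2) is skipped by skip_shared_cow(), while a page with the same mapcount in an explicitly shared mapping, as in the shepherd/worker example, is not. That is the distinction drawn in the last reply.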