Date: Tue, 16 Jan 2018 21:26:14 +0000
From: Mel Gorman
To: Henry Willard
Cc: akpm@linux-foundation.org, kstewart@linuxfoundation.org,
    zi.yan@cs.rutgers.edu, pombredanne@nexb.com, aarcange@redhat.com,
    gregkh@linuxfoundation.org, aneesh.kumar@linux.vnet.ibm.com,
    kirill.shutemov@linux.intel.com, jglisse@redhat.com,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mm: numa: Do not trap faults on shared data section pages.
Message-ID: <20180116212614.gudglzw7kwzd3get@suse.de>
In-Reply-To: <1516130924-3545-2-git-send-email-henry.willard@oracle.com>

On Tue, Jan 16, 2018 at 11:28:44AM -0800, Henry Willard wrote:
> Workloads consisting of a large number processes running the same program
> with a large shared data section may suffer from excessive numa balancing
> page migration of the pages in the shared data section. This shows up as
> high I/O wait time and degraded performance on machines with higher socket
> or node counts.
>
> This patch skips shared copy-on-write pages in change_pte_range() for the
> numa balancing case.
>
> Signed-off-by: Henry Willard
> Reviewed-by: Håkon Bugge
> Reviewed-by: Steve Sistare <steven.sistare@oracle.com>

Merge the leader and this mail together.

It would have been nice to see data on other realistic workloads as well.

My main source of discomfort is the fact that this is permanent as two
processes perfectly isolated but with a suitably shared COW mapping will
never migrate the data.

A potential improvement to get the reported bandwidth up in the test
program would be to skip the rest of the VMA if page_mapcount != 1 in a
COW mapping as it would be reasonable to assume the remaining pages in
the VMA are also affected and the scan is wasteful. There are
counter-examples to this but I suspect that the full VMA being shared is
the common case.

Whether you do that or not;

Acked-by: Mel Gorman

-- 
Mel Gorman
SUSE Labs
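
The difference between skipping one shared page and skipping the rest of
the VMA can be illustrated with a small user-space sketch. None of this is
kernel code: struct sim_page, struct sim_vma and sim_numa_scan() are
hypothetical stand-ins invented for the example, and the real
change_pte_range() walks page tables rather than an array. The sketch only
shows how much scanning each policy would do under those assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a page; not the kernel's struct page. */
struct sim_page {
    int mapcount;   /* how many processes map this page */
    bool marked;    /* true once the scan would trap hinting faults on it */
};

/* Hypothetical stand-in for a VMA; not the kernel's vm_area_struct. */
struct sim_vma {
    bool is_cow;            /* private, writable (copy-on-write) mapping */
    struct sim_page *pages;
    size_t nr_pages;
};

/*
 * Walk one VMA the way a NUMA balancing scan might.
 *
 * skip_rest == false: skip only the shared COW page and keep scanning
 *                     (the behaviour described in the patch).
 * skip_rest == true:  stop at the first shared COW page and assume the
 *                     remainder of the VMA is shared too (the suggestion
 *                     in the reply).
 *
 * Returns the number of pages examined.
 */
size_t sim_numa_scan(struct sim_vma *vma, bool skip_rest)
{
    size_t examined = 0;

    assert(vma != NULL);

    for (size_t i = 0; i < vma->nr_pages; i++) {
        struct sim_page *page = &vma->pages[i];

        examined++;

        if (vma->is_cow && page->mapcount != 1) {
            if (skip_rest)
                break;      /* give up on the rest of this VMA */
            continue;       /* skip just this page */
        }

        page->marked = true;
    }

    return examined;
}
```

With a four-page COW VMA whose mapcounts are 1, 2, 1, 3, the per-page
policy examines all four pages and marks the two private ones, while the
skip-the-rest policy stops after two pages and leaves the third (private)
page unmarked. That third page is one of the counter-examples mentioned
above: the early exit saves scanning work but gives up on a page that
could still have been migrated.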