Date: Thu, 16 Mar 2017 21:51:22 +0800
From: Aaron Lu
To: Michal Hocko
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Dave Hansen,
    Tim Chen, Andrew Morton, Ying Huang
Subject: Re: [PATCH v2 0/5] mm: support parallel free of memory
Message-ID: <20170316135122.GF13054@aaronlu.sh.intel.com>
In-Reply-To: <20170316073403.GE1661@aaronlu.sh.intel.com>

On Thu, Mar 16, 2017 at 03:34:03PM +0800, Aaron Lu wrote:
> On Wed, Mar 15, 2017 at 05:28:43PM +0100, Michal Hocko wrote:
> ... ...
> > After all, the amount of work to be done is the same; we just risk
> > more lock contention, unexpected CPU usage, etc.
>
> I'm starting to realize this is a good question.
>
> I guess max_active=4 produced almost the best result (max_active=8 is
> only slightly better) because the test box is a 4-node machine and
> therefore there are 4 zone->locks to contend on (let's ignore those
> tiny zones only available in node 0).
>
> I'm going to test on an EP to see if max_active=2 will suffice to
> produce a good enough result. If so, the proper default should be the
> number of nodes.

Here are the test results on a 2-node EP with 128GiB memory, test size
100GiB:

max_active    time
vanilla       2.971s ±3.8%
2             1.699s ±13.7%
4             1.616s ±3.1%
8             1.642s ±0.9%

So 4 gives the best result, but 2 is probably good enough.

If the size each worker deals with is changed from 1G to 2G:

max_active    time
2             1.605s ±1.7%
4             1.639s ±1.2%
8             1.626s ±1.8%

Considering that we are mostly improving things for memory-intensive
apps, the default setting should probably be:

    max_active = node_number

with each worker freeing 2G of memory.
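
For illustration, below is a minimal sketch (not the code from this
series) of how those defaults could be expressed with the standard
workqueue API. All of the names here (free_wq, free_work_item,
free_chunk_fn, queue_parallel_free, PARALLEL_FREE_CHUNK) are made up
for the example, and the use of an unbound workqueue is an assumption,
not something this series necessarily does:

/*
 * Illustrative sketch only, not the code from this patch set.  It shows
 * how the proposed defaults (max_active = number of nodes, 2G of memory
 * per worker) could be expressed with the standard workqueue API.
 */
#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/nodemask.h>
#include <linux/slab.h>
#include <linux/workqueue.h>

#define PARALLEL_FREE_CHUNK	(2UL << 30)	/* 2G handled per work item */

static struct workqueue_struct *free_wq;

struct free_work_item {
	struct work_struct work;
	unsigned long start;
	unsigned long end;
};

static void free_chunk_fn(struct work_struct *work)
{
	struct free_work_item *item =
		container_of(work, struct free_work_item, work);

	/* ... free the pages backing [item->start, item->end) here ... */

	kfree(item);
}

static int __init parallel_free_init(void)
{
	/*
	 * Allow one in-flight worker per NUMA node, since each node has
	 * a single zone->lock to contend on.
	 */
	free_wq = alloc_workqueue("parallel_free", WQ_UNBOUND,
				  num_online_nodes());
	return free_wq ? 0 : -ENOMEM;
}
core_initcall(parallel_free_init);

/* Split a large range into 2G chunks and queue one work item per chunk. */
static void queue_parallel_free(unsigned long start, unsigned long end)
{
	while (start < end) {
		unsigned long chunk_end = min(start + PARALLEL_FREE_CHUNK, end);
		struct free_work_item *item;

		item = kmalloc(sizeof(*item), GFP_KERNEL);
		if (!item)
			break;	/* caller would fall back to freeing inline */

		INIT_WORK(&item->work, free_chunk_fn);
		item->start = start;
		item->end = chunk_end;
		queue_work(free_wq, &item->work);

		start = chunk_end;
	}
}

The point of the sketch is only that the chunk size and the concurrency
limit are the two knobs being discussed; how the real series splits the
work and falls back on allocation failure may well differ.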