Date: Tue, 23 Apr 2019 11:05:05 +0200
From: Peter Zijlstra
To: 王贇
Cc: hannes@cmpxchg.org, mhocko@kernel.org, vdavydov.dev@gmail.com, Ingo Molnar, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [RFC PATCH 5/5] numa: numa balancer
Message-ID: <20190423090505.GG11158@hirez.programming.kicks-ass.net>
References: <209d247e-c1b2-3235-2722-dd7c1f896483@linux.alibaba.com> <85bcd381-ef27-ddda-6069-1f1d80cf296a@linux.alibaba.com>
In-Reply-To: <85bcd381-ef27-ddda-6069-1f1d80cf296a@linux.alibaba.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Apr 22, 2019 at 10:21:17AM +0800, 王贇 wrote:
> numa balancer is a module which will try to automatically adjust numa
> balancing stuff to gain numa bonus as much as possible.
>
> For each memory cgroup, we process the work in two steps:
>
> On stage 1 we check cgroup's exectime and memory topology to see
> if there could be a candidate for settled down, if we got one then
> move onto stage 2.
>
> On stage 2 we try to settle down as much as possible by prefer the
> candidate node, if the node no longer suitable or locality keep
> downturn, we reset things and new round begin.
>
> Decision made with find_candidate_nid(), should_prefer() and keep_prefer(),
> which try to pick a candidate node, see if allowed to prefer it and if
> keep doing the prefer.
>
> Tested on the box with 96 cpus with sysbench-mysql-oltp_read_write
> testing, 4 mysqld instances created and attached to 4 cgroups, 4
> sysbench instances then created and attached to corresponding cgroup
> to test the mysql with oltp_read_write script, average eps show:
>
>                                 origin      balancer
> 4 instances each 12 threads     5241.08     5375.59     +2.50%
> 4 instances each 24 threads     7497.29     7820.73     +4.13%
> 4 instances each 36 threads     8985.44     9317.04     +3.55%
> 4 instances each 48 threads     9716.50     9982.60     +2.66%
>
> Other benchmark liks dbench, pgbench, perf bench numa also tested, and
> with different parameters and number of instances/threads, most of
> the cases show bonus, some show acceptable regression, and some got no
> changes.
>
> TODO:
>   * improve the logical to address the regression cases
>   * Find a way, maybe, to handle the page cache left on remote
>   * find more scenery which could gain benefit
>
> Signed-off-by: Michael Wang
> ---
>  drivers/Makefile             |   1 +
>  drivers/numa/Makefile        |   1 +
>  drivers/numa/numa_balancer.c | 715 +++++++++++++++++++++++++++++++++++++++++++

So I really think this is the wrong direction. Why introduce yet another
balancer thingy and not extend the existing numa balancer with the
additional information you got from the previous patches?

Also, this really should not be a module and not in drivers/
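
A note on the benchmark table quoted above: the patch description does not say how the percentage column was derived. The short Python sketch below recomputes it two ways from the quoted eps figures. The function names and the truncation to two decimals are assumptions made for illustration, not anything taken from the patch. Under those assumptions the quoted percentages line up with (balancer - origin) / balancer, while the more usual gain over origin would come out slightly higher, roughly 2.57% to 4.31%.

```python
import math

# Average eps figures copied from the quoted table:
# threads per instance -> (origin, balancer, quoted percentage)
RESULTS = {
    12: (5241.08, 5375.59, 2.50),
    24: (7497.29, 7820.73, 4.13),
    36: (8985.44, 9317.04, 3.55),
    48: (9716.50, 9982.60, 2.66),
}


def gain_vs_origin(origin, balancer):
    """Percentage gain relative to the baseline (origin) run."""
    return (balancer - origin) / origin * 100.0


def gain_vs_balancer(origin, balancer):
    """Percentage difference relative to the balancer run."""
    return (balancer - origin) / balancer * 100.0


def truncate2(value):
    """Truncate (not round) a positive value to two decimal places."""
    return math.floor(value * 100) / 100


if __name__ == "__main__":
    for threads, (origin, balancer, quoted) in RESULTS.items():
        print(
            f"{threads} threads: quoted +{quoted:.2f}%, "
            f"vs origin +{gain_vs_origin(origin, balancer):.2f}%, "
            f"vs balancer +{truncate2(gain_vs_balancer(origin, balancer)):.2f}%"
        )
```

Either way the improvement stays in the low single digits, so the choice of denominator does not change the overall picture.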