Date: Wed, 23 May 2012 18:16:05 +0300
From: "Michael S. Tsirkin"
To: Andrew Theurer
Cc: Liu ping fan, Shirley Ma, kvm@vger.kernel.org, netdev@vger.kernel.org,
    linux-kernel@vger.kernel.org, qemu-devel@nongnu.org, Avi Kivity,
    Srivatsa Vaddagiri, Rusty Russell, Anthony Liguori, Ryan Harper,
    Krishna Kumar, Tom Lendacky
Subject: Re: [RFC:kvm] export host NUMA info to guest & make emulated device NUMA attr
Message-ID: <20120523151604.GB30542@redhat.com>
References: <1337246456-30909-1-git-send-email-kernelfans@gmail.com>
    <1337357675.12999.31.camel@oc3660625478.ibm.com>
    <4FBCF99F.4070409@linux.vnet.ibm.com>
In-Reply-To: <4FBCF99F.4070409@linux.vnet.ibm.com>

On Wed, May 23, 2012 at 09:52:15AM -0500, Andrew Theurer wrote:
> On 05/22/2012 04:28 AM, Liu ping fan wrote:
> > On Sat, May 19, 2012 at 12:14 AM, Shirley Ma wrote:
> > > On Thu, 2012-05-17 at 17:20 +0800, Liu Ping Fan wrote:
> > > > Currently, the guest cannot know the NUMA info of its vcpus, which
> > > > results in a performance drawback.
> > > >
> > > > This was discovered and experimented with by
> > > >   Shirley Ma
> > > >   Krishna Kumar
> > > >   Tom Lendacky
> > > > Refer to
> > > > http://www.mail-archive.com/kvm@vger.kernel.org/msg69868.html
> > > > to see the big performance gap between NUMA aware and unaware.
> > > >
> > > > Enlightened by their discovery, I think we can do more work -- that
> > > > is, export the host's NUMA info to the guest.
> > >
> > > There are three problems we've found:
> > >
> > > 1. KVM doesn't support a NUMA load balancer. Even if there are no
> > > other workloads in the system, and the number of vcpus in the guest
> > > is smaller than the number of cpus per node, the vcpus can still be
> > > scheduled onto different nodes.
> > >
> > > Someone is working on an in-kernel solution. Andrew Theurer has a
> > > working user-space NUMA-aware VM balancer; it requires libvirt and
> > > cgroups (which are enabled by default on RHEL6 systems).
> > >
> > Interesting, and I found that "sched/numa: Introduce
> > sys_numa_{t,m}bind()" committed by Peter and Ingo may help.
> > But I think that from the guest's view it cannot tell whether two
> > vcpus are on the same host node. For example, if vcpu-a is in node-A
> > while vcpu-b is in node-B, the guest load balancer's work becomes
> > more expensive when it does pull_task from vcpu-a and chooses vcpu-b
> > to push to. My idea is to export such info to the guest; I am still
> > working on it.
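
Just as a reference point, a minimal guest-side sketch (mine, not part of
the series; the 64-node cap is arbitrary): once a vNUMA layout is exported
to the guest, e.g. via qemu's static -numa topology, the guest can already
read which vcpus share a (virtual) node from standard sysfs.  What is still
missing, as noted above, is how those virtual nodes map onto host nodes.

/* Print each NUMA node the guest kernel sees and the cpus in it. */
#include <stdio.h>
#include <string.h>

int main(void)
{
	char path[128], cpulist[256];
	FILE *f;
	int node;

	for (node = 0; node < 64; node++) {
		snprintf(path, sizeof(path),
			 "/sys/devices/system/node/node%d/cpulist", node);
		f = fopen(path, "r");
		if (!f)
			continue;	/* node not present */
		if (fgets(cpulist, sizeof(cpulist), f)) {
			cpulist[strcspn(cpulist, "\n")] = '\0';
			printf("node%d: cpus %s\n", node, cpulist);
		}
		fclose(f);
	}
	return 0;
}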
> 
> The long term solution is two-fold:
> 1) Guests that are quite large (in that they cannot fit in a host NUMA
> node) must have a static multi-node NUMA topology implemented by Qemu.
> That is here today, but we do not do it automatically, which is
> probably going to be a VM management responsibility.
> 2) The host scheduler and NUMA code must be enhanced to get better
> placement of Qemu memory and threads. For single-node vNUMA guests this
> is easy: put it all in one node. For multi-node vNUMA guests, the host
> must understand that some Qemu memory belongs with certain vCPU threads
> (which together make up one of the guest's vNUMA nodes), and then place
> that memory and those threads in a specific host node (and continue
> likewise for the memory/threads of each other Qemu vNUMA node).

And for IO, we need multiqueue devices such that each node can have its
own queue in its local memory.

-- 
MST
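
For concreteness, a minimal user-space sketch of the kind of per-node
placement described above, using libnuma (an illustration only -- this is
neither Andrew's balancer nor the proposed patches; place_vnuma_node() is
a hypothetical helper and the node number is hard-coded).  The same
numa_alloc_onnode() call is also roughly what "its own queue in its local
memory" would look like from user space.  Build with -lnuma.

#include <numa.h>
#include <stdio.h>

/* Keep the calling (vcpu) thread and its chunk of guest RAM on host_node. */
static int place_vnuma_node(void *guest_ram, size_t ram_size, int host_node)
{
	/* Run this thread only on the cpus of the chosen host node. */
	if (numa_run_on_node(host_node) < 0)
		return -1;
	/* Bind the RAM backing this vNUMA node to the same host node. */
	numa_tonode_memory(guest_ram, ram_size, host_node);
	return 0;
}

int main(void)
{
	size_t sz = 128UL << 20;	/* stand-in for one vNUMA node's RAM */
	void *ram;

	if (numa_available() < 0) {
		fprintf(stderr, "no NUMA support on this kernel\n");
		return 1;
	}
	ram = numa_alloc_onnode(sz, 0);	/* node-local, like a per-node queue */
	if (!ram || place_vnuma_node(ram, sz, 0)) {
		fprintf(stderr, "NUMA placement failed\n");
		return 1;
	}
	printf("thread and %zu bytes of RAM placed on host node 0\n", sz);
	numa_free(ram, sz);
	return 0;
}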