Date: Wed, 8 Jan 2020 16:50:40 +0100
From: Peter Zijlstra
To: Wanpeng Li
Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org, Paolo Bonzini,
    Thomas Gleixner, Marcelo Tosatti, Konrad Rzeszutek Wilk, KarimAllah,
    Vincent Guittot, Ingo Molnar, Ankur Arora
Subject: Re: [PATCH RFC] sched/fair: Penalty the cfs task which executes mwait/hlt
Message-ID: <20200108155040.GB2827@hirez.programming.kicks-ass.net>
In-Reply-To: <1578448201-28218-1-git-send-email-wanpengli@tencent.com>

On Wed, Jan 08, 2020 at 09:50:01AM +0800, Wanpeng Li wrote:
> From: Wanpeng Li
>
> To deliver all of the resources of a server to instances in cloud, there are no
> housekeeping cpus reserved. libvirtd, qemu main loop, kthreads, and other agent/tools
> etc which can't be offloaded to other hardware like smart nic, these stuff will
> contend with vCPUs even if MWAIT/HLT instructions executed in the guest.
>
> The is no trap and yield the pCPU after we expose mwait/hlt to the guest [1][2],
> the top command on host still observe 100% cpu utilization since qemu process is
> running even though guest who has the power management capability executes mwait.
> Actually we can observe the physical cpu has already enter deeper cstate by
> powertop on host.
>
> For virtualization, there is a HLT activity state in CPU VMCS field which indicates
> the logical processor is inactive because it executed the HLT instruction, but
> SDM 24.4.2 mentioned that execution of the MWAIT instruction may put a logical
> processor into an inactive state, however, this VMCS field never reflects this
> state.

So far I think I can follow, however it does not explain who consumes
this VMCS state if it is set and how that helps. Also, this:

> This patch avoids fine granularity intercept and reschedule vCPU if MWAIT/HLT
> instructions executed, because it can worse the message-passing workloads which
> will switch between idle and running frequently in the guest. Lets penalty the
> vCPU which is long idle through tick-based sampling and preemption.

is just complete gibberish. And I have no idea what problem you're
trying to solve how.

Also, I don't think the TSC/MPERF ratio is architected, we can't assume
this is true for everything that has APERFMPERF.

/me tries to reconstruct intent from patch

So what you're doing is, mark the CPU 'idle' when the MPERF/TSC ratio <
1%, and then frob the vruntime such that it will hopefully preempt.

That's pretty disgusting.

Please, write a coherent problem statement and justify the magic
choices. This is unreviewable.
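
The heuristic as reconstructed above can be sketched roughly as follows.
This is only an illustration of that reading, not the patch's code: the
struct and function names, the penalty parameter and the user-space
framing are all assumptions made for the example. It also takes for
granted that MPERF advances at the TSC rate while the CPU is in C0,
which is the very property questioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-tick snapshot of the two counters (names are illustrative). */
struct perf_sample {
	uint64_t mperf;	/* IA32_MPERF: advances only while the CPU is in C0 */
	uint64_t tsc;	/* time-stamp counter: keeps advancing in idle states */
};

/* Toy stand-in for a scheduling entity; only the field the sketch needs. */
struct toy_task {
	uint64_t vruntime;
};

/* The "magic" threshold from the reconstruction: MPERF/TSC below 1%. */
#define IDLE_RATIO_PCT 1

static_assert(IDLE_RATIO_PCT > 0 && IDLE_RATIO_PCT <= 100,
	      "threshold is a percentage");

/*
 * Return true when, between two samples, MPERF advanced by less than
 * IDLE_RATIO_PCT percent of the TSC delta. Unsigned subtraction keeps
 * the deltas correct across counter wrap-around.
 */
static bool looks_idle(const struct perf_sample *prev,
		       const struct perf_sample *cur)
{
	uint64_t d_mperf = cur->mperf - prev->mperf;
	uint64_t d_tsc = cur->tsc - prev->tsc;

	if (d_tsc == 0)
		return false;	/* no time elapsed, nothing to conclude */

	return d_mperf < (d_tsc / 100) * IDLE_RATIO_PCT;
}

/*
 * Push the task's vruntime forward by a caller-chosen penalty when the
 * CPU looked idle, so that a scheduler picking the smallest vruntime
 * would prefer other runnable tasks.
 */
static void penalize_if_idle(struct toy_task *t,
			     const struct perf_sample *prev,
			     const struct perf_sample *cur,
			     uint64_t penalty)
{
	if (looks_idle(prev, cur))
		t->vruntime += penalty;
}
```

Both the 1% threshold and the size of the penalty are free parameters
in this sketch, which is where the request to justify the magic choices
applies.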