Message-ID: <1587629164.28094.11.camel@suse.cz>
Subject: Re: [PATCH 1/4] x86, sched: Bail out of frequency invariance if base frequency is unknown
From: Giovanni Gherdovich
To: Ricardo Neri
Cc: Srinivas Pandruvada, Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Borislav Petkov, Len Brown, "Rafael J. Wysocki", x86@kernel.org, linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org, Mel Gorman, Doug Smythies, Like Xu, Neil Rickert, Chris Wilson
Date: Thu, 23 Apr 2020 10:06:04 +0200
In-Reply-To: <20200422171547.GA11942@ranerica-svr.sc.intel.com>
References: <20200416054745.740-1-ggherdovich@suse.cz> <20200416054745.740-2-ggherdovich@suse.cz> <20200422171547.GA11942@ranerica-svr.sc.intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, 2020-04-22 at 10:15 -0700, Ricardo Neri wrote:
> On Thu, Apr 16, 2020 at 07:47:42AM +0200, Giovanni Gherdovich wrote:
> > Some hypervisors such as VMWare ESXi 5.5 advertise support for
> > X86_FEATURE_APERFMPERF but then fill all MSR's with zeroes. In particular,
> > MSR_PLATFORM_INFO set to zero tricks the code that wants to know the base
> > clock frequency of the CPU (highest non-turbo frequency), producing a
> > division by zero when computing the ratio turbo_freq/base_freq necessary
> > for frequency invariant accounting.
> >
> > It is to be noted that even if MSR_PLATFORM_INFO contained the appropriate
> > data, APERF and MPERF are constantly zero on ESXi 5.5, thus freq-invariance
> > couldn't be done in principle (not that it would make a lot of sense in a
> > VM anyway). The real problem is advertising X86_FEATURE_APERFMPERF. This
> > appears to be fixed in more recent versions: ESXi 6.7 doesn't advertise
> > that feature.
> >
> > Signed-off-by: Giovanni Gherdovich
> > Fixes: 1567c3e3467c ("x86, sched: Add support for frequency invariance")
> > ---
> >  arch/x86/kernel/smpboot.c | 9 +++++++++
> >  1 file changed, 9 insertions(+)
> >
> > diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> > index fe3ab9632f3b..3a318ec9bc17 100644
> > --- a/arch/x86/kernel/smpboot.c
> > +++ b/arch/x86/kernel/smpboot.c
> > @@ -1985,6 +1985,15 @@ static bool intel_set_max_freq_ratio(void)
> >  	return false;
> >
> >  out:
> > +	/*
> > +	 * Some hypervisors advertise X86_FEATURE_APERFMPERF
> > +	 * but then fill all MSR's with zeroes.
> > +	 */
> > +	if (!base_freq) {
> > +		pr_debug("Couldn't determine cpu base frequency, necessary for scale-invariant accounting.\n");
> > +		return false;
> > +	}
>
> It may be possible that MSR_TURBO_RATIO_LIMIT is also all-zeros. In
> such case, turbo_freq will be also zero. If that is the case,
> arch_max_freq_ratio will be zero and we will see a division by zero
> exception in arch_scale_freq_tick() because mcnt is multiplied by
> arch_max_freq_ratio().

Thanks Ricardo for clarifying this.

Follow-up question: when I see an all-zeros MSR_TURBO_RATIO_LIMIT, can I
assume the CPU doesn't support turbo boost? Or is it possible that such a
CPU has turbo boost, but the turbo ratios just aren't declared in the MSR?

Some context: this feature (called "frequency invariance") wants to know
the maximum clock frequency a CPU can reach at any time (it needs this for
some scheduler calculations). This is hard to know precisely, because turbo
can kick in at any time and depends on many factors. So it settles for an
"average maximum frequency", for which I decided the 4-core turbo frequency
is a good estimate.

Now, if an all-zeros MSR_TURBO_RATIO_LIMIT means "turbo boost unsupported",
this is actually the easy case, because then I know exactly what the max
frequency is (the base frequency). If, on the other hand, an all-zeros MSR
means "there may or may not be turbo, and you don't know how much", then I
must disable frequency invariance.

Thanks,
Giovanni
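
For illustration, the two zero cases discussed above can be sketched in
user-space C. This is a minimal sketch, not the kernel code: the names
sketch_max_freq_ratio() and sketch_freq_scale() are hypothetical, overflow
checks are left out, and treating a zero turbo_freq as "disable frequency
invariance" is an assumption (the conservative one of the two options
above), not a settled answer.

```c
#include <stdbool.h>
#include <stdint.h>

#define SCHED_CAPACITY_SHIFT 10
#define SCHED_CAPACITY_SCALE (1ULL << SCHED_CAPACITY_SHIFT)

/*
 * Hypothetical helper: compute turbo_freq/base_freq in fixed point.
 * Returns false when the ratio cannot be trusted, so the caller can
 * leave frequency invariance disabled.
 */
static bool sketch_max_freq_ratio(uint64_t base_freq, uint64_t turbo_freq,
				  uint64_t *ratio)
{
	/* All-zeros MSR_PLATFORM_INFO: dividing by base_freq would fault. */
	if (!base_freq)
		return false;

	/*
	 * All-zeros MSR_TURBO_RATIO_LIMIT: the ratio would be zero and
	 * later end up in a denominator. Assumption: bail out here.
	 */
	if (!turbo_freq)
		return false;

	*ratio = turbo_freq * SCHED_CAPACITY_SCALE / base_freq;
	return *ratio != 0;
}

/*
 * Hypothetical per-tick computation: scale = (acnt / mcnt) / ratio,
 * all in fixed point. mcnt is multiplied by the ratio, so a zero
 * ratio (or a zero MPERF delta) would make the denominator zero.
 */
static bool sketch_freq_scale(uint64_t acnt, uint64_t mcnt, uint64_t ratio,
			      uint64_t *scale)
{
	uint64_t den = mcnt * ratio;

	if (!den)
		return false;

	*scale = (acnt << (2 * SCHED_CAPACITY_SHIFT)) / den;
	return true;
}
```

With a base ratio of 24 and a 4-core turbo ratio of 36, the first helper
yields 1536 (1.5 in fixed point), and a tick where APERF advanced by 36
while MPERF advanced by 24 gives a scale of 1024, i.e. full capacity.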