Received: by 2002:ac0:bc90:0:0:0:0:0 with SMTP id a16csp844303img; Mon, 18 Mar 2019 15:58:42 -0700 (PDT) X-Google-Smtp-Source: APXvYqxxV6X1VOjoOM9NcYuWLXs9dGuWhP9RTrUwbFRQuiuAFrtTWzwKk8EcIrcOWMSYV8l8cf1A X-Received: by 2002:a63:460a:: with SMTP id t10mr19442502pga.354.1552949922220; Mon, 18 Mar 2019 15:58:42 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1552949922; cv=none; d=google.com; s=arc-20160816; b=XXLAB3WfdffK2Sul3lZ+r9x05yybkaoT/n7wfLFmVKsBXJDNWTrYNxQOZQiiyAuaos rrwHoX7fijdmXHOTiB9/od2sTBio0W45qmmbHLhJStAXUAKmwUSjLhiFeq2W9/VZrZeB L1thRb5N1zhBA+CrcrBwfMgZbkIqUlI2X+Nr031YvpIxTpJAcxxWuwQfbfwgQ9llRxrZ GQeqQdHbQZJ0kd2yDYHirFEVTcTn77q9bMg+vsc67x0dhXpVCy7Dk/pc8TzVsWpKAeEH CZUWcQiIRNxfy3li1yq6Jvp21wHAnrAhbGiF3NO3ha7eGepBN9VQWUgraiJReOO2qva5 EtHw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version; bh=FeqP637Hz4xL9ZEIDmXfZ9/I4SB257smyiUbRjYKimI=; b=uuQb8GO1V6qv1Taw36WfaQMGmZg79fFudChl/5aCpBKl0/dzuCwACIOnRcTsqM5XgD cBx2PaiudHZVytbHumLA83Ue2NX3+Qcn8WQn+TDVmxBDOF9iYFn7emFqkX2XtbLQLkn8 tJ9xiu/vUVANzio9ZBNc/LiMkSKJW78tOBt5+1C9qnhWyDEr7ADR9U62wnwn2Uv9O5lz DzBp0nudPb2Fqyc78cL6tvSI5sAq34DiLQFxWOz/n9x1RIADoyXmqemfNoSGfpj2FD7p ZtVgPytT3D3l8UvQ9k9E6zznHVdK/5/wFWs2OYhLmfnW/8/UL+wlN5CHQhqM4pX5HQTo jrbw== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id d33si10612655pla.315.2019.03.18.15.58.25; Mon, 18 Mar 2019 15:58:42 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727123AbfCRW5V (ORCPT + 99 others); Mon, 18 Mar 2019 18:57:21 -0400 Received: from mail-oi1-f196.google.com ([209.85.167.196]:38284 "EHLO mail-oi1-f196.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726809AbfCRW5V (ORCPT ); Mon, 18 Mar 2019 18:57:21 -0400 Received: by mail-oi1-f196.google.com with SMTP id w137so2487293oiw.5; Mon, 18 Mar 2019 15:57:20 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=FeqP637Hz4xL9ZEIDmXfZ9/I4SB257smyiUbRjYKimI=; b=DuyNrWari3kEFjVzG2rGNR4myA3DFsscDzyKHQYYI8vOEfZWvunzsCNQn9UisqS5pV r7s2uZbDTsPCBbjtlokp5IUdMBL0+aAHHh5R7Y9KftA0JjJq9oIwEFau5mKKqJzDkiKT xBwq2HKY9XdpZlHZRq/CWKQYpa4vI5+667wA5gc3pDEO12tNW6pEHxNgrD42XihjTS+m 0RFkkH618HNQOi314USluBj20qPuBgNZl2GVRKfB9j36pE/xM9dIfk6o0Na0paV8QZkc dn+xRIWxWe+Ip3MxcjcdRkfYgsKFRAv71mCcpeJCsvzS6gvT66UczGNC5KT7t47BYmLK bj/g== X-Gm-Message-State: APjAAAXqIalL+zHd61+0s9GA8F1oyZvd2zR4gKtT+4TznpQv5wUToVHa W2dFqk3Q+P/PiH4MUBfG560mYqbOZ0qrpA1mRFw= X-Received: by 2002:aca:355:: with SMTP id 82mr18990oid.30.1552949839921; Mon, 18 Mar 2019 15:57:19 -0700 (PDT) MIME-Version: 1.0 References: <6369897.qxlu8PgE1t@house> <2616815.TtRSCVe66q@house> <3100343.nkNWlhakcz@house> In-Reply-To: <3100343.nkNWlhakcz@house> From: "Rafael J. Wysocki" Date: Mon, 18 Mar 2019 23:57:08 +0100 Message-ID: Subject: Re: [PATCH] [RESEND] Do not modify perf bias performance setting by default at boot To: Thomas Renninger Cc: "Rafael J. Wysocki" , Len Brown , Hannes Reinecke , Linux PM , LKML , Borislav Petkov , Simon Schricker , Srinivas Pandruvada , Len Brown Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Mar 18, 2019 at 2:22 PM Thomas Renninger wrote: > > On Monday, March 18, 2019 12:40:46 PM CET Rafael J. Wysocki wrote: > > On Mon, Mar 18, 2019 at 12:15 PM Thomas Renninger wrote: > > ... > > > And who's BIOS, really? I guess you mean the OEM? Note, however, > > that the user and the OEM may not agree on that, but whatever. > > I mean both! > The OEM. > And the user who might choose a "performance" BIOS setting. > > > > > Yes, the kernel replaces whatever the original BIOS setting is with > > > > its own one. > > > > > > No, it only replaces the "performance" (0) value with "normal" (6). > > > This does not makes sense and is broken. > > Both know it better than this... > > > OK, fair enough. > > > > I guess it would have been better to set it to 6 unconditionally. > > No, I guess keeping the original value is the only sane thing to do. > Sorry, Rafael, I have to insist on this. Which doesn't make your argument more convincing at all. > There might be secrets Len and you cannot share with public, but you > should at least explain this privately then in a way that the current > code could somehow make sense. I absolutely cannot see that. Anyway, I just wanted to say that using the same initial value in all cases would have been more consistent than what the current code does. > > What about the systems that will misbehave when it is left at 0? > > Good question. > What exactly happens there? The CPU is faster in general? > And may consume a tiny bit more of power. > I had reports that Turbo Mode is not entered on some server CPUs when > perf BIAS is switched back to normal mode. > It is hard to identify what Intel implemented in microcode, but it's not > that much altogether. It would be nice if Intel would be a bit more verbose > about this and show some performance vs powersave figures. Can you please stop insinuating things? That really doesn't help or make the discussion clearer. > > > > It may not match every setup perfectly, but at least it > > > > is consistent. Why exactly is it worse than whatever the BIOS has > > > > set? > > > > > > Because there may be BIOS settings for the CPU which justify > > > initialization > > > of the Perf BIAS value by BIOS. > > > > Well, the EPB is there for users to set it via the OS. The BIOS > > setting is not guaranteed to work for all users anyway. > > Who says that? I do. If users have different needs, the single BIOS-provided value may not work for all of them just as well as the value set by the kernel doesn't. IOW, it really doesn't matter who sets the initial value, the BIOS or the kernel. In both cases it may not work for everybody. > Is this documented? > > I would say it is exactly the other way around. Energy Perf BIAS hint is > a MSR which must be changed in Ring 0 kernel environment. > > x86_energy_perf_policy is a nice tool to play with and to try out what > this value is for. But for example in secure boot mode userspace must not > modify this HW setting in any way. > So it would not be possible for a secure booted server to switch this > setting back to performance mode. I suspect that this was the real motivation for the reverts. :-) Well, if x86_energy_perf_policy doesn't work in hardened environments, then this is a problem, but making all systems use BIOS-provided values is not the most straightforward way to address it in my view. > > > What sense does it make to unconditionally set perf BIAS value from > > > performance to balanced? > > > Why is this done? > > > > Basically, for HW health reasons AFAICS. > > > > Apparently, on some systems EPB=0 is (was?) special and means (meant?) > > very aggressive use of turbo etc. which is not healthy in general. > > Hmm, maybe you want to explain this privately. > Performance is a valid setting. > x86_energy_perf_policy tool has a "performance" string for this and I expect it is very used. > I would switch to it if I am constantly connected to AC. > man x86_energy_perf_policy > also does not tell the user anything about "be careful with performance setting" > > And until today every CPU online/offline cycle or machine resume cycle switches > the value back to performance (if kernel tried to switch to 6). > > ... > > > > There may be BIOSes initialzing it via BIOS options. And this is a very > > > valid thing to do. > > > > Yes, and there may be BIOSes leaving it at 0 with the assumption that > > the OS will adjust it. The kernel cannot know which is the case. > > Correct. The kernel cannot know. > > You know this better than anyone else: > - A specific Microsoft version is doing it wrong. > - We adopt. > - Microsoft is doing it correctly with the next version > - We are busted > > I am ok with leaving the message as a hint, I would still check > ACPI prefered profile variable that we are not running on a server. > On these systems the message would be wrong rather sure and performance is intended. So yes, ACPI profiles could be used in there as they also are used elsewhere in a similar context. > > ... > > > > No. You must not ignore BIOS settings. Even worse, you must not override > > > these without any sane reason. > > > > While there are BIOS settings that better should not be overridden, > > the EPB is not one of them. > > Again: Why or where is this documented? In the SDM if you will, under "Performance and Energy Bias Hint support": "Software can use whatever criteria it sees fit to program the MSR with an appropriate value." If that doesn't mean "Yes, the OS can override the BIOS setting", then I'm not sure what would. > > > Your assumption above might be right. But we want to do it better, right? > > > > > > ... > > > > > > > The system-wide resume part will still not be working properly after > > > > the reverts. > > > > > > But it must never blindly (unconditionally) be set to specific value. > > > Correct? > > > > Yes. I've already said that. > > How about below solution then for the initial boot up sequence? The MSR would never be written to with that change, is that intentional? > > > You mean the kernel should store the pre-hibernation perf BIAS value > > > in NVRAM and write it back when waking up again? > > > > Or in the image and yes, it should write it back. > > Can you point me to a similar case where a variable is stored through > hibernation? > If it's not too much I can try to come up with an (compiled but untested) patch. > > > > This would make sense. > > > > > > It would also mean perf BIAS never really worked, at least did not survive > > > suspend. > > > > Right. > > > > > On servers (no hibernation) it would works but is overridden > > > to a value you typically do not want to have on a server... > > > So the current situation is rather broken in the kernel. > > > > Well, you can say so, but fixing it really means something more than > > reverting the commits that your patch is reverting and *that* is my > > point. > > > > Yes, I think that this needs to be fixed. > > > > No, I don't think that the reverts you are proposing are the way to go here. > > Why? > It is the first step to go. Well, I'll just go and fix the real problems here I guess, that will be way more productive than just talking about them ... And then we can get back to the initial setting discussion.