Received: by 2002:a05:6a10:8c0a:0:0:0:0 with SMTP id go10csp286442pxb; Wed, 3 Feb 2021 05:44:08 -0800 (PST) X-Google-Smtp-Source: ABdhPJxD89ByHblBQGKlrLTJR9MAaTmlMjdU3ov/oWMZfvMAxZk5lQ8/G5G0zk1IfIgJVNJ9SAzk X-Received: by 2002:a17:906:354b:: with SMTP id s11mr3290585eja.460.1612359848460; Wed, 03 Feb 2021 05:44:08 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1612359848; cv=none; d=google.com; s=arc-20160816; b=oHt2OH9Rvz9BAYDfLR/7IXnzV/QXRjFpnqtWorjisbO6xPqpadM/2rz+gkQbeTV30a 4cR1Uk7IWetlg2I7kfOofx5wE6hXS3tv81FUm7Fm83QKIfBfmwihK0rI0Hq3dNC/gEFY 3DtT86SClNGmsmY7zIP04cgkRwrRmzrii0XfOAg6VqN/nRXt7Br0MWamiOJ2DCpM5Uy9 sjNDComThLvdr3ZMOQZ2H5yrApCGTJ8ahf4sUrWwTWM9WCXsAczaNQo3UXPN8qw98dOu g5Bns+Rhu7DtrWp+y9H1eO5Vl/1F3wJRMuMC7zNrljk4UlfcJqxTFG87ThoQckMJADa8 iARw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:subject:message-id:date:from:in-reply-to :references:mime-version; bh=zAG2ORrJOSGIc7ykt4Jaj4cXkoSE9ZmRI1sEKOlUH8E=; b=Ixxi2lzQATuFTN+plOBebTvvTImN0uW3HSkwumRXAhAGbkGcMRq0MgzUh883IwD6I7 edetKiv5vvAK6NY5yIS6d6xDLBx/mu0A3ST6RVkfW1n9UbRlA55UPMSD2GdPW6herFyp 5fK9LW+VFeHTDNdcIKhXinug3FQ4qZbbgo4InjdaezL9mq/itiuHjIgJa2m8eh1aLaif pC2mCj5mQwSBj64T1NVkSMrV26c/taXqJIDfygTNSiOe04vel3ooCrEyBVt/a9X/tUrw hd8Kb1OuguVOX8ZPInmaZvjEdlG51GrABYZSvIs8aOewhP1B6htHhsZI8/J6xhkeuHZ5 KM7g== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id lu13si1187149ejb.351.2021.02.03.05.43.42; Wed, 03 Feb 2021 05:44:08 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232144AbhBCNl4 (ORCPT + 99 others); Wed, 3 Feb 2021 08:41:56 -0500 Received: from mail-oo1-f45.google.com ([209.85.161.45]:32872 "EHLO mail-oo1-f45.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232138AbhBCNkz (ORCPT ); Wed, 3 Feb 2021 08:40:55 -0500 Received: by mail-oo1-f45.google.com with SMTP id u7so6020490ooq.0; Wed, 03 Feb 2021 05:40:39 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=zAG2ORrJOSGIc7ykt4Jaj4cXkoSE9ZmRI1sEKOlUH8E=; b=PhpseIQ7HbXBQPVzhJGTrmZsOPUGXkFxLjtqmdsonvWN1W4F0ODYkz2MrJ1fU4fpKL lvuyw6BVuvxSE493fTLJD8QUKdHrS5FG1tmlycFNMDzAd5/z03/iHRETertE7lpivV1H C7mahhx7N2iJ8KKFxE/+EHaZRLuzMHuMijgxpaX3Yj9FVx0a0tU28/NUqlGw7i6/zACM Mb+qFJAWn+WMam2YuFR4lJ6aN8oqIlbnAkd9WV6GpnSBshbILoa5h7dHjRRFNGLhE5tI KfpBoK4dqd5SKEGozATzV+7gHZllgouN0OYT6eTZmaZBd1dVBmejQxf7uEjt1tJKNldl rJsw== X-Gm-Message-State: AOAM532zfnnsP7CuFqvESHDNnBe7Vfvx9WIIAssoYXD+cHWPCTOcn9ux C0npWapnTiha6pct+bo155+jnqhJL3+bHfRHEK0= X-Received: by 2002:a4a:9873:: with SMTP id z48mr2107388ooi.44.1612359614153; Wed, 03 Feb 2021 05:40:14 -0800 (PST) MIME-Version: 1.0 References: <20210122204038.3238-1-ggherdovich@suse.cz> <20210122204038.3238-2-ggherdovich@suse.cz> <1612341564.3640.14.camel@suse.cz> In-Reply-To: <1612341564.3640.14.camel@suse.cz> From: "Rafael J. Wysocki" Date: Wed, 3 Feb 2021 14:40:01 +0100 Message-ID: Subject: Re: [PATCH v2 1/1] x86,sched: On AMD EPYC set freq_max = max_boost in schedutil invariant formula To: Giovanni Gherdovich Cc: "Rafael J. Wysocki" , Viresh Kumar , Borislav Petkov , Ingo Molnar , Peter Zijlstra , "Rafael J . Wysocki" , Jon Grimm , Nathan Fontenot , Yazen Ghannam , Thomas Lendacky , Suthikulpanit Suravee , Mel Gorman , Pu Wen , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Michael Larabel , "the arch/x86 maintainers" , Linux PM , Linux Kernel Mailing List , ACPI Devel Maling List Content-Type: text/plain; charset="UTF-8" Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Feb 3, 2021 at 9:39 AM Giovanni Gherdovich wrote: > > Hello, > > both Rafael and Viresh make a similar remark: why adding a new "max_boost" > variable, since "max_freq" is already available and could be used instead. > > Replying here to both. > > On Tue, 2021-02-02 at 20:26 +0100, Rafael J. Wysocki wrote: > > On Tue, Feb 2, 2021 at 7:59 PM Rafael J. Wysocki wrote: > > > > > > On Fri, Jan 22, 2021 at 9:47 PM Giovanni Gherdovich wrote: > > > > > > [cut] > > > > @@ -779,15 +829,25 @@ static int acpi_cpufreq_cpu_init(struct cpufreq_policy *policy) > > > > freq_table[valid_states-1].frequency / 1000) > > > > continue; > > > > > > > > + freq = perf->states[i].core_frequency * 1000; > > > > freq_table[valid_states].driver_data = i; > > > > - freq_table[valid_states].frequency = > > > > - perf->states[i].core_frequency * 1000; > > > > + freq_table[valid_states].frequency = freq; > > > > + > > > > + if (freq > max_freq) > > > > + max_freq = freq; > > > > + > > > > valid_states++; > > > > } > > > > freq_table[valid_states].frequency = CPUFREQ_TABLE_END; > > > > policy->freq_table = freq_table; > > > > perf->state = 0; > > > > > > > > + if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD && > > > > + amd_max_boost(max_freq, &max_boost)) { > > > > + policy->cpuinfo.max_boost = max_boost; > > > > > > Why not to set max_freq to max_boost instead? > > > > I mean, would setting the frequency in the last table entry to max_boost work? > > > > Alternatively, one more (artificial) entry with the frequency equal to > > max_boost could be added. > > On Wed, 2021-02-03 at 11:34 +0530, Viresh Kumar wrote: > > [cut] > > > > On 22-01-21, 21:40, Giovanni Gherdovich wrote: > > > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c > > > index 6931f0cdeb80..541f3db3f576 100644 > > > --- a/kernel/sched/cpufreq_schedutil.c > > > +++ b/kernel/sched/cpufreq_schedutil.c > > > @@ -159,8 +159,12 @@ static unsigned int get_next_freq(struct sugov_policy *sg_policy, > > > unsigned long util, unsigned long max) > > > { > > > struct cpufreq_policy *policy = sg_policy->policy; > > > - unsigned int freq = arch_scale_freq_invariant() ? > > > - policy->cpuinfo.max_freq : policy->cur; > > > + unsigned int freq, max_freq; > > > + > > > + max_freq = cpufreq_driver_has_max_boost() ? > > > + policy->cpuinfo.max_boost : policy->cpuinfo.max_freq; > > > > Also, can't we update max_freq itself from the cpufreq driver? What > > troubles will it cost ? > > I could add the max boost frequency to the frequency table (and > policy->cpuinfo.max_freq would follow), yes, but that would trigger the > following warning from acpi-cpufreq.c: > > static void acpi_cpufreq_cpu_ready(struct cpufreq_policy *policy) > { > struct acpi_processor_performance *perf = per_cpu_ptr(acpi_perf_data, > policy->cpu); > > if (perf->states[0].core_frequency * 1000 != policy->cpuinfo.max_freq) > pr_warn(FW_WARN "P-state 0 is not max freq\n"); > } This check can be changed, though. > so I thought that to stay out of troubles I'd supply a different variable, > max_boost, to be used only in the schedutil formula. Which is not necessary and the extra static branch is not necessary. Moreover, there is no reason whatsoever to believe that EPYC is the only affected processor model. If I'm not mistaken, the regression will be visible on every CPU where the scale invariance algorithm uses the max frequency greater than the max frequency used acpi_cpufreq. Also, AFAICS, it should be sufficient to modify acpi_cpufreq to remedy this for all of the affected CPUs, not just EPYC. > After schedutil figures out a desired next_freq then the usual comparison with the > firmware-supplied frequency table could take place. > > Altering the frequency table seemed more invasive because once a freq value is > in there, it's going to be actually requested by the driver to the platform. This need not be the case if the control structure for the new entry is copied from an existing one. > I only want my max_boost to stretch the range of schedutil's next_freq. Right, but that can be done in a different way which would be cleaner too IMO. I'm going to send an alternative patch to fix this problem. > On Tue, 2021-02-02 at 19:59 +0100, Rafael J. Wysocki wrote: > > > > [cut] > > Also notice that the static branch is global and the max_boost value > > for different CPUs may be different, at least in theory. > > In theory yes, but I'm guarding the code with two conditions: > > * vendor is X86_VENDOR_AMD > * cppc_get_perf_caps() returns success > > this identifies AMD EPYC cpus with model 7xx2 and later, where max_boost is > the same on all cores. I may have added synchronization so that only one cpu > sets the value, but I didn't in the interest of simplicity for an -rc patch > (I'd have to consider hotplug, the maxcpus= command line param, ecc). And what about the other potentially affected processors? I wouldn't worry about the -rc time frame too much. If we can do a better fix now, let's do it.