The Intel ASDM provides a maximum time window that can be specified when
setting a time window in the RAPL driver. While the ASDM doesn't explicitly
provide a minimum time window value, it does provide a minimum time window
unit that can also be used as a minimum value.
This patchset implements bounds checking for the time windows, and adds
reporting of a known bug in which the maximum time window value may be
erroneously set to 0, as well as a module parameter to skip the maximum
window check on broken BIOSes.
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Prarit Bhargava <[email protected]>
Cc: Radivoje Jovanovic <[email protected]>
Cc: Seiichi Ikarashi <[email protected]>
Cc: Mathias Krause <[email protected]>
Cc: Ajay Thomas <[email protected]>
Signed-off-by: Prarit Bhargava <[email protected]>
Prarit Bhargava (3):
  powercap, intel_rapl, implement get_max_time_window
  powercap, intel_rapl, implement check for minimum time window
  powercap, intel_rapl, Add ignore_max_window_check module parameter
    for broken BIOSes

 drivers/powercap/intel_rapl.c   | 51 +++++++++++++++++++++++++++++++++++++++
 drivers/powercap/powercap_sys.c |  6 +++--
 2 files changed, 55 insertions(+), 2 deletions(-)
--
1.7.9.3
The MSR_PKG_POWER_INFO register (Intel ASDM, section 14.9.3
"Package RAPL Domain") provides a maximum time window which the
system can support. This window is read-only and is currently
not examined when setting the time windows for the package.
This patch implements get_max_time_window_us() and checks the window when
a user attempts to set power capping for the package.
Before the patch it was possible to set the window to, for example, 10000
microseconds:
[root@intel-chiefriver-03 rhel7]# echo 10000 >
/sys/devices/virtual/powercap/intel-rapl/intel-rapl\:0/constraint_0_time_window_us;
egrep ^ /sys/devices/virtual/powercap/intel-rapl/intel-rapl\:0/constraint_0_time_window_us
/sys/devices/virtual/powercap/intel-rapl/intel-rapl:0/constraint_0_time_window_us:1:9765
but from 'turbostat -d', the package is limited to 976us:
cpu0: MSR_PKG_POWER_INFO: 0x01200168 (45 W TDP, RAPL 36 - 0 W, 0.000977 sec.)
(Note, there appears to be a rounding issue in turbostat which needs to
also be fixed. Looking at the values in the register it is clear the
value is 1/1024 = 976us.)
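As a rough illustration of the rounding difference noted above, the
following userspace C sketch converts a count of 1/1024-second units to
microseconds, once by truncation and once by rounding to nearest. The
helper names are hypothetical and are not part of turbostat or the RAPL
driver; the 1/1024 s unit is taken from the note above.

```c
#include <stdint.h>

/*
 * Hypothetical helper: convert `units` of 1/2^shift seconds to whole
 * microseconds, discarding the fractional part.
 */
static uint64_t units_to_us_trunc(uint64_t units, unsigned shift)
{
	return (units * 1000000ULL) >> shift;
}

/*
 * Same conversion, rounded to the nearest microsecond.
 * Assumes shift >= 1 so that half a unit can be represented.
 */
static uint64_t units_to_us_round(uint64_t units, unsigned shift)
{
	return (units * 1000000ULL + (1ULL << (shift - 1))) >> shift;
}
```

Under these assumptions one unit (shift = 10) is 976 us truncated and
977 us rounded, and ten units truncate to 9765 us, which would be
consistent with the 9765 read back in the "before" example.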
After the patch we are limited by the maximum time window:
[root@intel-chiefriver-03 rhel7]# echo 10000 >
/sys/devices/virtual/powercap/intel-rapl/intel-rapl\:0/constraint_0_time_window_us;
egrep ^ /sys/devices/virtual/powercap/intel-rapl/intel-rapl\:0/constraint_0_time_window_us
-bash: echo: write error: Invalid argument
/sys/devices/virtual/powercap/intel-rapl/intel-rapl:0/constraint_0_time_window_us:1:976
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Prarit Bhargava <[email protected]>
Cc: Radivoje Jovanovic <[email protected]>
Cc: Seiichi Ikarashi <[email protected]>
Cc: Mathias Krause <[email protected]>
Cc: Ajay Thomas <[email protected]>
Signed-off-by: Prarit Bhargava <[email protected]>
---
drivers/powercap/intel_rapl.c | 31 +++++++++++++++++++++++++++++++
drivers/powercap/powercap_sys.c | 6 ++++--
2 files changed, 35 insertions(+), 2 deletions(-)
diff --git a/drivers/powercap/intel_rapl.c b/drivers/powercap/intel_rapl.c
index cc97f08..f765b2c 100644
--- a/drivers/powercap/intel_rapl.c
+++ b/drivers/powercap/intel_rapl.c
@@ -493,13 +493,42 @@ static int get_current_power_limit(struct powercap_zone *power_zone, int id,
return ret;
}
+static int get_max_time_window(struct powercap_zone *power_zone, int id,
+ u64 *data)
+{
+ struct rapl_domain *rd;
+ int ret = 0;
+ u64 val;
+
+ get_online_cpus();
+ rd = power_zone_to_rapl_domain(power_zone);
+
+ if (rapl_read_data_raw(rd, MAX_TIME_WINDOW, true, &val))
+ ret = -EIO;
+ else
+ *data = val;
+
+ put_online_cpus();
+ return ret;
+}
+
static int set_time_window(struct powercap_zone *power_zone, int id,
u64 window)
{
struct rapl_domain *rd;
int ret = 0;
+ u64 max_window;
get_online_cpus();
+ ret = get_max_time_window(power_zone, id, &max_window);
+ if (ret < 0)
+ goto out;
+
+ if (window > max_window) {
+ ret = -EINVAL;
+ goto out;
+ }
+
rd = power_zone_to_rapl_domain(power_zone);
switch (rd->rpl[id].prim_id) {
case PL1_ENABLE:
@@ -511,6 +540,7 @@ static int set_time_window(struct powercap_zone *power_zone, int id,
default:
ret = -EINVAL;
}
+out:
put_online_cpus();
return ret;
}
@@ -590,6 +620,7 @@ static struct powercap_zone_constraint_ops constraint_ops = {
.set_time_window_us = set_time_window,
.get_time_window_us = get_time_window,
.get_max_power_uw = get_max_power,
+ .get_max_time_window_us = get_max_time_window,
.get_name = get_constraint_name,
};
diff --git a/drivers/powercap/powercap_sys.c b/drivers/powercap/powercap_sys.c
index 84419af..7d77b83 100644
--- a/drivers/powercap/powercap_sys.c
+++ b/drivers/powercap/powercap_sys.c
@@ -101,7 +101,7 @@ static ssize_t store_constraint_##_attr(struct device *dev,\
int err; \
u64 value; \
struct powercap_zone *power_zone = to_powercap_zone(dev); \
- int id; \
+ int id, ret; \
struct powercap_zone_constraint *pconst;\
\
if (!sscanf(dev_attr->attr.name, "constraint_%d_", &id)) \
@@ -113,8 +113,10 @@ static ssize_t store_constraint_##_attr(struct device *dev,\
if (err) \
return -EINVAL; \
if (pconst && pconst->ops && pconst->ops->set_##_attr) { \
- if (!pconst->ops->set_##_attr(power_zone, id, value)) \
+ ret = pconst->ops->set_##_attr(power_zone, id, value); \
+ if (!ret) \
return count; \
+ return ret; \
} \
\
return -ENODATA; \
--
1.7.9.3
Using a small value for the time window results in a
bogus value being reported for the time window. For example,
[root@intel-chiefriver-03 linux]# echo 950 >
/sys/devices/virtual/powercap/intel-rapl/intel-rapl\:0/constraint_0_time_window_us;
egrep ^ /sys/devices/virtual/powercap/intel-rapl/intel-rapl\:0/constraint_0_time_window_us
-bash: echo: write error: Invalid argument
/sys/devices/virtual/powercap/intel-rapl/intel-rapl:0/constraint_0_time_window_us:1:4501502475370496
The Intel ASDM doesn't explicitly define a minimum time window.
The MSR_RAPL_POWER_UNIT register, read during initialization, does
specify a minimum time window unit so that can be used as a lower
bound for error checking.
After this change, a window below the minimum is rejected with -EINVAL:
[root@intel-chiefriver-03 linux]# echo 950 >
/sys/devices/virtual/powercap/intel-rapl/intel-rapl\:0/constraint_0_time_window_us;
egrep ^ /sys/devices/virtual/powercap/intel-rapl/intel-rapl\:0/constraint_0_time_window_us
-bash: echo: write error: Invalid argument
/sys/devices/virtual/powercap/intel-rapl/intel-rapl:0/constraint_0_time_window_us:1:976
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Prarit Bhargava <[email protected]>
Cc: Radivoje Jovanovic <[email protected]>
Cc: Seiichi Ikarashi <[email protected]>
Cc: Mathias Krause <[email protected]>
Cc: Ajay Thomas <[email protected]>
Signed-off-by: Prarit Bhargava <[email protected]>
---
drivers/powercap/intel_rapl.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/drivers/powercap/intel_rapl.c b/drivers/powercap/intel_rapl.c
index f765b2c..14753e5 100644
--- a/drivers/powercap/intel_rapl.c
+++ b/drivers/powercap/intel_rapl.c
@@ -516,6 +516,7 @@ static int set_time_window(struct powercap_zone *power_zone, int id,
u64 window)
{
struct rapl_domain *rd;
+ struct rapl_package *rp;
int ret = 0;
u64 max_window;
@@ -524,12 +525,18 @@ static int set_time_window(struct powercap_zone *power_zone, int id,
if (ret < 0)
goto out;
- if (window > max_window) {
+ rd = power_zone_to_rapl_domain(power_zone);
+ rp = find_package_by_id(rd->package_id);
+ /*
+ * The Intel ASDM doesn't explicitly define a minimum time window.
+ * The MSR_RAPL_POWER_UNIT register, read during initialization,
+ * does contain the smallest unit of time that can be measured.
+ */
+ if ((window > max_window) || (window < rp->time_unit)) {
ret = -EINVAL;
goto out;
}
- rd = power_zone_to_rapl_domain(power_zone);
switch (rd->rpl[id].prim_id) {
case PL1_ENABLE:
rapl_write_data_raw(rd, TIME_WINDOW1, window);
--
1.7.9.3
Some systems erroneously set the maximum time window field of
MSR_PKG_POWER_INFO register to 0. This results in a user not being able
to set the time windows for the package. In some cases, however, RAPL
will still continue to work with a small window (albeit through some
trial and error). This patch adds an ignore_max_window_check module
parameter to avoid the maximum time window check in set_time_window().
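
The combined check can be sketched in userspace C as follows. This is a
simplified model and not the driver code: check_time_window() and its
parameters are hypothetical names, with min_unit standing in for the
package time unit and ignore_max for the module parameter.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of the window validation: reject a window below
 * the smallest time unit, and reject a window above the reported
 * maximum unless the caller asked to skip that check.
 */
static int check_time_window(uint64_t window, uint64_t min_unit,
			     uint64_t max_window, bool ignore_max)
{
	if (window < min_unit)
		return -EINVAL;
	if (!ignore_max && window > max_window)
		return -EINVAL;
	return 0;
}
```

With max_window == 0, as on the affected systems, every window at or
above min_unit is rejected unless ignore_max is set.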
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Prarit Bhargava <[email protected]>
Cc: Radivoje Jovanovic <[email protected]>
Cc: Seiichi Ikarashi <[email protected]>
Cc: Mathias Krause <[email protected]>
Cc: Ajay Thomas <[email protected]>
Signed-off-by: Prarit Bhargava <[email protected]>
---
drivers/powercap/intel_rapl.c | 15 ++++++++++++++-
1 file changed, 14 insertions(+), 1 deletion(-)
diff --git a/drivers/powercap/intel_rapl.c b/drivers/powercap/intel_rapl.c
index 14753e5..3cdb8ee 100644
--- a/drivers/powercap/intel_rapl.c
+++ b/drivers/powercap/intel_rapl.c
@@ -508,10 +508,22 @@ static int get_max_time_window(struct powercap_zone *power_zone, int id,
else
*data = val;
+ if (val == 0)
+ pr_warn_once(FW_BUG "intel_rapl: Maximum Time Window is zero. This is a BIOS bug that should be reported to your hardware or BIOS vendor. The value of zero may prevent Intel RAPL from functioning properly. Most bugs can be avoided by setting the ignore_max_window_check module parameter.\n");
+
put_online_cpus();
return ret;
}
+/* Some BIOSes incorrectly program the maximum time window in the
+ * MSR_PKG_POWER_INFO register. Some of these systems still have functional
+ * RAPL registers, etc., so give the user the option of disabling the maximum
+ * time window check.
+ */
+static int ignore_max_window_check;
+module_param(ignore_max_window_check, int, 0444);
+MODULE_PARM_DESC(ignore_max_window_check, "Ignore maximum window check. A bug should be reported to your hardware or BIOS vendor if this parameter is used.");
+
static int set_time_window(struct powercap_zone *power_zone, int id,
u64 window)
{
@@ -532,7 +544,8 @@ static int set_time_window(struct powercap_zone *power_zone, int id,
* The MSR_RAPL_POWER_UNIT register, read during initialization,
* does contain the smallest unit of time that can be measured.
*/
- if ((window > max_window) || (window < rp->time_unit)) {
+ if ((!ignore_max_window_check && (window > max_window)) ||
+ (window < rp->time_unit)) {
ret = -EINVAL;
goto out;
}
--
1.7.9.3
On 2015-12-15 22:02, Prarit Bhargava wrote:
> The MSR_PKG_POWER_INFO register (Intel ASDM, section 14.9.3
> "Package RAPL Domain") provides a maximum time window which the
> system can support. This window is read-only and is currently
> not examined when setting the time windows for the package.
I have had a question about this for a long time.
Maximum Time Window (bits 53:48) in MSR_PKG_POWER_INFO is only
6 bits wide, even though Time Window for Power Limit #1 (bits 23:17)
and Time Window for Power Limit #2 (bits 55:49) in MSR_PKG_POWER_LIMIT
are both 7 bits wide, not 6.
Do Intel guys have an answer for it?
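
To make the width mismatch concrete, here is a small userspace C sketch
that extracts an inclusive bit range from a 64-bit register value. The
extract_bits() helper is hypothetical and only illustrates the field
positions quoted above; it does not read any MSR.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: return bits hi..lo (inclusive) of a 64-bit
 * register value, shifted down to bit 0.
 */
static uint64_t extract_bits(uint64_t msr, unsigned hi, unsigned lo)
{
	assert(hi >= lo && hi < 64);

	unsigned width = hi - lo + 1;
	uint64_t mask = (width == 64) ? ~0ULL : ((1ULL << width) - 1);

	return (msr >> lo) & mask;
}
```

Applied to an all-ones value, bits 53:48 yield at most 0x3f while bits
23:17 and 55:49 yield at most 0x7f, so the raw limit fields can hold
values that the raw maximum field cannot.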
The patch itself looks good to me.
Just minor comments below:
> diff --git a/drivers/powercap/intel_rapl.c b/drivers/powercap/intel_rapl.c
> index cc97f08..f765b2c 100644
> --- a/drivers/powercap/intel_rapl.c
> +++ b/drivers/powercap/intel_rapl.c
> @@ -493,13 +493,42 @@ static int get_current_power_limit(struct powercap_zone *power_zone, int id,
> return ret;
> }
>
> +static int get_max_time_window(struct powercap_zone *power_zone, int id,
The 2nd arg "id" is not necessary.
> + u64 *data)
> +{
> + struct rapl_domain *rd;
> + int ret = 0;
> + u64 val;
> +
> + get_online_cpus();
> + rd = power_zone_to_rapl_domain(power_zone);
> +
> + if (rapl_read_data_raw(rd, MAX_TIME_WINDOW, true, &val))
rapl_read_data_raw() can return -EINVAL and -ENODEV in addition to -EIO.
> + ret = -EIO;
Is it OK to limit ret to -EIO here?
> + else
> + *data = val;
> +
> + put_online_cpus();
> + return ret;
> +}
> +
> static int set_time_window(struct powercap_zone *power_zone, int id,
> u64 window)
> {
> struct rapl_domain *rd;
> int ret = 0;
> + u64 max_window;
>
> get_online_cpus();
> + ret = get_max_time_window(power_zone, id, &max_window);
> + if (ret < 0)
> + goto out;
> +
> + if (window > max_window) {
> + ret = -EINVAL;
> + goto out;
> + }
> +
> rd = power_zone_to_rapl_domain(power_zone);
> switch (rd->rpl[id].prim_id) {
> case PL1_ENABLE:
> @@ -511,6 +540,7 @@ static int set_time_window(struct powercap_zone *power_zone, int id,
> default:
> ret = -EINVAL;
> }
> +out:
> put_online_cpus();
> return ret;
> }
> @@ -590,6 +620,7 @@ static struct powercap_zone_constraint_ops constraint_ops = {
> .set_time_window_us = set_time_window,
> .get_time_window_us = get_time_window,
> .get_max_power_uw = get_max_power,
> + .get_max_time_window_us = get_max_time_window,
> .get_name = get_constraint_name,
> };
>
> diff --git a/drivers/powercap/powercap_sys.c b/drivers/powercap/powercap_sys.c
> index 84419af..7d77b83 100644
> --- a/drivers/powercap/powercap_sys.c
> +++ b/drivers/powercap/powercap_sys.c
> @@ -101,7 +101,7 @@ static ssize_t store_constraint_##_attr(struct device *dev,\
> int err; \
> u64 value; \
> struct powercap_zone *power_zone = to_powercap_zone(dev); \
> - int id; \
> + int id, ret; \
> struct powercap_zone_constraint *pconst;\
> \
> if (!sscanf(dev_attr->attr.name, "constraint_%d_", &id)) \
> @@ -113,8 +113,10 @@ static ssize_t store_constraint_##_attr(struct device *dev,\
> if (err) \
> return -EINVAL; \
> if (pconst && pconst->ops && pconst->ops->set_##_attr) { \
> - if (!pconst->ops->set_##_attr(power_zone, id, value)) \
> + ret = pconst->ops->set_##_attr(power_zone, id, value); \
> + if (!ret) \
> return count; \
> + return ret; \
The opposite question to the one above.
Is it OK not to limit the return value to -EINVAL here?
Do you want this function to return -EIO or something?
> } \
> \
> return -ENODATA; \
>
On 2015-12-15 22:02, Prarit Bhargava wrote:
> Some systems erroneously set the maximum time window field of
> MSR_PKG_POWER_INFO register to 0. This results in a user not being able
> to set the time windows for the package. In some cases, however, RAPL
> will still continue to work with a small window (albeit through some
> trial and error). This patch adds an ignore_max_window_check module
> parameter to avoid the maximum time window check in set_time_window().
>
> Cc: "Rafael J. Wysocki" <[email protected]>
> Cc: Prarit Bhargava <[email protected]>
> Cc: Radivoje Jovanovic <[email protected]>
> Cc: Seiichi Ikarashi <[email protected]>
> Cc: Mathias Krause <[email protected]>
> Cc: Ajay Thomas <[email protected]>
> Signed-off-by: Prarit Bhargava <[email protected]>
> ---
> drivers/powercap/intel_rapl.c | 15 ++++++++++++++-
> 1 file changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/powercap/intel_rapl.c b/drivers/powercap/intel_rapl.c
> index 14753e5..3cdb8ee 100644
> --- a/drivers/powercap/intel_rapl.c
> +++ b/drivers/powercap/intel_rapl.c
> @@ -508,10 +508,22 @@ static int get_max_time_window(struct powercap_zone *power_zone, int id,
> else
> *data = val;
>
> + if (val == 0)
If rapl_read_data_raw() fails, "val" is left uninitialized.
So this check and warning should be performed only if rapl_read_data_raw() succeeds.
> + pr_warn_once(FW_BUG "intel_rapl: Maximum Time Window is zero. This is a BIOS bug that should be reported to your hardware or BIOS vendor. The value of zero may prevent Intel RAPL from functioning properly. Most bugs can be avoided by setting the ignore_max_window_check module parameter.\n");
> +
> put_online_cpus();
> return ret;
> }
>
> +/* Some BIOSes incorrectly program the maximum time window in the
> + * MSR_PKG_POWER_INFO register. Some of these systems still have functional
> + * RAPL registers, etc., so give the user the option of disabling the maximum
> + * time window check.
> + */
> +static int ignore_max_window_check;
> +module_param(ignore_max_window_check, int, 0444);
> +MODULE_PARM_DESC(ignore_max_window_check, "Ignore maximum window check. A bug should be reported to your hardware or BIOS vendor if this parameter is used.");
Don't you need to use "time_window" instead of just "window" in these names?
The Intel ASDM provides a maximum time window that can be specified when
setting a time window in the RAPL driver. While the ASDM doesn't explicitly
provide a minimum time window value, it does provide a minimum time window
unit that can also be used as a minimum value.
This patchset implements bounds checking for the time windows, and adds
reporting of a known bug in which the maximum time window value may be
erroneously set to 0, as well as a module parameter to skip the maximum
window check on broken BIOSes.
[v2]: update 3/3 with minor changes
Prarit Bhargava (3):
  powercap, intel_rapl, implement get_max_time_window
  powercap, intel_rapl, implement check for minimum time window
  powercap, intel_rapl, Add ignore_max_time_window_check module
    parameter for broken BIOSes

 drivers/powercap/intel_rapl.c   | 50 +++++++++++++++++++++++++++++++++++++++
 drivers/powercap/powercap_sys.c |  6 +++--
 2 files changed, 54 insertions(+), 2 deletions(-)
--
1.7.9.3
The MSR_PKG_POWER_INFO register (Intel ASDM, section 14.9.3
"Package RAPL Domain") provides a maximum time window which the
system can support. This window is read-only and is currently
not examined when setting the time windows for the package.
This patch implements get_max_time_window_us() and checks the window when
a user attempts to set power capping for the package.
Before the patch it was possible to set the window to, for example, 10000
microseconds:
[root@intel-chiefriver-03 rhel7]# echo 10000 >
/sys/devices/virtual/powercap/intel-rapl/intel-rapl\:0/constraint_0_time_window_us;
egrep ^ /sys/devices/virtual/powercap/intel-rapl/intel-rapl\:0/constraint_0_time_window_us
/sys/devices/virtual/powercap/intel-rapl/intel-rapl:0/constraint_0_time_window_us:1:9765
but from 'turbostat -d', the package is limited to 976us:
cpu0: MSR_PKG_POWER_INFO: 0x01200168 (45 W TDP, RAPL 36 - 0 W, 0.000977 sec.)
(Note, there appears to be a rounding issue in turbostat which needs to
also be fixed. Looking at the values in the register it is clear the
value is 1/1024 = 976us.)
After the patch we are limited by the maximum time window:
[root@intel-chiefriver-03 rhel7]# echo 10000 >
/sys/devices/virtual/powercap/intel-rapl/intel-rapl\:0/constraint_0_time_window_us;
egrep ^ /sys/devices/virtual/powercap/intel-rapl/intel-rapl\:0/constraint_0_time_window_us
-bash: echo: write error: Invalid argument
/sys/devices/virtual/powercap/intel-rapl/intel-rapl:0/constraint_0_time_window_us:1:976
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Prarit Bhargava <[email protected]>
Cc: Radivoje Jovanovic <[email protected]>
Cc: Seiichi Ikarashi <[email protected]>
Cc: Mathias Krause <[email protected]>
Cc: Ajay Thomas <[email protected]>
Signed-off-by: Prarit Bhargava <[email protected]>
---
drivers/powercap/intel_rapl.c | 31 +++++++++++++++++++++++++++++++
drivers/powercap/powercap_sys.c | 6 ++++--
2 files changed, 35 insertions(+), 2 deletions(-)
diff --git a/drivers/powercap/intel_rapl.c b/drivers/powercap/intel_rapl.c
index cc97f08..f765b2c 100644
--- a/drivers/powercap/intel_rapl.c
+++ b/drivers/powercap/intel_rapl.c
@@ -493,13 +493,42 @@ static int get_current_power_limit(struct powercap_zone *power_zone, int id,
return ret;
}
+static int get_max_time_window(struct powercap_zone *power_zone, int id,
+ u64 *data)
+{
+ struct rapl_domain *rd;
+ int ret = 0;
+ u64 val;
+
+ get_online_cpus();
+ rd = power_zone_to_rapl_domain(power_zone);
+
+ if (rapl_read_data_raw(rd, MAX_TIME_WINDOW, true, &val))
+ ret = -EIO;
+ else
+ *data = val;
+
+ put_online_cpus();
+ return ret;
+}
+
static int set_time_window(struct powercap_zone *power_zone, int id,
u64 window)
{
struct rapl_domain *rd;
int ret = 0;
+ u64 max_window;
get_online_cpus();
+ ret = get_max_time_window(power_zone, id, &max_window);
+ if (ret < 0)
+ goto out;
+
+ if (window > max_window) {
+ ret = -EINVAL;
+ goto out;
+ }
+
rd = power_zone_to_rapl_domain(power_zone);
switch (rd->rpl[id].prim_id) {
case PL1_ENABLE:
@@ -511,6 +540,7 @@ static int set_time_window(struct powercap_zone *power_zone, int id,
default:
ret = -EINVAL;
}
+out:
put_online_cpus();
return ret;
}
@@ -590,6 +620,7 @@ static struct powercap_zone_constraint_ops constraint_ops = {
.set_time_window_us = set_time_window,
.get_time_window_us = get_time_window,
.get_max_power_uw = get_max_power,
+ .get_max_time_window_us = get_max_time_window,
.get_name = get_constraint_name,
};
diff --git a/drivers/powercap/powercap_sys.c b/drivers/powercap/powercap_sys.c
index 84419af..7d77b83 100644
--- a/drivers/powercap/powercap_sys.c
+++ b/drivers/powercap/powercap_sys.c
@@ -101,7 +101,7 @@ static ssize_t store_constraint_##_attr(struct device *dev,\
int err; \
u64 value; \
struct powercap_zone *power_zone = to_powercap_zone(dev); \
- int id; \
+ int id, ret; \
struct powercap_zone_constraint *pconst;\
\
if (!sscanf(dev_attr->attr.name, "constraint_%d_", &id)) \
@@ -113,8 +113,10 @@ static ssize_t store_constraint_##_attr(struct device *dev,\
if (err) \
return -EINVAL; \
if (pconst && pconst->ops && pconst->ops->set_##_attr) { \
- if (!pconst->ops->set_##_attr(power_zone, id, value)) \
+ ret = pconst->ops->set_##_attr(power_zone, id, value); \
+ if (!ret) \
return count; \
+ return ret; \
} \
\
return -ENODATA; \
--
1.7.9.3
Using a small value for the time window results in a
bogus value being reported for the time window. For example,
[root@intel-chiefriver-03 linux]# echo 950 >
/sys/devices/virtual/powercap/intel-rapl/intel-rapl\:0/constraint_0_time_window_us;
egrep ^ /sys/devices/virtual/powercap/intel-rapl/intel-rapl\:0/constraint_0_time_window_us
-bash: echo: write error: Invalid argument
/sys/devices/virtual/powercap/intel-rapl/intel-rapl:0/constraint_0_time_window_us:1:4501502475370496
The Intel ASDM doesn't explicitly define a minimum time window.
The MSR_RAPL_POWER_UNIT register, read during initialization, does
specify a minimum time window unit so that can be used as a lower
bound for error checking.
After this change, a window below the minimum is rejected with -EINVAL:
[root@intel-chiefriver-03 linux]# echo 950 >
/sys/devices/virtual/powercap/intel-rapl/intel-rapl\:0/constraint_0_time_window_us;
egrep ^ /sys/devices/virtual/powercap/intel-rapl/intel-rapl\:0/constraint_0_time_window_us
-bash: echo: write error: Invalid argument
/sys/devices/virtual/powercap/intel-rapl/intel-rapl:0/constraint_0_time_window_us:1:976
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Prarit Bhargava <[email protected]>
Cc: Radivoje Jovanovic <[email protected]>
Cc: Seiichi Ikarashi <[email protected]>
Cc: Mathias Krause <[email protected]>
Cc: Ajay Thomas <[email protected]>
Signed-off-by: Prarit Bhargava <[email protected]>
---
drivers/powercap/intel_rapl.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/drivers/powercap/intel_rapl.c b/drivers/powercap/intel_rapl.c
index f765b2c..14753e5 100644
--- a/drivers/powercap/intel_rapl.c
+++ b/drivers/powercap/intel_rapl.c
@@ -516,6 +516,7 @@ static int set_time_window(struct powercap_zone *power_zone, int id,
u64 window)
{
struct rapl_domain *rd;
+ struct rapl_package *rp;
int ret = 0;
u64 max_window;
@@ -524,12 +525,18 @@ static int set_time_window(struct powercap_zone *power_zone, int id,
if (ret < 0)
goto out;
- if (window > max_window) {
+ rd = power_zone_to_rapl_domain(power_zone);
+ rp = find_package_by_id(rd->package_id);
+ /*
+ * The Intel ASDM doesn't explicitly define a minimum time window.
+ * The MSR_RAPL_POWER_UNIT register, read during initialization,
+ * does contain the smallest unit of time that can be measured.
+ */
+ if ((window > max_window) || (window < rp->time_unit)) {
ret = -EINVAL;
goto out;
}
- rd = power_zone_to_rapl_domain(power_zone);
switch (rd->rpl[id].prim_id) {
case PL1_ENABLE:
rapl_write_data_raw(rd, TIME_WINDOW1, window);
--
1.7.9.3
Some systems erroneously set the maximum time window field of
MSR_PKG_POWER_INFO register to 0. This results in a user not being able
to set the time windows for the package. In some cases, however, RAPL
will still continue to work with a small window (albeit through some
trial and error). This patch adds an ignore_max_time_window_check module
parameter to avoid the maximum time window check in set_time_window().
[v2]: change name to ignore_max_time_window_check, fix (val == 0) check
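
The (val == 0) fix can be illustrated with a userspace C sketch in which
the zero check runs only after a successful read. Everything here is a
hypothetical stand-in: read_fn models the raw register read, and the
zero_seen flag takes the place of the one-time warning.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical register read: returns 0 and fills *val on success. */
typedef int (*read_fn)(uint64_t *val);

static int get_max_window(read_fn read, uint64_t *data, bool *zero_seen)
{
	uint64_t val;

	*zero_seen = false;
	if (read(&val))
		return -EIO;	/* val was never written, so do not inspect it */

	*data = val;
	*zero_seen = (val == 0);	/* checked only on the success path */
	return 0;
}

/* Stub readers used to exercise both paths. */
static int read_fails(uint64_t *val) { (void)val; return -1; }
static int read_zero(uint64_t *val) { *val = 0; return 0; }
static int read_976(uint64_t *val) { *val = 976; return 0; }
```

On a failed read the output value is left untouched and no zero is
reported; a zero is flagged only when the read itself succeeded.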
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Prarit Bhargava <[email protected]>
Cc: Radivoje Jovanovic <[email protected]>
Cc: Seiichi Ikarashi <[email protected]>
Cc: Mathias Krause <[email protected]>
Cc: Ajay Thomas <[email protected]>
Signed-off-by: Prarit Bhargava <[email protected]>
---
drivers/powercap/intel_rapl.c | 18 +++++++++++++++---
1 file changed, 15 insertions(+), 3 deletions(-)
diff --git a/drivers/powercap/intel_rapl.c b/drivers/powercap/intel_rapl.c
index 14753e5..939026d 100644
--- a/drivers/powercap/intel_rapl.c
+++ b/drivers/powercap/intel_rapl.c
@@ -505,13 +505,24 @@ static int get_max_time_window(struct powercap_zone *power_zone, int id,
if (rapl_read_data_raw(rd, MAX_TIME_WINDOW, true, &val))
ret = -EIO;
- else
+ else {
*data = val;
-
+ if (val == 0)
+ pr_warn_once(FW_BUG "intel_rapl: Maximum Time Window is zero. This is a BIOS bug that should be reported to your hardware or BIOS vendor. The value of zero may prevent Intel RAPL from functioning properly. Most bugs can be avoided by setting the ignore_max_time_window_check module parameter.\n");
+ }
put_online_cpus();
return ret;
}
+/* Some BIOSes incorrectly program the maximum time window in the
+ * MSR_PKG_POWER_INFO register. Some of these systems still have functional
+ * RAPL registers, etc., so give the user the option of disabling the maximum
+ * time window check.
+ */
+static int ignore_max_time_window_check;
+module_param(ignore_max_time_window_check, int, 0444);
+MODULE_PARM_DESC(ignore_max_time_window_check, "Ignore maximum time window check. A bug should be reported to your hardware or BIOS vendor if this parameter is used.");
+
static int set_time_window(struct powercap_zone *power_zone, int id,
u64 window)
{
@@ -532,7 +543,8 @@ static int set_time_window(struct powercap_zone *power_zone, int id,
* The MSR_RAPL_POWER_UNIT register, read during initialization,
* does contain the smallest unit of time that can be measured.
*/
- if ((window > max_window) || (window < rp->time_unit)) {
+ if ((!ignore_max_time_window_check && (window > max_window)) ||
+ (window < rp->time_unit)) {
ret = -EINVAL;
goto out;
}
--
1.7.9.3
On 12/17/2015 12:45 AM, Seiichi Ikarashi wrote:
> On 2015-12-15 22:02, Prarit Bhargava wrote:
>> The MSR_PKG_POWER_INFO register (Intel ASDM, section 14.9.3
>> "Package RAPL Domain") provides a maximum time window which the
>> system can support. This window is read-only and is currently
>> not examined when setting the time windows for the package.
>
> I have had a question about this for a long time.
> Maximum Time Window (bits 53:48) in MSR_PKG_POWER_INFO is only
> 6 bits wide, even though Time Window for Power Limit #1 (bits 23:17)
> and Time Window for Power Limit #2 (bits 55:49) in MSR_PKG_POWER_LIMIT
> are both 7 bits wide, not 6.
While looking at the MSR settings I had exactly the same question! I too would
like to know the answer.
>
> Do Intel guys have an answer for it?
>
>
> The patch itself looks good to me.
> Just minor comments below:
>
>> diff --git a/drivers/powercap/intel_rapl.c b/drivers/powercap/intel_rapl.c
>> index cc97f08..f765b2c 100644
>> --- a/drivers/powercap/intel_rapl.c
>> +++ b/drivers/powercap/intel_rapl.c
>> @@ -493,13 +493,42 @@ static int get_current_power_limit(struct powercap_zone *power_zone, int id,
>> return ret;
>> }
>>
>> +static int get_max_time_window(struct powercap_zone *power_zone, int id,
>
> The 2nd arg "id" is not necessary.
I'll drop this in v2.
>
>> + u64 *data)
>> +{
>> + struct rapl_domain *rd;
>> + int ret = 0;
>> + u64 val;
>> +
>> + get_online_cpus();
>> + rd = power_zone_to_rapl_domain(power_zone);
>> +
>> + if (rapl_read_data_raw(rd, MAX_TIME_WINDOW, true, &val))
>
> rapl_read_data_raw() can return -EINVAL and -ENODEV in addition to -EIO.
>
>> + ret = -EIO;
>
> Is it OK to limit ret to -EIO here?
AFAICT it seems like it. The only error that can occur here (at least by the
time this code is executed) is that there is a range error. -EIO seems appropriate.
>
>> + else
>> + *data = val;
>> +
>> + put_online_cpus();
>> + return ret;
>> +}
>> +
>> static int set_time_window(struct powercap_zone *power_zone, int id,
>> u64 window)
>> {
>> struct rapl_domain *rd;
>> int ret = 0;
>> + u64 max_window;
>>
>> get_online_cpus();
>> + ret = get_max_time_window(power_zone, id, &max_window);
>> + if (ret < 0)
>> + goto out;
>> +
>> + if (window > max_window) {
>> + ret = -EINVAL;
>> + goto out;
>> + }
>> +
>> rd = power_zone_to_rapl_domain(power_zone);
>> switch (rd->rpl[id].prim_id) {
>> case PL1_ENABLE:
>> @@ -511,6 +540,7 @@ static int set_time_window(struct powercap_zone *power_zone, int id,
>> default:
>> ret = -EINVAL;
>> }
>> +out:
>> put_online_cpus();
>> return ret;
>> }
>> @@ -590,6 +620,7 @@ static struct powercap_zone_constraint_ops constraint_ops = {
>> .set_time_window_us = set_time_window,
>> .get_time_window_us = get_time_window,
>> .get_max_power_uw = get_max_power,
>> + .get_max_time_window_us = get_max_time_window,
>> .get_name = get_constraint_name,
>> };
>>
>> diff --git a/drivers/powercap/powercap_sys.c b/drivers/powercap/powercap_sys.c
>> index 84419af..7d77b83 100644
>> --- a/drivers/powercap/powercap_sys.c
>> +++ b/drivers/powercap/powercap_sys.c
>> @@ -101,7 +101,7 @@ static ssize_t store_constraint_##_attr(struct device *dev,\
>> int err; \
>> u64 value; \
>> struct powercap_zone *power_zone = to_powercap_zone(dev); \
>> - int id; \
>> + int id, ret; \
>> struct powercap_zone_constraint *pconst;\
>> \
>> if (!sscanf(dev_attr->attr.name, "constraint_%d_", &id)) \
>> @@ -113,8 +113,10 @@ static ssize_t store_constraint_##_attr(struct device *dev,\
>> if (err) \
>> return -EINVAL; \
>> if (pconst && pconst->ops && pconst->ops->set_##_attr) { \
>> - if (!pconst->ops->set_##_attr(power_zone, id, value)) \
>> + ret = pconst->ops->set_##_attr(power_zone, id, value); \
>> + if (!ret) \
>> return count; \
>> + return ret; \
>
> The opposite question to the one above.
> Is it OK not to limit the return value to -EINVAL here?
> Do you want this function to return -EIO or something?
In this case, no, because the define is used by other values. I think that
would limit all errors in the set_* functions to be -EIO.
P.
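
The return-value behaviour discussed above can be modelled with a short
userspace C sketch. It is not the powercap macro itself: store_value()
and set_op are hypothetical names, and the sketch only shows a shared
store path passing through whatever error the per-attribute setter
returns instead of collapsing every failure into one code.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical per-attribute setter: 0 on success, -errno on failure. */
typedef int (*set_op)(unsigned long long value);

static ssize_t store_value(set_op set, unsigned long long value,
			   size_t count)
{
	int ret;

	if (!set)
		return -ENODATA;	/* no setter registered for this attribute */

	ret = set(value);
	if (!ret)
		return (ssize_t)count;	/* success: report bytes consumed */
	return ret;			/* failure: propagate the setter's error */
}

/* Stub setters used to exercise each path. */
static int op_ok(unsigned long long v) { (void)v; return 0; }
static int op_einval(unsigned long long v) { (void)v; return -EINVAL; }
static int op_eio(unsigned long long v) { (void)v; return -EIO; }
```

Before patch 1/3, a failing setter fell through to the final
"return -ENODATA"; with the change, the caller sees the setter's own
error, such as -EINVAL for an out-of-range window.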
On 12/18/2015 12:50 AM, Seiichi Ikarashi wrote:
> On 2015-12-15 22:02, Prarit Bhargava wrote:
>> Some systems erroneously set the maximum time window field of
>> MSR_PKG_POWER_INFO register to 0. This results in a user not being able
>> to set the time windows for the package. In some cases, however, RAPL
>> will still continue to work with a small window (albeit through some
>> trial and error). This patch adds an ignore_max_window_check module
>> parameter to avoid the maximum time window check in set_time_window().
>>
>> Cc: "Rafael J. Wysocki" <[email protected]>
>> Cc: Prarit Bhargava <[email protected]>
>> Cc: Radivoje Jovanovic <[email protected]>
>> Cc: Seiichi Ikarashi <[email protected]>
>> Cc: Mathias Krause <[email protected]>
>> Cc: Ajay Thomas <[email protected]>
>> Signed-off-by: Prarit Bhargava <[email protected]>
>> ---
>> drivers/powercap/intel_rapl.c | 15 ++++++++++++++-
>> 1 file changed, 14 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/powercap/intel_rapl.c b/drivers/powercap/intel_rapl.c
>> index 14753e5..3cdb8ee 100644
>> --- a/drivers/powercap/intel_rapl.c
>> +++ b/drivers/powercap/intel_rapl.c
>> @@ -508,10 +508,22 @@ static int get_max_time_window(struct powercap_zone *power_zone, int id,
>> else
>> *data = val;
>>
>> + if (val == 0)
>
> If rapl_read_data_raw() fails, "val" is left uninitialized.
> So this check and warning should be performed only if rapl_read_data_raw() succeeds.
>
>> + pr_warn_once(FW_BUG "intel_rapl: Maximum Time Window is zero. This is a BIOS bug that should be reported to your hardware or BIOS vendor. The value of zero may prevent Intel RAPL from functioning properly. Most bugs can be avoided by setting the ignore_max_window_check module parameter.\n");
>> +
>> put_online_cpus();
>> return ret;
>> }
>>
>> +/* Some BIOSes incorrectly program the maximum time window in the
>> + * MSR_PKG_POWER_INFO register. Some of these systems still have functional
>> + * RAPL registers, etc., so give the user the option of disabling the maximum
>> + * time window check.
>> + */
>> +static int ignore_max_window_check;
>> +module_param(ignore_max_window_check, int, 0444);
>> +MODULE_PARM_DESC(ignore_max_window_check, "Ignore maximum window check. A bug should be reported to your hardware or BIOS vendor if this parameter is used.");
>
> Don't you need to use "time_window" instead of just "window" in these names?
Will submit a [v2] to cover the val error condition and s/window/time_window/ in
the module parameter description.
Rafael, I cannot remember ... will you take a 3/3 v2 or do you want me to repost
the whole thing as a v2?
P.
>
>