2010-11-15 16:07:27

by Len Brown

Subject: [PATCH RESEND] tools: add power/x86/x86_energy_perf_policy to program MSR_IA32_ENERGY_PERF_BIAS

From: Len Brown <[email protected]>

MSR_IA32_ENERGY_PERF_BIAS first became available on Westmere Xeon.
It is implemented in all Sandy Bridge processors -- mobile, desktop and server.
It is expected to become increasingly important in subsequent generations.

x86_energy_perf_policy is a user-space utility to set this
hardware energy vs performance policy hint in the processor.
Most systems would benefit from "x86_energy_perf_policy normal"
at system startup, as the hardware default is maximum performance
at the expense of energy efficiency. See the comments
in the source code for more information.

Linux-2.6.36 added "epb" to /proc/cpuinfo to indicate
if an x86 processor supports MSR_IA32_ENERGY_PERF_BIAS,
though the kernel does not actually program the MSR.

In March, Venkatesh Pallipadi proposed a small driver
that programmed MSR_IA32_ENERGY_PERF_BIAS, based on
the cpufreq governor in use. It also offered
a boot-time cmdline option to override.
http://lkml.org/lkml/2010/3/4/457
But hiding the hardware policy behind the
governor choice was deemed "kinda icky".

In June, I proposed a generic user/kernel API to
consolidate the power/performance policy trade-off.
"RFC: /sys/power/policy_preference"
http://lkml.org/lkml/2010/6/16/399
That is my preference for implementing this capability,
but I received no support on the list.

In September, I sent x86_energy_perf_policy.c to LKML,
a user-space utility that scribbles directly to the MSR.
http://lkml.org/lkml/2010/9/28/246

Here is the same utility re-sent, this time proposed
to reside in the kernel tools directory.

Signed-off-by: Len Brown <[email protected]>
---
tools/power/x86/x86_energy_perf_policy/Makefile | 7 +
.../x86_energy_perf_policy.c | 358 ++++++++++++++++++++
2 files changed, 365 insertions(+), 0 deletions(-)
create mode 100644 tools/power/x86/x86_energy_perf_policy/Makefile
create mode 100644 tools/power/x86/x86_energy_perf_policy/x86_energy_perf_policy.c

diff --git a/tools/power/x86/x86_energy_perf_policy/Makefile b/tools/power/x86/x86_energy_perf_policy/Makefile
new file mode 100644
index 0000000..b0763da
--- /dev/null
+++ b/tools/power/x86/x86_energy_perf_policy/Makefile
@@ -0,0 +1,7 @@
+x86_energy_perf_policy : x86_energy_perf_policy.c
+
+clean :
+ rm -f x86_energy_perf_policy
+
+install :
+ install x86_energy_perf_policy /usr/bin/x86_energy_perf_policy
diff --git a/tools/power/x86/x86_energy_perf_policy/x86_energy_perf_policy.c b/tools/power/x86/x86_energy_perf_policy/x86_energy_perf_policy.c
new file mode 100644
index 0000000..89394d9
--- /dev/null
+++ b/tools/power/x86/x86_energy_perf_policy/x86_energy_perf_policy.c
@@ -0,0 +1,358 @@
+/*
+ * x86_energy_perf_policy -- set the energy versus performance
+ * policy preference bias on recent X86 processors.
+ */
+/*
+ * Copyright (c) 2010, Intel Corporation.
+ * Len Brown <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+#include <stdio.h>
+#include <unistd.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <sys/resource.h>
+#include <fcntl.h>
+#include <signal.h>
+#include <sys/time.h>
+#include <stdlib.h>
+
+unsigned int verbose; /* set with -v */
+unsigned int read_only; /* set with -r */
+char *progname;
+unsigned long long new_bias;
+int cpu = -1;
+
+/*
+ * Usage:
+ *
+ * -c cpu: limit action to a single CPU (default is all CPUs)
+ * -v: verbose output (can invoke more than once)
+ * -r: read-only, don't change any settings
+ *
+ * performance
+ * Performance is paramount.
+ * Unwilling to sacrifice any performance
+ * for the sake of energy saving. (hardware default)
+ *
+ * normal
+ * Can tolerate minor performance compromise
+ * for potentially significant energy savings.
+ * (reasonable default for most desktops and servers)
+ *
+ * powersave
+ * Can tolerate significant performance hit
+ * to maximize energy savings.
+ *
+ * n
+ * a numerical value to write to the underlying MSR.
+ */
+void usage(void)
+{
+ printf("%s: [-c cpu] [-v] "
+ "(-r | 'performance' | 'normal' | 'powersave' | n)\n",
+ progname);
+}
+
+/*
+ * MSR_IA32_ENERGY_PERF_BIAS allows software to convey
+ * its policy for the relative importance of performance
+ * versus energy savings.
+ *
+ * The hardware uses this information in model-specific ways
+ * when it must choose trade-offs between performance and
+ * energy consumption.
+ *
+ * This policy hint does not supersede Processor Performance states
+ * (P-states) or CPU Idle power states (C-states), but allows
+ * software to have influence where it has been unable to
+ * express a preference in the past.
+ *
+ * For example, this setting may tell the hardware how
+ * aggressively or conservatively to control frequency
+ * in the "turbo range" above the explicitly OS-controlled
+ * P-state frequency range. It may also tell the hardware
+ * how aggressively it should enter the OS requested C-states.
+ *
+ * The support for this feature is indicated by CPUID.06H.ECX.bit3
+ * per the Intel Architectures Software Developer's Manual.
+ */
+
+#define MSR_IA32_ENERGY_PERF_BIAS 0x000001b0
+
+#define BIAS_PERFORMANCE 0
+#define BIAS_BALANCE 6
+#define BIAS_POWERSAVE 15
+
+cmdline(int argc, char **argv) {
+ int opt;
+
+ progname = argv[0];
+
+ while ((opt = getopt(argc, argv, "+rvc:")) != -1) {
+ switch (opt) {
+ case 'c':
+ cpu = atoi(optarg);
+ break;
+ case 'r':
+ read_only = 1;
+ break;
+ case 'v':
+ verbose++;
+ break;
+ default:
+ usage();
+ exit(-1);
+ }
+ }
+ /* if -r, then should be no additional optind */
+ if (read_only && (argc > optind)) {
+ usage();
+ exit(-1);
+ }
+
+ /*
+ * if no -r , then must be one additional optind
+ */
+ if (!read_only) {
+
+ if (argc != optind + 1) {
+ printf("must supply -r or policy param\n");
+ usage();
+ exit(-1);
+ }
+
+ if (!strcmp("performance", argv[optind])) {
+ new_bias = BIAS_PERFORMANCE;
+ } else if (!strcmp("normal", argv[optind])) {
+ new_bias = BIAS_BALANCE;
+ } else if (!strcmp("powersave", argv[optind])) {
+ new_bias = BIAS_POWERSAVE;
+ } else {
+ new_bias = atoll(argv[optind]);
+ if (new_bias > BIAS_POWERSAVE) {
+ usage();
+ exit(-1);
+ }
+ }
+ }
+}
+
+/*
+ * validate_cpuid()
+ * returns on success, quietly exits on failure (make verbose with -v)
+ */
+void validate_cpuid(void)
+{
+ unsigned int eax, ebx, ecx, edx, max_level;
+ char brand[16];
+ unsigned int fms, family, model, stepping, ht_capable;
+
+ eax = ebx = ecx = edx = 0;
+
+ asm("cpuid" : "=a" (max_level), "=b" (ebx), "=c" (ecx),
+ "=d" (edx) : "a" (0));
+
+ sprintf(brand, "%.4s%.4s%.4s", &ebx, &edx, &ecx);
+
+ if (strncmp(brand, "GenuineIntel", 12)) {
+ if (verbose)
+ printf("CPUID: %s != GenuineIntel\n", brand);
+ exit(-1);
+ }
+
+ asm("cpuid" : "=a" (fms), "=c" (ecx), "=d" (edx) : "a" (1) : "ebx");
+ family = (fms >> 8) & 0xf;
+ model = (fms >> 4) & 0xf;
+ stepping = fms & 0xf;
+ if (family == 6 || family == 0xf)
+ model += ((fms >> 16) & 0xf) << 4;
+
+ if (verbose > 1)
+ printf("CPUID %s %d levels family:model:stepping "
+ "0x%x:%x:%x (%d:%d:%d)\n", brand, max_level,
+ family, model, stepping, family, model, stepping);
+
+ if (!(edx & (1 << 5))) {
+ if (verbose)
+ printf("CPUID: no MSR\n");
+ exit(-1);
+ }
+
+ /*
+ * Support for MSR_IA32_ENERGY_PERF_BIAS
+ * is indicated by CPUID.06H.ECX.bit3
+ */
+ asm("cpuid" : "=a" (eax), "=b" (ebx), "=c" (ecx), "=d" (edx) : "a" (6));
+ if (verbose)
+ printf("CPUID.06H.ECX: 0x%x\n", ecx);
+ if (!(ecx & (1 << 3))) {
+ if (verbose)
+ printf("CPUID: No MSR_IA32_ENERGY_PERF_BIAS\n");
+ exit(-1);
+ }
+ return; /* success */
+}
+
+check_dev_msr() {
+ struct stat sb;
+
+ if (stat("/dev/cpu/0/msr", &sb)) {
+ printf("no /dev/cpu/0/msr\n");
+ printf("Try \"# modprobe msr\"\n");
+ exit(-5);
+ }
+}
+
+unsigned long long get_msr(int cpu, int offset)
+{
+ unsigned long long msr;
+ char msr_path[32];
+ int retval;
+ int fd;
+
+ sprintf(msr_path, "/dev/cpu/%d/msr", cpu);
+ fd = open(msr_path, O_RDONLY);
+ if (fd < 0) {
+ perror(msr_path);
+ exit(-1);
+ }
+
+ retval = pread(fd, &msr, sizeof msr, offset);
+
+ if (retval != sizeof msr) {
+ printf("pread cpu%d 0x%x = %d\n", cpu, offset, retval);
+ exit(-2);
+ }
+ close(fd);
+ return msr;
+}
+
+unsigned long long put_msr(int cpu, unsigned long long new_msr, int offset)
+{
+ unsigned long long old_msr;
+ char msr_path[32];
+ int retval;
+ int fd;
+
+ sprintf(msr_path, "/dev/cpu/%d/msr", cpu);
+ fd = open(msr_path, O_RDWR);
+ if (fd < 0) {
+ perror(msr_path);
+ exit(-1);
+ }
+
+ retval = pread(fd, &old_msr, sizeof old_msr, offset);
+ if (retval != sizeof old_msr) {
+ perror("pread");
+ printf("pread cpu%d 0x%x = %d\n", cpu, offset, retval);
+ exit(-2);
+ }
+
+ retval = pwrite(fd, &new_msr, sizeof new_msr, offset);
+ if (retval != sizeof new_msr) {
+ perror("pwrite");
+ printf("pwrite cpu%d 0x%x = %d\n", cpu, offset, retval);
+ exit(-2);
+ }
+
+ close(fd);
+
+ return old_msr;
+}
+
+void print_msr(int cpu)
+{
+ printf("cpu%d: 0x%016llx\n",
+ cpu, get_msr(cpu, MSR_IA32_ENERGY_PERF_BIAS));
+}
+
+void update_msr(int cpu)
+{
+ unsigned long long previous_msr;
+
+ previous_msr = put_msr(cpu, new_bias, MSR_IA32_ENERGY_PERF_BIAS);
+
+ if (verbose)
+ printf("cpu%d msr0x%x 0x%016llx -> 0x%016llx\n",
+ cpu, MSR_IA32_ENERGY_PERF_BIAS, previous_msr, new_bias);
+
+ return;
+}
+
+char *proc_stat = "/proc/stat";
+/*
+ * run func() on every cpu in /dev/cpu
+ */
+void for_every_cpu(void (func)(int))
+{
+ FILE *fp;
+ int cpu_count;
+ int retval;
+
+ fp = fopen(proc_stat, "r");
+ if (fp == NULL) {
+ perror(proc_stat);
+ exit(-1);
+ }
+
+ retval = fscanf(fp, "cpu %*d %*d %*d %*d %*d %*d %*d %*d %*d %*d\n");
+ if (retval != 0) {
+ perror("/proc/stat format");
+ exit(-1);
+ }
+
+ for (cpu_count = 0; ; cpu_count++) {
+ int cpu;
+
+ retval = fscanf(fp,
+ "cpu%u %*d %*d %*d %*d %*d %*d %*d %*d %*d %*d\n",
+ &cpu);
+ if (retval != 1)
+ break;
+
+ func(cpu);
+ }
+ fclose(fp);
+}
+
+int main(int argc, char **argv)
+{
+ cmdline(argc, argv);
+
+ if (verbose > 1)
+ printf("x86_energy_perf_policy Aug 2, 2010"
+ " - Len Brown <[email protected]>\n");
+ if (verbose > 1 && !read_only)
+ printf("new_bias %lld\n", new_bias);
+
+ validate_cpuid();
+ check_dev_msr();
+
+ if (cpu != -1) {
+ if (read_only)
+ print_msr(cpu);
+ else
+ update_msr(cpu);
+ } else {
+ if (read_only)
+ for_every_cpu(print_msr);
+ else
+ for_every_cpu(update_msr);
+ }
+
+ return 0;
+}
--
1.7.3.1.127.g1bb28


2010-11-17 11:35:55

by Andi Kleen

Subject: Re: [PATCH RESEND] tools: add power/x86/x86_energy_perf_policy to program MSR_IA32_ENERGY_PERF_BIAS

Len Brown <[email protected]> writes:
> @@ -0,0 +1,7 @@
> +x86_energy_perf_policy : x86_energy_perf_policy.c
> +
> +clean :
> + rm -f x86_energy_perf_policy
> +
> +install :
> + install x86_energy_perf_policy /usr/bin/x86_energy_perf_policy

It's not clear to me how this Makefile ensures it's only
built on x86.

If someone on another architecture does a full tools build
in the future (I think that is not wired up yet, but should
eventually) such a mechanism would be needed.


> +
> +/*
> + * Usage:

...

This full comment and parts of the following comments describing the
semantics need to be available somewhere to the user who may not have
easy access to the source. Can you make it display in usage or convert
it to a manpage? I would prefer a manpage

> +
> +cmdline(int argc, char **argv) {

No type?

> + int opt;
> +
> + progname = argv[0];
> +
> + while ((opt = getopt(argc, argv, "+rvc:")) != -1) {

Maybe it's me, but I prefer having long options too (getopt_long)
These are easier to memorize.
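
For reference, a minimal sketch of what a getopt_long() variant could look like. The long names --cpu, --read and --verbose, the struct opts container and the parse_opts() helper are assumptions made for illustration; the patch itself defines only -c, -r and -v.

```c
#include <assert.h>
#include <getopt.h>
#include <stdlib.h>

/* Hypothetical container for the three options the patch supports. */
struct opts {
	int cpu;		/* -c / --cpu, -1 means all CPUs */
	int read_only;		/* -r / --read */
	int verbose;		/* -v / --verbose, may repeat */
};

/*
 * parse_opts() is an illustrative helper, not part of the patch.
 * Returns the index of the first non-option argument, or -1 on a
 * bad option.
 */
int parse_opts(int argc, char **argv, struct opts *o)
{
	/* Long names are assumptions; each maps to an existing short option. */
	static const struct option longopts[] = {
		{ "cpu",     required_argument, NULL, 'c' },
		{ "read",    no_argument,       NULL, 'r' },
		{ "verbose", no_argument,       NULL, 'v' },
		{ NULL, 0, NULL, 0 }
	};
	int opt;

	assert(o != NULL);
	o->cpu = -1;
	o->read_only = 0;
	o->verbose = 0;
	optind = 1;	/* start scanning at the first argument */

	while ((opt = getopt_long(argc, argv, "+rvc:", longopts, NULL)) != -1) {
		switch (opt) {
		case 'c':
			o->cpu = atoi(optarg);
			break;
		case 'r':
			o->read_only = 1;
			break;
		case 'v':
			o->verbose++;
			break;
		default:
			return -1;
		}
	}
	return optind;
}
```

The short option string is unchanged, so existing invocations would keep working.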

> +
> + /*
> + * if no -r , then must be one additional optind
> + */
> + if (!read_only) {
> +
> + if (argc != optind + 1) {
> + printf("must supply -r or policy param\n");
> + usage();
> + exit(-1);

-1 is an unusual exit code. Better use 1.

An obvious improvement would be to put the exit() into usage()

> + }
> +
> + if (!strcmp("performance", argv[optind])) {
> + new_bias = BIAS_PERFORMANCE;
> + } else if (!strcmp("normal", argv[optind])) {
> + new_bias = BIAS_BALANCE;
> + } else if (!strcmp("powersave", argv[optind])) {
> + new_bias = BIAS_POWERSAVE;
> + } else {
> + new_bias = atoll(argv[optind]);

If you used strtoull() you could actually check if the input
is really a number (end == argv[optind])
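
A minimal sketch of the strtoull() check described above. The parse_bias() helper is hypothetical; it also rejects trailing junk and out-of-range values, which goes slightly beyond the (end == argv[optind]) test.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define BIAS_POWERSAVE 15	/* same upper bound as in the patch */

/*
 * parse_bias() is an illustrative helper, not part of the patch.
 * Returns 0 and stores the value on success, -1 on invalid input.
 */
int parse_bias(const char *arg, unsigned long long *bias)
{
	unsigned long long val;
	char *end;

	assert(arg != NULL && bias != NULL);

	errno = 0;
	val = strtoull(arg, &end, 0);

	if (end == arg)		/* no digits were consumed */
		return -1;
	if (*end != '\0')	/* trailing junk such as "6x" */
		return -1;
	if (errno == ERANGE || val > BIAS_POWERSAVE)
		return -1;

	*bias = val;
	return 0;
}
```

With base 0, strtoull() accepts decimal, octal and hex input, so "0xf" and "15" name the same bias.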

> + eax = ebx = ecx = edx = 0;
> +
> + asm("cpuid" : "=a" (max_level), "=b" (ebx), "=c" (ecx),
> + "=d" (edx) : "a" (0));

Strictly for 386/early 486 you would need to check if cpuid
is available using pushf too. Perhaps it's safer to use cpuinfo
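
A rough sketch of the cpuinfo route, relying on the "epb" flag that the patch description says Linux-2.6.36 added to /proc/cpuinfo. Both helper names are hypothetical, and the fixed 4096-byte line buffer is an assumption that a flags line fits in one read.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/*
 * flags_have_epb() is an illustrative helper: it looks for the
 * whole-word token "epb" on one "flags" line from /proc/cpuinfo.
 */
int flags_have_epb(const char *line)
{
	const char *p = line;

	assert(line != NULL);
	if (strncmp(line, "flags", 5) != 0)
		return 0;

	while ((p = strstr(p, "epb")) != NULL) {
		int start_ok = (p == line) || isspace((unsigned char)p[-1]);
		int end_ok = (p[3] == '\0') || isspace((unsigned char)p[3]);

		if (start_ok && end_ok)
			return 1;
		p += 3;		/* substring of another flag, keep looking */
	}
	return 0;
}

/* Returns 1 if any CPU lists epb, 0 if none does, -1 if path is unreadable. */
int cpuinfo_has_epb(const char *path)
{
	char line[4096];	/* assumed large enough for one flags line */
	int found = 0;
	FILE *fp;

	assert(path != NULL);
	fp = fopen(path, "r");
	if (fp == NULL)
		return -1;

	while (!found && fgets(line, sizeof(line), fp))
		found = flags_have_epb(line);

	fclose(fp);
	return found;
}
```

A caller would pass "/proc/cpuinfo" as the path; no CPUID instruction is executed, so nothing can fault on a pre-CPUID processor.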

> +
> +check_dev_msr() {

Return type missing again

> + struct stat sb;
> +
> + if (stat("/dev/cpu/0/msr", &sb)) {
> + printf("no /dev/cpu/0/msr\n");

This will fail if we eventually implement cpu 0 hotplug...
Better readdir or similar.

> + printf("Try \"# modprobe msr\"\n");
> + exit(-5);

Again -5 is unusual.


> + char msr_path[32];
> + int retval;
> + int fd;
> +
> + sprintf(msr_path, "/dev/cpu/%d/msr", cpu);
> + fd = open(msr_path, O_RDONLY);
> + if (fd < 0) {
> + perror(msr_path);
> + exit(-1);

This should be a soft error because the CPU can go away
any time.
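
One way to picture a "soft error" here, as a sketch only: report a vanished CPU to the caller instead of exiting. The read_msr_soft() helper and its return codes are hypothetical, and treating ENOENT and ENXIO as "CPU gone" is an assumption about how the msr device reports an offline or removed CPU.

```c
#define _XOPEN_SOURCE 700	/* for pread() */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical return codes for the sketch below. */
enum { MSR_OK = 0, MSR_CPU_GONE = 1, MSR_FAILED = -1 };

/*
 * read_msr_soft() is an illustrative helper, not part of the patch.
 * A missing or offline CPU is reported as MSR_CPU_GONE so the caller
 * can skip it; anything else unexpected is MSR_FAILED.
 */
int read_msr_soft(const char *msr_path, off_t offset, unsigned long long *val)
{
	ssize_t n;
	int fd;

	assert(msr_path != NULL && val != NULL);

	fd = open(msr_path, O_RDONLY);
	if (fd < 0) {
		/* Assumption: these errno values mean the CPU went away. */
		if (errno == ENOENT || errno == ENXIO)
			return MSR_CPU_GONE;
		return MSR_FAILED;
	}

	n = pread(fd, val, sizeof(*val), offset);
	close(fd);

	return n == (ssize_t)sizeof(*val) ? MSR_OK : MSR_FAILED;
}
```

A per-CPU loop could then continue past MSR_CPU_GONE and exit only on MSR_FAILED.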


> +/*
> + * run func() on every cpu in /dev/cpu
> + */
> +void for_every_cpu(void (func)(int))
> +{
> + FILE *fp;
> + int cpu_count;
> + int retval;
> +
> + fp = fopen(proc_stat, "r");

Using /proc/stat to get the number of CPUs is unusual
and you don't handle holes in the cpu numbers which
can happen due to hotplug.

I would just readdir or fnmatch the MSR /dev/cpu/* directories.
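
A minimal sketch of the readdir approach. for_each_cpu_dir() and show_cpu() are hypothetical names; the directory is a parameter so the same code can be pointed at /dev/cpu or at a test directory.

```c
#include <assert.h>
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>

/* Example callback, for illustration only. */
void show_cpu(int cpu)
{
	printf("cpu%d\n", cpu);
}

/*
 * for_each_cpu_dir() is an illustrative alternative to the patch's
 * for_every_cpu(): it calls func(cpu) for every all-digit entry in
 * dir (normally "/dev/cpu"). Returns the number of CPUs visited, or
 * -1 if the directory cannot be opened.
 */
int for_each_cpu_dir(const char *dir, void (*func)(int))
{
	struct dirent *de;
	int count = 0;
	DIR *d;

	assert(dir != NULL && func != NULL);

	d = opendir(dir);
	if (d == NULL)
		return -1;

	while ((de = readdir(d)) != NULL) {
		const char *p = de->d_name;

		while (isdigit((unsigned char)*p))
			p++;
		/* Skip ".", ".." and any non-numeric entry. */
		if (p == de->d_name || *p != '\0')
			continue;

		func(atoi(de->d_name));
		count++;
	}

	closedir(d);
	return count;
}
```

Because only entries that actually exist are visited, holes in the CPU numbering need no special handling. Note that readdir() does not return entries in numeric order.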

-Andi
--
[email protected] -- Speaking for myself only.

2010-11-22 20:13:48

by Len Brown

Subject: Re: [PATCH RESEND] tools: add power/x86/x86_energy_perf_policy to program MSR_IA32_ENERGY_PERF_BIAS

Hi Andi,

Thank you for the review!

responses below.

> > +install :
> > + install x86_energy_perf_policy /usr/bin/x86_energy_perf_policy
>
> It's not clear to me how this Makefile ensures it's only
> built on x86.
>
> If someone on another architecture does a full tools build
> in the future (I think that is not wired up yet, but should
> eventually) such a mechanism would be needed.

Per the comments from Andrew and others, the concept of a
"full tools build" doesn't actually exist (yet).

So I guess the only assurance that somebody not on x86 won't run
make in this directory is that this utility lives in tools/power/x86/

Note that there are other utilities under tools
which have no Makefile at all...

> ...I would prefer a manpage

I'll be happy to write a manpage.
Is there a good example I should follow?

> > +cmdline(int argc, char **argv) {
>
> No type?

okay, now void.

> > + while ((opt = getopt(argc, argv, "+rvc:")) != -1) {
>
> Maybe it's me, but I prefer having long options too (getopt_long)
> These are easier to memorize.

I'm not inclined to bother, as the use-case for this utility
is to be invoked by another program, and the options available
are really there just for verification/debugging, and don't
really merit being memorized by a human after that task.

> An obvious improvement would be to put the exit() into usage()

done.

> > + new_bias = atoll(argv[optind]);
>
> If you used strtoull() you could actually check if the input
> is really a number (end == argv[optind])

done.

> > + asm("cpuid" : "=a" (max_level), "=b" (ebx), "=c" (ecx),
> > + "=d" (edx) : "a" (0));
>
> Strictly for 386/early 486 you would need to check if cpuid
> is available using pushf too. Perhaps it's safer to use cpuinfo

Meh, maybe simpler to crash on 486 and earlier?:-)
I'm not fond of parsing /proc/cpuinfo.

> > +check_dev_msr() {
>
> Return type missing again

routine deleted.

> > + struct stat sb;
> > +
> > + if (stat("/dev/cpu/0/msr", &sb)) {
> > + printf("no /dev/cpu/0/msr\n");
>
> This will fail if we eventually implement cpu 0 hotplug...
> Better readdir or similar.

simpler to delete check_dev_msr() and stumble forward
assuming /dev/cpu/*/msr exists, and print a message and
exit if it doesn't.

> > + printf("Try \"# modprobe msr\"\n");
> > + exit(-5);
>
> Again -5 is unusual.

okay, I changed all the exits to 1.

> > + sprintf(msr_path, "/dev/cpu/%d/msr", cpu);
> > + fd = open(msr_path, O_RDONLY);
> > + if (fd < 0) {
> > + perror(msr_path);
> > + exit(-1);
>
> This should be a soft error because the CPU can go away
> any time.

In the highly unlikely scenario that somebody uses
the -r option to exercise the read-only code,
and simultaneously invokes and completes a cpu hot remove
during the execution of this utility,
I think the utility exiting is just as useful as,
and less complicated than, handling a soft error.
Since in either case, the user would probably
simply re-invoke the utility to see what the
current state of the settled machine is.

> > +/*
> > + * run func() on every cpu in /dev/cpu
> > + */
...
> > + fp = fopen(proc_stat, "r");
>
> Using /proc/stat to get the number of CPUs is unusual
> and you don't handle holes in the cpu numbers which
> can happen due to hotplug.

The code does handle holes in cpu number namespace.

The "cpu_count" variable was a hold-over from
an older version that did not, and so I've deleted it.
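
To illustrate why reading the number out of each cpuN line tolerates holes, here is a small sketch. It is not the patch's code: scan_stat_cpus() is a hypothetical helper that collects CPU numbers from a /proc/stat-style stream instead of invoking a callback.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * scan_stat_cpus() is an illustrative helper. It stores the number
 * from every "cpuN ..." line into cpus[] (at most max entries) and
 * returns how many were found. The aggregate "cpu  ..." line is
 * skipped because no digit follows "cpu" directly.
 */
int scan_stat_cpus(FILE *fp, int *cpus, int max)
{
	char line[512];
	int n = 0;

	assert(fp != NULL && cpus != NULL);

	while (n < max && fgets(line, sizeof(line), fp)) {
		if (strncmp(line, "cpu", 3) != 0)
			continue;
		if (!isdigit((unsigned char)line[3]))
			continue;
		/* The number comes from the line itself, not from a counter. */
		cpus[n++] = atoi(line + 3);
	}
	return n;
}
```

Given lines for cpu0, cpu2 and cpu3, the result is {0, 2, 3}: the missing cpu1 is simply never seen.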

> I would just readdir or fnmatch the MSR /dev/cpu/* directories.

I used to do that, but Arjan convinced me to use /proc/stat.
turbostat, rdmsr, and wrmsr all use /proc/stat.

thanks,
-Len Brown, Intel Open Source Technology Center



2010-11-22 20:34:04

by Andi Kleen

Subject: Re: [PATCH RESEND] tools: add power/x86/x86_energy_perf_policy to program MSR_IA32_ENERGY_PERF_BIAS

On Mon, Nov 22, 2010 at 03:13:24PM -0500, Len Brown wrote:
> Per the comments from Andrew and others, the concept of a
> "full tools build" doesn't actually exist (yet).
>
> So I guess the only assurance that somebody not on x86 won't run
> make in this directory is that this utility lives in tools/power/x86/
>
> Note that there are other utilities under tools
> which have no Makefile at all...

I suspect this will need to be fixed at some point.

e.g. kernel rpms probably don't want to hard code all of this
but just call some standard make file target. And the kernel
eventually needs a make install_user or similar.

>
> > ...I would prefer a manpage
>
> I'll be happy to write a manpage.
> Is there a good example I should follow?

Just pick one from /usr/share/man. You can grep for my
name if you want one written by me, but I don't claim they are
necessarily better than others @)

> I'm not inclined to bother, as the use-case for this utility
> is to be invoked by another program, and the options available

What other program?

I could well imagine administrators sticking this
into their boot.locals to set the policy they want.

> In the highly unlikely scenario that somebody uses
> the -r option to exercise the read-only code,
> and simultaneously invokes and completes a cpu hot remove

FWIW there are setups where core offlining can happen
automatically in response to an error.

-Andi
--
[email protected] -- Speaking for myself only.

2010-11-23 04:48:50

by Len Brown

Subject: Re: [PATCH RESEND] tools: add power/x86/x86_energy_perf_policy to program MSR_IA32_ENERGY_PERF_BIAS

On Mon, 22 Nov 2010, Andi Kleen wrote:

> On Mon, Nov 22, 2010 at 03:13:24PM -0500, Len Brown wrote:
> > Per the comments from Andrew and others, the concept of a
> > "full tools build" doesn't actually exist (yet).
> >
> > So I guess the only assurance that somebody not on x86 won't run
> > make in this directory is that this utility lives in tools/power/x86/
> >
> > Note that there are other utilities under tools
> > which have no Makefile at all...
>
> I suspect this will need to be fixed at some point.
>
> e.g. kernel rpms probably don't want to hard code all of this
> but just call some standard make file target. And the kernel
> eventually needs a make install_user or similar.

I agree, but I don't volunteer to set up such
a build system as part of this particular patch.
As I mentioned, supplying any Makefile is
a step better than some of the peers...

> > I'm not inclined to bother, as the use-case for this utility
> > is to be invoked by another program, and the options available
>
> What other program?
>
> I could well imagine administrators sticking this
> into their boot.locals to set the policy they want.

right, and that would be a program.
It is unlikely that users are going to be typing this
command, except into an admin script.

> > In the highly unlikely scenario that somebody uses
> > the -r option to exercise the read-only code,
> > and simultaneously invokes and completes a cpu hot remove
>
> FWIW there are setups where core offlining can happen
> automatically in response to an error.

Understood. I think it is fine if this utility
simply exits if that error occurs while it is running.

(turbostat, OTOH, may be long running, and it treats
vanishing processors as a recoverable error)

thanks,
-Len Brown, Intel Open Source Technology Center

2010-11-24 05:31:33

by Len Brown

Subject: [PATCH v2] tools: create power/x86/x86_energy_perf_policy

From: Len Brown <[email protected]>

MSR_IA32_ENERGY_PERF_BIAS first became available on Westmere Xeon.
It is implemented in all Sandy Bridge processors -- mobile, desktop and server.
It is expected to become increasingly important in subsequent generations.

x86_energy_perf_policy is a user-space utility to set this
hardware energy vs performance policy hint in the processor.
Most systems would benefit from "x86_energy_perf_policy normal"
at system startup, as the hardware default is maximum performance
at the expense of energy efficiency.

Linux-2.6.36 added "epb" to /proc/cpuinfo to indicate
if an x86 processor supports MSR_IA32_ENERGY_PERF_BIAS,
though the kernel does not actually program the MSR.

In March, Venkatesh Pallipadi proposed a small driver
that programmed MSR_IA32_ENERGY_PERF_BIAS, based on
the cpufreq governor in use. It also offered
a boot-time cmdline option to override.
http://lkml.org/lkml/2010/3/4/457
But hiding the hardware policy behind the
governor choice was deemed "kinda icky".

So in June, I proposed a generic user/kernel API to
consolidate the power/performance policy trade-off.
"RFC: /sys/power/policy_preference"
http://lkml.org/lkml/2010/6/16/399
That is my preference for implementing this capability,
but I received no support on the list.

So in September, I sent x86_energy_perf_policy.c to LKML,
a user-space utility that scribbles directly to the MSR.
http://lkml.org/lkml/2010/9/28/246

Here is the same utility re-sent, this time proposed
to reside in the kernel tools directory.

Signed-off-by: Len Brown <[email protected]>
---
v2
create man page
minor tweaks in response to review comments

tools/power/x86/x86_energy_perf_policy/Makefile | 8 +
.../x86_energy_perf_policy.8 | 104 +++++++
.../x86_energy_perf_policy.c | 325 ++++++++++++++++++++

diff --git a/tools/power/x86/x86_energy_perf_policy/Makefile b/tools/power/x86/x86_energy_perf_policy/Makefile
new file mode 100644
index 0000000..f458237
--- /dev/null
+++ b/tools/power/x86/x86_energy_perf_policy/Makefile
@@ -0,0 +1,8 @@
+x86_energy_perf_policy : x86_energy_perf_policy.c
+
+clean :
+ rm -f x86_energy_perf_policy
+
+install :
+ install x86_energy_perf_policy /usr/bin/
+ install x86_energy_perf_policy.8 /usr/share/man/man8/
diff --git a/tools/power/x86/x86_energy_perf_policy/x86_energy_perf_policy.8 b/tools/power/x86/x86_energy_perf_policy/x86_energy_perf_policy.8
new file mode 100644
index 0000000..8eaaad6
--- /dev/null
+++ b/tools/power/x86/x86_energy_perf_policy/x86_energy_perf_policy.8
@@ -0,0 +1,104 @@
+.\" This page Copyright (C) 2010 Len Brown <[email protected]>
+.\" Distributed under the GPL, Copyleft 1994.
+.TH X86_ENERGY_PERF_POLICY 8
+.SH NAME
+x86_energy_perf_policy \- read or write MSR_IA32_ENERGY_PERF_BIAS
+.SH SYNOPSIS
+.ft B
+.B x86_energy_perf_policy
+.RB [ "\-c cpu" ]
+.RB [ "\-v" ]
+.RB "\-r"
+.br
+.B x86_energy_perf_policy
+.RB [ "\-c cpu" ]
+.RB [ "\-v" ]
+.RB 'performance'
+.br
+.B x86_energy_perf_policy
+.RB [ "\-c cpu" ]
+.RB [ "\-v" ]
+.RB 'normal'
+.br
+.B x86_energy_perf_policy
+.RB [ "\-c cpu" ]
+.RB [ "\-v" ]
+.RB 'powersave'
+.br
+.B x86_energy_perf_policy
+.RB [ "\-c cpu" ]
+.RB [ "\-v" ]
+.RB n
+.br
+.SH DESCRIPTION
+\fBx86_energy_perf_policy\fP
+allows software to convey
+its policy for the relative importance of performance
+versus energy savings to the processor.
+
+The processor uses this information in model-specific ways
+when it must select trade-offs between performance and
+energy efficiency.
+
+This policy hint does not supersede Processor Performance states
+(P-states) or CPU Idle power states (C-states), but allows
+software to have influence where it would otherwise be unable
+to express a preference.
+
+For example, this setting may tell the hardware how
+aggressively or conservatively to control frequency
+in the "turbo range" above the explicitly OS-controlled
+P-state frequency range. It may also tell the hardware
+how aggressively it should enter the OS requested C-states.
+
+Support for this feature is indicated by CPUID.06H.ECX.bit3
+per the Intel Architectures Software Developer's Manual.
+
+.SS Options
+\fB-c\fP limits operation to a single CPU.
+The default is to operate on all CPUs.
+Note that MSR_IA32_ENERGY_PERF_BIAS is defined per
+logical processor, but that the initial implementations
+of the MSR were shared among all processors in each package.
+.PP
+\fB-v\fP increases verbosity. By default
+x86_energy_perf_policy is silent.
+.PP
+\fB-r\fP is for "read-only" mode - the unchanged state
+is read and displayed.
+.PP
+.I performance
+Set a policy where performance is paramount.
+The processor will be unwilling to sacrifice any performance
+for the sake of energy saving. This is the hardware default.
+.PP
+.I normal
+Set a policy with a normal balance between performance and energy efficiency.
+The processor will tolerate minor performance compromise
+for potentially significant energy savings.
+This is a reasonable default for most desktops and servers.
+.PP
+.I powersave
+Set a policy where the processor can accept
+a measurable performance hit to maximize energy efficiency.
+.PP
+.I n
+Set MSR_IA32_ENERGY_PERF_BIAS to the specified number.
+The range of valid numbers is 0-15, where 0 is maximum
+performance and 15 is maximum energy efficiency.
+
+.SH NOTES
+.B "x86_energy_perf_policy "
+runs only as root.
+.SH FILES
+.ta
+.nf
+/dev/cpu/*/msr
+.fi
+
+.SH "SEE ALSO"
+msr(4)
+.PP
+.SH AUTHORS
+.nf
+Written by Len Brown <[email protected]>
diff --git a/tools/power/x86/x86_energy_perf_policy/x86_energy_perf_policy.c b/tools/power/x86/x86_energy_perf_policy/x86_energy_perf_policy.c
new file mode 100644
index 0000000..b539923
--- /dev/null
+++ b/tools/power/x86/x86_energy_perf_policy/x86_energy_perf_policy.c
@@ -0,0 +1,325 @@
+/*
+ * x86_energy_perf_policy -- set the energy versus performance
+ * policy preference bias on recent X86 processors.
+ */
+/*
+ * Copyright (c) 2010, Intel Corporation.
+ * Len Brown <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+#include <stdio.h>
+#include <unistd.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <sys/resource.h>
+#include <fcntl.h>
+#include <signal.h>
+#include <sys/time.h>
+#include <stdlib.h>
+#include <string.h>
+
+unsigned int verbose; /* set with -v */
+unsigned int read_only; /* set with -r */
+char *progname;
+unsigned long long new_bias;
+int cpu = -1;
+
+/*
+ * Usage:
+ *
+ * -c cpu: limit action to a single CPU (default is all CPUs)
+ * -v: verbose output (can invoke more than once)
+ * -r: read-only, don't change any settings
+ *
+ * performance
+ * Performance is paramount.
+ * Unwilling to sacrifice any performance
+ * for the sake of energy saving. (hardware default)
+ *
+ * normal
+ * Can tolerate minor performance compromise
+ * for potentially significant energy savings.
+ * (reasonable default for most desktops and servers)
+ *
+ * powersave
+ * Can tolerate significant performance hit
+ * to maximize energy savings.
+ *
+ * n
+ * a numerical value to write to the underlying MSR.
+ */
+void usage(void)
+{
+ printf("%s: [-c cpu] [-v] "
+ "(-r | 'performance' | 'normal' | 'powersave' | n)\n",
+ progname);
+ exit(1);
+}
+
+#define MSR_IA32_ENERGY_PERF_BIAS 0x000001b0
+
+#define BIAS_PERFORMANCE 0
+#define BIAS_BALANCE 6
+#define BIAS_POWERSAVE 15
+
+void cmdline(int argc, char **argv)
+{
+ int opt;
+
+ progname = argv[0];
+
+ while ((opt = getopt(argc, argv, "+rvc:")) != -1) {
+ switch (opt) {
+ case 'c':
+ cpu = atoi(optarg);
+ break;
+ case 'r':
+ read_only = 1;
+ break;
+ case 'v':
+ verbose++;
+ break;
+ default:
+ usage();
+ }
+ }
+ /* if -r, then should be no additional optind */
+ if (read_only && (argc > optind))
+ usage();
+
+ /*
+ * if no -r , then must be one additional optind
+ */
+ if (!read_only) {
+
+ if (argc != optind + 1) {
+ printf("must supply -r or policy param\n");
+ usage();
+ }
+
+ if (!strcmp("performance", argv[optind])) {
+ new_bias = BIAS_PERFORMANCE;
+ } else if (!strcmp("normal", argv[optind])) {
+ new_bias = BIAS_BALANCE;
+ } else if (!strcmp("powersave", argv[optind])) {
+ new_bias = BIAS_POWERSAVE;
+ } else {
+ char *endptr;
+
+ new_bias = strtoull(argv[optind], &endptr, 0);
+ if (endptr == argv[optind] ||
+ new_bias > BIAS_POWERSAVE) {
+ fprintf(stderr, "invalid value: %s\n",
+ argv[optind]);
+ usage();
+ }
+ }
+ }
+}
+
+/*
+ * validate_cpuid()
+ * returns on success, quietly exits on failure (make verbose with -v)
+ */
+void validate_cpuid(void)
+{
+ unsigned int eax, ebx, ecx, edx, max_level;
+ char brand[16];
+ unsigned int fms, family, model, stepping;
+
+ eax = ebx = ecx = edx = 0;
+
+ asm("cpuid" : "=a" (max_level), "=b" (ebx), "=c" (ecx),
+ "=d" (edx) : "a" (0));
+
+ if (ebx != 0x756e6547 || edx != 0x49656e69 || ecx != 0x6c65746e) {
+ if (verbose)
+ fprintf(stderr, "%.4s%.4s%.4s != GenuineIntel\n",
+ (char *)&ebx, (char *)&edx, (char *)&ecx);
+ exit(1);
+ }
+
+ asm("cpuid" : "=a" (fms), "=c" (ecx), "=d" (edx) : "a" (1) : "ebx");
+ family = (fms >> 8) & 0xf;
+ model = (fms >> 4) & 0xf;
+ stepping = fms & 0xf;
+ if (family == 6 || family == 0xf)
+ model += ((fms >> 16) & 0xf) << 4;
+
+ if (verbose > 1)
+ printf("CPUID %d levels family:model:stepping "
+ "0x%x:%x:%x (%d:%d:%d)\n", max_level,
+ family, model, stepping, family, model, stepping);
+
+ if (!(edx & (1 << 5))) {
+ if (verbose)
+ printf("CPUID: no MSR\n");
+ exit(1);
+ }
+
+ /*
+ * Support for MSR_IA32_ENERGY_PERF_BIAS
+ * is indicated by CPUID.06H.ECX.bit3
+ */
+ asm("cpuid" : "=a" (eax), "=b" (ebx), "=c" (ecx), "=d" (edx) : "a" (6));
+ if (verbose)
+ printf("CPUID.06H.ECX: 0x%x\n", ecx);
+ if (!(ecx & (1 << 3))) {
+ if (verbose)
+ printf("CPUID: No MSR_IA32_ENERGY_PERF_BIAS\n");
+ exit(1);
+ }
+ return; /* success */
+}
+
+unsigned long long get_msr(int cpu, int offset)
+{
+ unsigned long long msr;
+ char msr_path[32];
+ int retval;
+ int fd;
+
+ sprintf(msr_path, "/dev/cpu/%d/msr", cpu);
+ fd = open(msr_path, O_RDONLY);
+ if (fd < 0) {
+ printf("Try \"# modprobe msr\"\n");
+ perror(msr_path);
+ exit(1);
+ }
+
+ retval = pread(fd, &msr, sizeof msr, offset);
+
+ if (retval != sizeof msr) {
+ printf("pread cpu%d 0x%x = %d\n", cpu, offset, retval);
+ exit(-2);
+ }
+ close(fd);
+ return msr;
+}
+
+unsigned long long put_msr(int cpu, unsigned long long new_msr, int offset)
+{
+ unsigned long long old_msr;
+ char msr_path[32];
+ int retval;
+ int fd;
+
+ sprintf(msr_path, "/dev/cpu/%d/msr", cpu);
+ fd = open(msr_path, O_RDWR);
+ if (fd < 0) {
+ perror(msr_path);
+ exit(1);
+ }
+
+ retval = pread(fd, &old_msr, sizeof old_msr, offset);
+ if (retval != sizeof old_msr) {
+ perror("pread");
+ printf("pread cpu%d 0x%x = %d\n", cpu, offset, retval);
+ exit(-2);
+ }
+
+ retval = pwrite(fd, &new_msr, sizeof new_msr, offset);
+ if (retval != sizeof new_msr) {
+ perror("pwrite");
+ printf("pwrite cpu%d 0x%x = %d\n", cpu, offset, retval);
+ exit(-2);
+ }
+
+ close(fd);
+
+ return old_msr;
+}
+
+void print_msr(int cpu)
+{
+ printf("cpu%d: 0x%016llx\n",
+ cpu, get_msr(cpu, MSR_IA32_ENERGY_PERF_BIAS));
+}
+
+void update_msr(int cpu)
+{
+ unsigned long long previous_msr;
+
+ previous_msr = put_msr(cpu, new_bias, MSR_IA32_ENERGY_PERF_BIAS);
+
+ if (verbose)
+ printf("cpu%d msr0x%x 0x%016llx -> 0x%016llx\n",
+ cpu, MSR_IA32_ENERGY_PERF_BIAS, previous_msr, new_bias);
+
+ return;
+}
+
+char *proc_stat = "/proc/stat";
+/*
+ * run func() on every cpu listed in /proc/stat
+ */
+void for_every_cpu(void (func)(int))
+{
+ FILE *fp;
+ int retval;
+
+ fp = fopen(proc_stat, "r");
+ if (fp == NULL) {
+ perror(proc_stat);
+ exit(1);
+ }
+
+ retval = fscanf(fp, "cpu %*d %*d %*d %*d %*d %*d %*d %*d %*d %*d\n");
+ if (retval != 0) {
+ perror("/proc/stat format");
+ exit(1);
+ }
+
+ while (1) {
+ int cpu;
+
+ retval = fscanf(fp,
+ "cpu%u %*d %*d %*d %*d %*d %*d %*d %*d %*d %*d\n",
+ &cpu);
+ if (retval != 1)
+ break;
+
+ func(cpu);
+ }
+ fclose(fp);
+}
+
+int main(int argc, char **argv)
+{
+ cmdline(argc, argv);
+
+ if (verbose > 1)
+ printf("x86_energy_perf_policy Nov 24, 2010"
+ " - Len Brown <[email protected]>\n");
+ if (verbose > 1 && !read_only)
+ printf("new_bias %llu\n", new_bias);
+
+ validate_cpuid();
+
+ if (cpu != -1) {
+ if (read_only)
+ print_msr(cpu);
+ else
+ update_msr(cpu);
+ } else {
+ if (read_only)
+ for_every_cpu(print_msr);
+ else
+ for_every_cpu(update_msr);
+ }
+
+ return 0;
+}
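
For readers following the CPUID checks in validate_cpuid(), here is a minimal
sketch of the same decoding as standalone helpers. The names struct cpu_sig,
decode_fms() and cpuid_bit() are illustrative and not part of the patch; like
the patch, the sketch folds in only the extended model field and ignores the
extended family field.

```c
#include <assert.h>

struct cpu_sig {
	unsigned int family;
	unsigned int model;
	unsigned int stepping;
};

/*
 * Decode the CPUID.01H:EAX signature the way validate_cpuid() does:
 * stepping in bits 3:0, model in bits 7:4, family in bits 11:8, and
 * the extended model (bits 19:16) prepended for families 6 and 0xf.
 */
struct cpu_sig decode_fms(unsigned int fms)
{
	struct cpu_sig sig;

	sig.family = (fms >> 8) & 0xf;
	sig.model = (fms >> 4) & 0xf;
	sig.stepping = fms & 0xf;
	if (sig.family == 6 || sig.family == 0xf)
		sig.model += ((fms >> 16) & 0xf) << 4;
	return sig;
}

/* Test one feature bit of a 32-bit CPUID output register. */
int cpuid_bit(unsigned int reg, unsigned int bit)
{
	assert(bit < 32);
	return (reg >> bit) & 1;
}
```

With these helpers, the MSR check is cpuid_bit(edx, 5) on leaf 1 and the
MSR_IA32_ENERGY_PERF_BIAS check is cpuid_bit(ecx, 3) on leaf 6. A signature
of 0x206a7 decodes to family 6, model 0x2a, stepping 7.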

2010-11-25 05:53:04

by Chen, Gong

Subject: Re: [PATCH v2] tools: create power/x86/x86_energy_perf_policy

On 11/24/2010 1:31 PM, Len Brown wrote:
> From: Len Brown <[email protected]>
>
> MSR_IA32_ENERGY_PERF_BIAS first became available on Westmere Xeon.
> It is implemented in all Sandy Bridge processors -- mobile, desktop and server.
> It is expected to become increasingly important in subsequent generations.
>
> x86_energy_perf_policy is a user-space utility to set this
> hardware energy vs performance policy hint in the processor.
> Most systems would benefit from "x86_energy_perf_policy normal"
> at system startup, as the hardware default is maximum performance
> at the expense of energy efficiency.
>
> Linux-2.6.36 added "epb" to /proc/cpuinfo to indicate
> if an x86 processor supports MSR_IA32_ENERGY_PERF_BIAS,
> though the kernel does not actually program the MSR.
>
> In March, Venkatesh Pallipadi proposed a small driver
> that programmed MSR_IA32_ENERGY_PERF_BIAS, based on
> the cpufreq governor in use. It also offered
> a boot-time cmdline option to override.
> http://lkml.org/lkml/2010/3/4/457
> But hiding the hardware policy behind the
> governor choice was deemed "kinda icky".
>
> So in June, I proposed a generic user/kernel API to
> consolidate the power/performance policy trade-off.
> "RFC: /sys/power/policy_preference"
> http://lkml.org/lkml/2010/6/16/399
> That is my preference for implementing this capability,
> but I received no support on the list.
>
> So in September, I sent x86_energy_perf_policy.c to LKML,
> a user-space utility that scribbles directly to the MSR.
> http://lkml.org/lkml/2010/9/28/246
>
> Here is the same utility re-sent, this time proposed
> to reside in the kernel tools directory.
>
> Signed-off-by: Len Brown <[email protected]>
> ---
> v2
> create man page
> minor tweaks in response to review comments
>
> tools/power/x86/x86_energy_perf_policy/Makefile | 8 +
> .../x86_energy_perf_policy.8 | 104 +++++++
> .../x86_energy_perf_policy.c | 325 ++++++++++++++++++++
>
> diff --git a/tools/power/x86/x86_energy_perf_policy/Makefile b/tools/power/x86/x86_energy_perf_policy/Makefile
> new file mode 100644
> index 0000000..f458237
> --- /dev/null
> +++ b/tools/power/x86/x86_energy_perf_policy/Makefile
> @@ -0,0 +1,8 @@
> +x86_energy_perf_policy : x86_energy_perf_policy.c
> +
> +clean :
> + rm -f x86_energy_perf_policy
> +
> +install :
> + install x86_energy_perf_policy /usr/bin/
> + install x86_energy_perf_policy.8 /usr/share/man/man8/
> diff --git a/tools/power/x86/x86_energy_perf_policy/x86_energy_perf_policy.8 b/tools/power/x86/x86_energy_perf_policy/x86_energy_perf_policy.8
> new file mode 100644
> index 0000000..8eaaad6
> --- /dev/null
> +++ b/tools/power/x86/x86_energy_perf_policy/x86_energy_perf_policy.8
> @@ -0,0 +1,104 @@
> +.\" This page Copyright (C) 2010 Len Brown <[email protected]>
> +.\" Distributed under the GPL, Copyleft 1994.
> +.TH X86_ENERGY_PERF_POLICY 8
> +.SH NAME
> +x86_energy_perf_policy \- read or write MSR_IA32_ENERGY_PERF_BIAS
> +.SH SYNOPSIS
> +.ft B
> +.B x86_energy_perf_policy
> +.RB [ "\-c cpu" ]
> +.RB [ "\-v" ]
> +.RB "\-r"
> +.br
> +.B x86_energy_perf_policy
> +.RB [ "\-c cpu" ]
> +.RB [ "\-v" ]
> +.RB 'performance'
> +.br
> +.B x86_energy_perf_policy
> +.RB [ "\-c cpu" ]
> +.RB [ "\-v" ]
> +.RB 'normal'
> +.br
> +.B x86_energy_perf_policy
> +.RB [ "\-c cpu" ]
> +.RB [ "\-v" ]
> +.RB 'powersave'
> +.br
> +.B x86_energy_perf_policy
> +.RB [ "\-c cpu" ]
> +.RB [ "\-v" ]
> +.RB n
> +.br
> +.SH DESCRIPTION
> +\fBx86_energy_perf_policy\fP
> +allows software to convey
> +its policy for the relative importance of performance
> +versus energy savings to the processor.
> +
> +The processor uses this information in model-specific ways
> +when it must select trade-offs between performance and
> +energy efficiency.
> +
> +This policy hint does not supersede Processor Performance states
> +(P-states) or CPU Idle power states (C-states), but allows
> +software to have influence where it would otherwise be unable
> +to express a preference.
> +
> +For example, this setting may tell the hardware how
> +aggressively or conservatively to control frequency
> +in the "turbo range" above the explicitly OS-controlled
> +P-state frequency range. It may also tell the hardware
> +how aggressively it should enter the OS requested C-states.
> +
> +Support for this feature is indicated by CPUID.06H.ECX.bit3
> +per the Intel Architectures Software Developer's Manual.
> +
> +.SS Options
> +\fB-c\fP limits operation to a single CPU.
> +The default is to operate on all CPUs.
> +Note that MSR_IA32_ENERGY_PERF_BIAS is defined per
> +logical processor, but that the initial implementations
> +of the MSR were shared among all processors in each package.
> +.PP
> +\fB-v\fP increases verbosity. By default
> +x86_energy_perf_policy is silent.
> +.PP
> +\fB-r\fP is for "read-only" mode - the unchanged state
> +is read and displayed.
> +.PP
> +.I performance
> +Set a policy where performance is paramount.
> +The processor will be unwilling to sacrifice any performance
> +for the sake of energy saving. This is the hardware default.
> +.PP
> +.I normal
> +Set a policy with a normal balance between performance and energy efficiency.
> +The processor will tolerate minor performance compromise
> +for potentially significant energy savings.
> +This is a reasonable default for most desktops and servers.
> +.PP
> +.I powersave
> +Set a policy where the processor can accept
> +a measurable performance hit to maximize energy efficiency.
> +.PP
> +.I n
> +Set MSR_IA32_ENERGY_PERF_BIAS to the specified number.
> +The range of valid numbers is 0-15, where 0 is maximum
> +performance and 15 is maximum energy efficiency.
> +
> +.SH NOTES
> +.B "x86_energy_perf_policy "
> +runs only as root.
> +.SH FILES
> +.ta
> +.nf
> +/dev/cpu/*/msr
> +.fi
> +
> +.SH "SEE ALSO"
> +msr(4)
> +.PP
> +.SH AUTHORS
> +.nf
> +Written by Len Brown <[email protected]>
> diff --git a/tools/power/x86/x86_energy_perf_policy/x86_energy_perf_policy.c b/tools/power/x86/x86_energy_perf_policy/x86_energy_perf_policy.c
> new file mode 100644
> index 0000000..b539923
> --- /dev/null
> +++ b/tools/power/x86/x86_energy_perf_policy/x86_energy_perf_policy.c
> @@ -0,0 +1,325 @@
> +/*
> + * x86_energy_perf_policy -- set the energy versus performance
> + * policy preference bias on recent X86 processors.
> + */
> +/*
> + * Copyright (c) 2010, Intel Corporation.
> + * Len Brown <[email protected]>
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License along with
> + * this program; if not, write to the Free Software Foundation, Inc.,
> + * 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
> + */
> +
> +#include <stdio.h>
> +#include <unistd.h>
> +#include <sys/types.h>
> +#include <sys/stat.h>
> +#include <sys/resource.h>
> +#include <fcntl.h>
> +#include <signal.h>
> +#include <sys/time.h>
> +#include <stdlib.h>
> +#include <string.h>
> +
> +unsigned int verbose; /* set with -v */
> +unsigned int read_only; /* set with -r */
> +char *progname;
> +unsigned long long new_bias;
> +int cpu = -1;
> +
> +/*
> + * Usage:
> + *
> + * -c cpu: limit action to a single CPU (default is all CPUs)
> + * -v: verbose output (can invoke more than once)
> + * -r: read-only, don't change any settings
> + *
> + * performance
> + * Performance is paramount.
> + * Unwilling to sacrifice any performance
> + * for the sake of energy saving. (hardware default)
> + *
> + * normal
> + * Can tolerate minor performance compromise
> + * for potentially significant energy savings.
> + * (reasonable default for most desktops and servers)
> + *
> + * powersave
> + * Can tolerate significant performance hit
> + * to maximize energy savings.
> + *
> + * n
> + * a numerical value to write to the underlying MSR.
> + */
> +void usage(void)
> +{
> + printf("%s: [-c cpu] [-v] "
> + "(-r | 'performance' | 'normal' | 'powersave' | n)\n",
> + progname);
> + exit(1);
> +}
> +
> +#define MSR_IA32_ENERGY_PERF_BIAS 0x000001b0
> +
> +#define BIAS_PERFORMANCE 0
> +#define BIAS_BALANCE 6
> +#define BIAS_POWERSAVE 15
> +
> +void cmdline(int argc, char **argv)
> +{
> + int opt;
> +
> + progname = argv[0];
> +
> + while ((opt = getopt(argc, argv, "+rvc:")) != -1) {
> + switch (opt) {
> + case 'c':
> + cpu = atoi(optarg);
> + break;
> + case 'r':
> + read_only = 1;
> + break;
> + case 'v':
> + verbose++;
> + break;
> + default:
> + usage();
> + }
> + }
> + /* if -r, then should be no additional optind */
> + if (read_only && (argc > optind))
> + usage();
> +
> + /*
> + * without -r, exactly one policy argument must follow the options
> + */
> + if (!read_only) {
> +
> + if (argc != optind + 1) {
> + printf("must supply -r or policy param\n");
> + usage();
> + }
> +
> + if (!strcmp("performance", argv[optind])) {
> + new_bias = BIAS_PERFORMANCE;
> + } else if (!strcmp("normal", argv[optind])) {
> + new_bias = BIAS_BALANCE;
> + } else if (!strcmp("powersave", argv[optind])) {
> + new_bias = BIAS_POWERSAVE;
> + } else {
> + char *endptr;
> +
> + new_bias = strtoull(argv[optind], &endptr, 0);
> + if (endptr == argv[optind] ||
> + new_bias > BIAS_POWERSAVE) {
> + fprintf(stderr, "invalid value: %s\n",
> + argv[optind]);
> + usage();
> + }
> + }
> + }
> +}
> +
> +/*
> + * validate_cpuid()
> + * returns on success, quietly exits on failure (make verbose with -v)
> + */
> +void validate_cpuid(void)
> +{
> + unsigned int eax, ebx, ecx, edx, max_level;
> + unsigned int fms, family, model, stepping;
> +
> + eax = ebx = ecx = edx = 0;
> +
> + asm("cpuid" : "=a" (max_level), "=b" (ebx), "=c" (ecx),
> + "=d" (edx) : "a" (0));
> +
> + if (ebx != 0x756e6547 || edx != 0x49656e69 || ecx != 0x6c65746e) {
> + if (verbose)
> + fprintf(stderr, "%.4s%.4s%.4s != GenuineIntel\n",
> + (char *)&ebx, (char *)&edx, (char *)&ecx);
> + exit(1);
> + }
> +
> + asm("cpuid" : "=a" (fms), "=c" (ecx), "=d" (edx) : "a" (1) : "ebx");
> + family = (fms >> 8) & 0xf;
> + model = (fms >> 4) & 0xf;
> + stepping = fms & 0xf;
> + if (family == 6 || family == 0xf)
> + model += ((fms >> 16) & 0xf) << 4;
> +
> + if (verbose > 1)
> + printf("CPUID GenuineIntel %d levels family:model:stepping "
> + "0x%x:%x:%x (%d:%d:%d)\n", max_level,
> + family, model, stepping, family, model, stepping);
> +
> + if (!(edx & (1 << 5))) {
> + if (verbose)
> + printf("CPUID: no MSR\n");
> + exit(1);
> + }
> +
> + /*
> + * Support for MSR_IA32_ENERGY_PERF_BIAS
> + * is indicated by CPUID.06H.ECX.bit3
> + */
> + asm("cpuid" : "=a" (eax), "=b" (ebx), "=c" (ecx), "=d" (edx) : "a" (6));
> + if (verbose)
> + printf("CPUID.06H.ECX: 0x%x\n", ecx);
> + if (!(ecx & (1 << 3))) {
> + if (verbose)
> + printf("CPUID: No MSR_IA32_ENERGY_PERF_BIAS\n");
> + exit(1);
> + }
> + return; /* success */
> +}
> +
> +unsigned long long get_msr(int cpu, int offset)
> +{
> + unsigned long long msr;
> + char msr_path[32];
> + int retval;
> + int fd;
> +
> + sprintf(msr_path, "/dev/cpu/%d/msr", cpu);
> + fd = open(msr_path, O_RDONLY);
> + if (fd < 0) {
> + printf("Try \"# modprobe msr\"\n");
> + perror(msr_path);
> + exit(1);
> + }
> +
> + retval = pread(fd, &msr, sizeof msr, offset);
> +
> + if (retval != sizeof msr) {
> + printf("pread cpu%d 0x%x = %d\n", cpu, offset, retval);
> + exit(-2);
> + }
> + close(fd);
> + return msr;
> +}
> +
> +unsigned long long put_msr(int cpu, unsigned long long new_msr, int offset)
> +{
> + unsigned long long old_msr;
> + char msr_path[32];
> + int retval;
> + int fd;
> +
> + sprintf(msr_path, "/dev/cpu/%d/msr", cpu);
> + fd = open(msr_path, O_RDWR);
> + if (fd < 0) {
> + perror(msr_path);
> + exit(1);
> + }
> +
> + retval = pread(fd, &old_msr, sizeof old_msr, offset);
> + if (retval != sizeof old_msr) {
> + perror("pread");
> + printf("pread cpu%d 0x%x = %d\n", cpu, offset, retval);
> + exit(-2);
> + }
> +
> + retval = pwrite(fd, &new_msr, sizeof new_msr, offset);
> + if (retval != sizeof new_msr) {
> + perror("pwrite");
> + printf("pwrite cpu%d 0x%x = %d\n", cpu, offset, retval);
> + exit(-2);
> + }
> +
> + close(fd);
> +
> + return old_msr;
> +}
> +
> +void print_msr(int cpu)
> +{
> + printf("cpu%d: 0x%016llx\n",
> + cpu, get_msr(cpu, MSR_IA32_ENERGY_PERF_BIAS));
> +}
> +
> +void update_msr(int cpu)
> +{
> + unsigned long long previous_msr;
> +
> + previous_msr = put_msr(cpu, new_bias, MSR_IA32_ENERGY_PERF_BIAS);
> +
> + if (verbose)
> + printf("cpu%d msr0x%x 0x%016llx -> 0x%016llx\n",
> + cpu, MSR_IA32_ENERGY_PERF_BIAS, previous_msr, new_bias);
> +
> + return;
> +}
> +
> +char *proc_stat = "/proc/stat";
> +/*
> + * run func() on every cpu listed in /proc/stat
> + */
> +void for_every_cpu(void (func)(int))
> +{
> + FILE *fp;
> + int retval;
> +
> + fp = fopen(proc_stat, "r");
> + if (fp == NULL) {
> + perror(proc_stat);
> + exit(1);
> + }
> +
> + retval = fscanf(fp, "cpu %*d %*d %*d %*d %*d %*d %*d %*d %*d %*d\n");
> + if (retval != 0) {
> + perror("/proc/stat format");
> + exit(1);
> + }
> +
> + while (1) {
> + int cpu;
> +
> + retval = fscanf(fp,
> + "cpu%u %*d %*d %*d %*d %*d %*d %*d %*d %*d %*d\n",
> + &cpu);
> + if (retval != 1)
> + break;
> +
> + func(cpu);
> + }
> + fclose(fp);
> +}
> +
> +int main(int argc, char **argv)
> +{
> + cmdline(argc, argv);
> +
> + if (verbose > 1)
> + printf("x86_energy_perf_policy Nov 24, 2010"
> + " - Len Brown <[email protected]>\n");
> + if (verbose > 1 && !read_only)
> + printf("new_bias %llu\n", new_bias);
> +
> + validate_cpuid();
> +
> + if (cpu != -1) {
> + if (read_only)
> + print_msr(cpu);
> + else
> + update_msr(cpu);
> + } else {
> + if (read_only)
> + for_every_cpu(print_msr);
> + else
> + for_every_cpu(update_msr);
> + }
> +
> + return 0;
> +}
>
I have 2 questions.

1. The usage text looks too terse. Without reading the comments
in the source code, I cannot tell what these parameters mean,
for example -v versus -vv. How about printing those comments
as part of the usage output?

2. The parameter "normal | performance | powersave | n" looks odd.
Why can't it look like the other parameters (-r, -v, etc.)? For example,
I can't invoke the tool as
"./x86_energy_perf_policy -c 0 normal -v"

2010-11-25 08:59:09

by Chen, Gong

Subject: Re: [PATCH v2] tools: create power/x86/x86_energy_perf_policy

On 11/25/2010 1:52 PM, Chen Gong wrote:
> I have 2 questions.
>
> 1. The usage text looks too terse. Without reading the comments
> in the source code, I cannot tell what these parameters mean,
> for example -v versus -vv. How about printing those comments
> as part of the usage output?
>
> 2. The parameter "normal | performance | powersave | n" looks odd.
> Why can't it look like the other parameters (-r, -v, etc.)? For example,
> I can't invoke the tool as
> "./x86_energy_perf_policy -c 0 normal -v"

One more question. According to the spec, after setting the energy policy
on all threads in a package, software should write 1 to bit 18 of MSR 0x1FC
to enable this function.
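
A minimal sketch of the read-modify-write this would imply, assuming the
reading of the spec above is correct. The patch itself never touches
MSR 0x1FC. The macro and function names below are hypothetical; only the
address and bit number come from the question.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; address and bit number as stated in the question. */
#define MSR_ADDR_0X1FC			0x1fc
#define ENERGY_POLICY_ENABLE_BIT	18

/* Return msr with the given bit set, leaving all other bits untouched. */
uint64_t msr_set_bit(uint64_t msr, unsigned int bit)
{
	assert(bit < 64);
	return msr | ((uint64_t)1 << bit);
}

/* Return 1 if the given bit is set in msr, else 0. */
int msr_test_bit(uint64_t msr, unsigned int bit)
{
	assert(bit < 64);
	return (int)((msr >> bit) & 1);
}
```

With the helpers already in the patch, this would amount to
put_msr(cpu, msr_set_bit(get_msr(cpu, 0x1fc), 18), 0x1fc), issued after the
bias has been written on every thread of the package.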