2020-03-31 18:46:15

by Chen Yu

Subject: [PATCH 0/2][RFC] Add long time sampling time support

The RAPL Joule counters are 32 bits wide, so over a long
sampling interval they overflow and turbostat can only
print a *star* to flag the overflow.

Print the actual energy consumed over long sampling intervals instead.

Chen Yu (2):
tools/power turbostat: Make the energy variable to be 64 bit
tools/power turbostat: Introduce reliable RAPL display

tools/power/x86/turbostat/Makefile | 2 +-
tools/power/x86/turbostat/turbostat.c | 322 +++++++++++++++++++++++---
2 files changed, 287 insertions(+), 37 deletions(-)

--
2.17.1
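
As a rough illustration of why a long sampling interval overflows the counter, the sketch below estimates the wraparound time of a 32-bit energy counter. It assumes one count equals 1/2^ESU joules and a constant power draw. The function name rapl_wrap_seconds and the sample values are hypothetical and not taken from the patches.

```c
/*
 * Seconds until a 32-bit energy counter wraps, given the energy unit
 * exponent (assumed: one count = 1 / 2^esu joules) and a constant
 * power draw in watts. Both inputs are illustrative assumptions.
 */
static double rapl_wrap_seconds(unsigned int esu, double watts)
{
	double joules_per_count = 1.0 / (double)(1ULL << esu);
	double joules_at_wrap = 4294967296.0 * joules_per_count; /* 2^32 counts */

	return joules_at_wrap / watts;
}
```

Under these assumptions, ESU = 14 at a steady 100 W gives about 2621 seconds, and ESU = 16 at 1000 W gives about 65 seconds, so the counter can wrap well within one long sampling interval.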


2020-03-31 18:46:26

by Chen Yu

Subject: [PATCH 1/2][RFC] tools/power turbostat: Make the energy variable to be 64 bit

Change the energy variables from 32-bit to 64-bit so that they
can record energy accumulated over a long duration.

Signed-off-by: Chen Yu <[email protected]>
---
tools/power/x86/turbostat/turbostat.c | 30 ++++++++++++---------------
1 file changed, 13 insertions(+), 17 deletions(-)

diff --git a/tools/power/x86/turbostat/turbostat.c b/tools/power/x86/turbostat/turbostat.c
index 33b370865d16..95f3047e94ae 100644
--- a/tools/power/x86/turbostat/turbostat.c
+++ b/tools/power/x86/turbostat/turbostat.c
@@ -211,12 +211,12 @@ struct pkg_data {
long long gfx_rc6_ms;
unsigned int gfx_mhz;
unsigned int package_id;
- unsigned int energy_pkg; /* MSR_PKG_ENERGY_STATUS */
- unsigned int energy_dram; /* MSR_DRAM_ENERGY_STATUS */
- unsigned int energy_cores; /* MSR_PP0_ENERGY_STATUS */
- unsigned int energy_gfx; /* MSR_PP1_ENERGY_STATUS */
- unsigned int rapl_pkg_perf_status; /* MSR_PKG_PERF_STATUS */
- unsigned int rapl_dram_perf_status; /* MSR_DRAM_PERF_STATUS */
+ unsigned long long energy_pkg; /* MSR_PKG_ENERGY_STATUS */
+ unsigned long long energy_dram; /* MSR_DRAM_ENERGY_STATUS */
+ unsigned long long energy_cores; /* MSR_PP0_ENERGY_STATUS */
+ unsigned long long energy_gfx; /* MSR_PP1_ENERGY_STATUS */
+ unsigned long long rapl_pkg_perf_status; /* MSR_PKG_PERF_STATUS */
+ unsigned long long rapl_dram_perf_status; /* MSR_DRAM_PERF_STATUS */
unsigned int pkg_temp_c;
unsigned long long counter[MAX_ADDED_COUNTERS];
} *package_even, *package_odd;
@@ -858,13 +858,13 @@ int dump_counters(struct thread_data *t, struct core_data *c,
outp += sprintf(outp, "pc10: %016llX\n", p->pc10);
outp += sprintf(outp, "cpu_lpi: %016llX\n", p->cpu_lpi);
outp += sprintf(outp, "sys_lpi: %016llX\n", p->sys_lpi);
- outp += sprintf(outp, "Joules PKG: %0X\n", p->energy_pkg);
- outp += sprintf(outp, "Joules COR: %0X\n", p->energy_cores);
- outp += sprintf(outp, "Joules GFX: %0X\n", p->energy_gfx);
- outp += sprintf(outp, "Joules RAM: %0X\n", p->energy_dram);
- outp += sprintf(outp, "Throttle PKG: %0X\n",
+ outp += sprintf(outp, "Joules PKG: %0llX\n", p->energy_pkg);
+ outp += sprintf(outp, "Joules COR: %0llX\n", p->energy_cores);
+ outp += sprintf(outp, "Joules GFX: %0llX\n", p->energy_gfx);
+ outp += sprintf(outp, "Joules RAM: %0llX\n", p->energy_dram);
+ outp += sprintf(outp, "Throttle PKG: %0llX\n",
p->rapl_pkg_perf_status);
- outp += sprintf(outp, "Throttle RAM: %0X\n",
+ outp += sprintf(outp, "Throttle RAM: %0llX\n",
p->rapl_dram_perf_status);
outp += sprintf(outp, "PTM: %dC\n", p->pkg_temp_c);

@@ -1210,11 +1210,7 @@ void format_all_counters(struct thread_data *t, struct core_data *c, struct pkg_
}

#define DELTA_WRAP32(new, old) \
- if (new > old) { \
- old = new - old; \
- } else { \
- old = 0x100000000 + new - old; \
- }
+ old = ((((unsigned long long)new << 32) - ((unsigned long long)old << 32)) >> 32);

int
delta_package(struct pkg_data *new, struct pkg_data *old)
--
2.17.1
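
The one-line DELTA_WRAP32 in this patch keeps only the low 32 bits of the difference. A minimal sketch of the same arithmetic written as a function, using fixed-width casts instead of shifts (the name delta_wrap32 is hypothetical and not part of turbostat):

```c
#include <stdint.h>

/* Wrap-safe difference of two samples of a 32-bit counter. */
static inline uint64_t delta_wrap32(uint64_t new_raw, uint64_t old_raw)
{
	/*
	 * Truncating both samples to 32 bits makes the subtraction
	 * modulo 2^32, so a single wraparound between the two samples
	 * still yields the right count. Upper bits are ignored.
	 */
	return (uint32_t)((uint32_t)new_raw - (uint32_t)old_raw);
}
```

With 64-bit fields, the removed if/else form would yield 0x100000000 for two equal samples, which the modulo form avoids. Neither form can detect more than one wraparound between samples.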

2020-03-31 18:47:42

by Chen Yu

Subject: [PATCH 2/2][RFC] tools/power turbostat: Introduce reliable RAPL display

Since the RAPL Joule counter is 32 bits wide, turbostat
prints only a *star* to indicate overflow over a long
duration, instead of the actual energy consumed.
This does not meet the requirements on servers, where the
sampling time of turbostat is usually very long.

So maintain a set of MSR buffers and update them
periodically, before the 32-bit MSR wraps around.

The idea is similar to the implementation of ktime_get():
get_msr_sum() is used rather than get_msr() to get the
accumulated MSR value.

This can be illustrated below:

MSR timer:
total_rapl_msr += (current_rapl_msr - last_rapl_msr);

get_msr_sum():
return (current_rapl_msr - last_rapl_msr) + total_rapl_msr;

Originally-by: Aaron Lu <[email protected]>
Signed-off-by: Chen Yu <[email protected]>
---
tools/power/x86/turbostat/Makefile | 2 +-
tools/power/x86/turbostat/turbostat.c | 292 ++++++++++++++++++++++++--
2 files changed, 274 insertions(+), 20 deletions(-)

diff --git a/tools/power/x86/turbostat/Makefile b/tools/power/x86/turbostat/Makefile
index 2b6551269e43..d08765531bcb 100644
--- a/tools/power/x86/turbostat/Makefile
+++ b/tools/power/x86/turbostat/Makefile
@@ -16,7 +16,7 @@ override CFLAGS += -D_FORTIFY_SOURCE=2

%: %.c
@mkdir -p $(BUILD_OUTPUT)
- $(CC) $(CFLAGS) $< -o $(BUILD_OUTPUT)/$@ $(LDFLAGS) -lcap
+ $(CC) $(CFLAGS) $< -o $(BUILD_OUTPUT)/$@ $(LDFLAGS) -lcap -lrt

.PHONY : clean
clean :
diff --git a/tools/power/x86/turbostat/turbostat.c b/tools/power/x86/turbostat/turbostat.c
index 95f3047e94ae..a8979bec97e4 100644
--- a/tools/power/x86/turbostat/turbostat.c
+++ b/tools/power/x86/turbostat/turbostat.c
@@ -47,6 +47,7 @@ unsigned int sums_need_wide_columns;
unsigned int rapl_joules;
unsigned int summary_only;
unsigned int list_header_only;
+unsigned int longtime;
unsigned int dump_only;
unsigned int do_snb_cstates;
unsigned int do_knl_cstates;
@@ -259,6 +260,113 @@ struct msr_counter {
#define SYSFS_PERCPU (1 << 1)
};

+/*
+ * The accumulated sum of an MSR is a monotonically
+ * increasing value that is updated periodically,
+ * regardless of the register's bit width.
+ */
+enum {
+ IDX_PKG_ENERGY,
+ IDX_DRAM_ENERGY,
+ IDX_PP0_ENERGY,
+ IDX_PP1_ENERGY,
+ IDX_PKG_PERF,
+ IDX_DRAM_PERF,
+ IDX_COUNT,
+};
+
+int get_msr_sum(int cpu, off_t offset, unsigned long long *msr);
+
+struct msr_sum_array {
+ /* get_msr_sum() = sum + (get_msr() - last) */
+ struct {
+ /*The accumulated MSR value is updated by the timer*/
+ unsigned long long sum;
+ /*The MSR footprint recorded in last timer*/
+ unsigned long long last;
+ } entries[IDX_COUNT];
+};
+
+/* The percpu MSR sum array.*/
+struct msr_sum_array *per_cpu_msr_sum;
+
+int idx_to_offset(int idx)
+{
+ int offset;
+
+ switch (idx) {
+ case IDX_PKG_ENERGY:
+ offset = MSR_PKG_ENERGY_STATUS;
+ break;
+ case IDX_DRAM_ENERGY:
+ offset = MSR_DRAM_ENERGY_STATUS;
+ break;
+ case IDX_PP0_ENERGY:
+ offset = MSR_PP0_ENERGY_STATUS;
+ break;
+ case IDX_PP1_ENERGY:
+ offset = MSR_PP1_ENERGY_STATUS;
+ break;
+ case IDX_PKG_PERF:
+ offset = MSR_PKG_PERF_STATUS;
+ break;
+ case IDX_DRAM_PERF:
+ offset = MSR_DRAM_PERF_STATUS;
+ break;
+ default:
+ offset = -1;
+ }
+ return offset;
+}
+
+int offset_to_idx(int offset)
+{
+ int idx;
+
+ switch (offset) {
+ case MSR_PKG_ENERGY_STATUS:
+ idx = IDX_PKG_ENERGY;
+ break;
+ case MSR_DRAM_ENERGY_STATUS:
+ idx = IDX_DRAM_ENERGY;
+ break;
+ case MSR_PP0_ENERGY_STATUS:
+ idx = IDX_PP0_ENERGY;
+ break;
+ case MSR_PP1_ENERGY_STATUS:
+ idx = IDX_PP1_ENERGY;
+ break;
+ case MSR_PKG_PERF_STATUS:
+ idx = IDX_PKG_PERF;
+ break;
+ case MSR_DRAM_PERF_STATUS:
+ idx = IDX_DRAM_PERF;
+ break;
+ default:
+ idx = -1;
+ }
+ return idx;
+}
+
+int idx_valid(int idx)
+{
+ switch (idx) {
+ case IDX_PKG_ENERGY:
+ return do_rapl & RAPL_PKG;
+ case IDX_DRAM_ENERGY:
+ return do_rapl & RAPL_DRAM;
+ case IDX_PP0_ENERGY:
+ return do_rapl & RAPL_CORES_ENERGY_STATUS;
+ case IDX_PP1_ENERGY:
+ return do_rapl & RAPL_GFX;
+ case IDX_PKG_PERF:
+ return do_rapl & RAPL_PKG_PERF_STATUS;
+ case IDX_DRAM_PERF:
+ return do_rapl & RAPL_DRAM_PERF_STATUS;
+ default:
+ return 0;
+ }
+}
struct sys_counters {
unsigned int added_thread_counters;
unsigned int added_core_counters;
@@ -551,6 +659,7 @@ void help(void)
" Override default 5-second measurement interval\n"
" -J, --Joules displays energy in Joules instead of Watts\n"
" -l, --list list column headers only\n"
+ " -L, --Longtime long time duration support\n"
" -n, --num_iterations num\n"
" number of the measurement iterations\n"
" -o, --out file\n"
@@ -1962,34 +2071,70 @@ int get_counters(struct thread_data *t, struct core_data *c, struct pkg_data *p)
p->sys_lpi = cpuidle_cur_sys_lpi_us;

if (do_rapl & RAPL_PKG) {
- if (get_msr(cpu, MSR_PKG_ENERGY_STATUS, &msr))
- return -13;
- p->energy_pkg = msr & 0xFFFFFFFF;
+ if (longtime) {
+ if (get_msr_sum(cpu, MSR_PKG_ENERGY_STATUS, &msr))
+ return -13;
+ p->energy_pkg = msr;
+ } else {
+ if (get_msr(cpu, MSR_PKG_ENERGY_STATUS, &msr))
+ return -13;
+ p->energy_pkg = msr & 0xFFFFFFFF;
+ }
}
if (do_rapl & RAPL_CORES_ENERGY_STATUS) {
- if (get_msr(cpu, MSR_PP0_ENERGY_STATUS, &msr))
- return -14;
- p->energy_cores = msr & 0xFFFFFFFF;
+ if (longtime) {
+ if (get_msr_sum(cpu, MSR_PP0_ENERGY_STATUS, &msr))
+ return -14;
+ p->energy_cores = msr;
+ } else {
+ if (get_msr(cpu, MSR_PP0_ENERGY_STATUS, &msr))
+ return -14;
+ p->energy_cores = msr & 0xFFFFFFFF;
+ }
}
if (do_rapl & RAPL_DRAM) {
- if (get_msr(cpu, MSR_DRAM_ENERGY_STATUS, &msr))
- return -15;
- p->energy_dram = msr & 0xFFFFFFFF;
+ if (longtime) {
+ if (get_msr_sum(cpu, MSR_DRAM_ENERGY_STATUS, &msr))
+ return -15;
+ p->energy_dram = msr;
+ } else {
+ if (get_msr(cpu, MSR_DRAM_ENERGY_STATUS, &msr))
+ return -15;
+ p->energy_dram = msr & 0xFFFFFFFF;
+ }
}
if (do_rapl & RAPL_GFX) {
- if (get_msr(cpu, MSR_PP1_ENERGY_STATUS, &msr))
- return -16;
- p->energy_gfx = msr & 0xFFFFFFFF;
+ if (longtime) {
+ if (get_msr_sum(cpu, MSR_PP1_ENERGY_STATUS, &msr))
+ return -16;
+ p->energy_gfx = msr;
+ } else {
+ if (get_msr(cpu, MSR_PP1_ENERGY_STATUS, &msr))
+ return -16;
+ p->energy_gfx = msr & 0xFFFFFFFF;
+ }
}
if (do_rapl & RAPL_PKG_PERF_STATUS) {
- if (get_msr(cpu, MSR_PKG_PERF_STATUS, &msr))
- return -16;
- p->rapl_pkg_perf_status = msr & 0xFFFFFFFF;
+ if (longtime) {
+ if (get_msr_sum(cpu, MSR_PKG_PERF_STATUS, &msr))
+ return -16;
+ p->rapl_pkg_perf_status = msr;
+ } else {
+ if (get_msr(cpu, MSR_PKG_PERF_STATUS, &msr))
+ return -16;
+ p->rapl_pkg_perf_status = msr & 0xFFFFFFFF;
+ }
}
if (do_rapl & RAPL_DRAM_PERF_STATUS) {
- if (get_msr(cpu, MSR_DRAM_PERF_STATUS, &msr))
- return -16;
- p->rapl_dram_perf_status = msr & 0xFFFFFFFF;
+ if (longtime) {
+ if (get_msr_sum(cpu, MSR_DRAM_PERF_STATUS, &msr))
+ return -16;
+ p->rapl_dram_perf_status = msr;
+ } else {
+ if (get_msr(cpu, MSR_DRAM_PERF_STATUS, &msr))
+ return -16;
+ p->rapl_dram_perf_status = msr & 0xFFFFFFFF;
+ }
}
if (do_rapl & RAPL_AMD_F17H) {
if (get_msr(cpu, MSR_PKG_ENERGY_STAT, &msr))
@@ -3053,6 +3198,109 @@ void do_sleep(void)
}
}

+int get_msr_sum(int cpu, off_t offset, unsigned long long *msr)
+{
+ int ret, idx;
+ unsigned long long msr_cur, msr_last;
+
+ if (!per_cpu_msr_sum)
+ return 1;
+
+ idx = offset_to_idx(offset);
+ if (idx < 0)
+ return idx;
+ /* get_msr_sum() = sum + (get_msr() - last) */
+ ret = get_msr(cpu, offset, &msr_cur);
+ if (ret)
+ return ret;
+ msr_last = per_cpu_msr_sum[cpu].entries[idx].last;
+ DELTA_WRAP32(msr_cur, msr_last);
+ *msr = msr_last + per_cpu_msr_sum[cpu].entries[idx].sum;
+
+ return 0;
+}
+
+timer_t timerid;
+
+/* Timer callback, update the sum of MSRs periodically. */
+static int update_msr_sum(struct thread_data *t, struct core_data *c, struct pkg_data *p)
+{
+ int i, ret;
+ int cpu = t->cpu_id;
+
+ for (i = IDX_PKG_ENERGY; i < IDX_COUNT; i++) {
+ unsigned long long msr_cur, msr_last;
+ int offset;
+
+ if (!idx_valid(i))
+ continue;
+ offset = idx_to_offset(i);
+ if (offset < 0)
+ continue;
+ ret = get_msr(cpu, offset, &msr_cur);
+ if (ret) {
+ fprintf(outf, "Can not update msr(0x%x)\n", offset);
+ continue;
+ }
+
+ msr_last = per_cpu_msr_sum[cpu].entries[i].last;
+ per_cpu_msr_sum[cpu].entries[i].last = msr_cur & 0xffffffff;
+
+ DELTA_WRAP32(msr_cur, msr_last);
+ per_cpu_msr_sum[cpu].entries[i].sum += msr_last;
+ }
+ return 0;
+}
+
+static void
+msr_record_handler(union sigval v)
+{
+ for_all_cpus(update_msr_sum, EVEN_COUNTERS);
+}
+
+void msr_longtime_record(void)
+{
+ struct itimerspec its;
+ struct sigevent sev;
+
+ per_cpu_msr_sum = calloc(topo.max_cpu_num + 1, sizeof(struct msr_sum_array));
+ if (!per_cpu_msr_sum) {
+ fprintf(outf, "Can not allocate memory for long time MSR.\n");
+ return;
+ }
+ /*
+ * Signal handler might be restricted, so use thread notifier instead.
+ */
+ memset(&sev, 0, sizeof(struct sigevent));
+ sev.sigev_notify = SIGEV_THREAD;
+ sev.sigev_notify_function = msr_record_handler;
+
+ sev.sigev_value.sival_ptr = &timerid;
+ if (timer_create(CLOCK_REALTIME, &sev, &timerid) == -1) {
+ fprintf(outf, "Can not create timer.\n");
+ goto release_msr;
+ }
+
+ its.it_value.tv_sec = 0;
+ its.it_value.tv_nsec = 1;
+ /*
+ * The counter can wrap around in about 60 secs when power
+ * consumption is high, so update every 50 secs.
+ */
+ its.it_interval.tv_sec = 50;
+ its.it_interval.tv_nsec = 0;
+
+ if (timer_settime(timerid, 0, &its, NULL) == -1) {
+ fprintf(outf, "Can not set timer.\n");
+ goto release_timer;
+ }
+ return;
+
+ release_timer:
+ timer_delete(timerid);
+ release_msr:
+ free(per_cpu_msr_sum);
+}

void turbostat_loop()
{
@@ -5735,6 +5983,7 @@ void cmdline(int argc, char **argv)
{"hide", required_argument, 0, 'H'}, // meh, -h taken by --help
{"Joules", no_argument, 0, 'J'},
{"list", no_argument, 0, 'l'},
+ {"Longtime", no_argument, 0, 'L'},
{"out", required_argument, 0, 'o'},
{"quiet", no_argument, 0, 'q'},
{"show", required_argument, 0, 's'},
@@ -5746,7 +5995,7 @@ void cmdline(int argc, char **argv)

progname = argv[0];

- while ((opt = getopt_long_only(argc, argv, "+C:c:Dde:hi:Jn:o:qST:v",
+ while ((opt = getopt_long_only(argc, argv, "+C:c:Dde:hi:JLn:o:qST:v",
long_options, &option_index)) != -1) {
switch (opt) {
case 'a':
@@ -5800,6 +6049,9 @@ void cmdline(int argc, char **argv)
list_header_only++;
quiet++;
break;
+ case 'L':
+ longtime = 1;
+ break;
case 'o':
outf = fopen_or_die(optarg, "w");
break;
@@ -5864,6 +6116,8 @@ int main(int argc, char **argv)
return 0;
}

+ if (longtime)
+ msr_longtime_record();
/*
* if any params left, it must be a command to fork
*/
--
2.17.1
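
The accumulation scheme described in the commit message can be sketched in isolation as follows. This is a simplified single-counter model, not the patch's code: struct msr_accum, accum_update and accum_read are hypothetical names, and the raw sample is passed in rather than read with get_msr().

```c
#include <stdint.h>

struct msr_accum {
	uint64_t sum;   /* total counts folded in by the periodic update */
	uint32_t last;  /* raw 32-bit sample seen at the last update */
};

/* Periodic update: fold the counts since the last sample into sum. */
static void accum_update(struct msr_accum *a, uint32_t raw)
{
	a->sum += (uint32_t)(raw - a->last);  /* modulo 2^32 delta */
	a->last = raw;
}

/* Reader: sum plus whatever accrued since the last periodic update. */
static uint64_t accum_read(const struct msr_accum *a, uint32_t raw)
{
	return a->sum + (uint32_t)(raw - a->last);
}
```

The reader does not modify the state, so the result stays correct only while the periodic update runs at least once per wraparound period of the raw counter.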

2020-08-13 21:52:21

by Len Brown

Subject: Re: [PATCH 2/2][RFC] tools/power turbostat: Introduce reliable RAPL display

why not simply use nanosleep(2)


On Tue, Mar 31, 2020 at 2:45 PM Chen Yu <[email protected]> wrote:
>
> Since the RAPL Joule Counter is 32 bit, turbostat would
> only print a *star* instead of printing the actual energy
> consumed to indicate the overflow due to long duration.
> This does not meet the requirement on servers as the
> sampling time of turbostat is usually very long on servers.
>
> So maintain a set of MSR buffer, and updates them
> periodically before the 32bit msr register wrapped round.
>
> The idea is similar to the implementation of ktime_get():
> get_msr_sum() is used rather than get_msr() to get the
> accumulated MSR.
>
> This can be illustrated below:
>
> MSR timer:
> total_rapl_msr += (current_rapl_msr - last_rapl_msr);
>
> get_msr_sum():
> return (current_rapl_msr - last_rapl_msr) + total_rapl_msr;
>
> Originally-by: Aaron Lu <[email protected]>
> Signed-off-by: Chen Yu <[email protected]>


--
Len Brown, Intel Open Source Technology Center

2020-08-14 15:46:03

by Chen Yu

Subject: RE: [PATCH 2/2][RFC] tools/power turbostat: Introduce reliable RAPL display

Hi Len,
> From: Len Brown <[email protected]>
> Sent: Friday, August 14, 2020 5:51 AM
> To: Chen, Yu C <[email protected]>
> Cc: Linux PM list <[email protected]>; Linux Kernel Mailing List <linux-
> [email protected]>; Zhang, Rui <[email protected]>
> Subject: Re: [PATCH 2/2][RFC] tools/power turbostat: Introduce reliable RAPL
> display
>
> why not simply use nanosleep(2)
>
>
Do you mean using nanosleep() rather than the timer to accumulate the RAPL data?
After thinking about it for a while, it looks like with nanosleep() we would
need to create a new thread within turbostat that sleeps for a few seconds at a time
(according to the RAPL register wraparound time) to accumulate the running RAPL value,
and we might need to deal with race conditions between the new thread and the main
turbostat thread. But yes, it could be switched to nanosleep() to see whether the code
looks simpler.

BTW, we have a v3 of the patch at
https://lore.kernel.org/patchwork/project/lkml/list/?series=439330


Thanks,
Chenyu

> --
> Len Brown, Intel Open Source Technology Center
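
For comparison, the nanosleep()-based alternative discussed in this reply might look roughly like the sketch below: a helper thread folds the counter into a running sum and then sleeps, and a mutex covers the race with the reader. All names (struct sampler, sampler_thread, sampler_read, raw_src) are hypothetical, and raw_src merely stands in for a get_msr() call. This is not the code from the v3 series.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdint.h>
#include <time.h>

struct sampler {
	pthread_mutex_t lock;
	uint64_t sum;                /* accumulated counts */
	uint32_t last;               /* raw sample at the last update */
	int stop;                    /* set under lock to end the loop */
	long interval_sec;           /* must be shorter than the wrap time */
	volatile uint32_t *raw_src;  /* hypothetical stand-in for get_msr() */
};

/* Thread body: update the sum, then sleep until the next update. */
static void *sampler_thread(void *arg)
{
	struct sampler *s = arg;
	struct timespec ts = { .tv_sec = s->interval_sec, .tv_nsec = 0 };

	for (;;) {
		int stop;

		pthread_mutex_lock(&s->lock);
		/* Sample inside the lock so "last" never runs ahead of a reader. */
		uint32_t raw = *s->raw_src;
		s->sum += (uint32_t)(raw - s->last);
		s->last = raw;
		stop = s->stop;
		pthread_mutex_unlock(&s->lock);

		if (stop)
			return NULL;
		/* An early wakeup (EINTR) only causes a harmless extra update. */
		nanosleep(&ts, NULL);
	}
}

/* Reader used by the main thread: sum plus counts since the last update. */
static uint64_t sampler_read(struct sampler *s)
{
	uint64_t total;

	pthread_mutex_lock(&s->lock);
	total = s->sum + (uint32_t)(*s->raw_src - s->last);
	pthread_mutex_unlock(&s->lock);
	return total;
}
```

The thread would be started with pthread_create(&tid, NULL, sampler_thread, &s) and stopped by setting s.stop under the lock. Sampling the counter while holding the mutex is the design choice that avoids the race mentioned in the reply, at the cost of holding the lock across the read.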