2020-04-15 21:30:28

by Chen Yu

[permalink] [raw]
Subject: [PATCH 0/3][RFC v2] tools/power turbostat: Enable accumulated energy consumption for long time sampling

The RAPL Joule Counter is 32 bit, turbostat would
only print a *star* instead of printing the actual energy
consumed due to the overflow of RAPL register.

Introduce the accumulated RAPL mechanism to avoid the possible
overflow.

Chen Yu (3):
tools/power turbostat: Make the energy variable to be 64 bit
tools/power turbostat: Introduce functions to accumulate RAPL
consumption
tools/power turbostat: Enable accumulate RAPL display

tools/power/x86/turbostat/Makefile | 2 +-
tools/power/x86/turbostat/turbostat.c | 273 ++++++++++++++++++++++----
2 files changed, 237 insertions(+), 38 deletions(-)

--
2.20.1


2020-04-15 21:30:35

by Chen Yu

[permalink] [raw]
Subject: [PATCH 1/3][RFC v2] tools/power turbostat: Make the energy variable to be 64 bit

Change the energy variable from 32bit to 64bit, so that it can record long
time duration. After this conversion, adjust the DELTA_WRAP32() accordingly.

Signed-off-by: Chen Yu <[email protected]>
---
tools/power/x86/turbostat/turbostat.c | 30 ++++++++++++---------------
1 file changed, 13 insertions(+), 17 deletions(-)

diff --git a/tools/power/x86/turbostat/turbostat.c b/tools/power/x86/turbostat/turbostat.c
index 33b370865d16..95f3047e94ae 100644
--- a/tools/power/x86/turbostat/turbostat.c
+++ b/tools/power/x86/turbostat/turbostat.c
@@ -211,12 +211,12 @@ struct pkg_data {
long long gfx_rc6_ms;
unsigned int gfx_mhz;
unsigned int package_id;
- unsigned int energy_pkg; /* MSR_PKG_ENERGY_STATUS */
- unsigned int energy_dram; /* MSR_DRAM_ENERGY_STATUS */
- unsigned int energy_cores; /* MSR_PP0_ENERGY_STATUS */
- unsigned int energy_gfx; /* MSR_PP1_ENERGY_STATUS */
- unsigned int rapl_pkg_perf_status; /* MSR_PKG_PERF_STATUS */
- unsigned int rapl_dram_perf_status; /* MSR_DRAM_PERF_STATUS */
+ unsigned long long energy_pkg; /* MSR_PKG_ENERGY_STATUS */
+ unsigned long long energy_dram; /* MSR_DRAM_ENERGY_STATUS */
+ unsigned long long energy_cores; /* MSR_PP0_ENERGY_STATUS */
+ unsigned long long energy_gfx; /* MSR_PP1_ENERGY_STATUS */
+ unsigned long long rapl_pkg_perf_status; /* MSR_PKG_PERF_STATUS */
+ unsigned long long rapl_dram_perf_status; /* MSR_DRAM_PERF_STATUS */
unsigned int pkg_temp_c;
unsigned long long counter[MAX_ADDED_COUNTERS];
} *package_even, *package_odd;
@@ -858,13 +858,13 @@ int dump_counters(struct thread_data *t, struct core_data *c,
outp += sprintf(outp, "pc10: %016llX\n", p->pc10);
outp += sprintf(outp, "cpu_lpi: %016llX\n", p->cpu_lpi);
outp += sprintf(outp, "sys_lpi: %016llX\n", p->sys_lpi);
- outp += sprintf(outp, "Joules PKG: %0X\n", p->energy_pkg);
- outp += sprintf(outp, "Joules COR: %0X\n", p->energy_cores);
- outp += sprintf(outp, "Joules GFX: %0X\n", p->energy_gfx);
- outp += sprintf(outp, "Joules RAM: %0X\n", p->energy_dram);
- outp += sprintf(outp, "Throttle PKG: %0X\n",
+ outp += sprintf(outp, "Joules PKG: %0llX\n", p->energy_pkg);
+ outp += sprintf(outp, "Joules COR: %0llX\n", p->energy_cores);
+ outp += sprintf(outp, "Joules GFX: %0llX\n", p->energy_gfx);
+ outp += sprintf(outp, "Joules RAM: %0llX\n", p->energy_dram);
+ outp += sprintf(outp, "Throttle PKG: %0llX\n",
p->rapl_pkg_perf_status);
- outp += sprintf(outp, "Throttle RAM: %0X\n",
+ outp += sprintf(outp, "Throttle RAM: %0llX\n",
p->rapl_dram_perf_status);
outp += sprintf(outp, "PTM: %dC\n", p->pkg_temp_c);

@@ -1210,11 +1210,7 @@ void format_all_counters(struct thread_data *t, struct core_data *c, struct pkg_
}

#define DELTA_WRAP32(new, old) \
- if (new > old) { \
- old = new - old; \
- } else { \
- old = 0x100000000 + new - old; \
- }
+ old = ((((unsigned long long)new << 32) - ((unsigned long long)old << 32)) >> 32);

int
delta_package(struct pkg_data *new, struct pkg_data *old)
--
2.20.1

2020-04-15 21:30:46

by Chen Yu

[permalink] [raw]
Subject: [PATCH 2/3][v2] tools/power turbostat: Introduce functions to accumulate RAPL consumption

Since the RAPL Joule Counter is 32 bit, turbostat would
only print a *star* instead of printing the actual energy
consumed to indicate the overflow due to long duration.
This does not meet the requirement from servers as the
sampling time of turbostat is usually very long on servers.

So maintain a set of MSR buffer, and update them
periodically before the 32bit MSR register is wrapped round,
so as to avoid the overflow.

The idea is similar to the implementation of ktime_get():

Periodical MSR timer:
total_rapl_sum += (current_rapl_msr - last_rapl_msr);

Using get_msr_sum() to get the accumulated RAPL:
return (current_rapl_msr - last_rapl_msr) + total_rapl_sum;

The accumulated RAPL mechanism will be turned on in next patch.

Originally-by: Aaron Lu <[email protected]>
Signed-off-by: Chen Yu <[email protected]>
---
v2: According to Len's suggestion:
1. Enable the accumulated RAPL mechanism by default.
2. Re-use the rapl_joule_counter_range to represent the
the timeout of periodical timer.
3. Remove the condition check in v1 patch when reading RAPL registers.
--
tools/power/x86/turbostat/Makefile | 2 +-
tools/power/x86/turbostat/turbostat.c | 209 ++++++++++++++++++++++++++
2 files changed, 210 insertions(+), 1 deletion(-)

diff --git a/tools/power/x86/turbostat/Makefile b/tools/power/x86/turbostat/Makefile
index 2b6551269e43..d08765531bcb 100644
--- a/tools/power/x86/turbostat/Makefile
+++ b/tools/power/x86/turbostat/Makefile
@@ -16,7 +16,7 @@ override CFLAGS += -D_FORTIFY_SOURCE=2

%: %.c
@mkdir -p $(BUILD_OUTPUT)
- $(CC) $(CFLAGS) $< -o $(BUILD_OUTPUT)/$@ $(LDFLAGS) -lcap
+ $(CC) $(CFLAGS) $< -o $(BUILD_OUTPUT)/$@ $(LDFLAGS) -lcap -lrt

.PHONY : clean
clean :
diff --git a/tools/power/x86/turbostat/turbostat.c b/tools/power/x86/turbostat/turbostat.c
index 95f3047e94ae..ef380db38cba 100644
--- a/tools/power/x86/turbostat/turbostat.c
+++ b/tools/power/x86/turbostat/turbostat.c
@@ -259,6 +259,113 @@ struct msr_counter {
#define SYSFS_PERCPU (1 << 1)
};

+/*
+ * The accumulated sum of MSR is defined as a monotonic
+ * increasing MSR, it will be accumulated periodically,
+ * despite its register's bit width.
+ */
+enum {
+ IDX_PKG_ENERGY,
+ IDX_DRAM_ENERGY,
+ IDX_PP0_ENERGY,
+ IDX_PP1_ENERGY,
+ IDX_PKG_PERF,
+ IDX_DRAM_PERF,
+ IDX_COUNT,
+};
+
+int get_msr_sum(int cpu, off_t offset, unsigned long long *msr);
+
+struct msr_sum_array {
+ /* get_msr_sum() = sum + (get_msr() - last) */
+ struct {
+ /*The accumulated MSR value is updated by the timer*/
+ unsigned long long sum;
+ /*The MSR footprint recorded in last timer*/
+ unsigned long long last;
+ } entries[IDX_COUNT];
+};
+
+/* The percpu MSR sum array.*/
+struct msr_sum_array *per_cpu_msr_sum;
+
+int idx_to_offset(int idx)
+{
+ int offset;
+
+ switch (idx) {
+ case IDX_PKG_ENERGY:
+ offset = MSR_PKG_ENERGY_STATUS;
+ break;
+ case IDX_DRAM_ENERGY:
+ offset = MSR_DRAM_ENERGY_STATUS;
+ break;
+ case IDX_PP0_ENERGY:
+ offset = MSR_PP0_ENERGY_STATUS;
+ break;
+ case IDX_PP1_ENERGY:
+ offset = MSR_PP1_ENERGY_STATUS;
+ break;
+ case IDX_PKG_PERF:
+ offset = MSR_PKG_PERF_STATUS;
+ break;
+ case IDX_DRAM_PERF:
+ offset = MSR_DRAM_PERF_STATUS;
+ break;
+ default:
+ offset = -1;
+ }
+ return offset;
+}
+
+int offset_to_idx(int offset)
+{
+ int idx;
+
+ switch (offset) {
+ case MSR_PKG_ENERGY_STATUS:
+ idx = IDX_PKG_ENERGY;
+ break;
+ case MSR_DRAM_ENERGY_STATUS:
+ idx = IDX_DRAM_ENERGY;
+ break;
+ case MSR_PP0_ENERGY_STATUS:
+ idx = IDX_PP0_ENERGY;
+ break;
+ case MSR_PP1_ENERGY_STATUS:
+ idx = IDX_PP1_ENERGY;
+ break;
+ case MSR_PKG_PERF_STATUS:
+ idx = IDX_PKG_PERF;
+ break;
+ case MSR_DRAM_PERF_STATUS:
+ idx = IDX_DRAM_PERF;
+ break;
+ default:
+ idx = -1;
+ }
+ return idx;
+}
+
+int idx_valid(int idx)
+{
+ switch (idx) {
+ case IDX_PKG_ENERGY:
+ return do_rapl & RAPL_PKG;
+ case IDX_DRAM_ENERGY:
+ return do_rapl & RAPL_DRAM;
+ case IDX_PP0_ENERGY:
+ return do_rapl & RAPL_CORES_ENERGY_STATUS;
+ case IDX_PP1_ENERGY:
+ return do_rapl & RAPL_GFX;
+ case IDX_PKG_PERF:
+ return do_rapl & RAPL_PKG_PERF_STATUS;
+ case IDX_DRAM_PERF:
+ return do_rapl & RAPL_DRAM_PERF_STATUS;
+ default:
+ return 0;
+ }
+}
struct sys_counters {
unsigned int added_thread_counters;
unsigned int added_core_counters;
@@ -3053,6 +3160,108 @@ void do_sleep(void)
}
}

+int get_msr_sum(int cpu, off_t offset, unsigned long long *msr)
+{
+ int ret, idx;
+ unsigned long long msr_cur, msr_last;
+
+ if (!per_cpu_msr_sum)
+ return 1;
+
+ idx = offset_to_idx(offset);
+ if (idx < 0)
+ return idx;
+ /* get_msr_sum() = sum + (get_msr() - last) */
+ ret = get_msr(cpu, offset, &msr_cur);
+ if (ret)
+ return ret;
+ msr_last = per_cpu_msr_sum[cpu].entries[idx].last;
+ DELTA_WRAP32(msr_cur, msr_last);
+ *msr = msr_last + per_cpu_msr_sum[cpu].entries[idx].sum;
+
+ return 0;
+}
+
+timer_t timerid;
+
+/* Timer callback, update the sum of MSRs periodically. */
+static int update_msr_sum(struct thread_data *t, struct core_data *c, struct pkg_data *p)
+{
+ int i, ret;
+ int cpu = t->cpu_id;
+
+ for (i = IDX_PKG_ENERGY; i < IDX_COUNT; i++) {
+ unsigned long long msr_cur, msr_last;
+ int offset;
+
+ if (!idx_valid(i))
+ continue;
+ offset = idx_to_offset(i);
+ if (offset < 0)
+ continue;
+ ret = get_msr(cpu, offset, &msr_cur);
+ if (ret) {
+ fprintf(outf, "Can not update msr(0x%x)\n", offset);
+ continue;
+ }
+
+ msr_last = per_cpu_msr_sum[cpu].entries[i].last;
+ per_cpu_msr_sum[cpu].entries[i].last = msr_cur & 0xffffffff;
+
+ DELTA_WRAP32(msr_cur, msr_last);
+ per_cpu_msr_sum[cpu].entries[i].sum += msr_last;
+ }
+ return 0;
+}
+
+static void
+msr_record_handler(union sigval v)
+{
+ for_all_cpus(update_msr_sum, EVEN_COUNTERS);
+}
+
+void msr_sum_record(void)
+{
+ struct itimerspec its;
+ struct sigevent sev;
+
+ per_cpu_msr_sum = calloc(topo.max_cpu_num + 1, sizeof(struct msr_sum_array));
+ if (!per_cpu_msr_sum) {
+ fprintf(outf, "Can not allocate memory for long time MSR.\n");
+ return;
+ }
+ /*
+ * Signal handler might be restricted, so use thread notifier instead.
+ */
+ memset(&sev, 0, sizeof(struct sigevent));
+ sev.sigev_notify = SIGEV_THREAD;
+ sev.sigev_notify_function = msr_record_handler;
+
+ sev.sigev_value.sival_ptr = &timerid;
+ if (timer_create(CLOCK_REALTIME, &sev, &timerid) == -1) {
+ fprintf(outf, "Can not create timer.\n");
+ goto release_msr;
+ }
+
+ its.it_value.tv_sec = 0;
+ its.it_value.tv_nsec = 1;
+ /*
+ * A wraparound time is calculated early.
+ */
+ its.it_interval.tv_sec = rapl_joule_counter_range;
+ its.it_interval.tv_nsec = 0;
+
+ if (timer_settime(timerid, 0, &its, NULL) == -1) {
+ fprintf(outf, "Can not set timer.\n");
+ goto release_timer;
+ }
+ return;
+
+ release_timer:
+ timer_delete(timerid);
+ release_msr:
+ free(per_cpu_msr_sum);
+}

void turbostat_loop()
{
--
2.20.1

2020-04-15 21:30:50

by Chen Yu

[permalink] [raw]
Subject: [PATCH 3/3][RFC v2] tools/power turbostat: Enable accumulate RAPL display

Enable the accumulated RAPL display by default.

Signed-off-by: Chen Yu <[email protected]>
---
tools/power/x86/turbostat/turbostat.c | 34 +++++++++++----------------
1 file changed, 14 insertions(+), 20 deletions(-)

diff --git a/tools/power/x86/turbostat/turbostat.c b/tools/power/x86/turbostat/turbostat.c
index ef380db38cba..29fc4069f467 100644
--- a/tools/power/x86/turbostat/turbostat.c
+++ b/tools/power/x86/turbostat/turbostat.c
@@ -1169,14 +1169,7 @@ int format_counters(struct thread_data *t, struct core_data *c,
}
}

- /*
- * If measurement interval exceeds minimum RAPL Joule Counter range,
- * indicate that results are suspect by printing "**" in fraction place.
- */
- if (interval_float < rapl_joule_counter_range)
- fmt8 = "%s%.2f";
- else
- fmt8 = "%6.0f**";
+ fmt8 = "%s%.2f";

if (DO_BIC(BIC_CorWatt) && (do_rapl & RAPL_PER_CORE_ENERGY))
outp += sprintf(outp, fmt8, (printed++ ? delim : ""), c->core_energy * rapl_energy_units / interval_float);
@@ -2069,34 +2062,34 @@ int get_counters(struct thread_data *t, struct core_data *c, struct pkg_data *p)
p->sys_lpi = cpuidle_cur_sys_lpi_us;

if (do_rapl & RAPL_PKG) {
- if (get_msr(cpu, MSR_PKG_ENERGY_STATUS, &msr))
+ if (get_msr_sum(cpu, MSR_PKG_ENERGY_STATUS, &msr))
return -13;
- p->energy_pkg = msr & 0xFFFFFFFF;
+ p->energy_pkg = msr;
}
if (do_rapl & RAPL_CORES_ENERGY_STATUS) {
- if (get_msr(cpu, MSR_PP0_ENERGY_STATUS, &msr))
+ if (get_msr_sum(cpu, MSR_PP0_ENERGY_STATUS, &msr))
return -14;
- p->energy_cores = msr & 0xFFFFFFFF;
+ p->energy_cores = msr;
}
if (do_rapl & RAPL_DRAM) {
- if (get_msr(cpu, MSR_DRAM_ENERGY_STATUS, &msr))
+ if (get_msr_sum(cpu, MSR_DRAM_ENERGY_STATUS, &msr))
return -15;
- p->energy_dram = msr & 0xFFFFFFFF;
+ p->energy_dram = msr;
}
if (do_rapl & RAPL_GFX) {
- if (get_msr(cpu, MSR_PP1_ENERGY_STATUS, &msr))
+ if (get_msr_sum(cpu, MSR_PP1_ENERGY_STATUS, &msr))
return -16;
- p->energy_gfx = msr & 0xFFFFFFFF;
+ p->energy_gfx = msr;
}
if (do_rapl & RAPL_PKG_PERF_STATUS) {
- if (get_msr(cpu, MSR_PKG_PERF_STATUS, &msr))
+ if (get_msr_sum(cpu, MSR_PKG_PERF_STATUS, &msr))
return -16;
- p->rapl_pkg_perf_status = msr & 0xFFFFFFFF;
+ p->rapl_pkg_perf_status = msr;
}
if (do_rapl & RAPL_DRAM_PERF_STATUS) {
- if (get_msr(cpu, MSR_DRAM_PERF_STATUS, &msr))
+ if (get_msr_sum(cpu, MSR_DRAM_PERF_STATUS, &msr))
return -16;
- p->rapl_dram_perf_status = msr & 0xFFFFFFFF;
+ p->rapl_dram_perf_status = msr;
}
if (do_rapl & RAPL_AMD_F17H) {
if (get_msr(cpu, MSR_PKG_ENERGY_STAT, &msr))
@@ -6073,6 +6066,7 @@ int main(int argc, char **argv)
return 0;
}

+ msr_sum_record();
/*
* if any params left, it must be a command to fork
*/
--
2.20.1

2020-04-16 04:14:11

by Doug Smythies

[permalink] [raw]
Subject: RE: [PATCH 2/3][v2] tools/power turbostat: Introduce functions to accumulate RAPL consumption

On 2020.04.15 05:57 Chen Yu wrote:

...

> v2: According to Len's suggestion:
> 1. Enable the accumulated RAPL mechanism by default.

I am not a fan of this, but O.K.

> 2. Re-use the rapl_joule_counter_range to represent the
> the timeout of periodical timer.

No, please no. It is too easy to still have an overflow.

...
> + /*
> + * A wraparound time is calculated early.
> + */
> + its.it_interval.tv_sec = rapl_joule_counter_range;

Would this be o.K.?

+ its.it_interval.tv_sec = rapl_joule_counter_range / 2;

> + its.it_interval.tv_nsec = 0;

The way it was sent, this patch set does not work.
It still overflows.

Example, sample time calculated to ensure overflow:

Busy% Bzy_MHz IRQ PkgTmp PkgWatt GFXWatt
100.00 3500 3592125 80 9.72 0.12
100.00 3500 3587391 79 9.77 0.12

Actual package watts was around 65.

However, if this additional patch is applied (I only fixed one of them):

doug@s18:~/temp-k-git/linux/tools/power/x86/turbostat$ git diff
diff --git a/tools/power/x86/turbostat/turbostat.c b/tools/power/x86/turbostat/turbostat.c
index 29fc4069f467..4d72d9be5209 100644
--- a/tools/power/x86/turbostat/turbostat.c
+++ b/tools/power/x86/turbostat/turbostat.c
@@ -1350,7 +1350,8 @@ delta_package(struct pkg_data *new, struct pkg_data *old)

old->gfx_mhz = new->gfx_mhz;

- DELTA_WRAP32(new->energy_pkg, old->energy_pkg);
+/* DELTA_WRAP32(new->energy_pkg, old->energy_pkg); */
+ old->energy_pkg = new->energy_pkg - old->energy_pkg;
DELTA_WRAP32(new->energy_cores, old->energy_cores);
DELTA_WRAP32(new->energy_gfx, old->energy_gfx);
DELTA_WRAP32(new->energy_dram, old->energy_dram);

Then it seems to work.

Example:

doug@s15:~/temp-turbostat$ sudo ./turbostat --Summary --show Busy%,Bzy_MHz,PkgTmp,PkgWatt,GFXWatt,IRQ --interval 1200
...
RAPL: 690 sec. Joule Counter Range, at 95 Watts
...
Busy% Bzy_MHz IRQ PkgTmp PkgWatt GFXWatt
100.00 3500 3592328 80 64.32 0.12
100.00 3500 3595195 79 64.37 0.12

... Doug


2020-04-16 20:42:56

by Chen Yu

[permalink] [raw]
Subject: Re: [PATCH 2/3][v2] tools/power turbostat: Introduce functions to accumulate RAPL consumption

Hi Doug,
Thanks for reviewing this patch.
On Wed, Apr 15, 2020 at 09:03:34PM -0700, Doug Smythies wrote:
> On 2020.04.15 05:57 Chen Yu wrote:
>
> ...
>
> > v2: According to Len's suggestion:
> > 1. Enable the accumulated RAPL mechanism by default.
>
> I am not a fan of this, but O.K.
>
> > 2. Re-use the rapl_joule_counter_range to represent the
> > the timeout of periodical timer.
>
> No, please no. It is too easy to still have an overflow.
>
> ...
> > + /*
> > + * A wraparound time is calculated early.
> > + */
> > + its.it_interval.tv_sec = rapl_joule_counter_range;
>
> Would this be o.K.?
>
> + its.it_interval.tv_sec = rapl_joule_counter_range / 2;
>
This should be okay. I've checked the defination of TDP, and
on a wiki page it has mentioned that[1]:
"Some sources state that the peak power for a microprocessor
is usually 1.5 times the TDP rating"
although the defination of TDP varies, using 2 * TDP should
be safe.
> > + its.it_interval.tv_nsec = 0;
>
> The way it was sent, this patch set does not work.
> It still overflows.
>
> Example, sample time calculated to ensure overflow:
>
> Busy% Bzy_MHz IRQ PkgTmp PkgWatt GFXWatt
> 100.00 3500 3592125 80 9.72 0.12
> 100.00 3500 3587391 79 9.77 0.12
>
> Actual package watts was around 65.
>
> However, if this additional patch is applied (I only fixed one of them):
>
> doug@s18:~/temp-k-git/linux/tools/power/x86/turbostat$ git diff
> diff --git a/tools/power/x86/turbostat/turbostat.c b/tools/power/x86/turbostat/turbostat.c
> index 29fc4069f467..4d72d9be5209 100644
> --- a/tools/power/x86/turbostat/turbostat.c
> +++ b/tools/power/x86/turbostat/turbostat.c
> @@ -1350,7 +1350,8 @@ delta_package(struct pkg_data *new, struct pkg_data *old)
>
> old->gfx_mhz = new->gfx_mhz;
>
> - DELTA_WRAP32(new->energy_pkg, old->energy_pkg);
> +/* DELTA_WRAP32(new->energy_pkg, old->energy_pkg); */
> + old->energy_pkg = new->energy_pkg - old->energy_pkg;
> DELTA_WRAP32(new->energy_cores, old->energy_cores);
> DELTA_WRAP32(new->energy_gfx, old->energy_gfx);
> DELTA_WRAP32(new->energy_dram, old->energy_dram);
>
> Then it seems to work.
>
Nice catch, I did not realize that the energy_pkg field has
already been converted into accumuted variable which does not
need to consider the wrapping(64bit should be long enough for
normal test cases).

Thanks,
Chenyu
> Example:
>
> doug@s15:~/temp-turbostat$ sudo ./turbostat --Summary --show Busy%,Bzy_MHz,PkgTmp,PkgWatt,GFXWatt,IRQ --interval 1200
> ...
> RAPL: 690 sec. Joule Counter Range, at 95 Watts
> ...
> Busy% Bzy_MHz IRQ PkgTmp PkgWatt GFXWatt
> 100.00 3500 3592328 80 64.32 0.12
> 100.00 3500 3595195 79 64.37 0.12
>
> ... Doug
>
>

2020-04-16 20:50:17

by Doug Smythies

[permalink] [raw]
Subject: RE: [PATCH 2/3][v2] tools/power turbostat: Introduce functions to accumulate RAPL consumption

On 2020.04.10:06 Chen Yu wrote:
> On Wed, Apr 15, 2020 at 09:03:34PM -0700, Doug Smythies wrote:
>> On 2020.04.15 05:57 Chen Yu wrote:
>
>>> + /*
>>> + * A wraparound time is calculated early.
>>> + */
>>> + its.it_interval.tv_sec = rapl_joule_counter_range;
>>
>> Would this be o.K.?
>>
>> + its.it_interval.tv_sec = rapl_joule_counter_range / 2;
>>
> This should be okay. I've checked the defination of TDP, and
> on a wiki page it has mentioned that[1]:
> "Some sources state that the peak power for a microprocessor
> is usually 1.5 times the TDP rating"
> although the defination of TDP varies, using 2 * TDP should
> be safe.

O.K. Great.
By the way, I have already tested it (in addition to the previously e-mailed patch correction):

First, with this:

its.it_interval.tv_sec = rapl_joule_counter_range;

Result:

sudo ./turbostat --Summary --interval 3200 --show Avg_MHz,Busy%,Bzy_MHz,IRQ,PkgTmp,PkgWatt,GFXWatt
...
RAPL: 2759 sec. Joule Counter Range, at 95 Watts
...
cpu0: MSR_PKG_POWER_INFO: 0x000002f8 (95 W TDP, RAPL 0 - 0 W, 0.000000 sec.)
cpu0: MSR_PKG_POWER_LIMIT: 0x4283e800dd8320 (UNlocked)
cpu0: PKG Limit #1: ENabled (100.000000 Watts, 28.000000 sec, clamp ENabled)
cpu0: PKG Limit #2: ENabled (125.000000 Watts, 0.002441* sec, clamp DISabled)
...
Avg_MHz Busy% Bzy_MHz IRQ PkgTmp PkgWatt GFXWatt
4039 100.20 4031 7211202 64 18.29 0.00
4033 100.22 4024 7254993 66 18.00 0.00

actual (using a shorter interval, that doesn't wrap around):

Avg_MHz Busy% Bzy_MHz IRQ PkgTmp PkgWatt GFXWatt
4032 100.22 4023 676360 65 99.92 0.00
4029 100.23 4019 676629 65 99.91 0.00
4032 100.22 4023 676771 65 99.91 0.00
4037 100.22 4028 675430 65 99.91 0.00
4032 100.22 4023 675819 65 99.91 0.00
4028 100.23 4019 676541 65 99.91 0.00
4042 100.22 4033 675857 64 99.91 0.00
4029 100.23 4020 675597 65 99.91 0.00
4027 100.23 4017 676201 65 3751748943144.71 0.00
4034 100.22 4025 676402 65 99.91 0.00
4035 100.22 4026 674982 65 99.91 0.00
4032 100.22 4023 676012 64 99.91 0.00
4034 100.22 4025 723696 66 99.91 0.00
4039 100.22 4030 676342 64 99.91 0.00
4028 100.23 4018 676082 65 99.91 0.00
4028 100.23 4019 676218 65 99.91 0.00
4038 100.22 4030 675771 65 99.91 0.00
4031 100.22 4022 674282 65 3751749380702.93 0.00
4031 100.22 4022 676314 65 99.91 0.00
4039 100.22 4030 676197 65 99.91 0.00

And now with this:

its.it_interval.tv_sec = rapl_joule_counter_range / 2;

Avg_MHz Busy% Bzy_MHz IRQ PkgTmp PkgWatt GFXWatt
4032 100.22 4023 7205931 65 99.91 0.00
4033 100.22 4023 7208003 65 99.91 0.00
4034 100.22 4024 7205563 65 99.91 0.00

and using the shorter interval:

Avg_MHz Busy% Bzy_MHz IRQ PkgTmp PkgWatt GFXWatt
4028 100.23 4019 676147 64 99.92 0.00
4027 100.23 4017 675857 65 99.91 0.00
4036 100.22 4027 675736 65 99.91 0.00
4032 100.22 4023 674758 65 99.91 0.00
4032 100.22 4022 675692 65 99.91 0.00
4032 100.22 4023 676275 65 99.91 0.00
4043 100.22 4035 676001 66 99.91 0.00
4028 100.23 4019 676277 65 99.91 0.00
4028 100.23 4019 676420 65 99.91 0.00
4028 100.23 4019 675884 65 99.91 0.00
4037 100.22 4028 675293 65 99.91 0.00
4030 100.23 4021 674025 66 99.91 0.00
4031 100.22 4022 676462 65 99.91 0.00
4032 100.22 4023 676007 66 99.91 0.00
4047 100.21 4038 676424 65 99.91 0.00
4030 100.22 4021 676853 65 99.91 0.00
4028 100.23 4019 676553 65 99.91 0.00
4034 100.22 4025 675880 65 99.91 0.00
4036 100.22 4027 674824 65 99.91 0.00
4033 100.22 4024 674577 65 99.91 0.00
4031 100.22 4022 676599 65 99.91 0.00
4041 100.22 4032 676675 66 99.91 0.00

Note that it is very much on purpose that I have set
TDP to 100 watts on this processor, whereas the
default is 95 watts. Notice the package temperature,
after several hours of running power throttled to
a CPU frequency of 4.03 GHz.

... Doug


2020-04-17 04:24:48

by Chen Yu

[permalink] [raw]
Subject: Re: [PATCH 2/3][v2] tools/power turbostat: Introduce functions to accumulate RAPL consumption

On Thu, Apr 16, 2020 at 01:40:09PM -0700, Doug Smythies wrote:
> On 2020.04.10:06 Chen Yu wrote:
> > On Wed, Apr 15, 2020 at 09:03:34PM -0700, Doug Smythies wrote:
> >> On 2020.04.15 05:57 Chen Yu wrote:
> >
> >>> + /*
> >>> + * A wraparound time is calculated early.
> >>> + */
> >>> + its.it_interval.tv_sec = rapl_joule_counter_range;
> >>
> >> Would this be o.K.?
> >>
> >> + its.it_interval.tv_sec = rapl_joule_counter_range / 2;
> >>
> > This should be okay. I've checked the defination of TDP, and
> > on a wiki page it has mentioned that[1]:
> > "Some sources state that the peak power for a microprocessor
> > is usually 1.5 times the TDP rating"
> > although the defination of TDP varies, using 2 * TDP should
> > be safe.
>
> O.K. Great.
> By the way, I have already tested it (in addition to the previously e-mailed patch correction):
>
> First, with this:
>
> its.it_interval.tv_sec = rapl_joule_counter_range;
>
> Result:
>
> sudo ./turbostat --Summary --interval 3200 --show Avg_MHz,Busy%,Bzy_MHz,IRQ,PkgTmp,PkgWatt,GFXWatt
> ...
> RAPL: 2759 sec. Joule Counter Range, at 95 Watts
> ...
> cpu0: MSR_PKG_POWER_INFO: 0x000002f8 (95 W TDP, RAPL 0 - 0 W, 0.000000 sec.)
> cpu0: MSR_PKG_POWER_LIMIT: 0x4283e800dd8320 (UNlocked)
> cpu0: PKG Limit #1: ENabled (100.000000 Watts, 28.000000 sec, clamp ENabled)
> cpu0: PKG Limit #2: ENabled (125.000000 Watts, 0.002441* sec, clamp DISabled)
> ...
> Avg_MHz Busy% Bzy_MHz IRQ PkgTmp PkgWatt GFXWatt
> 4039 100.20 4031 7211202 64 18.29 0.00
> 4033 100.22 4024 7254993 66 18.00 0.00
>
> actual (using a shorter interval, that doesn't wrap around):
>
> Avg_MHz Busy% Bzy_MHz IRQ PkgTmp PkgWatt GFXWatt
> 4032 100.22 4023 676360 65 99.92 0.00
> 4029 100.23 4019 676629 65 99.91 0.00
> 4032 100.22 4023 676771 65 99.91 0.00
> 4037 100.22 4028 675430 65 99.91 0.00
> 4032 100.22 4023 675819 65 99.91 0.00
> 4028 100.23 4019 676541 65 99.91 0.00
> 4042 100.22 4033 675857 64 99.91 0.00
> 4029 100.23 4020 675597 65 99.91 0.00
> 4027 100.23 4017 676201 65 3751748943144.71 0.00
> 4034 100.22 4025 676402 65 99.91 0.00
> 4035 100.22 4026 674982 65 99.91 0.00
> 4032 100.22 4023 676012 64 99.91 0.00
> 4034 100.22 4025 723696 66 99.91 0.00
> 4039 100.22 4030 676342 64 99.91 0.00
> 4028 100.23 4018 676082 65 99.91 0.00
> 4028 100.23 4019 676218 65 99.91 0.00
> 4038 100.22 4030 675771 65 99.91 0.00
> 4031 100.22 4022 674282 65 3751749380702.93 0.00
> 4031 100.22 4022 676314 65 99.91 0.00
> 4039 100.22 4030 676197 65 99.91 0.00
>
> And now with this:
>
> its.it_interval.tv_sec = rapl_joule_counter_range / 2;
>
> Avg_MHz Busy% Bzy_MHz IRQ PkgTmp PkgWatt GFXWatt
> 4032 100.22 4023 7205931 65 99.91 0.00
> 4033 100.22 4023 7208003 65 99.91 0.00
> 4034 100.22 4024 7205563 65 99.91 0.00
>
> and using the shorter interval:
>
> Avg_MHz Busy% Bzy_MHz IRQ PkgTmp PkgWatt GFXWatt
> 4028 100.23 4019 676147 64 99.92 0.00
> 4027 100.23 4017 675857 65 99.91 0.00
> 4036 100.22 4027 675736 65 99.91 0.00
> 4032 100.22 4023 674758 65 99.91 0.00
> 4032 100.22 4022 675692 65 99.91 0.00
> 4032 100.22 4023 676275 65 99.91 0.00
> 4043 100.22 4035 676001 66 99.91 0.00
> 4028 100.23 4019 676277 65 99.91 0.00
> 4028 100.23 4019 676420 65 99.91 0.00
> 4028 100.23 4019 675884 65 99.91 0.00
> 4037 100.22 4028 675293 65 99.91 0.00
> 4030 100.23 4021 674025 66 99.91 0.00
> 4031 100.22 4022 676462 65 99.91 0.00
> 4032 100.22 4023 676007 66 99.91 0.00
> 4047 100.21 4038 676424 65 99.91 0.00
> 4030 100.22 4021 676853 65 99.91 0.00
> 4028 100.23 4019 676553 65 99.91 0.00
> 4034 100.22 4025 675880 65 99.91 0.00
> 4036 100.22 4027 674824 65 99.91 0.00
> 4033 100.22 4024 674577 65 99.91 0.00
> 4031 100.22 4022 676599 65 99.91 0.00
> 4041 100.22 4032 676675 66 99.91 0.00
>
> Note that it is very much on purpose that I have set
> TDP to 100 watts on this processor, whereas the
> default is 95 watts. Notice the package temperature,
> after several hours of running power throttled to
> a CPU frequency of 4.03 GHz.
>
Do you mind if I add Reviewed-by and Tested-by tag from you
in next version?

Thanks,
Chenyu
> ... Doug
>
>