2020-06-01 23:03:06

by Moger, Babu

Subject: [PATCH] x86/resctrl: Fix memory bandwidth counter width for AMD

Memory bandwidth is calculated by reading the monitoring counter
at two intervals and calculating the delta. It is the software’s
responsibility to read the count often enough to avoid having
the count roll over _twice_ between reads.

The current code hardcodes the bandwidth monitoring counter's width
to 24 bits for AMD, because the default base counter width is 24.
Currently, AMD does not implement CPUID 0xF.[ECX=1]:EAX to adjust
the counter width, but the AMD hardware supports a much wider
bandwidth counter, with a default width of 44 bits.

The kernel reads these monitoring counters every second and adjusts
the counter value for overflow. With 24 bits and a scale value of 64
for AMD, it can only measure up to 1GB/s without overflowing. For
rates above 1GB/s, it fails to measure the bandwidth.

Fix the issue by setting the default width to 44 bits, adjusting the
offset accordingly.

Future AMD products will implement CPUID 0xF.[ECX=1]:EAX.

Signed-off-by: Babu Moger <[email protected]>
---
- Sending it a second time. The email client had some issues the first time.
- Generated the patch on top of
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git (x86/cache).

arch/x86/kernel/cpu/resctrl/core.c | 8 +++++++-
arch/x86/kernel/cpu/resctrl/internal.h | 1 +
2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 12f967c6b603..6040e9ae541b 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -983,7 +983,13 @@ void resctrl_cpu_detect(struct cpuinfo_x86 *c)
 		c->x86_cache_occ_scale = ebx;
 		if (c->x86_vendor == X86_VENDOR_INTEL)
 			c->x86_cache_mbm_width_offset = eax & 0xff;
-		else
+		else if (c->x86_vendor == X86_VENDOR_AMD) {
+			if (eax)
+				c->x86_cache_mbm_width_offset = eax & 0xff;
+			else
+				c->x86_cache_mbm_width_offset =
+					MBM_CNTR_WIDTH_OFFSET_AMD;
+		} else
 			c->x86_cache_mbm_width_offset = -1;
 	}
 }
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index f20a47d120b1..5ffa32256b3b 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -37,6 +37,7 @@
#define MBA_IS_LINEAR 0x4
#define MBA_MAX_MBPS U32_MAX
#define MAX_MBA_BW_AMD 0x800
+#define MBM_CNTR_WIDTH_OFFSET_AMD 20

#define RMID_VAL_ERROR BIT_ULL(63)
#define RMID_VAL_UNAVAIL BIT_ULL(62)
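
The overflow arithmetic in the commit message can be illustrated with a small userspace sketch. This is not the kernel code: `counter_delta()` and `max_bytes_per_interval()` are hypothetical names, and the shift-based subtraction is one assumed way to make a narrow counter's delta survive a single rollover.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Delta between two reads of a counter that is only `width` bits wide.
 * Shifting both values to the top of a 64-bit word makes the unsigned
 * subtraction wrap at 2^width, so one rollover between reads is handled.
 * A second rollover in the same interval cannot be detected.
 */
static uint64_t counter_delta(uint64_t prev, uint64_t cur, unsigned int width)
{
	unsigned int shift;

	assert(width >= 1 && width <= 64);
	shift = 64 - width;
	return ((cur << shift) - (prev << shift)) >> shift;
}

/*
 * Largest byte count one read interval can represent before the counter
 * wraps: 2^width chunks times `scale` bytes per chunk.
 */
static uint64_t max_bytes_per_interval(unsigned int width, uint64_t scale)
{
	assert(width < 64);
	return (UINT64_C(1) << width) * scale;
}
```

With a scale of 64 bytes per chunk (the AMD value quoted above), a 24-bit counter covers 2^24 * 64 = 2^30 bytes per one-second interval, which is the 1GB/s limit from the commit message. A 44-bit counter raises that to 2^50 bytes per interval.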


2020-06-01 23:27:49

by Fenghua Yu

Subject: Re: [PATCH] x86/resctrl: Fix memory bandwidth counter width for AMD

On Mon, Jun 01, 2020 at 06:00:29PM -0500, Babu Moger wrote:
> Memory bandwidth is calculated reading the monitoring counter
> at two intervals and calculating the delta. It is the software’s
> responsibility to read the count often enough to avoid having
> the count roll over _twice_ between reads.
>
> The current code hardcodes the bandwidth monitoring counter's width
> to 24 bits for AMD. This is due to default base counter width which
> is 24. Currently, AMD does not implement the CPUID 0xF.[ECX=1]:EAX
> to adjust the counter width. But, the AMD hardware supports much
> wider bandwidth counter with the default width of 44 bits.
>
> Kernel reads these monitoring counters every 1 second and adjusts the
> counter value for overflow. With 24 bits and scale value of 64 for AMD,
> it can only measure up to 1GB/s without overflowing. For the rates
> above 1GB/s this will fail to measure the bandwidth.
>
> Fix the issue setting the default width to 44 bits by adjusting the
> offset.
>
> AMD future products will implement the CPUID 0xF.[ECX=1]:EAX.
>
> Signed-off-by: Babu Moger <[email protected]>
> ---
> - Sending it second time. Email client had some issues first time.
> - Generated the patch on top of
> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git (x86/cache).
>
> arch/x86/kernel/cpu/resctrl/core.c | 8 +++++++-
> arch/x86/kernel/cpu/resctrl/internal.h | 1 +
> 2 files changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
> index 12f967c6b603..6040e9ae541b 100644
> --- a/arch/x86/kernel/cpu/resctrl/core.c
> +++ b/arch/x86/kernel/cpu/resctrl/core.c
> @@ -983,7 +983,13 @@ void resctrl_cpu_detect(struct cpuinfo_x86 *c)
> c->x86_cache_occ_scale = ebx;
> if (c->x86_vendor == X86_VENDOR_INTEL)
> c->x86_cache_mbm_width_offset = eax & 0xff;
> - else
> + else if (c->x86_vendor == X86_VENDOR_AMD) {
> + if (eax)
> + c->x86_cache_mbm_width_offset = eax & 0xff;

When AMD implements CPUID.0xF.1:EAX, will the offset be based on 24 or 44?
It seems to make sense for it to be based on 44, because the default
counter width is 44.

> + else
> + c->x86_cache_mbm_width_offset =
> + MBM_CNTR_WIDTH_OFFSET_AMD;

If that's the case, you don't need this "else" because the CPUID reports
offset as 0 for default width 44.

This will match the Intel code above.

Otherwise, the code is awkward.

Thanks.

-Fenghua

2020-06-02 14:33:28

by Moger, Babu

Subject: RE: [PATCH] x86/resctrl: Fix memory bandwidth counter width for AMD

Hi Fenghua,

> -----Original Message-----
> From: Fenghua Yu <[email protected]>
> Sent: Monday, June 1, 2020 6:23 PM
> To: Moger, Babu <[email protected]>
> Cc: [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected]; [email protected]
> Subject: Re: [PATCH] x86/resctrl: Fix memory bandwidth counter width for AMD
>
> On Mon, Jun 01, 2020 at 06:00:29PM -0500, Babu Moger wrote:
> > Memory bandwidth is calculated reading the monitoring counter
> > at two intervals and calculating the delta. It is the software’s
> > responsibility to read the count often enough to avoid having
> > the count roll over _twice_ between reads.
> >
> > The current code hardcodes the bandwidth monitoring counter's width
> > to 24 bits for AMD. This is due to default base counter width which
> > is 24. Currently, AMD does not implement the CPUID 0xF.[ECX=1]:EAX
> > to adjust the counter width. But, the AMD hardware supports much
> > wider bandwidth counter with the default width of 44 bits.
> >
> > Kernel reads these monitoring counters every 1 second and adjusts the
> > counter value for overflow. With 24 bits and scale value of 64 for AMD,
> > it can only measure up to 1GB/s without overflowing. For the rates
> > above 1GB/s this will fail to measure the bandwidth.
> >
> > Fix the issue setting the default width to 44 bits by adjusting the
> > offset.
> >
> > AMD future products will implement the CPUID 0xF.[ECX=1]:EAX.
> >
> > Signed-off-by: Babu Moger <[email protected]>
> > ---
> > - Sending it second time. Email client had some issues first time.
> > - Generated the patch on top of
> > git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git (x86/cache).
> >
> > arch/x86/kernel/cpu/resctrl/core.c | 8 +++++++-
> > arch/x86/kernel/cpu/resctrl/internal.h | 1 +
> > 2 files changed, 8 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/kernel/cpu/resctrl/core.c
> b/arch/x86/kernel/cpu/resctrl/core.c
> > index 12f967c6b603..6040e9ae541b 100644
> > --- a/arch/x86/kernel/cpu/resctrl/core.c
> > +++ b/arch/x86/kernel/cpu/resctrl/core.c
> > @@ -983,7 +983,13 @@ void resctrl_cpu_detect(struct cpuinfo_x86 *c)
> > c->x86_cache_occ_scale = ebx;
> > if (c->x86_vendor == X86_VENDOR_INTEL)
> > c->x86_cache_mbm_width_offset = eax & 0xff;
> > - else
> > + else if (c->x86_vendor == X86_VENDOR_AMD) {
> > + if (eax)
> > + c->x86_cache_mbm_width_offset = eax & 0xff;
>
> When AMD implements CPUID.0x1f.1:eax, will the offset be based on 24 or 44?
> Seems it makes senses to be based on 44 because default counter width is 44.

It will be based on 24, just like Intel. So the width will be 24 + offset.

>
> > + else
> > + c->x86_cache_mbm_width_offset =
> > + MBM_CNTR_WIDTH_OFFSET_AMD;
>
> If that's the case, you don't need this "else" because the CPUID reports
> offset as 0 for default width 44.
>
> This will match the Intel code above.
>
> Otherwise, the code is awkward.

Yes, it is a bit awkward. The other way is to add a check in
rdt_get_mon_l3_config(). I thought this way was better.
Thanks
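
The relationship described in this reply can be written out as a short sketch. The constant names mirror those used in the patch discussion, but their definitions here (base of 24, AMD fallback offset of 20) are assumptions taken from the thread, and `mbm_width_from_offset()` is a hypothetical helper.

```c
#include <assert.h>

/* Assumed from the thread: widths are expressed as an offset from 24 bits. */
#define MBM_CNTR_WIDTH_BASE		24
/* Fallback offset proposed in the patch for AMD parts that report 0. */
#define MBM_CNTR_WIDTH_OFFSET_AMD	20

/* The fallback has to land on the 44-bit width the hardware provides. */
static_assert(MBM_CNTR_WIDTH_BASE + MBM_CNTR_WIDTH_OFFSET_AMD == 44,
	      "AMD fallback offset must yield a 44-bit counter");

/* Effective counter width in bits for a CPUID-reported offset. */
static unsigned int mbm_width_from_offset(unsigned int offset)
{
	return MBM_CNTR_WIDTH_BASE + offset;
}
```

Because the offset is relative to 24 rather than 44, an AMD part that reports 0 in EAX would be treated as having a 24-bit counter unless the fallback is applied.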

2020-06-02 17:16:16

by Reinette Chatre

Subject: Re: [PATCH] x86/resctrl: Fix memory bandwidth counter width for AMD

Hi Babu,

On 6/1/2020 4:00 PM, Babu Moger wrote:
> Memory bandwidth is calculated reading the monitoring counter
> at two intervals and calculating the delta. It is the software’s
> responsibility to read the count often enough to avoid having
> the count roll over _twice_ between reads.
>
> The current code hardcodes the bandwidth monitoring counter's width
> to 24 bits for AMD. This is due to default base counter width which
> is 24. Currently, AMD does not implement the CPUID 0xF.[ECX=1]:EAX
> to adjust the counter width. But, the AMD hardware supports much
> wider bandwidth counter with the default width of 44 bits.
>
> Kernel reads these monitoring counters every 1 second and adjusts the
> counter value for overflow. With 24 bits and scale value of 64 for AMD,
> it can only measure up to 1GB/s without overflowing. For the rates
> above 1GB/s this will fail to measure the bandwidth.
>
> Fix the issue setting the default width to 44 bits by adjusting the
> offset.
>
> AMD future products will implement the CPUID 0xF.[ECX=1]:EAX.
>
> Signed-off-by: Babu Moger <[email protected]>

There is no fixes tag but if I understand correctly this issue has been
present since AMD support was added to resctrl. This fix builds on top
of a recent feature addition and would thus not work for earlier
kernels. Are you planning to create a different fix for earlier kernels?

Reinette

2020-06-02 17:36:12

by Moger, Babu

Subject: Re: [PATCH] x86/resctrl: Fix memory bandwidth counter width for AMD



On 6/2/20 12:13 PM, Reinette Chatre wrote:
> Hi Babu,
>
> On 6/1/2020 4:00 PM, Babu Moger wrote:
>> Memory bandwidth is calculated reading the monitoring counter
>> at two intervals and calculating the delta. It is the software’s
>> responsibility to read the count often enough to avoid having
>> the count roll over _twice_ between reads.
>>
>> The current code hardcodes the bandwidth monitoring counter's width
>> to 24 bits for AMD. This is due to default base counter width which
>> is 24. Currently, AMD does not implement the CPUID 0xF.[ECX=1]:EAX
>> to adjust the counter width. But, the AMD hardware supports much
>> wider bandwidth counter with the default width of 44 bits.
>>
>> Kernel reads these monitoring counters every 1 second and adjusts the
>> counter value for overflow. With 24 bits and scale value of 64 for AMD,
>> it can only measure up to 1GB/s without overflowing. For the rates
>> above 1GB/s this will fail to measure the bandwidth.
>>
>> Fix the issue setting the default width to 44 bits by adjusting the
>> offset.
>>
>> AMD future products will implement the CPUID 0xF.[ECX=1]:EAX.
>>
>> Signed-off-by: Babu Moger <[email protected]>
>
> There is no fixes tag but if I understand correctly this issue has been
> present since AMD support was added to resctrl. This fix builds on top
> of a recent feature addition and would thus not work for earlier
> kernels. Are you planning to create a different fix for earlier kernels?

Yes. This has been there from day one. I am going to backport it to older
kernels once we arrive at the final patch. Do we need a Fixes tag here?

2020-06-02 21:56:06

by Reinette Chatre

Subject: Re: [PATCH] x86/resctrl: Fix memory bandwidth counter width for AMD

Hi Babu,

On 6/1/2020 4:00 PM, Babu Moger wrote:
> Memory bandwidth is calculated reading the monitoring counter
> at two intervals and calculating the delta. It is the software’s
> responsibility to read the count often enough to avoid having
> the count roll over _twice_ between reads.
>
> The current code hardcodes the bandwidth monitoring counter's width
> to 24 bits for AMD. This is due to default base counter width which
> is 24. Currently, AMD does not implement the CPUID 0xF.[ECX=1]:EAX
> to adjust the counter width. But, the AMD hardware supports much
> wider bandwidth counter with the default width of 44 bits.
>
> Kernel reads these monitoring counters every 1 second and adjusts the
> counter value for overflow. With 24 bits and scale value of 64 for AMD,
> it can only measure up to 1GB/s without overflowing. For the rates
> above 1GB/s this will fail to measure the bandwidth.
>
> Fix the issue setting the default width to 44 bits by adjusting the
> offset.
>
> AMD future products will implement the CPUID 0xF.[ECX=1]:EAX.
>
> Signed-off-by: Babu Moger <[email protected]>
> ---
> - Sending it second time. Email client had some issues first time.
> - Generated the patch on top of
> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git (x86/cache).
>
> arch/x86/kernel/cpu/resctrl/core.c | 8 +++++++-
> arch/x86/kernel/cpu/resctrl/internal.h | 1 +
> 2 files changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
> index 12f967c6b603..6040e9ae541b 100644
> --- a/arch/x86/kernel/cpu/resctrl/core.c
> +++ b/arch/x86/kernel/cpu/resctrl/core.c
> @@ -983,7 +983,13 @@ void resctrl_cpu_detect(struct cpuinfo_x86 *c)
> c->x86_cache_occ_scale = ebx;
> if (c->x86_vendor == X86_VENDOR_INTEL)
> c->x86_cache_mbm_width_offset = eax & 0xff;
> - else
> + else if (c->x86_vendor == X86_VENDOR_AMD) {
> + if (eax)

This test checks if _any_ bit is set in eax ...

> + c->x86_cache_mbm_width_offset = eax & 0xff;

... with the assumption that the first eight bits contain a value.

Even so, now that Intel and AMD will be using eax in the same way,
perhaps it can be done simpler by always using eax to obtain the offset
(and thus avoid the code duplication) and on AMD initialize the default
if it cannot be obtained from eax?

What I mean is something like:

@@ -1024,10 +1024,12 @@ void resctrl_cpu_detect(struct cpuinfo_x86 *c)

 		c->x86_cache_max_rmid = ecx;
 		c->x86_cache_occ_scale = ebx;
-		if (c->x86_vendor == X86_VENDOR_INTEL)
-			c->x86_cache_mbm_width_offset = eax & 0xff;
-		else
-			c->x86_cache_mbm_width_offset = -1;
+		c->x86_cache_mbm_width_offset = eax & 0xff;
+		if (c->x86_vendor == X86_VENDOR_AMD &&
+		    c->x86_cache_mbm_width_offset == 0) {
+			c->x86_cache_mbm_width_offset =
+				MBM_CNTR_WIDTH_OFFSET_AMD;
+		}
 	}
 }

What do you think?

Reinette
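
The difference between the two checks discussed in this message can be shown with a standalone sketch. It is a userspace model only: `enum vendor`, `offset_whole_register()`, and `offset_masked_field()` are hypothetical stand-ins for the kernel's vendor field and the two variants of the assignment, and the constant value is assumed from the patch.

```c
#include <assert.h>
#include <stdint.h>

enum vendor { VENDOR_INTEL, VENDOR_AMD };	/* hypothetical stand-ins */

#define MBM_CNTR_WIDTH_OFFSET_AMD 20		/* value assumed from the patch */

static_assert(MBM_CNTR_WIDTH_OFFSET_AMD <= 0xff,
	      "fallback must fit the 8-bit offset field");

/* Posted variant: falls back only when the whole of EAX is zero. */
static int offset_whole_register(enum vendor v, uint32_t eax)
{
	if (v == VENDOR_INTEL)
		return (int)(eax & 0xff);
	if (v == VENDOR_AMD)
		return eax ? (int)(eax & 0xff) : MBM_CNTR_WIDTH_OFFSET_AMD;
	return -1;
}

/* Suggested variant: extract the field first, then test the field itself. */
static int offset_masked_field(enum vendor v, uint32_t eax)
{
	int offset = (int)(eax & 0xff);

	if (v == VENDOR_AMD && offset == 0)
		offset = MBM_CNTR_WIDTH_OFFSET_AMD;
	return offset;
}
```

The two variants agree whenever the upper 24 bits of EAX are clear. They diverge for a value such as 0x100, where the low byte is zero but another bit is set: the first returns an offset of 0 (a 24-bit counter), while the second still applies the AMD fallback.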

2020-06-02 22:02:48

by Reinette Chatre

Subject: Re: [PATCH] x86/resctrl: Fix memory bandwidth counter width for AMD

Hi Babu,

On 6/2/2020 10:33 AM, Babu Moger wrote:
>
>
> On 6/2/20 12:13 PM, Reinette Chatre wrote:
>> On 6/1/2020 4:00 PM, Babu Moger wrote:
>>> Memory bandwidth is calculated reading the monitoring counter
>>> at two intervals and calculating the delta. It is the software’s
>>> responsibility to read the count often enough to avoid having
>>> the count roll over _twice_ between reads.
>>>
>>> The current code hardcodes the bandwidth monitoring counter's width
>>> to 24 bits for AMD. This is due to default base counter width which
>>> is 24. Currently, AMD does not implement the CPUID 0xF.[ECX=1]:EAX
>>> to adjust the counter width. But, the AMD hardware supports much
>>> wider bandwidth counter with the default width of 44 bits.
>>>
>>> Kernel reads these monitoring counters every 1 second and adjusts the
>>> counter value for overflow. With 24 bits and scale value of 64 for AMD,
>>> it can only measure up to 1GB/s without overflowing. For the rates
>>> above 1GB/s this will fail to measure the bandwidth.
>>>
>>> Fix the issue setting the default width to 44 bits by adjusting the
>>> offset.
>>>
>>> AMD future products will implement the CPUID 0xF.[ECX=1]:EAX.
>>>
>>> Signed-off-by: Babu Moger <[email protected]>
>>
>> There is no fixes tag but if I understand correctly this issue has been
>> present since AMD support was added to resctrl. This fix builds on top
>> of a recent feature addition and would thus not work for earlier
>> kernels. Are you planning to create a different fix for earlier kernels?
>
> Yes. This was there from day one. I am going to back port to older kernels
> once we arrive on the final patch. Do we need fixes tag here?
>

Yes, this needs a fixes tag. This would help the teams understand which
kernels should be fixed.

Reinette

2020-06-02 22:16:39

by Moger, Babu

Subject: RE: [PATCH] x86/resctrl: Fix memory bandwidth counter width for AMD



> -----Original Message-----
> From: Reinette Chatre <[email protected]>
> Sent: Tuesday, June 2, 2020 4:51 PM
> To: Moger, Babu <[email protected]>; [email protected];
> [email protected]; [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]
> Subject: Re: [PATCH] x86/resctrl: Fix memory bandwidth counter width for AMD
>
> Hi Babu,
>
> On 6/1/2020 4:00 PM, Babu Moger wrote:
> > Memory bandwidth is calculated reading the monitoring counter
> > at two intervals and calculating the delta. It is the software’s
> > responsibility to read the count often enough to avoid having
> > the count roll over _twice_ between reads.
> >
> > The current code hardcodes the bandwidth monitoring counter's width
> > to 24 bits for AMD. This is due to default base counter width which
> > is 24. Currently, AMD does not implement the CPUID 0xF.[ECX=1]:EAX
> > to adjust the counter width. But, the AMD hardware supports much
> > wider bandwidth counter with the default width of 44 bits.
> >
> > Kernel reads these monitoring counters every 1 second and adjusts the
> > counter value for overflow. With 24 bits and scale value of 64 for AMD,
> > it can only measure up to 1GB/s without overflowing. For the rates
> > above 1GB/s this will fail to measure the bandwidth.
> >
> > Fix the issue setting the default width to 44 bits by adjusting the
> > offset.
> >
> > AMD future products will implement the CPUID 0xF.[ECX=1]:EAX.
> >
> > Signed-off-by: Babu Moger <[email protected]>
> > ---
> > - Sending it second time. Email client had some issues first time.
> > - Generated the patch on top of
> > git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git (x86/cache).
> >
> > arch/x86/kernel/cpu/resctrl/core.c | 8 +++++++-
> > arch/x86/kernel/cpu/resctrl/internal.h | 1 +
> > 2 files changed, 8 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/kernel/cpu/resctrl/core.c
> b/arch/x86/kernel/cpu/resctrl/core.c
> > index 12f967c6b603..6040e9ae541b 100644
> > --- a/arch/x86/kernel/cpu/resctrl/core.c
> > +++ b/arch/x86/kernel/cpu/resctrl/core.c
> > @@ -983,7 +983,13 @@ void resctrl_cpu_detect(struct cpuinfo_x86 *c)
> > c->x86_cache_occ_scale = ebx;
> > if (c->x86_vendor == X86_VENDOR_INTEL)
> > c->x86_cache_mbm_width_offset = eax & 0xff;
> > - else
> > + else if (c->x86_vendor == X86_VENDOR_AMD) {
> > + if (eax)
>
> This test checks if _any_ bit is set in eax ...
>
> > + c->x86_cache_mbm_width_offset = eax & 0xff;
>
> ... with the assumption that the first eight bits contain a value.
>
> Even so, now that Intel and AMD will be using eax in the same way,
> perhaps it can be done simpler by always using eax to obtain the offset
> (and thus avoid the code duplication) and on AMD initialize the default
> if it cannot be obtained from eax?
>
> What I mean is something like:
>
> @@ -1024,10 +1024,12 @@ void resctrl_cpu_detect(struct cpuinfo_x86 *c)
>
> c->x86_cache_max_rmid = ecx;
> c->x86_cache_occ_scale = ebx;
> - if (c->x86_vendor == X86_VENDOR_INTEL)
> - c->x86_cache_mbm_width_offset = eax & 0xff;
> - else
> - c->x86_cache_mbm_width_offset = -1;
> + c->x86_cache_mbm_width_offset = eax & 0xff;
> + if (c->x86_vendor == X86_VENDOR_AMD &&
> + c->x86_cache_mbm_width_offset == 0) {
> + c->x86_cache_mbm_width_offset =
> + MBM_CNTR_WIDTH_OFFSET_AMD;
> + }
> }
> }
>
> What do you think?

That looks good. But we still need to keep the default
(c->x86_cache_mbm_width_offset = -1;) for vendors other than AMD and Intel.
How about this?

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 12f967c6b603..7269bd896ba9 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -983,6 +983,9 @@ void resctrl_cpu_detect(struct cpuinfo_x86 *c)
 		c->x86_cache_occ_scale = ebx;
 		if (c->x86_vendor == X86_VENDOR_INTEL)
 			c->x86_cache_mbm_width_offset = eax & 0xff;
+		else if (c->x86_vendor == X86_VENDOR_AMD)
+			c->x86_cache_mbm_width_offset = eax ? eax & 0xff :
+						MBM_CNTR_WIDTH_OFFSET_AMD;
 		else
 			c->x86_cache_mbm_width_offset = -1;
 	}

2020-06-02 23:30:41

by Reinette Chatre

Subject: Re: [PATCH] x86/resctrl: Fix memory bandwidth counter width for AMD

Hi Babu,

On 6/2/2020 3:12 PM, Babu Moger wrote:
>
>
>> -----Original Message-----
>> From: Reinette Chatre <[email protected]>
>> Sent: Tuesday, June 2, 2020 4:51 PM
>> To: Moger, Babu <[email protected]>; [email protected];
>> [email protected]; [email protected]; [email protected]; [email protected];
>> [email protected]; [email protected]
>> Subject: Re: [PATCH] x86/resctrl: Fix memory bandwidth counter width for AMD
>>
>> Hi Babu,
>>
>> On 6/1/2020 4:00 PM, Babu Moger wrote:
>>> Memory bandwidth is calculated reading the monitoring counter
>>> at two intervals and calculating the delta. It is the software’s
>>> responsibility to read the count often enough to avoid having
>>> the count roll over _twice_ between reads.
>>>
>>> The current code hardcodes the bandwidth monitoring counter's width
>>> to 24 bits for AMD. This is due to default base counter width which
>>> is 24. Currently, AMD does not implement the CPUID 0xF.[ECX=1]:EAX
>>> to adjust the counter width. But, the AMD hardware supports much
>>> wider bandwidth counter with the default width of 44 bits.
>>>
>>> Kernel reads these monitoring counters every 1 second and adjusts the
>>> counter value for overflow. With 24 bits and scale value of 64 for AMD,
>>> it can only measure up to 1GB/s without overflowing. For the rates
>>> above 1GB/s this will fail to measure the bandwidth.
>>>
>>> Fix the issue setting the default width to 44 bits by adjusting the
>>> offset.
>>>
>>> AMD future products will implement the CPUID 0xF.[ECX=1]:EAX.
>>>
>>> Signed-off-by: Babu Moger <[email protected]>
>>> ---
>>> - Sending it second time. Email client had some issues first time.
>>> - Generated the patch on top of
>>> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git (x86/cache).
>>>
>>> arch/x86/kernel/cpu/resctrl/core.c | 8 +++++++-
>>> arch/x86/kernel/cpu/resctrl/internal.h | 1 +
>>> 2 files changed, 8 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/arch/x86/kernel/cpu/resctrl/core.c
>> b/arch/x86/kernel/cpu/resctrl/core.c
>>> index 12f967c6b603..6040e9ae541b 100644
>>> --- a/arch/x86/kernel/cpu/resctrl/core.c
>>> +++ b/arch/x86/kernel/cpu/resctrl/core.c
>>> @@ -983,7 +983,13 @@ void resctrl_cpu_detect(struct cpuinfo_x86 *c)
>>> c->x86_cache_occ_scale = ebx;
>>> if (c->x86_vendor == X86_VENDOR_INTEL)
>>> c->x86_cache_mbm_width_offset = eax & 0xff;
>>> - else
>>> + else if (c->x86_vendor == X86_VENDOR_AMD) {
>>> + if (eax)
>>
>> This test checks if _any_ bit is set in eax ...
>>
>>> + c->x86_cache_mbm_width_offset = eax & 0xff;
>>
>> ... with the assumption that the first eight bits contain a value.
>>
>> Even so, now that Intel and AMD will be using eax in the same way,
>> perhaps it can be done simpler by always using eax to obtain the offset
>> (and thus avoid the code duplication) and on AMD initialize the default
>> if it cannot be obtained from eax?
>>
>> What I mean is something like:
>>
>> @@ -1024,10 +1024,12 @@ void resctrl_cpu_detect(struct cpuinfo_x86 *c)
>>
>> c->x86_cache_max_rmid = ecx;
>> c->x86_cache_occ_scale = ebx;
>> - if (c->x86_vendor == X86_VENDOR_INTEL)
>> - c->x86_cache_mbm_width_offset = eax & 0xff;
>> - else
>> - c->x86_cache_mbm_width_offset = -1;
>> + c->x86_cache_mbm_width_offset = eax & 0xff;
>> + if (c->x86_vendor == X86_VENDOR_AMD &&
>> + c->x86_cache_mbm_width_offset == 0) {
>> + c->x86_cache_mbm_width_offset =
>> + MBM_CNTR_WIDTH_OFFSET_AMD;
>> + }
>> }
>> }
>>
>> What do you think?
>
> That looks good. But we still need to keep the
> default(c->x86_cache_mbm_width_offset = -1;) for non-AMD and non-Intel.
> How about this?

This original default of -1 was added to deal with AMD when it was not
known to support eax. Now that AMD's support of eax is captured among
the default code I did not find it necessary to keep that considering
resctrl_cpu_detect() is only called on AMD and Intel.

>
> diff --git a/arch/x86/kernel/cpu/resctrl/core.c
> b/arch/x86/kernel/cpu/resctrl/core.c
> index 12f967c6b603..7269bd896ba9 100644
> --- a/arch/x86/kernel/cpu/resctrl/core.c
> +++ b/arch/x86/kernel/cpu/resctrl/core.c
> @@ -983,6 +983,9 @@ void resctrl_cpu_detect(struct cpuinfo_x86 *c)
> c->x86_cache_occ_scale = ebx;
> if (c->x86_vendor == X86_VENDOR_INTEL)
> c->x86_cache_mbm_width_offset = eax & 0xff;
> + else if (c->x86_vendor == X86_VENDOR_AMD)
> + c->x86_cache_mbm_width_offset = eax ? eax & 0xff :

This has the same concern that I mentioned earlier, where the contents of
the entire register are used to determine whether the first eight bits
contain a value. Did I miss something obvious?

> +
> MBM_CNTR_WIDTH_OFFSET_AMD;
> else
> c->x86_cache_mbm_width_offset = -1;
> }
>

Reinette

2020-06-03 15:06:52

by Moger, Babu

Subject: RE: [PATCH] x86/resctrl: Fix memory bandwidth counter width for AMD

Hi Reinette,

> -----Original Message-----
> From: Reinette Chatre <[email protected]>
> Sent: Tuesday, June 2, 2020 6:28 PM
> To: Moger, Babu <[email protected]>; [email protected];
> [email protected]; [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]
> Subject: Re: [PATCH] x86/resctrl: Fix memory bandwidth counter width for AMD
>
> Hi Babu,
>
> On 6/2/2020 3:12 PM, Babu Moger wrote:
> >
> >
> >> -----Original Message-----
> >> From: Reinette Chatre <[email protected]>
> >> Sent: Tuesday, June 2, 2020 4:51 PM
> >> To: Moger, Babu <[email protected]>; [email protected];
> >> [email protected]; [email protected]; [email protected]; [email protected];
> >> [email protected]; [email protected]
> >> Subject: Re: [PATCH] x86/resctrl: Fix memory bandwidth counter width for
> AMD
> >>
> >> Hi Babu,
> >>
> >> On 6/1/2020 4:00 PM, Babu Moger wrote:
> >>> Memory bandwidth is calculated reading the monitoring counter
> >>> at two intervals and calculating the delta. It is the software’s
> >>> responsibility to read the count often enough to avoid having
> >>> the count roll over _twice_ between reads.
> >>>
> >>> The current code hardcodes the bandwidth monitoring counter's width
> >>> to 24 bits for AMD. This is due to default base counter width which
> >>> is 24. Currently, AMD does not implement the CPUID 0xF.[ECX=1]:EAX
> >>> to adjust the counter width. But, the AMD hardware supports much
> >>> wider bandwidth counter with the default width of 44 bits.
> >>>
> >>> Kernel reads these monitoring counters every 1 second and adjusts the
> >>> counter value for overflow. With 24 bits and scale value of 64 for AMD,
> >>> it can only measure up to 1GB/s without overflowing. For the rates
> >>> above 1GB/s this will fail to measure the bandwidth.
> >>>
> >>> Fix the issue setting the default width to 44 bits by adjusting the
> >>> offset.
> >>>
> >>> AMD future products will implement the CPUID 0xF.[ECX=1]:EAX.
> >>>
> >>> Signed-off-by: Babu Moger <[email protected]>
> >>> ---
> >>> - Sending it second time. Email client had some issues first time.
> >>> - Generated the patch on top of
> >>> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git (x86/cache).
> >>>
> >>> arch/x86/kernel/cpu/resctrl/core.c | 8 +++++++-
> >>> arch/x86/kernel/cpu/resctrl/internal.h | 1 +
> >>> 2 files changed, 8 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/arch/x86/kernel/cpu/resctrl/core.c
> >> b/arch/x86/kernel/cpu/resctrl/core.c
> >>> index 12f967c6b603..6040e9ae541b 100644
> >>> --- a/arch/x86/kernel/cpu/resctrl/core.c
> >>> +++ b/arch/x86/kernel/cpu/resctrl/core.c
> >>> @@ -983,7 +983,13 @@ void resctrl_cpu_detect(struct cpuinfo_x86 *c)
> >>> c->x86_cache_occ_scale = ebx;
> >>> if (c->x86_vendor == X86_VENDOR_INTEL)
> >>> c->x86_cache_mbm_width_offset = eax & 0xff;
> >>> - else
> >>> + else if (c->x86_vendor == X86_VENDOR_AMD) {
> >>> + if (eax)
> >>
> >> This test checks if _any_ bit is set in eax ...
> >>
> >>> + c->x86_cache_mbm_width_offset = eax & 0xff;
> >>
> >> ... with the assumption that the first eight bits contain a value.
> >>
> >> Even so, now that Intel and AMD will be using eax in the same way,
> >> perhaps it can be done simpler by always using eax to obtain the offset
> >> (and thus avoid the code duplication) and on AMD initialize the default
> >> if it cannot be obtained from eax?
> >>
> >> What I mean is something like:
> >>
> >> @@ -1024,10 +1024,12 @@ void resctrl_cpu_detect(struct cpuinfo_x86 *c)
> >>
> >> c->x86_cache_max_rmid = ecx;
> >> c->x86_cache_occ_scale = ebx;
> >> - if (c->x86_vendor == X86_VENDOR_INTEL)
> >> - c->x86_cache_mbm_width_offset = eax & 0xff;
> >> - else
> >> - c->x86_cache_mbm_width_offset = -1;
> >> + c->x86_cache_mbm_width_offset = eax & 0xff;
> >> + if (c->x86_vendor == X86_VENDOR_AMD &&
> >> + c->x86_cache_mbm_width_offset == 0) {
> >> + c->x86_cache_mbm_width_offset =
> >> + MBM_CNTR_WIDTH_OFFSET_AMD;
> >> + }
> >> }
> >> }
> >>
> >> What do you think?
> >
> > That looks good. But we still need to keep the
> > default(c->x86_cache_mbm_width_offset = -1;) for non-AMD and non-Intel.
> > How about this?
>
> This original default of -1 was added to deal with AMD when it was not
> known to support eax. Now that AMD's support of eax is captured among
> the default code I did not find it necessary to keep that considering
> resctrl_cpu_detect() is only called on AMD and Intel.

Ok. Sure. Will re-post with changes.

> > diff --git a/arch/x86/kernel/cpu/resctrl/core.c
> > b/arch/x86/kernel/cpu/resctrl/core.c
> > index 12f967c6b603..7269bd896ba9 100644
> > --- a/arch/x86/kernel/cpu/resctrl/core.c
> > +++ b/arch/x86/kernel/cpu/resctrl/core.c
> > @@ -983,6 +983,9 @@ void resctrl_cpu_detect(struct cpuinfo_x86 *c)
> > c->x86_cache_occ_scale = ebx;
> > if (c->x86_vendor == X86_VENDOR_INTEL)
> > c->x86_cache_mbm_width_offset = eax & 0xff;
> > + else if (c->x86_vendor == X86_VENDOR_AMD)
> > + c->x86_cache_mbm_width_offset = eax ? eax & 0xff :
>
> This has the same concern that I mentioned earlier where the contents of
> the entire register is used to determine if the first eight bits
> contains a value. Did I miss something obvious?

You are right. I will make the change as you suggested. Thanks

>
> > +
> > MBM_CNTR_WIDTH_OFFSET_AMD;
> > else
> > c->x86_cache_mbm_width_offset = -1;
> > }
> >
>
> Reinette