2020-02-14 18:17:59

by James Morse

Subject: [PATCH v2] x86/resctrl: Preserve CDP enable over cpuhp

Resctrl assumes that all CPUs are online when the filesystem is
mounted, and that CPUs remember their CDP-enabled state over CPU
hotplug.

This goes wrong when resctrl's CDP-enabled state changes while all
the CPUs in a domain are offline.

When a domain comes online, enable (or disable!) CDP to match resctrl's
current setting.

Fixes: 5ff193fbde20 ("x86/intel_rdt: Add basic resctrl filesystem support")
Suggested-by: Reinette Chatre <[email protected]>
Signed-off-by: James Morse <[email protected]>

---
Changes since v1:
* Explicitly test for L2/L3 resources to ignore duplicate calls.
* Poke the LxDATA resources to avoid confusing CDP-off with CDP-unsupported.
* Moved code to rdtgroup.c for fewer exported functions.

v1: lore.kernel.org/r/[email protected]
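The two helper changes in v2 can be illustrated with a small user-space model. It assumes, as the patch relies on, that the L3DATA resource is alloc_capable only when the hardware supports CDP, and alloc_enabled only while CDP is on. The hook runs once per resource (L3, L3DATA, L3CODE), but only the L3DATA call writes the bit. Every name here (struct res, l3_cfg_update(), cdp_bit_after_online()) is hypothetical and is not kernel API; this is a sketch, not the resctrl implementation.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical user-space model: names are illustrative and only mirror
 * the shape of resctrl's L3, L3DATA and L3CODE resources.
 */
enum res_id { RES_L3, RES_L3DATA, RES_L3CODE, RES_NUM };

struct res {
	bool alloc_capable;	/* on L3DATA: the hardware supports CDP */
	bool alloc_enabled;	/* on L3DATA: CDP is currently enabled */
};

static struct res resources[RES_NUM];
static bool hw_cdp_bit;		/* stands in for the domain's CDP enable bit */
static int hw_writes;		/* counts writes to that bit */

static void l3_cfg_update(bool *enable)
{
	hw_cdp_bit = *enable;
	hw_writes++;
}

/* Same shape as rdt_domain_reconfigure_cdp(), L3 only. */
static void reconfigure_cdp(struct res *r)
{
	if (!r->alloc_capable)
		return;		/* CDP unsupported: leave the bit alone */
	if (r == &resources[RES_L3DATA])
		l3_cfg_update(&r->alloc_enabled);	/* one write per domain */
}

/*
 * Bring a domain online whose CDP bit holds a stale value, calling the
 * hook once per resource as domain_add_cpu() would. Returns the bit.
 */
static bool cdp_bit_after_online(bool cdp_supported, bool cdp_enabled,
				 bool stale_bit, int *writes)
{
	assert(cdp_supported || !cdp_enabled);	/* can't enable unsupported CDP */

	resources[RES_L3] = (struct res){ true, !cdp_enabled };
	resources[RES_L3DATA] = (struct res){ cdp_supported, cdp_enabled };
	resources[RES_L3CODE] = (struct res){ cdp_supported, cdp_enabled };
	hw_cdp_bit = stale_bit;
	hw_writes = 0;

	for (int i = 0; i < RES_NUM; i++)
		reconfigure_cdp(&resources[i]);

	*writes = hw_writes;
	return hw_cdp_bit;
}
```

With CDP unsupported nothing is written; with CDP supported, a single write restores the bit whether CDP is on or off. Looking only at the base L3 resource could not make that distinction, since it is alloc_enabled in both the CDP-off and CDP-unsupported cases.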
---
arch/x86/kernel/cpu/resctrl/core.c | 2 ++
arch/x86/kernel/cpu/resctrl/internal.h | 1 +
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 18 ++++++++++++++++++
3 files changed, 21 insertions(+)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 89049b343c7a..d8cc5223b7ce 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -578,6 +578,8 @@ static void domain_add_cpu(int cpu, struct rdt_resource *r)
 	d->id = id;
 	cpumask_set_cpu(cpu, &d->cpu_mask);

+	rdt_domain_reconfigure_cdp(r);
+
 	if (r->alloc_capable && domain_setup_ctrlval(r, d)) {
 		kfree(d);
 		return;
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 181c992f448c..3dd13f3a8b23 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -601,5 +601,6 @@ bool has_busy_rmid(struct rdt_resource *r, struct rdt_domain *d);
void __check_limbo(struct rdt_domain *d, bool force_free);
bool cbm_validate_intel(char *buf, u32 *data, struct rdt_resource *r);
bool cbm_validate_amd(char *buf, u32 *data, struct rdt_resource *r);
+void rdt_domain_reconfigure_cdp(struct rdt_resource *r);

#endif /* _ASM_X86_RESCTRL_INTERNAL_H */
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 064e9ef44cd6..5967320a1951 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1831,6 +1831,9 @@ static int set_cache_qos_cfg(int level, bool enable)
 	struct rdt_domain *d;
 	int cpu;

+	/* CDP state is restored during cpuhp, which takes this lock */
+	lockdep_assert_held(&rdtgroup_mutex);
+
 	if (level == RDT_RESOURCE_L3)
 		update = l3_qos_cfg_update;
 	else if (level == RDT_RESOURCE_L2)
@@ -1859,6 +1862,21 @@ static int set_cache_qos_cfg(int level, bool enable)
 	return 0;
 }

+/* Restore the qos cfg state when a package comes online */
+void rdt_domain_reconfigure_cdp(struct rdt_resource *r)
+{
+	lockdep_assert_held(&rdtgroup_mutex);
+
+	if (!r->alloc_capable)
+		return;
+
+	if (r == &rdt_resources_all[RDT_RESOURCE_L2DATA])
+		l2_qos_cfg_update(&r->alloc_enabled);
+
+	if (r == &rdt_resources_all[RDT_RESOURCE_L3DATA])
+		l3_qos_cfg_update(&r->alloc_enabled);
+}
+
 /*
  * Enable or disable the MBA software controller
  * which helps user specify bandwidth in MBps.
--
2.24.1
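The failure described in the commit message can be reproduced in a toy model. It assumes one CDP enable bit per domain and that, like set_cache_qos_cfg(), the mount path can only write domains that still have an online CPU. The names struct domain, mount_set_cdp(), domain_online() and consistent_after_hotplug() are hypothetical; this sketches the behaviour, not the kernel code.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_DOMAINS 2

/* Hypothetical model: one CDP bit per domain, plus resctrl's setting. */
struct domain {
	bool online;
	bool hw_cdp;		/* value the hardware actually uses */
};

static struct domain domains[NR_DOMAINS];
static bool resctrl_cdp;	/* what resctrl believes is configured */

/* Like set_cache_qos_cfg(): only domains with an online CPU are written. */
static void mount_set_cdp(bool enable)
{
	resctrl_cdp = enable;
	for (int i = 0; i < NR_DOMAINS; i++)
		if (domains[i].online)
			domains[i].hw_cdp = enable;
}

static void domain_offline(int id)
{
	assert(id >= 0 && id < NR_DOMAINS);
	domains[id].online = false;
}

/* Like domain_add_cpu(); 'with_fix' models the new reconfigure hook. */
static void domain_online(int id, bool with_fix)
{
	assert(id >= 0 && id < NR_DOMAINS);
	domains[id].online = true;
	if (with_fix)
		domains[id].hw_cdp = resctrl_cdp;
}

/*
 * Take domain 1 offline, enable CDP at mount, bring domain 1 back.
 * Returns true if every domain agrees with resctrl afterwards.
 */
static bool consistent_after_hotplug(bool with_fix)
{
	for (int i = 0; i < NR_DOMAINS; i++)
		domains[i] = (struct domain){ .online = true, .hw_cdp = false };
	resctrl_cdp = false;

	domain_offline(1);
	mount_set_cdp(true);
	domain_online(1, with_fix);

	for (int i = 0; i < NR_DOMAINS; i++)
		if (domains[i].hw_cdp != resctrl_cdp)
			return false;
	return true;
}
```

consistent_after_hotplug(false) reports a mismatch because domain 1 keeps its old bit; passing true models the new call in domain_add_cpu(), and the domains agree again.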


2020-02-14 19:26:27

by Reinette Chatre

Subject: Re: [PATCH v2] x86/resctrl: Preserve CDP enable over cpuhp

Hi James,

On 2/14/2020 10:16 AM, James Morse wrote:
> Resctrl assumes that all CPUs are online when the filesystem is
> mounted, and that CPUs remember their CDP-enabled state over CPU
> hotplug.
>
> This goes wrong when resctrl's CDP-enabled state changes while all
> the CPUs in a domain are offline.
>
> When a domain comes online, enable (or disable!) CDP to match resctrl's
> current setting.
>
> Fixes: 5ff193fbde20 ("x86/intel_rdt: Add basic resctrl filesystem support")
> Suggested-by: Reinette Chatre <[email protected]>
> Signed-off-by: James Morse <[email protected]>
>
> ---
> Changes since v1:
> * Explicitly test for L2/L3 resources to ignore duplicate calls.
> * Poke the LxDATA resources to avoid confusing CDP-off with CDP-unsupported.
> * Moved code to rdtgroup.c for fewer exported functions.
>
> v1: lore.kernel.org/r/[email protected]
> ---
> arch/x86/kernel/cpu/resctrl/core.c | 2 ++
> arch/x86/kernel/cpu/resctrl/internal.h | 1 +
> arch/x86/kernel/cpu/resctrl/rdtgroup.c | 18 ++++++++++++++++++
> 3 files changed, 21 insertions(+)
>
> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
> index 89049b343c7a..d8cc5223b7ce 100644
> --- a/arch/x86/kernel/cpu/resctrl/core.c
> +++ b/arch/x86/kernel/cpu/resctrl/core.c
> @@ -578,6 +578,8 @@ static void domain_add_cpu(int cpu, struct rdt_resource *r)
> d->id = id;
> cpumask_set_cpu(cpu, &d->cpu_mask);
>
> + rdt_domain_reconfigure_cdp(r);
> +
> if (r->alloc_capable && domain_setup_ctrlval(r, d)) {
> kfree(d);
> return;
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index 181c992f448c..3dd13f3a8b23 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -601,5 +601,6 @@ bool has_busy_rmid(struct rdt_resource *r, struct rdt_domain *d);
> void __check_limbo(struct rdt_domain *d, bool force_free);
> bool cbm_validate_intel(char *buf, u32 *data, struct rdt_resource *r);
> bool cbm_validate_amd(char *buf, u32 *data, struct rdt_resource *r);
> +void rdt_domain_reconfigure_cdp(struct rdt_resource *r);
>
> #endif /* _ASM_X86_RESCTRL_INTERNAL_H */
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index 064e9ef44cd6..5967320a1951 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -1831,6 +1831,9 @@ static int set_cache_qos_cfg(int level, bool enable)
> struct rdt_domain *d;
> int cpu;
>
> + /* CDP state is restored during cpuhp, which takes this lock */
> + lockdep_assert_held(&rdtgroup_mutex);
> +

I think this hunk can be dropped. (1) The code path where this
annotation is added is not part of this fix. (2) The comment implies
that the taking of the mutex is something new/unique added in the CPU
hotplug path but that is not accurate since this mutex is also taken in
the only other existing call path of this snippet that is handling the
mounting of the filesystem.

You do mention that these annotations are helpful for the MPAM work.
Could the annotations instead be added as a separate patch forming part
of that work?

> if (level == RDT_RESOURCE_L3)
> update = l3_qos_cfg_update;
> else if (level == RDT_RESOURCE_L2)
> @@ -1859,6 +1862,21 @@ static int set_cache_qos_cfg(int level, bool enable)
> return 0;
> }
>
> +/* Restore the qos cfg state when a package comes online */

s/package/domain/? When, for example, considering L2 then "package" is
not the right term to use.

> +void rdt_domain_reconfigure_cdp(struct rdt_resource *r)
> +{
> + lockdep_assert_held(&rdtgroup_mutex);
> +
> + if (!r->alloc_capable)
> + return;
> +
> + if (r == &rdt_resources_all[RDT_RESOURCE_L2DATA])
> + l2_qos_cfg_update(&r->alloc_enabled);
> +
> + if (r == &rdt_resources_all[RDT_RESOURCE_L3DATA])
> + l3_qos_cfg_update(&r->alloc_enabled);
> +}
> +
> /*
> * Enable or disable the MBA software controller
> * which helps user specify bandwidth in MBps.
>

Thank you very much.

Reinette

2020-02-21 15:27:13

by James Morse

Subject: Re: [PATCH v2] x86/resctrl: Preserve CDP enable over cpuhp

Hi Reinette,

On 14/02/2020 19:24, Reinette Chatre wrote:
> On 2/14/2020 10:16 AM, James Morse wrote:
>> Resctrl assumes that all CPUs are online when the filesystem is
>> mounted, and that CPUs remember their CDP-enabled state over CPU
>> hotplug.
>>
>> This goes wrong when resctrl's CDP-enabled state changes while all
>> the CPUs in a domain are offline.
>>
>> When a domain comes online, enable (or disable!) CDP to match resctrl's
>> current setting.

>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index 064e9ef44cd6..5967320a1951 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -1831,6 +1831,9 @@ static int set_cache_qos_cfg(int level, bool enable)
>> struct rdt_domain *d;
>> int cpu;
>>
>> + /* CDP state is restored during cpuhp, which takes this lock */
>> + lockdep_assert_held(&rdtgroup_mutex);
>> +
>
> I think this hunk can be dropped. (1) The code path where this
> annotation is added is not part of this fix. (2) The comment implies
> that the taking of the mutex is something new/unique added in the CPU
> hotplug path but that is not accurate since this mutex is also taken in
> the only other existing call path of this snippet that is handling the
> mounting of the filesystem.

These things answer the question: "what stops rdt_domain_reconfigure_cdp() racing with
set_cache_qos_cfg() on the mount path, causing the wrong value to be restored?".

We can try and answer that in the commit message, or comments, but these will quickly be
lost, stale, or wrong.

These annotations serve as a comment, and let lockdep check it's still true.
(I think you can never have enough lockdep annotations!)
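The point is that the assertion both documents and checks that the mount path and the hotplug path are serialised by the same mutex. A rough user-space analogue, assuming a per-thread "held" flag in place of lockdep's per-task held-lock tracking; group_lock(), assert_group_held(), mount_set_cdp() and hotplug_restore_cdp() are hypothetical names, not kernel API:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins for rdtgroup_mutex and lockdep's per-task state. */
static pthread_mutex_t group_mutex = PTHREAD_MUTEX_INITIALIZER;
static _Thread_local bool group_mutex_held;	/* tracked per thread, like lockdep */

static void group_lock(void)
{
	pthread_mutex_lock(&group_mutex);
	group_mutex_held = true;
}

static void group_unlock(void)
{
	group_mutex_held = false;
	pthread_mutex_unlock(&group_mutex);
}

/* Rough analogue of lockdep_assert_held(&rdtgroup_mutex). */
#define assert_group_held() assert(group_mutex_held)

static bool cdp_setting;	/* shared state the mutex protects */
static bool hw_cdp;		/* stands in for a domain's CDP bit */

/* Mount path, like set_cache_qos_cfg(): changes the setting. */
static void mount_set_cdp(bool enable)
{
	assert_group_held();
	cdp_setting = enable;
	hw_cdp = enable;
}

/* Hotplug path, like rdt_domain_reconfigure_cdp(): re-applies it. */
static void hotplug_restore_cdp(void)
{
	assert_group_held();
	hw_cdp = cdp_setting;
}
```

Calling either function without group_lock() trips the assertion, much as lockdep_assert_held() warns when CONFIG_LOCKDEP is enabled; building with NDEBUG removes the check, just as the kernel annotation compiles away without lockdep.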


> You do mention that these annotations are helpful for the MPAM work.

Indeed: that work splits up the, er, "big RDT mutex". These annotations mean lockdep catches me
out if I do something wrong, and they make it very clear when I'm changing something subtle.


> Could the annotations instead be added as a separate patch forming part
> of that work?

Ideally these things are there from the beginning. Adding them over time as part of other
reviewed patches works. I don't think adding them in one go before refactoring helps: you
wouldn't have the confidence that they were correct in the first place.

I'll drop these.


>> if (level == RDT_RESOURCE_L3)
>> update = l3_qos_cfg_update;
>> else if (level == RDT_RESOURCE_L2)
>> @@ -1859,6 +1862,21 @@ static int set_cache_qos_cfg(int level, bool enable)
>> return 0;
>> }
>>
>> +/* Restore the qos cfg state when a package comes online */
>
> s/package/domain/? When, for example, considering L2 then "package" is
> not the right term to use.

Sure,


Thanks,

James

2020-02-21 17:07:43

by Reinette Chatre

Subject: Re: [PATCH v2] x86/resctrl: Preserve CDP enable over cpuhp

Hi James,

On 2/21/2020 7:25 AM, James Morse wrote:
> Hi Reinette,
>
> On 14/02/2020 19:24, Reinette Chatre wrote:
>> On 2/14/2020 10:16 AM, James Morse wrote:
>>> Resctrl assumes that all CPUs are online when the filesystem is
>>> mounted, and that CPUs remember their CDP-enabled state over CPU
>>> hotplug.
>>>
>>> This goes wrong when resctrl's CDP-enabled state changes while all
>>> the CPUs in a domain are offline.
>>>
>>> When a domain comes online, enable (or disable!) CDP to match resctrl's
>>> current setting.
>
>>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>> index 064e9ef44cd6..5967320a1951 100644
>>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>> @@ -1831,6 +1831,9 @@ static int set_cache_qos_cfg(int level, bool enable)
>>> struct rdt_domain *d;
>>> int cpu;
>>>
>>> + /* CDP state is restored during cpuhp, which takes this lock */
>>> + lockdep_assert_held(&rdtgroup_mutex);
>>> +
>>
>> I think this hunk can be dropped. (1) The code path where this
>> annotation is added is not part of this fix. (2) The comment implies
>> that the taking of the mutex is something new/unique added in the CPU
>> hotplug path but that is not accurate since this mutex is also taken in
>> the only other existing call path of this snippet that is handling the
>> mounting of the filesystem.
>
> These things answer the question: "what stops rdt_domain_reconfigure_cdp() racing with
> set_cache_qos_cfg() on the mount path, causing the wrong value to be restored?".
>
> We can try and answer that in the commit message, or comments, but these will quickly be
> lost, stale, or wrong.
>
> These annotations serve as a comment, and let lockdep check it's still true.
> (I think you can never have enough lockdep annotations!)

I agree that lockdep annotations are valuable. My comment was specific
to this one hunk, not all lockdep annotations in your patch. Please
consider my comment in the spirit of patch guidance (per
Documentation/process/submitting-patches.rst) noting that all logical
changes should be in separate patches. This specific hunk is unrelated
to the bug being fixed in this patch but can surely be done in a
separate patch submitted together with this fix.

>> You do mention that these annotations are helpful for the MPAM work.
>
> Indeed: that work splits up the, er, "big RDT mutex". These annotations mean lockdep catches me
> out if I do something wrong, and they make it very clear when I'm changing something subtle.
>
>
>> Could the annotations instead be added as a separate patch forming part
>> of that work?
>
> Ideally these things are there from the beginning. Adding them over time as part of other
> reviewed patches works. I don't think adding them in one go before refactoring helps: you
> wouldn't have the confidence that they were correct in the first place.
>
> I'll drop these.

My comment was specific to the one lockdep annotation added to an
area that was unrelated to the bugfix. I noticed that you removed all
the annotations in your new version; that was not my intention. You could
keep the lockdep annotation in the new code path introduced by this fix,
and a separate patch with the other lockdep annotation (with an accurate
comment) would also be welcome.

Thank you

Reinette