2022-02-17 19:47:25

by James Morse

[permalink] [raw]
Subject: [PATCH v3 07/21] x86/resctrl: Create mba_sc configuration in the rdt_domain

To support resctrl's MBA software controller, the architecture must provide
a second configuration array to hold the mbps_val[] from user-space.

This complicates the interface between the architecture specific code and
the filesystem portions of resctrl that will move to /fs/, to allow
multiple architectures to support resctrl.

Make the filesystem parts of resctrl create an array for the mba_sc
values when is_mba_sc() is set to true. The software controller
can be changed to use this, allowing the architecture code to only
consider the values configured in hardware.

Signed-off-by: James Morse <[email protected]>
---
Changes since v2:
* Split patch in two, the liftime parts are a separate patch.
* Added reset in set_mba_sc() now that we can't depend on the lifetime.
* Initialise ret in mba_sc_allocate(),
* Made mbps_val allocation/freeing symmetric for cpuhp calls.
* Removed reference to squashed-out struct.
* Preserved kerneldoc for mbps_val.

Changes since v1:
* Added missing error handling to mba_sc_domain_allocate() in
domain_setup_mon_state()
* Added comment about mba_sc_domain_allocate() races
* Squashed out struct resctrl_mba_sc
* Moved mount time alloc/free calls to set_mba_sc().
* Removed mount check in resctrl_offline_domain()
* Reword commit message
---
arch/x86/kernel/cpu/resctrl/internal.h | 1 -
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 41 ++++++++++++++++++++++++++
include/linux/resctrl.h | 7 +++++
3 files changed, 48 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index e12b55f815bf..a7e2cbce29d5 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -36,7 +36,6 @@
#define MBM_OVERFLOW_INTERVAL 1000
#define MAX_MBA_BW 100u
#define MBA_IS_LINEAR 0x4
-#define MBA_MAX_MBPS U32_MAX
#define MAX_MBA_BW_AMD 0x800
#define MBM_CNTR_WIDTH_OFFSET_AMD 20

diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 794a84ba9097..e4313f907eb6 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1889,6 +1889,30 @@ void rdt_domain_reconfigure_cdp(struct rdt_resource *r)
l3_qos_cfg_update(&hw_res->cdp_enabled);
}

+static int mba_sc_domain_allocate(struct rdt_resource *r, struct rdt_domain *d)
+{
+ u32 num_closid = resctrl_arch_get_num_closid(r);
+ int cpu = cpumask_any(&d->cpu_mask);
+ int i;
+
+ d->mbps_val = kcalloc_node(num_closid, sizeof(*d->mbps_val),
+ GFP_KERNEL, cpu_to_node(cpu));
+ if (!d->mbps_val)
+ return -ENOMEM;
+
+ for (i = 0; i < num_closid; i++)
+ d->mbps_val[i] = MBA_MAX_MBPS;
+
+ return 0;
+}
+
+static void mba_sc_domain_destroy(struct rdt_resource *r,
+ struct rdt_domain *d)
+{
+ kfree(d->mbps_val);
+ d->mbps_val = NULL;
+}
+
/*
* Enable or disable the MBA software controller
* which helps user specify bandwidth in MBps.
@@ -1898,6 +1922,9 @@ void rdt_domain_reconfigure_cdp(struct rdt_resource *r)
static int set_mba_sc(bool mba_sc)
{
struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_MBA].r_resctrl;
+ u32 num_closid = resctrl_arch_get_num_closid(r);
+ struct rdt_domain *d;
+ int i;

if (!is_mbm_enabled() || !is_mba_linear() ||
mba_sc == is_mba_sc(r))
@@ -1905,6 +1932,11 @@ static int set_mba_sc(bool mba_sc)

r->membw.mba_sc = mba_sc;

+ list_for_each_entry(d, &r->domains, list) {
+ for (i = 0; i < num_closid; i++)
+ d->mbps_val[i] = MBA_MAX_MBPS;
+ }
+
return 0;
}

@@ -3263,6 +3295,9 @@ void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d)
cancel_delayed_work(&d->cqm_limbo);
}

+ if (is_mba_sc(r))
+ mba_sc_domain_destroy(r, d);
+
domain_destroy_mon_state(d);
}

@@ -3309,6 +3344,12 @@ int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d)
if (err)
return err;

+ err = mba_sc_domain_allocate(r, d);
+ if (err) {
+ domain_destroy_mon_state(d);
+ return err;
+ }
+
if (is_mbm_enabled()) {
INIT_DELAYED_WORK(&d->mbm_over, mbm_handle_overflow);
mbm_setup_overflow_handler(d, MBM_OVERFLOW_INTERVAL);
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 5d283bdd6162..46ab9fb5562e 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -15,6 +15,9 @@ int proc_resctrl_show(struct seq_file *m,

#endif

+/* max value for struct rdt_domain's mbps_val */
+#define MBA_MAX_MBPS U32_MAX
+
/**
* enum resctrl_conf_type - The type of configuration.
* @CDP_NONE: No prioritisation, both code and data are controlled or monitored.
@@ -53,6 +56,9 @@ struct resctrl_staged_config {
* @cqm_work_cpu: worker CPU for CQM h/w counters
* @plr: pseudo-locked region (if any) associated with domain
* @staged_config: parsed configuration to be applied
+ * @mbps_val: When mba_sc is enabled, this holds the array of user
+ * specified control values for mba_sc in MBps, indexed
+ * by closid
*/
struct rdt_domain {
struct list_head list;
@@ -67,6 +73,7 @@ struct rdt_domain {
int cqm_work_cpu;
struct pseudo_lock_region *plr;
struct resctrl_staged_config staged_config[CDP_NUM_TYPES];
+ u32 *mbps_val;
};

/**
--
2.30.2


2022-03-05 00:41:59

by Reinette Chatre

[permalink] [raw]
Subject: Re: [PATCH v3 07/21] x86/resctrl: Create mba_sc configuration in the rdt_domain

Hi James,

On 2/17/2022 10:20 AM, James Morse wrote:
> To support resctrl's MBA software controller, the architecture must provide
> a second configuration array to hold the mbps_val[] from user-space.
>
> This complicates the interface between the architecture specific code and
> the filesystem portions of resctrl that will move to /fs/, to allow
> multiple architectures to support resctrl.
>
> Make the filesystem parts of resctrl create an array for the mba_sc
> values when is_mba_sc() is set to true. The software controller
> can be changed to use this, allowing the architecture code to only
> consider the values configured in hardware.
>
> Signed-off-by: James Morse <[email protected]>
> ---
> Changes since v2:
> * Split patch in two, the liftime parts are a separate patch.
> * Added reset in set_mba_sc() now that we can't depend on the lifetime.
> * Initialise ret in mba_sc_allocate(),
> * Made mbps_val allocation/freeing symmetric for cpuhp calls.
> * Removed reference to squashed-out struct.
> * Preserved kerneldoc for mbps_val.
>
> Changes since v1:
> * Added missing error handling to mba_sc_domain_allocate() in
> domain_setup_mon_state()
> * Added comment about mba_sc_domain_allocate() races
> * Squashed out struct resctrl_mba_sc
> * Moved mount time alloc/free calls to set_mba_sc().
> * Removed mount check in resctrl_offline_domain()
> * Reword commit message
> ---
> arch/x86/kernel/cpu/resctrl/internal.h | 1 -
> arch/x86/kernel/cpu/resctrl/rdtgroup.c | 41 ++++++++++++++++++++++++++
> include/linux/resctrl.h | 7 +++++
> 3 files changed, 48 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index e12b55f815bf..a7e2cbce29d5 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -36,7 +36,6 @@
> #define MBM_OVERFLOW_INTERVAL 1000
> #define MAX_MBA_BW 100u
> #define MBA_IS_LINEAR 0x4
> -#define MBA_MAX_MBPS U32_MAX
> #define MAX_MBA_BW_AMD 0x800
> #define MBM_CNTR_WIDTH_OFFSET_AMD 20
>
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index 794a84ba9097..e4313f907eb6 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -1889,6 +1889,30 @@ void rdt_domain_reconfigure_cdp(struct rdt_resource *r)
> l3_qos_cfg_update(&hw_res->cdp_enabled);
> }
>
> +static int mba_sc_domain_allocate(struct rdt_resource *r, struct rdt_domain *d)
> +{
> + u32 num_closid = resctrl_arch_get_num_closid(r);
> + int cpu = cpumask_any(&d->cpu_mask);
> + int i;
> +
> + d->mbps_val = kcalloc_node(num_closid, sizeof(*d->mbps_val),
> + GFP_KERNEL, cpu_to_node(cpu));
> + if (!d->mbps_val)
> + return -ENOMEM;
> +
> + for (i = 0; i < num_closid; i++)
> + d->mbps_val[i] = MBA_MAX_MBPS;
> +
> + return 0;
> +}
> +
> +static void mba_sc_domain_destroy(struct rdt_resource *r,
> + struct rdt_domain *d)
> +{
> + kfree(d->mbps_val);
> + d->mbps_val = NULL;
> +}
> +


...

> @@ -3263,6 +3295,9 @@ void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d)
> cancel_delayed_work(&d->cqm_limbo);
> }
>
> + if (is_mba_sc(r))
> + mba_sc_domain_destroy(r, d);
> +
> domain_destroy_mon_state(d);
> }
>
> @@ -3309,6 +3344,12 @@ int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d)
> if (err)
> return err;
>
> + err = mba_sc_domain_allocate(r, d);
> + if (err) {
> + domain_destroy_mon_state(d);
> + return err;
> + }
> +

Thank you for making this all symmetrical. It seems as though the new
array is always created but only destroyed when is_mba_sc() is true.
Is this correct?

Reinette

2022-03-17 03:34:48

by Reinette Chatre

[permalink] [raw]
Subject: Re: [PATCH v3 07/21] x86/resctrl: Create mba_sc configuration in the rdt_domain

Hi James,

I tried out this work and encountered a null pointer de-reference that
seems related to this patch. After digging into that it is not
clear to me how this is expected to work.

I encounter the issue just by attempting to mount with "-o mba_MBps" which is
the way to enable the mba_sc and exactly what this patch aims to address.

More below ...

On 2/17/2022 10:20 AM, James Morse wrote:
> To support resctrl's MBA software controller, the architecture must provide
> a second configuration array to hold the mbps_val[] from user-space.
>
> This complicates the interface between the architecture specific code and
> the filesystem portions of resctrl that will move to /fs/, to allow
> multiple architectures to support resctrl.
>
> Make the filesystem parts of resctrl create an array for the mba_sc
> values when is_mba_sc() is set to true. The software controller
> can be changed to use this, allowing the architecture code to only
> consider the values configured in hardware.
>
> Signed-off-by: James Morse <[email protected]>
> ---

...

>
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index 794a84ba9097..e4313f907eb6 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -1889,6 +1889,30 @@ void rdt_domain_reconfigure_cdp(struct rdt_resource *r)
> l3_qos_cfg_update(&hw_res->cdp_enabled);
> }
>
> +static int mba_sc_domain_allocate(struct rdt_resource *r, struct rdt_domain *d)
> +{
> + u32 num_closid = resctrl_arch_get_num_closid(r);
> + int cpu = cpumask_any(&d->cpu_mask);
> + int i;
> +
> + d->mbps_val = kcalloc_node(num_closid, sizeof(*d->mbps_val),
> + GFP_KERNEL, cpu_to_node(cpu));
> + if (!d->mbps_val)
> + return -ENOMEM;
> +
> + for (i = 0; i < num_closid; i++)
> + d->mbps_val[i] = MBA_MAX_MBPS;
> +
> + return 0;
> +}
> +

...

> @@ -1905,6 +1932,11 @@ static int set_mba_sc(bool mba_sc)
>
> r->membw.mba_sc = mba_sc;
>
> + list_for_each_entry(d, &r->domains, list) {
> + for (i = 0; i < num_closid; i++)
> + d->mbps_val[i] = MBA_MAX_MBPS;
> + }
> +
> return 0;
> }
>

...

> @@ -3309,6 +3344,12 @@ int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d)
> if (err)
> return err;
>
> + err = mba_sc_domain_allocate(r, d);
> + if (err) {
> + domain_destroy_mon_state(d);
> + return err;
> + }
> +

Before the above snippet there is a check if the resource is capable of monitoring:

resctrl_online_domain()
{
...
if (!r->mon_capable)
return 0;

...
err = mba_sc_domain_allocate(r, d);
...
}

Thus, the rdt_domain->mbps_val array will only exist in those resources that
support monitoring.

Taking a look at where mon_capable is set we see it is done in
get_rdt_mon_resources() and as you can see it is only done for RDT_RESOURCE_L3.

get_rdt_mon_resources()
{
struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;

...

return !rdt_get_mon_l3_config(r); /* mon_capable is set within */
}

Based on the above the rdt_domain->mbps_val array can only exist for those
domains that belong to resource RDT_RESOURCE_L3 (if it is capable of monitoring).

Now, looking at set_mba_sc() changed here, it only interacts with RDT_RESOURCE_MBA:

set_mba_sc()
{
struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_MBA].r_resctrl;

...

list_for_each_entry(d, &r->domains, list) {
for (i = 0; i < num_closid; i++)
d->mbps_val[i] = MBA_MAX_MBPS;
}
}

Considering that no domain belonging to RDT_RESOURCE_MBA will have this array this
always ends up being a null pointer de-reference.

Reinette

2022-03-31 04:11:19

by James Morse

[permalink] [raw]
Subject: Re: [PATCH v3 07/21] x86/resctrl: Create mba_sc configuration in the rdt_domain

Hi Reinette,

On 05/03/2022 00:26, Reinette Chatre wrote:
> On 2/17/2022 10:20 AM, James Morse wrote:
>> To support resctrl's MBA software controller, the architecture must provide
>> a second configuration array to hold the mbps_val[] from user-space.
>>
>> This complicates the interface between the architecture specific code and
>> the filesystem portions of resctrl that will move to /fs/, to allow
>> multiple architectures to support resctrl.
>>
>> Make the filesystem parts of resctrl create an array for the mba_sc
>> values when is_mba_sc() is set to true. The software controller
>> can be changed to use this, allowing the architecture code to only
>> consider the values configured in hardware.

...

>> @@ -3263,6 +3295,9 @@ void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d)
>> cancel_delayed_work(&d->cqm_limbo);
>> }
>>
>> + if (is_mba_sc(r))
>> + mba_sc_domain_destroy(r, d);
>> +
>> domain_destroy_mon_state(d);
>> }
>>
>> @@ -3309,6 +3344,12 @@ int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d)
>> if (err)
>> return err;
>>
>> + err = mba_sc_domain_allocate(r, d);
>> + if (err) {
>> + domain_destroy_mon_state(d);
>> + return err;
>> + }
>> +

> Thank you for making this all symmetrical. It seems as though the new
> array is always created but only destroyed when is_mba_sc() is true.
> Is this correct?

That looks broken. Oops.

I'll fix it to always allocate the array, as that is what domain_setup_ctrlval() does
today, and it saves the hotplug headache.


Thanks,

James

2022-03-31 04:32:38

by James Morse

[permalink] [raw]
Subject: Re: [PATCH v3 07/21] x86/resctrl: Create mba_sc configuration in the rdt_domain

Hi Reinette,

On 16/03/2022 21:50, Reinette Chatre wrote:
> I tried out this work and encountered a null pointer de-reference that
> seems related to this patch. After digging into that it is not
> clear to me how this is expected to work.
>
> I encounter the issue just by attempting to mount with "-o mba_MBps" which is
> the way to enable the mba_sc and exactly what this patch aims to address.
>
> More below ...
>
> On 2/17/2022 10:20 AM, James Morse wrote:
>> To support resctrl's MBA software controller, the architecture must provide
>> a second configuration array to hold the mbps_val[] from user-space.
>>
>> This complicates the interface between the architecture specific code and
>> the filesystem portions of resctrl that will move to /fs/, to allow
>> multiple architectures to support resctrl.
>>
>> Make the filesystem parts of resctrl create an array for the mba_sc
>> values when is_mba_sc() is set to true. The software controller
>> can be changed to use this, allowing the architecture code to only
>> consider the values configured in hardware.

...

>> @@ -3309,6 +3344,12 @@ int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d)
>> if (err)
>> return err;
>>
>> + err = mba_sc_domain_allocate(r, d);
>> + if (err) {
>> + domain_destroy_mon_state(d);
>> + return err;
>> + }
>> +
>
> Before the above snippet there is a check if the resource is capable of monitoring:
>
> resctrl_online_domain()
> {
> ...
> if (!r->mon_capable)
> return 0;
>
> ...
> err = mba_sc_domain_allocate(r, d);
> ...
> }
>
> Thus, the rdt_domain->mbps_val array will only exist in those resources that
> support monitoring.
>
> Taking a look at where mon_capable is set we see it is done in
> get_rdt_mon_resources() and as you can see it is only done for RDT_RESOURCE_L3.
>
> get_rdt_mon_resources()
> {
> struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
>
> ...
>
> return !rdt_get_mon_l3_config(r); /* mon_capable is set within */
> }
>
> Based on the above the rdt_domain->mbps_val array can only exist for those
> domains that belong to resource RDT_RESOURCE_L3 (if it is capable of monitoring).
>
> Now, looking at set_mba_sc() changed here, it only interacts with RDT_RESOURCE_MBA:
>
> set_mba_sc()
> {
> struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_MBA].r_resctrl;
>
> ...
>
> list_for_each_entry(d, &r->domains, list) {
> for (i = 0; i < num_closid; i++)
> d->mbps_val[i] = MBA_MAX_MBPS;
> }
> }
>
> Considering that no domain belonging to RDT_RESOURCE_MBA will have this array this
> always ends up being a null pointer de-reference.

Ugh. I'm not sure how I managed to miss that. Thanks for debugging it!

That loop was added to reset the array when the filesystem is mounted, as it may hold
stale values from a previous mount of the filesystem. Its currently done by
reset_all_ctrls(), but that function should really belong to the architecture code.

Because mbm_handle_overflow() always passes a domain from the L3 to update_mba_bw(), I
think the cleanest thing to do is move the reset to a helper that always operates on the
L3 array. (and leave some breadcrumbs in the comments).


Thanks!

James

-----------------%<-----------------
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 36edae7dbc6a..3b52f079a5b3 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1913,6 +1913,23 @@ static void mba_sc_domain_destroy(struct rdt_resource *r,
d->mbps_val = NULL;
}

+static void mba_sc_reset(void)
+{
+ /*
+ * mbm_handle_overflow() only passes domains of the L3 resource to
+ * update_mba_bw(), so mba_sc only supports monitoring on the L3.
+ */
+ struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
+ u32 num_closid = resctrl_arch_get_num_closid(r);
+ struct rdt_domain *d;
+ int i;
+
+ list_for_each_entry(d, &r->domains, list) {
+ for (i = 0; i < num_closid; i++)
+ d->mbps_val[i] = MBA_MAX_MBPS;
+ }
+}
+
/*
* Enable or disable the MBA software controller
* which helps user specify bandwidth in MBps.
@@ -1922,20 +1939,13 @@ static void mba_sc_domain_destroy(struct rdt_resource *r,
static int set_mba_sc(bool mba_sc)
{
struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_MBA].r_resctrl;
- u32 num_closid = resctrl_arch_get_num_closid(r);
- struct rdt_domain *d;
- int i;

if (!is_mbm_enabled() || !is_mba_linear() ||
mba_sc == is_mba_sc(r))
return -EINVAL;

r->membw.mba_sc = mba_sc;
-
- list_for_each_entry(d, &r->domains, list) {
- for (i = 0; i < num_closid; i++)
- d->mbps_val[i] = MBA_MAX_MBPS;
- }
+ mba_sc_reset();

return 0;
}
-----------------%<-----------------

2022-04-05 01:16:33

by Reinette Chatre

[permalink] [raw]
Subject: Re: [PATCH v3 07/21] x86/resctrl: Create mba_sc configuration in the rdt_domain

Hi James,

On 4/4/2022 9:35 AM, James Morse wrote:
> Hi Reinette,
>
> On 4/1/22 23:54, Reinette Chatre wrote:
>> On 3/30/2022 9:43 AM, James Morse wrote:
>>> On 16/03/2022 21:50, Reinette Chatre wrote:
>>>> On 2/17/2022 10:20 AM, James Morse wrote:
>>>>> To support resctrl's MBA software controller, the architecture must provide
>>>>> a second configuration array to hold the mbps_val[] from user-space.
>>>>>
>>>>> This complicates the interface between the architecture specific code and
>>>>> the filesystem portions of resctrl that will move to /fs/, to allow
>>>>> multiple architectures to support resctrl.
>>>>>
>>>>> Make the filesystem parts of resctrl create an array for the mba_sc
>>>>> values when is_mba_sc() is set to true. The software controller
>>>>> can be changed to use this, allowing the architecture code to only
>>>>> consider the values configured in hardware.
>
> [...]
>
>>>> Considering that no domain belonging to RDT_RESOURCE_MBA will have this array this
>>>> always ends up being a null pointer de-reference.
>>>
>>> Ugh. I'm not sure how I managed to miss that. Thanks for debugging it!
>>>
>>> That loop was added to reset the array when the filesystem is mounted, as it may hold
>>> stale values from a previous mount of the filesystem. Its currently done by
>>> reset_all_ctrls(), but that function should really belong to the architecture code.
>>>
>>> Because mbm_handle_overflow() always passes a domain from the L3 to update_mba_bw(), I
>>> think the cleanest thing to do is move the reset to a helper that always operates on the
>>> L3 array. (and leave some breadcrumbs in the comments).
>
>> I think this points to more than a need to reset the correct array on mount/unmount ... or
>> perhaps I am not understanding this correctly?
>>
>> As the analysis above shows the mbps_val array only exists for rdt_domains associated
>> with RDT_RESOURCE_L3 but yet mbps_val will contain the MB value provided by user space
>> associated with RDT_RESOURCE_MBA.
>
> I've finally got my head round what is going on here: I've muddled up whether mon_capable
> is a resource or system property. mba_sc depends on the L3 being mon_capable, but the
> configuration should be associated with MBA (wherever that is).
> (basically ignore my previous reply!)
>
> The creation of the mbps_val[] should depend on supports_mba_mbps(), which uses
> is_mbm_enabled() to check whether the L3 is mon_capable. I'll check the rid too
> to make it clear its only MBA that has this.
> The call to allocate the domain in resctrl_online_domain() should be above the mon_capable
> check. (which is still needed to avoid the work guarded by is_mbm_enabled() and friends
> running for each domain).
>
>
> Thanks,
>
> James
>
> -----------------------%<-----------------------
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index e3c90f33baf2..ad0411eb2147 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -3345,6 +3345,14 @@ int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d)
>  
>         lockdep_assert_held(&rdtgroup_mutex);
>  
> +       if (is_mbm_enabled() && r->rid == RDT_RESOURCE_MBA) {
> +               err = mba_sc_domain_allocate(r, d);
> +               if (err) {
> +                       domain_destroy_mon_state(d);
> +                       return err;
> +               }
> +       }
> +
>         if (!r->mon_capable)
>                 return 0;
>  
> @@ -3352,12 +3360,6 @@ int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d)
>         if (err)
>                 return err;
>  
> -       err = mba_sc_domain_allocate(r, d);
> -       if (err) {
> -               domain_destroy_mon_state(d);
> -               return err;
> -       }
> -
>         if (is_mbm_enabled()) {
>                 INIT_DELAYED_WORK(&d->mbm_over, mbm_handle_overflow);
>                 mbm_setup_overflow_handler(d, MBM_OVERFLOW_INTERVAL);
> -----------------------%<-----------------------

Thank you. Having mbps_val be part of RDT_RESOURCE_MBA makes things clear to me.

In the snippet you provide you may need to check the error path cleanup, I do not think
domain_destroy_mon_state(d) would be needed after moving the allocation earlier.

Reinette

2022-04-05 01:28:24

by James Morse

[permalink] [raw]
Subject: Re: [PATCH v3 07/21] x86/resctrl: Create mba_sc configuration in the rdt_domain

Hi Reinette,

On 4/1/22 23:54, Reinette Chatre wrote:
> On 3/30/2022 9:43 AM, James Morse wrote:
>> On 16/03/2022 21:50, Reinette Chatre wrote:
>>> On 2/17/2022 10:20 AM, James Morse wrote:
>>>> To support resctrl's MBA software controller, the architecture must provide
>>>> a second configuration array to hold the mbps_val[] from user-space.
>>>>
>>>> This complicates the interface between the architecture specific code and
>>>> the filesystem portions of resctrl that will move to /fs/, to allow
>>>> multiple architectures to support resctrl.
>>>>
>>>> Make the filesystem parts of resctrl create an array for the mba_sc
>>>> values when is_mba_sc() is set to true. The software controller
>>>> can be changed to use this, allowing the architecture code to only
>>>> consider the values configured in hardware.

[...]

>>> Considering that no domain belonging to RDT_RESOURCE_MBA will have this array this
>>> always ends up being a null pointer de-reference.
>>
>> Ugh. I'm not sure how I managed to miss that. Thanks for debugging it!
>>
>> That loop was added to reset the array when the filesystem is mounted, as it may hold
>> stale values from a previous mount of the filesystem. Its currently done by
>> reset_all_ctrls(), but that function should really belong to the architecture code.
>>
>> Because mbm_handle_overflow() always passes a domain from the L3 to update_mba_bw(), I
>> think the cleanest thing to do is move the reset to a helper that always operates on the
>> L3 array. (and leave some breadcrumbs in the comments).

> I think this points to more than a need to reset the correct array on mount/unmount ... or
> perhaps I am not understanding this correctly?
>
> As the analysis above shows the mbps_val array only exists for rdt_domains associated
> with RDT_RESOURCE_L3 but yet mbps_val will contain the MB value provided by user space
> associated with RDT_RESOURCE_MBA.

I've finally got my head round what is going on here: I've muddled up whether mon_capable
is a resource or system property. mba_sc depends on the L3 being mon_capable, but the
configuration should be associated with MBA (wherever that is).
(basically ignore my previous reply!)

The creation of the mbps_val[] should depend on supports_mba_mbps(), which uses
is_mbm_enabled() to check whether the L3 is mon_capable. I'll check the rid too
to make it clear its only MBA that has this.
The call to allocate the domain in resctrl_online_domain() should be above the mon_capable
check. (which is still needed to avoid the work guarded by is_mbm_enabled() and friends
running for each domain).


Thanks,

James

-----------------------%<-----------------------
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index e3c90f33baf2..ad0411eb2147 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -3345,6 +3345,14 @@ int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d)

lockdep_assert_held(&rdtgroup_mutex);

+ if (is_mbm_enabled() && r->rid == RDT_RESOURCE_MBA) {
+ err = mba_sc_domain_allocate(r, d);
+ if (err) {
+ domain_destroy_mon_state(d);
+ return err;
+ }
+ }
+
if (!r->mon_capable)
return 0;

@@ -3352,12 +3360,6 @@ int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d)
if (err)
return err;

- err = mba_sc_domain_allocate(r, d);
- if (err) {
- domain_destroy_mon_state(d);
- return err;
- }
-
if (is_mbm_enabled()) {
INIT_DELAYED_WORK(&d->mbm_over, mbm_handle_overflow);
mbm_setup_overflow_handler(d, MBM_OVERFLOW_INTERVAL);
-----------------------%<-----------------------

2022-04-05 02:03:37

by Reinette Chatre

[permalink] [raw]
Subject: Re: [PATCH v3 07/21] x86/resctrl: Create mba_sc configuration in the rdt_domain

Hi James,

On 3/30/2022 9:43 AM, James Morse wrote:
> Hi Reinette,
>
> On 16/03/2022 21:50, Reinette Chatre wrote:
>> I tried out this work and encountered a null pointer de-reference that
>> seems related to this patch. After digging into that it is not
>> clear to me how this is expected to work.
>>
>> I encounter the issue just by attempting to mount with "-o mba_MBps" which is
>> the way to enable the mba_sc and exactly what this patch aims to address.
>>
>> More below ...
>>
>> On 2/17/2022 10:20 AM, James Morse wrote:
>>> To support resctrl's MBA software controller, the architecture must provide
>>> a second configuration array to hold the mbps_val[] from user-space.
>>>
>>> This complicates the interface between the architecture specific code and
>>> the filesystem portions of resctrl that will move to /fs/, to allow
>>> multiple architectures to support resctrl.
>>>
>>> Make the filesystem parts of resctrl create an array for the mba_sc
>>> values when is_mba_sc() is set to true. The software controller
>>> can be changed to use this, allowing the architecture code to only
>>> consider the values configured in hardware.
>
> ...
>
>>> @@ -3309,6 +3344,12 @@ int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d)
>>> if (err)
>>> return err;
>>>
>>> + err = mba_sc_domain_allocate(r, d);
>>> + if (err) {
>>> + domain_destroy_mon_state(d);
>>> + return err;
>>> + }
>>> +
>>
>> Before the above snippet there is a check if the resource is capable of monitoring:
>>
>> resctrl_online_domain()
>> {
>> ...
>> if (!r->mon_capable)
>> return 0;
>>
>> ...
>> err = mba_sc_domain_allocate(r, d);
>> ...
>> }
>>
>> Thus, the rdt_domain->mbps_val array will only exist in those resources that
>> support monitoring.
>>
>> Taking a look at where mon_capable is set we see it is done in
>> get_rdt_mon_resources() and as you can see it is only done for RDT_RESOURCE_L3.
>>
>> get_rdt_mon_resources()
>> {
>> struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
>>
>> ...
>>
>> return !rdt_get_mon_l3_config(r); /* mon_capable is set within */
>> }
>>
>> Based on the above the rdt_domain->mbps_val array can only exist for those
>> domains that belong to resource RDT_RESOURCE_L3 (if it is capable of monitoring).
>>
>> Now, looking at set_mba_sc() changed here, it only interacts with RDT_RESOURCE_MBA:
>>
>> set_mba_sc()
>> {
>> struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_MBA].r_resctrl;
>>
>> ...
>>
>> list_for_each_entry(d, &r->domains, list) {
>> for (i = 0; i < num_closid; i++)
>> d->mbps_val[i] = MBA_MAX_MBPS;
>> }
>> }
>>
>> Considering that no domain belonging to RDT_RESOURCE_MBA will have this array this
>> always ends up being a null pointer de-reference.
>
> Ugh. I'm not sure how I managed to miss that. Thanks for debugging it!
>
> That loop was added to reset the array when the filesystem is mounted, as it may hold
> stale values from a previous mount of the filesystem. Its currently done by
> reset_all_ctrls(), but that function should really belong to the architecture code.
>
> Because mbm_handle_overflow() always passes a domain from the L3 to update_mba_bw(), I
> think the cleanest thing to do is move the reset to a helper that always operates on the
> L3 array. (and leave some breadcrumbs in the comments).
>
>

I think this points to more than a need to reset the correct array on mount/unmount ... or
perhaps I am not understanding this correctly?

As the analysis above shows the mbps_val array only exists for rdt_domains associated
with RDT_RESOURCE_L3 but yet mbps_val will contain the MB value provided by user space
associated with RDT_RESOURCE_MBA.

For example, following what happens when the user writes to the schemata, would this series
not attempt to set the user provided MB value in the rdt_domain->mbps_val that belongs to
RDT_RESOURCE_MBA ... but that array would not exist for the domain since the resource
is not monitor capable, no?

Reinette