2020-12-04 04:54:39

by Gautham R Shenoy

Subject: [PATCH 0/3] Extend Parsing "ibm,thread-groups" for Shared-L2 information

From: "Gautham R. Shenoy" <[email protected]>

The "ibm,thread-groups" device-tree property is an array that is used
to indicate if groups of threads within a core share certain
properties. It provides details of which property is being shared by
which groups of threads. This array can encode information about
multiple properties being shared by different thread-groups within the
core.

Example: Suppose,
"ibm,thread-groups" = [1,2,4,8,10,12,14,9,11,13,15,2,2,4,8,10,12,14,9,11,13,15]

This can be decomposed into two consecutive arrays:

a) [1,2,4,8,10,12,14,9,11,13,15]
b) [2,2,4,8,10,12,14,9,11,13,15]

wherein,

a) provides information about Property "1" being shared by "2" groups
of "4" threads each. The "ibm,ppc-interrupt-server#s" of the
first group is {8,10,12,14} and the "ibm,ppc-interrupt-server#s" of
the second group is {9,11,13,15}. Property "1" indicates that the
threads in each group share the L1 cache, the translation cache, and
the instruction data flow.

b) provides information about Property "2" being shared by "2" groups
of "4" threads each. The "ibm,ppc-interrupt-server#s" of
the first group is {8,10,12,14} and the
"ibm,ppc-interrupt-server#s" of the second group is
{9,11,13,15}. Property "2" indicates that the threads in each group
share the L2 cache.

The existing code assumes that "ibm,thread-groups" encodes
information about only one property. Hence, even on platforms that
encode information about multiple properties being shared by the
corresponding groups of threads, the current code will only pick the
first one. (In the above example, it will only consider
[1,2,4,8,10,12,14,9,11,13,15] but not [2,2,4,8,10,12,14,9,11,13,15].)
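
As an illustration, the decomposition described above can be sketched
in plain C. This is a userspace analogue of the parsing, not the
kernel code itself; the structure and function names below are made up
for the example.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PROPS   2
#define MAX_THREADS 8

struct tg {
	unsigned int property;          /* what the group shares (1 = L1, 2 = L2) */
	unsigned int nr_groups;         /* how many such groups in the core */
	unsigned int threads_per_group;
	unsigned int threads[MAX_THREADS]; /* interrupt-server#s, group by group */
};

/*
 * Walk the flat cells [property, nr_groups, threads_per_group, <ids>...]
 * repeated once per property, and fill one struct tg per property.
 * Returns the number of properties found.
 */
static int decompose_thread_groups(const unsigned int *cells, int count,
				   struct tg *out, int max_props)
{
	int i = 0, nr = 0;

	while (i < count && nr < max_props) {
		int j, total;

		out[nr].property = cells[i];
		out[nr].nr_groups = cells[i + 1];
		out[nr].threads_per_group = cells[i + 2];
		total = out[nr].nr_groups * out[nr].threads_per_group;
		for (j = 0; j < total; j++)
			out[nr].threads[j] = cells[i + 3 + j];
		i += 3 + total;
		nr++;
	}
	return nr;
}
```

Running this on the example array above yields two records, one for
Property "1" and one for Property "2", each listing the same two
groups of interrupt-server numbers.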

Furthermore, currently on platforms where groups of threads share L2
cache, we incorrectly create an extra CACHE level sched-domain that
maps to all the threads of the core.

For example, if "ibm,thread-groups" is
00000001 00000002 00000004 00000000
00000002 00000004 00000006 00000001
00000003 00000005 00000007 00000002
00000002 00000004 00000000 00000002
00000004 00000006 00000001 00000003
00000005 00000007

then the sub-array
[00000002 00000002 00000004
00000000 00000002 00000004 00000006
00000001 00000003 00000005 00000007]
indicates that L2 (Property "2") is shared only between the threads of a single
group. There are "2" groups, each containing "4" threads. The groups
are {0,2,4,6} and {1,3,5,7}.
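
Given such a sub-array, mapping a CPU to the first thread of its group
can be sketched as follows. This is a userspace analogue of the
kernel's get_cpu_thread_group_start(); the function name and layout
here are illustrative.

```c
#include <assert.h>

/*
 * The thread list is laid out group by group: the first
 * threads_per_group entries form group 0, the next form group 1, etc.
 * Return the first thread of the group containing @cpu, or -1 if
 * @cpu is not listed.
 */
static int group_start_of(int cpu, const unsigned int *threads,
			  int nr_groups, int threads_per_group)
{
	int g, t;

	for (g = 0; g < nr_groups; g++) {
		int start = g * threads_per_group;

		for (t = 0; t < threads_per_group; t++)
			if ((int)threads[start + t] == cpu)
				return threads[start];
	}
	return -1;
}
```

With the thread list {0,2,4,6,1,3,5,7} from the sub-array above, CPUs
0, 2, 4 and 6 map to group start 0, while CPUs 1, 3, 5 and 7 map to
group start 1.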

However, the sched-domain hierarchy for CPUs 0,1 is
CPU0 attaching sched-domain(s):
domain-0: span=0,2,4,6 level=SMT
domain-1: span=0-7 level=CACHE
domain-2: span=0-15,24-39,48-55 level=MC
domain-3: span=0-55 level=DIE

CPU1 attaching sched-domain(s):
domain-0: span=1,3,5,7 level=SMT
domain-1: span=0-7 level=CACHE
domain-2: span=0-15,24-39,48-55 level=MC
domain-3: span=0-55 level=DIE

where the CACHE domain reports that L2 is shared across the entire
core, which is incorrect on such platforms.


This patchset remedies these issues by extending the parsing support
for "ibm,thread-groups" to discover information about multiple
properties being shared by the corresponding groups of threads. In
particular, we can now detect whether the groups of threads within a
core share the L2 cache. On such platforms, we populate the
cpu_l2_cache_mask of every CPU with the core-siblings that share the
L2 cache with that CPU, as specified by the "ibm,thread-groups"
property array.

With the patchset, the sched-domain hierarchy is correctly
reported. For example, for CPUs 0 and 1 with the patchset applied:

CPU0 attaching sched-domain(s):
domain-0: span=0,2,4,6 level=SMT
domain-1: span=0-15,24-39,48-55 level=MC
domain-2: span=0-55 level=DIE

CPU1 attaching sched-domain(s):
domain-0: span=1,3,5,7 level=SMT
domain-1: span=0-15,24-39,48-55 level=MC
domain-2: span=0-55 level=DIE

The CACHE domain with span=0,2,4,6 for CPU 0 (resp. span=1,3,5,7 for
CPU 1) degenerates into the SMT domain. Furthermore, the
last-level-cache domain is correctly set to the SMT sched-domain.

Finally, this patchset reports the correct shared_cpu_map/list in
sysfs for the L2 cache on such platforms. With the patchset, for CPUs
0 and 1 we see the correct shared_cpu_map/list for the L2 cache:

/sys/devices/system/cpu/cpu0/cache/index2/shared_cpu_list:0,2,4,6
/sys/devices/system/cpu/cpu0/cache/index2/shared_cpu_map:000000,00000055

/sys/devices/system/cpu/cpu1/cache/index2/shared_cpu_list:1,3,5,7
/sys/devices/system/cpu/cpu1/cache/index2/shared_cpu_map:000000,000000aa
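
The hex values above follow directly from setting one bit per CPU in
each group. A quick sketch (the helper name is hypothetical, not a
kernel API):

```c
#include <assert.h>

/* Build a bitmask with one bit set per CPU in the list. */
static unsigned long cpulist_to_mask(const int *cpus, int n)
{
	unsigned long mask = 0;
	int i;

	for (i = 0; i < n; i++)
		mask |= 1UL << cpus[i];
	return mask;
}
```

For the group {0,2,4,6} this yields 0x55, and for {1,3,5,7} it yields
0xaa, matching the shared_cpu_map values shown above.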

The patchset has been tested on older platforms which encode only the
L1 sharing information via "ibm,thread-groups", and no regressions
were found.


Gautham R. Shenoy (3):
powerpc/smp: Parse ibm,thread-groups with multiple properties
powerpc/smp: Add support detecting thread-groups sharing L2 cache
powerpc/cacheinfo: Print correct cache-sibling map/list for L2 cache

arch/powerpc/kernel/cacheinfo.c | 7 ++
arch/powerpc/kernel/smp.c | 198 +++++++++++++++++++++++++++++-----------
2 files changed, 150 insertions(+), 55 deletions(-)

--
1.9.4


2020-12-04 04:54:46

by Gautham R Shenoy

Subject: [PATCH 1/3] powerpc/smp: Parse ibm,thread-groups with multiple properties

From: "Gautham R. Shenoy" <[email protected]>

The "ibm,thread-groups" device-tree property is an array that is used
to indicate if groups of threads within a core share certain
properties. It provides details of which property is being shared by
which groups of threads. This array can encode information about
multiple properties being shared by different thread-groups within the
core.

Example: Suppose,
"ibm,thread-groups" = [1,2,4,8,10,12,14,9,11,13,15,2,2,4,8,10,12,14,9,11,13,15]

This can be decomposed into two consecutive arrays:

a) [1,2,4,8,10,12,14,9,11,13,15]
b) [2,2,4,8,10,12,14,9,11,13,15]

wherein,

a) provides information about Property "1" being shared by "2" groups
of "4" threads each. The "ibm,ppc-interrupt-server#s" of the
first group is {8,10,12,14} and the "ibm,ppc-interrupt-server#s" of
the second group is {9,11,13,15}. Property "1" indicates that the
threads in each group share the L1 cache, the translation cache, and
the instruction data flow.

b) provides information about Property "2" being shared by "2" groups
of "4" threads each. The "ibm,ppc-interrupt-server#s" of
the first group is {8,10,12,14} and the
"ibm,ppc-interrupt-server#s" of the second group is
{9,11,13,15}. Property "2" indicates that the threads in each group
share the L2 cache.

The existing code assumes that "ibm,thread-groups" encodes
information about only one property. Hence, even on platforms that
encode information about multiple properties being shared by the
corresponding groups of threads, the current code will only pick the
first one. (In the above example, it will only consider
[1,2,4,8,10,12,14,9,11,13,15] but not [2,2,4,8,10,12,14,9,11,13,15].)

This patch extends the parsing support to platforms that encode
information about multiple properties being shared by the
corresponding groups of threads.

Signed-off-by: Gautham R. Shenoy <[email protected]>
---
arch/powerpc/kernel/smp.c | 146 +++++++++++++++++++++++++++++-----------------
1 file changed, 92 insertions(+), 54 deletions(-)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 8c2857c..6a242a3 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -106,6 +106,15 @@ struct thread_groups {
unsigned int thread_list[MAX_THREAD_LIST_SIZE];
};

+/* Maximum number of properties that groups of threads within a core can share */
+#define MAX_THREAD_GROUP_PROPERTIES 1
+
+struct thread_groups_list {
+ unsigned int nr_properties;
+ struct thread_groups property_tgs[MAX_THREAD_GROUP_PROPERTIES];
+};
+
+static struct thread_groups_list tgl[NR_CPUS] __initdata;
/*
* On big-cores system, cpu_l1_cache_map for each CPU corresponds to
* the set its siblings that share the L1-cache.
@@ -695,81 +704,94 @@ static void or_cpumasks_related(int i, int j, struct cpumask *(*srcmask)(int),
/*
* parse_thread_groups: Parses the "ibm,thread-groups" device tree
* property for the CPU device node @dn and stores
- * the parsed output in the thread_groups
- * structure @tg if the ibm,thread-groups[0]
- * matches @property.
+ * the parsed output in the thread_groups_list
+ * structure @tglp.
*
* @dn: The device node of the CPU device.
- * @tg: Pointer to a thread group structure into which the parsed
+ * @tglp: Pointer to a thread group list structure into which the parsed
* output of "ibm,thread-groups" is stored.
- * @property: The property of the thread-group that the caller is
- * interested in.
*
* ibm,thread-groups[0..N-1] array defines which group of threads in
* the CPU-device node can be grouped together based on the property.
*
- * ibm,thread-groups[0] tells us the property based on which the
+ * This array can represent thread groupings for multiple properties.
+ *
+ * ibm,thread-groups[i + 0] tells us the property based on which the
* threads are being grouped together. If this value is 1, it implies
* that the threads in the same group share L1, translation cache.
*
- * ibm,thread-groups[1] tells us how many such thread groups exist.
+ * ibm,thread-groups[i+1] tells us how many such thread groups exist for the
+ * property ibm,thread-groups[i]
*
- * ibm,thread-groups[2] tells us the number of threads in each such
+ * ibm,thread-groups[i+2] tells us the number of threads in each such
* group.
+ * Suppose k = (ibm,thread-groups[i+1] * ibm,thread-groups[i+2]), then,
*
- * ibm,thread-groups[3..N-1] is the list of threads identified by
* ibm,thread-groups[i+3..i+k+2] is the list of threads identified by
* "ibm,ppc-interrupt-server#s" arranged as per their membership in
* the grouping.
*
- * Example: If ibm,thread-groups = [1,2,4,5,6,7,8,9,10,11,12] it
- * implies that there are 2 groups of 4 threads each, where each group
- * of threads share L1, translation cache.
+ * Example:
+ * If ibm,thread-groups = [1,2,4,5,6,7,8,9,10,11,12,2,2,4,5,7,9,11,6,8,10,12]
+ * This can be decomposed into two consecutive arrays:
+ * a) [1,2,4,5,6,7,8,9,10,11,12]
+ * b) [2,2,4,5,7,9,11,6,8,10,12]
+ * wherein,
*
- * The "ibm,ppc-interrupt-server#s" of the first group is {5,6,7,8}
- * and the "ibm,ppc-interrupt-server#s" of the second group is {9, 10,
- * 11, 12} structure
+ * a) there are 2 groups of 4 threads each, where each group of
+ * threads share Property 1 (L1, translation cache). The
+ * "ibm,ppc-interrupt-server#s" of the first group is {5,6,7,8} and
+ * the "ibm,ppc-interrupt-server#s" of the second group is {9, 10, 11,
+ * 12}.
+ *
+ * b) there are 2 groups of 4 threads each, where each group of
+ * threads share some property indicated by the first value 2. The
+ * "ibm,ppc-interrupt-server#s" of the first group is {5,7,9,11}
+ * and the "ibm,ppc-interrupt-server#s" of the second group is
+ * {6,8,10,12}.
*
* Returns 0 on success, -EINVAL if the property does not exist,
* -ENODATA if property does not have a value, and -EOVERFLOW if the
* property data isn't large enough.
*/
static int parse_thread_groups(struct device_node *dn,
- struct thread_groups *tg,
- unsigned int property)
+ struct thread_groups_list *tglp)
{
- int i;
- u32 thread_group_array[3 + MAX_THREAD_LIST_SIZE];
+ int i = 0;
+ u32 *thread_group_array;
u32 *thread_list;
size_t total_threads;
- int ret;
+ int ret = 0, count;
+ unsigned int property_idx = 0;

+ count = of_property_count_u32_elems(dn, "ibm,thread-groups");
+ thread_group_array = kcalloc(count, sizeof(u32), GFP_KERNEL);
ret = of_property_read_u32_array(dn, "ibm,thread-groups",
- thread_group_array, 3);
+ thread_group_array, count);
if (ret)
- return ret;
-
- tg->property = thread_group_array[0];
- tg->nr_groups = thread_group_array[1];
- tg->threads_per_group = thread_group_array[2];
- if (tg->property != property ||
- tg->nr_groups < 1 ||
- tg->threads_per_group < 1)
- return -ENODATA;
+ goto out_free;

- total_threads = tg->nr_groups * tg->threads_per_group;
+ while (i < count && property_idx < MAX_THREAD_GROUP_PROPERTIES) {
+ int j;
+ struct thread_groups *tg = &tglp->property_tgs[property_idx++];

- ret = of_property_read_u32_array(dn, "ibm,thread-groups",
- thread_group_array,
- 3 + total_threads);
- if (ret)
- return ret;
+ tg->property = thread_group_array[i];
+ tg->nr_groups = thread_group_array[i + 1];
+ tg->threads_per_group = thread_group_array[i + 2];
+ total_threads = tg->nr_groups * tg->threads_per_group;
+
+ thread_list = &thread_group_array[i + 3];

- thread_list = &thread_group_array[3];
+ for (j = 0; j < total_threads; j++)
+ tg->thread_list[j] = thread_list[j];
+ i = i + 3 + total_threads;
+ }

- for (i = 0 ; i < total_threads; i++)
- tg->thread_list[i] = thread_list[i];
+ tglp->nr_properties = property_idx;

- return 0;
+out_free:
+ kfree(thread_group_array);
+ return ret;
}

/*
@@ -805,24 +827,39 @@ static int get_cpu_thread_group_start(int cpu, struct thread_groups *tg)
return -1;
}

-static int init_cpu_l1_cache_map(int cpu)
+static int init_cpu_cache_map(int cpu, unsigned int cache_property)

{
struct device_node *dn = of_get_cpu_node(cpu, NULL);
- struct thread_groups tg = {.property = 0,
- .nr_groups = 0,
- .threads_per_group = 0};
+ struct thread_groups *tg = NULL;
int first_thread = cpu_first_thread_sibling(cpu);
int i, cpu_group_start = -1, err = 0;
+ cpumask_var_t *mask;
+ struct thread_groups_list *cpu_tgl = &tgl[cpu];

if (!dn)
return -ENODATA;

- err = parse_thread_groups(dn, &tg, THREAD_GROUP_SHARE_L1);
- if (err)
- goto out;
+ if (!(cache_property == THREAD_GROUP_SHARE_L1))
+ return -EINVAL;

- cpu_group_start = get_cpu_thread_group_start(cpu, &tg);
+ if (!cpu_tgl->nr_properties) {
+ err = parse_thread_groups(dn, cpu_tgl);
+ if (err)
+ goto out;
+ }
+
+ for (i = 0; i < cpu_tgl->nr_properties; i++) {
+ if (cpu_tgl->property_tgs[i].property == cache_property) {
+ tg = &cpu_tgl->property_tgs[i];
+ break;
+ }
+ }
+
+ if (!tg)
+ return -EINVAL;
+
+ cpu_group_start = get_cpu_thread_group_start(cpu, tg);

if (unlikely(cpu_group_start == -1)) {
WARN_ON_ONCE(1);
@@ -830,11 +867,12 @@ static int init_cpu_l1_cache_map(int cpu)
goto out;
}

- zalloc_cpumask_var_node(&per_cpu(cpu_l1_cache_map, cpu),
- GFP_KERNEL, cpu_to_node(cpu));
+ mask = &per_cpu(cpu_l1_cache_map, cpu);
+
+ zalloc_cpumask_var_node(mask, GFP_KERNEL, cpu_to_node(cpu));

for (i = first_thread; i < first_thread + threads_per_core; i++) {
- int i_group_start = get_cpu_thread_group_start(i, &tg);
+ int i_group_start = get_cpu_thread_group_start(i, tg);

if (unlikely(i_group_start == -1)) {
WARN_ON_ONCE(1);
@@ -843,7 +881,7 @@ static int init_cpu_l1_cache_map(int cpu)
}

if (i_group_start == cpu_group_start)
- cpumask_set_cpu(i, per_cpu(cpu_l1_cache_map, cpu));
+ cpumask_set_cpu(i, *mask);
}

out:
@@ -924,7 +962,7 @@ static int init_big_cores(void)
int cpu;

for_each_possible_cpu(cpu) {
- int err = init_cpu_l1_cache_map(cpu);
+ int err = init_cpu_cache_map(cpu, THREAD_GROUP_SHARE_L1);

if (err)
return err;
--
1.9.4

2020-12-07 12:19:21

by Srikar Dronamraju

Subject: Re: [PATCH 1/3] powerpc/smp: Parse ibm,thread-groups with multiple properties

* Gautham R. Shenoy <[email protected]> [2020-12-04 10:18:45]:

> From: "Gautham R. Shenoy" <[email protected]>

<snipped>

>
> static int parse_thread_groups(struct device_node *dn,
> - struct thread_groups *tg,
> - unsigned int property)
> + struct thread_groups_list *tglp)
> {
> - int i;
> - u32 thread_group_array[3 + MAX_THREAD_LIST_SIZE];
> + int i = 0;
> + u32 *thread_group_array;
> u32 *thread_list;
> size_t total_threads;
> - int ret;
> + int ret = 0, count;
> + unsigned int property_idx = 0;

NIT:
tglx mentions in one of his recent comments to try to keep a reverse
fir-tree ordering of variables where possible.

>
> + count = of_property_count_u32_elems(dn, "ibm,thread-groups");
> + thread_group_array = kcalloc(count, sizeof(u32), GFP_KERNEL);
> ret = of_property_read_u32_array(dn, "ibm,thread-groups",
> - thread_group_array, 3);
> + thread_group_array, count);
> if (ret)
> - return ret;
> -
> - tg->property = thread_group_array[0];
> - tg->nr_groups = thread_group_array[1];
> - tg->threads_per_group = thread_group_array[2];
> - if (tg->property != property ||
> - tg->nr_groups < 1 ||
> - tg->threads_per_group < 1)
> - return -ENODATA;
> + goto out_free;
>
> - total_threads = tg->nr_groups * tg->threads_per_group;
> + while (i < count && property_idx < MAX_THREAD_GROUP_PROPERTIES) {
> + int j;
> + struct thread_groups *tg = &tglp->property_tgs[property_idx++];

NIT: same as above.

>
> - ret = of_property_read_u32_array(dn, "ibm,thread-groups",
> - thread_group_array,
> - 3 + total_threads);
> - if (ret)
> - return ret;
> + tg->property = thread_group_array[i];
> + tg->nr_groups = thread_group_array[i + 1];
> + tg->threads_per_group = thread_group_array[i + 2];
> + total_threads = tg->nr_groups * tg->threads_per_group;
> +
> + thread_list = &thread_group_array[i + 3];
>
> - thread_list = &thread_group_array[3];
> + for (j = 0; j < total_threads; j++)
> + tg->thread_list[j] = thread_list[j];
> + i = i + 3 + total_threads;

Can't we simply use memcpy instead?

> + }
>
> - for (i = 0 ; i < total_threads; i++)
> - tg->thread_list[i] = thread_list[i];
> + tglp->nr_properties = property_idx;
>
> - return 0;
> +out_free:
> + kfree(thread_group_array);
> + return ret;
> }
>
> /*
> @@ -805,24 +827,39 @@ static int get_cpu_thread_group_start(int cpu, struct thread_groups *tg)
> return -1;
> }
>
> -static int init_cpu_l1_cache_map(int cpu)
> +static int init_cpu_cache_map(int cpu, unsigned int cache_property)
>
> {
> struct device_node *dn = of_get_cpu_node(cpu, NULL);
> - struct thread_groups tg = {.property = 0,
> - .nr_groups = 0,
> - .threads_per_group = 0};
> + struct thread_groups *tg = NULL;
> int first_thread = cpu_first_thread_sibling(cpu);
> int i, cpu_group_start = -1, err = 0;
> + cpumask_var_t *mask;
> + struct thread_groups_list *cpu_tgl = &tgl[cpu];

NIT: same as 1st comment.

>
> if (!dn)
> return -ENODATA;
>
> - err = parse_thread_groups(dn, &tg, THREAD_GROUP_SHARE_L1);
> - if (err)
> - goto out;
> + if (!(cache_property == THREAD_GROUP_SHARE_L1))
> + return -EINVAL;
>
> - cpu_group_start = get_cpu_thread_group_start(cpu, &tg);
> + if (!cpu_tgl->nr_properties) {
> + err = parse_thread_groups(dn, cpu_tgl);
> + if (err)
> + goto out;
> + }
> +
> + for (i = 0; i < cpu_tgl->nr_properties; i++) {
> + if (cpu_tgl->property_tgs[i].property == cache_property) {
> + tg = &cpu_tgl->property_tgs[i];
> + break;
> + }
> + }
> +
> + if (!tg)
> + return -EINVAL;
> +
> + cpu_group_start = get_cpu_thread_group_start(cpu, tg);

This whole hunk should be moved to a new function and called before
init_cpu_cache_map. It will simplify the logic to great extent.
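
The suggested refactor could, for instance, look like the sketch
below. This is a userspace analogue under stated assumptions: the
helper name get_thread_groups and the pared-down structure shapes are
illustrative, not code from the series.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PROPS 2

/* Minimal stand-ins for the kernel structures, for illustration only. */
struct thread_groups {
	unsigned int property;
	unsigned int nr_groups;
	unsigned int threads_per_group;
};

struct thread_groups_list {
	unsigned int nr_properties;
	struct thread_groups property_tgs[MAX_PROPS];
};

/*
 * Hypothetical helper: return the parsed thread_groups entry for
 * @property, or NULL if the device tree did not describe one.
 * The caller can then do the lookup once, outside the cache-map setup.
 */
static struct thread_groups *get_thread_groups(struct thread_groups_list *tglp,
					       unsigned int property)
{
	unsigned int i;

	for (i = 0; i < tglp->nr_properties; i++)
		if (tglp->property_tgs[i].property == property)
			return &tglp->property_tgs[i];
	return NULL;
}
```

With such a helper, the init function would receive (or fetch) the
right thread_groups pointer up front and only deal with building the
cpumask.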

>
> if (unlikely(cpu_group_start == -1)) {
> WARN_ON_ONCE(1);
> @@ -830,11 +867,12 @@ static int init_cpu_l1_cache_map(int cpu)
> goto out;
> }
>
> - zalloc_cpumask_var_node(&per_cpu(cpu_l1_cache_map, cpu),
> - GFP_KERNEL, cpu_to_node(cpu));
> + mask = &per_cpu(cpu_l1_cache_map, cpu);
> +
> + zalloc_cpumask_var_node(mask, GFP_KERNEL, cpu_to_node(cpu));
>

This hunk (and the next hunk) should be moved to next patch.

> for (i = first_thread; i < first_thread + threads_per_core; i++) {
> - int i_group_start = get_cpu_thread_group_start(i, &tg);
> + int i_group_start = get_cpu_thread_group_start(i, tg);
>
> if (unlikely(i_group_start == -1)) {
> WARN_ON_ONCE(1);
> @@ -843,7 +881,7 @@ static int init_cpu_l1_cache_map(int cpu)
> }
>
> if (i_group_start == cpu_group_start)
> - cpumask_set_cpu(i, per_cpu(cpu_l1_cache_map, cpu));
> + cpumask_set_cpu(i, *mask);
> }
>
> out:
> @@ -924,7 +962,7 @@ static int init_big_cores(void)
> int cpu;
>
> for_each_possible_cpu(cpu) {
> - int err = init_cpu_l1_cache_map(cpu);
> + int err = init_cpu_cache_map(cpu, THREAD_GROUP_SHARE_L1);
>
> if (err)
> return err;
> --
> 1.9.4
>

--
Thanks and Regards
Srikar Dronamraju

2020-12-09 01:04:30

by Gautham R Shenoy

Subject: Re: [PATCH 1/3] powerpc/smp: Parse ibm,thread-groups with multiple properties

Hello Srikar,

Thanks for taking a look at the patch.

On Mon, Dec 07, 2020 at 05:40:42PM +0530, Srikar Dronamraju wrote:
> * Gautham R. Shenoy <[email protected]> [2020-12-04 10:18:45]:
>
> > From: "Gautham R. Shenoy" <[email protected]>
>
> <snipped>
>
> >
> > static int parse_thread_groups(struct device_node *dn,
> > - struct thread_groups *tg,
> > - unsigned int property)
> > + struct thread_groups_list *tglp)
> > {
> > - int i;
> > - u32 thread_group_array[3 + MAX_THREAD_LIST_SIZE];
> > + int i = 0;
> > + u32 *thread_group_array;
> > u32 *thread_list;
> > size_t total_threads;
> > - int ret;
> > + int ret = 0, count;
> > + unsigned int property_idx = 0;
>
> NIT:
> tglx mentions in one of his recent comments to try keep a reverse fir tree
> ordering of variables where possible.

I suppose you mean moving the longer local variable declarations to
the top and shorter ones to the bottom. Thanks. Will fix this.


>
> >
> > + count = of_property_count_u32_elems(dn, "ibm,thread-groups");
> > + thread_group_array = kcalloc(count, sizeof(u32), GFP_KERNEL);
> > ret = of_property_read_u32_array(dn, "ibm,thread-groups",
> > - thread_group_array, 3);
> > + thread_group_array, count);
> > if (ret)
> > - return ret;
> > -
> > - tg->property = thread_group_array[0];
> > - tg->nr_groups = thread_group_array[1];
> > - tg->threads_per_group = thread_group_array[2];
> > - if (tg->property != property ||
> > - tg->nr_groups < 1 ||
> > - tg->threads_per_group < 1)
> > - return -ENODATA;
> > + goto out_free;
> >
> > - total_threads = tg->nr_groups * tg->threads_per_group;
> > + while (i < count && property_idx < MAX_THREAD_GROUP_PROPERTIES) {
> > + int j;
> > + struct thread_groups *tg = &tglp->property_tgs[property_idx++];
>
> NIT: same as above.

Ok.
>
> >
> > - ret = of_property_read_u32_array(dn, "ibm,thread-groups",
> > - thread_group_array,
> > - 3 + total_threads);
> > - if (ret)
> > - return ret;
> > + tg->property = thread_group_array[i];
> > + tg->nr_groups = thread_group_array[i + 1];
> > + tg->threads_per_group = thread_group_array[i + 2];
> > + total_threads = tg->nr_groups * tg->threads_per_group;
> > +
> > + thread_list = &thread_group_array[i + 3];
> >
> > - thread_list = &thread_group_array[3];
> > + for (j = 0; j < total_threads; j++)
> > + tg->thread_list[j] = thread_list[j];
> > + i = i + 3 + total_threads;
>
> Can't we simply use memcpy instead?

We could. But this one makes it more explicit.


>
> > + }
> >
> > - for (i = 0 ; i < total_threads; i++)
> > - tg->thread_list[i] = thread_list[i];
> > + tglp->nr_properties = property_idx;
> >
> > - return 0;
> > +out_free:
> > + kfree(thread_group_array);
> > + return ret;
> > }
> >
> > /*
> > @@ -805,24 +827,39 @@ static int get_cpu_thread_group_start(int cpu, struct thread_groups *tg)
> > return -1;
> > }
> >
> > -static int init_cpu_l1_cache_map(int cpu)
> > +static int init_cpu_cache_map(int cpu, unsigned int cache_property)
> >
> > {
> > struct device_node *dn = of_get_cpu_node(cpu, NULL);
> > - struct thread_groups tg = {.property = 0,
> > - .nr_groups = 0,
> > - .threads_per_group = 0};
> > + struct thread_groups *tg = NULL;
> > int first_thread = cpu_first_thread_sibling(cpu);
> > int i, cpu_group_start = -1, err = 0;
> > + cpumask_var_t *mask;
> > + struct thread_groups_list *cpu_tgl = &tgl[cpu];
>
> NIT: same as 1st comment.

Sure, will fix this.

>
> >
> > if (!dn)
> > return -ENODATA;
> >
> > - err = parse_thread_groups(dn, &tg, THREAD_GROUP_SHARE_L1);
> > - if (err)
> > - goto out;
> > + if (!(cache_property == THREAD_GROUP_SHARE_L1))
> > + return -EINVAL;
> >
> > - cpu_group_start = get_cpu_thread_group_start(cpu, &tg);
> > + if (!cpu_tgl->nr_properties) {
> > + err = parse_thread_groups(dn, cpu_tgl);
> > + if (err)
> > + goto out;
> > + }
> > +
> > + for (i = 0; i < cpu_tgl->nr_properties; i++) {
> > + if (cpu_tgl->property_tgs[i].property == cache_property) {
> > + tg = &cpu_tgl->property_tgs[i];
> > + break;
> > + }
> > + }
> > +
> > + if (!tg)
> > + return -EINVAL;
> > +
> > + cpu_group_start = get_cpu_thread_group_start(cpu, tg);
>
> This whole hunk should be moved to a new function and called before
> init_cpu_cache_map. It will simplify the logic to great extent.

I suppose you are referring to the part where we select the correct
tg. Yeah, that can move to a different helper.

>
> >
> > if (unlikely(cpu_group_start == -1)) {
> > WARN_ON_ONCE(1);
> > @@ -830,11 +867,12 @@ static int init_cpu_l1_cache_map(int cpu)
> > goto out;
> > }
> >
> > - zalloc_cpumask_var_node(&per_cpu(cpu_l1_cache_map, cpu),
> > - GFP_KERNEL, cpu_to_node(cpu));
> > + mask = &per_cpu(cpu_l1_cache_map, cpu);
> > +
> > + zalloc_cpumask_var_node(mask, GFP_KERNEL, cpu_to_node(cpu));
> >
>
> This hunk (and the next hunk) should be moved to next patch.
>

The next patch is only about introducing THREAD_GROUP_SHARE_L2. Hence
I put the remaining generic code in this patch, since it seemed a
logical place to collate whatever we have in a generic form.



> > for (i = first_thread; i < first_thread + threads_per_core; i++) {
> > - int i_group_start = get_cpu_thread_group_start(i, &tg);
> > + int i_group_start = get_cpu_thread_group_start(i, tg);
> >
> > if (unlikely(i_group_start == -1)) {
> > WARN_ON_ONCE(1);
> > @@ -843,7 +881,7 @@ static int init_cpu_l1_cache_map(int cpu)
> > }
> >
> > if (i_group_start == cpu_group_start)
> > - cpumask_set_cpu(i, per_cpu(cpu_l1_cache_map, cpu));
> > + cpumask_set_cpu(i, *mask);
> > }
> >
> > out:
> > @@ -924,7 +962,7 @@ static int init_big_cores(void)
> > int cpu;
> >
> > for_each_possible_cpu(cpu) {
> > - int err = init_cpu_l1_cache_map(cpu);
> > + int err = init_cpu_cache_map(cpu, THREAD_GROUP_SHARE_L1);
> >
> > if (err)
> > return err;
> > --
> > 1.9.4
> >
>
> --
> Thanks and Regards
> Srikar Dronamraju

2020-12-09 05:22:35

by Michael Ellerman

Subject: Re: [PATCH 1/3] powerpc/smp: Parse ibm,thread-groups with multiple properties

Gautham R Shenoy <[email protected]> writes:
> Hello Srikar,
>
> Thanks for taking a look at the patch.
>
> On Mon, Dec 07, 2020 at 05:40:42PM +0530, Srikar Dronamraju wrote:
>> * Gautham R. Shenoy <[email protected]> [2020-12-04 10:18:45]:
>>
>> > From: "Gautham R. Shenoy" <[email protected]>
>>
>> <snipped>
>>
>> >
>> > static int parse_thread_groups(struct device_node *dn,
>> > - struct thread_groups *tg,
>> > - unsigned int property)
>> > + struct thread_groups_list *tglp)
>> > {
>> > - int i;
>> > - u32 thread_group_array[3 + MAX_THREAD_LIST_SIZE];
>> > + int i = 0;
>> > + u32 *thread_group_array;
>> > u32 *thread_list;
>> > size_t total_threads;
>> > - int ret;
>> > + int ret = 0, count;
>> > + unsigned int property_idx = 0;
>>
>> NIT:
>> tglx mentions in one of his recent comments to try keep a reverse fir tree
>> ordering of variables where possible.
>
> I suppose you mean moving the longer local variable declarations to to
> the top and shorter ones to the bottom. Thanks. Will fix this.

Yeah. It's called "reverse christmas tree", that's googleable.

I also prefer that style, it makes the locals visually sit with the
beginning of the function body.

cheers

2020-12-09 08:40:38

by Srikar Dronamraju

Subject: Re: [PATCH 1/3] powerpc/smp: Parse ibm,thread-groups with multiple properties

* Gautham R Shenoy <[email protected]> [2020-12-08 22:55:40]:

> >
> > NIT:
> > tglx mentions in one of his recent comments to try keep a reverse fir tree
> > ordering of variables where possible.
>
> I suppose you mean moving the longer local variable declarations to to
> the top and shorter ones to the bottom. Thanks. Will fix this.
>

Yes.

> > > + }
> > > +
> > > + if (!tg)
> > > + return -EINVAL;
> > > +
> > > + cpu_group_start = get_cpu_thread_group_start(cpu, tg);
> >
> > This whole hunk should be moved to a new function and called before
> > init_cpu_cache_map. It will simplify the logic to great extent.
>
> I suppose you are referring to the part where we select the correct
> tg. Yeah, that can move to a different helper.
>

Yes, I would prefer if we could call this new helper outside
init_cpu_cache_map.

> > >
> > > - zalloc_cpumask_var_node(&per_cpu(cpu_l1_cache_map, cpu),
> > > - GFP_KERNEL, cpu_to_node(cpu));
> > > + mask = &per_cpu(cpu_l1_cache_map, cpu);
> > > +
> > > + zalloc_cpumask_var_node(mask, GFP_KERNEL, cpu_to_node(cpu));
> > >
> >
> > This hunk (and the next hunk) should be moved to next patch.
> >
>
> The next patch is only about introducing THREAD_GROUP_SHARE_L2. Hence
> I put in any other code in this patch, since it seems to be a logical
> place to collate whatever we have in a generic form.
>

While I am fine with it, having a pointer that always points to the
same mask looks weird.

--
Thanks and Regards
Srikar Dronamraju

2020-12-09 09:08:28

by Gautham R Shenoy

Subject: Re: [PATCH 1/3] powerpc/smp: Parse ibm,thread-groups with multiple properties

On Wed, Dec 09, 2020 at 02:05:41PM +0530, Srikar Dronamraju wrote:
> * Gautham R Shenoy <[email protected]> [2020-12-08 22:55:40]:
>
> > >
> > > NIT:
> > > tglx mentions in one of his recent comments to try keep a reverse fir tree
> > > ordering of variables where possible.
> >
> > I suppose you mean moving the longer local variable declarations to to
> > the top and shorter ones to the bottom. Thanks. Will fix this.
> >
>
> Yes.
>
> > > > + }
> > > > +
> > > > + if (!tg)
> > > > + return -EINVAL;
> > > > +
> > > > + cpu_group_start = get_cpu_thread_group_start(cpu, tg);
> > >
> > > This whole hunk should be moved to a new function and called before
> > > init_cpu_cache_map. It will simplify the logic to great extent.
> >
> > I suppose you are referring to the part where we select the correct
> > tg. Yeah, that can move to a different helper.
> >
>
> Yes, I would prefer if we could call this new helper outside
> init_cpu_cache_map.
>
> > > >
> > > > - zalloc_cpumask_var_node(&per_cpu(cpu_l1_cache_map, cpu),
> > > > - GFP_KERNEL, cpu_to_node(cpu));
> > > > + mask = &per_cpu(cpu_l1_cache_map, cpu);
> > > > +
> > > > + zalloc_cpumask_var_node(mask, GFP_KERNEL, cpu_to_node(cpu));
> > > >
> > >
> > > This hunk (and the next hunk) should be moved to next patch.
> > >
> >
> > The next patch is only about introducing THREAD_GROUP_SHARE_L2. Hence
> > I put in any other code in this patch, since it seems to be a logical
> > place to collate whatever we have in a generic form.
> >
>
> While I am fine with it, having a pointer that always points to the same
> mask looks wierd.

Sure. Moving some of this to a separate preparatory patch.

>
> --
> Thanks and Regards
> Srikar Dronamraju