2018-05-29 14:30:34

by James Simmons

Subject: [PATCH v2 00/25] staging: lustre: libcfs: SMP rework

From: James Simmons <[email protected]>

Recently Lustre support has been expanded to extreme machines with as
many as 1000+ cores. At the other end, Lustre has also been ported
to platforms like ARM and KNL, which have unique NUMA and core setups.
For example, some devices have NUMA nodes with no cores. These new
platforms exposed the limitations of Lustre's SMP code, so a lot of
work was needed. The result is this patch set, which has been tested
on those platforms.

Amir Shehata (8):
staging: lustre: libcfs: replace MAX_NUMNODES with nr_node_ids
staging: lustre: libcfs: remove excess space
staging: lustre: libcfs: replace num_possible_cpus() with nr_cpu_ids
staging: lustre: libcfs: NUMA support
staging: lustre: libcfs: add cpu distance handling
staging: lustre: libcfs: use distance in cpu and node handling
staging: lustre: libcfs: provide debugfs files for distance handling
staging: lustre: libcfs: invert error handling for cfs_cpt_table_print

Dmitry Eremin (15):
staging: lustre: libcfs: remove useless CPU partition code
staging: lustre: libcfs: rename variable i to cpu
staging: lustre: libcfs: fix libcfs_cpu coding style
staging: lustre: libcfs: use int type for CPT identification.
staging: lustre: libcfs: rename i to node for cfs_cpt_set_nodemask
staging: lustre: libcfs: rename i to cpu for cfs_cpt_bind
staging: lustre: libcfs: rename cpumask_var_t variables to *_mask
staging: lustre: libcfs: rename goto label in cfs_cpt_table_print
staging: lustre: libcfs: update debug messages
staging: lustre: libcfs: make tolerant to offline CPUs and empty NUMA nodes
staging: lustre: libcfs: report NUMA node instead of just node
staging: lustre: libcfs: update debug messages in CPT code
staging: lustre: libcfs: rework CPU pattern parsing code
staging: lustre: libcfs: change CPT estimate algorithm
staging: lustre: ptlrpc: use current CPU instead of hardcoded 0

James Simmons (2):
staging: lustre: libcfs: restore UMP handling
staging: lustre: libcfs: properly handle failure cases in SMP code

.../lustre/include/linux/libcfs/libcfs_cpu.h | 225 +++--
drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c | 965 +++++++++++----------
drivers/staging/lustre/lnet/libcfs/module.c | 57 ++
drivers/staging/lustre/lnet/lnet/lib-msg.c | 2 +
drivers/staging/lustre/lustre/ptlrpc/service.c | 11 +-
5 files changed, 728 insertions(+), 532 deletions(-)

--
1.8.3.1



2018-05-29 14:23:28

by James Simmons

Subject: [PATCH v2 03/25] staging: lustre: libcfs: rename variable i to cpu

From: Dmitry Eremin <[email protected]>

Rename the variable i used with for_each_cpu() to cpu for code
readability.

Signed-off-by: Dmitry Eremin <[email protected]>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-8703
Reviewed-on: https://review.whamcloud.com/23303
Reviewed-by: James Simmons <[email protected]>
Reviewed-by: Doug Oucharek <[email protected]>
Reviewed-by: Oleg Drokin <[email protected]>
Signed-off-by: James Simmons <[email protected]>
---
Changelog:

v1) Initial patch
v2) Rebased to handle recent cleanups in libcfs

drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
index 951a9ca..34df7ed 100644
--- a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
+++ b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
@@ -340,7 +340,7 @@ struct cfs_cpt_table *
cfs_cpt_set_cpumask(struct cfs_cpt_table *cptab, int cpt,
const cpumask_t *mask)
{
- int i;
+ int cpu;

if (!cpumask_weight(mask) ||
cpumask_any_and(mask, cpu_online_mask) >= nr_cpu_ids) {
@@ -349,8 +349,8 @@ struct cfs_cpt_table *
return 0;
}

- for_each_cpu(i, mask) {
- if (!cfs_cpt_set_cpu(cptab, cpt, i))
+ for_each_cpu(cpu, mask) {
+ if (!cfs_cpt_set_cpu(cptab, cpt, cpu))
return 0;
}

@@ -362,10 +362,10 @@ struct cfs_cpt_table *
cfs_cpt_unset_cpumask(struct cfs_cpt_table *cptab, int cpt,
const cpumask_t *mask)
{
- int i;
+ int cpu;

- for_each_cpu(i, mask)
- cfs_cpt_unset_cpu(cptab, cpt, i);
+ for_each_cpu(cpu, mask)
+ cfs_cpt_unset_cpu(cptab, cpt, cpu);
}
EXPORT_SYMBOL(cfs_cpt_unset_cpumask);

--
1.8.3.1


2018-05-29 14:24:29

by James Simmons

Subject: [PATCH v2 25/25] staging: lustre: ptlrpc: use current CPU instead of hardcoded 0

From: Dmitry Eremin <[email protected]>

Fix a crash that occurs when CPU 0 is disabled.
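For context, the thread-factor "depression" visible in the first hunk's surrounding lines is simple shift arithmetic: when the sibling mask shows hyper-threads, the factor is reduced to roughly 5/8 of its value. A minimal userspace sketch of just that arithmetic (depress_factor is an illustrative name, not a kernel function):

```c
/* Reduce the service thread factor when hyper-threads are present:
 * factor - factor/2 + factor/8 == 5/8 of the original (via shifts). */
static int depress_factor(int factor)
{
	return factor - (factor >> 1) + (factor >> 3);
}
```

For example, a factor of 16 becomes 16 - 8 + 2 = 10.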

Signed-off-by: Dmitry Eremin <[email protected]>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-8710
Reviewed-on: https://review.whamcloud.com/23305
Reviewed-by: Doug Oucharek <[email protected]>
Reviewed-by: Andreas Dilger <[email protected]>
Signed-off-by: James Simmons <[email protected]>
---
Changelog:

v1) New patch to address crash in ptlrpc

drivers/staging/lustre/lustre/ptlrpc/service.c | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/staging/lustre/lustre/ptlrpc/service.c b/drivers/staging/lustre/lustre/ptlrpc/service.c
index 3fd8c74..8e74a45 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/service.c
+++ b/drivers/staging/lustre/lustre/ptlrpc/service.c
@@ -421,7 +421,7 @@ static void ptlrpc_at_timer(struct timer_list *t)
* there are.
*/
/* weight is # of HTs */
- if (cpumask_weight(topology_sibling_cpumask(0)) > 1) {
+ if (cpumask_weight(topology_sibling_cpumask(smp_processor_id())) > 1) {
/* depress thread factor for hyper-thread */
factor = factor - (factor >> 1) + (factor >> 3);
}
@@ -2221,15 +2221,16 @@ static int ptlrpc_hr_main(void *arg)
struct ptlrpc_hr_thread *hrt = arg;
struct ptlrpc_hr_partition *hrp = hrt->hrt_partition;
LIST_HEAD(replies);
- char threadname[20];
int rc;

- snprintf(threadname, sizeof(threadname), "ptlrpc_hr%02d_%03d",
- hrp->hrp_cpt, hrt->hrt_id);
unshare_fs_struct();

rc = cfs_cpt_bind(ptlrpc_hr.hr_cpt_table, hrp->hrp_cpt);
if (rc != 0) {
+ char threadname[20];
+
+ snprintf(threadname, sizeof(threadname), "ptlrpc_hr%02d_%03d",
+ hrp->hrp_cpt, hrt->hrt_id);
CWARN("Failed to bind %s on CPT %d of CPT table %p: rc = %d\n",
threadname, hrp->hrp_cpt, ptlrpc_hr.hr_cpt_table, rc);
}
@@ -2528,7 +2529,7 @@ int ptlrpc_hr_init(void)

init_waitqueue_head(&ptlrpc_hr.hr_waitq);

- weight = cpumask_weight(topology_sibling_cpumask(0));
+ weight = cpumask_weight(topology_sibling_cpumask(smp_processor_id()));

cfs_percpt_for_each(hrp, i, ptlrpc_hr.hr_partitions) {
hrp->hrp_cpt = i;
--
1.8.3.1


2018-05-29 14:24:44

by James Simmons

Subject: [PATCH v2 06/25] staging: lustre: libcfs: remove excess space

From: Amir Shehata <[email protected]>

The function cfs_cpt_table_print() was adding two spaces
to the string buffer. Add the space only once.
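The code around the hunk appends to the output buffer with the usual incremental snprintf pattern: advance the write pointer by the count snprintf returns and bail out once the remaining length is exhausted. A userspace sketch of that pattern (print_parts and the part contents are made up for illustration; -7 stands in for the kernel's -E2BIG):

```c
#include <stdio.h>

/* Append "<id>\t: <value>\n" per partition into a fixed buffer,
 * tracking the remaining space after each snprintf call. */
static int print_parts(char *buf, int len, int nparts)
{
	char *tmp = buf;
	int rc, i;

	for (i = 0; i < nparts; i++) {
		if (len <= 0)
			return -7;	/* out of space, like -E2BIG */

		rc = snprintf(tmp, len, "%d\t:", i);
		len -= rc;
		if (len <= 0)		/* rc >= len means truncation */
			return -7;
		tmp += rc;

		rc = snprintf(tmp, len, " %d\n", i * 2);
		len -= rc;
		if (len <= 0)
			return -7;
		tmp += rc;
	}
	return tmp - buf;		/* bytes written */
}
```

The truncation check works because C99 snprintf returns the length the output would have had, so len goes non-positive exactly when the buffer was too small.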

Signed-off-by: Amir Shehata <[email protected]>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-7734
Reviewed-on: http://review.whamcloud.com/18916
Reviewed-by: Olaf Weber <[email protected]>
Reviewed-by: Doug Oucharek <[email protected]>
Reviewed-by: Oleg Drokin <[email protected]>
Signed-off-by: James Simmons <[email protected]>
---
Changelog:

v1) Initial patch
v2) Rebased patch. Same code

drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
index d3017e8..d9d1388 100644
--- a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
+++ b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
@@ -173,7 +173,7 @@ struct cfs_cpt_table *

for (i = 0; i < cptab->ctb_nparts; i++) {
if (len > 0) {
- rc = snprintf(tmp, len, "%d\t: ", i);
+ rc = snprintf(tmp, len, "%d\t:", i);
len -= rc;
}

--
1.8.3.1


2018-05-29 14:24:48

by James Simmons

Subject: [PATCH v2 22/25] staging: lustre: libcfs: update debug messages in CPT code

From: Dmitry Eremin <[email protected]>

Update the debug messages for the CPT table creation code. Place
the passed-in string in quotes to make it clear what it is, and
capitalize "CPU" in the debug strings.

Signed-off-by: Dmitry Eremin <[email protected]>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-8703
Reviewed-on: https://review.whamcloud.com/23306
Reviewed-by: James Simmons <[email protected]>
Reviewed-by: Andreas Dilger <[email protected]>
Reviewed-by: Patrick Farrell <[email protected]>
Reviewed-by: Olaf Weber <[email protected]>
Reviewed-by: Oleg Drokin <[email protected]>
Signed-off-by: James Simmons <[email protected]>
---
Changelog:

v1) Initial patch
v2) Rebased patch. No changes in code from earlier patch

drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c | 18 +++++++++---------
1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
index 0fc102c..649f7f9 100644
--- a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
+++ b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
@@ -496,7 +496,7 @@ void cfs_cpt_unset_cpu(struct cfs_cpt_table *cptab, int cpt, int cpu)

} else if (cpt != cptab->ctb_cpu2cpt[cpu]) {
CDEBUG(D_INFO,
- "CPU %d is not in cpu-partition %d\n", cpu, cpt);
+ "CPU %d is not in CPU partition %d\n", cpu, cpt);
return;
}

@@ -940,14 +940,14 @@ static struct cfs_cpt_table *cfs_cpt_table_create_pattern(char *pattern)
if (!ncpt ||
(node && ncpt > num_online_nodes()) ||
(!node && ncpt > num_online_cpus())) {
- CERROR("Invalid pattern %s, or too many partitions %d\n",
+ CERROR("Invalid pattern '%s', or too many partitions %d\n",
pattern, ncpt);
return NULL;
}

cptab = cfs_cpt_table_alloc(ncpt);
if (!cptab) {
- CERROR("Failed to allocate cpu partition table\n");
+ CERROR("Failed to allocate CPU partition table\n");
return NULL;
}

@@ -978,11 +978,11 @@ static struct cfs_cpt_table *cfs_cpt_table_create_pattern(char *pattern)

if (!bracket) {
if (*str) {
- CERROR("Invalid pattern %s\n", str);
+ CERROR("Invalid pattern '%s'\n", str);
goto failed;
}
if (c != ncpt) {
- CERROR("expect %d partitions but found %d\n",
+ CERROR("Expect %d partitions but found %d\n",
ncpt, c);
goto failed;
}
@@ -990,7 +990,7 @@ static struct cfs_cpt_table *cfs_cpt_table_create_pattern(char *pattern)
}

if (sscanf(str, "%d%n", &cpt, &n) < 1) {
- CERROR("Invalid cpu pattern %s\n", str);
+ CERROR("Invalid CPU pattern '%s'\n", str);
goto failed;
}

@@ -1007,20 +1007,20 @@ static struct cfs_cpt_table *cfs_cpt_table_create_pattern(char *pattern)

str = strim(str + n);
if (str != bracket) {
- CERROR("Invalid pattern %s\n", str);
+ CERROR("Invalid pattern '%s'\n", str);
goto failed;
}

bracket = strchr(str, ']');
if (!bracket) {
- CERROR("Missing right bracket for partition %d, %s\n",
+ CERROR("Missing right bracket for partition %d in '%s'\n",
cpt, str);
goto failed;
}

if (cfs_expr_list_parse(str, (bracket - str) + 1,
0, high, &el)) {
- CERROR("Can't parse number range: %s\n", str);
+ CERROR("Can't parse number range in '%s'\n", str);
goto failed;
}

--
1.8.3.1


2018-05-29 14:25:27

by James Simmons

Subject: [PATCH v2 21/25] staging: lustre: libcfs: report NUMA node instead of just node

From: Dmitry Eremin <[email protected]>

Reporting "HW nodes" is too generic. It really is reporting
"HW NUMA nodes". Update the debug message.

Signed-off-by: Dmitry Eremin <[email protected]>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-8703
Reviewed-on: https://review.whamcloud.com/23306
Reviewed-by: James Simmons <[email protected]>
Reviewed-by: Andreas Dilger <[email protected]>
Reviewed-by: Patrick Farrell <[email protected]>
Reviewed-by: Olaf Weber <[email protected]>
Reviewed-by: Oleg Drokin <[email protected]>
Signed-off-by: James Simmons <[email protected]>
---
Changelog:

v1) Initial patch
v2) Rebased patch. No changes in code from earlier patch

drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
index 99a9494..0fc102c 100644
--- a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
+++ b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
@@ -1138,7 +1138,7 @@ int cfs_cpu_init(void)

put_online_cpus();

- LCONSOLE(0, "HW nodes: %d, HW CPU cores: %d, npartitions: %d\n",
+ LCONSOLE(0, "HW NUMA nodes: %d, HW CPU cores: %d, npartitions: %d\n",
num_online_nodes(), num_online_cpus(),
cfs_cpt_number(cfs_cpt_tab));
return 0;
--
1.8.3.1


2018-05-29 14:25:31

by James Simmons

Subject: [PATCH v2 18/25] staging: lustre: libcfs: rename goto label in cfs_cpt_table_print

From: Dmitry Eremin <[email protected]>

Rename the goto label out to err.

Signed-off-by: Dmitry Eremin <[email protected]>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-8703
Reviewed-on: https://review.whamcloud.com/23222
Reviewed-by: Amir Shehata <[email protected]>
Reviewed-by: James Simmons <[email protected]>
Reviewed-by: Oleg Drokin <[email protected]>
Signed-off-by: James Simmons <[email protected]>
---
Changelog:

v1) Initial patch
v2) Rebased patch. No changes in code from earlier patch

drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
index fb27dac..e12d337 100644
--- a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
+++ b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
@@ -193,20 +193,20 @@ int cfs_cpt_table_print(struct cfs_cpt_table *cptab, char *buf, int len)

for (i = 0; i < cptab->ctb_nparts; i++) {
if (len <= 0)
- goto out;
+ goto err;

rc = snprintf(tmp, len, "%d\t:", i);
len -= rc;

if (len <= 0)
- goto out;
+ goto err;

tmp += rc;
for_each_cpu(j, cptab->ctb_parts[i].cpt_cpumask) {
rc = snprintf(tmp, len, "%d ", j);
len -= rc;
if (len <= 0)
- goto out;
+ goto err;
tmp += rc;
}

@@ -216,7 +216,7 @@ int cfs_cpt_table_print(struct cfs_cpt_table *cptab, char *buf, int len)
}

return tmp - buf;
-out:
+err:
return -E2BIG;
}
EXPORT_SYMBOL(cfs_cpt_table_print);
--
1.8.3.1


2018-05-29 14:25:36

by James Simmons

Subject: [PATCH v2 23/25] staging: lustre: libcfs: rework CPU pattern parsing code

From: Dmitry Eremin <[email protected]>

Currently the module parameter string for the CPU pattern can be
modified, which is wrong. Rewrite the CPU pattern parsing code to
work on a duplicate of the buffer so the passed string is never
changed. This change also enables real error propagation to the
calling functions.

Signed-off-by: Dmitry Eremin <[email protected]>
Signed-off-by: Amir Shehata <[email protected]>
Signed-off-by: Andreas Dilger <[email protected]>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-8703
Reviewed-on: https://review.whamcloud.com/23306
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-9715
Reviewed-on: https://review.whamcloud.com/27872
Reviewed-by: James Simmons <[email protected]>
Reviewed-by: Andreas Dilger <[email protected]>
Reviewed-by: Patrick Farrell <[email protected]>
Reviewed-by: Olaf Weber <[email protected]>
Reviewed-by: Oleg Drokin <[email protected]>
Signed-off-by: James Simmons <[email protected]>
---
Changelog:

v1) Initial patch
v2) Rebased patch. No changes in code from earlier patch

.../lustre/include/linux/libcfs/libcfs_cpu.h | 2 +-
drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c | 146 ++++++++++++---------
2 files changed, 87 insertions(+), 61 deletions(-)

diff --git a/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h b/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h
index c0aa0b3..12ed0a9 100644
--- a/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h
+++ b/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h
@@ -393,7 +393,7 @@ static inline int cfs_cpu_init(void)

static inline void cfs_cpu_fini(void)
{
- if (cfs_cpt_tab) {
+ if (!IS_ERR_OR_NULL(cfs_cpt_tab)) {
cfs_cpt_table_free(cfs_cpt_tab);
cfs_cpt_tab = NULL;
}
diff --git a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
index 649f7f9..aed48de 100644
--- a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
+++ b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
@@ -692,11 +692,11 @@ int cfs_cpt_bind(struct cfs_cpt_table *cptab, int cpt)
nodemask = cptab->ctb_parts[cpt].cpt_nodemask;
}

- if (cpumask_any_and(*cpumask, cpu_online_mask) >= nr_cpu_ids) {
+ if (!cpumask_intersects(*cpumask, cpu_online_mask)) {
CDEBUG(D_INFO,
"No online CPU found in CPU partition %d, did someone do CPU hotplug on system? You might need to reload Lustre modules to keep system working well.\n",
cpt);
- return -EINVAL;
+ return -ENODEV;
}

for_each_online_cpu(cpu) {
@@ -860,11 +860,13 @@ static struct cfs_cpt_table *cfs_cpt_table_create(int ncpt)
cptab = cfs_cpt_table_alloc(ncpt);
if (!cptab) {
CERROR("Failed to allocate CPU map(%d)\n", ncpt);
+ rc = -ENOMEM;
goto failed;
}

if (!zalloc_cpumask_var(&node_mask, GFP_NOFS)) {
CERROR("Failed to allocate scratch cpumask\n");
+ rc = -ENOMEM;
goto failed;
}

@@ -879,8 +881,10 @@ static struct cfs_cpt_table *cfs_cpt_table_create(int ncpt)

rc = cfs_cpt_choose_ncpus(cptab, cpt, node_mask,
num - ncpu);
- if (rc < 0)
+ if (rc < 0) {
+ rc = -EINVAL;
goto failed_mask;
+ }

ncpu = cpumask_weight(part->cpt_cpumask);
if (ncpu == num + !!(rem > 0)) {
@@ -903,37 +907,51 @@ static struct cfs_cpt_table *cfs_cpt_table_create(int ncpt)
if (cptab)
cfs_cpt_table_free(cptab);

- return NULL;
+ return ERR_PTR(rc);
}

-static struct cfs_cpt_table *cfs_cpt_table_create_pattern(char *pattern)
+static struct cfs_cpt_table *cfs_cpt_table_create_pattern(const char *pattern)
{
struct cfs_cpt_table *cptab;
+ char *pattern_dup;
+ char *bracket;
char *str;
int node = 0;
- int high;
int ncpt = 0;
- int cpt;
+ int cpt = 0;
+ int high;
int rc;
int c;
int i;

- str = strim(pattern);
+ pattern_dup = kstrdup(pattern, GFP_KERNEL);
+ if (!pattern_dup) {
+ CERROR("Failed to duplicate pattern '%s'\n", pattern);
+ return ERR_PTR(-ENOMEM);
+ }
+
+ str = strim(pattern_dup);
if (*str == 'n' || *str == 'N') {
- pattern = str + 1;
- if (*pattern != '\0') {
- node = 1;
- } else { /* shortcut to create CPT from NUMA & CPU topology */
+ str++; /* skip 'N' char */
+ node = 1; /* NUMA pattern */
+ if (*str == '\0') {
node = -1;
- ncpt = num_online_nodes();
+ for_each_online_node(i) {
+ if (!cpumask_empty(cpumask_of_node(i)))
+ ncpt++;
+ }
+ if (ncpt == 1) { /* single NUMA node */
+ kfree(pattern_dup);
+ return cfs_cpt_table_create(cpu_npartitions);
+ }
}
}

if (!ncpt) { /* scanning bracket which is mark of partition */
- for (str = pattern;; str++, ncpt++) {
- str = strchr(str, '[');
- if (!str)
- break;
+ bracket = str;
+ while ((bracket = strchr(bracket, '['))) {
+ bracket++;
+ ncpt++;
}
}

@@ -941,87 +959,96 @@ static struct cfs_cpt_table *cfs_cpt_table_create_pattern(char *pattern)
(node && ncpt > num_online_nodes()) ||
(!node && ncpt > num_online_cpus())) {
CERROR("Invalid pattern '%s', or too many partitions %d\n",
- pattern, ncpt);
- return NULL;
+ pattern_dup, ncpt);
+ rc = -EINVAL;
+ goto err_free_str;
}

cptab = cfs_cpt_table_alloc(ncpt);
if (!cptab) {
CERROR("Failed to allocate CPU partition table\n");
- return NULL;
+ rc = -ENOMEM;
+ goto err_free_str;
}

if (node < 0) { /* shortcut to create CPT from NUMA & CPU topology */
- cpt = 0;
-
for_each_online_node(i) {
- if (cpt >= ncpt) {
- CERROR("CPU changed while setting CPU partition table, %d/%d\n",
- cpt, ncpt);
- goto failed;
- }
+ if (cpumask_empty(cpumask_of_node(i)))
+ continue;

rc = cfs_cpt_set_node(cptab, cpt++, i);
- if (!rc)
- goto failed;
+ if (!rc) {
+ rc = -EINVAL;
+ goto err_free_table;
+ }
}
+ kfree(pattern_dup);
return cptab;
}

high = node ? nr_node_ids - 1 : nr_cpu_ids - 1;

- for (str = strim(pattern), c = 0;; c++) {
+ for (str = strim(str), c = 0; /* until break */; c++) {
struct cfs_range_expr *range;
struct cfs_expr_list *el;
- char *bracket = strchr(str, '[');
int n;

+ bracket = strchr(str, '[');
if (!bracket) {
if (*str) {
CERROR("Invalid pattern '%s'\n", str);
- goto failed;
+ rc = -EINVAL;
+ goto err_free_table;
}
if (c != ncpt) {
CERROR("Expect %d partitions but found %d\n",
ncpt, c);
- goto failed;
+ rc = -EINVAL;
+ goto err_free_table;
}
break;
}

if (sscanf(str, "%d%n", &cpt, &n) < 1) {
CERROR("Invalid CPU pattern '%s'\n", str);
- goto failed;
+ rc = -EINVAL;
+ goto err_free_table;
}

if (cpt < 0 || cpt >= ncpt) {
CERROR("Invalid partition id %d, total partitions %d\n",
cpt, ncpt);
- goto failed;
+ rc = -EINVAL;
+ goto err_free_table;
}

if (cfs_cpt_weight(cptab, cpt)) {
CERROR("Partition %d has already been set.\n", cpt);
- goto failed;
+ rc = -EPERM;
+ goto err_free_table;
}

str = strim(str + n);
if (str != bracket) {
CERROR("Invalid pattern '%s'\n", str);
- goto failed;
+ rc = -EINVAL;
+ goto err_free_table;
}

bracket = strchr(str, ']');
if (!bracket) {
CERROR("Missing right bracket for partition %d in '%s'\n",
cpt, str);
- goto failed;
+ rc = -EINVAL;
+ goto err_free_table;
}

- if (cfs_expr_list_parse(str, (bracket - str) + 1,
- 0, high, &el)) {
+ rc = cfs_expr_list_parse(str, (bracket - str) + 1, 0, high,
+ &el);
+ if (rc) {
CERROR("Can't parse number range in '%s'\n", str);
- goto failed;
+ rc = -ERANGE;
+ goto err_free_table;
}

list_for_each_entry(range, &el->el_exprs, re_link) {
@@ -1033,7 +1060,8 @@ static struct cfs_cpt_table *cfs_cpt_table_create_pattern(char *pattern)
cfs_cpt_set_cpu(cptab, cpt, i);
if (!rc) {
cfs_expr_list_free(el);
- goto failed;
+ rc = -EINVAL;
+ goto err_free_table;
}
}
}
@@ -1042,17 +1070,21 @@ static struct cfs_cpt_table *cfs_cpt_table_create_pattern(char *pattern)

if (!cfs_cpt_online(cptab, cpt)) {
CERROR("No online CPU is found on partition %d\n", cpt);
- goto failed;
+ rc = -ENODEV;
+ goto err_free_table;
}

str = strim(bracket + 1);
}

+ kfree(pattern_dup);
return cptab;

-failed:
+err_free_table:
cfs_cpt_table_free(cptab);
- return NULL;
+err_free_str:
+ kfree(pattern_dup);
+ return ERR_PTR(rc);
}

#ifdef CONFIG_HOTPLUG_CPU
@@ -1079,7 +1111,7 @@ static int cfs_cpu_dead(unsigned int cpu)

void cfs_cpu_fini(void)
{
- if (cfs_cpt_tab)
+ if (!IS_ERR_OR_NULL(cfs_cpt_tab))
cfs_cpt_table_free(cfs_cpt_tab);

#ifdef CONFIG_HOTPLUG_CPU
@@ -1112,26 +1144,20 @@ int cfs_cpu_init(void)
#endif
get_online_cpus();
if (*cpu_pattern) {
- char *cpu_pattern_dup = kstrdup(cpu_pattern, GFP_KERNEL);
-
- if (!cpu_pattern_dup) {
- CERROR("Failed to duplicate cpu_pattern\n");
- goto failed_alloc_table;
- }
-
- cfs_cpt_tab = cfs_cpt_table_create_pattern(cpu_pattern_dup);
- kfree(cpu_pattern_dup);
- if (!cfs_cpt_tab) {
- CERROR("Failed to create cptab from pattern %s\n",
+ cfs_cpt_tab = cfs_cpt_table_create_pattern(cpu_pattern);
+ if (IS_ERR(cfs_cpt_tab)) {
+ CERROR("Failed to create cptab from pattern '%s'\n",
cpu_pattern);
+ ret = PTR_ERR(cfs_cpt_tab);
goto failed_alloc_table;
}

} else {
cfs_cpt_tab = cfs_cpt_table_create(cpu_npartitions);
- if (!cfs_cpt_tab) {
- CERROR("Failed to create ptable with npartitions %d\n",
+ if (IS_ERR(cfs_cpt_tab)) {
+ CERROR("Failed to create cptab with npartitions %d\n",
cpu_npartitions);
+ ret = PTR_ERR(cfs_cpt_tab);
goto failed_alloc_table;
}
}
@@ -1146,7 +1172,7 @@ int cfs_cpu_init(void)
failed_alloc_table:
put_online_cpus();

- if (cfs_cpt_tab)
+ if (!IS_ERR_OR_NULL(cfs_cpt_tab))
cfs_cpt_table_free(cfs_cpt_tab);

ret = -EINVAL;
--
1.8.3.1


2018-05-29 14:26:00

by James Simmons

Subject: [PATCH v2 14/25] staging: lustre: libcfs: use int type for CPT identification.

From: Dmitry Eremin <[email protected]>

Use the int type for CPT identification to match how the Linux
kernel identifies CPUs.
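One reason the signed type is safer here: comparing a signed id against an unsigned count silently converts the signed side to unsigned, so negative values (error returns, "unset" markers) can pass range checks they should fail. A small illustration (both function names are made up):

```c
/* Plain "cpt < nparts" with an unsigned count: a negative cpt is
 * converted to a huge unsigned value, so the comparison misfires. */
static int lt_unsigned(int cpt, unsigned int nparts)
{
	return cpt < nparts;
}

/* With signed types throughout, the intended range check works. */
static int id_is_valid(int cpt, int nparts)
{
	return cpt >= 0 && cpt < nparts;
}
```

The same concern is why the patch changes min(2U, ncpt) to min(2, ncpt): with ncpt now an int, the unsigned literal would force a mixed-sign comparison.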

Signed-off-by: Dmitry Eremin <[email protected]>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-8703
Reviewed-on: https://review.whamcloud.com/23304
Reviewed-by: James Simmons <[email protected]>
Reviewed-by: Doug Oucharek <[email protected]>
Reviewed-by: Oleg Drokin <[email protected]>
Signed-off-by: James Simmons <[email protected]>
---
Changelog:

v1) Initial patch
v2) Rebased patch to handle recent libcfs changes

drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h | 8 ++++----
drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c | 14 +++++++-------
2 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h b/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h
index 2c97adf..9f4ba9d 100644
--- a/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h
+++ b/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h
@@ -90,7 +90,7 @@ struct cfs_cpu_partition {
/* NUMA distance between CPTs */
unsigned int *cpt_distance;
/* spread rotor for NUMA allocator */
- unsigned int cpt_spread_rotor;
+ int cpt_spread_rotor;
};
#endif /* CONFIG_SMP */

@@ -98,11 +98,11 @@ struct cfs_cpu_partition {
struct cfs_cpt_table {
#ifdef CONFIG_SMP
/* spread rotor for NUMA allocator */
- unsigned int ctb_spread_rotor;
+ int ctb_spread_rotor;
/* maximum NUMA distance between all nodes in table */
unsigned int ctb_distance;
/* # of CPU partitions */
- unsigned int ctb_nparts;
+ int ctb_nparts;
/* partitions tables */
struct cfs_cpu_partition *ctb_parts;
/* shadow HW CPU to CPU partition ID */
@@ -128,7 +128,7 @@ struct cfs_cpt_table {
/**
* create a cfs_cpt_table with \a ncpt number of partitions
*/
-struct cfs_cpt_table *cfs_cpt_table_alloc(unsigned int ncpt);
+struct cfs_cpt_table *cfs_cpt_table_alloc(int ncpt);
/**
* return cpumask of CPU partition \a cpt
*/
diff --git a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
index fab6675..14d5791 100644
--- a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
+++ b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
@@ -69,7 +69,7 @@
module_param(cpu_pattern, charp, 0444);
MODULE_PARM_DESC(cpu_pattern, "CPU partitions pattern");

-struct cfs_cpt_table *cfs_cpt_table_alloc(unsigned int ncpt)
+struct cfs_cpt_table *cfs_cpt_table_alloc(int ncpt)
{
struct cfs_cpt_table *cptab;
int i;
@@ -784,13 +784,13 @@ static int cfs_cpt_choose_ncpus(struct cfs_cpt_table *cptab, int cpt,
return rc;
}

-#define CPT_WEIGHT_MIN 4u
+#define CPT_WEIGHT_MIN 4

-static unsigned int cfs_cpt_num_estimate(void)
+static int cfs_cpt_num_estimate(void)
{
- unsigned int nnode = num_online_nodes();
- unsigned int ncpu = num_online_cpus();
- unsigned int ncpt;
+ int nnode = num_online_nodes();
+ int ncpu = num_online_cpus();
+ int ncpt;

if (ncpu <= CPT_WEIGHT_MIN) {
ncpt = 1;
@@ -820,7 +820,7 @@ static unsigned int cfs_cpt_num_estimate(void)
/* config many CPU partitions on 32-bit system could consume
* too much memory
*/
- ncpt = min(2U, ncpt);
+ ncpt = min(2, ncpt);
#endif
while (ncpu % ncpt)
ncpt--; /* worst case is 1 */
--
1.8.3.1


2018-05-29 14:26:28

by James Simmons

Subject: [PATCH v2 24/25] staging: lustre: libcfs: change CPT estimate algorithm

From: Dmitry Eremin <[email protected]>

The main idea of having more CPU partitions is based on KNL experience.
When a thread submits IO for network communication, one of the threads
from the current CPT is used for the network stack. With high
parallelization many threads become involved in network submission,
but with fewer CPU partitions they have to wait until a single thread
processes them from the network queue. So with a small number of CPU
partitions the bottleneck simply moves into the network layer. My
experiments showed that performance was best when each IO thread has
one network thread. This condition is met by having two real HW cores
(without hyper-threads) per CPT, which is exactly what this patch
implements.

Change the CPT estimate algorithm from 2 * (N - 1)^2 < NCPUS <= 2 * N^2
to two HW cores per CPT. This is critical for machines whose core
count is not a power of two.

Current algorithm splits CPTs in KNL:
LNet: HW CPU cores: 272, npartitions: 16
cpu_partition_table=
0 : 0-4,68-71,136-139,204-207
1 : 5-9,73-76,141-144,209-212
2 : 10-14,78-81,146-149,214-217
3 : 15-17,72,77,83-85,140,145,151-153,208,219-221
4 : 18-21,82,86-88,150,154-156,213,218,222-224
5 : 22-26,90-93,158-161,226-229
6 : 27-31,95-98,163-166,231-234
7 : 32-35,89,100-103,168-171,236-239
8 : 36-38,94,99,104-105,157,162,167,172-173,225,230,235,240-241
9 : 39-43,107-110,175-178,243-246
10 : 44-48,112-115,180-183,248-251
11 : 49-51,106,111,117-119,174,179,185-187,242,253-255
12 : 52-55,116,120-122,184,188-190,247,252,256-258
13 : 56-60,124-127,192-195,260-263
14 : 61-65,129-132,197-200,265-268
15 : 66-67,123,128,133-135,191,196,201-203,259,264,269-271

New algorithm will split CPTs in KNL:
LNet: HW CPU cores: 272, npartitions: 34
cpu_partition_table=
0 : 0-1,68-69,136-137,204-205
1 : 2-3,70-71,138-139,206-207
2 : 4-5,72-73,140-141,208-209
3 : 6-7,74-75,142-143,210-211
4 : 8-9,76-77,144-145,212-213
5 : 10-11,78-79,146-147,214-215
6 : 12-13,80-81,148-149,216-217
7 : 14-15,82-83,150-151,218-219
8 : 16-17,84-85,152-153,220-221
9 : 18-19,86-87,154-155,222-223
10 : 20-21,88-89,156-157,224-225
11 : 22-23,90-91,158-159,226-227
12 : 24-25,92-93,160-161,228-229
13 : 26-27,94-95,162-163,230-231
14 : 28-29,96-97,164-165,232-233
15 : 30-31,98-99,166-167,234-235
16 : 32-33,100-101,168-169,236-237
17 : 34-35,102-103,170-171,238-239
18 : 36-37,104-105,172-173,240-241
19 : 38-39,106-107,174-175,242-243
20 : 40-41,108-109,176-177,244-245
21 : 42-43,110-111,178-179,246-247
22 : 44-45,112-113,180-181,248-249
23 : 46-47,114-115,182-183,250-251
24 : 48-49,116-117,184-185,252-253
25 : 50-51,118-119,186-187,254-255
26 : 52-53,120-121,188-189,256-257
27 : 54-55,122-123,190-191,258-259
28 : 56-57,124-125,192-193,260-261
29 : 58-59,126-127,194-195,262-263
30 : 60-61,128-129,196-197,264-265
31 : 62-63,130-131,198-199,266-267
32 : 64-65,132-133,200-201,268-269
33 : 66-67,134-135,202-203,270-271

The 'N' pattern does not always work well on KNL. In flat mode it
yields a single CPT with all CPUs inside.

In SNC-4 mode:
cpu_partition_table=
0 : 0-17,68-85,136-153,204-221
1 : 18-35,86-103,154-171,222-239
2 : 36-51,104-119,172-187,240-255
3 : 52-67,120-135,188-203,256-271
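The two heuristics can be compared with a small userspace model (function names are illustrative; nnode models the online NUMA node count, nthr the sibling-thread count per core). For KNL's 272 logical CPUs it reproduces the 16-vs-34 split shown above:

```c
/* Old heuristic: smallest power of two N with NCPUS <= 2 * N^2,
 * then rounded against the NUMA node count. */
static int cpt_estimate_old(int ncpu, int nnode)
{
	int ncpt;

	if (ncpu <= 4)			/* CPT_WEIGHT_MIN */
		return 1;

	for (ncpt = 2; ncpu > 2 * ncpt * ncpt; ncpt <<= 1)
		;

	if (ncpt <= nnode) {		/* fat NUMA system */
		while (nnode > ncpt)
			nnode >>= 1;
	} else {			/* ncpt > nnode */
		while ((nnode << 1) <= ncpt)
			nnode <<= 1;
	}
	ncpt = nnode;

	while (ncpu % ncpt)
		ncpt--;			/* worst case is 1 */
	return ncpt;
}

/* New heuristic: about two HW cores (2 * nthr logical CPUs) per CPT. */
static int cpt_estimate_new(int ncpu, int nthr)
{
	int ncpt = 1;

	if (ncpu > 4)			/* CPT_WEIGHT_MIN */
		for (ncpt = 2; ncpu > 2 * nthr * ncpt; ncpt++)
			;

	while (ncpu % ncpt)
		ncpt--;
	return ncpt;
}
```

With ncpu = 272 the old code yields 16 partitions regardless of node count, while the new code with nthr = 4 hyper-threads per core yields 34, matching the tables above.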

Signed-off-by: Dmitry Eremin <[email protected]>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-8703
Reviewed-on: https://review.whamcloud.com/24304
Reviewed-by: James Simmons <[email protected]>
Reviewed-by: Andreas Dilger <[email protected]>
Reviewed-by: Oleg Drokin <[email protected]>
Signed-off-by: James Simmons <[email protected]>
---
Changelog:

v1) Initial patch
v2) Rebased patch. No changes in code from earlier patch

drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c | 30 +++++--------------------
1 file changed, 5 insertions(+), 25 deletions(-)

diff --git a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
index aed48de..ff752d5 100644
--- a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
+++ b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
@@ -798,34 +798,14 @@ static int cfs_cpt_choose_ncpus(struct cfs_cpt_table *cptab, int cpt,

static int cfs_cpt_num_estimate(void)
{
- int nnode = num_online_nodes();
+ int nthr = cpumask_weight(topology_sibling_cpumask(smp_processor_id()));
int ncpu = num_online_cpus();
- int ncpt;
+ int ncpt = 1;

- if (ncpu <= CPT_WEIGHT_MIN) {
- ncpt = 1;
- goto out;
- }
-
- /* generate reasonable number of CPU partitions based on total number
- * of CPUs, Preferred N should be power2 and match this condition:
- * 2 * (N - 1)^2 < NCPUS <= 2 * N^2
- */
- for (ncpt = 2; ncpu > 2 * ncpt * ncpt; ncpt <<= 1)
- ;
-
- if (ncpt <= nnode) { /* fat numa system */
- while (nnode > ncpt)
- nnode >>= 1;
+ if (ncpu > CPT_WEIGHT_MIN)
+ for (ncpt = 2; ncpu > 2 * nthr * ncpt; ncpt++)
+ ; /* nothing */

- } else { /* ncpt > nnode */
- while ((nnode << 1) <= ncpt)
- nnode <<= 1;
- }
-
- ncpt = nnode;
-
-out:
#if (BITS_PER_LONG == 32)
/* config many CPU partitions on 32-bit system could consume
* too much memory
--
1.8.3.1


2018-05-29 14:26:31

by James Simmons

Subject: [PATCH v2 17/25] staging: lustre: libcfs: rename cpumask_var_t variables to *_mask

From: Dmitry Eremin <[email protected]>

Because we handle both CPU masks and core identifiers, the two can
easily be confused. To avoid this, rename the various cpumask_var_t
variables so that *_mask is appended to their names.

Signed-off-by: Dmitry Eremin <[email protected]>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-8703
Reviewed-on: https://review.whamcloud.com/23222
Reviewed-by: Amir Shehata <[email protected]>
Reviewed-by: James Simmons <[email protected]>
Reviewed-by: Oleg Drokin <[email protected]>
Signed-off-by: James Simmons <[email protected]>
---
Changelog:

v1) Initial patch
v2) Rebased patch. No changes in code from earlier patch

drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c | 62 ++++++++++++-------------
1 file changed, 31 insertions(+), 31 deletions(-)

diff --git a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
index 1c10529..fb27dac 100644
--- a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
+++ b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
@@ -710,23 +710,23 @@ int cfs_cpt_bind(struct cfs_cpt_table *cptab, int cpt)
* We always prefer to choose CPU in the same core/socket.
*/
static int cfs_cpt_choose_ncpus(struct cfs_cpt_table *cptab, int cpt,
- cpumask_t *node, int number)
+ cpumask_t *node_mask, int number)
{
- cpumask_var_t socket;
- cpumask_var_t core;
+ cpumask_var_t socket_mask;
+ cpumask_var_t core_mask;
int rc = 0;
int cpu;

LASSERT(number > 0);

- if (number >= cpumask_weight(node)) {
- while (!cpumask_empty(node)) {
- cpu = cpumask_first(node);
+ if (number >= cpumask_weight(node_mask)) {
+ while (!cpumask_empty(node_mask)) {
+ cpu = cpumask_first(node_mask);

rc = cfs_cpt_set_cpu(cptab, cpt, cpu);
if (!rc)
return -EINVAL;
- cpumask_clear_cpu(cpu, node);
+ cpumask_clear_cpu(cpu, node_mask);
}
return 0;
}
@@ -736,34 +736,34 @@ static int cfs_cpt_choose_ncpus(struct cfs_cpt_table *cptab, int cpt,
* As we cannot initialize a cpumask_var_t, we need
* to alloc both before we can risk trying to free either
*/
- if (!zalloc_cpumask_var(&socket, GFP_NOFS))
+ if (!zalloc_cpumask_var(&socket_mask, GFP_NOFS))
rc = -ENOMEM;
- if (!zalloc_cpumask_var(&core, GFP_NOFS))
+ if (!zalloc_cpumask_var(&core_mask, GFP_NOFS))
rc = -ENOMEM;
if (rc)
goto out;

- while (!cpumask_empty(node)) {
- cpu = cpumask_first(node);
+ while (!cpumask_empty(node_mask)) {
+ cpu = cpumask_first(node_mask);

/* get cpumask for cores in the same socket */
- cpumask_copy(socket, topology_core_cpumask(cpu));
- cpumask_and(socket, socket, node);
+ cpumask_copy(socket_mask, topology_core_cpumask(cpu));
+ cpumask_and(socket_mask, socket_mask, node_mask);

- LASSERT(!cpumask_empty(socket));
+ LASSERT(!cpumask_empty(socket_mask));

- while (!cpumask_empty(socket)) {
+ while (!cpumask_empty(socket_mask)) {
int i;

/* get cpumask for hts in the same core */
- cpumask_copy(core, topology_sibling_cpumask(cpu));
- cpumask_and(core, core, node);
+ cpumask_copy(core_mask, topology_sibling_cpumask(cpu));
+ cpumask_and(core_mask, core_mask, node_mask);

- LASSERT(!cpumask_empty(core));
+ LASSERT(!cpumask_empty(core_mask));

- for_each_cpu(i, core) {
- cpumask_clear_cpu(i, socket);
- cpumask_clear_cpu(i, node);
+ for_each_cpu(i, core_mask) {
+ cpumask_clear_cpu(i, socket_mask);
+ cpumask_clear_cpu(i, node_mask);

rc = cfs_cpt_set_cpu(cptab, cpt, i);
if (!rc) {
@@ -774,13 +774,13 @@ static int cfs_cpt_choose_ncpus(struct cfs_cpt_table *cptab, int cpt,
if (!--number)
goto out;
}
- cpu = cpumask_first(socket);
+ cpu = cpumask_first(socket_mask);
}
}

out:
- free_cpumask_var(socket);
- free_cpumask_var(core);
+ free_cpumask_var(socket_mask);
+ free_cpumask_var(core_mask);
return rc;
}

@@ -831,7 +831,7 @@ static int cfs_cpt_num_estimate(void)
static struct cfs_cpt_table *cfs_cpt_table_create(int ncpt)
{
struct cfs_cpt_table *cptab = NULL;
- cpumask_var_t mask;
+ cpumask_var_t node_mask;
int cpt = 0;
int num;
int rc;
@@ -864,15 +864,15 @@ static struct cfs_cpt_table *cfs_cpt_table_create(int ncpt)
goto failed;
}

- if (!zalloc_cpumask_var(&mask, GFP_NOFS)) {
+ if (!zalloc_cpumask_var(&node_mask, GFP_NOFS)) {
CERROR("Failed to allocate scratch cpumask\n");
goto failed;
}

for_each_online_node(i) {
- cpumask_copy(mask, cpumask_of_node(i));
+ cpumask_copy(node_mask, cpumask_of_node(i));

- while (!cpumask_empty(mask)) {
+ while (!cpumask_empty(node_mask)) {
struct cfs_cpu_partition *part;
int n;

@@ -889,7 +889,7 @@ static struct cfs_cpt_table *cfs_cpt_table_create(int ncpt)
n = num - cpumask_weight(part->cpt_cpumask);
LASSERT(n > 0);

- rc = cfs_cpt_choose_ncpus(cptab, cpt, mask, n);
+ rc = cfs_cpt_choose_ncpus(cptab, cpt, node_mask, n);
if (rc < 0)
goto failed_mask;

@@ -907,12 +907,12 @@ static struct cfs_cpt_table *cfs_cpt_table_create(int ncpt)
goto failed_mask;
}

- free_cpumask_var(mask);
+ free_cpumask_var(node_mask);

return cptab;

failed_mask:
- free_cpumask_var(mask);
+ free_cpumask_var(node_mask);
failed:
CERROR("Failed to setup CPU-partition-table with %d CPU-partitions, online HW nodes: %d, HW cpus: %d.\n",
ncpt, num_online_nodes(), num_online_cpus());
--
1.8.3.1


2018-05-29 14:26:33

by James Simmons

Subject: [PATCH v2 16/25] staging: lustre: libcfs: rename i to cpu for cfs_cpt_bind

From: Dmitry Eremin <[email protected]>

Rename variable i to cpu to make code easier to understand.

Signed-off-by: Dmitry Eremin <[email protected]>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-8703
Reviewed-on: https://review.whamcloud.com/23222
Reviewed-by: Amir Shehata <[email protected]>
Reviewed-by: James Simmons <[email protected]>
Reviewed-by: Oleg Drokin <[email protected]>
Signed-off-by: James Simmons <[email protected]>
---
Changelog:

v1) Initial patch
v2) Rebased patch. No changes in code

drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
index bac5601..1c10529 100644
--- a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
+++ b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
@@ -669,8 +669,8 @@ int cfs_cpt_bind(struct cfs_cpt_table *cptab, int cpt)
{
cpumask_var_t *cpumask;
nodemask_t *nodemask;
+ int cpu;
int rc;
- int i;

LASSERT(cpt == CFS_CPT_ANY || (cpt >= 0 && cpt < cptab->ctb_nparts));

@@ -688,8 +688,8 @@ int cfs_cpt_bind(struct cfs_cpt_table *cptab, int cpt)
return -EINVAL;
}

- for_each_online_cpu(i) {
- if (cpumask_test_cpu(i, *cpumask))
+ for_each_online_cpu(cpu) {
+ if (cpumask_test_cpu(cpu, *cpumask))
continue;

rc = set_cpus_allowed_ptr(current, *cpumask);
--
1.8.3.1


2018-05-29 14:26:42

by James Simmons

Subject: [PATCH v2 20/25] staging: lustre: libcfs: make tolerant to offline CPUs and empty NUMA nodes

From: Dmitry Eremin <[email protected]>

Rework the CPU partition code to make it more tolerant of offline
CPUs and empty NUMA nodes.

Signed-off-by: Dmitry Eremin <[email protected]>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-8703
Reviewed-on: https://review.whamcloud.com/23222
Reviewed-by: Amir Shehata <[email protected]>
Reviewed-by: James Simmons <[email protected]>
Reviewed-by: Oleg Drokin <[email protected]>
Signed-off-by: James Simmons <[email protected]>
---
Changelog:

v1) Initial patch
v2) Rebased patch. No changes in code from earlier patch

.../lustre/include/linux/libcfs/libcfs_cpu.h | 2 +
drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c | 132 +++++++++------------
drivers/staging/lustre/lnet/lnet/lib-msg.c | 2 +
3 files changed, 60 insertions(+), 76 deletions(-)

diff --git a/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h b/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h
index 9f4ba9d..c0aa0b3 100644
--- a/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h
+++ b/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h
@@ -91,6 +91,8 @@ struct cfs_cpu_partition {
unsigned int *cpt_distance;
/* spread rotor for NUMA allocator */
int cpt_spread_rotor;
+ /* NUMA node if cpt_nodemask is empty */
+ int cpt_node;
};
#endif /* CONFIG_SMP */

diff --git a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
index 7f1061e..99a9494 100644
--- a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
+++ b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
@@ -457,8 +457,16 @@ int cfs_cpt_set_cpu(struct cfs_cpt_table *cptab, int cpt, int cpu)
return 0;
}

- LASSERT(!cpumask_test_cpu(cpu, cptab->ctb_cpumask));
- LASSERT(!cpumask_test_cpu(cpu, cptab->ctb_parts[cpt].cpt_cpumask));
+ if (cpumask_test_cpu(cpu, cptab->ctb_cpumask)) {
+ CDEBUG(D_INFO, "CPU %d is already in cpumask\n", cpu);
+ return 0;
+ }
+
+ if (cpumask_test_cpu(cpu, cptab->ctb_parts[cpt].cpt_cpumask)) {
+ CDEBUG(D_INFO, "CPU %d is already in partition %d cpumask\n",
+ cpu, cptab->ctb_cpu2cpt[cpu]);
+ return 0;
+ }

cfs_cpt_add_cpu(cptab, cpt, cpu);
cfs_cpt_add_node(cptab, cpt, cpu_to_node(cpu));
@@ -527,8 +535,10 @@ void cfs_cpt_unset_cpumask(struct cfs_cpt_table *cptab, int cpt,
{
int cpu;

- for_each_cpu(cpu, mask)
- cfs_cpt_unset_cpu(cptab, cpt, cpu);
+ for_each_cpu(cpu, mask) {
+ cfs_cpt_del_cpu(cptab, cpt, cpu);
+ cfs_cpt_del_node(cptab, cpt, cpu_to_node(cpu));
+ }
}
EXPORT_SYMBOL(cfs_cpt_unset_cpumask);

@@ -579,10 +589,8 @@ int cfs_cpt_set_nodemask(struct cfs_cpt_table *cptab, int cpt,
{
int node;

- for_each_node_mask(node, *mask) {
- if (!cfs_cpt_set_node(cptab, cpt, node))
- return 0;
- }
+ for_each_node_mask(node, *mask)
+ cfs_cpt_set_node(cptab, cpt, node);

return 1;
}
@@ -603,7 +611,7 @@ int cfs_cpt_spread_node(struct cfs_cpt_table *cptab, int cpt)
nodemask_t *mask;
int weight;
int rotor;
- int node;
+ int node = 0;

/* convert CPU partition ID to HW node id */

@@ -613,20 +621,20 @@ int cfs_cpt_spread_node(struct cfs_cpt_table *cptab, int cpt)
} else {
mask = cptab->ctb_parts[cpt].cpt_nodemask;
rotor = cptab->ctb_parts[cpt].cpt_spread_rotor++;
+ node = cptab->ctb_parts[cpt].cpt_node;
}

weight = nodes_weight(*mask);
- LASSERT(weight > 0);
-
- rotor %= weight;
+ if (weight > 0) {
+ rotor %= weight;

- for_each_node_mask(node, *mask) {
- if (!rotor--)
- return node;
+ for_each_node_mask(node, *mask) {
+ if (!rotor--)
+ return node;
+ }
}

- LBUG();
- return 0;
+ return node;
}
EXPORT_SYMBOL(cfs_cpt_spread_node);

@@ -719,17 +727,21 @@ static int cfs_cpt_choose_ncpus(struct cfs_cpt_table *cptab, int cpt,
cpumask_var_t core_mask;
int rc = 0;
int cpu;
+ int i;

LASSERT(number > 0);

if (number >= cpumask_weight(node_mask)) {
while (!cpumask_empty(node_mask)) {
cpu = cpumask_first(node_mask);
+ cpumask_clear_cpu(cpu, node_mask);
+
+ if (!cpu_online(cpu))
+ continue;

rc = cfs_cpt_set_cpu(cptab, cpt, cpu);
if (!rc)
return -EINVAL;
- cpumask_clear_cpu(cpu, node_mask);
}
return 0;
}
@@ -750,24 +762,19 @@ static int cfs_cpt_choose_ncpus(struct cfs_cpt_table *cptab, int cpt,
cpu = cpumask_first(node_mask);

/* get cpumask for cores in the same socket */
- cpumask_copy(socket_mask, topology_core_cpumask(cpu));
- cpumask_and(socket_mask, socket_mask, node_mask);
-
- LASSERT(!cpumask_empty(socket_mask));
-
+ cpumask_and(socket_mask, topology_core_cpumask(cpu), node_mask);
while (!cpumask_empty(socket_mask)) {
- int i;
-
/* get cpumask for hts in the same core */
- cpumask_copy(core_mask, topology_sibling_cpumask(cpu));
- cpumask_and(core_mask, core_mask, node_mask);
-
- LASSERT(!cpumask_empty(core_mask));
+ cpumask_and(core_mask, topology_sibling_cpumask(cpu),
+ node_mask);

for_each_cpu(i, core_mask) {
cpumask_clear_cpu(i, socket_mask);
cpumask_clear_cpu(i, node_mask);

+ if (!cpu_online(i))
+ continue;
+
rc = cfs_cpt_set_cpu(cptab, cpt, i);
if (!rc) {
rc = -EINVAL;
@@ -836,23 +843,18 @@ static struct cfs_cpt_table *cfs_cpt_table_create(int ncpt)
struct cfs_cpt_table *cptab = NULL;
cpumask_var_t node_mask;
int cpt = 0;
+ int node;
int num;
- int rc;
- int i;
+ int rem;
+ int rc = 0;

- rc = cfs_cpt_num_estimate();
+ num = cfs_cpt_num_estimate();
if (ncpt <= 0)
- ncpt = rc;
+ ncpt = num;

- if (ncpt > num_online_cpus() || ncpt > 4 * rc) {
+ if (ncpt > num_online_cpus() || ncpt > 4 * num) {
CWARN("CPU partition number %d is larger than suggested value (%d), your system may have performance issue or run out of memory while under pressure\n",
- ncpt, rc);
- }
-
- if (num_online_cpus() % ncpt) {
- CERROR("CPU number %d is not multiple of cpu_npartition %d, please try different cpu_npartitions value or set pattern string by cpu_pattern=STRING\n",
- (int)num_online_cpus(), ncpt);
- goto failed;
+ ncpt, num);
}

cptab = cfs_cpt_table_alloc(ncpt);
@@ -861,55 +863,33 @@ static struct cfs_cpt_table *cfs_cpt_table_create(int ncpt)
goto failed;
}

- num = num_online_cpus() / ncpt;
- if (!num) {
- CERROR("CPU changed while setting CPU partition\n");
- goto failed;
- }
-
if (!zalloc_cpumask_var(&node_mask, GFP_NOFS)) {
CERROR("Failed to allocate scratch cpumask\n");
goto failed;
}

- for_each_online_node(i) {
- cpumask_copy(node_mask, cpumask_of_node(i));
-
- while (!cpumask_empty(node_mask)) {
- struct cfs_cpu_partition *part;
- int n;
-
- /*
- * Each emulated NUMA node has all allowed CPUs in
- * the mask.
- * End loop when all partitions have assigned CPUs.
- */
- if (cpt == ncpt)
- break;
-
- part = &cptab->ctb_parts[cpt];
+ num = num_online_cpus() / ncpt;
+ rem = num_online_cpus() % ncpt;
+ for_each_online_node(node) {
+ cpumask_copy(node_mask, cpumask_of_node(node));

- n = num - cpumask_weight(part->cpt_cpumask);
- LASSERT(n > 0);
+ while (cpt < ncpt && !cpumask_empty(node_mask)) {
+ struct cfs_cpu_partition *part = &cptab->ctb_parts[cpt];
+ int ncpu = cpumask_weight(part->cpt_cpumask);

- rc = cfs_cpt_choose_ncpus(cptab, cpt, node_mask, n);
+ rc = cfs_cpt_choose_ncpus(cptab, cpt, node_mask,
+ num - ncpu);
if (rc < 0)
goto failed_mask;

- LASSERT(num >= cpumask_weight(part->cpt_cpumask));
- if (num == cpumask_weight(part->cpt_cpumask))
+ ncpu = cpumask_weight(part->cpt_cpumask);
+ if (ncpu == num + !!(rem > 0)) {
cpt++;
+ rem--;
+ }
}
}

- if (cpt != ncpt ||
- num != cpumask_weight(cptab->ctb_parts[ncpt - 1].cpt_cpumask)) {
- CERROR("Expect %d(%d) CPU partitions but got %d(%d), CPU hotplug/unplug while setting?\n",
- cptab->ctb_nparts, num, cpt,
- cpumask_weight(cptab->ctb_parts[ncpt - 1].cpt_cpumask));
- goto failed_mask;
- }
-
free_cpumask_var(node_mask);

return cptab;
diff --git a/drivers/staging/lustre/lnet/lnet/lib-msg.c b/drivers/staging/lustre/lnet/lnet/lib-msg.c
index 0091273..27bdefa 100644
--- a/drivers/staging/lustre/lnet/lnet/lib-msg.c
+++ b/drivers/staging/lustre/lnet/lnet/lib-msg.c
@@ -568,6 +568,8 @@

/* number of CPUs */
container->msc_nfinalizers = cfs_cpt_weight(lnet_cpt_table(), cpt);
+ if (container->msc_nfinalizers == 0)
+ container->msc_nfinalizers = 1;

container->msc_finalizers = kvzalloc_cpt(container->msc_nfinalizers *
sizeof(*container->msc_finalizers),
--
1.8.3.1


2018-05-29 14:26:57

by James Simmons

Subject: [PATCH v2 19/25] staging: lustre: libcfs: update debug messages

From: Dmitry Eremin <[email protected]>

For cfs_cpt_bind() change the CERROR to a CDEBUG. Make the debug
message in cfs_cpt_table_create_pattern() more understandable.
Report the rc value when cfs_cpt_table_create() fails.

Signed-off-by: Dmitry Eremin <[email protected]>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-8703
Reviewed-on: https://review.whamcloud.com/23222
Reviewed-by: Amir Shehata <[email protected]>
Reviewed-by: James Simmons <[email protected]>
Reviewed-by: Oleg Drokin <[email protected]>
Signed-off-by: James Simmons <[email protected]>
---
Changelog:

v1) Initial patch
v2) Rebased patch. No changes in code from earlier patch

drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c | 15 +++++++++------
1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
index e12d337..7f1061e 100644
--- a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
+++ b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
@@ -480,7 +480,8 @@ void cfs_cpt_unset_cpu(struct cfs_cpt_table *cptab, int cpt, int cpu)
/* caller doesn't know the partition ID */
cpt = cptab->ctb_cpu2cpt[cpu];
if (cpt < 0) { /* not set in this CPT-table */
- CDEBUG(D_INFO, "Try to unset cpu %d which is not in CPT-table %p\n",
+ CDEBUG(D_INFO,
+ "Try to unset cpu %d which is not in CPT-table %p\n",
cpt, cptab);
return;
}
@@ -506,7 +507,8 @@ int cfs_cpt_set_cpumask(struct cfs_cpt_table *cptab, int cpt,

if (!cpumask_weight(mask) ||
cpumask_any_and(mask, cpu_online_mask) >= nr_cpu_ids) {
- CDEBUG(D_INFO, "No online CPU is found in the CPU mask for CPU partition %d\n",
+ CDEBUG(D_INFO,
+ "No online CPU is found in the CPU mask for CPU partition %d\n",
cpt);
return 0;
}
@@ -683,7 +685,8 @@ int cfs_cpt_bind(struct cfs_cpt_table *cptab, int cpt)
}

if (cpumask_any_and(*cpumask, cpu_online_mask) >= nr_cpu_ids) {
- CERROR("No online CPU found in CPU partition %d, did someone do CPU hotplug on system? You might need to reload Lustre modules to keep system working well.\n",
+ CDEBUG(D_INFO,
+ "No online CPU found in CPU partition %d, did someone do CPU hotplug on system? You might need to reload Lustre modules to keep system working well.\n",
cpt);
return -EINVAL;
}
@@ -914,8 +917,8 @@ static struct cfs_cpt_table *cfs_cpt_table_create(int ncpt)
failed_mask:
free_cpumask_var(node_mask);
failed:
- CERROR("Failed to setup CPU-partition-table with %d CPU-partitions, online HW nodes: %d, HW cpus: %d.\n",
- ncpt, num_online_nodes(), num_online_cpus());
+ CERROR("Failed (rc = %d) to setup CPU partition table with %d partitions, online HW NUMA nodes: %d, HW CPU cores: %d.\n",
+ rc, ncpt, num_online_nodes(), num_online_cpus());

if (cptab)
cfs_cpt_table_free(cptab);
@@ -1030,7 +1033,7 @@ static struct cfs_cpt_table *cfs_cpt_table_create_pattern(char *pattern)

bracket = strchr(str, ']');
if (!bracket) {
- CERROR("missing right bracket for cpt %d, %s\n",
+ CERROR("Missing right bracket for partition %d, %s\n",
cpt, str);
goto failed;
}
--
1.8.3.1


2018-05-29 14:27:32

by James Simmons

Subject: [PATCH v2 15/25] staging: lustre: libcfs: rename i to node for cfs_cpt_set_nodemask

From: Dmitry Eremin <[email protected]>

Rename variable i to node to make code easier to understand.

Signed-off-by: Dmitry Eremin <[email protected]>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-8703
Reviewed-on: https://review.whamcloud.com/23222
Reviewed-by: Amir Shehata <[email protected]>
Reviewed-by: James Simmons <[email protected]>
Reviewed-by: Oleg Drokin <[email protected]>
Signed-off-by: James Simmons <[email protected]>
---
Changelog:

v1) Initial patch
v2) Rebased patch. No changes in code

drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
index 14d5791..bac5601 100644
--- a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
+++ b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
@@ -575,10 +575,10 @@ void cfs_cpt_unset_node(struct cfs_cpt_table *cptab, int cpt, int node)
int cfs_cpt_set_nodemask(struct cfs_cpt_table *cptab, int cpt,
const nodemask_t *mask)
{
- int i;
+ int node;

- for_each_node_mask(i, *mask) {
- if (!cfs_cpt_set_node(cptab, cpt, i))
+ for_each_node_mask(node, *mask) {
+ if (!cfs_cpt_set_node(cptab, cpt, node))
return 0;
}

@@ -589,10 +589,10 @@ int cfs_cpt_set_nodemask(struct cfs_cpt_table *cptab, int cpt,
void cfs_cpt_unset_nodemask(struct cfs_cpt_table *cptab, int cpt,
const nodemask_t *mask)
{
- int i;
+ int node;

- for_each_node_mask(i, *mask)
- cfs_cpt_unset_node(cptab, cpt, i);
+ for_each_node_mask(node, *mask)
+ cfs_cpt_unset_node(cptab, cpt, node);
}
EXPORT_SYMBOL(cfs_cpt_unset_nodemask);

--
1.8.3.1


2018-05-29 14:27:37

by James Simmons

Subject: [PATCH v2 13/25] staging: lustre: libcfs: fix libcfs_cpu coding style

From: Dmitry Eremin <[email protected]>

This patch brings the Lustre CPT code into alignment with the
Linux kernel coding style.

Signed-off-by: Dmitry Eremin <[email protected]>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-8703
Reviewed-on: https://review.whamcloud.com/23304
Reviewed-by: James Simmons <[email protected]>
Reviewed-by: Doug Oucharek <[email protected]>
Reviewed-by: Oleg Drokin <[email protected]>
Signed-off-by: James Simmons <[email protected]>
---
Changelog:

v1) Initial patch
v2) Rebased patch to handle recent libcfs changes

.../lustre/include/linux/libcfs/libcfs_cpu.h | 76 ++++++++----------
drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c | 92 ++++++++--------------
2 files changed, 66 insertions(+), 102 deletions(-)

diff --git a/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h b/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h
index d5237d0..2c97adf 100644
--- a/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h
+++ b/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h
@@ -144,8 +144,7 @@ struct cfs_cpt_table {
/**
* return total number of CPU partitions in \a cptab
*/
-int
-cfs_cpt_number(struct cfs_cpt_table *cptab);
+int cfs_cpt_number(struct cfs_cpt_table *cptab);
/**
* return number of HW cores or hyper-threadings in a CPU partition \a cpt
*/
@@ -207,25 +206,24 @@ void cfs_cpt_unset_cpumask(struct cfs_cpt_table *cptab,
* remove all cpus in NUMA node \a node from CPU partition \a cpt
*/
void cfs_cpt_unset_node(struct cfs_cpt_table *cptab, int cpt, int node);
-
/**
* add all cpus in node mask \a mask to CPU partition \a cpt
* return 1 if successfully set all CPUs, otherwise return 0
*/
int cfs_cpt_set_nodemask(struct cfs_cpt_table *cptab,
- int cpt, nodemask_t *mask);
+ int cpt, const nodemask_t *mask);
/**
* remove all cpus in node mask \a mask from CPU partition \a cpt
*/
void cfs_cpt_unset_nodemask(struct cfs_cpt_table *cptab,
- int cpt, nodemask_t *mask);
+ int cpt, const nodemask_t *mask);
/**
* convert partition id \a cpt to numa node id, if there are more than one
* nodes in this partition, it might return a different node id each time.
*/
int cfs_cpt_spread_node(struct cfs_cpt_table *cptab, int cpt);

-int cfs_cpu_init(void);
+int cfs_cpu_init(void);
void cfs_cpu_fini(void);

#else /* !CONFIG_SMP */
@@ -282,32 +280,29 @@ static inline int cfs_cpt_distance_print(struct cfs_cpt_table *cptab,
return rc;
}

-static inline cpumask_var_t *
-cfs_cpt_cpumask(struct cfs_cpt_table *cptab, int cpt)
+static inline cpumask_var_t *cfs_cpt_cpumask(struct cfs_cpt_table *cptab,
+ int cpt)
{
return &cptab->ctb_cpumask;
}

-static inline int
-cfs_cpt_number(struct cfs_cpt_table *cptab)
+static inline int cfs_cpt_number(struct cfs_cpt_table *cptab)
{
return 1;
}

-static inline int
-cfs_cpt_weight(struct cfs_cpt_table *cptab, int cpt)
+static inline int cfs_cpt_weight(struct cfs_cpt_table *cptab, int cpt)
{
return 1;
}

-static inline int
-cfs_cpt_online(struct cfs_cpt_table *cptab, int cpt)
+static inline int cfs_cpt_online(struct cfs_cpt_table *cptab, int cpt)
{
return 1;
}

-static inline nodemask_t *
-cfs_cpt_nodemask(struct cfs_cpt_table *cptab, int cpt)
+static inline nodemask_t *cfs_cpt_nodemask(struct cfs_cpt_table *cptab,
+ int cpt)
{
return &cptab->ctb_nodemask;
}
@@ -318,66 +313,61 @@ static inline unsigned int cfs_cpt_distance(struct cfs_cpt_table *cptab,
return 1;
}

-static inline int
-cfs_cpt_set_cpu(struct cfs_cpt_table *cptab, int cpt, int cpu)
+static inline int cfs_cpt_set_cpu(struct cfs_cpt_table *cptab, int cpt,
+ int cpu)
{
return 1;
}

-static inline void
-cfs_cpt_unset_cpu(struct cfs_cpt_table *cptab, int cpt, int cpu)
+static inline void cfs_cpt_unset_cpu(struct cfs_cpt_table *cptab, int cpt,
+ int cpu)
{
}

-static inline int
-cfs_cpt_set_cpumask(struct cfs_cpt_table *cptab, int cpt,
- const cpumask_t *mask)
+static inline int cfs_cpt_set_cpumask(struct cfs_cpt_table *cptab, int cpt,
+ const cpumask_t *mask)
{
return 1;
}

-static inline void
-cfs_cpt_unset_cpumask(struct cfs_cpt_table *cptab, int cpt,
- const cpumask_t *mask)
+static inline void cfs_cpt_unset_cpumask(struct cfs_cpt_table *cptab, int cpt,
+ const cpumask_t *mask)
{
}

-static inline int
-cfs_cpt_set_node(struct cfs_cpt_table *cptab, int cpt, int node)
+static inline int cfs_cpt_set_node(struct cfs_cpt_table *cptab, int cpt,
+ int node)
{
return 1;
}

-static inline void
-cfs_cpt_unset_node(struct cfs_cpt_table *cptab, int cpt, int node)
+static inline void cfs_cpt_unset_node(struct cfs_cpt_table *cptab, int cpt,
+ int node)
{
}

-static inline int
-cfs_cpt_set_nodemask(struct cfs_cpt_table *cptab, int cpt, nodemask_t *mask)
+static inline int cfs_cpt_set_nodemask(struct cfs_cpt_table *cptab, int cpt,
+ const nodemask_t *mask)
{
return 1;
}

-static inline void
-cfs_cpt_unset_nodemask(struct cfs_cpt_table *cptab, int cpt, nodemask_t *mask)
+static inline void cfs_cpt_unset_nodemask(struct cfs_cpt_table *cptab,
+ int cpt, const nodemask_t *mask)
{
}

-static inline int
-cfs_cpt_spread_node(struct cfs_cpt_table *cptab, int cpt)
+static inline int cfs_cpt_spread_node(struct cfs_cpt_table *cptab, int cpt)
{
return 0;
}

-static inline int
-cfs_cpt_current(struct cfs_cpt_table *cptab, int remap)
+static inline int cfs_cpt_current(struct cfs_cpt_table *cptab, int remap)
{
return 0;
}

-static inline int
-cfs_cpt_of_cpu(struct cfs_cpt_table *cptab, int cpu)
+static inline int cfs_cpt_of_cpu(struct cfs_cpt_table *cptab, int cpu)
{
return 0;
}
@@ -387,14 +377,12 @@ static inline int cfs_cpt_of_node(struct cfs_cpt_table *cptab, int node)
return 0;
}

-static inline int
-cfs_cpt_bind(struct cfs_cpt_table *cptab, int cpt)
+static inline int cfs_cpt_bind(struct cfs_cpt_table *cptab, int cpt)
{
return 0;
}

-static inline int
-cfs_cpu_init(void)
+static inline int cfs_cpu_init(void)
{
cfs_cpt_tab = cfs_cpt_table_alloc(1);

diff --git a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
index bf41ba3..fab6675 100644
--- a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
+++ b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
@@ -69,8 +69,7 @@
module_param(cpu_pattern, charp, 0444);
MODULE_PARM_DESC(cpu_pattern, "CPU partitions pattern");

-struct cfs_cpt_table *
-cfs_cpt_table_alloc(unsigned int ncpt)
+struct cfs_cpt_table *cfs_cpt_table_alloc(unsigned int ncpt)
{
struct cfs_cpt_table *cptab;
int i;
@@ -161,8 +160,7 @@ struct cfs_cpt_table *
}
EXPORT_SYMBOL(cfs_cpt_table_alloc);

-void
-cfs_cpt_table_free(struct cfs_cpt_table *cptab)
+void cfs_cpt_table_free(struct cfs_cpt_table *cptab)
{
int i;

@@ -186,8 +184,7 @@ struct cfs_cpt_table *
}
EXPORT_SYMBOL(cfs_cpt_table_free);

-int
-cfs_cpt_table_print(struct cfs_cpt_table *cptab, char *buf, int len)
+int cfs_cpt_table_print(struct cfs_cpt_table *cptab, char *buf, int len)
{
char *tmp = buf;
int rc;
@@ -262,15 +259,13 @@ int cfs_cpt_distance_print(struct cfs_cpt_table *cptab, char *buf, int len)
}
EXPORT_SYMBOL(cfs_cpt_distance_print);

-int
-cfs_cpt_number(struct cfs_cpt_table *cptab)
+int cfs_cpt_number(struct cfs_cpt_table *cptab)
{
return cptab->ctb_nparts;
}
EXPORT_SYMBOL(cfs_cpt_number);

-int
-cfs_cpt_weight(struct cfs_cpt_table *cptab, int cpt)
+int cfs_cpt_weight(struct cfs_cpt_table *cptab, int cpt)
{
LASSERT(cpt == CFS_CPT_ANY || (cpt >= 0 && cpt < cptab->ctb_nparts));

@@ -280,8 +275,7 @@ int cfs_cpt_distance_print(struct cfs_cpt_table *cptab, char *buf, int len)
}
EXPORT_SYMBOL(cfs_cpt_weight);

-int
-cfs_cpt_online(struct cfs_cpt_table *cptab, int cpt)
+int cfs_cpt_online(struct cfs_cpt_table *cptab, int cpt)
{
LASSERT(cpt == CFS_CPT_ANY || (cpt >= 0 && cpt < cptab->ctb_nparts));

@@ -293,8 +287,7 @@ int cfs_cpt_distance_print(struct cfs_cpt_table *cptab, char *buf, int len)
}
EXPORT_SYMBOL(cfs_cpt_online);

-cpumask_var_t *
-cfs_cpt_cpumask(struct cfs_cpt_table *cptab, int cpt)
+cpumask_var_t *cfs_cpt_cpumask(struct cfs_cpt_table *cptab, int cpt)
{
LASSERT(cpt == CFS_CPT_ANY || (cpt >= 0 && cpt < cptab->ctb_nparts));

@@ -303,8 +296,7 @@ int cfs_cpt_distance_print(struct cfs_cpt_table *cptab, char *buf, int len)
}
EXPORT_SYMBOL(cfs_cpt_cpumask);

-nodemask_t *
-cfs_cpt_nodemask(struct cfs_cpt_table *cptab, int cpt)
+nodemask_t *cfs_cpt_nodemask(struct cfs_cpt_table *cptab, int cpt)
{
LASSERT(cpt == CFS_CPT_ANY || (cpt >= 0 && cpt < cptab->ctb_nparts));

@@ -450,8 +442,7 @@ static void cfs_cpt_del_node(struct cfs_cpt_table *cptab, int cpt, int node)
}
}

-int
-cfs_cpt_set_cpu(struct cfs_cpt_table *cptab, int cpt, int cpu)
+int cfs_cpt_set_cpu(struct cfs_cpt_table *cptab, int cpt, int cpu)
{
LASSERT(cpt >= 0 && cpt < cptab->ctb_nparts);

@@ -476,8 +467,7 @@ static void cfs_cpt_del_node(struct cfs_cpt_table *cptab, int cpt, int node)
}
EXPORT_SYMBOL(cfs_cpt_set_cpu);

-void
-cfs_cpt_unset_cpu(struct cfs_cpt_table *cptab, int cpt, int cpu)
+void cfs_cpt_unset_cpu(struct cfs_cpt_table *cptab, int cpt, int cpu)
{
LASSERT(cpt == CFS_CPT_ANY || (cpt >= 0 && cpt < cptab->ctb_nparts));

@@ -509,9 +499,8 @@ static void cfs_cpt_del_node(struct cfs_cpt_table *cptab, int cpt, int node)
}
EXPORT_SYMBOL(cfs_cpt_unset_cpu);

-int
-cfs_cpt_set_cpumask(struct cfs_cpt_table *cptab, int cpt,
- const cpumask_t *mask)
+int cfs_cpt_set_cpumask(struct cfs_cpt_table *cptab, int cpt,
+ const cpumask_t *mask)
{
int cpu;

@@ -531,9 +520,8 @@ static void cfs_cpt_del_node(struct cfs_cpt_table *cptab, int cpt, int node)
}
EXPORT_SYMBOL(cfs_cpt_set_cpumask);

-void
-cfs_cpt_unset_cpumask(struct cfs_cpt_table *cptab, int cpt,
- const cpumask_t *mask)
+void cfs_cpt_unset_cpumask(struct cfs_cpt_table *cptab, int cpt,
+ const cpumask_t *mask)
{
int cpu;

@@ -542,8 +530,7 @@ static void cfs_cpt_del_node(struct cfs_cpt_table *cptab, int cpt, int node)
}
EXPORT_SYMBOL(cfs_cpt_unset_cpumask);

-int
-cfs_cpt_set_node(struct cfs_cpt_table *cptab, int cpt, int node)
+int cfs_cpt_set_node(struct cfs_cpt_table *cptab, int cpt, int node)
{
const cpumask_t *mask;
int cpu;
@@ -565,8 +552,7 @@ static void cfs_cpt_del_node(struct cfs_cpt_table *cptab, int cpt, int node)
}
EXPORT_SYMBOL(cfs_cpt_set_node);

-void
-cfs_cpt_unset_node(struct cfs_cpt_table *cptab, int cpt, int node)
+void cfs_cpt_unset_node(struct cfs_cpt_table *cptab, int cpt, int node)
{
const cpumask_t *mask;
int cpu;
@@ -586,8 +572,8 @@ static void cfs_cpt_del_node(struct cfs_cpt_table *cptab, int cpt, int node)
}
EXPORT_SYMBOL(cfs_cpt_unset_node);

-int
-cfs_cpt_set_nodemask(struct cfs_cpt_table *cptab, int cpt, nodemask_t *mask)
+int cfs_cpt_set_nodemask(struct cfs_cpt_table *cptab, int cpt,
+ const nodemask_t *mask)
{
int i;

@@ -600,8 +586,8 @@ static void cfs_cpt_del_node(struct cfs_cpt_table *cptab, int cpt, int node)
}
EXPORT_SYMBOL(cfs_cpt_set_nodemask);

-void
-cfs_cpt_unset_nodemask(struct cfs_cpt_table *cptab, int cpt, nodemask_t *mask)
+void cfs_cpt_unset_nodemask(struct cfs_cpt_table *cptab, int cpt,
+ const nodemask_t *mask)
{
int i;

@@ -610,8 +596,7 @@ static void cfs_cpt_del_node(struct cfs_cpt_table *cptab, int cpt, int node)
}
EXPORT_SYMBOL(cfs_cpt_unset_nodemask);

-int
-cfs_cpt_spread_node(struct cfs_cpt_table *cptab, int cpt)
+int cfs_cpt_spread_node(struct cfs_cpt_table *cptab, int cpt)
{
nodemask_t *mask;
int weight;
@@ -643,8 +628,7 @@ static void cfs_cpt_del_node(struct cfs_cpt_table *cptab, int cpt, int node)
}
EXPORT_SYMBOL(cfs_cpt_spread_node);

-int
-cfs_cpt_current(struct cfs_cpt_table *cptab, int remap)
+int cfs_cpt_current(struct cfs_cpt_table *cptab, int remap)
{
int cpu;
int cpt;
@@ -664,8 +648,7 @@ static void cfs_cpt_del_node(struct cfs_cpt_table *cptab, int cpt, int node)
}
EXPORT_SYMBOL(cfs_cpt_current);

-int
-cfs_cpt_of_cpu(struct cfs_cpt_table *cptab, int cpu)
+int cfs_cpt_of_cpu(struct cfs_cpt_table *cptab, int cpu)
{
LASSERT(cpu >= 0 && cpu < nr_cpu_ids);

@@ -682,8 +665,7 @@ int cfs_cpt_of_node(struct cfs_cpt_table *cptab, int node)
}
EXPORT_SYMBOL(cfs_cpt_of_node);

-int
-cfs_cpt_bind(struct cfs_cpt_table *cptab, int cpt)
+int cfs_cpt_bind(struct cfs_cpt_table *cptab, int cpt)
{
cpumask_var_t *cpumask;
nodemask_t *nodemask;
@@ -727,9 +709,8 @@ int cfs_cpt_of_node(struct cfs_cpt_table *cptab, int node)
* Choose max to \a number CPUs from \a node and set them in \a cpt.
* We always prefer to choose CPU in the same core/socket.
*/
-static int
-cfs_cpt_choose_ncpus(struct cfs_cpt_table *cptab, int cpt,
- cpumask_t *node, int number)
+static int cfs_cpt_choose_ncpus(struct cfs_cpt_table *cptab, int cpt,
+ cpumask_t *node, int number)
{
cpumask_var_t socket;
cpumask_var_t core;
@@ -805,8 +786,7 @@ int cfs_cpt_of_node(struct cfs_cpt_table *cptab, int node)

#define CPT_WEIGHT_MIN 4u

-static unsigned int
-cfs_cpt_num_estimate(void)
+static unsigned int cfs_cpt_num_estimate(void)
{
unsigned int nnode = num_online_nodes();
unsigned int ncpu = num_online_cpus();
@@ -848,8 +828,7 @@ int cfs_cpt_of_node(struct cfs_cpt_table *cptab, int node)
return ncpt;
}

-static struct cfs_cpt_table *
-cfs_cpt_table_create(int ncpt)
+static struct cfs_cpt_table *cfs_cpt_table_create(int ncpt)
{
struct cfs_cpt_table *cptab = NULL;
cpumask_var_t mask;
@@ -932,9 +911,9 @@ int cfs_cpt_of_node(struct cfs_cpt_table *cptab, int node)

return cptab;

- failed_mask:
+failed_mask:
free_cpumask_var(mask);
- failed:
+failed:
CERROR("Failed to setup CPU-partition-table with %d CPU-partitions, online HW nodes: %d, HW cpus: %d.\n",
ncpt, num_online_nodes(), num_online_cpus());

@@ -944,8 +923,7 @@ int cfs_cpt_of_node(struct cfs_cpt_table *cptab, int node)
return NULL;
}

-static struct cfs_cpt_table *
-cfs_cpt_table_create_pattern(char *pattern)
+static struct cfs_cpt_table *cfs_cpt_table_create_pattern(char *pattern)
{
struct cfs_cpt_table *cptab;
char *str;
@@ -1089,7 +1067,7 @@ int cfs_cpt_of_node(struct cfs_cpt_table *cptab, int node)

return cptab;

- failed:
+failed:
cfs_cpt_table_free(cptab);
return NULL;
}
@@ -1116,8 +1094,7 @@ static int cfs_cpu_dead(unsigned int cpu)
}
#endif

-void
-cfs_cpu_fini(void)
+void cfs_cpu_fini(void)
{
if (cfs_cpt_tab)
cfs_cpt_table_free(cfs_cpt_tab);
@@ -1129,8 +1106,7 @@ static int cfs_cpu_dead(unsigned int cpu)
#endif
}

-int
-cfs_cpu_init(void)
+int cfs_cpu_init(void)
{
int ret;

--
1.8.3.1


2018-05-29 14:27:49

by James Simmons

[permalink] [raw]
Subject: [PATCH v2 12/25] staging: lustre: libcfs: invert error handling for cfs_cpt_table_print

From: Amir Shehata <[email protected]>

Instead of setting rc to -EFBIG in several places in the loop,
let's just jump to the out label on error, which returns -E2BIG
directly.
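
For illustration, a minimal userspace sketch of the resulting pattern
(names and the fake CPU numbering are illustrative, not the kernel
code): each snprintf() consumes from the remaining length, and any
overflow jumps straight to the single error label.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for cfs_cpt_table_print(): write "i\t: cpu\n"
 * rows into buf. On overflow, jump to the error label and return
 * -E2BIG instead of tracking an error code in a loop variable. */
static int print_parts(char *buf, int len, int nparts)
{
	char *tmp = buf;
	int rc;
	int i;

	for (i = 0; i < nparts; i++) {
		if (len <= 0)
			goto out;

		rc = snprintf(tmp, len, "%d\t:", i);
		len -= rc;
		/* checking len before every snprintf() also keeps a
		 * negative length from ever being passed as size_t */
		if (len <= 0)
			goto out;
		tmp += rc;

		rc = snprintf(tmp, len, " %d\n", i * 2); /* fake CPU id */
		len -= rc;
		if (len <= 0)
			goto out;
		tmp += rc;
	}

	return tmp - buf;
out:
	return -E2BIG;
}
```

Note that snprintf() returns the length it *would* have written, so
`len` can go negative after a truncated write; that is exactly what the
`len <= 0` checks catch.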

Signed-off-by: Amir Shehata <[email protected]>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-7734
Reviewed-on: http://review.whamcloud.com/18916
Reviewed-by: Olaf Weber <[email protected]>
Reviewed-by: Doug Oucharek <[email protected]>
Signed-off-by: James Simmons <[email protected]>
---
Changelog:

v1) New patch to replace several patches. Went overboard with the
one change per patch approach.

drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c | 25 ++++++++++---------------
1 file changed, 10 insertions(+), 15 deletions(-)

diff --git a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
index 9ff9fe9..bf41ba3 100644
--- a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
+++ b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
@@ -190,29 +190,26 @@ struct cfs_cpt_table *
cfs_cpt_table_print(struct cfs_cpt_table *cptab, char *buf, int len)
{
char *tmp = buf;
- int rc = 0;
+ int rc;
int i;
int j;

for (i = 0; i < cptab->ctb_nparts; i++) {
- if (len > 0) {
- rc = snprintf(tmp, len, "%d\t:", i);
- len -= rc;
- }
+ if (len <= 0)
+ goto out;
+
+ rc = snprintf(tmp, len, "%d\t:", i);
+ len -= rc;

- if (len <= 0) {
- rc = -EFBIG;
+ if (len <= 0)
goto out;
- }

tmp += rc;
for_each_cpu(j, cptab->ctb_parts[i].cpt_cpumask) {
rc = snprintf(tmp, len, "%d ", j);
len -= rc;
- if (len <= 0) {
- rc = -EFBIG;
+ if (len <= 0)
goto out;
- }
tmp += rc;
}

@@ -221,11 +218,9 @@ struct cfs_cpt_table *
len--;
}

- out:
- if (rc < 0)
- return rc;
-
return tmp - buf;
+out:
+ return -E2BIG;
}
EXPORT_SYMBOL(cfs_cpt_table_print);

--
1.8.3.1


2018-05-29 14:29:00

by James Simmons

[permalink] [raw]
Subject: [PATCH v2 11/25] staging: lustre: libcfs: provide debugfs files for distance handling

From: Amir Shehata <[email protected]>

On systems with a large number of NUMA nodes and cores it is easy
to misconfigure their use with Lustre. Provide debugfs files that
can help track down any issues.
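
The row format below is an assumption based on
cfs_cpt_distance_print() elsewhere in this series ("cpt\t: cpt:distance
cpt:distance ...\n"); a minimal userspace parser for one such row might
look like:

```c
#include <stdio.h>

/* Parse one assumed cpu_partition_distance row, e.g. "0\t: 0:10 1:20\n".
 * Returns the number of cpt:distance pairs parsed (or -1 on a malformed
 * row), storing the row's CPT id in *cpt and each distance in dist[],
 * indexed by the peer CPT. */
static int parse_distance_row(const char *row, int *cpt,
			      unsigned int *dist, int max)
{
	int n = 0;
	int used;
	int j;
	unsigned int d;

	if (sscanf(row, "%d\t:%n", cpt, &used) != 1)
		return -1;
	row += used;

	while (n < max && sscanf(row, " %d:%u%n", &j, &d, &used) == 2) {
		dist[j] = d;
		row += used;
		n++;
	}
	return n;
}
```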

Signed-off-by: Amir Shehata <[email protected]>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-7734
Reviewed-on: http://review.whamcloud.com/18916
Reviewed-by: Olaf Weber <[email protected]>
Reviewed-by: Doug Oucharek <[email protected]>
Signed-off-by: James Simmons <[email protected]>
---
Changelog:

v1) Initial patch
v2) Rebased patch. No code changes from original patch

drivers/staging/lustre/lnet/libcfs/module.c | 53 +++++++++++++++++++++++++++++
1 file changed, 53 insertions(+)

diff --git a/drivers/staging/lustre/lnet/libcfs/module.c b/drivers/staging/lustre/lnet/libcfs/module.c
index b438d456..d2dfc29 100644
--- a/drivers/staging/lustre/lnet/libcfs/module.c
+++ b/drivers/staging/lustre/lnet/libcfs/module.c
@@ -468,6 +468,53 @@ static int proc_cpt_table(struct ctl_table *table, int write,
__proc_cpt_table);
}

+static int __proc_cpt_distance(void *data, int write,
+ loff_t pos, void __user *buffer, int nob)
+{
+ char *buf = NULL;
+ int len = 4096;
+ int rc = 0;
+
+ if (write)
+ return -EPERM;
+
+ LASSERT(cfs_cpt_tab);
+
+ while (1) {
+ buf = kzalloc(len, GFP_KERNEL);
+ if (!buf)
+ return -ENOMEM;
+
+ rc = cfs_cpt_distance_print(cfs_cpt_tab, buf, len);
+ if (rc >= 0)
+ break;
+
+ if (rc == -EFBIG) {
+ kfree(buf);
+ len <<= 1;
+ continue;
+ }
+ goto out;
+ }
+
+ if (pos >= rc) {
+ rc = 0;
+ goto out;
+ }
+
+ rc = cfs_trace_copyout_string(buffer, nob, buf + pos, NULL);
+out:
+ kfree(buf);
+ return rc;
+}
+
+static int proc_cpt_distance(struct ctl_table *table, int write,
+ void __user *buffer, size_t *lenp, loff_t *ppos)
+{
+ return lprocfs_call_handler(table->data, write, ppos, buffer, lenp,
+ __proc_cpt_distance);
+}
+
static struct ctl_table lnet_table[] = {
{
.procname = "debug",
@@ -497,6 +544,12 @@ static int proc_cpt_table(struct ctl_table *table, int write,
.proc_handler = &proc_cpt_table,
},
{
+ .procname = "cpu_partition_distance",
+ .maxlen = 128,
+ .mode = 0444,
+ .proc_handler = &proc_cpt_distance,
+ },
+ {
.procname = "debug_log_upcall",
.data = lnet_debug_log_upcall,
.maxlen = sizeof(lnet_debug_log_upcall),
--
1.8.3.1


2018-05-29 14:29:11

by James Simmons

[permalink] [raw]
Subject: [PATCH v2 05/25] staging: lustre: libcfs: replace MAX_NUMNODES with nr_node_ids

From: Amir Shehata <[email protected]>

Replace MAX_NUMNODES, which is considered deprecated, with
nr_node_ids. Looking at page_alloc.c you will see that nr_node_ids
is equal to MAX_NUMNODES. MAX_NUMNODES is actually set up with
Kconfig.

Signed-off-by: Amir Shehata <[email protected]>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-7734
Reviewed-on: http://review.whamcloud.com/18916
Reviewed-by: Olaf Weber <[email protected]>
Reviewed-by: Doug Oucharek <[email protected]>
Signed-off-by: James Simmons <[email protected]>
---
Changelog:

v1) Initial patch
v2) Same code but added in more details in commit message

drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
index b67a60c..d3017e8 100644
--- a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
+++ b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
@@ -395,7 +395,7 @@ struct cfs_cpt_table *
{
const cpumask_t *mask;

- if (node < 0 || node >= MAX_NUMNODES) {
+ if (node < 0 || node >= nr_node_ids) {
CDEBUG(D_INFO,
"Invalid NUMA id %d for CPU partition %d\n", node, cpt);
return 0;
@@ -412,7 +412,7 @@ struct cfs_cpt_table *
{
const cpumask_t *mask;

- if (node < 0 || node >= MAX_NUMNODES) {
+ if (node < 0 || node >= nr_node_ids) {
CDEBUG(D_INFO,
"Invalid NUMA id %d for CPU partition %d\n", node, cpt);
return;
@@ -836,7 +836,7 @@ struct cfs_cpt_table *
return cptab;
}

- high = node ? MAX_NUMNODES - 1 : nr_cpu_ids - 1;
+ high = node ? nr_node_ids - 1 : nr_cpu_ids - 1;

for (str = strim(pattern), c = 0;; c++) {
struct cfs_range_expr *range;
--
1.8.3.1


2018-05-29 14:29:12

by James Simmons

[permalink] [raw]
Subject: [PATCH v2 10/25] staging: lustre: libcfs: use distance in cpu and node handling

From: Amir Shehata <[email protected]>

Take into consideration the location of NUMA nodes and cores when
calling cfs_cpt_[un]set_cpu() and cfs_cpt_[un]set_node(). This
enables proper functioning on platforms with hundreds of cores and
NUMA nodes.
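
The core of the distance bookkeeping is
cfs_cpt_distance_calculate(), which takes the maximum node_distance()
over two node sets. A userspace sketch with a toy 4-node distance
matrix standing in for the kernel's node_distance() and plain bitmasks
standing in for nodemask_t:

```c
/* Toy SLIT-style matrix: 10 = local node, 20 = one hop away. */
static const unsigned int node_dist[4][4] = {
	{ 10, 20, 20, 20 },
	{ 20, 10, 20, 20 },
	{ 20, 20, 10, 20 },
	{ 20, 20, 20, 10 },
};

/* Mirror of cfs_cpt_distance_calculate(): the maximum distance from
 * any node in from_mask to any node in to_mask. */
static unsigned int max_distance(unsigned int from_mask,
				 unsigned int to_mask)
{
	unsigned int maximum = 0;
	int from;
	int to;

	for (from = 0; from < 4; from++) {
		if (!(from_mask & (1u << from)))
			continue;
		for (to = 0; to < 4; to++) {
			if (!(to_mask & (1u << to)))
				continue;
			if (node_dist[from][to] > maximum)
				maximum = node_dist[from][to];
		}
	}
	return maximum;
}
```

A CPT confined to one node reports the local distance; as soon as a
CPT spans two nodes its self-distance jumps to the inter-node value,
which is what the debugfs output makes visible.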

Signed-off-by: Amir Shehata <[email protected]>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-7734
Reviewed-on: http://review.whamcloud.com/18916
Reviewed-by: Olaf Weber <[email protected]>
Reviewed-by: Doug Oucharek <[email protected]>
Signed-off-by: James Simmons <[email protected]>
---
Changelog:

v1) Initial patch
v2) Rebased patch to handle recent libcfs changes

drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c | 192 ++++++++++++++++++------
1 file changed, 143 insertions(+), 49 deletions(-)

diff --git a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
index 2a74e51..9ff9fe9 100644
--- a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
+++ b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
@@ -330,11 +330,134 @@ unsigned int cfs_cpt_distance(struct cfs_cpt_table *cptab, int cpt1, int cpt2)
}
EXPORT_SYMBOL(cfs_cpt_distance);

+/*
+ * Calculate the maximum NUMA distance between all nodes in the
+ * from_mask and all nodes in the to_mask.
+ */
+static unsigned int cfs_cpt_distance_calculate(nodemask_t *from_mask,
+ nodemask_t *to_mask)
+{
+ unsigned int maximum;
+ unsigned int distance;
+ int from;
+ int to;
+
+ maximum = 0;
+ for_each_node_mask(from, *from_mask) {
+ for_each_node_mask(to, *to_mask) {
+ distance = node_distance(from, to);
+ if (maximum < distance)
+ maximum = distance;
+ }
+ }
+ return maximum;
+}
+
+static void cfs_cpt_add_cpu(struct cfs_cpt_table *cptab, int cpt, int cpu)
+{
+ cptab->ctb_cpu2cpt[cpu] = cpt;
+
+ cpumask_set_cpu(cpu, cptab->ctb_cpumask);
+ cpumask_set_cpu(cpu, cptab->ctb_parts[cpt].cpt_cpumask);
+}
+
+static void cfs_cpt_del_cpu(struct cfs_cpt_table *cptab, int cpt, int cpu)
+{
+ cpumask_clear_cpu(cpu, cptab->ctb_parts[cpt].cpt_cpumask);
+ cpumask_clear_cpu(cpu, cptab->ctb_cpumask);
+
+ cptab->ctb_cpu2cpt[cpu] = -1;
+}
+
+static void cfs_cpt_add_node(struct cfs_cpt_table *cptab, int cpt, int node)
+{
+ struct cfs_cpu_partition *part;
+
+ if (!node_isset(node, *cptab->ctb_nodemask)) {
+ unsigned int dist;
+
+ /* first time node is added to the CPT table */
+ node_set(node, *cptab->ctb_nodemask);
+ cptab->ctb_node2cpt[node] = cpt;
+
+ dist = cfs_cpt_distance_calculate(cptab->ctb_nodemask,
+ cptab->ctb_nodemask);
+ cptab->ctb_distance = dist;
+ }
+
+ part = &cptab->ctb_parts[cpt];
+ if (!node_isset(node, *part->cpt_nodemask)) {
+ int cpt2;
+
+ /* first time node is added to this CPT */
+ node_set(node, *part->cpt_nodemask);
+ for (cpt2 = 0; cpt2 < cptab->ctb_nparts; cpt2++) {
+ struct cfs_cpu_partition *part2;
+ unsigned int dist;
+
+ part2 = &cptab->ctb_parts[cpt2];
+ dist = cfs_cpt_distance_calculate(part->cpt_nodemask,
+ part2->cpt_nodemask);
+ part->cpt_distance[cpt2] = dist;
+ dist = cfs_cpt_distance_calculate(part2->cpt_nodemask,
+ part->cpt_nodemask);
+ part2->cpt_distance[cpt] = dist;
+ }
+ }
+}
+
+static void cfs_cpt_del_node(struct cfs_cpt_table *cptab, int cpt, int node)
+{
+ struct cfs_cpu_partition *part = &cptab->ctb_parts[cpt];
+ int cpu;
+
+ for_each_cpu(cpu, part->cpt_cpumask) {
+ /* this CPT has other CPU belonging to this node? */
+ if (cpu_to_node(cpu) == node)
+ break;
+ }
+
+ if (cpu >= nr_cpu_ids && node_isset(node, *part->cpt_nodemask)) {
+ int cpt2;
+
+ /* No more CPUs in the node for this CPT. */
+ node_clear(node, *part->cpt_nodemask);
+ for (cpt2 = 0; cpt2 < cptab->ctb_nparts; cpt2++) {
+ struct cfs_cpu_partition *part2;
+ unsigned int dist;
+
+ part2 = &cptab->ctb_parts[cpt2];
+ if (node_isset(node, *part2->cpt_nodemask))
+ cptab->ctb_node2cpt[node] = cpt2;
+
+ dist = cfs_cpt_distance_calculate(part->cpt_nodemask,
+ part2->cpt_nodemask);
+ part->cpt_distance[cpt2] = dist;
+ dist = cfs_cpt_distance_calculate(part2->cpt_nodemask,
+ part->cpt_nodemask);
+ part2->cpt_distance[cpt] = dist;
+ }
+ }
+
+ for_each_cpu(cpu, cptab->ctb_cpumask) {
+ /* this CPT-table has other CPUs belonging to this node? */
+ if (cpu_to_node(cpu) == node)
+ break;
+ }
+
+ if (cpu >= nr_cpu_ids && node_isset(node, *cptab->ctb_nodemask)) {
+ /* No more CPUs in the table for this node. */
+ node_clear(node, *cptab->ctb_nodemask);
+ cptab->ctb_node2cpt[node] = -1;
+ cptab->ctb_distance =
+ cfs_cpt_distance_calculate(cptab->ctb_nodemask,
+ cptab->ctb_nodemask);
+ }
+}
+
int
cfs_cpt_set_cpu(struct cfs_cpt_table *cptab, int cpt, int cpu)
{
- int node;
-
LASSERT(cpt >= 0 && cpt < cptab->ctb_nparts);

if (cpu < 0 || cpu >= nr_cpu_ids || !cpu_online(cpu)) {
@@ -348,23 +471,11 @@ unsigned int cfs_cpt_distance(struct cfs_cpt_table *cptab, int cpt1, int cpt2)
return 0;
}

- cptab->ctb_cpu2cpt[cpu] = cpt;
-
LASSERT(!cpumask_test_cpu(cpu, cptab->ctb_cpumask));
LASSERT(!cpumask_test_cpu(cpu, cptab->ctb_parts[cpt].cpt_cpumask));

- cpumask_set_cpu(cpu, cptab->ctb_cpumask);
- cpumask_set_cpu(cpu, cptab->ctb_parts[cpt].cpt_cpumask);
-
- node = cpu_to_node(cpu);
-
- /* first CPU of @node in this CPT table */
- if (!node_isset(node, *cptab->ctb_nodemask))
- node_set(node, *cptab->ctb_nodemask);
-
- /* first CPU of @node in this partition */
- if (!node_isset(node, *cptab->ctb_parts[cpt].cpt_nodemask))
- node_set(node, *cptab->ctb_parts[cpt].cpt_nodemask);
+ cfs_cpt_add_cpu(cptab, cpt, cpu);
+ cfs_cpt_add_node(cptab, cpt, cpu_to_node(cpu));

return 1;
}
@@ -373,9 +484,6 @@ unsigned int cfs_cpt_distance(struct cfs_cpt_table *cptab, int cpt1, int cpt2)
void
cfs_cpt_unset_cpu(struct cfs_cpt_table *cptab, int cpt, int cpu)
{
- int node;
- int i;
-
LASSERT(cpt == CFS_CPT_ANY || (cpt >= 0 && cpt < cptab->ctb_nparts));

if (cpu < 0 || cpu >= nr_cpu_ids) {
@@ -401,32 +509,8 @@ unsigned int cfs_cpt_distance(struct cfs_cpt_table *cptab, int cpt1, int cpt2)
LASSERT(cpumask_test_cpu(cpu, cptab->ctb_parts[cpt].cpt_cpumask));
LASSERT(cpumask_test_cpu(cpu, cptab->ctb_cpumask));

- cpumask_clear_cpu(cpu, cptab->ctb_parts[cpt].cpt_cpumask);
- cpumask_clear_cpu(cpu, cptab->ctb_cpumask);
- cptab->ctb_cpu2cpt[cpu] = -1;
-
- node = cpu_to_node(cpu);
-
- LASSERT(node_isset(node, *cptab->ctb_parts[cpt].cpt_nodemask));
- LASSERT(node_isset(node, *cptab->ctb_nodemask));
-
- for_each_cpu(i, cptab->ctb_parts[cpt].cpt_cpumask) {
- /* this CPT has other CPU belonging to this node? */
- if (cpu_to_node(i) == node)
- break;
- }
-
- if (i >= nr_cpu_ids)
- node_clear(node, *cptab->ctb_parts[cpt].cpt_nodemask);
-
- for_each_cpu(i, cptab->ctb_cpumask) {
- /* this CPT-table has other CPU belonging to this node? */
- if (cpu_to_node(i) == node)
- break;
- }
-
- if (i >= nr_cpu_ids)
- node_clear(node, *cptab->ctb_nodemask);
+ cfs_cpt_del_cpu(cptab, cpt, cpu);
+ cfs_cpt_del_node(cptab, cpt, cpu_to_node(cpu));
}
EXPORT_SYMBOL(cfs_cpt_unset_cpu);

@@ -444,8 +528,8 @@ unsigned int cfs_cpt_distance(struct cfs_cpt_table *cptab, int cpt1, int cpt2)
}

for_each_cpu(cpu, mask) {
- if (!cfs_cpt_set_cpu(cptab, cpt, cpu))
- return 0;
+ cfs_cpt_add_cpu(cptab, cpt, cpu);
+ cfs_cpt_add_node(cptab, cpt, cpu_to_node(cpu));
}

return 1;
@@ -467,6 +551,7 @@ unsigned int cfs_cpt_distance(struct cfs_cpt_table *cptab, int cpt1, int cpt2)
cfs_cpt_set_node(struct cfs_cpt_table *cptab, int cpt, int node)
{
const cpumask_t *mask;
+ int cpu;

if (node < 0 || node >= nr_node_ids) {
CDEBUG(D_INFO,
@@ -476,7 +561,12 @@ unsigned int cfs_cpt_distance(struct cfs_cpt_table *cptab, int cpt1, int cpt2)

mask = cpumask_of_node(node);

- return cfs_cpt_set_cpumask(cptab, cpt, mask);
+ for_each_cpu(cpu, mask)
+ cfs_cpt_add_cpu(cptab, cpt, cpu);
+
+ cfs_cpt_add_node(cptab, cpt, node);
+
+ return 1;
}
EXPORT_SYMBOL(cfs_cpt_set_node);

@@ -484,6 +574,7 @@ unsigned int cfs_cpt_distance(struct cfs_cpt_table *cptab, int cpt1, int cpt2)
cfs_cpt_unset_node(struct cfs_cpt_table *cptab, int cpt, int node)
{
const cpumask_t *mask;
+ int cpu;

if (node < 0 || node >= nr_node_ids) {
CDEBUG(D_INFO,
@@ -493,7 +584,10 @@ unsigned int cfs_cpt_distance(struct cfs_cpt_table *cptab, int cpt1, int cpt2)

mask = cpumask_of_node(node);

- cfs_cpt_unset_cpumask(cptab, cpt, mask);
+ for_each_cpu(cpu, mask)
+ cfs_cpt_del_cpu(cptab, cpt, cpu);
+
+ cfs_cpt_del_node(cptab, cpt, node);
}
EXPORT_SYMBOL(cfs_cpt_unset_node);

--
1.8.3.1


2018-05-29 14:29:21

by James Simmons

[permalink] [raw]
Subject: [PATCH v2 08/25] staging: lustre: libcfs: NUMA support

From: Amir Shehata <[email protected]>

This patch adds NUMA node support. NUMA node information is stored
in the CPT table. A NUMA node mask is maintained for the entire
table as well as for each CPT to track the NUMA nodes related to
each of the CPTs. Add a new function, cfs_cpt_of_node(), which
returns the CPT of a particular NUMA node.
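
A userspace sketch of the ctb_node2cpt shadow table this patch adds
(names and sizes are illustrative, not the kernel structures): the
table holds nr_node_ids entries initialized to -1 ("node not in any
CPT") via memset, which works because -1 is all-one bits in every
byte of a two's-complement int.

```c
#include <stdlib.h>
#include <string.h>

/* Allocate the shadow node -> CPT table, every entry set to -1. */
static int *node2cpt_alloc(int nr_nodes)
{
	int *tab = malloc(nr_nodes * sizeof(*tab));

	if (tab)
		memset(tab, -1, nr_nodes * sizeof(*tab));
	return tab;
}

/* Sketch of a cfs_cpt_of_node()-style lookup; note the >= bound so
 * that node == nr_nodes is rejected as out of range. */
static int cpt_of_node(const int *tab, int nr_nodes, int node)
{
	if (node < 0 || node >= nr_nodes)
		return -1;	/* stand-in for CFS_CPT_ANY */
	return tab[node];
}
```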

Signed-off-by: Amir Shehata <[email protected]>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-7734
Reviewed-on: http://review.whamcloud.com/18916
Reviewed-by: Olaf Weber <[email protected]>
Reviewed-by: Doug Oucharek <[email protected]>
Signed-off-by: James Simmons <[email protected]>
---
Changelog:

v1) Initial patch
v2) Rebased patch to handle recent libcfs changes

.../lustre/include/linux/libcfs/libcfs_cpu.h | 11 +++++++++++
drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c | 21 +++++++++++++++++++++
2 files changed, 32 insertions(+)

diff --git a/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h b/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h
index 3626969..487625d 100644
--- a/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h
+++ b/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h
@@ -103,6 +103,8 @@ struct cfs_cpt_table {
struct cfs_cpu_partition *ctb_parts;
/* shadow HW CPU to CPU partition ID */
int *ctb_cpu2cpt;
+ /* shadow HW node to CPU partition ID */
+ int *ctb_node2cpt;
/* all nodes in this partition table */
nodemask_t *ctb_nodemask;
#else
@@ -157,6 +159,10 @@ struct cfs_cpt_table {
*/
int cfs_cpt_of_cpu(struct cfs_cpt_table *cptab, int cpu);
/**
+ * shadow HW node ID \a NODE to CPU-partition ID by \a cptab
+ */
+int cfs_cpt_of_node(struct cfs_cpt_table *cptab, int node);
+/**
* bind current thread on a CPU-partition \a cpt of \a cptab
*/
int cfs_cpt_bind(struct cfs_cpt_table *cptab, int cpt);
@@ -345,6 +351,11 @@ static inline int cfs_cpt_table_print(struct cfs_cpt_table *cptab,
return 0;
}

+static inline int cfs_cpt_of_node(struct cfs_cpt_table *cptab, int node)
+{
+ return 0;
+}
+
static inline int
cfs_cpt_bind(struct cfs_cpt_table *cptab, int cpt)
{
diff --git a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
index 3f855a8..f616073 100644
--- a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
+++ b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
@@ -98,6 +98,15 @@ struct cfs_cpt_table *
memset(cptab->ctb_cpu2cpt, -1,
nr_cpu_ids * sizeof(cptab->ctb_cpu2cpt[0]));

+ cptab->ctb_node2cpt = kvmalloc_array(nr_node_ids,
+ sizeof(cptab->ctb_node2cpt[0]),
+ GFP_KERNEL);
+ if (!cptab->ctb_node2cpt)
+ goto failed_alloc_node2cpt;
+
+ memset(cptab->ctb_node2cpt, -1,
+ nr_node_ids * sizeof(cptab->ctb_node2cpt[0]));
+
cptab->ctb_parts = kvmalloc_array(ncpt, sizeof(cptab->ctb_parts[0]),
GFP_KERNEL);
if (!cptab->ctb_parts)
@@ -129,6 +138,8 @@ struct cfs_cpt_table *

kvfree(cptab->ctb_parts);
failed_alloc_ctb_parts:
+ kvfree(cptab->ctb_node2cpt);
+failed_alloc_node2cpt:
kvfree(cptab->ctb_cpu2cpt);
failed_alloc_cpu2cpt:
kfree(cptab->ctb_nodemask);
@@ -146,6 +157,7 @@ struct cfs_cpt_table *
int i;

kvfree(cptab->ctb_cpu2cpt);
+ kvfree(cptab->ctb_node2cpt);

for (i = 0; cptab->ctb_parts && i < cptab->ctb_nparts; i++) {
struct cfs_cpu_partition *part = &cptab->ctb_parts[i];
@@ -511,6 +523,15 @@ struct cfs_cpt_table *
}
EXPORT_SYMBOL(cfs_cpt_of_cpu);

+int cfs_cpt_of_node(struct cfs_cpt_table *cptab, int node)
+{
+ if (node < 0 || node > nr_node_ids)
+ return CFS_CPT_ANY;
+
+ return cptab->ctb_node2cpt[node];
+}
+EXPORT_SYMBOL(cfs_cpt_of_node);
+
int
cfs_cpt_bind(struct cfs_cpt_table *cptab, int cpt)
{
--
1.8.3.1


2018-05-29 14:29:22

by James Simmons

[permalink] [raw]
Subject: [PATCH v2 09/25] staging: lustre: libcfs: add cpu distance handling

From: Amir Shehata <[email protected]>

Add functionality to calculate the distance between two CPTs.
Expose those distances in debugfs so people deploying a setup
can debug what is being created for the CPTs.
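
The lookup semantics of the new cfs_cpt_distance() can be sketched in
userspace as follows (a 2-CPT stand-in, not the kernel code): passing
CFS_CPT_ANY for either partition falls back to the table-wide maximum
distance, otherwise the per-partition matrix is indexed directly.

```c
#define ANY_CPT (-1)	/* stand-in for CFS_CPT_ANY */

static const unsigned int tbl_distance = 20;	/* table-wide maximum */
static const unsigned int part_distance[2][2] = {
	{ 10, 20 },
	{ 20, 10 },
};

/* Mirror of cfs_cpt_distance() for a fixed 2-CPT table. */
static unsigned int cpt_distance(int cpt1, int cpt2)
{
	if (cpt1 == ANY_CPT || cpt2 == ANY_CPT)
		return tbl_distance;
	return part_distance[cpt1][cpt2];
}
```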

Signed-off-by: Amir Shehata <[email protected]>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-7734
Reviewed-on: http://review.whamcloud.com/18916
Reviewed-by: Olaf Weber <[email protected]>
Reviewed-by: Doug Oucharek <[email protected]>
Signed-off-by: James Simmons <[email protected]>
---
Changelog:

v1) Initial patch
v2) Rebased patch to handle recent libcfs changes

.../lustre/include/linux/libcfs/libcfs_cpu.h | 31 +++++++++++
drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c | 61 ++++++++++++++++++++++
2 files changed, 92 insertions(+)

diff --git a/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h b/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h
index 487625d..d5237d0 100644
--- a/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h
+++ b/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h
@@ -87,6 +87,8 @@ struct cfs_cpu_partition {
cpumask_var_t cpt_cpumask;
/* nodes mask for this partition */
nodemask_t *cpt_nodemask;
+ /* NUMA distance between CPTs */
+ unsigned int *cpt_distance;
/* spread rotor for NUMA allocator */
unsigned int cpt_spread_rotor;
};
@@ -97,6 +99,8 @@ struct cfs_cpt_table {
#ifdef CONFIG_SMP
/* spread rotor for NUMA allocator */
unsigned int ctb_spread_rotor;
+ /* maximum NUMA distance between all nodes in table */
+ unsigned int ctb_distance;
/* # of CPU partitions */
unsigned int ctb_nparts;
/* partitions tables */
@@ -134,6 +138,10 @@ struct cfs_cpt_table {
*/
int cfs_cpt_table_print(struct cfs_cpt_table *cptab, char *buf, int len);
/**
+ * print distance information of cpt-table
+ */
+int cfs_cpt_distance_print(struct cfs_cpt_table *cptab, char *buf, int len);
+/**
* return total number of CPU partitions in \a cptab
*/
int
@@ -163,6 +171,10 @@ struct cfs_cpt_table {
*/
int cfs_cpt_of_node(struct cfs_cpt_table *cptab, int node);
/**
+ * NUMA distance between \a cpt1 and \a cpt2 in \a cptab
+ */
+unsigned int cfs_cpt_distance(struct cfs_cpt_table *cptab, int cpt1, int cpt2);
+/**
* bind current thread on a CPU-partition \a cpt of \a cptab
*/
int cfs_cpt_bind(struct cfs_cpt_table *cptab, int cpt);
@@ -257,6 +269,19 @@ static inline int cfs_cpt_table_print(struct cfs_cpt_table *cptab,
return rc;
}

+static inline int cfs_cpt_distance_print(struct cfs_cpt_table *cptab,
+ char *buf, int len)
+{
+ int rc;
+
+ rc = snprintf(buf, len, "0\t: 0:1\n");
+ len -= rc;
+ if (len <= 0)
+ return -EFBIG;
+
+ return rc;
+}
+
static inline cpumask_var_t *
cfs_cpt_cpumask(struct cfs_cpt_table *cptab, int cpt)
{
@@ -287,6 +312,12 @@ static inline int cfs_cpt_table_print(struct cfs_cpt_table *cptab,
return &cptab->ctb_nodemask;
}

+static inline unsigned int cfs_cpt_distance(struct cfs_cpt_table *cptab,
+ int cpt1, int cpt2)
+{
+ return 1;
+}
+
static inline int
cfs_cpt_set_cpu(struct cfs_cpt_table *cptab, int cpt, int cpu)
{
diff --git a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
index f616073..2a74e51 100644
--- a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
+++ b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
@@ -124,6 +124,15 @@ struct cfs_cpt_table *
GFP_NOFS);
if (!part->cpt_nodemask)
goto failed_setting_ctb_parts;
+
+ part->cpt_distance = kvmalloc_array(cptab->ctb_nparts,
+ sizeof(part->cpt_distance[0]),
+ GFP_KERNEL);
+ if (!part->cpt_distance)
+ goto failed_setting_ctb_parts;
+
+ memset(part->cpt_distance, -1,
+ cptab->ctb_nparts * sizeof(part->cpt_distance[0]));
}

return cptab;
@@ -134,6 +143,7 @@ struct cfs_cpt_table *

kfree(part->cpt_nodemask);
free_cpumask_var(part->cpt_cpumask);
+ kvfree(part->cpt_distance);
}

kvfree(cptab->ctb_parts);
@@ -164,6 +174,7 @@ struct cfs_cpt_table *

kfree(part->cpt_nodemask);
free_cpumask_var(part->cpt_cpumask);
+ kvfree(part->cpt_distance);
}

kvfree(cptab->ctb_parts);
@@ -218,6 +229,44 @@ struct cfs_cpt_table *
}
EXPORT_SYMBOL(cfs_cpt_table_print);

+int cfs_cpt_distance_print(struct cfs_cpt_table *cptab, char *buf, int len)
+{
+ char *tmp = buf;
+ int rc;
+ int i;
+ int j;
+
+ for (i = 0; i < cptab->ctb_nparts; i++) {
+ if (len <= 0)
+ goto err;
+
+ rc = snprintf(tmp, len, "%d\t:", i);
+ len -= rc;
+
+ if (len <= 0)
+ goto err;
+
+ tmp += rc;
+ for (j = 0; j < cptab->ctb_nparts; j++) {
+ rc = snprintf(tmp, len, " %d:%d", j,
+ cptab->ctb_parts[i].cpt_distance[j]);
+ len -= rc;
+ if (len <= 0)
+ goto err;
+ tmp += rc;
+ }
+
+ *tmp = '\n';
+ tmp++;
+ len--;
+ }
+
+ return tmp - buf;
+err:
+ return -E2BIG;
+}
+EXPORT_SYMBOL(cfs_cpt_distance_print);
+
int
cfs_cpt_number(struct cfs_cpt_table *cptab)
{
@@ -269,6 +318,18 @@ struct cfs_cpt_table *
}
EXPORT_SYMBOL(cfs_cpt_nodemask);

+unsigned int cfs_cpt_distance(struct cfs_cpt_table *cptab, int cpt1, int cpt2)
+{
+ LASSERT(cpt1 == CFS_CPT_ANY || (cpt1 >= 0 && cpt1 < cptab->ctb_nparts));
+ LASSERT(cpt2 == CFS_CPT_ANY || (cpt2 >= 0 && cpt2 < cptab->ctb_nparts));
+
+ if (cpt1 == CFS_CPT_ANY || cpt2 == CFS_CPT_ANY)
+ return cptab->ctb_distance;
+
+ return cptab->ctb_parts[cpt1].cpt_distance[cpt2];
+}
+EXPORT_SYMBOL(cfs_cpt_distance);
+
int
cfs_cpt_set_cpu(struct cfs_cpt_table *cptab, int cpt, int cpu)
{
--
1.8.3.1


2018-05-29 14:30:43

by James Simmons

[permalink] [raw]
Subject: [PATCH v2 07/25] staging: lustre: libcfs: replace num_possible_cpus() with nr_cpu_ids

From: Amir Shehata <[email protected]>

Move from num_possible_cpus() to nr_cpu_ids.

Signed-off-by: Amir Shehata <[email protected]>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-7734
Reviewed-on: http://review.whamcloud.com/18916
Reviewed-by: Olaf Weber <[email protected]>
Reviewed-by: Doug Oucharek <[email protected]>
Signed-off-by: James Simmons <[email protected]>
---
Changelog:

v1) Initial patch
v2) Rebased patch. Same code

drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
index d9d1388..3f855a8 100644
--- a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
+++ b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
@@ -89,14 +89,14 @@ struct cfs_cpt_table *
if (!cptab->ctb_nodemask)
goto failed_alloc_nodemask;

- cptab->ctb_cpu2cpt = kvmalloc_array(num_possible_cpus(),
+ cptab->ctb_cpu2cpt = kvmalloc_array(nr_cpu_ids,
sizeof(cptab->ctb_cpu2cpt[0]),
GFP_KERNEL);
if (!cptab->ctb_cpu2cpt)
goto failed_alloc_cpu2cpt;

memset(cptab->ctb_cpu2cpt, -1,
- num_possible_cpus() * sizeof(cptab->ctb_cpu2cpt[0]));
+ nr_cpu_ids * sizeof(cptab->ctb_cpu2cpt[0]));

cptab->ctb_parts = kvmalloc_array(ncpt, sizeof(cptab->ctb_parts[0]),
GFP_KERNEL);
--
1.8.3.1


2018-05-29 14:30:55

by James Simmons

[permalink] [raw]
Subject: [PATCH v2 04/25] staging: lustre: libcfs: properly handle failure cases in SMP code

While pushing the SMP work some bugs in the code were pointed out
by Dan Carpenter. Due to the single err label in cfs_cpu_init() and
cfs_cpt_table_alloc(), a few items that were never initialized were
being cleaned up. This can lead to crashes and other problems. In
those initialization functions introduce individual labels to jump
to, so that only the things that were initialized get freed on
failure.
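
A self-contained sketch of the unwind-ladder style this patch moves
to (the struct, names, and fail_step injection are illustrative): each
label frees only what was successfully allocated before the failing
step, instead of one catch-all label touching possibly-uninitialized
members.

```c
#include <stdlib.h>
#include <string.h>

struct tab {
	int *cpu2cpt;
	int *node2cpt;
};

/* fail_step lets a test inject a failure before a given allocation;
 * pass a negative value for the normal success path. */
static struct tab *tab_alloc(int ncpu, int nnode, int fail_step)
{
	struct tab *t = calloc(1, sizeof(*t));

	if (!t)
		return NULL;

	if (fail_step == 1)
		goto failed_alloc_cpu2cpt;
	t->cpu2cpt = malloc(ncpu * sizeof(int));
	if (!t->cpu2cpt)
		goto failed_alloc_cpu2cpt;
	memset(t->cpu2cpt, -1, ncpu * sizeof(int));

	if (fail_step == 2)
		goto failed_alloc_node2cpt;
	t->node2cpt = malloc(nnode * sizeof(int));
	if (!t->node2cpt)
		goto failed_alloc_node2cpt;
	memset(t->node2cpt, -1, nnode * sizeof(int));

	return t;

failed_alloc_node2cpt:
	free(t->cpu2cpt);	/* only what step 1 allocated */
failed_alloc_cpu2cpt:
	free(t);
	return NULL;
}
```

The labels fall through in reverse allocation order, so jumping to any
of them frees exactly the prefix of resources acquired so far.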

Signed-off-by: James Simmons <[email protected]>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-10932
Reviewed-on: https://review.whamcloud.com/32085
Reviewed-by: Dmitry Eremin <[email protected]>
Reviewed-by: Andreas Dilger <[email protected]>
Signed-off-by: James Simmons <[email protected]>
---
Changelog:

v1) New patch to make libcfs SMP code handle failure paths correctly.

drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c | 72 ++++++++++++++++++-------
1 file changed, 52 insertions(+), 20 deletions(-)

diff --git a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
index 34df7ed..b67a60c 100644
--- a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
+++ b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
@@ -81,17 +81,19 @@ struct cfs_cpt_table *

cptab->ctb_nparts = ncpt;

+ if (!zalloc_cpumask_var(&cptab->ctb_cpumask, GFP_NOFS))
+ goto failed_alloc_cpumask;
+
cptab->ctb_nodemask = kzalloc(sizeof(*cptab->ctb_nodemask),
GFP_NOFS);
- if (!zalloc_cpumask_var(&cptab->ctb_cpumask, GFP_NOFS) ||
- !cptab->ctb_nodemask)
- goto failed;
+ if (!cptab->ctb_nodemask)
+ goto failed_alloc_nodemask;

cptab->ctb_cpu2cpt = kvmalloc_array(num_possible_cpus(),
sizeof(cptab->ctb_cpu2cpt[0]),
GFP_KERNEL);
if (!cptab->ctb_cpu2cpt)
- goto failed;
+ goto failed_alloc_cpu2cpt;

memset(cptab->ctb_cpu2cpt, -1,
num_possible_cpus() * sizeof(cptab->ctb_cpu2cpt[0]));
@@ -99,22 +101,41 @@ struct cfs_cpt_table *
cptab->ctb_parts = kvmalloc_array(ncpt, sizeof(cptab->ctb_parts[0]),
GFP_KERNEL);
if (!cptab->ctb_parts)
- goto failed;
+ goto failed_alloc_ctb_parts;
+
+ memset(cptab->ctb_parts, -1, ncpt * sizeof(cptab->ctb_parts[0]));

for (i = 0; i < ncpt; i++) {
struct cfs_cpu_partition *part = &cptab->ctb_parts[i];

+ if (!zalloc_cpumask_var(&part->cpt_cpumask, GFP_NOFS))
+ goto failed_setting_ctb_parts;
+
part->cpt_nodemask = kzalloc(sizeof(*part->cpt_nodemask),
GFP_NOFS);
- if (!zalloc_cpumask_var(&part->cpt_cpumask, GFP_NOFS) ||
- !part->cpt_nodemask)
- goto failed;
+ if (!part->cpt_nodemask)
+ goto failed_setting_ctb_parts;
}

return cptab;

- failed:
- cfs_cpt_table_free(cptab);
+failed_setting_ctb_parts:
+ while (i-- >= 0) {
+ struct cfs_cpu_partition *part = &cptab->ctb_parts[i];
+
+ kfree(part->cpt_nodemask);
+ free_cpumask_var(part->cpt_cpumask);
+ }
+
+ kvfree(cptab->ctb_parts);
+failed_alloc_ctb_parts:
+ kvfree(cptab->ctb_cpu2cpt);
+failed_alloc_cpu2cpt:
+ kfree(cptab->ctb_nodemask);
+failed_alloc_nodemask:
+ free_cpumask_var(cptab->ctb_cpumask);
+failed_alloc_cpumask:
+ kfree(cptab);
return NULL;
}
EXPORT_SYMBOL(cfs_cpt_table_alloc);
@@ -940,7 +961,7 @@ static int cfs_cpu_dead(unsigned int cpu)
int
cfs_cpu_init(void)
{
- int ret = 0;
+ int ret;

LASSERT(!cfs_cpt_tab);

@@ -949,23 +970,23 @@ static int cfs_cpu_dead(unsigned int cpu)
"staging/lustre/cfe:dead", NULL,
cfs_cpu_dead);
if (ret < 0)
- goto failed;
+ goto failed_cpu_dead;
+
ret = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN,
"staging/lustre/cfe:online",
cfs_cpu_online, NULL);
if (ret < 0)
- goto failed;
+ goto failed_cpu_online;
+
lustre_cpu_online = ret;
#endif
- ret = -EINVAL;
-
get_online_cpus();
if (*cpu_pattern) {
char *cpu_pattern_dup = kstrdup(cpu_pattern, GFP_KERNEL);

if (!cpu_pattern_dup) {
CERROR("Failed to duplicate cpu_pattern\n");
- goto failed;
+ goto failed_alloc_table;
}

cfs_cpt_tab = cfs_cpt_table_create_pattern(cpu_pattern_dup);
@@ -973,7 +994,7 @@ static int cfs_cpu_dead(unsigned int cpu)
if (!cfs_cpt_tab) {
CERROR("Failed to create cptab from pattern %s\n",
cpu_pattern);
- goto failed;
+ goto failed_alloc_table;
}

} else {
@@ -981,7 +1002,7 @@ static int cfs_cpu_dead(unsigned int cpu)
if (!cfs_cpt_tab) {
CERROR("Failed to create ptable with npartitions %d\n",
cpu_npartitions);
- goto failed;
+ goto failed_alloc_table;
}
}

@@ -992,8 +1013,19 @@ static int cfs_cpu_dead(unsigned int cpu)
cfs_cpt_number(cfs_cpt_tab));
return 0;

- failed:
+failed_alloc_table:
put_online_cpus();
- cfs_cpu_fini();
+
+ if (cfs_cpt_tab)
+ cfs_cpt_table_free(cfs_cpt_tab);
+
+ ret = -EINVAL;
+#ifdef CONFIG_HOTPLUG_CPU
+ if (lustre_cpu_online > 0)
+ cpuhp_remove_state_nocalls(lustre_cpu_online);
+failed_cpu_online:
+ cpuhp_remove_state_nocalls(CPUHP_LUSTRE_CFS_DEAD);
+failed_cpu_dead:
+#endif
return ret;
}
--
1.8.3.1


2018-05-29 14:30:58

by James Simmons

[permalink] [raw]
Subject: [PATCH v2 02/25] staging: lustre: libcfs: remove useless CPU partition code

From: Dmitry Eremin <[email protected]>

* remove the scratch buffer and the mutex which guards it.
* remove the global cpumask and the spinlock which guards it.
* remove cpt_version, which was used to detect CPU state changes
  during setup; instead simply disable CPU state changes during setup.
* remove the whole global struct cfs_cpt_data cpt_data.
* remove a few unused APIs.
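The key enabler for the first two bullets, visible in the diff below: cpumask_of_node() returns a const pointer into existing topology data, so callers can pass it straight through instead of first copying it into a mutex-guarded scratch buffer. A minimal userspace sketch of the before/after pattern (all names invented for illustration; this is not the libcfs code itself):

```c
#include <assert.h>
#include <string.h>

typedef unsigned long mask_t;

/* stand-in for cpumask_of_node(): a const view into topology data */
static const mask_t node_masks[2] = { 0x0f, 0xf0 };
static const mask_t *mask_of_node(int node) { return &node_masks[node]; }

/* stand-in for cfs_cpt_set_cpumask(): returns 1 on success */
static int set_cpumask(const mask_t *mask) { return *mask != 0; }

/* old pattern: copy into a shared scratch buffer under a lock */
static int scratch_locked;
static mask_t scratch;

static int set_node_old(int node)
{
	int rc;

	scratch_locked = 1;		/* stand-in for mutex_lock() */
	memcpy(&scratch, mask_of_node(node), sizeof(scratch));
	rc = set_cpumask(&scratch);
	scratch_locked = 0;		/* stand-in for mutex_unlock() */
	return rc;
}

/* new pattern: use the const pointer directly, no lock, no copy */
static int set_node_new(int node)
{
	return set_cpumask(mask_of_node(node));
}
```

Both forms return the same result; the new one carries no global state at all, which is what allows the whole struct cfs_cpt_data to be deleted.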

Signed-off-by: Dmitry Eremin <[email protected]>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-8703
Reviewed-on: https://review.whamcloud.com/23303
Reviewed-on: https://review.whamcloud.com/25048
Reviewed-by: James Simmons <[email protected]>
Reviewed-by: Doug Oucharek <[email protected]>
Reviewed-by: Andreas Dilger <[email protected]>
Reviewed-by: Olaf Weber <[email protected]>
Reviewed-by: Oleg Drokin <[email protected]>
Signed-off-by: James Simmons <[email protected]>
---
Changelog:

v1) Initial patch
v2) Rebased to handle recent cleanups in libcfs

.../lustre/include/linux/libcfs/libcfs_cpu.h | 32 ++----
drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c | 115 +++------------------
2 files changed, 22 insertions(+), 125 deletions(-)

diff --git a/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h b/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h
index 2ad12a6..3626969 100644
--- a/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h
+++ b/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h
@@ -95,8 +95,6 @@ struct cfs_cpu_partition {
/** descriptor for CPU partitions */
struct cfs_cpt_table {
#ifdef CONFIG_SMP
- /* version, reserved for hotplug */
- unsigned int ctb_version;
/* spread rotor for NUMA allocator */
unsigned int ctb_spread_rotor;
/* # of CPU partitions */
@@ -176,12 +174,12 @@ struct cfs_cpt_table {
* return 1 if successfully set all CPUs, otherwise return 0
*/
int cfs_cpt_set_cpumask(struct cfs_cpt_table *cptab,
- int cpt, cpumask_t *mask);
+ int cpt, const cpumask_t *mask);
/**
* remove all cpus in \a mask from CPU partition \a cpt
*/
void cfs_cpt_unset_cpumask(struct cfs_cpt_table *cptab,
- int cpt, cpumask_t *mask);
+ int cpt, const cpumask_t *mask);
/**
* add all cpus in NUMA node \a node to CPU partition \a cpt
* return 1 if successfully set all CPUs, otherwise return 0
@@ -204,20 +202,11 @@ int cfs_cpt_set_nodemask(struct cfs_cpt_table *cptab,
void cfs_cpt_unset_nodemask(struct cfs_cpt_table *cptab,
int cpt, nodemask_t *mask);
/**
- * unset all cpus for CPU partition \a cpt
- */
-void cfs_cpt_clear(struct cfs_cpt_table *cptab, int cpt);
-/**
* convert partition id \a cpt to numa node id, if there are more than one
* nodes in this partition, it might return a different node id each time.
*/
int cfs_cpt_spread_node(struct cfs_cpt_table *cptab, int cpt);

-/**
- * return number of HTs in the same core of \a cpu
- */
-int cfs_cpu_ht_nsiblings(int cpu);
-
int cfs_cpu_init(void);
void cfs_cpu_fini(void);

@@ -304,13 +293,15 @@ static inline int cfs_cpt_table_print(struct cfs_cpt_table *cptab,
}

static inline int
-cfs_cpt_set_cpumask(struct cfs_cpt_table *cptab, int cpt, cpumask_t *mask)
+cfs_cpt_set_cpumask(struct cfs_cpt_table *cptab, int cpt,
+ const cpumask_t *mask)
{
return 1;
}

static inline void
-cfs_cpt_unset_cpumask(struct cfs_cpt_table *cptab, int cpt, cpumask_t *mask)
+cfs_cpt_unset_cpumask(struct cfs_cpt_table *cptab, int cpt,
+ const cpumask_t *mask)
{
}

@@ -336,11 +327,6 @@ static inline int cfs_cpt_table_print(struct cfs_cpt_table *cptab,
{
}

-static inline void
-cfs_cpt_clear(struct cfs_cpt_table *cptab, int cpt)
-{
-}
-
static inline int
cfs_cpt_spread_node(struct cfs_cpt_table *cptab, int cpt)
{
@@ -348,12 +334,6 @@ static inline int cfs_cpt_table_print(struct cfs_cpt_table *cptab,
}

static inline int
-cfs_cpu_ht_nsiblings(int cpu)
-{
- return 1;
-}
-
-static inline int
cfs_cpt_current(struct cfs_cpt_table *cptab, int remap)
{
return 0;
diff --git a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
index 803fc58..951a9ca 100644
--- a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
+++ b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
@@ -69,19 +69,6 @@
module_param(cpu_pattern, charp, 0444);
MODULE_PARM_DESC(cpu_pattern, "CPU partitions pattern");

-static struct cfs_cpt_data {
- /* serialize hotplug etc */
- spinlock_t cpt_lock;
- /* reserved for hotplug */
- unsigned long cpt_version;
- /* mutex to protect cpt_cpumask */
- struct mutex cpt_mutex;
- /* scratch buffer for set/unset_node */
- cpumask_var_t cpt_cpumask;
-} cpt_data;
-
-#define CFS_CPU_VERSION_MAGIC 0xbabecafe
-
struct cfs_cpt_table *
cfs_cpt_table_alloc(unsigned int ncpt)
{
@@ -124,11 +111,6 @@ struct cfs_cpt_table *
goto failed;
}

- spin_lock(&cpt_data.cpt_lock);
- /* Reserved for hotplug */
- cptab->ctb_version = cpt_data.cpt_version;
- spin_unlock(&cpt_data.cpt_lock);
-
return cptab;

failed:
@@ -203,17 +185,6 @@ struct cfs_cpt_table *
}
EXPORT_SYMBOL(cfs_cpt_table_print);

-static void
-cfs_node_to_cpumask(int node, cpumask_t *mask)
-{
- const cpumask_t *tmp = cpumask_of_node(node);
-
- if (tmp)
- cpumask_copy(mask, tmp);
- else
- cpumask_clear(mask);
-}
-
int
cfs_cpt_number(struct cfs_cpt_table *cptab)
{
@@ -366,7 +337,8 @@ struct cfs_cpt_table *
EXPORT_SYMBOL(cfs_cpt_unset_cpu);

int
-cfs_cpt_set_cpumask(struct cfs_cpt_table *cptab, int cpt, cpumask_t *mask)
+cfs_cpt_set_cpumask(struct cfs_cpt_table *cptab, int cpt,
+ const cpumask_t *mask)
{
int i;

@@ -387,7 +359,8 @@ struct cfs_cpt_table *
EXPORT_SYMBOL(cfs_cpt_set_cpumask);

void
-cfs_cpt_unset_cpumask(struct cfs_cpt_table *cptab, int cpt, cpumask_t *mask)
+cfs_cpt_unset_cpumask(struct cfs_cpt_table *cptab, int cpt,
+ const cpumask_t *mask)
{
int i;

@@ -399,7 +372,7 @@ struct cfs_cpt_table *
int
cfs_cpt_set_node(struct cfs_cpt_table *cptab, int cpt, int node)
{
- int rc;
+ const cpumask_t *mask;

if (node < 0 || node >= MAX_NUMNODES) {
CDEBUG(D_INFO,
@@ -407,34 +380,26 @@ struct cfs_cpt_table *
return 0;
}

- mutex_lock(&cpt_data.cpt_mutex);
-
- cfs_node_to_cpumask(node, cpt_data.cpt_cpumask);
-
- rc = cfs_cpt_set_cpumask(cptab, cpt, cpt_data.cpt_cpumask);
+ mask = cpumask_of_node(node);

- mutex_unlock(&cpt_data.cpt_mutex);
-
- return rc;
+ return cfs_cpt_set_cpumask(cptab, cpt, mask);
}
EXPORT_SYMBOL(cfs_cpt_set_node);

void
cfs_cpt_unset_node(struct cfs_cpt_table *cptab, int cpt, int node)
{
+ const cpumask_t *mask;
+
if (node < 0 || node >= MAX_NUMNODES) {
CDEBUG(D_INFO,
"Invalid NUMA id %d for CPU partition %d\n", node, cpt);
return;
}

- mutex_lock(&cpt_data.cpt_mutex);
-
- cfs_node_to_cpumask(node, cpt_data.cpt_cpumask);
-
- cfs_cpt_unset_cpumask(cptab, cpt, cpt_data.cpt_cpumask);
+ mask = cpumask_of_node(node);

- mutex_unlock(&cpt_data.cpt_mutex);
+ cfs_cpt_unset_cpumask(cptab, cpt, mask);
}
EXPORT_SYMBOL(cfs_cpt_unset_node);

@@ -462,26 +427,6 @@ struct cfs_cpt_table *
}
EXPORT_SYMBOL(cfs_cpt_unset_nodemask);

-void
-cfs_cpt_clear(struct cfs_cpt_table *cptab, int cpt)
-{
- int last;
- int i;
-
- if (cpt == CFS_CPT_ANY) {
- last = cptab->ctb_nparts - 1;
- cpt = 0;
- } else {
- last = cpt;
- }
-
- for (; cpt <= last; cpt++) {
- for_each_cpu(i, cptab->ctb_parts[cpt].cpt_cpumask)
- cfs_cpt_unset_cpu(cptab, cpt, i);
- }
-}
-EXPORT_SYMBOL(cfs_cpt_clear);
-
int
cfs_cpt_spread_node(struct cfs_cpt_table *cptab, int cpt)
{
@@ -754,7 +699,7 @@ struct cfs_cpt_table *
}

for_each_online_node(i) {
- cfs_node_to_cpumask(i, mask);
+ cpumask_copy(mask, cpumask_of_node(i));

while (!cpumask_empty(mask)) {
struct cfs_cpu_partition *part;
@@ -960,16 +905,8 @@ struct cfs_cpt_table *
#ifdef CONFIG_HOTPLUG_CPU
static enum cpuhp_state lustre_cpu_online;

-static void cfs_cpu_incr_cpt_version(void)
-{
- spin_lock(&cpt_data.cpt_lock);
- cpt_data.cpt_version++;
- spin_unlock(&cpt_data.cpt_lock);
-}
-
static int cfs_cpu_online(unsigned int cpu)
{
- cfs_cpu_incr_cpt_version();
return 0;
}

@@ -977,14 +914,9 @@ static int cfs_cpu_dead(unsigned int cpu)
{
bool warn;

- cfs_cpu_incr_cpt_version();
-
- mutex_lock(&cpt_data.cpt_mutex);
/* if all HTs in a core are offline, it may break affinity */
- cpumask_copy(cpt_data.cpt_cpumask, topology_sibling_cpumask(cpu));
- warn = cpumask_any_and(cpt_data.cpt_cpumask,
+ warn = cpumask_any_and(topology_sibling_cpumask(cpu),
cpu_online_mask) >= nr_cpu_ids;
- mutex_unlock(&cpt_data.cpt_mutex);
CDEBUG(warn ? D_WARNING : D_INFO,
"Lustre: can't support CPU plug-out well now, performance and stability could be impacted [CPU %u]\n",
cpu);
@@ -1003,7 +935,6 @@ static int cfs_cpu_dead(unsigned int cpu)
cpuhp_remove_state_nocalls(lustre_cpu_online);
cpuhp_remove_state_nocalls(CPUHP_LUSTRE_CFS_DEAD);
#endif
- free_cpumask_var(cpt_data.cpt_cpumask);
}

int
@@ -1013,16 +944,6 @@ static int cfs_cpu_dead(unsigned int cpu)

LASSERT(!cfs_cpt_tab);

- memset(&cpt_data, 0, sizeof(cpt_data));
-
- if (!zalloc_cpumask_var(&cpt_data.cpt_cpumask, GFP_NOFS)) {
- CERROR("Failed to allocate scratch buffer\n");
- return -1;
- }
-
- spin_lock_init(&cpt_data.cpt_lock);
- mutex_init(&cpt_data.cpt_mutex);
-
#ifdef CONFIG_HOTPLUG_CPU
ret = cpuhp_setup_state_nocalls(CPUHP_LUSTRE_CFS_DEAD,
"staging/lustre/cfe:dead", NULL,
@@ -1038,6 +959,7 @@ static int cfs_cpu_dead(unsigned int cpu)
#endif
ret = -EINVAL;

+ get_online_cpus();
if (*cpu_pattern) {
char *cpu_pattern_dup = kstrdup(cpu_pattern, GFP_KERNEL);

@@ -1063,13 +985,7 @@ static int cfs_cpu_dead(unsigned int cpu)
}
}

- spin_lock(&cpt_data.cpt_lock);
- if (cfs_cpt_tab->ctb_version != cpt_data.cpt_version) {
- spin_unlock(&cpt_data.cpt_lock);
- CERROR("CPU hotplug/unplug during setup\n");
- goto failed;
- }
- spin_unlock(&cpt_data.cpt_lock);
+ put_online_cpus();

LCONSOLE(0, "HW nodes: %d, HW CPU cores: %d, npartitions: %d\n",
num_online_nodes(), num_online_cpus(),
@@ -1077,6 +993,7 @@ static int cfs_cpu_dead(unsigned int cpu)
return 0;

failed:
+ put_online_cpus();
cfs_cpu_fini();
return ret;
}
--
1.8.3.1


2018-05-29 14:31:06

by James Simmons

[permalink] [raw]
Subject: [PATCH v2 01/25] staging: lustre: libcfs: restore UMP handling

With the cleanup of the libcfs SMP handling, all UMP handling
was removed. As a result, various NULL pointers and empty
fields are now returned in the UMP case, which causes Lustre
to crash hard. Restore proper UMP handling so Lustre can
function correctly.

Signed-off-by: James Simmons <[email protected]>
Signed-off-by: Amir Shehata <[email protected]>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-7734
Reviewed-on: http://review.whamcloud.com/18916
Reviewed-by: Olaf Weber <[email protected]>
Reviewed-by: Doug Oucharek <[email protected]>
Signed-off-by: James Simmons <[email protected]>
---
Changelog:

v1) New patch to handle the disappearance of UMP support

.../lustre/include/linux/libcfs/libcfs_cpu.h | 87 ++++++++++++++++------
drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c | 4 -
drivers/staging/lustre/lnet/libcfs/module.c | 4 +
3 files changed, 69 insertions(+), 26 deletions(-)

diff --git a/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h b/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h
index 61641c4..2ad12a6 100644
--- a/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h
+++ b/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h
@@ -74,6 +74,7 @@

#include <linux/cpu.h>
#include <linux/cpuset.h>
+#include <linux/slab.h>
#include <linux/topology.h>

/* any CPU partition */
@@ -89,10 +90,11 @@ struct cfs_cpu_partition {
/* spread rotor for NUMA allocator */
unsigned int cpt_spread_rotor;
};
-
+#endif /* CONFIG_SMP */

/** descriptor for CPU partitions */
struct cfs_cpt_table {
+#ifdef CONFIG_SMP
/* version, reserved for hotplug */
unsigned int ctb_version;
/* spread rotor for NUMA allocator */
@@ -103,14 +105,26 @@ struct cfs_cpt_table {
struct cfs_cpu_partition *ctb_parts;
/* shadow HW CPU to CPU partition ID */
int *ctb_cpu2cpt;
- /* all cpus in this partition table */
- cpumask_var_t ctb_cpumask;
/* all nodes in this partition table */
nodemask_t *ctb_nodemask;
+#else
+ nodemask_t ctb_nodemask;
+#endif /* CONFIG_SMP */
+ /* all cpus in this partition table */
+ cpumask_var_t ctb_cpumask;
};

extern struct cfs_cpt_table *cfs_cpt_tab;

+#ifdef CONFIG_SMP
+/**
+ * destroy a CPU partition table
+ */
+void cfs_cpt_table_free(struct cfs_cpt_table *cptab);
+/**
+ * create a cfs_cpt_table with \a ncpt number of partitions
+ */
+struct cfs_cpt_table *cfs_cpt_table_alloc(unsigned int ncpt);
/**
* return cpumask of CPU partition \a cpt
*/
@@ -208,20 +222,52 @@ void cfs_cpt_unset_nodemask(struct cfs_cpt_table *cptab,
void cfs_cpu_fini(void);

#else /* !CONFIG_SMP */
-struct cfs_cpt_table;
-#define cfs_cpt_tab ((struct cfs_cpt_table *)NULL)

-static inline cpumask_var_t *
-cfs_cpt_cpumask(struct cfs_cpt_table *cptab, int cpt)
+static inline void cfs_cpt_table_free(struct cfs_cpt_table *cptab)
{
- return NULL;
+ kfree(cptab);
}

-static inline int
-cfs_cpt_table_print(struct cfs_cpt_table *cptab, char *buf, int len)
+static inline struct cfs_cpt_table *cfs_cpt_table_alloc(int ncpt)
{
- return 0;
+ struct cfs_cpt_table *cptab;
+
+ if (ncpt != 1)
+ return NULL;
+
+ cptab = kzalloc(sizeof(*cptab), GFP_NOFS);
+ if (!cptab)
+ return NULL;
+
+ if (!zalloc_cpumask_var(&cptab->ctb_cpumask, GFP_NOFS)) {
+ kfree(cptab);
+ return NULL;
+ }
+ cpumask_set_cpu(0, cptab->ctb_cpumask);
+ node_set(0, cptab->ctb_nodemask);
+
+ return cptab;
+}
+
+static inline int cfs_cpt_table_print(struct cfs_cpt_table *cptab,
+ char *buf, int len)
+{
+ int rc;
+
+ rc = snprintf(buf, len, "0\t: 0\n");
+ len -= rc;
+ if (len <= 0)
+ return -EFBIG;
+
+ return rc;
}
+
+static inline cpumask_var_t *
+cfs_cpt_cpumask(struct cfs_cpt_table *cptab, int cpt)
+{
+ return &cptab->ctb_cpumask;
+}
+
static inline int
cfs_cpt_number(struct cfs_cpt_table *cptab)
{
@@ -243,7 +289,7 @@ void cfs_cpt_unset_nodemask(struct cfs_cpt_table *cptab,
static inline nodemask_t *
cfs_cpt_nodemask(struct cfs_cpt_table *cptab, int cpt)
{
- return NULL;
+ return &cptab->ctb_nodemask;
}

static inline int
@@ -328,24 +374,21 @@ void cfs_cpt_unset_nodemask(struct cfs_cpt_table *cptab,
static inline int
cfs_cpu_init(void)
{
- return 0;
+ cfs_cpt_tab = cfs_cpt_table_alloc(1);
+
+ return cfs_cpt_tab ? 0 : -1;
}

static inline void cfs_cpu_fini(void)
{
+ if (cfs_cpt_tab) {
+ cfs_cpt_table_free(cfs_cpt_tab);
+ cfs_cpt_tab = NULL;
+ }
}

#endif /* CONFIG_SMP */

-/**
- * destroy a CPU partition table
- */
-void cfs_cpt_table_free(struct cfs_cpt_table *cptab);
-/**
- * create a cfs_cpt_table with \a ncpt number of partitions
- */
-struct cfs_cpt_table *cfs_cpt_table_alloc(unsigned int ncpt);
-
/*
* allocate per-cpu-partition data, returned value is an array of pointers,
* variable can be indexed by CPU ID.
diff --git a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
index 3d1cf45..803fc58 100644
--- a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
+++ b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
@@ -41,10 +41,6 @@
#include <linux/libcfs/libcfs_string.h>
#include <linux/libcfs/libcfs.h>

-/** Global CPU partition table */
-struct cfs_cpt_table *cfs_cpt_tab __read_mostly;
-EXPORT_SYMBOL(cfs_cpt_tab);
-
/**
* modparam for setting number of partitions
*
diff --git a/drivers/staging/lustre/lnet/libcfs/module.c b/drivers/staging/lustre/lnet/libcfs/module.c
index 5dc7de9..b438d456 100644
--- a/drivers/staging/lustre/lnet/libcfs/module.c
+++ b/drivers/staging/lustre/lnet/libcfs/module.c
@@ -66,6 +66,10 @@ struct lnet_debugfs_symlink_def {

static struct dentry *lnet_debugfs_root;

+/** Global CPU partition table */
+struct cfs_cpt_table *cfs_cpt_tab __read_mostly;
+EXPORT_SYMBOL(cfs_cpt_tab);
+
BLOCKING_NOTIFIER_HEAD(libcfs_ioctl_list);
EXPORT_SYMBOL(libcfs_ioctl_list);

--
1.8.3.1


2018-05-29 23:46:19

by NeilBrown

[permalink] [raw]
Subject: Re: [PATCH v2 01/25] staging: lustre: libcfs: restore UMP handling

On Tue, May 29 2018, James Simmons wrote:

> With the cleanup of the libcfs SMP handling all UMP handling
> was removed. In the process now various NULL pointers and
> empty fields are return in the UMP case which causes lustre
> to crash hard. Restore the proper UMP handling so Lustre can
> properly function.

Can't we just get lustre to handle the NULL pointer?
In most cases, the pointer is accessed through an accessor function, and
on !CONFIG_SMP, that can be a static inline that doesn't even look at
the pointer.
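To sketch what I mean (a hedged illustration with invented stand-in names; the real libcfs accessors differ): a static inline that never dereferences its argument makes a NULL table perfectly safe on UP builds.

```c
#include <assert.h>
#include <stddef.h>

struct cpt_table {
	int nparts;
};

#ifdef CONFIG_SMP
static inline int cpt_number(const struct cpt_table *t)
{
	return t->nparts;	/* SMP: real lookup in the table */
}
#else
static inline int cpt_number(const struct cpt_table *t)
{
	(void)t;		/* UP: argument is never dereferenced */
	return 1;		/* a UP system is always one partition */
}
#endif
```

With CONFIG_SMP unset, cpt_number(NULL) is well defined and returns 1, so the `#define cfs_cpt_tab ((struct cfs_cpt_table *)NULL)` scheme works without allocating anything at all.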

I really think this is a step backwards. If you can identify specific
problems caused by the current code, I'm sure we can fix them.

>
> Signed-off-by: James Simmons <[email protected]>
> Signed-off-by: Amir Shehata <[email protected]>
> Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-7734

This bug doesn't seem to mention this patch at all

> Reviewed-on: http://review.whamcloud.com/18916

Nor does this review.

Thanks,
NeilBrown


> Reviewed-by: Olaf Weber <[email protected]>
> Reviewed-by: Doug Oucharek <[email protected]>
> Signed-off-by: James Simmons <[email protected]>
> ---
> Changelog:
>
> v1) New patch to handle the disappearence of UMP support
>
> .../lustre/include/linux/libcfs/libcfs_cpu.h | 87 ++++++++++++++++------
> drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c | 4 -
> drivers/staging/lustre/lnet/libcfs/module.c | 4 +
> 3 files changed, 69 insertions(+), 26 deletions(-)
>
> diff --git a/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h b/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h
> index 61641c4..2ad12a6 100644
> --- a/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h
> +++ b/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h
> @@ -74,6 +74,7 @@
>
> #include <linux/cpu.h>
> #include <linux/cpuset.h>
> +#include <linux/slab.h>
> #include <linux/topology.h>
>
> /* any CPU partition */
> @@ -89,10 +90,11 @@ struct cfs_cpu_partition {
> /* spread rotor for NUMA allocator */
> unsigned int cpt_spread_rotor;
> };
> -
> +#endif /* CONFIG_SMP */
>
> /** descriptor for CPU partitions */
> struct cfs_cpt_table {
> +#ifdef CONFIG_SMP
> /* version, reserved for hotplug */
> unsigned int ctb_version;
> /* spread rotor for NUMA allocator */
> @@ -103,14 +105,26 @@ struct cfs_cpt_table {
> struct cfs_cpu_partition *ctb_parts;
> /* shadow HW CPU to CPU partition ID */
> int *ctb_cpu2cpt;
> - /* all cpus in this partition table */
> - cpumask_var_t ctb_cpumask;
> /* all nodes in this partition table */
> nodemask_t *ctb_nodemask;
> +#else
> + nodemask_t ctb_nodemask;
> +#endif /* CONFIG_SMP */
> + /* all cpus in this partition table */
> + cpumask_var_t ctb_cpumask;
> };
>
> extern struct cfs_cpt_table *cfs_cpt_tab;
>
> +#ifdef CONFIG_SMP
> +/**
> + * destroy a CPU partition table
> + */
> +void cfs_cpt_table_free(struct cfs_cpt_table *cptab);
> +/**
> + * create a cfs_cpt_table with \a ncpt number of partitions
> + */
> +struct cfs_cpt_table *cfs_cpt_table_alloc(unsigned int ncpt);
> /**
> * return cpumask of CPU partition \a cpt
> */
> @@ -208,20 +222,52 @@ void cfs_cpt_unset_nodemask(struct cfs_cpt_table *cptab,
> void cfs_cpu_fini(void);
>
> #else /* !CONFIG_SMP */
> -struct cfs_cpt_table;
> -#define cfs_cpt_tab ((struct cfs_cpt_table *)NULL)
>
> -static inline cpumask_var_t *
> -cfs_cpt_cpumask(struct cfs_cpt_table *cptab, int cpt)
> +static inline void cfs_cpt_table_free(struct cfs_cpt_table *cptab)
> {
> - return NULL;
> + kfree(cptab);
> }
>
> -static inline int
> -cfs_cpt_table_print(struct cfs_cpt_table *cptab, char *buf, int len)
> +static inline struct cfs_cpt_table *cfs_cpt_table_alloc(int ncpt)
> {
> - return 0;
> + struct cfs_cpt_table *cptab;
> +
> + if (ncpt != 1)
> + return NULL;
> +
> + cptab = kzalloc(sizeof(*cptab), GFP_NOFS);
> + if (!cptab)
> + return NULL;
> +
> + if (!zalloc_cpumask_var(&cptab->ctb_cpumask, GFP_NOFS)) {
> + kfree(cptab);
> + return NULL;
> + }
> + cpumask_set_cpu(0, cptab->ctb_cpumask);
> + node_set(0, cptab->ctb_nodemask);
> +
> + return cptab;
> +}
> +
> +static inline int cfs_cpt_table_print(struct cfs_cpt_table *cptab,
> + char *buf, int len)
> +{
> + int rc;
> +
> + rc = snprintf(buf, len, "0\t: 0\n");
> + len -= rc;
> + if (len <= 0)
> + return -EFBIG;
> +
> + return rc;
> }
> +
> +static inline cpumask_var_t *
> +cfs_cpt_cpumask(struct cfs_cpt_table *cptab, int cpt)
> +{
> + return &cptab->ctb_cpumask;
> +}
> +
> static inline int
> cfs_cpt_number(struct cfs_cpt_table *cptab)
> {
> @@ -243,7 +289,7 @@ void cfs_cpt_unset_nodemask(struct cfs_cpt_table *cptab,
> static inline nodemask_t *
> cfs_cpt_nodemask(struct cfs_cpt_table *cptab, int cpt)
> {
> - return NULL;
> + return &cptab->ctb_nodemask;
> }
>
> static inline int
> @@ -328,24 +374,21 @@ void cfs_cpt_unset_nodemask(struct cfs_cpt_table *cptab,
> static inline int
> cfs_cpu_init(void)
> {
> - return 0;
> + cfs_cpt_tab = cfs_cpt_table_alloc(1);
> +
> + return cfs_cpt_tab ? 0 : -1;
> }
>
> static inline void cfs_cpu_fini(void)
> {
> + if (cfs_cpt_tab) {
> + cfs_cpt_table_free(cfs_cpt_tab);
> + cfs_cpt_tab = NULL;
> + }
> }
>
> #endif /* CONFIG_SMP */
>
> -/**
> - * destroy a CPU partition table
> - */
> -void cfs_cpt_table_free(struct cfs_cpt_table *cptab);
> -/**
> - * create a cfs_cpt_table with \a ncpt number of partitions
> - */
> -struct cfs_cpt_table *cfs_cpt_table_alloc(unsigned int ncpt);
> -
> /*
> * allocate per-cpu-partition data, returned value is an array of pointers,
> * variable can be indexed by CPU ID.
> diff --git a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
> index 3d1cf45..803fc58 100644
> --- a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
> +++ b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
> @@ -41,10 +41,6 @@
> #include <linux/libcfs/libcfs_string.h>
> #include <linux/libcfs/libcfs.h>
>
> -/** Global CPU partition table */
> -struct cfs_cpt_table *cfs_cpt_tab __read_mostly;
> -EXPORT_SYMBOL(cfs_cpt_tab);
> -
> /**
> * modparam for setting number of partitions
> *
> diff --git a/drivers/staging/lustre/lnet/libcfs/module.c b/drivers/staging/lustre/lnet/libcfs/module.c
> index 5dc7de9..b438d456 100644
> --- a/drivers/staging/lustre/lnet/libcfs/module.c
> +++ b/drivers/staging/lustre/lnet/libcfs/module.c
> @@ -66,6 +66,10 @@ struct lnet_debugfs_symlink_def {
>
> static struct dentry *lnet_debugfs_root;
>
> +/** Global CPU partition table */
> +struct cfs_cpt_table *cfs_cpt_tab __read_mostly;
> +EXPORT_SYMBOL(cfs_cpt_tab);
> +
> BLOCKING_NOTIFIER_HEAD(libcfs_ioctl_list);
> EXPORT_SYMBOL(libcfs_ioctl_list);
>
> --
> 1.8.3.1




2018-05-30 10:06:48

by Dan Carpenter

[permalink] [raw]
Subject: Re: [PATCH v2 01/25] staging: lustre: libcfs: restore UMP handling

On Tue, May 29, 2018 at 10:21:41AM -0400, James Simmons wrote:
> @@ -208,20 +222,52 @@ void cfs_cpt_unset_nodemask(struct cfs_cpt_table *cptab,
> void cfs_cpu_fini(void);
>
> #else /* !CONFIG_SMP */
> -struct cfs_cpt_table;
> -#define cfs_cpt_tab ((struct cfs_cpt_table *)NULL)
>
> -static inline cpumask_var_t *
> -cfs_cpt_cpumask(struct cfs_cpt_table *cptab, int cpt)
> +static inline void cfs_cpt_table_free(struct cfs_cpt_table *cptab)
> {
> - return NULL;
> + kfree(cptab);
> }

This should free cptab->ctb_cpumask?
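i.e. the separately allocated member has to be released before the containing struct, otherwise it leaks. A hedged userspace analogue of the ordering (invented names, not the libcfs code):

```c
#include <assert.h>
#include <stdlib.h>

struct table {
	unsigned long *mask;	/* separately allocated member */
};

static struct table *table_alloc(void)
{
	struct table *t = calloc(1, sizeof(*t));

	if (!t)
		return NULL;
	t->mask = calloc(1, sizeof(*t->mask));
	if (!t->mask) {
		free(t);
		return NULL;
	}
	return t;
}

static void table_free(struct table *t)
{
	free(t->mask);	/* member first, or its allocation leaks */
	free(t);
}
```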

regards,
dan carpenter


2018-06-01 08:29:53

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH v2 00/25] staging: lustre: libcfs: SMP rework

On Tue, May 29, 2018 at 10:21:40AM -0400, James Simmons wrote:
> From: James Simmons <[email protected]>
>
> Recently lustre support has been expanded to extreme machines with as
> many as a 1000+ cores. On the other end lustre also has been ported
> to platforms like ARM and KNL which have uniquie NUMA and core setup.
> For example some devices exist that have NUMA nodes with no cores.
> With these new platforms the limitations of the Lustre's SMP code
> came to light so a lot of work was needed. This resulted in this
> patch set which has been tested on these platforms.

That's great work, but why is this happening instead of effort to get
this out of the staging tree? I see a mix of "clean this up" combined
with "add this new feature" happening here, and I am getting tired of
it.

I keep saying, "no new features until this gets out of staging", so why
isn't anyone working directly on getting this out of staging? I can
only assume the reason why is because I keep being "nice" and accepting
random new feature additions :(

So, no more, I'm really really really tired of dealing with this
filesystem for all of these years. There is still loads to do to get
this cleaned up and moved out (as my simple debugfs cleanup patch series
showed). So please do it.

Again, I am not going to take any more new feature additions or anything
that does not obviously look like an attempt to get this code cleaned up
into mergable shape. Adding things like "now works on systems with
thousands of CPUs as well as an RPi" is not such work, unless you can
directly show me an end result of cleaner, smaller, and more easily
reviewable code.

This patch series is now dropped, if you think it meets the above
criteria, please feel free to resend it to be judged in this manner.

thanks,

greg k-h

2018-06-01 08:32:13

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH v2 v2 07/25] staging: lustre: libcfs: replace num_possible_cpus() with nr_cpu_ids

On Tue, May 29, 2018 at 10:21:47AM -0400, James Simmons wrote:
> From: Amir Shehata <[email protected]>
>
> Move from num_possible_cpus() to nr_cpu_ids.

Nit, when doing a "v2" of a patch, in a series that is not all "v2", put
the moniker after the patch number. Otherwise it does not sort properly
in anyone's email client and I have to manually edit it in order to get
it to apply in the correct order.

Remember, make things impossible for me to reject. This is trivial for
me to reject on that issue alone, given that I deal with hundreds of
staging patches a week...

thanks,

greg k-h

2018-06-01 08:34:23

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH v2 v2 07/25] staging: lustre: libcfs: replace num_possible_cpus() with nr_cpu_ids

On Tue, May 29, 2018 at 10:21:47AM -0400, James Simmons wrote:
> From: Amir Shehata <[email protected]>
>
> Move from num_possible_cpus() to nr_cpu_ids.

This says what you did, not _why_ you did it. If these functions are
really identical, why do we have both of them? :)
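For what it's worth, they are not identical in general: nr_cpu_ids is the highest possible CPU id plus one, while num_possible_cpus() is the population count of cpu_possible_mask, and the two diverge when the possible mask is sparse. A hedged sketch of the distinction on a plain bitmask (invented names):

```c
#include <assert.h>

/* possible CPUs 0, 2 and 4: a sparse mask */
static const unsigned long possible_mask =
	(1UL << 0) | (1UL << 2) | (1UL << 4);

/* analogue of num_possible_cpus(): how many CPUs exist */
static int num_possible(unsigned long mask)
{
	return __builtin_popcountl(mask);
}

/* analogue of nr_cpu_ids: smallest size indexable by any CPU id */
static int nr_ids(unsigned long mask)
{
	if (!mask)
		return 0;
	return (int)(sizeof(mask) * 8) - __builtin_clzl(mask);
}
```

Arrays indexed by raw CPU id, like ctb_cpu2cpt in this series, therefore need to be sized by nr_cpu_ids, not num_possible_cpus().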

greg k-h

2018-06-01 08:35:03

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH v2 18/25] staging: lustre: libcfs: rename goto label in cfs_cpt_table_print

On Tue, May 29, 2018 at 10:21:58AM -0400, James Simmons wrote:
> From: Dmitry Eremin <[email protected]>
>
> Change goto label out to err.

Again, this says what it does, not why.


2018-06-01 08:40:07

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH v2 08/25] staging: lustre: libcfs: NUMA support

On Tue, May 29, 2018 at 10:21:48AM -0400, James Simmons wrote:
> From: Amir Shehata <[email protected]>
>
> This patch adds NUMA node support.

Really? It looks like you just added an empty data structure pointer
that doesn't really do anything at all.

Where are you reading the host memory NUMA information from?

And why would a filesystem care about this type of thing? Are you
going to now mirror what the scheduler does with regards to NUMA
topology issues? How are you going to handle things when the topology
changes? What systems did you test this on? What performance
improvements were seen? What downsides are there with all of this?

I need a whole lot more information here...

> NUMA node information is stored
> in the CPT table. A NUMA node mask is maintained for the entire
> table as well as for each CPT to track the NUMA nodes related to
> each of the CPTs. Add new function cfs_cpt_of_node() which returns
> the CPT of a particular NUMA node.

This doesn't really seem to match up with the code changes from what I
can tell...

thanks,

greg k-h

2018-06-01 08:40:50

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH v2 13/25] staging: lustre: libcfs: fix libcfs_cpu coding style

On Tue, May 29, 2018 at 10:21:53AM -0400, James Simmons wrote:
> int cfs_cpt_set_nodemask(struct cfs_cpt_table *cptab,
> - int cpt, nodemask_t *mask);
> + int cpt, const nodemask_t *mask);

This is not a coding style change. It is a semi-functional change.
Don't bury code changes in the middle of "cleanup code formatting"
patches, it makes maintainers very grumpy when we notice this as we then
feel like we might have missed other things being snuck by us in the
guise of these types of patches...

thanks,

greg k-h

2018-06-01 08:42:14

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH v2 11/25] staging: lustre: libcfs: provide debugfs files for distance handling

On Tue, May 29, 2018 at 10:21:51AM -0400, James Simmons wrote:
> From: Amir Shehata <[email protected]>
>
> On systems with large number of NUMA nodes and cores it is easy
> to incorrectly configure their use with Lustre. Provide debugfs
> files which can help track down any issues.
>
> Signed-off-by: Amir Shehata <[email protected]>
> Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-7734
> Reviewed-on: http://review.whamcloud.com/18916
> Reviewed-by: Olaf Weber <[email protected]>
> Reviewed-by: Doug Oucharek <[email protected]>
> Signed-off-by: James Simmons <[email protected]>
> ---
> Changelog:
>
> v1) Initial patch
> v2) Rebased patch. No code changes from original patch
>
> drivers/staging/lustre/lnet/libcfs/module.c | 53 +++++++++++++++++++++++++++++
> 1 file changed, 53 insertions(+)
>
> diff --git a/drivers/staging/lustre/lnet/libcfs/module.c b/drivers/staging/lustre/lnet/libcfs/module.c
> index b438d456..d2dfc29 100644
> --- a/drivers/staging/lustre/lnet/libcfs/module.c
> +++ b/drivers/staging/lustre/lnet/libcfs/module.c
> @@ -468,6 +468,53 @@ static int proc_cpt_table(struct ctl_table *table, int write,
> __proc_cpt_table);
> }
>
> +static int __proc_cpt_distance(void *data, int write,
> + loff_t pos, void __user *buffer, int nob)
> +{
> + char *buf = NULL;
> + int len = 4096;
> + int rc = 0;
> +
> + if (write)
> + return -EPERM;
> +
> + LASSERT(cfs_cpt_tab);

What is this assert really checking? Why is it needed?

thanks,

greg k-h

2018-06-01 08:45:12

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH v2 23/25] staging: lustre: libcfs: rework CPU pattern parsing code

On Tue, May 29, 2018 at 10:22:03AM -0400, James Simmons wrote:
> From: Dmitry Eremin <[email protected]>
>
> Currently the module param string for CPU pattern can be
> modified which is wrong. Rewrite CPU pattern parsing code
> to avoid the passed buffer from being changed. This change
> also enables us to add real errors propogation to the caller
> functions.
>
> Signed-off-by: Dmitry Eremin <[email protected]>
> Signed-off-by: Amir Shehata <[email protected]>
> Signed-off-by: Andreas Dilger <[email protected]>
> Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-8703
> Reviewed-on: https://review.whamcloud.com/23306
> Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-9715
> Reviewed-on: https://review.whamcloud.com/27872
> Reviewed-by: James Simmons <[email protected]>
> Reviewed-by: Andreas Dilger <[email protected]>
> Reviewed-by: Patrick Farrell <[email protected]>
> Reviewed-by: Olaf Weber <[email protected]>
> Reviewed-by: Oleg Drokin <[email protected]>
> Signed-off-by: James Simmons <[email protected]>
> ---
> Changelog:
>
> v1) Initial patch
> v2) Rebased patch. No changes in code from earlier patch
>
> .../lustre/include/linux/libcfs/libcfs_cpu.h | 2 +-
> drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c | 146 ++++++++++++---------
> 2 files changed, 87 insertions(+), 61 deletions(-)
>
> diff --git a/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h b/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h
> index c0aa0b3..12ed0a9 100644
> --- a/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h
> +++ b/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h
> @@ -393,7 +393,7 @@ static inline int cfs_cpu_init(void)
>
> static inline void cfs_cpu_fini(void)
> {
> - if (cfs_cpt_tab) {
> + if (!IS_ERR_OR_NULL(cfs_cpt_tab)) {
> cfs_cpt_table_free(cfs_cpt_tab);
> cfs_cpt_tab = NULL;
> }
> diff --git a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
> index 649f7f9..aed48de 100644
> --- a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
> +++ b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
> @@ -692,11 +692,11 @@ int cfs_cpt_bind(struct cfs_cpt_table *cptab, int cpt)
> nodemask = cptab->ctb_parts[cpt].cpt_nodemask;
> }
>
> - if (cpumask_any_and(*cpumask, cpu_online_mask) >= nr_cpu_ids) {
> + if (!cpumask_intersects(*cpumask, cpu_online_mask)) {
> CDEBUG(D_INFO,
> "No online CPU found in CPU partition %d, did someone do CPU hotplug on system? You might need to reload Lustre modules to keep system working well.\n",
> cpt);

This is the funniest error message I have seen in a while.

No one should have to reload all kernel modules just because the CPU
topology changed, that's crazy. You have the ability to read all of
this at runtime, and react to changes that happen while the system is
running. You should never need/rely on userspace passing in random
strings to pretend to match up with what the system really has at the
moment, that way lies madness.

All of this should be ripped out and use the proper apis instead. No
special userspace api should be needed at all.

greg k-h

2018-06-13 22:03:10

by James Simmons

[permalink] [raw]
Subject: Re: [PATCH v2 01/25] staging: lustre: libcfs: restore UMP handling


> > With the cleanup of the libcfs SMP handling all UMP handling
> > was removed. In the process various NULL pointers and
> > empty fields are now returned in the UMP case, which causes lustre
> > to crash hard. Restore the proper UMP handling so Lustre can
> > properly function.
>
> Can't we just get lustre to handle the NULL pointer?
> In most cases, the pointer is accessed through an accessor function, and
> on !CONFIG_SMP, that can be a static inline that doesn't even look at
> the pointer.

Lots of NULL pointer checks for a structure allocated at libcfs module
start and only cleaned up at libcfs removal is not a clean approach.
So I have thought about it, and I have to ask why we allocate a global
struct cfs_cpu_table at all. It could be made static and filled in, which
would avoid the whole NULL pointer issue. Plus, for the UMP case, why
allocate a new cfs_cpu_table with cfs_cpt_table_alloc() when it is exactly
like the default UMP cfs_cpu_table? Instead we could just return a pointer
to the static default cfs_cpt_tab every time. We still have the NULL
ctb_cpumask field to deal with. Does that sound like a better solution
to you? Doug, what do you think?

> I really think this is a step backwards. If you can identify specific
> problems caused by the current code, I'm sure we can fix them.
>
> >
> > Signed-off-by: James Simmons <[email protected]>
> > Signed-off-by: Amir Shehata <[email protected]>
> > Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-7734
>
> This bug doesn't seem to mention this patch at all
>
> > Reviewed-on: http://review.whamcloud.com/18916
>
> Nor does this review.

Yeah, it's mutated so much from what is in the Intel tree.
I do believe it was the last patch to touch this.

2018-06-13 22:28:33

by Doug Oucharek

[permalink] [raw]
Subject: Re: [lustre-devel] [PATCH v2 01/25] staging: lustre: libcfs: restore UMP handling


> On Jun 13, 2018, at 3:02 PM, James Simmons <[email protected]> wrote:
>
>
>>> With the cleanup of the libcfs SMP handling all UMP handling
>>> was removed. In the process various NULL pointers and
>>> empty fields are now returned in the UMP case, which causes lustre
>>> to crash hard. Restore the proper UMP handling so Lustre can
>>> properly function.
>>
>> Can't we just get lustre to handle the NULL pointer?
>> In most cases, the pointer is accessed through an accessor function, and
>> on !CONFIG_SMP, that can be a static inline that doesn't even look at
>> the pointer.
>
> Lots of NULL pointer checks for a structure allocated at libcfs module
> start and only cleaned up at libcfs removal is not a clean approach.
> So I have thought about it, and I have to ask why we allocate a global
> struct cfs_cpu_table at all. It could be made static and filled in, which
> would avoid the whole NULL pointer issue. Plus, for the UMP case, why
> allocate a new cfs_cpu_table with cfs_cpt_table_alloc() when it is exactly
> like the default UMP cfs_cpu_table? Instead we could just return a pointer
> to the static default cfs_cpt_tab every time. We still have the NULL
> ctb_cpumask field to deal with. Does that sound like a better solution
> to you? Doug, what do you think?

I like your suggestion, James. Always having a cfs_cpu_table so we don’t have to worry about a NULL pointer would be good. I think we can track down the code which relies on the cpumask field and make it safe for the NULL case. Is there a possibility of having an empty cpumask rather than it being NULL?

>
>> I really think this is a step backwards. If you can identify specific
>> problems caused by the current code, I'm sure we can fix them.
>>
>>>
>>> Signed-off-by: James Simmons <[email protected]>
>>> Signed-off-by: Amir Shehata <[email protected]>
>>> Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-7734
>>
>> This bug doesn't seem to mention this patch at all
>>
>>> Reviewed-on: http://review.whamcloud.com/18916
>>
>> Nor does this review.
>
> Yeah, it's mutated so much from what is in the Intel tree.
> I do believe it was the last patch to touch this.
> _______________________________________________
> lustre-devel mailing list
> [email protected]
> http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org

2018-06-13 22:31:58

by NeilBrown

[permalink] [raw]
Subject: Re: [PATCH v2 01/25] staging: lustre: libcfs: restore UMP handling

On Wed, Jun 13 2018, James Simmons wrote:

>> > With the cleanup of the libcfs SMP handling all UMP handling
>> > was removed. In the process various NULL pointers and
>> > empty fields are now returned in the UMP case, which causes lustre
>> > to crash hard. Restore the proper UMP handling so Lustre can
>> > properly function.
>>
>> Can't we just get lustre to handle the NULL pointer?
>> In most cases, the pointer is accessed through an accessor function, and
>> on !CONFIG_SMP, that can be a static inline that doesn't even look at
>> the pointer.
>
> Lots of NULL pointer checks for a structure allocated at libcfs module
> start and only cleaned up at libcfs removal is not a clean approach.
> So I have thought about it, and I have to ask why we allocate a global
> struct cfs_cpu_table at all. It could be made static and filled in, which
> would avoid the whole NULL pointer issue. Plus, for the UMP case, why
> allocate a new cfs_cpu_table with cfs_cpt_table_alloc() when it is exactly
> like the default UMP cfs_cpu_table? Instead we could just return a pointer
> to the static default cfs_cpt_tab every time. We still have the NULL
> ctb_cpumask field to deal with. Does that sound like a better solution
> to you? Doug, what do you think?

I'm not convinced there will be lots of NULL pointer checks - maybe one
or two. Most of the accesses to the structure are inside helper
functions which are empty inlines in the UP build.

However I don't object to a static cfs_cpt_tab if that turns out to make
some code simpler. I would want it to be clear up-front which code was
simplified so that an informed decision could be made.

Thanks,
NeilBrown


>
>> I really think this is a step backwards. If you can identify specific
>> problems caused by the current code, I'm sure we can fix them.
>>
>> >
>> > Signed-off-by: James Simmons <[email protected]>
>> > Signed-off-by: Amir Shehata <[email protected]>
>> > Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-7734
>>
>> This bug doesn't seem to mention this patch at all
>>
>> > Reviewed-on: http://review.whamcloud.com/18916
>>
>> Nor does this review.
>
> Yeah, it's mutated so much from what is in the Intel tree.
> I do believe it was the last patch to touch this.

