2018-04-16 00:47:15

by NeilBrown

Subject: [PATCH 0/6] staging: lustre: code rearrangement

This series rearranges some code and manages to delete four
files in the process, all from directories called "linux".
This results in
drivers/staging/lustre/include/linux/libcfs/linux
becoming empty, so it disappears too.

These patches depend on
[PATCH] staging: lustre: libcfs: use dynamic minors for /dev/{lnet,obd}
which James posted on 30th March.

Thanks,
NeilBrown

---

NeilBrown (6):
staging: lustre: move stack-check macros to libcfs_debug.h
staging: lustre: remove libcfs/linux/libcfs.h
staging: lustre: remove include/linux/libcfs/linux/linux-cpu.h
staging: lustre: rearrange placement of CPU partition management code.
staging: lustre: move misc-device registration closer to related code.
staging: lustre: move remaining code from linux-module.c to module.c


.../staging/lustre/include/linux/libcfs/libcfs.h | 48 +
.../lustre/include/linux/libcfs/libcfs_cpu.h | 206 +++-
.../lustre/include/linux/libcfs/libcfs_debug.h | 32 +
.../lustre/include/linux/libcfs/linux/libcfs.h | 114 --
.../lustre/include/linux/libcfs/linux/linux-cpu.h | 78 -
drivers/staging/lustre/lnet/libcfs/Makefile | 2
drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c | 961 +++++++++++++++++-
.../staging/lustre/lnet/libcfs/linux/linux-cpu.c | 1079 --------------------
.../lustre/lnet/libcfs/linux/linux-module.c | 196 ----
drivers/staging/lustre/lnet/libcfs/module.c | 162 +++
10 files changed, 1344 insertions(+), 1534 deletions(-)
delete mode 100644 drivers/staging/lustre/include/linux/libcfs/linux/libcfs.h
delete mode 100644 drivers/staging/lustre/include/linux/libcfs/linux/linux-cpu.h
delete mode 100644 drivers/staging/lustre/lnet/libcfs/linux/linux-cpu.c
delete mode 100644 drivers/staging/lustre/lnet/libcfs/linux/linux-module.c




2018-04-16 00:47:15

by NeilBrown

Subject: [PATCH 5/6] staging: lustre: move misc-device registration closer to related code.

The ioctl handler for the misc device is in lnet/libcfs/module.c
but it is registered in lnet/libcfs/linux/linux-module.c.

Keeping related code together makes maintenance easier, so move the
code.
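
For readers unfamiliar with the check that moves along with the
registration, here is a minimal userspace sketch of the command-number
validation done in libcfs_psdev_ioctl(). The DEMO_* constants and the
demo_* function names are made up for illustration; the real
IOC_LIBCFS_* values come from the uapi header and are not reproduced
here. It assumes a Linux system where <linux/ioctl.h> is available.

```c
#include <assert.h>
#include <errno.h>
#include <linux/ioctl.h>	/* _IOC_TYPE(), _IOC_NR(), _IOWR() */

/* Hypothetical values, NOT the real IOC_LIBCFS_* definitions. */
#define DEMO_IOC_TYPE	'e'
#define DEMO_IOC_MIN_NR	30
#define DEMO_IOC_MAX_NR	40

/*
 * Same shape as the range check in libcfs_psdev_ioctl():
 * reject a command whose type or number is outside the expected window.
 */
static int demo_check_cmd(unsigned int cmd)
{
	if (_IOC_TYPE(cmd) != DEMO_IOC_TYPE ||
	    _IOC_NR(cmd) < DEMO_IOC_MIN_NR ||
	    _IOC_NR(cmd) > DEMO_IOC_MAX_NR)
		return -EINVAL;
	return 0;
}

/* Usage: one accepted command, one bad number, one bad type. */
void demo_ioctl_selftest(void)
{
	assert(demo_check_cmd(_IOWR(DEMO_IOC_TYPE, 30, long)) == 0);
	assert(demo_check_cmd(_IOWR(DEMO_IOC_TYPE, 41, long)) == -EINVAL);
	assert(demo_check_cmd(_IOWR('x', 35, long)) == -EINVAL);
}
```

The capability test (CAP_SYS_ADMIN) that precedes this check in the
kernel has no userspace equivalent here, so the sketch leaves it out.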

Signed-off-by: NeilBrown <[email protected]>
---
.../staging/lustre/include/linux/libcfs/libcfs.h | 2 -
.../lustre/lnet/libcfs/linux/linux-module.c | 28 ------------------
drivers/staging/lustre/lnet/libcfs/module.c | 31 +++++++++++++++++++-
3 files changed, 30 insertions(+), 31 deletions(-)

diff --git a/drivers/staging/lustre/include/linux/libcfs/libcfs.h b/drivers/staging/lustre/include/linux/libcfs/libcfs.h
index aca1f19c4977..19dae42b9a94 100644
--- a/drivers/staging/lustre/include/linux/libcfs/libcfs.h
+++ b/drivers/staging/lustre/include/linux/libcfs/libcfs.h
@@ -140,11 +140,9 @@ int libcfs_deregister_ioctl(struct libcfs_ioctl_handler *hand);
int libcfs_ioctl_getdata(struct libcfs_ioctl_hdr **hdr_pp,
const struct libcfs_ioctl_hdr __user *uparam);
int libcfs_ioctl_data_adjust(struct libcfs_ioctl_data *data);
-int libcfs_ioctl(unsigned long cmd, void __user *arg);

#define _LIBCFS_H

-extern struct miscdevice libcfs_dev;
/**
* The path of debug log dump upcall script.
*/
diff --git a/drivers/staging/lustre/lnet/libcfs/linux/linux-module.c b/drivers/staging/lustre/lnet/libcfs/linux/linux-module.c
index c8908e816c4c..954b681f9db7 100644
--- a/drivers/staging/lustre/lnet/libcfs/linux/linux-module.c
+++ b/drivers/staging/lustre/lnet/libcfs/linux/linux-module.c
@@ -166,31 +166,3 @@ int libcfs_ioctl_getdata(struct libcfs_ioctl_hdr **hdr_pp,
kvfree(*hdr_pp);
return err;
}
-
-static long
-libcfs_psdev_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
-{
- if (!capable(CAP_SYS_ADMIN))
- return -EACCES;
-
- if (_IOC_TYPE(cmd) != IOC_LIBCFS_TYPE ||
- _IOC_NR(cmd) < IOC_LIBCFS_MIN_NR ||
- _IOC_NR(cmd) > IOC_LIBCFS_MAX_NR) {
- CDEBUG(D_IOCTL, "invalid ioctl ( type %d, nr %d, size %d )\n",
- _IOC_TYPE(cmd), _IOC_NR(cmd), _IOC_SIZE(cmd));
- return -EINVAL;
- }
-
- return libcfs_ioctl(cmd, (void __user *)arg);
-}
-
-static const struct file_operations libcfs_fops = {
- .owner = THIS_MODULE,
- .unlocked_ioctl = libcfs_psdev_ioctl,
-};
-
-struct miscdevice libcfs_dev = {
- .minor = MISC_DYNAMIC_MINOR,
- .name = "lnet",
- .fops = &libcfs_fops,
-};
diff --git a/drivers/staging/lustre/lnet/libcfs/module.c b/drivers/staging/lustre/lnet/libcfs/module.c
index f93f3cf58127..aab0eb7b7632 100644
--- a/drivers/staging/lustre/lnet/libcfs/module.c
+++ b/drivers/staging/lustre/lnet/libcfs/module.c
@@ -95,7 +95,7 @@ int libcfs_deregister_ioctl(struct libcfs_ioctl_handler *hand)
}
EXPORT_SYMBOL(libcfs_deregister_ioctl);

-int libcfs_ioctl(unsigned long cmd, void __user *uparam)
+static int libcfs_ioctl(unsigned long cmd, void __user *uparam)
{
struct libcfs_ioctl_data *data = NULL;
struct libcfs_ioctl_hdr *hdr;
@@ -161,6 +161,35 @@ int libcfs_ioctl(unsigned long cmd, void __user *uparam)
return err;
}

+
+static long
+libcfs_psdev_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+{
+ if (!capable(CAP_SYS_ADMIN))
+ return -EACCES;
+
+ if (_IOC_TYPE(cmd) != IOC_LIBCFS_TYPE ||
+ _IOC_NR(cmd) < IOC_LIBCFS_MIN_NR ||
+ _IOC_NR(cmd) > IOC_LIBCFS_MAX_NR) {
+ CDEBUG(D_IOCTL, "invalid ioctl ( type %d, nr %d, size %d )\n",
+ _IOC_TYPE(cmd), _IOC_NR(cmd), _IOC_SIZE(cmd));
+ return -EINVAL;
+ }
+
+ return libcfs_ioctl(cmd, (void __user *)arg);
+}
+
+static const struct file_operations libcfs_fops = {
+ .owner = THIS_MODULE,
+ .unlocked_ioctl = libcfs_psdev_ioctl,
+};
+
+struct miscdevice libcfs_dev = {
+ .minor = MISC_DYNAMIC_MINOR,
+ .name = "lnet",
+ .fops = &libcfs_fops,
+};
+
int lprocfs_call_handler(void *data, int write, loff_t *ppos,
void __user *buffer, size_t *lenp,
int (*handler)(void *data, int write, loff_t pos,



2018-04-16 00:47:15

by NeilBrown

Subject: [PATCH 2/6] staging: lustre: remove libcfs/linux/libcfs.h

This include file is only included in one place,
and only contains a list of other include directives.
So just move all those to the place where this file
is included, and discard the file.

One include directive uses a local name ("linux-cpu.h"), so
that needs to be given a proper path.

Probably many of these should be removed from here, and moved to
just the files that need them.

Signed-off-by: NeilBrown <[email protected]>
---
.../staging/lustre/include/linux/libcfs/libcfs.h | 43 ++++++++++
.../lustre/include/linux/libcfs/linux/libcfs.h | 83 --------------------
2 files changed, 42 insertions(+), 84 deletions(-)
delete mode 100644 drivers/staging/lustre/include/linux/libcfs/linux/libcfs.h

diff --git a/drivers/staging/lustre/include/linux/libcfs/libcfs.h b/drivers/staging/lustre/include/linux/libcfs/libcfs.h
index 62e46aa3c554..e59d107d6482 100644
--- a/drivers/staging/lustre/include/linux/libcfs/libcfs.h
+++ b/drivers/staging/lustre/include/linux/libcfs/libcfs.h
@@ -38,7 +38,48 @@
#include <linux/list.h>

#include <uapi/linux/lnet/libcfs_ioctl.h>
-#include <linux/libcfs/linux/libcfs.h>
+#include <linux/bitops.h>
+#include <linux/compiler.h>
+#include <linux/ctype.h>
+#include <linux/errno.h>
+#include <linux/file.h>
+#include <linux/fs.h>
+#include <linux/highmem.h>
+#include <linux/interrupt.h>
+#include <linux/kallsyms.h>
+#include <linux/kernel.h>
+#include <linux/kmod.h>
+#include <linux/kthread.h>
+#include <linux/mm.h>
+#include <linux/mm_inline.h>
+#include <linux/module.h>
+#include <linux/moduleparam.h>
+#include <linux/mutex.h>
+#include <linux/notifier.h>
+#include <linux/pagemap.h>
+#include <linux/random.h>
+#include <linux/rbtree.h>
+#include <linux/rwsem.h>
+#include <linux/scatterlist.h>
+#include <linux/sched.h>
+#include <linux/signal.h>
+#include <linux/slab.h>
+#include <linux/smp.h>
+#include <linux/stat.h>
+#include <linux/string.h>
+#include <linux/time.h>
+#include <linux/timer.h>
+#include <linux/types.h>
+#include <linux/unistd.h>
+#include <linux/vmalloc.h>
+#include <net/sock.h>
+#include <linux/atomic.h>
+#include <asm/div64.h>
+#include <linux/timex.h>
+#include <linux/uaccess.h>
+#include <stdarg.h>
+#include <linux/libcfs/linux/linux-cpu.h>
+
#include <linux/libcfs/libcfs_debug.h>
#include <linux/libcfs/libcfs_private.h>
#include <linux/libcfs/libcfs_cpu.h>
diff --git a/drivers/staging/lustre/include/linux/libcfs/linux/libcfs.h b/drivers/staging/lustre/include/linux/libcfs/linux/libcfs.h
deleted file mode 100644
index 83aec9c7698f..000000000000
--- a/drivers/staging/lustre/include/linux/libcfs/linux/libcfs.h
+++ /dev/null
@@ -1,83 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-/*
- * GPL HEADER START
- *
- * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 only,
- * as published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- * General Public License version 2 for more details (a copy is included
- * in the LICENSE file that accompanied this code).
- *
- * You should have received a copy of the GNU General Public License
- * version 2 along with this program; If not, see
- * http://www.gnu.org/licenses/gpl-2.0.html
- *
- * GPL HEADER END
- */
-/*
- * Copyright (c) 2008, 2010, Oracle and/or its affiliates. All rights reserved.
- * Use is subject to license terms.
- *
- * Copyright (c) 2012, Intel Corporation.
- */
-/*
- * This file is part of Lustre, http://www.lustre.org/
- * Lustre is a trademark of Sun Microsystems, Inc.
- */
-
-#ifndef __LIBCFS_LINUX_LIBCFS_H__
-#define __LIBCFS_LINUX_LIBCFS_H__
-
-#ifndef __LIBCFS_LIBCFS_H__
-#error Do not #include this file directly. #include <linux/libcfs/libcfs.h> instead
-#endif
-
-#include <linux/bitops.h>
-#include <linux/compiler.h>
-#include <linux/ctype.h>
-#include <linux/errno.h>
-#include <linux/file.h>
-#include <linux/fs.h>
-#include <linux/highmem.h>
-#include <linux/interrupt.h>
-#include <linux/kallsyms.h>
-#include <linux/kernel.h>
-#include <linux/kmod.h>
-#include <linux/kthread.h>
-#include <linux/mm.h>
-#include <linux/mm_inline.h>
-#include <linux/module.h>
-#include <linux/moduleparam.h>
-#include <linux/mutex.h>
-#include <linux/notifier.h>
-#include <linux/pagemap.h>
-#include <linux/random.h>
-#include <linux/rbtree.h>
-#include <linux/rwsem.h>
-#include <linux/scatterlist.h>
-#include <linux/sched.h>
-#include <linux/signal.h>
-#include <linux/slab.h>
-#include <linux/smp.h>
-#include <linux/stat.h>
-#include <linux/string.h>
-#include <linux/time.h>
-#include <linux/timer.h>
-#include <linux/types.h>
-#include <linux/unistd.h>
-#include <linux/vmalloc.h>
-#include <net/sock.h>
-#include <linux/atomic.h>
-#include <asm/div64.h>
-#include <linux/timex.h>
-#include <linux/uaccess.h>
-#include <stdarg.h>
-#include "linux-cpu.h"
-
-#endif /* _LINUX_LIBCFS_H */



2018-04-16 00:47:15

by NeilBrown

Subject: [PATCH 1/6] staging: lustre: move stack-check macros to libcfs_debug.h

CDEBUG_STACK() and CHECK_STACK() are macros to help with
debugging, so move them from
drivers/staging/lustre/include/linux/libcfs/linux/libcfs.h
to
drivers/staging/lustre/include/linux/libcfs/libcfs_debug.h

This seems a more fitting location, and is a step towards
removing linux/libcfs.h and simplifying the include file structure.
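
As a rough illustration of what these macros compute, the sketch below
redoes the CDEBUG_STACK() arithmetic in plain userspace C, with the
frame address and stack size passed in as arguments instead of taken
from __builtin_frame_address(0) and THREAD_SIZE. It assumes a
downward-growing stack aligned to its (power-of-two) size. The demo_*
names are invented for this example, and the high-water-mark helper
only mimics the comparison in __CHECK_STACK(), not its logging.

```c
#include <assert.h>

/*
 * Bytes of stack in use, given an address inside the current frame.
 * Assumes the stack occupies one thread_size-aligned region and grows
 * down from the top of that region.
 */
static unsigned long demo_stack_used(unsigned long frame_addr,
				     unsigned long thread_size)
{
	/* the masking trick only works for power-of-two sizes */
	assert(thread_size && (thread_size & (thread_size - 1)) == 0);
	return thread_size - (frame_addr & (thread_size - 1));
}

/* Deepest usage seen so far (plays the role of libcfs_stack). */
static unsigned long demo_stack_max;

/* Returns 1 when a new maximum was recorded, 0 otherwise. */
static int demo_check_stack(unsigned long frame_addr,
			    unsigned long thread_size)
{
	unsigned long used = demo_stack_used(frame_addr, thread_size);

	if (used > demo_stack_max) {
		demo_stack_max = used;
		return 1;
	}
	return 0;
}
```

With a 0x4000-byte stack whose top is at 0x14000, a frame at 0x13f00
gives 0x4000 - 0x3f00 = 0x100 bytes in use.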

Signed-off-by: NeilBrown <[email protected]>
---
.../lustre/include/linux/libcfs/libcfs_debug.h | 32 ++++++++++++++++++++
.../lustre/include/linux/libcfs/linux/libcfs.h | 31 -------------------
2 files changed, 32 insertions(+), 31 deletions(-)

diff --git a/drivers/staging/lustre/include/linux/libcfs/libcfs_debug.h b/drivers/staging/lustre/include/linux/libcfs/libcfs_debug.h
index 9290a19429e7..0dc7b91efe7c 100644
--- a/drivers/staging/lustre/include/linux/libcfs/libcfs_debug.h
+++ b/drivers/staging/lustre/include/linux/libcfs/libcfs_debug.h
@@ -62,6 +62,38 @@ int libcfs_debug_str2mask(int *mask, const char *str, int is_subsys);
extern unsigned int libcfs_catastrophe;
extern unsigned int libcfs_panic_on_lbug;

+/* Enable debug-checks on stack size - except on x86_64 */
+#if !defined(__x86_64__)
+# ifdef __ia64__
+# define CDEBUG_STACK() (THREAD_SIZE - \
+ ((unsigned long)__builtin_dwarf_cfa() & \
+ (THREAD_SIZE - 1)))
+# else
+# define CDEBUG_STACK() (THREAD_SIZE - \
+ ((unsigned long)__builtin_frame_address(0) & \
+ (THREAD_SIZE - 1)))
+# endif /* __ia64__ */
+
+#define __CHECK_STACK(msgdata, mask, cdls) \
+do { \
+ if (unlikely(CDEBUG_STACK() > libcfs_stack)) { \
+ LIBCFS_DEBUG_MSG_DATA_INIT(msgdata, D_WARNING, NULL); \
+ libcfs_stack = CDEBUG_STACK(); \
+ libcfs_debug_msg(msgdata, \
+ "maximum lustre stack %lu\n", \
+ CDEBUG_STACK()); \
+ (msgdata)->msg_mask = mask; \
+ (msgdata)->msg_cdls = cdls; \
+ dump_stack(); \
+ /*panic("LBUG");*/ \
+ } \
+} while (0)
+#define CFS_CHECK_STACK(msgdata, mask, cdls) __CHECK_STACK(msgdata, mask, cdls)
+#else /* __x86_64__ */
+#define CFS_CHECK_STACK(msgdata, mask, cdls) do {} while (0)
+#define CDEBUG_STACK() (0L)
+#endif /* __x86_64__ */
+
#ifndef DEBUG_SUBSYSTEM
# define DEBUG_SUBSYSTEM S_UNDEFINED
#endif
diff --git a/drivers/staging/lustre/include/linux/libcfs/linux/libcfs.h b/drivers/staging/lustre/include/linux/libcfs/linux/libcfs.h
index 07d3cb2217d1..83aec9c7698f 100644
--- a/drivers/staging/lustre/include/linux/libcfs/linux/libcfs.h
+++ b/drivers/staging/lustre/include/linux/libcfs/linux/libcfs.h
@@ -80,35 +80,4 @@
#include <stdarg.h>
#include "linux-cpu.h"

-#if !defined(__x86_64__)
-# ifdef __ia64__
-# define CDEBUG_STACK() (THREAD_SIZE - \
- ((unsigned long)__builtin_dwarf_cfa() & \
- (THREAD_SIZE - 1)))
-# else
-# define CDEBUG_STACK() (THREAD_SIZE - \
- ((unsigned long)__builtin_frame_address(0) & \
- (THREAD_SIZE - 1)))
-# endif /* __ia64__ */
-
-#define __CHECK_STACK(msgdata, mask, cdls) \
-do { \
- if (unlikely(CDEBUG_STACK() > libcfs_stack)) { \
- LIBCFS_DEBUG_MSG_DATA_INIT(msgdata, D_WARNING, NULL); \
- libcfs_stack = CDEBUG_STACK(); \
- libcfs_debug_msg(msgdata, \
- "maximum lustre stack %lu\n", \
- CDEBUG_STACK()); \
- (msgdata)->msg_mask = mask; \
- (msgdata)->msg_cdls = cdls; \
- dump_stack(); \
- /*panic("LBUG");*/ \
- } \
-} while (0)
-#define CFS_CHECK_STACK(msgdata, mask, cdls) __CHECK_STACK(msgdata, mask, cdls)
-#else /* __x86_64__ */
-#define CFS_CHECK_STACK(msgdata, mask, cdls) do {} while (0)
-#define CDEBUG_STACK() (0L)
-#endif /* __x86_64__ */
-
#endif /* _LINUX_LIBCFS_H */



2018-04-16 00:47:15

by NeilBrown

Subject: [PATCH 3/6] staging: lustre: remove include/linux/libcfs/linux/linux-cpu.h

This include file contains definitions used when CONFIG_SMP
is in effect. Other includes contain corresponding definitions
for when it isn't.
This can be hard to follow, so move the definitions to one place.

As HAVE_LIBCFS_CPT is defined precisely when CONFIG_SMP is, we discard
that macro and just use CONFIG_SMP when needed.
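
The redundancy argument can be checked mechanically. In the sketch
below, DEMO_SMP stands in for CONFIG_SMP and DEMO_HAVE_CPT for
HAVE_LIBCFS_CPT; both names, and the helper macros built on them, are
made up for this example. Whether or not DEMO_SMP is defined at build
time, a guard written against either macro selects the same branch.

```c
#include <assert.h>

/* Old arrangement: a second macro defined exactly when the first is. */
#ifdef DEMO_SMP
#define DEMO_HAVE_CPT
#endif

/* Guard as it was written before: test the derived macro. */
#ifndef DEMO_HAVE_CPT
#define DEMO_OLD_GUARD_IS_UP 1
#else
#define DEMO_OLD_GUARD_IS_UP 0
#endif

/* Guard as it is written after: test the config symbol directly. */
#ifndef DEMO_SMP
#define DEMO_NEW_GUARD_IS_UP 1
#else
#define DEMO_NEW_GUARD_IS_UP 0
#endif

/* Both guards pick the same branch, so the derived macro adds nothing. */
int demo_guards_agree(void)
{
	return DEMO_OLD_GUARD_IS_UP == DEMO_NEW_GUARD_IS_UP;
}

void demo_guard_selftest(void)
{
	assert(demo_guards_agree());
}
```

Building once with -DDEMO_SMP and once without exercises both cases.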
---
.../staging/lustre/include/linux/libcfs/libcfs.h | 1
.../lustre/include/linux/libcfs/libcfs_cpu.h | 33 ++++++++
.../lustre/include/linux/libcfs/linux/linux-cpu.h | 78 --------------------
drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c | 4 +
4 files changed, 35 insertions(+), 81 deletions(-)
delete mode 100644 drivers/staging/lustre/include/linux/libcfs/linux/linux-cpu.h

diff --git a/drivers/staging/lustre/include/linux/libcfs/libcfs.h b/drivers/staging/lustre/include/linux/libcfs/libcfs.h
index e59d107d6482..aca1f19c4977 100644
--- a/drivers/staging/lustre/include/linux/libcfs/libcfs.h
+++ b/drivers/staging/lustre/include/linux/libcfs/libcfs.h
@@ -78,7 +78,6 @@
#include <linux/timex.h>
#include <linux/uaccess.h>
#include <stdarg.h>
-#include <linux/libcfs/linux/linux-cpu.h>

#include <linux/libcfs/libcfs_debug.h>
#include <linux/libcfs/libcfs_private.h>
diff --git a/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h b/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h
index 61bce77fddd6..829c35e68db8 100644
--- a/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h
+++ b/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h
@@ -72,10 +72,43 @@
#ifndef __LIBCFS_CPU_H__
#define __LIBCFS_CPU_H__

+#include <linux/cpu.h>
+#include <linux/cpuset.h>
+#include <linux/topology.h>
+
/* any CPU partition */
#define CFS_CPT_ANY (-1)

#ifdef CONFIG_SMP
+/** virtual processing unit */
+struct cfs_cpu_partition {
+ /* CPUs mask for this partition */
+ cpumask_var_t cpt_cpumask;
+ /* nodes mask for this partition */
+ nodemask_t *cpt_nodemask;
+ /* spread rotor for NUMA allocator */
+ unsigned int cpt_spread_rotor;
+};
+
+
+/** descriptor for CPU partitions */
+struct cfs_cpt_table {
+ /* version, reserved for hotplug */
+ unsigned int ctb_version;
+ /* spread rotor for NUMA allocator */
+ unsigned int ctb_spread_rotor;
+ /* # of CPU partitions */
+ unsigned int ctb_nparts;
+ /* partitions tables */
+ struct cfs_cpu_partition *ctb_parts;
+ /* shadow HW CPU to CPU partition ID */
+ int *ctb_cpu2cpt;
+ /* all cpus in this partition table */
+ cpumask_var_t ctb_cpumask;
+ /* all nodes in this partition table */
+ nodemask_t *ctb_nodemask;
+};
+
/**
* return cpumask of CPU partition \a cpt
*/
diff --git a/drivers/staging/lustre/include/linux/libcfs/linux/linux-cpu.h b/drivers/staging/lustre/include/linux/libcfs/linux/linux-cpu.h
deleted file mode 100644
index 6035376f2830..000000000000
--- a/drivers/staging/lustre/include/linux/libcfs/linux/linux-cpu.h
+++ /dev/null
@@ -1,78 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-/*
- * GPL HEADER START
- *
- * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 only,
- * as published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- * General Public License version 2 for more details (a copy is included
- * in the LICENSE file that accompanied this code).
- *
- * GPL HEADER END
- */
-/*
- * Copyright (c) 2010, Oracle and/or its affiliates. All rights reserved.
- * Copyright (c) 2012, Intel Corporation.
- */
-/*
- * This file is part of Lustre, http://www.lustre.org/
- * Lustre is a trademark of Sun Microsystems, Inc.
- *
- * libcfs/include/libcfs/linux/linux-cpu.h
- *
- * Basic library routines.
- *
- * Author: [email protected]
- */
-
-#ifndef __LIBCFS_LINUX_CPU_H__
-#define __LIBCFS_LINUX_CPU_H__
-
-#ifndef __LIBCFS_LIBCFS_H__
-#error Do not #include this file directly. #include <linux/libcfs/libcfs.h> instead
-#endif
-
-#include <linux/cpu.h>
-#include <linux/cpuset.h>
-#include <linux/topology.h>
-
-#ifdef CONFIG_SMP
-
-#define HAVE_LIBCFS_CPT
-
-/** virtual processing unit */
-struct cfs_cpu_partition {
- /* CPUs mask for this partition */
- cpumask_var_t cpt_cpumask;
- /* nodes mask for this partition */
- nodemask_t *cpt_nodemask;
- /* spread rotor for NUMA allocator */
- unsigned int cpt_spread_rotor;
-};
-
-/** descriptor for CPU partitions */
-struct cfs_cpt_table {
- /* version, reserved for hotplug */
- unsigned int ctb_version;
- /* spread rotor for NUMA allocator */
- unsigned int ctb_spread_rotor;
- /* # of CPU partitions */
- unsigned int ctb_nparts;
- /* partitions tables */
- struct cfs_cpu_partition *ctb_parts;
- /* shadow HW CPU to CPU partition ID */
- int *ctb_cpu2cpt;
- /* all cpus in this partition table */
- cpumask_var_t ctb_cpumask;
- /* all nodes in this partition table */
- nodemask_t *ctb_nodemask;
-};
-
-#endif /* CONFIG_SMP */
-#endif /* __LIBCFS_LINUX_CPU_H__ */
diff --git a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
index 76291a350406..5818f641455f 100644
--- a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
+++ b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
@@ -37,7 +37,7 @@
struct cfs_cpt_table *cfs_cpt_table __read_mostly;
EXPORT_SYMBOL(cfs_cpt_table);

-#ifndef HAVE_LIBCFS_CPT
+#ifndef CONFIG_SMP

#define CFS_CPU_VERSION_MAGIC 0xbabecafe

@@ -225,4 +225,4 @@ cfs_cpu_init(void)
return cfs_cpt_table ? 0 : -1;
}

-#endif /* HAVE_LIBCFS_CPT */
+#endif /* CONFIG_SMP */



2018-04-16 00:47:15

by NeilBrown

Subject: [PATCH 6/6] staging: lustre: move remaining code from linux-module.c to module.c

There is no longer any need to keep this code separate,
and now we can remove linux-module.c

Signed-off-by: NeilBrown <[email protected]>
---
.../staging/lustre/include/linux/libcfs/libcfs.h | 4
drivers/staging/lustre/lnet/libcfs/Makefile | 1
.../lustre/lnet/libcfs/linux/linux-module.c | 168 --------------------
drivers/staging/lustre/lnet/libcfs/module.c | 131 ++++++++++++++++
4 files changed, 131 insertions(+), 173 deletions(-)
delete mode 100644 drivers/staging/lustre/lnet/libcfs/linux/linux-module.c

diff --git a/drivers/staging/lustre/include/linux/libcfs/libcfs.h b/drivers/staging/lustre/include/linux/libcfs/libcfs.h
index 19dae42b9a94..3ce0cccc0c61 100644
--- a/drivers/staging/lustre/include/linux/libcfs/libcfs.h
+++ b/drivers/staging/lustre/include/linux/libcfs/libcfs.h
@@ -137,10 +137,6 @@ struct libcfs_ioctl_handler {
int libcfs_register_ioctl(struct libcfs_ioctl_handler *hand);
int libcfs_deregister_ioctl(struct libcfs_ioctl_handler *hand);

-int libcfs_ioctl_getdata(struct libcfs_ioctl_hdr **hdr_pp,
- const struct libcfs_ioctl_hdr __user *uparam);
-int libcfs_ioctl_data_adjust(struct libcfs_ioctl_data *data);
-
#define _LIBCFS_H

/**
diff --git a/drivers/staging/lustre/lnet/libcfs/Makefile b/drivers/staging/lustre/lnet/libcfs/Makefile
index 673fe348c445..f3781c511a04 100644
--- a/drivers/staging/lustre/lnet/libcfs/Makefile
+++ b/drivers/staging/lustre/lnet/libcfs/Makefile
@@ -5,7 +5,6 @@ subdir-ccflags-y += -I$(srctree)/drivers/staging/lustre/lustre/include
obj-$(CONFIG_LNET) += libcfs.o

libcfs-linux-objs := linux-tracefile.o linux-debug.o
-libcfs-linux-objs += linux-module.o
libcfs-linux-objs += linux-crypto.o
libcfs-linux-objs += linux-crypto-adler.o

diff --git a/drivers/staging/lustre/lnet/libcfs/linux/linux-module.c b/drivers/staging/lustre/lnet/libcfs/linux/linux-module.c
deleted file mode 100644
index 954b681f9db7..000000000000
--- a/drivers/staging/lustre/lnet/libcfs/linux/linux-module.c
+++ /dev/null
@@ -1,168 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-/*
- * GPL HEADER START
- *
- * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 only,
- * as published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- * General Public License version 2 for more details (a copy is included
- * in the LICENSE file that accompanied this code).
- *
- * You should have received a copy of the GNU General Public License
- * version 2 along with this program; If not, see
- * http://www.gnu.org/licenses/gpl-2.0.html
- *
- * GPL HEADER END
- */
-/*
- * Copyright (c) 2008, 2010, Oracle and/or its affiliates. All rights reserved.
- * Use is subject to license terms.
- *
- * Copyright (c) 2012, Intel Corporation.
- */
-/*
- * This file is part of Lustre, http://www.lustre.org/
- * Lustre is a trademark of Sun Microsystems, Inc.
- */
-
-#define DEBUG_SUBSYSTEM S_LNET
-
-#include <linux/miscdevice.h>
-#include <linux/libcfs/libcfs.h>
-
-static inline size_t libcfs_ioctl_packlen(struct libcfs_ioctl_data *data)
-{
- size_t len = sizeof(*data);
-
- len += cfs_size_round(data->ioc_inllen1);
- len += cfs_size_round(data->ioc_inllen2);
- return len;
-}
-
-static inline bool libcfs_ioctl_is_invalid(struct libcfs_ioctl_data *data)
-{
- if (data->ioc_hdr.ioc_len > BIT(30)) {
- CERROR("LIBCFS ioctl: ioc_len larger than 1<<30\n");
- return true;
- }
- if (data->ioc_inllen1 > BIT(30)) {
- CERROR("LIBCFS ioctl: ioc_inllen1 larger than 1<<30\n");
- return true;
- }
- if (data->ioc_inllen2 > BIT(30)) {
- CERROR("LIBCFS ioctl: ioc_inllen2 larger than 1<<30\n");
- return true;
- }
- if (data->ioc_inlbuf1 && !data->ioc_inllen1) {
- CERROR("LIBCFS ioctl: inlbuf1 pointer but 0 length\n");
- return true;
- }
- if (data->ioc_inlbuf2 && !data->ioc_inllen2) {
- CERROR("LIBCFS ioctl: inlbuf2 pointer but 0 length\n");
- return true;
- }
- if (data->ioc_pbuf1 && !data->ioc_plen1) {
- CERROR("LIBCFS ioctl: pbuf1 pointer but 0 length\n");
- return true;
- }
- if (data->ioc_pbuf2 && !data->ioc_plen2) {
- CERROR("LIBCFS ioctl: pbuf2 pointer but 0 length\n");
- return true;
- }
- if (data->ioc_plen1 && !data->ioc_pbuf1) {
- CERROR("LIBCFS ioctl: plen1 nonzero but no pbuf1 pointer\n");
- return true;
- }
- if (data->ioc_plen2 && !data->ioc_pbuf2) {
- CERROR("LIBCFS ioctl: plen2 nonzero but no pbuf2 pointer\n");
- return true;
- }
- if ((u32)libcfs_ioctl_packlen(data) != data->ioc_hdr.ioc_len) {
- CERROR("LIBCFS ioctl: packlen != ioc_len\n");
- return true;
- }
- if (data->ioc_inllen1 &&
- data->ioc_bulk[data->ioc_inllen1 - 1] != '\0') {
- CERROR("LIBCFS ioctl: inlbuf1 not 0 terminated\n");
- return true;
- }
- if (data->ioc_inllen2 &&
- data->ioc_bulk[cfs_size_round(data->ioc_inllen1) +
- data->ioc_inllen2 - 1] != '\0') {
- CERROR("LIBCFS ioctl: inlbuf2 not 0 terminated\n");
- return true;
- }
- return false;
-}
-
-int libcfs_ioctl_data_adjust(struct libcfs_ioctl_data *data)
-{
- if (libcfs_ioctl_is_invalid(data)) {
- CERROR("libcfs ioctl: parameter not correctly formatted\n");
- return -EINVAL;
- }
-
- if (data->ioc_inllen1)
- data->ioc_inlbuf1 = &data->ioc_bulk[0];
-
- if (data->ioc_inllen2)
- data->ioc_inlbuf2 = &data->ioc_bulk[0] +
- cfs_size_round(data->ioc_inllen1);
-
- return 0;
-}
-
-int libcfs_ioctl_getdata(struct libcfs_ioctl_hdr **hdr_pp,
- const struct libcfs_ioctl_hdr __user *uhdr)
-{
- struct libcfs_ioctl_hdr hdr;
- int err;
-
- if (copy_from_user(&hdr, uhdr, sizeof(hdr)))
- return -EFAULT;
-
- if (hdr.ioc_version != LIBCFS_IOCTL_VERSION &&
- hdr.ioc_version != LIBCFS_IOCTL_VERSION2) {
- CERROR("libcfs ioctl: version mismatch expected %#x, got %#x\n",
- LIBCFS_IOCTL_VERSION, hdr.ioc_version);
- return -EINVAL;
- }
-
- if (hdr.ioc_len < sizeof(hdr)) {
- CERROR("libcfs ioctl: user buffer too small for ioctl\n");
- return -EINVAL;
- }
-
- if (hdr.ioc_len > LIBCFS_IOC_DATA_MAX) {
- CERROR("libcfs ioctl: user buffer is too large %d/%d\n",
- hdr.ioc_len, LIBCFS_IOC_DATA_MAX);
- return -EINVAL;
- }
-
- *hdr_pp = kvmalloc(hdr.ioc_len, GFP_KERNEL);
- if (!*hdr_pp)
- return -ENOMEM;
-
- if (copy_from_user(*hdr_pp, uhdr, hdr.ioc_len)) {
- err = -EFAULT;
- goto free;
- }
-
- if ((*hdr_pp)->ioc_version != hdr.ioc_version ||
- (*hdr_pp)->ioc_len != hdr.ioc_len) {
- err = -EINVAL;
- goto free;
- }
-
- return 0;
-
-free:
- kvfree(*hdr_pp);
- return err;
-}
diff --git a/drivers/staging/lustre/lnet/libcfs/module.c b/drivers/staging/lustre/lnet/libcfs/module.c
index aab0eb7b7632..d1ba210deb25 100644
--- a/drivers/staging/lustre/lnet/libcfs/module.c
+++ b/drivers/staging/lustre/lnet/libcfs/module.c
@@ -95,6 +95,137 @@ int libcfs_deregister_ioctl(struct libcfs_ioctl_handler *hand)
}
EXPORT_SYMBOL(libcfs_deregister_ioctl);

+static inline size_t libcfs_ioctl_packlen(struct libcfs_ioctl_data *data)
+{
+ size_t len = sizeof(*data);
+
+ len += cfs_size_round(data->ioc_inllen1);
+ len += cfs_size_round(data->ioc_inllen2);
+ return len;
+}
+
+static inline bool libcfs_ioctl_is_invalid(struct libcfs_ioctl_data *data)
+{
+ if (data->ioc_hdr.ioc_len > BIT(30)) {
+ CERROR("LIBCFS ioctl: ioc_len larger than 1<<30\n");
+ return true;
+ }
+ if (data->ioc_inllen1 > BIT(30)) {
+ CERROR("LIBCFS ioctl: ioc_inllen1 larger than 1<<30\n");
+ return true;
+ }
+ if (data->ioc_inllen2 > BIT(30)) {
+ CERROR("LIBCFS ioctl: ioc_inllen2 larger than 1<<30\n");
+ return true;
+ }
+ if (data->ioc_inlbuf1 && !data->ioc_inllen1) {
+ CERROR("LIBCFS ioctl: inlbuf1 pointer but 0 length\n");
+ return true;
+ }
+ if (data->ioc_inlbuf2 && !data->ioc_inllen2) {
+ CERROR("LIBCFS ioctl: inlbuf2 pointer but 0 length\n");
+ return true;
+ }
+ if (data->ioc_pbuf1 && !data->ioc_plen1) {
+ CERROR("LIBCFS ioctl: pbuf1 pointer but 0 length\n");
+ return true;
+ }
+ if (data->ioc_pbuf2 && !data->ioc_plen2) {
+ CERROR("LIBCFS ioctl: pbuf2 pointer but 0 length\n");
+ return true;
+ }
+ if (data->ioc_plen1 && !data->ioc_pbuf1) {
+ CERROR("LIBCFS ioctl: plen1 nonzero but no pbuf1 pointer\n");
+ return true;
+ }
+ if (data->ioc_plen2 && !data->ioc_pbuf2) {
+ CERROR("LIBCFS ioctl: plen2 nonzero but no pbuf2 pointer\n");
+ return true;
+ }
+ if ((u32)libcfs_ioctl_packlen(data) != data->ioc_hdr.ioc_len) {
+ CERROR("LIBCFS ioctl: packlen != ioc_len\n");
+ return true;
+ }
+ if (data->ioc_inllen1 &&
+ data->ioc_bulk[data->ioc_inllen1 - 1] != '\0') {
+ CERROR("LIBCFS ioctl: inlbuf1 not 0 terminated\n");
+ return true;
+ }
+ if (data->ioc_inllen2 &&
+ data->ioc_bulk[cfs_size_round(data->ioc_inllen1) +
+ data->ioc_inllen2 - 1] != '\0') {
+ CERROR("LIBCFS ioctl: inlbuf2 not 0 terminated\n");
+ return true;
+ }
+ return false;
+}
+
+static int libcfs_ioctl_data_adjust(struct libcfs_ioctl_data *data)
+{
+ if (libcfs_ioctl_is_invalid(data)) {
+ CERROR("libcfs ioctl: parameter not correctly formatted\n");
+ return -EINVAL;
+ }
+
+ if (data->ioc_inllen1)
+ data->ioc_inlbuf1 = &data->ioc_bulk[0];
+
+ if (data->ioc_inllen2)
+ data->ioc_inlbuf2 = &data->ioc_bulk[0] +
+ cfs_size_round(data->ioc_inllen1);
+
+ return 0;
+}
+
+static int libcfs_ioctl_getdata(struct libcfs_ioctl_hdr **hdr_pp,
+ const struct libcfs_ioctl_hdr __user *uhdr)
+{
+ struct libcfs_ioctl_hdr hdr;
+ int err;
+
+ if (copy_from_user(&hdr, uhdr, sizeof(hdr)))
+ return -EFAULT;
+
+ if (hdr.ioc_version != LIBCFS_IOCTL_VERSION &&
+ hdr.ioc_version != LIBCFS_IOCTL_VERSION2) {
+ CERROR("libcfs ioctl: version mismatch expected %#x, got %#x\n",
+ LIBCFS_IOCTL_VERSION, hdr.ioc_version);
+ return -EINVAL;
+ }
+
+ if (hdr.ioc_len < sizeof(hdr)) {
+ CERROR("libcfs ioctl: user buffer too small for ioctl\n");
+ return -EINVAL;
+ }
+
+ if (hdr.ioc_len > LIBCFS_IOC_DATA_MAX) {
+ CERROR("libcfs ioctl: user buffer is too large %d/%d\n",
+ hdr.ioc_len, LIBCFS_IOC_DATA_MAX);
+ return -EINVAL;
+ }
+
+ *hdr_pp = kvmalloc(hdr.ioc_len, GFP_KERNEL);
+ if (!*hdr_pp)
+ return -ENOMEM;
+
+ if (copy_from_user(*hdr_pp, uhdr, hdr.ioc_len)) {
+ err = -EFAULT;
+ goto free;
+ }
+
+ if ((*hdr_pp)->ioc_version != hdr.ioc_version ||
+ (*hdr_pp)->ioc_len != hdr.ioc_len) {
+ err = -EINVAL;
+ goto free;
+ }
+
+ return 0;
+
+free:
+ kvfree(*hdr_pp);
+ return err;
+}
+
static int libcfs_ioctl(unsigned long cmd, void __user *uparam)
{
struct libcfs_ioctl_data *data = NULL;



2018-04-16 00:47:15

by NeilBrown

Subject: [PATCH 4/6] staging: lustre: rearrange placement of CPU partition management code.

Currently the code for cpu-partition tables lives in various places.
The non-SMP code is partly in libcfs/libcfs_cpu.h as static inlines,
and partly in lnet/libcfs/libcfs_cpu.c - some of the functions are
tiny and could well be inlines.

The SMP code is all in lnet/libcfs/linux/linux-cpu.c.

This patch moves all the trivial non-SMP functions into
libcfs_cpu.h as inlines, and all the SMP functions into libcfs_cpu.c
with the non-trivial !SMP code.

Now when you go looking for some function, it is easier to find both
versions together when neither is trivial.

There is no code change here - just code movement.
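
The resulting layout follows a common kernel pattern, sketched below in
a single file for brevity. DEMO_SMP stands in for CONFIG_SMP, and the
demo_* type and functions are hypothetical; they are not the cfs_cpt_*
API. The SMP branch has a prototype (header side) plus an out-of-line
definition (what would sit in the .c file), while the !SMP branch is a
trivial static inline that needs no .c counterpart.

```c
#include <assert.h>

/* Hypothetical stand-in for a partition table. */
struct demo_cpt_table {
	int nparts;
};

#ifdef DEMO_SMP
/* SMP: real work, so only the prototype would sit in the header ... */
int demo_cpt_number(struct demo_cpt_table *tab);

/* ... and the definition would sit in the .c file. */
int demo_cpt_number(struct demo_cpt_table *tab)
{
	return tab->nparts;
}
#else
/* !SMP: trivial, so it lives in the header as a static inline. */
static inline int demo_cpt_number(struct demo_cpt_table *tab)
{
	(void)tab;	/* a uniprocessor build has a single partition */
	return 1;
}
#endif

/* Callers look the same in both configurations. */
void demo_cpt_selftest(void)
{
	struct demo_cpt_table tab = { .nparts = 4 };

	assert(demo_cpt_number(&tab) >= 1);
}
```

Keeping both branches under one #ifdef/#else in the header is what
makes the two versions of a function easy to find side by side.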

Signed-off-by: NeilBrown <[email protected]>
---
.../lustre/include/linux/libcfs/libcfs_cpu.h | 173 +++
drivers/staging/lustre/lnet/libcfs/Makefile | 1
drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c | 959 +++++++++++++++++-
.../staging/lustre/lnet/libcfs/linux/linux-cpu.c | 1079 --------------------
4 files changed, 1076 insertions(+), 1136 deletions(-)
delete mode 100644 drivers/staging/lustre/lnet/libcfs/linux/linux-cpu.c

diff --git a/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h b/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h
index 829c35e68db8..813ba4564bb9 100644
--- a/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h
+++ b/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h
@@ -117,41 +117,6 @@ cpumask_var_t *cfs_cpt_cpumask(struct cfs_cpt_table *cptab, int cpt);
* print string information of cpt-table
*/
int cfs_cpt_table_print(struct cfs_cpt_table *cptab, char *buf, int len);
-#else /* !CONFIG_SMP */
-struct cfs_cpt_table {
- /* # of CPU partitions */
- int ctb_nparts;
- /* cpu mask */
- cpumask_t ctb_mask;
- /* node mask */
- nodemask_t ctb_nodemask;
- /* version */
- u64 ctb_version;
-};
-
-static inline cpumask_var_t *
-cfs_cpt_cpumask(struct cfs_cpt_table *cptab, int cpt)
-{
- return NULL;
-}
-
-static inline int
-cfs_cpt_table_print(struct cfs_cpt_table *cptab, char *buf, int len)
-{
- return 0;
-}
-#endif /* CONFIG_SMP */
-
-extern struct cfs_cpt_table *cfs_cpt_table;
-
-/**
- * destroy a CPU partition table
- */
-void cfs_cpt_table_free(struct cfs_cpt_table *cptab);
-/**
- * create a cfs_cpt_table with \a ncpt number of partitions
- */
-struct cfs_cpt_table *cfs_cpt_table_alloc(unsigned int ncpt);
/**
* return total number of CPU partitions in \a cptab
*/
@@ -237,6 +202,144 @@ int cfs_cpt_spread_node(struct cfs_cpt_table *cptab, int cpt);
*/
int cfs_cpu_ht_nsiblings(int cpu);

+#else /* !CONFIG_SMP */
+struct cfs_cpt_table {
+ /* # of CPU partitions */
+ int ctb_nparts;
+ /* cpu mask */
+ cpumask_t ctb_mask;
+ /* node mask */
+ nodemask_t ctb_nodemask;
+ /* version */
+ u64 ctb_version;
+};
+
+static inline cpumask_var_t *
+cfs_cpt_cpumask(struct cfs_cpt_table *cptab, int cpt)
+{
+ return NULL;
+}
+
+static inline int
+cfs_cpt_table_print(struct cfs_cpt_table *cptab, char *buf, int len)
+{
+ return 0;
+}
+static inline int
+cfs_cpt_number(struct cfs_cpt_table *cptab)
+{
+ return 1;
+}
+
+static inline int
+cfs_cpt_weight(struct cfs_cpt_table *cptab, int cpt)
+{
+ return 1;
+}
+
+static inline int
+cfs_cpt_online(struct cfs_cpt_table *cptab, int cpt)
+{
+ return 1;
+}
+
+static inline nodemask_t *
+cfs_cpt_nodemask(struct cfs_cpt_table *cptab, int cpt)
+{
+ return &cptab->ctb_nodemask;
+}
+
+static inline int
+cfs_cpt_set_cpu(struct cfs_cpt_table *cptab, int cpt, int cpu)
+{
+ return 1;
+}
+
+static inline void
+cfs_cpt_unset_cpu(struct cfs_cpt_table *cptab, int cpt, int cpu)
+{
+}
+
+static inline int
+cfs_cpt_set_cpumask(struct cfs_cpt_table *cptab, int cpt, cpumask_t *mask)
+{
+ return 1;
+}
+
+static inline void
+cfs_cpt_unset_cpumask(struct cfs_cpt_table *cptab, int cpt, cpumask_t *mask)
+{
+}
+
+static inline int
+cfs_cpt_set_node(struct cfs_cpt_table *cptab, int cpt, int node)
+{
+ return 1;
+}
+
+static inline void
+cfs_cpt_unset_node(struct cfs_cpt_table *cptab, int cpt, int node)
+{
+}
+
+static inline int
+cfs_cpt_set_nodemask(struct cfs_cpt_table *cptab, int cpt, nodemask_t *mask)
+{
+ return 1;
+}
+
+static inline void
+cfs_cpt_unset_nodemask(struct cfs_cpt_table *cptab, int cpt, nodemask_t *mask)
+{
+}
+
+static inline void
+cfs_cpt_clear(struct cfs_cpt_table *cptab, int cpt)
+{
+}
+
+static inline int
+cfs_cpt_spread_node(struct cfs_cpt_table *cptab, int cpt)
+{
+ return 0;
+}
+
+static inline int
+cfs_cpu_ht_nsiblings(int cpu)
+{
+ return 1;
+}
+
+static inline int
+cfs_cpt_current(struct cfs_cpt_table *cptab, int remap)
+{
+ return 0;
+}
+
+static inline int
+cfs_cpt_of_cpu(struct cfs_cpt_table *cptab, int cpu)
+{
+ return 0;
+}
+
+static inline int
+cfs_cpt_bind(struct cfs_cpt_table *cptab, int cpt)
+{
+ return 0;
+}
+#endif /* CONFIG_SMP */
+
+extern struct cfs_cpt_table *cfs_cpt_table;
+
+/**
+ * destroy a CPU partition table
+ */
+void cfs_cpt_table_free(struct cfs_cpt_table *cptab);
+/**
+ * create a cfs_cpt_table with \a ncpt number of partitions
+ */
+struct cfs_cpt_table *cfs_cpt_table_alloc(unsigned int ncpt);
+
/*
* allocate per-cpu-partition data, returned value is an array of pointers,
* variable can be indexed by CPU ID.
diff --git a/drivers/staging/lustre/lnet/libcfs/Makefile b/drivers/staging/lustre/lnet/libcfs/Makefile
index 36b49a6b7b88..673fe348c445 100644
--- a/drivers/staging/lustre/lnet/libcfs/Makefile
+++ b/drivers/staging/lustre/lnet/libcfs/Makefile
@@ -5,7 +5,6 @@ subdir-ccflags-y += -I$(srctree)/drivers/staging/lustre/lustre/include
obj-$(CONFIG_LNET) += libcfs.o

libcfs-linux-objs := linux-tracefile.o linux-debug.o
-libcfs-linux-objs += linux-cpu.o
libcfs-linux-objs += linux-module.o
libcfs-linux-objs += linux-crypto.o
libcfs-linux-objs += linux-crypto-adler.o
diff --git a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
index 5818f641455f..ac6fd11ae9d6 100644
--- a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
+++ b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
@@ -36,11 +36,110 @@
/** Global CPU partition table */
struct cfs_cpt_table *cfs_cpt_table __read_mostly;
EXPORT_SYMBOL(cfs_cpt_table);
+#define DEBUG_SUBSYSTEM S_LNET
+
+#include <linux/cpu.h>
+#include <linux/sched.h>
+#include <linux/libcfs/libcfs.h>
+
+#ifdef CONFIG_SMP
+/**
+ * modparam for setting number of partitions
+ *
+ * 0 : estimate best value based on cores or NUMA nodes
+ * 1 : disable multiple partitions
+ * >1 : specify number of partitions
+ */
+static int cpu_npartitions;
+module_param(cpu_npartitions, int, 0444);
+MODULE_PARM_DESC(cpu_npartitions, "# of CPU partitions");
+
+/**
+ * modparam for setting CPU partitions patterns:
+ *
+ * i.e: "0[0,1,2,3] 1[4,5,6,7]", number before bracket is CPU partition ID,
+ * number in bracket is processor ID (core or HT)
+ *
+ * i.e: "N 0[0,1] 1[2,3]" the first character 'N' means numbers in bracket
+ * are NUMA node ID, number before bracket is CPU partition ID.
+ *
+ * i.e: "N", shortcut expression to create CPT from NUMA & CPU topology
+ *
+ * NB: If user specified cpu_pattern, cpu_npartitions will be ignored
+ */
+static char *cpu_pattern = "N";
+module_param(cpu_pattern, charp, 0444);
+MODULE_PARM_DESC(cpu_pattern, "CPU partitions pattern");

-#ifndef CONFIG_SMP
+static struct cfs_cpt_data {
+ /* serialize hotplug etc */
+ spinlock_t cpt_lock;
+ /* reserved for hotplug */
+ unsigned long cpt_version;
+ /* mutex to protect cpt_cpumask */
+ struct mutex cpt_mutex;
+ /* scratch buffer for set/unset_node */
+ cpumask_var_t cpt_cpumask;
+} cpt_data;
+#endif

#define CFS_CPU_VERSION_MAGIC 0xbabecafe

+#ifdef CONFIG_SMP
+struct cfs_cpt_table *
+cfs_cpt_table_alloc(unsigned int ncpt)
+{
+ struct cfs_cpt_table *cptab;
+ int i;
+
+ cptab = kzalloc(sizeof(*cptab), GFP_NOFS);
+ if (!cptab)
+ return NULL;
+
+ cptab->ctb_nparts = ncpt;
+
+ cptab->ctb_nodemask = kzalloc(sizeof(*cptab->ctb_nodemask),
+ GFP_NOFS);
+ if (!zalloc_cpumask_var(&cptab->ctb_cpumask, GFP_NOFS) ||
+ !cptab->ctb_nodemask)
+ goto failed;
+
+ cptab->ctb_cpu2cpt = kvmalloc_array(num_possible_cpus(),
+ sizeof(cptab->ctb_cpu2cpt[0]),
+ GFP_KERNEL);
+ if (!cptab->ctb_cpu2cpt)
+ goto failed;
+
+ memset(cptab->ctb_cpu2cpt, -1,
+ num_possible_cpus() * sizeof(cptab->ctb_cpu2cpt[0]));
+
+ cptab->ctb_parts = kvmalloc_array(ncpt, sizeof(cptab->ctb_parts[0]),
+ GFP_KERNEL);
+ if (!cptab->ctb_parts)
+ goto failed;
+
+ for (i = 0; i < ncpt; i++) {
+ struct cfs_cpu_partition *part = &cptab->ctb_parts[i];
+
+ part->cpt_nodemask = kzalloc(sizeof(*part->cpt_nodemask),
+ GFP_NOFS);
+ if (!zalloc_cpumask_var(&part->cpt_cpumask, GFP_NOFS) ||
+ !part->cpt_nodemask)
+ goto failed;
+ }
+
+ spin_lock(&cpt_data.cpt_lock);
+ /* Reserved for hotplug */
+ cptab->ctb_version = cpt_data.cpt_version;
+ spin_unlock(&cpt_data.cpt_lock);
+
+ return cptab;
+
+ failed:
+ cfs_cpt_table_free(cptab);
+ return NULL;
+}
+#else /* ! CONFIG_SMP */
struct cfs_cpt_table *
cfs_cpt_table_alloc(unsigned int ncpt)
{
@@ -60,8 +159,32 @@ cfs_cpt_table_alloc(unsigned int ncpt)

return cptab;
}
+#endif /* CONFIG_SMP */
EXPORT_SYMBOL(cfs_cpt_table_alloc);

+#ifdef CONFIG_SMP
+void
+cfs_cpt_table_free(struct cfs_cpt_table *cptab)
+{
+ int i;
+
+ kvfree(cptab->ctb_cpu2cpt);
+
+ for (i = 0; cptab->ctb_parts && i < cptab->ctb_nparts; i++) {
+ struct cfs_cpu_partition *part = &cptab->ctb_parts[i];
+
+ kfree(part->cpt_nodemask);
+ free_cpumask_var(part->cpt_cpumask);
+ }
+
+ kvfree(cptab->ctb_parts);
+
+ kfree(cptab->ctb_nodemask);
+ free_cpumask_var(cptab->ctb_cpumask);
+
+ kfree(cptab);
+}
+#else /* ! CONFIG_SMP */
void
cfs_cpt_table_free(struct cfs_cpt_table *cptab)
{
@@ -69,55 +192,153 @@ cfs_cpt_table_free(struct cfs_cpt_table *cptab)

kfree(cptab);
}
+#endif /* CONFIG_SMP */
EXPORT_SYMBOL(cfs_cpt_table_free);

#ifdef CONFIG_SMP
int
cfs_cpt_table_print(struct cfs_cpt_table *cptab, char *buf, int len)
{
- int rc;
+ char *tmp = buf;
+ int rc = 0;
+ int i;
+ int j;

- rc = snprintf(buf, len, "%d\t: %d\n", 0, 0);
- len -= rc;
- if (len <= 0)
- return -EFBIG;
+ for (i = 0; i < cptab->ctb_nparts; i++) {
+ if (len > 0) {
+ rc = snprintf(tmp, len, "%d\t: ", i);
+ len -= rc;
+ }

- return rc;
+ if (len <= 0) {
+ rc = -EFBIG;
+ goto out;
+ }
+
+ tmp += rc;
+ for_each_cpu(j, cptab->ctb_parts[i].cpt_cpumask) {
+ rc = snprintf(tmp, len, "%d ", j);
+ len -= rc;
+ if (len <= 0) {
+ rc = -EFBIG;
+ goto out;
+ }
+ tmp += rc;
+ }
+
+ *tmp = '\n';
+ tmp++;
+ len--;
+ }
+
+ out:
+ if (rc < 0)
+ return rc;
+
+ return tmp - buf;
}
EXPORT_SYMBOL(cfs_cpt_table_print);
#endif /* CONFIG_SMP */

+#ifdef CONFIG_SMP
+static void
+cfs_node_to_cpumask(int node, cpumask_t *mask)
+{
+ const cpumask_t *tmp = cpumask_of_node(node);
+
+ if (tmp)
+ cpumask_copy(mask, tmp);
+ else
+ cpumask_clear(mask);
+}
+
int
cfs_cpt_number(struct cfs_cpt_table *cptab)
{
- return 1;
+ return cptab->ctb_nparts;
}
EXPORT_SYMBOL(cfs_cpt_number);

int
cfs_cpt_weight(struct cfs_cpt_table *cptab, int cpt)
{
- return 1;
+ LASSERT(cpt == CFS_CPT_ANY || (cpt >= 0 && cpt < cptab->ctb_nparts));
+
+ return cpt == CFS_CPT_ANY ?
+ cpumask_weight(cptab->ctb_cpumask) :
+ cpumask_weight(cptab->ctb_parts[cpt].cpt_cpumask);
}
EXPORT_SYMBOL(cfs_cpt_weight);

int
cfs_cpt_online(struct cfs_cpt_table *cptab, int cpt)
{
- return 1;
+ LASSERT(cpt == CFS_CPT_ANY || (cpt >= 0 && cpt < cptab->ctb_nparts));
+
+ return cpt == CFS_CPT_ANY ?
+ cpumask_any_and(cptab->ctb_cpumask,
+ cpu_online_mask) < nr_cpu_ids :
+ cpumask_any_and(cptab->ctb_parts[cpt].cpt_cpumask,
+ cpu_online_mask) < nr_cpu_ids;
}
EXPORT_SYMBOL(cfs_cpt_online);

+cpumask_var_t *
+cfs_cpt_cpumask(struct cfs_cpt_table *cptab, int cpt)
+{
+ LASSERT(cpt == CFS_CPT_ANY || (cpt >= 0 && cpt < cptab->ctb_nparts));
+
+ return cpt == CFS_CPT_ANY ?
+ &cptab->ctb_cpumask : &cptab->ctb_parts[cpt].cpt_cpumask;
+}
+EXPORT_SYMBOL(cfs_cpt_cpumask);
+
nodemask_t *
cfs_cpt_nodemask(struct cfs_cpt_table *cptab, int cpt)
{
- return &cptab->ctb_nodemask;
+ LASSERT(cpt == CFS_CPT_ANY || (cpt >= 0 && cpt < cptab->ctb_nparts));
+
+ return cpt == CFS_CPT_ANY ?
+ cptab->ctb_nodemask : cptab->ctb_parts[cpt].cpt_nodemask;
}
EXPORT_SYMBOL(cfs_cpt_nodemask);

int
cfs_cpt_set_cpu(struct cfs_cpt_table *cptab, int cpt, int cpu)
{
+ int node;
+
+ LASSERT(cpt >= 0 && cpt < cptab->ctb_nparts);
+
+ if (cpu < 0 || cpu >= nr_cpu_ids || !cpu_online(cpu)) {
+ CDEBUG(D_INFO, "CPU %d is invalid or it's offline\n", cpu);
+ return 0;
+ }
+
+ if (cptab->ctb_cpu2cpt[cpu] != -1) {
+ CDEBUG(D_INFO, "CPU %d is already in partition %d\n",
+ cpu, cptab->ctb_cpu2cpt[cpu]);
+ return 0;
+ }
+
+ cptab->ctb_cpu2cpt[cpu] = cpt;
+
+ LASSERT(!cpumask_test_cpu(cpu, cptab->ctb_cpumask));
+ LASSERT(!cpumask_test_cpu(cpu, cptab->ctb_parts[cpt].cpt_cpumask));
+
+ cpumask_set_cpu(cpu, cptab->ctb_cpumask);
+ cpumask_set_cpu(cpu, cptab->ctb_parts[cpt].cpt_cpumask);
+
+ node = cpu_to_node(cpu);
+
+ /* first CPU of @node in this CPT table */
+ if (!node_isset(node, *cptab->ctb_nodemask))
+ node_set(node, *cptab->ctb_nodemask);
+
+ /* first CPU of @node in this partition */
+ if (!node_isset(node, *cptab->ctb_parts[cpt].cpt_nodemask))
+ node_set(node, *cptab->ctb_parts[cpt].cpt_nodemask);
+
return 1;
}
EXPORT_SYMBOL(cfs_cpt_set_cpu);
@@ -125,12 +346,80 @@ EXPORT_SYMBOL(cfs_cpt_set_cpu);
void
cfs_cpt_unset_cpu(struct cfs_cpt_table *cptab, int cpt, int cpu)
{
+ int node;
+ int i;
+
+ LASSERT(cpt == CFS_CPT_ANY || (cpt >= 0 && cpt < cptab->ctb_nparts));
+
+ if (cpu < 0 || cpu >= nr_cpu_ids) {
+ CDEBUG(D_INFO, "Invalid CPU id %d\n", cpu);
+ return;
+ }
+
+ if (cpt == CFS_CPT_ANY) {
+ /* caller doesn't know the partition ID */
+ cpt = cptab->ctb_cpu2cpt[cpu];
+ if (cpt < 0) { /* not set in this CPT-table */
+ CDEBUG(D_INFO, "Try to unset cpu %d which is not in CPT-table %p\n",
+ cpt, cptab);
+ return;
+ }
+
+ } else if (cpt != cptab->ctb_cpu2cpt[cpu]) {
+ CDEBUG(D_INFO,
+ "CPU %d is not in cpu-partition %d\n", cpu, cpt);
+ return;
+ }
+
+ LASSERT(cpumask_test_cpu(cpu, cptab->ctb_parts[cpt].cpt_cpumask));
+ LASSERT(cpumask_test_cpu(cpu, cptab->ctb_cpumask));
+
+ cpumask_clear_cpu(cpu, cptab->ctb_parts[cpt].cpt_cpumask);
+ cpumask_clear_cpu(cpu, cptab->ctb_cpumask);
+ cptab->ctb_cpu2cpt[cpu] = -1;
+
+ node = cpu_to_node(cpu);
+
+ LASSERT(node_isset(node, *cptab->ctb_parts[cpt].cpt_nodemask));
+ LASSERT(node_isset(node, *cptab->ctb_nodemask));
+
+ for_each_cpu(i, cptab->ctb_parts[cpt].cpt_cpumask) {
+ /* this CPT has other CPU belonging to this node? */
+ if (cpu_to_node(i) == node)
+ break;
+ }
+
+ if (i >= nr_cpu_ids)
+ node_clear(node, *cptab->ctb_parts[cpt].cpt_nodemask);
+
+ for_each_cpu(i, cptab->ctb_cpumask) {
+ /* this CPT-table has other CPU belonging to this node? */
+ if (cpu_to_node(i) == node)
+ break;
+ }
+
+ if (i >= nr_cpu_ids)
+ node_clear(node, *cptab->ctb_nodemask);
}
EXPORT_SYMBOL(cfs_cpt_unset_cpu);

int
cfs_cpt_set_cpumask(struct cfs_cpt_table *cptab, int cpt, cpumask_t *mask)
{
+ int i;
+
+ if (!cpumask_weight(mask) ||
+ cpumask_any_and(mask, cpu_online_mask) >= nr_cpu_ids) {
+ CDEBUG(D_INFO, "No online CPU is found in the CPU mask for CPU partition %d\n",
+ cpt);
+ return 0;
+ }
+
+ for_each_cpu(i, mask) {
+ if (!cfs_cpt_set_cpu(cptab, cpt, i))
+ return 0;
+ }
+
return 1;
}
EXPORT_SYMBOL(cfs_cpt_set_cpumask);
@@ -138,25 +427,65 @@ EXPORT_SYMBOL(cfs_cpt_set_cpumask);
void
cfs_cpt_unset_cpumask(struct cfs_cpt_table *cptab, int cpt, cpumask_t *mask)
{
+ int i;
+
+ for_each_cpu(i, mask)
+ cfs_cpt_unset_cpu(cptab, cpt, i);
}
EXPORT_SYMBOL(cfs_cpt_unset_cpumask);

int
cfs_cpt_set_node(struct cfs_cpt_table *cptab, int cpt, int node)
{
- return 1;
+ int rc;
+
+ if (node < 0 || node >= MAX_NUMNODES) {
+ CDEBUG(D_INFO,
+ "Invalid NUMA id %d for CPU partition %d\n", node, cpt);
+ return 0;
+ }
+
+ mutex_lock(&cpt_data.cpt_mutex);
+
+ cfs_node_to_cpumask(node, cpt_data.cpt_cpumask);
+
+ rc = cfs_cpt_set_cpumask(cptab, cpt, cpt_data.cpt_cpumask);
+
+ mutex_unlock(&cpt_data.cpt_mutex);
+
+ return rc;
}
EXPORT_SYMBOL(cfs_cpt_set_node);

void
cfs_cpt_unset_node(struct cfs_cpt_table *cptab, int cpt, int node)
{
+ if (node < 0 || node >= MAX_NUMNODES) {
+ CDEBUG(D_INFO,
+ "Invalid NUMA id %d for CPU partition %d\n", node, cpt);
+ return;
+ }
+
+ mutex_lock(&cpt_data.cpt_mutex);
+
+ cfs_node_to_cpumask(node, cpt_data.cpt_cpumask);
+
+ cfs_cpt_unset_cpumask(cptab, cpt, cpt_data.cpt_cpumask);
+
+ mutex_unlock(&cpt_data.cpt_mutex);
}
EXPORT_SYMBOL(cfs_cpt_unset_node);

int
cfs_cpt_set_nodemask(struct cfs_cpt_table *cptab, int cpt, nodemask_t *mask)
{
+ int i;
+
+ for_each_node_mask(i, *mask) {
+ if (!cfs_cpt_set_node(cptab, cpt, i))
+ return 0;
+ }
+
return 1;
}
EXPORT_SYMBOL(cfs_cpt_set_nodemask);
@@ -164,50 +493,638 @@ EXPORT_SYMBOL(cfs_cpt_set_nodemask);
void
cfs_cpt_unset_nodemask(struct cfs_cpt_table *cptab, int cpt, nodemask_t *mask)
{
+ int i;
+
+ for_each_node_mask(i, *mask)
+ cfs_cpt_unset_node(cptab, cpt, i);
}
EXPORT_SYMBOL(cfs_cpt_unset_nodemask);

void
cfs_cpt_clear(struct cfs_cpt_table *cptab, int cpt)
{
+ int last;
+ int i;
+
+ if (cpt == CFS_CPT_ANY) {
+ last = cptab->ctb_nparts - 1;
+ cpt = 0;
+ } else {
+ last = cpt;
+ }
+
+ for (; cpt <= last; cpt++) {
+ for_each_cpu(i, cptab->ctb_parts[cpt].cpt_cpumask)
+ cfs_cpt_unset_cpu(cptab, cpt, i);
+ }
}
EXPORT_SYMBOL(cfs_cpt_clear);

int
cfs_cpt_spread_node(struct cfs_cpt_table *cptab, int cpt)
{
+ nodemask_t *mask;
+ int weight;
+ int rotor;
+ int node;
+
+ /* convert CPU partition ID to HW node id */
+
+ if (cpt < 0 || cpt >= cptab->ctb_nparts) {
+ mask = cptab->ctb_nodemask;
+ rotor = cptab->ctb_spread_rotor++;
+ } else {
+ mask = cptab->ctb_parts[cpt].cpt_nodemask;
+ rotor = cptab->ctb_parts[cpt].cpt_spread_rotor++;
+ }
+
+ weight = nodes_weight(*mask);
+ LASSERT(weight > 0);
+
+ rotor %= weight;
+
+ for_each_node_mask(node, *mask) {
+ if (!rotor--)
+ return node;
+ }
+
+ LBUG();
return 0;
}
EXPORT_SYMBOL(cfs_cpt_spread_node);

-int
-cfs_cpu_ht_nsiblings(int cpu)
-{
- return 1;
-}
-EXPORT_SYMBOL(cfs_cpu_ht_nsiblings);
-
int
cfs_cpt_current(struct cfs_cpt_table *cptab, int remap)
{
- return 0;
+ int cpu;
+ int cpt;
+
+ preempt_disable();
+ cpu = smp_processor_id();
+ cpt = cptab->ctb_cpu2cpt[cpu];
+
+ if (cpt < 0 && remap) {
+ /* don't return negative value for safety of upper layer,
+ * instead we shadow the unknown cpu to a valid partition ID
+ */
+ cpt = cpu % cptab->ctb_nparts;
+ }
+ preempt_enable();
+ return cpt;
}
EXPORT_SYMBOL(cfs_cpt_current);

int
cfs_cpt_of_cpu(struct cfs_cpt_table *cptab, int cpu)
{
- return 0;
+ LASSERT(cpu >= 0 && cpu < nr_cpu_ids);
+
+ return cptab->ctb_cpu2cpt[cpu];
}
EXPORT_SYMBOL(cfs_cpt_of_cpu);

int
cfs_cpt_bind(struct cfs_cpt_table *cptab, int cpt)
{
+ cpumask_var_t *cpumask;
+ nodemask_t *nodemask;
+ int rc;
+ int i;
+
+ LASSERT(cpt == CFS_CPT_ANY || (cpt >= 0 && cpt < cptab->ctb_nparts));
+
+ if (cpt == CFS_CPT_ANY) {
+ cpumask = &cptab->ctb_cpumask;
+ nodemask = cptab->ctb_nodemask;
+ } else {
+ cpumask = &cptab->ctb_parts[cpt].cpt_cpumask;
+ nodemask = cptab->ctb_parts[cpt].cpt_nodemask;
+ }
+
+ if (cpumask_any_and(*cpumask, cpu_online_mask) >= nr_cpu_ids) {
+ CERROR("No online CPU found in CPU partition %d, did someone do CPU hotplug on system? You might need to reload Lustre modules to keep system working well.\n",
+ cpt);
+ return -EINVAL;
+ }
+
+ for_each_online_cpu(i) {
+ if (cpumask_test_cpu(i, *cpumask))
+ continue;
+
+ rc = set_cpus_allowed_ptr(current, *cpumask);
+ set_mems_allowed(*nodemask);
+ if (!rc)
+ schedule(); /* switch to allowed CPU */
+
+ return rc;
+ }
+
+ /* don't need to set affinity because all online CPUs are covered */
return 0;
}
EXPORT_SYMBOL(cfs_cpt_bind);

+#endif
+
+#ifdef CONFIG_SMP
+
+/**
+ * Choose max to \a number CPUs from \a node and set them in \a cpt.
+ * We always prefer to choose CPU in the same core/socket.
+ */
+static int
+cfs_cpt_choose_ncpus(struct cfs_cpt_table *cptab, int cpt,
+ cpumask_t *node, int number)
+{
+ cpumask_var_t socket;
+ cpumask_var_t core;
+ int rc = 0;
+ int cpu;
+
+ LASSERT(number > 0);
+
+ if (number >= cpumask_weight(node)) {
+ while (!cpumask_empty(node)) {
+ cpu = cpumask_first(node);
+
+ rc = cfs_cpt_set_cpu(cptab, cpt, cpu);
+ if (!rc)
+ return -EINVAL;
+ cpumask_clear_cpu(cpu, node);
+ }
+ return 0;
+ }
+
+ /*
+ * Allocate scratch buffers
+ * As we cannot initialize a cpumask_var_t, we need
+ * to alloc both before we can risk trying to free either
+ */
+ if (!zalloc_cpumask_var(&socket, GFP_NOFS))
+ rc = -ENOMEM;
+ if (!zalloc_cpumask_var(&core, GFP_NOFS))
+ rc = -ENOMEM;
+ if (rc)
+ goto out;
+
+ while (!cpumask_empty(node)) {
+ cpu = cpumask_first(node);
+
+ /* get cpumask for cores in the same socket */
+ cpumask_copy(socket, topology_core_cpumask(cpu));
+ cpumask_and(socket, socket, node);
+
+ LASSERT(!cpumask_empty(socket));
+
+ while (!cpumask_empty(socket)) {
+ int i;
+
+ /* get cpumask for hts in the same core */
+ cpumask_copy(core, topology_sibling_cpumask(cpu));
+ cpumask_and(core, core, node);
+
+ LASSERT(!cpumask_empty(core));
+
+ for_each_cpu(i, core) {
+ cpumask_clear_cpu(i, socket);
+ cpumask_clear_cpu(i, node);
+
+ rc = cfs_cpt_set_cpu(cptab, cpt, i);
+ if (!rc) {
+ rc = -EINVAL;
+ goto out;
+ }
+
+ if (!--number)
+ goto out;
+ }
+ cpu = cpumask_first(socket);
+ }
+ }
+
+out:
+ free_cpumask_var(socket);
+ free_cpumask_var(core);
+ return rc;
+}
+
+#define CPT_WEIGHT_MIN 4u
+
+static unsigned int
+cfs_cpt_num_estimate(void)
+{
+ unsigned int nnode = num_online_nodes();
+ unsigned int ncpu = num_online_cpus();
+ unsigned int ncpt;
+
+ if (ncpu <= CPT_WEIGHT_MIN) {
+ ncpt = 1;
+ goto out;
+ }
+
+ /* generate reasonable number of CPU partitions based on total number
+ * of CPUs, Preferred N should be power2 and match this condition:
+ * 2 * (N - 1)^2 < NCPUS <= 2 * N^2
+ */
+ for (ncpt = 2; ncpu > 2 * ncpt * ncpt; ncpt <<= 1)
+ ;
+
+ if (ncpt <= nnode) { /* fat numa system */
+ while (nnode > ncpt)
+ nnode >>= 1;
+
+ } else { /* ncpt > nnode */
+ while ((nnode << 1) <= ncpt)
+ nnode <<= 1;
+ }
+
+ ncpt = nnode;
+
+out:
+#if (BITS_PER_LONG == 32)
+ /* config many CPU partitions on 32-bit system could consume
+ * too much memory
+ */
+ ncpt = min(2U, ncpt);
+#endif
+ while (ncpu % ncpt)
+ ncpt--; /* worst case is 1 */
+
+ return ncpt;
+}
+
+static struct cfs_cpt_table *
+cfs_cpt_table_create(int ncpt)
+{
+ struct cfs_cpt_table *cptab = NULL;
+ cpumask_var_t mask;
+ int cpt = 0;
+ int num;
+ int rc;
+ int i;
+
+ rc = cfs_cpt_num_estimate();
+ if (ncpt <= 0)
+ ncpt = rc;
+
+ if (ncpt > num_online_cpus() || ncpt > 4 * rc) {
+ CWARN("CPU partition number %d is larger than suggested value (%d), your system may have performance issue or run out of memory while under pressure\n",
+ ncpt, rc);
+ }
+
+ if (num_online_cpus() % ncpt) {
+ CERROR("CPU number %d is not multiple of cpu_npartition %d, please try different cpu_npartitions value or set pattern string by cpu_pattern=STRING\n",
+ (int)num_online_cpus(), ncpt);
+ goto failed;
+ }
+
+ cptab = cfs_cpt_table_alloc(ncpt);
+ if (!cptab) {
+ CERROR("Failed to allocate CPU map(%d)\n", ncpt);
+ goto failed;
+ }
+
+ num = num_online_cpus() / ncpt;
+ if (!num) {
+ CERROR("CPU changed while setting CPU partition\n");
+ goto failed;
+ }
+
+ if (!zalloc_cpumask_var(&mask, GFP_NOFS)) {
+ CERROR("Failed to allocate scratch cpumask\n");
+ goto failed;
+ }
+
+ for_each_online_node(i) {
+ cfs_node_to_cpumask(i, mask);
+
+ while (!cpumask_empty(mask)) {
+ struct cfs_cpu_partition *part;
+ int n;
+
+ /*
+ * Each emulated NUMA node has all allowed CPUs in
+ * the mask.
+ * End loop when all partitions have assigned CPUs.
+ */
+ if (cpt == ncpt)
+ break;
+
+ part = &cptab->ctb_parts[cpt];
+
+ n = num - cpumask_weight(part->cpt_cpumask);
+ LASSERT(n > 0);
+
+ rc = cfs_cpt_choose_ncpus(cptab, cpt, mask, n);
+ if (rc < 0)
+ goto failed_mask;
+
+ LASSERT(num >= cpumask_weight(part->cpt_cpumask));
+ if (num == cpumask_weight(part->cpt_cpumask))
+ cpt++;
+ }
+ }
+
+ if (cpt != ncpt ||
+ num != cpumask_weight(cptab->ctb_parts[ncpt - 1].cpt_cpumask)) {
+ CERROR("Expect %d(%d) CPU partitions but got %d(%d), CPU hotplug/unplug while setting?\n",
+ cptab->ctb_nparts, num, cpt,
+ cpumask_weight(cptab->ctb_parts[ncpt - 1].cpt_cpumask));
+ goto failed_mask;
+ }
+
+ free_cpumask_var(mask);
+
+ return cptab;
+
+ failed_mask:
+ free_cpumask_var(mask);
+ failed:
+ CERROR("Failed to setup CPU-partition-table with %d CPU-partitions, online HW nodes: %d, HW cpus: %d.\n",
+ ncpt, num_online_nodes(), num_online_cpus());
+
+ if (cptab)
+ cfs_cpt_table_free(cptab);
+
+ return NULL;
+}
+
+static struct cfs_cpt_table *
+cfs_cpt_table_create_pattern(char *pattern)
+{
+ struct cfs_cpt_table *cptab;
+ char *str;
+ int node = 0;
+ int high;
+ int ncpt = 0;
+ int cpt;
+ int rc;
+ int c;
+ int i;
+
+ str = strim(pattern);
+ if (*str == 'n' || *str == 'N') {
+ pattern = str + 1;
+ if (*pattern != '\0') {
+ node = 1;
+ } else { /* shortcut to create CPT from NUMA & CPU topology */
+ node = -1;
+ ncpt = num_online_nodes();
+ }
+ }
+
+ if (!ncpt) { /* scanning bracket which is mark of partition */
+ for (str = pattern;; str++, ncpt++) {
+ str = strchr(str, '[');
+ if (!str)
+ break;
+ }
+ }
+
+ if (!ncpt ||
+ (node && ncpt > num_online_nodes()) ||
+ (!node && ncpt > num_online_cpus())) {
+ CERROR("Invalid pattern %s, or too many partitions %d\n",
+ pattern, ncpt);
+ return NULL;
+ }
+
+ cptab = cfs_cpt_table_alloc(ncpt);
+ if (!cptab) {
+ CERROR("Failed to allocate cpu partition table\n");
+ return NULL;
+ }
+
+ if (node < 0) { /* shortcut to create CPT from NUMA & CPU topology */
+ cpt = 0;
+
+ for_each_online_node(i) {
+ if (cpt >= ncpt) {
+ CERROR("CPU changed while setting CPU partition table, %d/%d\n",
+ cpt, ncpt);
+ goto failed;
+ }
+
+ rc = cfs_cpt_set_node(cptab, cpt++, i);
+ if (!rc)
+ goto failed;
+ }
+ return cptab;
+ }
+
+ high = node ? MAX_NUMNODES - 1 : nr_cpu_ids - 1;
+
+ for (str = strim(pattern), c = 0;; c++) {
+ struct cfs_range_expr *range;
+ struct cfs_expr_list *el;
+ char *bracket = strchr(str, '[');
+ int n;
+
+ if (!bracket) {
+ if (*str) {
+ CERROR("Invalid pattern %s\n", str);
+ goto failed;
+ }
+ if (c != ncpt) {
+ CERROR("expect %d partitions but found %d\n",
+ ncpt, c);
+ goto failed;
+ }
+ break;
+ }
+
+ if (sscanf(str, "%d%n", &cpt, &n) < 1) {
+ CERROR("Invalid cpu pattern %s\n", str);
+ goto failed;
+ }
+
+ if (cpt < 0 || cpt >= ncpt) {
+ CERROR("Invalid partition id %d, total partitions %d\n",
+ cpt, ncpt);
+ goto failed;
+ }
+
+ if (cfs_cpt_weight(cptab, cpt)) {
+ CERROR("Partition %d has already been set.\n", cpt);
+ goto failed;
+ }
+
+ str = strim(str + n);
+ if (str != bracket) {
+ CERROR("Invalid pattern %s\n", str);
+ goto failed;
+ }
+
+ bracket = strchr(str, ']');
+ if (!bracket) {
+ CERROR("missing right bracket for cpt %d, %s\n",
+ cpt, str);
+ goto failed;
+ }
+
+ if (cfs_expr_list_parse(str, (bracket - str) + 1,
+ 0, high, &el)) {
+ CERROR("Can't parse number range: %s\n", str);
+ goto failed;
+ }
+
+ list_for_each_entry(range, &el->el_exprs, re_link) {
+ for (i = range->re_lo; i <= range->re_hi; i++) {
+ if ((i - range->re_lo) % range->re_stride)
+ continue;
+
+ rc = node ? cfs_cpt_set_node(cptab, cpt, i) :
+ cfs_cpt_set_cpu(cptab, cpt, i);
+ if (!rc) {
+ cfs_expr_list_free(el);
+ goto failed;
+ }
+ }
+ }
+
+ cfs_expr_list_free(el);
+
+ if (!cfs_cpt_online(cptab, cpt)) {
+ CERROR("No online CPU is found on partition %d\n", cpt);
+ goto failed;
+ }
+
+ str = strim(bracket + 1);
+ }
+
+ return cptab;
+
+ failed:
+ cfs_cpt_table_free(cptab);
+ return NULL;
+}
+
+#ifdef CONFIG_HOTPLUG_CPU
+static enum cpuhp_state lustre_cpu_online;
+
+static void cfs_cpu_incr_cpt_version(void)
+{
+ spin_lock(&cpt_data.cpt_lock);
+ cpt_data.cpt_version++;
+ spin_unlock(&cpt_data.cpt_lock);
+}
+
+static int cfs_cpu_online(unsigned int cpu)
+{
+ cfs_cpu_incr_cpt_version();
+ return 0;
+}
+
+static int cfs_cpu_dead(unsigned int cpu)
+{
+ bool warn;
+
+ cfs_cpu_incr_cpt_version();
+
+ mutex_lock(&cpt_data.cpt_mutex);
+ /* if all HTs in a core are offline, it may break affinity */
+ cpumask_copy(cpt_data.cpt_cpumask, topology_sibling_cpumask(cpu));
+ warn = cpumask_any_and(cpt_data.cpt_cpumask,
+ cpu_online_mask) >= nr_cpu_ids;
+ mutex_unlock(&cpt_data.cpt_mutex);
+ CDEBUG(warn ? D_WARNING : D_INFO,
+ "Lustre: can't support CPU plug-out well now, performance and stability could be impacted [CPU %u]\n",
+ cpu);
+ return 0;
+}
+#endif
+
+void
+cfs_cpu_fini(void)
+{
+ if (cfs_cpt_table)
+ cfs_cpt_table_free(cfs_cpt_table);
+
+#ifdef CONFIG_HOTPLUG_CPU
+ if (lustre_cpu_online > 0)
+ cpuhp_remove_state_nocalls(lustre_cpu_online);
+ cpuhp_remove_state_nocalls(CPUHP_LUSTRE_CFS_DEAD);
+#endif
+ free_cpumask_var(cpt_data.cpt_cpumask);
+}
+
+int
+cfs_cpu_init(void)
+{
+ int ret = 0;
+
+ LASSERT(!cfs_cpt_table);
+
+ memset(&cpt_data, 0, sizeof(cpt_data));
+
+ if (!zalloc_cpumask_var(&cpt_data.cpt_cpumask, GFP_NOFS)) {
+ CERROR("Failed to allocate scratch buffer\n");
+ return -1;
+ }
+
+ spin_lock_init(&cpt_data.cpt_lock);
+ mutex_init(&cpt_data.cpt_mutex);
+
+#ifdef CONFIG_HOTPLUG_CPU
+ ret = cpuhp_setup_state_nocalls(CPUHP_LUSTRE_CFS_DEAD,
+ "staging/lustre/cfe:dead", NULL,
+ cfs_cpu_dead);
+ if (ret < 0)
+ goto failed;
+ ret = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN,
+ "staging/lustre/cfe:online",
+ cfs_cpu_online, NULL);
+ if (ret < 0)
+ goto failed;
+ lustre_cpu_online = ret;
+#endif
+ ret = -EINVAL;
+
+ if (*cpu_pattern) {
+ char *cpu_pattern_dup = kstrdup(cpu_pattern, GFP_KERNEL);
+
+ if (!cpu_pattern_dup) {
+ CERROR("Failed to duplicate cpu_pattern\n");
+ goto failed;
+ }
+
+ cfs_cpt_table = cfs_cpt_table_create_pattern(cpu_pattern_dup);
+ kfree(cpu_pattern_dup);
+ if (!cfs_cpt_table) {
+ CERROR("Failed to create cptab from pattern %s\n",
+ cpu_pattern);
+ goto failed;
+ }
+
+ } else {
+ cfs_cpt_table = cfs_cpt_table_create(cpu_npartitions);
+ if (!cfs_cpt_table) {
+ CERROR("Failed to create ptable with npartitions %d\n",
+ cpu_npartitions);
+ goto failed;
+ }
+ }
+
+ spin_lock(&cpt_data.cpt_lock);
+ if (cfs_cpt_table->ctb_version != cpt_data.cpt_version) {
+ spin_unlock(&cpt_data.cpt_lock);
+ CERROR("CPU hotplug/unplug during setup\n");
+ goto failed;
+ }
+ spin_unlock(&cpt_data.cpt_lock);
+
+ LCONSOLE(0, "HW nodes: %d, HW CPU cores: %d, npartitions: %d\n",
+ num_online_nodes(), num_online_cpus(),
+ cfs_cpt_number(cfs_cpt_table));
+ return 0;
+
+ failed:
+ cfs_cpu_fini();
+ return ret;
+}
+
+#else /* ! CONFIG_SMP */
+
void
cfs_cpu_fini(void)
{
diff --git a/drivers/staging/lustre/lnet/libcfs/linux/linux-cpu.c b/drivers/staging/lustre/lnet/libcfs/linux/linux-cpu.c
deleted file mode 100644
index 388521e4e354..000000000000
--- a/drivers/staging/lustre/lnet/libcfs/linux/linux-cpu.c
+++ /dev/null
@@ -1,1079 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-/*
- * GPL HEADER START
- *
- * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 only,
- * as published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- * General Public License version 2 for more details (a copy is included
- * in the LICENSE file that accompanied this code).
- *
- * GPL HEADER END
- */
-/*
- * Copyright (c) 2010, Oracle and/or its affiliates. All rights reserved.
- *
- * Copyright (c) 2012, 2015 Intel Corporation.
- */
-/*
- * This file is part of Lustre, http://www.lustre.org/
- * Lustre is a trademark of Sun Microsystems, Inc.
- *
- * Author: [email protected]
- */
-
-#define DEBUG_SUBSYSTEM S_LNET
-
-#include <linux/cpu.h>
-#include <linux/sched.h>
-#include <linux/libcfs/libcfs.h>
-
-#ifdef CONFIG_SMP
-
-/**
- * modparam for setting number of partitions
- *
- * 0 : estimate best value based on cores or NUMA nodes
- * 1 : disable multiple partitions
- * >1 : specify number of partitions
- */
-static int cpu_npartitions;
-module_param(cpu_npartitions, int, 0444);
-MODULE_PARM_DESC(cpu_npartitions, "# of CPU partitions");
-
-/**
- * modparam for setting CPU partitions patterns:
- *
- * i.e: "0[0,1,2,3] 1[4,5,6,7]", number before bracket is CPU partition ID,
- * number in bracket is processor ID (core or HT)
- *
- * i.e: "N 0[0,1] 1[2,3]" the first character 'N' means numbers in bracket
- * are NUMA node ID, number before bracket is CPU partition ID.
- *
- * i.e: "N", shortcut expression to create CPT from NUMA & CPU topology
- *
- * NB: If user specified cpu_pattern, cpu_npartitions will be ignored
- */
-static char *cpu_pattern = "N";
-module_param(cpu_pattern, charp, 0444);
-MODULE_PARM_DESC(cpu_pattern, "CPU partitions pattern");
-
-struct cfs_cpt_data {
- /* serialize hotplug etc */
- spinlock_t cpt_lock;
- /* reserved for hotplug */
- unsigned long cpt_version;
- /* mutex to protect cpt_cpumask */
- struct mutex cpt_mutex;
- /* scratch buffer for set/unset_node */
- cpumask_var_t cpt_cpumask;
-};
-
-static struct cfs_cpt_data cpt_data;
-
-static void
-cfs_node_to_cpumask(int node, cpumask_t *mask)
-{
- const cpumask_t *tmp = cpumask_of_node(node);
-
- if (tmp)
- cpumask_copy(mask, tmp);
- else
- cpumask_clear(mask);
-}
-
-void
-cfs_cpt_table_free(struct cfs_cpt_table *cptab)
-{
- int i;
-
- kvfree(cptab->ctb_cpu2cpt);
-
- for (i = 0; cptab->ctb_parts && i < cptab->ctb_nparts; i++) {
- struct cfs_cpu_partition *part = &cptab->ctb_parts[i];
-
- kfree(part->cpt_nodemask);
- free_cpumask_var(part->cpt_cpumask);
- }
-
- kvfree(cptab->ctb_parts);
-
- kfree(cptab->ctb_nodemask);
- free_cpumask_var(cptab->ctb_cpumask);
-
- kfree(cptab);
-}
-EXPORT_SYMBOL(cfs_cpt_table_free);
-
-struct cfs_cpt_table *
-cfs_cpt_table_alloc(unsigned int ncpt)
-{
- struct cfs_cpt_table *cptab;
- int i;
-
- cptab = kzalloc(sizeof(*cptab), GFP_NOFS);
- if (!cptab)
- return NULL;
-
- cptab->ctb_nparts = ncpt;
-
- cptab->ctb_nodemask = kzalloc(sizeof(*cptab->ctb_nodemask),
- GFP_NOFS);
- if (!zalloc_cpumask_var(&cptab->ctb_cpumask, GFP_NOFS) ||
- !cptab->ctb_nodemask)
- goto failed;
-
- cptab->ctb_cpu2cpt = kvmalloc_array(num_possible_cpus(),
- sizeof(cptab->ctb_cpu2cpt[0]),
- GFP_KERNEL);
- if (!cptab->ctb_cpu2cpt)
- goto failed;
-
- memset(cptab->ctb_cpu2cpt, -1,
- num_possible_cpus() * sizeof(cptab->ctb_cpu2cpt[0]));
-
- cptab->ctb_parts = kvmalloc_array(ncpt, sizeof(cptab->ctb_parts[0]),
- GFP_KERNEL);
- if (!cptab->ctb_parts)
- goto failed;
-
- for (i = 0; i < ncpt; i++) {
- struct cfs_cpu_partition *part = &cptab->ctb_parts[i];
-
- part->cpt_nodemask = kzalloc(sizeof(*part->cpt_nodemask),
- GFP_NOFS);
- if (!zalloc_cpumask_var(&part->cpt_cpumask, GFP_NOFS) ||
- !part->cpt_nodemask)
- goto failed;
- }
-
- spin_lock(&cpt_data.cpt_lock);
- /* Reserved for hotplug */
- cptab->ctb_version = cpt_data.cpt_version;
- spin_unlock(&cpt_data.cpt_lock);
-
- return cptab;
-
- failed:
- cfs_cpt_table_free(cptab);
- return NULL;
-}
-EXPORT_SYMBOL(cfs_cpt_table_alloc);
-
-int
-cfs_cpt_table_print(struct cfs_cpt_table *cptab, char *buf, int len)
-{
- char *tmp = buf;
- int rc = 0;
- int i;
- int j;
-
- for (i = 0; i < cptab->ctb_nparts; i++) {
- if (len > 0) {
- rc = snprintf(tmp, len, "%d\t: ", i);
- len -= rc;
- }
-
- if (len <= 0) {
- rc = -EFBIG;
- goto out;
- }
-
- tmp += rc;
- for_each_cpu(j, cptab->ctb_parts[i].cpt_cpumask) {
- rc = snprintf(tmp, len, "%d ", j);
- len -= rc;
- if (len <= 0) {
- rc = -EFBIG;
- goto out;
- }
- tmp += rc;
- }
-
- *tmp = '\n';
- tmp++;
- len--;
- }
-
- out:
- if (rc < 0)
- return rc;
-
- return tmp - buf;
-}
-EXPORT_SYMBOL(cfs_cpt_table_print);
-
-int
-cfs_cpt_number(struct cfs_cpt_table *cptab)
-{
- return cptab->ctb_nparts;
-}
-EXPORT_SYMBOL(cfs_cpt_number);
-
-int
-cfs_cpt_weight(struct cfs_cpt_table *cptab, int cpt)
-{
- LASSERT(cpt == CFS_CPT_ANY || (cpt >= 0 && cpt < cptab->ctb_nparts));
-
- return cpt == CFS_CPT_ANY ?
- cpumask_weight(cptab->ctb_cpumask) :
- cpumask_weight(cptab->ctb_parts[cpt].cpt_cpumask);
-}
-EXPORT_SYMBOL(cfs_cpt_weight);
-
-int
-cfs_cpt_online(struct cfs_cpt_table *cptab, int cpt)
-{
- LASSERT(cpt == CFS_CPT_ANY || (cpt >= 0 && cpt < cptab->ctb_nparts));
-
- return cpt == CFS_CPT_ANY ?
- cpumask_any_and(cptab->ctb_cpumask,
- cpu_online_mask) < nr_cpu_ids :
- cpumask_any_and(cptab->ctb_parts[cpt].cpt_cpumask,
- cpu_online_mask) < nr_cpu_ids;
-}
-EXPORT_SYMBOL(cfs_cpt_online);
-
-cpumask_var_t *
-cfs_cpt_cpumask(struct cfs_cpt_table *cptab, int cpt)
-{
- LASSERT(cpt == CFS_CPT_ANY || (cpt >= 0 && cpt < cptab->ctb_nparts));
-
- return cpt == CFS_CPT_ANY ?
- &cptab->ctb_cpumask : &cptab->ctb_parts[cpt].cpt_cpumask;
-}
-EXPORT_SYMBOL(cfs_cpt_cpumask);
-
-nodemask_t *
-cfs_cpt_nodemask(struct cfs_cpt_table *cptab, int cpt)
-{
- LASSERT(cpt == CFS_CPT_ANY || (cpt >= 0 && cpt < cptab->ctb_nparts));
-
- return cpt == CFS_CPT_ANY ?
- cptab->ctb_nodemask : cptab->ctb_parts[cpt].cpt_nodemask;
-}
-EXPORT_SYMBOL(cfs_cpt_nodemask);
-
-int
-cfs_cpt_set_cpu(struct cfs_cpt_table *cptab, int cpt, int cpu)
-{
- int node;
-
- LASSERT(cpt >= 0 && cpt < cptab->ctb_nparts);
-
- if (cpu < 0 || cpu >= nr_cpu_ids || !cpu_online(cpu)) {
- CDEBUG(D_INFO, "CPU %d is invalid or it's offline\n", cpu);
- return 0;
- }
-
- if (cptab->ctb_cpu2cpt[cpu] != -1) {
- CDEBUG(D_INFO, "CPU %d is already in partition %d\n",
- cpu, cptab->ctb_cpu2cpt[cpu]);
- return 0;
- }
-
- cptab->ctb_cpu2cpt[cpu] = cpt;
-
- LASSERT(!cpumask_test_cpu(cpu, cptab->ctb_cpumask));
- LASSERT(!cpumask_test_cpu(cpu, cptab->ctb_parts[cpt].cpt_cpumask));
-
- cpumask_set_cpu(cpu, cptab->ctb_cpumask);
- cpumask_set_cpu(cpu, cptab->ctb_parts[cpt].cpt_cpumask);
-
- node = cpu_to_node(cpu);
-
- /* first CPU of @node in this CPT table */
- if (!node_isset(node, *cptab->ctb_nodemask))
- node_set(node, *cptab->ctb_nodemask);
-
- /* first CPU of @node in this partition */
- if (!node_isset(node, *cptab->ctb_parts[cpt].cpt_nodemask))
- node_set(node, *cptab->ctb_parts[cpt].cpt_nodemask);
-
- return 1;
-}
-EXPORT_SYMBOL(cfs_cpt_set_cpu);
-
-void
-cfs_cpt_unset_cpu(struct cfs_cpt_table *cptab, int cpt, int cpu)
-{
- int node;
- int i;
-
- LASSERT(cpt == CFS_CPT_ANY || (cpt >= 0 && cpt < cptab->ctb_nparts));
-
- if (cpu < 0 || cpu >= nr_cpu_ids) {
- CDEBUG(D_INFO, "Invalid CPU id %d\n", cpu);
- return;
- }
-
- if (cpt == CFS_CPT_ANY) {
- /* caller doesn't know the partition ID */
- cpt = cptab->ctb_cpu2cpt[cpu];
- if (cpt < 0) { /* not set in this CPT-table */
- CDEBUG(D_INFO, "Try to unset cpu %d which is not in CPT-table %p\n",
- cpt, cptab);
- return;
- }
-
- } else if (cpt != cptab->ctb_cpu2cpt[cpu]) {
- CDEBUG(D_INFO,
- "CPU %d is not in cpu-partition %d\n", cpu, cpt);
- return;
- }
-
- LASSERT(cpumask_test_cpu(cpu, cptab->ctb_parts[cpt].cpt_cpumask));
- LASSERT(cpumask_test_cpu(cpu, cptab->ctb_cpumask));
-
- cpumask_clear_cpu(cpu, cptab->ctb_parts[cpt].cpt_cpumask);
- cpumask_clear_cpu(cpu, cptab->ctb_cpumask);
- cptab->ctb_cpu2cpt[cpu] = -1;
-
- node = cpu_to_node(cpu);
-
- LASSERT(node_isset(node, *cptab->ctb_parts[cpt].cpt_nodemask));
- LASSERT(node_isset(node, *cptab->ctb_nodemask));
-
- for_each_cpu(i, cptab->ctb_parts[cpt].cpt_cpumask) {
- /* this CPT has other CPU belonging to this node? */
- if (cpu_to_node(i) == node)
- break;
- }
-
- if (i >= nr_cpu_ids)
- node_clear(node, *cptab->ctb_parts[cpt].cpt_nodemask);
-
- for_each_cpu(i, cptab->ctb_cpumask) {
- /* this CPT-table has other CPU belonging to this node? */
- if (cpu_to_node(i) == node)
- break;
- }
-
- if (i >= nr_cpu_ids)
- node_clear(node, *cptab->ctb_nodemask);
-}
-EXPORT_SYMBOL(cfs_cpt_unset_cpu);
-
-int
-cfs_cpt_set_cpumask(struct cfs_cpt_table *cptab, int cpt, cpumask_t *mask)
-{
- int i;
-
- if (!cpumask_weight(mask) ||
- cpumask_any_and(mask, cpu_online_mask) >= nr_cpu_ids) {
- CDEBUG(D_INFO, "No online CPU is found in the CPU mask for CPU partition %d\n",
- cpt);
- return 0;
- }
-
- for_each_cpu(i, mask) {
- if (!cfs_cpt_set_cpu(cptab, cpt, i))
- return 0;
- }
-
- return 1;
-}
-EXPORT_SYMBOL(cfs_cpt_set_cpumask);
-
-void
-cfs_cpt_unset_cpumask(struct cfs_cpt_table *cptab, int cpt, cpumask_t *mask)
-{
- int i;
-
- for_each_cpu(i, mask)
- cfs_cpt_unset_cpu(cptab, cpt, i);
-}
-EXPORT_SYMBOL(cfs_cpt_unset_cpumask);
-
-int
-cfs_cpt_set_node(struct cfs_cpt_table *cptab, int cpt, int node)
-{
- int rc;
-
- if (node < 0 || node >= MAX_NUMNODES) {
- CDEBUG(D_INFO,
- "Invalid NUMA id %d for CPU partition %d\n", node, cpt);
- return 0;
- }
-
- mutex_lock(&cpt_data.cpt_mutex);
-
- cfs_node_to_cpumask(node, cpt_data.cpt_cpumask);
-
- rc = cfs_cpt_set_cpumask(cptab, cpt, cpt_data.cpt_cpumask);
-
- mutex_unlock(&cpt_data.cpt_mutex);
-
- return rc;
-}
-EXPORT_SYMBOL(cfs_cpt_set_node);
-
-void
-cfs_cpt_unset_node(struct cfs_cpt_table *cptab, int cpt, int node)
-{
- if (node < 0 || node >= MAX_NUMNODES) {
- CDEBUG(D_INFO,
- "Invalid NUMA id %d for CPU partition %d\n", node, cpt);
- return;
- }
-
- mutex_lock(&cpt_data.cpt_mutex);
-
- cfs_node_to_cpumask(node, cpt_data.cpt_cpumask);
-
- cfs_cpt_unset_cpumask(cptab, cpt, cpt_data.cpt_cpumask);
-
- mutex_unlock(&cpt_data.cpt_mutex);
-}
-EXPORT_SYMBOL(cfs_cpt_unset_node);
-
-int
-cfs_cpt_set_nodemask(struct cfs_cpt_table *cptab, int cpt, nodemask_t *mask)
-{
- int i;
-
- for_each_node_mask(i, *mask) {
- if (!cfs_cpt_set_node(cptab, cpt, i))
- return 0;
- }
-
- return 1;
-}
-EXPORT_SYMBOL(cfs_cpt_set_nodemask);
-
-void
-cfs_cpt_unset_nodemask(struct cfs_cpt_table *cptab, int cpt, nodemask_t *mask)
-{
- int i;
-
- for_each_node_mask(i, *mask)
- cfs_cpt_unset_node(cptab, cpt, i);
-}
-EXPORT_SYMBOL(cfs_cpt_unset_nodemask);
-
-void
-cfs_cpt_clear(struct cfs_cpt_table *cptab, int cpt)
-{
- int last;
- int i;
-
- if (cpt == CFS_CPT_ANY) {
- last = cptab->ctb_nparts - 1;
- cpt = 0;
- } else {
- last = cpt;
- }
-
- for (; cpt <= last; cpt++) {
- for_each_cpu(i, cptab->ctb_parts[cpt].cpt_cpumask)
- cfs_cpt_unset_cpu(cptab, cpt, i);
- }
-}
-EXPORT_SYMBOL(cfs_cpt_clear);
-
-int
-cfs_cpt_spread_node(struct cfs_cpt_table *cptab, int cpt)
-{
- nodemask_t *mask;
- int weight;
- int rotor;
- int node;
-
- /* convert CPU partition ID to HW node id */
-
- if (cpt < 0 || cpt >= cptab->ctb_nparts) {
- mask = cptab->ctb_nodemask;
- rotor = cptab->ctb_spread_rotor++;
- } else {
- mask = cptab->ctb_parts[cpt].cpt_nodemask;
- rotor = cptab->ctb_parts[cpt].cpt_spread_rotor++;
- }
-
- weight = nodes_weight(*mask);
- LASSERT(weight > 0);
-
- rotor %= weight;
-
- for_each_node_mask(node, *mask) {
- if (!rotor--)
- return node;
- }
-
- LBUG();
- return 0;
-}
-EXPORT_SYMBOL(cfs_cpt_spread_node);
-
-int
-cfs_cpt_current(struct cfs_cpt_table *cptab, int remap)
-{
- int cpu;
- int cpt;
-
- preempt_disable();
- cpu = smp_processor_id();
- cpt = cptab->ctb_cpu2cpt[cpu];
-
- if (cpt < 0 && remap) {
- /* don't return negative value for safety of upper layer,
- * instead we shadow the unknown cpu to a valid partition ID
- */
- cpt = cpu % cptab->ctb_nparts;
- }
- preempt_enable();
- return cpt;
-}
-EXPORT_SYMBOL(cfs_cpt_current);
-
-int
-cfs_cpt_of_cpu(struct cfs_cpt_table *cptab, int cpu)
-{
- LASSERT(cpu >= 0 && cpu < nr_cpu_ids);
-
- return cptab->ctb_cpu2cpt[cpu];
-}
-EXPORT_SYMBOL(cfs_cpt_of_cpu);
-
-int
-cfs_cpt_bind(struct cfs_cpt_table *cptab, int cpt)
-{
- cpumask_var_t *cpumask;
- nodemask_t *nodemask;
- int rc;
- int i;
-
- LASSERT(cpt == CFS_CPT_ANY || (cpt >= 0 && cpt < cptab->ctb_nparts));
-
- if (cpt == CFS_CPT_ANY) {
- cpumask = &cptab->ctb_cpumask;
- nodemask = cptab->ctb_nodemask;
- } else {
- cpumask = &cptab->ctb_parts[cpt].cpt_cpumask;
- nodemask = cptab->ctb_parts[cpt].cpt_nodemask;
- }
-
- if (cpumask_any_and(*cpumask, cpu_online_mask) >= nr_cpu_ids) {
- CERROR("No online CPU found in CPU partition %d, did someone do CPU hotplug on system? You might need to reload Lustre modules to keep system working well.\n",
- cpt);
- return -EINVAL;
- }
-
- for_each_online_cpu(i) {
- if (cpumask_test_cpu(i, *cpumask))
- continue;
-
- rc = set_cpus_allowed_ptr(current, *cpumask);
- set_mems_allowed(*nodemask);
- if (!rc)
- schedule(); /* switch to allowed CPU */
-
- return rc;
- }
-
- /* don't need to set affinity because all online CPUs are covered */
- return 0;
-}
-EXPORT_SYMBOL(cfs_cpt_bind);
-
-/**
- * Choose max to \a number CPUs from \a node and set them in \a cpt.
- * We always prefer to choose CPU in the same core/socket.
- */
-static int
-cfs_cpt_choose_ncpus(struct cfs_cpt_table *cptab, int cpt,
- cpumask_t *node, int number)
-{
- cpumask_var_t socket;
- cpumask_var_t core;
- int rc = 0;
- int cpu;
-
- LASSERT(number > 0);
-
- if (number >= cpumask_weight(node)) {
- while (!cpumask_empty(node)) {
- cpu = cpumask_first(node);
-
- rc = cfs_cpt_set_cpu(cptab, cpt, cpu);
- if (!rc)
- return -EINVAL;
- cpumask_clear_cpu(cpu, node);
- }
- return 0;
- }
-
- /*
- * Allocate scratch buffers
- * As we cannot initialize a cpumask_var_t, we need
- * to alloc both before we can risk trying to free either
- */
- if (!zalloc_cpumask_var(&socket, GFP_NOFS))
- rc = -ENOMEM;
- if (!zalloc_cpumask_var(&core, GFP_NOFS))
- rc = -ENOMEM;
- if (rc)
- goto out;
-
- while (!cpumask_empty(node)) {
- cpu = cpumask_first(node);
-
- /* get cpumask for cores in the same socket */
- cpumask_copy(socket, topology_core_cpumask(cpu));
- cpumask_and(socket, socket, node);
-
- LASSERT(!cpumask_empty(socket));
-
- while (!cpumask_empty(socket)) {
- int i;
-
- /* get cpumask for hts in the same core */
- cpumask_copy(core, topology_sibling_cpumask(cpu));
- cpumask_and(core, core, node);
-
- LASSERT(!cpumask_empty(core));
-
- for_each_cpu(i, core) {
- cpumask_clear_cpu(i, socket);
- cpumask_clear_cpu(i, node);
-
- rc = cfs_cpt_set_cpu(cptab, cpt, i);
- if (!rc) {
- rc = -EINVAL;
- goto out;
- }
-
- if (!--number)
- goto out;
- }
- cpu = cpumask_first(socket);
- }
- }
-
-out:
- free_cpumask_var(socket);
- free_cpumask_var(core);
- return rc;
-}
-
-#define CPT_WEIGHT_MIN 4u
-
-static unsigned int
-cfs_cpt_num_estimate(void)
-{
- unsigned int nnode = num_online_nodes();
- unsigned int ncpu = num_online_cpus();
- unsigned int ncpt;
-
- if (ncpu <= CPT_WEIGHT_MIN) {
- ncpt = 1;
- goto out;
- }
-
- /* generate reasonable number of CPU partitions based on total number
- * of CPUs, Preferred N should be power2 and match this condition:
- * 2 * (N - 1)^2 < NCPUS <= 2 * N^2
- */
- for (ncpt = 2; ncpu > 2 * ncpt * ncpt; ncpt <<= 1)
- ;
-
- if (ncpt <= nnode) { /* fat numa system */
- while (nnode > ncpt)
- nnode >>= 1;
-
- } else { /* ncpt > nnode */
- while ((nnode << 1) <= ncpt)
- nnode <<= 1;
- }
-
- ncpt = nnode;
-
-out:
-#if (BITS_PER_LONG == 32)
- /* config many CPU partitions on 32-bit system could consume
- * too much memory
- */
- ncpt = min(2U, ncpt);
-#endif
- while (ncpu % ncpt)
- ncpt--; /* worst case is 1 */
-
- return ncpt;
-}
-
-static struct cfs_cpt_table *
-cfs_cpt_table_create(int ncpt)
-{
- struct cfs_cpt_table *cptab = NULL;
- cpumask_var_t mask;
- int cpt = 0;
- int num;
- int rc;
- int i;
-
- rc = cfs_cpt_num_estimate();
- if (ncpt <= 0)
- ncpt = rc;
-
- if (ncpt > num_online_cpus() || ncpt > 4 * rc) {
- CWARN("CPU partition number %d is larger than suggested value (%d), your system may have performance issue or run out of memory while under pressure\n",
- ncpt, rc);
- }
-
- if (num_online_cpus() % ncpt) {
- CERROR("CPU number %d is not multiple of cpu_npartition %d, please try different cpu_npartitions value or set pattern string by cpu_pattern=STRING\n",
- (int)num_online_cpus(), ncpt);
- goto failed;
- }
-
- cptab = cfs_cpt_table_alloc(ncpt);
- if (!cptab) {
- CERROR("Failed to allocate CPU map(%d)\n", ncpt);
- goto failed;
- }
-
- num = num_online_cpus() / ncpt;
- if (!num) {
- CERROR("CPU changed while setting CPU partition\n");
- goto failed;
- }
-
- if (!zalloc_cpumask_var(&mask, GFP_NOFS)) {
- CERROR("Failed to allocate scratch cpumask\n");
- goto failed;
- }
-
- for_each_online_node(i) {
- cfs_node_to_cpumask(i, mask);
-
- while (!cpumask_empty(mask)) {
- struct cfs_cpu_partition *part;
- int n;
-
- /*
- * Each emulated NUMA node has all allowed CPUs in
- * the mask.
- * End loop when all partitions have assigned CPUs.
- */
- if (cpt == ncpt)
- break;
-
- part = &cptab->ctb_parts[cpt];
-
- n = num - cpumask_weight(part->cpt_cpumask);
- LASSERT(n > 0);
-
- rc = cfs_cpt_choose_ncpus(cptab, cpt, mask, n);
- if (rc < 0)
- goto failed_mask;
-
- LASSERT(num >= cpumask_weight(part->cpt_cpumask));
- if (num == cpumask_weight(part->cpt_cpumask))
- cpt++;
- }
- }
-
- if (cpt != ncpt ||
- num != cpumask_weight(cptab->ctb_parts[ncpt - 1].cpt_cpumask)) {
- CERROR("Expect %d(%d) CPU partitions but got %d(%d), CPU hotplug/unplug while setting?\n",
- cptab->ctb_nparts, num, cpt,
- cpumask_weight(cptab->ctb_parts[ncpt - 1].cpt_cpumask));
- goto failed_mask;
- }
-
- free_cpumask_var(mask);
-
- return cptab;
-
- failed_mask:
- free_cpumask_var(mask);
- failed:
- CERROR("Failed to setup CPU-partition-table with %d CPU-partitions, online HW nodes: %d, HW cpus: %d.\n",
- ncpt, num_online_nodes(), num_online_cpus());
-
- if (cptab)
- cfs_cpt_table_free(cptab);
-
- return NULL;
-}
-
-static struct cfs_cpt_table *
-cfs_cpt_table_create_pattern(char *pattern)
-{
- struct cfs_cpt_table *cptab;
- char *str;
- int node = 0;
- int high;
- int ncpt = 0;
- int cpt;
- int rc;
- int c;
- int i;
-
- str = strim(pattern);
- if (*str == 'n' || *str == 'N') {
- pattern = str + 1;
- if (*pattern != '\0') {
- node = 1;
- } else { /* shortcut to create CPT from NUMA & CPU topology */
- node = -1;
- ncpt = num_online_nodes();
- }
- }
-
- if (!ncpt) { /* scanning bracket which is mark of partition */
- for (str = pattern;; str++, ncpt++) {
- str = strchr(str, '[');
- if (!str)
- break;
- }
- }
-
- if (!ncpt ||
- (node && ncpt > num_online_nodes()) ||
- (!node && ncpt > num_online_cpus())) {
- CERROR("Invalid pattern %s, or too many partitions %d\n",
- pattern, ncpt);
- return NULL;
- }
-
- cptab = cfs_cpt_table_alloc(ncpt);
- if (!cptab) {
- CERROR("Failed to allocate cpu partition table\n");
- return NULL;
- }
-
- if (node < 0) { /* shortcut to create CPT from NUMA & CPU topology */
- cpt = 0;
-
- for_each_online_node(i) {
- if (cpt >= ncpt) {
- CERROR("CPU changed while setting CPU partition table, %d/%d\n",
- cpt, ncpt);
- goto failed;
- }
-
- rc = cfs_cpt_set_node(cptab, cpt++, i);
- if (!rc)
- goto failed;
- }
- return cptab;
- }
-
- high = node ? MAX_NUMNODES - 1 : nr_cpu_ids - 1;
-
- for (str = strim(pattern), c = 0;; c++) {
- struct cfs_range_expr *range;
- struct cfs_expr_list *el;
- char *bracket = strchr(str, '[');
- int n;
-
- if (!bracket) {
- if (*str) {
- CERROR("Invalid pattern %s\n", str);
- goto failed;
- }
- if (c != ncpt) {
- CERROR("expect %d partitions but found %d\n",
- ncpt, c);
- goto failed;
- }
- break;
- }
-
- if (sscanf(str, "%d%n", &cpt, &n) < 1) {
- CERROR("Invalid cpu pattern %s\n", str);
- goto failed;
- }
-
- if (cpt < 0 || cpt >= ncpt) {
- CERROR("Invalid partition id %d, total partitions %d\n",
- cpt, ncpt);
- goto failed;
- }
-
- if (cfs_cpt_weight(cptab, cpt)) {
- CERROR("Partition %d has already been set.\n", cpt);
- goto failed;
- }
-
- str = strim(str + n);
- if (str != bracket) {
- CERROR("Invalid pattern %s\n", str);
- goto failed;
- }
-
- bracket = strchr(str, ']');
- if (!bracket) {
- CERROR("missing right bracket for cpt %d, %s\n",
- cpt, str);
- goto failed;
- }
-
- if (cfs_expr_list_parse(str, (bracket - str) + 1,
- 0, high, &el)) {
- CERROR("Can't parse number range: %s\n", str);
- goto failed;
- }
-
- list_for_each_entry(range, &el->el_exprs, re_link) {
- for (i = range->re_lo; i <= range->re_hi; i++) {
- if ((i - range->re_lo) % range->re_stride)
- continue;
-
- rc = node ? cfs_cpt_set_node(cptab, cpt, i) :
- cfs_cpt_set_cpu(cptab, cpt, i);
- if (!rc) {
- cfs_expr_list_free(el);
- goto failed;
- }
- }
- }
-
- cfs_expr_list_free(el);
-
- if (!cfs_cpt_online(cptab, cpt)) {
- CERROR("No online CPU is found on partition %d\n", cpt);
- goto failed;
- }
-
- str = strim(bracket + 1);
- }
-
- return cptab;
-
- failed:
- cfs_cpt_table_free(cptab);
- return NULL;
-}
-
-#ifdef CONFIG_HOTPLUG_CPU
-static enum cpuhp_state lustre_cpu_online;
-
-static void cfs_cpu_incr_cpt_version(void)
-{
- spin_lock(&cpt_data.cpt_lock);
- cpt_data.cpt_version++;
- spin_unlock(&cpt_data.cpt_lock);
-}
-
-static int cfs_cpu_online(unsigned int cpu)
-{
- cfs_cpu_incr_cpt_version();
- return 0;
-}
-
-static int cfs_cpu_dead(unsigned int cpu)
-{
- bool warn;
-
- cfs_cpu_incr_cpt_version();
-
- mutex_lock(&cpt_data.cpt_mutex);
- /* if all HTs in a core are offline, it may break affinity */
- cpumask_copy(cpt_data.cpt_cpumask, topology_sibling_cpumask(cpu));
- warn = cpumask_any_and(cpt_data.cpt_cpumask,
- cpu_online_mask) >= nr_cpu_ids;
- mutex_unlock(&cpt_data.cpt_mutex);
- CDEBUG(warn ? D_WARNING : D_INFO,
- "Lustre: can't support CPU plug-out well now, performance and stability could be impacted [CPU %u]\n",
- cpu);
- return 0;
-}
-#endif
-
-void
-cfs_cpu_fini(void)
-{
- if (cfs_cpt_table)
- cfs_cpt_table_free(cfs_cpt_table);
-
-#ifdef CONFIG_HOTPLUG_CPU
- if (lustre_cpu_online > 0)
- cpuhp_remove_state_nocalls(lustre_cpu_online);
- cpuhp_remove_state_nocalls(CPUHP_LUSTRE_CFS_DEAD);
-#endif
- free_cpumask_var(cpt_data.cpt_cpumask);
-}
-
-int
-cfs_cpu_init(void)
-{
- int ret = 0;
-
- LASSERT(!cfs_cpt_table);
-
- memset(&cpt_data, 0, sizeof(cpt_data));
-
- if (!zalloc_cpumask_var(&cpt_data.cpt_cpumask, GFP_NOFS)) {
- CERROR("Failed to allocate scratch buffer\n");
- return -1;
- }
-
- spin_lock_init(&cpt_data.cpt_lock);
- mutex_init(&cpt_data.cpt_mutex);
-
-#ifdef CONFIG_HOTPLUG_CPU
- ret = cpuhp_setup_state_nocalls(CPUHP_LUSTRE_CFS_DEAD,
- "staging/lustre/cfe:dead", NULL,
- cfs_cpu_dead);
- if (ret < 0)
- goto failed;
- ret = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN,
- "staging/lustre/cfe:online",
- cfs_cpu_online, NULL);
- if (ret < 0)
- goto failed;
- lustre_cpu_online = ret;
-#endif
- ret = -EINVAL;
-
- if (*cpu_pattern) {
- char *cpu_pattern_dup = kstrdup(cpu_pattern, GFP_KERNEL);
-
- if (!cpu_pattern_dup) {
- CERROR("Failed to duplicate cpu_pattern\n");
- goto failed;
- }
-
- cfs_cpt_table = cfs_cpt_table_create_pattern(cpu_pattern_dup);
- kfree(cpu_pattern_dup);
- if (!cfs_cpt_table) {
- CERROR("Failed to create cptab from pattern %s\n",
- cpu_pattern);
- goto failed;
- }
-
- } else {
- cfs_cpt_table = cfs_cpt_table_create(cpu_npartitions);
- if (!cfs_cpt_table) {
- CERROR("Failed to create ptable with npartitions %d\n",
- cpu_npartitions);
- goto failed;
- }
- }
-
- spin_lock(&cpt_data.cpt_lock);
- if (cfs_cpt_table->ctb_version != cpt_data.cpt_version) {
- spin_unlock(&cpt_data.cpt_lock);
- CERROR("CPU hotplug/unplug during setup\n");
- goto failed;
- }
- spin_unlock(&cpt_data.cpt_lock);
-
- LCONSOLE(0, "HW nodes: %d, HW CPU cores: %d, npartitions: %d\n",
- num_online_nodes(), num_online_cpus(),
- cfs_cpt_number(cfs_cpt_table));
- return 0;
-
- failed:
- cfs_cpu_fini();
- return ret;
-}
-
-#endif
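
For readers following the removed code: the partition-count heuristic in
cfs_cpt_num_estimate() above can be tried outside the kernel. What follows
is a minimal userspace sketch, not part of the patch. The function name
cpt_num_estimate and its two parameters are hypothetical stand-ins for
num_online_cpus() and num_online_nodes(), and the 32-bit clamp
(min(2U, ncpt)) is left out on the assumption of a 64-bit build.

```c
#include <assert.h>

#define CPT_WEIGHT_MIN 4u

/*
 * Userspace port of the heuristic in cfs_cpt_num_estimate().
 * ncpu and nnode replace num_online_cpus() and num_online_nodes();
 * both must be at least 1, as they are on a running kernel.
 */
unsigned int cpt_num_estimate(unsigned int ncpu, unsigned int nnode)
{
	unsigned int ncpt;

	assert(ncpu >= 1 && nnode >= 1);

	if (ncpu <= CPT_WEIGHT_MIN) {
		ncpt = 1;
	} else {
		/* smallest power of two N with ncpu <= 2 * N^2 */
		for (ncpt = 2; ncpu > 2 * ncpt * ncpt; ncpt <<= 1)
			;

		if (ncpt <= nnode) {
			/* fat NUMA system: halve the node count until it is <= ncpt */
			while (nnode > ncpt)
				nnode >>= 1;
		} else {
			/* few nodes: double the node count while it stays <= ncpt */
			while ((nnode << 1) <= ncpt)
				nnode <<= 1;
		}
		ncpt = nnode;
	}

	/* shrink until the CPUs divide evenly; worst case is 1 */
	while (ncpu % ncpt)
		ncpt--;

	return ncpt;
}
```

Under these assumptions, 16 CPUs on 2 nodes gives 4 partitions, while
14 CPUs on 1 node falls back to 2 because 14 is not a multiple of 4 or 3.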



2018-04-16 03:37:11

by James Simmons

Subject: Re: [PATCH 2/6] staging: lustre: remove libcfs/linux/libcfs.h


> This include file is only included in one place,
> and only contains a list of other include directives.
> So just move all those to the place where this file
> is included, and discard the file.
>
> One include directive uses a local name ("linux-cpu.h"), so
> that needs to be given a proper path.
>
> Probably many of these should be removed from here, and moved to
> just the files that need them.

Nak. Dumping all the extra headers from linux/libcfs.h into libcfs.h is
the wrong approach. Having one header, libcfs.h, as the only header
included in all lustre files is the wrong approach. I have been looking
to unroll that mess. I have a patch that I need to polish and can then
submit.

> Signed-off-by: NeilBrown <[email protected]>
> ---
> .../staging/lustre/include/linux/libcfs/libcfs.h | 43 ++++++++++
> .../lustre/include/linux/libcfs/linux/libcfs.h | 83 --------------------
> 2 files changed, 42 insertions(+), 84 deletions(-)
> delete mode 100644 drivers/staging/lustre/include/linux/libcfs/linux/libcfs.h
>
> diff --git a/drivers/staging/lustre/include/linux/libcfs/libcfs.h b/drivers/staging/lustre/include/linux/libcfs/libcfs.h
> index 62e46aa3c554..e59d107d6482 100644
> --- a/drivers/staging/lustre/include/linux/libcfs/libcfs.h
> +++ b/drivers/staging/lustre/include/linux/libcfs/libcfs.h
> @@ -38,7 +38,48 @@
> #include <linux/list.h>
>
> #include <uapi/linux/lnet/libcfs_ioctl.h>
> -#include <linux/libcfs/linux/libcfs.h>
> +#include <linux/bitops.h>
> +#include <linux/compiler.h>
> +#include <linux/ctype.h>
> +#include <linux/errno.h>
> +#include <linux/file.h>
> +#include <linux/fs.h>
> +#include <linux/highmem.h>
> +#include <linux/interrupt.h>
> +#include <linux/kallsyms.h>
> +#include <linux/kernel.h>
> +#include <linux/kmod.h>
> +#include <linux/kthread.h>
> +#include <linux/mm.h>
> +#include <linux/mm_inline.h>
> +#include <linux/module.h>
> +#include <linux/moduleparam.h>
> +#include <linux/mutex.h>
> +#include <linux/notifier.h>
> +#include <linux/pagemap.h>
> +#include <linux/random.h>
> +#include <linux/rbtree.h>
> +#include <linux/rwsem.h>
> +#include <linux/scatterlist.h>
> +#include <linux/sched.h>
> +#include <linux/signal.h>
> +#include <linux/slab.h>
> +#include <linux/smp.h>
> +#include <linux/stat.h>
> +#include <linux/string.h>
> +#include <linux/time.h>
> +#include <linux/timer.h>
> +#include <linux/types.h>
> +#include <linux/unistd.h>
> +#include <linux/vmalloc.h>
> +#include <net/sock.h>
> +#include <linux/atomic.h>
> +#include <asm/div64.h>
> +#include <linux/timex.h>
> +#include <linux/uaccess.h>
> +#include <stdarg.h>
> +#include <linux/libcfs/linux/linux-cpu.h>
> +
> #include <linux/libcfs/libcfs_debug.h>
> #include <linux/libcfs/libcfs_private.h>
> #include <linux/libcfs/libcfs_cpu.h>
> diff --git a/drivers/staging/lustre/include/linux/libcfs/linux/libcfs.h b/drivers/staging/lustre/include/linux/libcfs/linux/libcfs.h
> deleted file mode 100644
> index 83aec9c7698f..000000000000
> --- a/drivers/staging/lustre/include/linux/libcfs/linux/libcfs.h
> +++ /dev/null
> @@ -1,83 +0,0 @@
> -// SPDX-License-Identifier: GPL-2.0
> -/*
> - * GPL HEADER START
> - *
> - * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
> - *
> - * This program is free software; you can redistribute it and/or modify
> - * it under the terms of the GNU General Public License version 2 only,
> - * as published by the Free Software Foundation.
> - *
> - * This program is distributed in the hope that it will be useful, but
> - * WITHOUT ANY WARRANTY; without even the implied warranty of
> - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> - * General Public License version 2 for more details (a copy is included
> - * in the LICENSE file that accompanied this code).
> - *
> - * You should have received a copy of the GNU General Public License
> - * version 2 along with this program; If not, see
> - * http://www.gnu.org/licenses/gpl-2.0.html
> - *
> - * GPL HEADER END
> - */
> -/*
> - * Copyright (c) 2008, 2010, Oracle and/or its affiliates. All rights reserved.
> - * Use is subject to license terms.
> - *
> - * Copyright (c) 2012, Intel Corporation.
> - */
> -/*
> - * This file is part of Lustre, http://www.lustre.org/
> - * Lustre is a trademark of Sun Microsystems, Inc.
> - */
> -
> -#ifndef __LIBCFS_LINUX_LIBCFS_H__
> -#define __LIBCFS_LINUX_LIBCFS_H__
> -
> -#ifndef __LIBCFS_LIBCFS_H__
> -#error Do not #include this file directly. #include <linux/libcfs/libcfs.h> instead
> -#endif
> -
> -#include <linux/bitops.h>
> -#include <linux/compiler.h>
> -#include <linux/ctype.h>
> -#include <linux/errno.h>
> -#include <linux/file.h>
> -#include <linux/fs.h>
> -#include <linux/highmem.h>
> -#include <linux/interrupt.h>
> -#include <linux/kallsyms.h>
> -#include <linux/kernel.h>
> -#include <linux/kmod.h>
> -#include <linux/kthread.h>
> -#include <linux/mm.h>
> -#include <linux/mm_inline.h>
> -#include <linux/module.h>
> -#include <linux/moduleparam.h>
> -#include <linux/mutex.h>
> -#include <linux/notifier.h>
> -#include <linux/pagemap.h>
> -#include <linux/random.h>
> -#include <linux/rbtree.h>
> -#include <linux/rwsem.h>
> -#include <linux/scatterlist.h>
> -#include <linux/sched.h>
> -#include <linux/signal.h>
> -#include <linux/slab.h>
> -#include <linux/smp.h>
> -#include <linux/stat.h>
> -#include <linux/string.h>
> -#include <linux/time.h>
> -#include <linux/timer.h>
> -#include <linux/types.h>
> -#include <linux/unistd.h>
> -#include <linux/vmalloc.h>
> -#include <net/sock.h>
> -#include <linux/atomic.h>
> -#include <asm/div64.h>
> -#include <linux/timex.h>
> -#include <linux/uaccess.h>
> -#include <stdarg.h>
> -#include "linux-cpu.h"
> -
> -#endif /* _LINUX_LIBCFS_H */
>
>
>

2018-04-16 03:52:05

by James Simmons

Subject: Re: [PATCH 1/6] staging: lustre: move stack-check macros to libcfs_debug.h


> CDEBUG_STACK() and CHECK_STACK() are macros to help with
> debugging, so move them from
> drivers/staging/lustre/include/linux/libcfs/linux/libcfs.h
> to
> drivers/staging/lustre/include/linux/libcfs/libcfs_debug.h
>
> This seems a more fitting location, and is a step towards
> removing linux/libcfs.h and simplifying the include file structure.

Nak. Currently the lustre client always enables debugging, but that
shouldn't be the case. What we do need is the ability to turn off the
crazy debugging stuff. In the development branch of lustre it is
done with CDEBUG_ENABLED. We need something like that in Kconfig,
much like we have CONFIG_LUSTRE_DEBUG_EXPENSIVE_CHECK. Since we would
like to be able to turn that off, this should be moved to just after
LIBCFS_DEBUG_MSG_DATA_DECL. Then everything from CHECK_STACK down to
CWARN() can be built out. When CDEBUG_ENABLED is disabled, CDEBUG_LIMIT
would be empty.
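
One way to picture the switch described above is the following userspace
sketch. It is only an illustration under assumptions: the DEMO_ names are
hypothetical, the CDEBUG_ENABLED value is hard-wired here instead of coming
from Kconfig, and none of this is the actual libcfs code.

```c
#include <stdio.h>

/* Hypothetical build switch; in the kernel it would come from Kconfig. */
#ifndef CDEBUG_ENABLED
#define CDEBUG_ENABLED 1
#endif

/* Counts how many debug messages were actually emitted. */
unsigned int demo_debug_calls;

#if CDEBUG_ENABLED
static void demo_debug_msg(const char *msg)
{
	demo_debug_calls++;
	fprintf(stderr, "debug: %s\n", msg);
}
#define DEMO_CDEBUG(msg) do { demo_debug_msg(msg); } while (0)
#else
/* Debugging built out: the macro expands to an empty statement. */
#define DEMO_CDEBUG(msg) do { } while (0)
#endif

/* Hypothetical caller: the same source compiles with or without debugging. */
int demo_work(int x)
{
	DEMO_CDEBUG("demo_work called");
	return x + 1;
}
```

Building with -DCDEBUG_ENABLED=0 leaves demo_work() unchanged at the source
level while the message call disappears from the object code.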

> Signed-off-by: NeilBrown <[email protected]>
> ---
> .../lustre/include/linux/libcfs/libcfs_debug.h | 32 ++++++++++++++++++++
> .../lustre/include/linux/libcfs/linux/libcfs.h | 31 -------------------
> 2 files changed, 32 insertions(+), 31 deletions(-)
>
> diff --git a/drivers/staging/lustre/include/linux/libcfs/libcfs_debug.h b/drivers/staging/lustre/include/linux/libcfs/libcfs_debug.h
> index 9290a19429e7..0dc7b91efe7c 100644
> --- a/drivers/staging/lustre/include/linux/libcfs/libcfs_debug.h
> +++ b/drivers/staging/lustre/include/linux/libcfs/libcfs_debug.h
> @@ -62,6 +62,38 @@ int libcfs_debug_str2mask(int *mask, const char *str, int is_subsys);
> extern unsigned int libcfs_catastrophe;
> extern unsigned int libcfs_panic_on_lbug;
>
> +/* Enable debug-checks on stack size - except on x86_64 */
> +#if !defined(__x86_64__)
> +# ifdef __ia64__
> +# define CDEBUG_STACK() (THREAD_SIZE - \
> + ((unsigned long)__builtin_dwarf_cfa() & \
> + (THREAD_SIZE - 1)))
> +# else
> +# define CDEBUG_STACK() (THREAD_SIZE - \
> + ((unsigned long)__builtin_frame_address(0) & \
> + (THREAD_SIZE - 1)))
> +# endif /* __ia64__ */
> +
> +#define __CHECK_STACK(msgdata, mask, cdls) \
> +do { \
> + if (unlikely(CDEBUG_STACK() > libcfs_stack)) { \
> + LIBCFS_DEBUG_MSG_DATA_INIT(msgdata, D_WARNING, NULL); \
> + libcfs_stack = CDEBUG_STACK(); \
> + libcfs_debug_msg(msgdata, \
> + "maximum lustre stack %lu\n", \
> + CDEBUG_STACK()); \
> + (msgdata)->msg_mask = mask; \
> + (msgdata)->msg_cdls = cdls; \
> + dump_stack(); \
> + /*panic("LBUG");*/ \
> + } \
> +} while (0)
> +#define CFS_CHECK_STACK(msgdata, mask, cdls) __CHECK_STACK(msgdata, mask, cdls)
> +#else /* __x86_64__ */
> +#define CFS_CHECK_STACK(msgdata, mask, cdls) do {} while (0)
> +#define CDEBUG_STACK() (0L)
> +#endif /* __x86_64__ */
> +
> #ifndef DEBUG_SUBSYSTEM
> # define DEBUG_SUBSYSTEM S_UNDEFINED
> #endif
> diff --git a/drivers/staging/lustre/include/linux/libcfs/linux/libcfs.h b/drivers/staging/lustre/include/linux/libcfs/linux/libcfs.h
> index 07d3cb2217d1..83aec9c7698f 100644
> --- a/drivers/staging/lustre/include/linux/libcfs/linux/libcfs.h
> +++ b/drivers/staging/lustre/include/linux/libcfs/linux/libcfs.h
> @@ -80,35 +80,4 @@
> #include <stdarg.h>
> #include "linux-cpu.h"
>
> -#if !defined(__x86_64__)
> -# ifdef __ia64__
> -# define CDEBUG_STACK() (THREAD_SIZE - \
> - ((unsigned long)__builtin_dwarf_cfa() & \
> - (THREAD_SIZE - 1)))
> -# else
> -# define CDEBUG_STACK() (THREAD_SIZE - \
> - ((unsigned long)__builtin_frame_address(0) & \
> - (THREAD_SIZE - 1)))
> -# endif /* __ia64__ */
> -
> -#define __CHECK_STACK(msgdata, mask, cdls) \
> -do { \
> - if (unlikely(CDEBUG_STACK() > libcfs_stack)) { \
> - LIBCFS_DEBUG_MSG_DATA_INIT(msgdata, D_WARNING, NULL); \
> - libcfs_stack = CDEBUG_STACK(); \
> - libcfs_debug_msg(msgdata, \
> - "maximum lustre stack %lu\n", \
> - CDEBUG_STACK()); \
> - (msgdata)->msg_mask = mask; \
> - (msgdata)->msg_cdls = cdls; \
> - dump_stack(); \
> - /*panic("LBUG");*/ \
> - } \
> -} while (0)
> -#define CFS_CHECK_STACK(msgdata, mask, cdls) __CHECK_STACK(msgdata, mask, cdls)
> -#else /* __x86_64__ */
> -#define CFS_CHECK_STACK(msgdata, mask, cdls) do {} while (0)
> -#define CDEBUG_STACK() (0L)
> -#endif /* __x86_64__ */
> -
> #endif /* _LINUX_LIBCFS_H */
>
>
>

2018-04-16 03:54:23

by James Simmons

Subject: Re: [PATCH 3/6] staging: lustre: remove include/linux/libcfs/linux/linux-cpu.h


> This include file contains definitions used when CONFIG_SMP
> is in effect. Other includes contain corresponding definitions
> for when it isn't.
> This can be hard to follow, so move the definitions to the one place.
>
> As HAVE_LIBCFS_CPT is defined precisely when CONFIG_SMP, we discard
> that macro and just use CONFIG_SMP when needed.

Nak. The Lustre SMP code is broken and badly needs to be reworked. I have
the rework ready and can push it. I was waiting to see whether I would have
to rebase it after the rc1 stuff, but since there is a push to get
everything out there, I will push it.
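
For reference, the layout the quoted commit message describes, with both the SMP and the !SMP definitions in one header and keyed directly on the config symbol, might look roughly like the sketch below. It is a user-space illustration under stated assumptions: DEMO_SMP stands in for CONFIG_SMP, and demo_cpt_table and demo_cpt_number() are hypothetical names, not the libcfs ones.

```c
/* DEMO_SMP is a hypothetical stand-in for CONFIG_SMP. */
#ifdef DEMO_SMP
/* SMP layout: several partitions plus a CPU-to-partition map. */
struct demo_cpt_table {
	unsigned int nparts;
	int *cpu2cpt;
};

static inline unsigned int demo_cpt_number(const struct demo_cpt_table *tab)
{
	return tab->nparts;
}
#else
/* !SMP layout: one implicit partition, so no map is needed. */
struct demo_cpt_table {
	unsigned int nparts;
};

static inline unsigned int demo_cpt_number(const struct demo_cpt_table *tab)
{
	(void)tab;	/* unused: there is always exactly one partition */
	return 1;
}
#endif
```

Callers use demo_cpt_number() the same way in either build; only the header decides which definition they get, so no second macro such as HAVE_LIBCFS_CPT is needed.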

> ---
> .../staging/lustre/include/linux/libcfs/libcfs.h | 1
> .../lustre/include/linux/libcfs/libcfs_cpu.h | 33 ++++++++
> .../lustre/include/linux/libcfs/linux/linux-cpu.h | 78 --------------------
> drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c | 4 +
> 4 files changed, 35 insertions(+), 81 deletions(-)
> delete mode 100644 drivers/staging/lustre/include/linux/libcfs/linux/linux-cpu.h
>
> diff --git a/drivers/staging/lustre/include/linux/libcfs/libcfs.h b/drivers/staging/lustre/include/linux/libcfs/libcfs.h
> index e59d107d6482..aca1f19c4977 100644
> --- a/drivers/staging/lustre/include/linux/libcfs/libcfs.h
> +++ b/drivers/staging/lustre/include/linux/libcfs/libcfs.h
> @@ -78,7 +78,6 @@
> #include <linux/timex.h>
> #include <linux/uaccess.h>
> #include <stdarg.h>
> -#include <linux/libcfs/linux/linux-cpu.h>
>
> #include <linux/libcfs/libcfs_debug.h>
> #include <linux/libcfs/libcfs_private.h>
> diff --git a/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h b/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h
> index 61bce77fddd6..829c35e68db8 100644
> --- a/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h
> +++ b/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h
> @@ -72,10 +72,43 @@
> #ifndef __LIBCFS_CPU_H__
> #define __LIBCFS_CPU_H__
>
> +#include <linux/cpu.h>
> +#include <linux/cpuset.h>
> +#include <linux/topology.h>
> +
> /* any CPU partition */
> #define CFS_CPT_ANY (-1)
>
> #ifdef CONFIG_SMP
> +/** virtual processing unit */
> +struct cfs_cpu_partition {
> + /* CPUs mask for this partition */
> + cpumask_var_t cpt_cpumask;
> + /* nodes mask for this partition */
> + nodemask_t *cpt_nodemask;
> + /* spread rotor for NUMA allocator */
> + unsigned int cpt_spread_rotor;
> +};
> +
> +
> +/** descriptor for CPU partitions */
> +struct cfs_cpt_table {
> + /* version, reserved for hotplug */
> + unsigned int ctb_version;
> + /* spread rotor for NUMA allocator */
> + unsigned int ctb_spread_rotor;
> + /* # of CPU partitions */
> + unsigned int ctb_nparts;
> + /* partitions tables */
> + struct cfs_cpu_partition *ctb_parts;
> + /* shadow HW CPU to CPU partition ID */
> + int *ctb_cpu2cpt;
> + /* all cpus in this partition table */
> + cpumask_var_t ctb_cpumask;
> + /* all nodes in this partition table */
> + nodemask_t *ctb_nodemask;
> +};
> +
> /**
> * return cpumask of CPU partition \a cpt
> */
> diff --git a/drivers/staging/lustre/include/linux/libcfs/linux/linux-cpu.h b/drivers/staging/lustre/include/linux/libcfs/linux/linux-cpu.h
> deleted file mode 100644
> index 6035376f2830..000000000000
> --- a/drivers/staging/lustre/include/linux/libcfs/linux/linux-cpu.h
> +++ /dev/null
> @@ -1,78 +0,0 @@
> -// SPDX-License-Identifier: GPL-2.0
> -/*
> - * GPL HEADER START
> - *
> - * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
> - *
> - * This program is free software; you can redistribute it and/or modify
> - * it under the terms of the GNU General Public License version 2 only,
> - * as published by the Free Software Foundation.
> - *
> - * This program is distributed in the hope that it will be useful, but
> - * WITHOUT ANY WARRANTY; without even the implied warranty of
> - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> - * General Public License version 2 for more details (a copy is included
> - * in the LICENSE file that accompanied this code).
> - *
> - * GPL HEADER END
> - */
> -/*
> - * Copyright (c) 2010, Oracle and/or its affiliates. All rights reserved.
> - * Copyright (c) 2012, Intel Corporation.
> - */
> -/*
> - * This file is part of Lustre, http://www.lustre.org/
> - * Lustre is a trademark of Sun Microsystems, Inc.
> - *
> - * libcfs/include/libcfs/linux/linux-cpu.h
> - *
> - * Basic library routines.
> - *
> - * Author: [email protected]
> - */
> -
> -#ifndef __LIBCFS_LINUX_CPU_H__
> -#define __LIBCFS_LINUX_CPU_H__
> -
> -#ifndef __LIBCFS_LIBCFS_H__
> -#error Do not #include this file directly. #include <linux/libcfs/libcfs.h> instead
> -#endif
> -
> -#include <linux/cpu.h>
> -#include <linux/cpuset.h>
> -#include <linux/topology.h>
> -
> -#ifdef CONFIG_SMP
> -
> -#define HAVE_LIBCFS_CPT
> -
> -/** virtual processing unit */
> -struct cfs_cpu_partition {
> - /* CPUs mask for this partition */
> - cpumask_var_t cpt_cpumask;
> - /* nodes mask for this partition */
> - nodemask_t *cpt_nodemask;
> - /* spread rotor for NUMA allocator */
> - unsigned int cpt_spread_rotor;
> -};
> -
> -/** descriptor for CPU partitions */
> -struct cfs_cpt_table {
> - /* version, reserved for hotplug */
> - unsigned int ctb_version;
> - /* spread rotor for NUMA allocator */
> - unsigned int ctb_spread_rotor;
> - /* # of CPU partitions */
> - unsigned int ctb_nparts;
> - /* partitions tables */
> - struct cfs_cpu_partition *ctb_parts;
> - /* shadow HW CPU to CPU partition ID */
> - int *ctb_cpu2cpt;
> - /* all cpus in this partition table */
> - cpumask_var_t ctb_cpumask;
> - /* all nodes in this partition table */
> - nodemask_t *ctb_nodemask;
> -};
> -
> -#endif /* CONFIG_SMP */
> -#endif /* __LIBCFS_LINUX_CPU_H__ */
> diff --git a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
> index 76291a350406..5818f641455f 100644
> --- a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
> +++ b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
> @@ -37,7 +37,7 @@
> struct cfs_cpt_table *cfs_cpt_table __read_mostly;
> EXPORT_SYMBOL(cfs_cpt_table);
>
> -#ifndef HAVE_LIBCFS_CPT
> +#ifndef CONFIG_SMP
>
> #define CFS_CPU_VERSION_MAGIC 0xbabecafe
>
> @@ -225,4 +225,4 @@ cfs_cpu_init(void)
> return cfs_cpt_table ? 0 : -1;
> }
>
> -#endif /* HAVE_LIBCFS_CPT */
> +#endif /* CONFIG_SMP */
>
>
>

2018-04-16 04:05:44

by James Simmons

Subject: Re: [PATCH 4/6] staging: lustre: rearrange placement of CPU partition management code.


> Currently the code for cpu-partition tables lives in various places.
> The non-SMP code is partly in libcfs/libcfs_cpu.h as static inlines,
> and partly in lnet/libcfs/libcfs_cpu.c - some of the functions are
> tiny and could well be inlines.
>
> The SMP code is all in lnet/libcfs/linux/linux-cpu.c.
>
> This patch moves all the trivial non-SMP functions into
> libcfs_cpu.h as inlines, and all the SMP functions into libcfs_cpu.c
> with the non-trivial !SMP code.
>
> Now when you go looking for some function, it is easier to find both
> versions together when neither is trivial.
>
> There is no code change here - just code movement.
>
> Signed-off-by: NeilBrown <[email protected]>

Nak. SMP will be reworked.
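
For anyone reviewing the moved code, the partition-count heuristic in cfs_cpt_num_estimate() (quoted in the patch below) can be exercised in user space. The following is a sketch, not the kernel function: demo_cpt_num_estimate() is a hypothetical name, the ncpu and nnode parameters replace num_online_cpus() and num_online_nodes(), and the 32-bit clamp is left out.

```c
#include <assert.h>

#define DEMO_CPT_WEIGHT_MIN 4u

/*
 * User-space port of the cfs_cpt_num_estimate() logic.
 * ncpu and nnode are passed in instead of being read from the kernel.
 */
static unsigned int demo_cpt_num_estimate(unsigned int ncpu, unsigned int nnode)
{
	unsigned int ncpt;

	assert(ncpu > 0 && nnode > 0);

	/* small systems get a single partition */
	if (ncpu <= DEMO_CPT_WEIGHT_MIN)
		return 1;

	/* smallest power of two N (N >= 2) with ncpu <= 2 * N^2 */
	for (ncpt = 2; ncpu > 2 * ncpt * ncpt; ncpt <<= 1)
		;

	if (ncpt <= nnode) {
		/* many NUMA nodes: halve the node count until it fits */
		while (nnode > ncpt)
			nnode >>= 1;
	} else {
		/* few NUMA nodes: double while staying within ncpt */
		while ((nnode << 1) <= ncpt)
			nnode <<= 1;
	}
	ncpt = nnode;

	/* partitions must divide the CPU count evenly; worst case is 1 */
	while (ncpu % ncpt)
		ncpt--;

	return ncpt;
}
```

With 16 CPUs on 2 nodes this yields 4 partitions; with 10 CPUs on 1 node the final divisibility loop drops the estimate from 4 to 2.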

> ---
> .../lustre/include/linux/libcfs/libcfs_cpu.h | 173 +++
> drivers/staging/lustre/lnet/libcfs/Makefile | 1
> drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c | 959 +++++++++++++++++-
> .../staging/lustre/lnet/libcfs/linux/linux-cpu.c | 1079 --------------------
> 4 files changed, 1076 insertions(+), 1136 deletions(-)
> delete mode 100644 drivers/staging/lustre/lnet/libcfs/linux/linux-cpu.c
>
> diff --git a/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h b/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h
> index 829c35e68db8..813ba4564bb9 100644
> --- a/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h
> +++ b/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h
> @@ -117,41 +117,6 @@ cpumask_var_t *cfs_cpt_cpumask(struct cfs_cpt_table *cptab, int cpt);
> * print string information of cpt-table
> */
> int cfs_cpt_table_print(struct cfs_cpt_table *cptab, char *buf, int len);
> -#else /* !CONFIG_SMP */
> -struct cfs_cpt_table {
> - /* # of CPU partitions */
> - int ctb_nparts;
> - /* cpu mask */
> - cpumask_t ctb_mask;
> - /* node mask */
> - nodemask_t ctb_nodemask;
> - /* version */
> - u64 ctb_version;
> -};
> -
> -static inline cpumask_var_t *
> -cfs_cpt_cpumask(struct cfs_cpt_table *cptab, int cpt)
> -{
> - return NULL;
> -}
> -
> -static inline int
> -cfs_cpt_table_print(struct cfs_cpt_table *cptab, char *buf, int len)
> -{
> - return 0;
> -}
> -#endif /* CONFIG_SMP */
> -
> -extern struct cfs_cpt_table *cfs_cpt_table;
> -
> -/**
> - * destroy a CPU partition table
> - */
> -void cfs_cpt_table_free(struct cfs_cpt_table *cptab);
> -/**
> - * create a cfs_cpt_table with \a ncpt number of partitions
> - */
> -struct cfs_cpt_table *cfs_cpt_table_alloc(unsigned int ncpt);
> /**
> * return total number of CPU partitions in \a cptab
> */
> @@ -237,6 +202,144 @@ int cfs_cpt_spread_node(struct cfs_cpt_table *cptab, int cpt);
> */
> int cfs_cpu_ht_nsiblings(int cpu);
>
> +#else /* !CONFIG_SMP */
> +struct cfs_cpt_table {
> + /* # of CPU partitions */
> + int ctb_nparts;
> + /* cpu mask */
> + cpumask_t ctb_mask;
> + /* node mask */
> + nodemask_t ctb_nodemask;
> + /* version */
> + u64 ctb_version;
> +};
> +
> +static inline cpumask_var_t *
> +cfs_cpt_cpumask(struct cfs_cpt_table *cptab, int cpt)
> +{
> + return NULL;
> +}
> +
> +static inline int
> +cfs_cpt_table_print(struct cfs_cpt_table *cptab, char *buf, int len)
> +{
> + return 0;
> +}
> +static inline int
> +cfs_cpt_number(struct cfs_cpt_table *cptab)
> +{
> + return 1;
> +}
> +
> +static inline int
> +cfs_cpt_weight(struct cfs_cpt_table *cptab, int cpt)
> +{
> + return 1;
> +}
> +
> +static inline int
> +cfs_cpt_online(struct cfs_cpt_table *cptab, int cpt)
> +{
> + return 1;
> +}
> +
> +static inline nodemask_t *
> +cfs_cpt_nodemask(struct cfs_cpt_table *cptab, int cpt)
> +{
> + return &cptab->ctb_nodemask;
> +}
> +
> +static inline int
> +cfs_cpt_set_cpu(struct cfs_cpt_table *cptab, int cpt, int cpu)
> +{
> + return 1;
> +}
> +
> +static inline void
> +cfs_cpt_unset_cpu(struct cfs_cpt_table *cptab, int cpt, int cpu)
> +{
> +}
> +
> +static inline int
> +cfs_cpt_set_cpumask(struct cfs_cpt_table *cptab, int cpt, cpumask_t *mask)
> +{
> + return 1;
> +}
> +
> +static inline void
> +cfs_cpt_unset_cpumask(struct cfs_cpt_table *cptab, int cpt, cpumask_t *mask)
> +{
> +}
> +
> +static inline int
> +cfs_cpt_set_node(struct cfs_cpt_table *cptab, int cpt, int node)
> +{
> + return 1;
> +}
> +
> +static inline void
> +cfs_cpt_unset_node(struct cfs_cpt_table *cptab, int cpt, int node)
> +{
> +}
> +
> +static inline int
> +cfs_cpt_set_nodemask(struct cfs_cpt_table *cptab, int cpt, nodemask_t *mask)
> +{
> + return 1;
> +}
> +
> +static inline void
> +cfs_cpt_unset_nodemask(struct cfs_cpt_table *cptab, int cpt, nodemask_t *mask)
> +{
> +}
> +
> +static inline void
> +cfs_cpt_clear(struct cfs_cpt_table *cptab, int cpt)
> +{
> +}
> +
> +static inline int
> +cfs_cpt_spread_node(struct cfs_cpt_table *cptab, int cpt)
> +{
> + return 0;
> +}
> +
> +static inline int
> +cfs_cpu_ht_nsiblings(int cpu)
> +{
> + return 1;
> +}
> +
> +static inline int
> +cfs_cpt_current(struct cfs_cpt_table *cptab, int remap)
> +{
> + return 0;
> +}
> +
> +static inline int
> +cfs_cpt_of_cpu(struct cfs_cpt_table *cptab, int cpu)
> +{
> + return 0;
> +}
> +
> +static inline int
> +cfs_cpt_bind(struct cfs_cpt_table *cptab, int cpt)
> +{
> + return 0;
> +}
> +#endif /* CONFIG_SMP */
> +
> +extern struct cfs_cpt_table *cfs_cpt_table;
> +
> +/**
> + * destroy a CPU partition table
> + */
> +void cfs_cpt_table_free(struct cfs_cpt_table *cptab);
> +/**
> + * create a cfs_cpt_table with \a ncpt number of partitions
> + */
> +struct cfs_cpt_table *cfs_cpt_table_alloc(unsigned int ncpt);
> +
> /*
> * allocate per-cpu-partition data, returned value is an array of pointers,
> * variable can be indexed by CPU ID.
> diff --git a/drivers/staging/lustre/lnet/libcfs/Makefile b/drivers/staging/lustre/lnet/libcfs/Makefile
> index 36b49a6b7b88..673fe348c445 100644
> --- a/drivers/staging/lustre/lnet/libcfs/Makefile
> +++ b/drivers/staging/lustre/lnet/libcfs/Makefile
> @@ -5,7 +5,6 @@ subdir-ccflags-y += -I$(srctree)/drivers/staging/lustre/lustre/include
> obj-$(CONFIG_LNET) += libcfs.o
>
> libcfs-linux-objs := linux-tracefile.o linux-debug.o
> -libcfs-linux-objs += linux-cpu.o
> libcfs-linux-objs += linux-module.o
> libcfs-linux-objs += linux-crypto.o
> libcfs-linux-objs += linux-crypto-adler.o
> diff --git a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
> index 5818f641455f..ac6fd11ae9d6 100644
> --- a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
> +++ b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
> @@ -36,11 +36,110 @@
> /** Global CPU partition table */
> struct cfs_cpt_table *cfs_cpt_table __read_mostly;
> EXPORT_SYMBOL(cfs_cpt_table);
> +#define DEBUG_SUBSYSTEM S_LNET
> +
> +#include <linux/cpu.h>
> +#include <linux/sched.h>
> +#include <linux/libcfs/libcfs.h>
> +
> +#ifdef CONFIG_SMP
> +/**
> + * modparam for setting number of partitions
> + *
> + * 0 : estimate best value based on cores or NUMA nodes
> + * 1 : disable multiple partitions
> + * >1 : specify number of partitions
> + */
> +static int cpu_npartitions;
> +module_param(cpu_npartitions, int, 0444);
> +MODULE_PARM_DESC(cpu_npartitions, "# of CPU partitions");
> +
> +/**
> + * modparam for setting CPU partitions patterns:
> + *
> + * i.e: "0[0,1,2,3] 1[4,5,6,7]", number before bracket is CPU partition ID,
> + * number in bracket is processor ID (core or HT)
> + *
> + * i.e: "N 0[0,1] 1[2,3]" the first character 'N' means numbers in bracket
> + * are NUMA node ID, number before bracket is CPU partition ID.
> + *
> + * i.e: "N", shortcut expression to create CPT from NUMA & CPU topology
> + *
> + * NB: If user specified cpu_pattern, cpu_npartitions will be ignored
> + */
> +static char *cpu_pattern = "N";
> +module_param(cpu_pattern, charp, 0444);
> +MODULE_PARM_DESC(cpu_pattern, "CPU partitions pattern");
>
> -#ifndef CONFIG_SMP
> +static struct cfs_cpt_data {
> + /* serialize hotplug etc */
> + spinlock_t cpt_lock;
> + /* reserved for hotplug */
> + unsigned long cpt_version;
> + /* mutex to protect cpt_cpumask */
> + struct mutex cpt_mutex;
> + /* scratch buffer for set/unset_node */
> + cpumask_var_t cpt_cpumask;
> +} cpt_data;
> +#endif
>
> #define CFS_CPU_VERSION_MAGIC 0xbabecafe
>
> +#ifdef CONFIG_SMP
> +struct cfs_cpt_table *
> +cfs_cpt_table_alloc(unsigned int ncpt)
> +{
> + struct cfs_cpt_table *cptab;
> + int i;
> +
> + cptab = kzalloc(sizeof(*cptab), GFP_NOFS);
> + if (!cptab)
> + return NULL;
> +
> + cptab->ctb_nparts = ncpt;
> +
> + cptab->ctb_nodemask = kzalloc(sizeof(*cptab->ctb_nodemask),
> + GFP_NOFS);
> + if (!zalloc_cpumask_var(&cptab->ctb_cpumask, GFP_NOFS) ||
> + !cptab->ctb_nodemask)
> + goto failed;
> +
> + cptab->ctb_cpu2cpt = kvmalloc_array(num_possible_cpus(),
> + sizeof(cptab->ctb_cpu2cpt[0]),
> + GFP_KERNEL);
> + if (!cptab->ctb_cpu2cpt)
> + goto failed;
> +
> + memset(cptab->ctb_cpu2cpt, -1,
> + num_possible_cpus() * sizeof(cptab->ctb_cpu2cpt[0]));
> +
> + cptab->ctb_parts = kvmalloc_array(ncpt, sizeof(cptab->ctb_parts[0]),
> + GFP_KERNEL);
> + if (!cptab->ctb_parts)
> + goto failed;
> +
> + for (i = 0; i < ncpt; i++) {
> + struct cfs_cpu_partition *part = &cptab->ctb_parts[i];
> +
> + part->cpt_nodemask = kzalloc(sizeof(*part->cpt_nodemask),
> + GFP_NOFS);
> + if (!zalloc_cpumask_var(&part->cpt_cpumask, GFP_NOFS) ||
> + !part->cpt_nodemask)
> + goto failed;
> + }
> +
> + spin_lock(&cpt_data.cpt_lock);
> + /* Reserved for hotplug */
> + cptab->ctb_version = cpt_data.cpt_version;
> + spin_unlock(&cpt_data.cpt_lock);
> +
> + return cptab;
> +
> + failed:
> + cfs_cpt_table_free(cptab);
> + return NULL;
> +}
> +#else /* ! CONFIG_SMP */
> struct cfs_cpt_table *
> cfs_cpt_table_alloc(unsigned int ncpt)
> {
> @@ -60,8 +159,32 @@ cfs_cpt_table_alloc(unsigned int ncpt)
>
> return cptab;
> }
> +#endif /* CONFIG_SMP */
> EXPORT_SYMBOL(cfs_cpt_table_alloc);
>
> +#ifdef CONFIG_SMP
> +void
> +cfs_cpt_table_free(struct cfs_cpt_table *cptab)
> +{
> + int i;
> +
> + kvfree(cptab->ctb_cpu2cpt);
> +
> + for (i = 0; cptab->ctb_parts && i < cptab->ctb_nparts; i++) {
> + struct cfs_cpu_partition *part = &cptab->ctb_parts[i];
> +
> + kfree(part->cpt_nodemask);
> + free_cpumask_var(part->cpt_cpumask);
> + }
> +
> + kvfree(cptab->ctb_parts);
> +
> + kfree(cptab->ctb_nodemask);
> + free_cpumask_var(cptab->ctb_cpumask);
> +
> + kfree(cptab);
> +}
> +#else /* ! CONFIG_SMP */
> void
> cfs_cpt_table_free(struct cfs_cpt_table *cptab)
> {
> @@ -69,55 +192,153 @@ cfs_cpt_table_free(struct cfs_cpt_table *cptab)
>
> kfree(cptab);
> }
> +#endif /* CONFIG_SMP */
> EXPORT_SYMBOL(cfs_cpt_table_free);
>
> #ifdef CONFIG_SMP
> int
> cfs_cpt_table_print(struct cfs_cpt_table *cptab, char *buf, int len)
> {
> - int rc;
> + char *tmp = buf;
> + int rc = 0;
> + int i;
> + int j;
>
> - rc = snprintf(buf, len, "%d\t: %d\n", 0, 0);
> - len -= rc;
> - if (len <= 0)
> - return -EFBIG;
> + for (i = 0; i < cptab->ctb_nparts; i++) {
> + if (len > 0) {
> + rc = snprintf(tmp, len, "%d\t: ", i);
> + len -= rc;
> + }
>
> - return rc;
> + if (len <= 0) {
> + rc = -EFBIG;
> + goto out;
> + }
> +
> + tmp += rc;
> + for_each_cpu(j, cptab->ctb_parts[i].cpt_cpumask) {
> + rc = snprintf(tmp, len, "%d ", j);
> + len -= rc;
> + if (len <= 0) {
> + rc = -EFBIG;
> + goto out;
> + }
> + tmp += rc;
> + }
> +
> + *tmp = '\n';
> + tmp++;
> + len--;
> + }
> +
> + out:
> + if (rc < 0)
> + return rc;
> +
> + return tmp - buf;
> }
> EXPORT_SYMBOL(cfs_cpt_table_print);
> #endif /* CONFIG_SMP */
>
> +#ifdef CONFIG_SMP
> +static void
> +cfs_node_to_cpumask(int node, cpumask_t *mask)
> +{
> + const cpumask_t *tmp = cpumask_of_node(node);
> +
> + if (tmp)
> + cpumask_copy(mask, tmp);
> + else
> + cpumask_clear(mask);
> +}
> +
> int
> cfs_cpt_number(struct cfs_cpt_table *cptab)
> {
> - return 1;
> + return cptab->ctb_nparts;
> }
> EXPORT_SYMBOL(cfs_cpt_number);
>
> int
> cfs_cpt_weight(struct cfs_cpt_table *cptab, int cpt)
> {
> - return 1;
> + LASSERT(cpt == CFS_CPT_ANY || (cpt >= 0 && cpt < cptab->ctb_nparts));
> +
> + return cpt == CFS_CPT_ANY ?
> + cpumask_weight(cptab->ctb_cpumask) :
> + cpumask_weight(cptab->ctb_parts[cpt].cpt_cpumask);
> }
> EXPORT_SYMBOL(cfs_cpt_weight);
>
> int
> cfs_cpt_online(struct cfs_cpt_table *cptab, int cpt)
> {
> - return 1;
> + LASSERT(cpt == CFS_CPT_ANY || (cpt >= 0 && cpt < cptab->ctb_nparts));
> +
> + return cpt == CFS_CPT_ANY ?
> + cpumask_any_and(cptab->ctb_cpumask,
> + cpu_online_mask) < nr_cpu_ids :
> + cpumask_any_and(cptab->ctb_parts[cpt].cpt_cpumask,
> + cpu_online_mask) < nr_cpu_ids;
> }
> EXPORT_SYMBOL(cfs_cpt_online);
>
> +cpumask_var_t *
> +cfs_cpt_cpumask(struct cfs_cpt_table *cptab, int cpt)
> +{
> + LASSERT(cpt == CFS_CPT_ANY || (cpt >= 0 && cpt < cptab->ctb_nparts));
> +
> + return cpt == CFS_CPT_ANY ?
> + &cptab->ctb_cpumask : &cptab->ctb_parts[cpt].cpt_cpumask;
> +}
> +EXPORT_SYMBOL(cfs_cpt_cpumask);
> +
> nodemask_t *
> cfs_cpt_nodemask(struct cfs_cpt_table *cptab, int cpt)
> {
> - return &cptab->ctb_nodemask;
> + LASSERT(cpt == CFS_CPT_ANY || (cpt >= 0 && cpt < cptab->ctb_nparts));
> +
> + return cpt == CFS_CPT_ANY ?
> + cptab->ctb_nodemask : cptab->ctb_parts[cpt].cpt_nodemask;
> }
> EXPORT_SYMBOL(cfs_cpt_nodemask);
>
> int
> cfs_cpt_set_cpu(struct cfs_cpt_table *cptab, int cpt, int cpu)
> {
> + int node;
> +
> + LASSERT(cpt >= 0 && cpt < cptab->ctb_nparts);
> +
> + if (cpu < 0 || cpu >= nr_cpu_ids || !cpu_online(cpu)) {
> + CDEBUG(D_INFO, "CPU %d is invalid or it's offline\n", cpu);
> + return 0;
> + }
> +
> + if (cptab->ctb_cpu2cpt[cpu] != -1) {
> + CDEBUG(D_INFO, "CPU %d is already in partition %d\n",
> + cpu, cptab->ctb_cpu2cpt[cpu]);
> + return 0;
> + }
> +
> + cptab->ctb_cpu2cpt[cpu] = cpt;
> +
> + LASSERT(!cpumask_test_cpu(cpu, cptab->ctb_cpumask));
> + LASSERT(!cpumask_test_cpu(cpu, cptab->ctb_parts[cpt].cpt_cpumask));
> +
> + cpumask_set_cpu(cpu, cptab->ctb_cpumask);
> + cpumask_set_cpu(cpu, cptab->ctb_parts[cpt].cpt_cpumask);
> +
> + node = cpu_to_node(cpu);
> +
> + /* first CPU of @node in this CPT table */
> + if (!node_isset(node, *cptab->ctb_nodemask))
> + node_set(node, *cptab->ctb_nodemask);
> +
> + /* first CPU of @node in this partition */
> + if (!node_isset(node, *cptab->ctb_parts[cpt].cpt_nodemask))
> + node_set(node, *cptab->ctb_parts[cpt].cpt_nodemask);
> +
> return 1;
> }
> EXPORT_SYMBOL(cfs_cpt_set_cpu);
> @@ -125,12 +346,80 @@ EXPORT_SYMBOL(cfs_cpt_set_cpu);
> void
> cfs_cpt_unset_cpu(struct cfs_cpt_table *cptab, int cpt, int cpu)
> {
> + int node;
> + int i;
> +
> + LASSERT(cpt == CFS_CPT_ANY || (cpt >= 0 && cpt < cptab->ctb_nparts));
> +
> + if (cpu < 0 || cpu >= nr_cpu_ids) {
> + CDEBUG(D_INFO, "Invalid CPU id %d\n", cpu);
> + return;
> + }
> +
> + if (cpt == CFS_CPT_ANY) {
> + /* caller doesn't know the partition ID */
> + cpt = cptab->ctb_cpu2cpt[cpu];
> + if (cpt < 0) { /* not set in this CPT-table */
> + CDEBUG(D_INFO, "Try to unset cpu %d which is not in CPT-table %p\n",
> + cpt, cptab);
> + return;
> + }
> +
> + } else if (cpt != cptab->ctb_cpu2cpt[cpu]) {
> + CDEBUG(D_INFO,
> + "CPU %d is not in cpu-partition %d\n", cpu, cpt);
> + return;
> + }
> +
> + LASSERT(cpumask_test_cpu(cpu, cptab->ctb_parts[cpt].cpt_cpumask));
> + LASSERT(cpumask_test_cpu(cpu, cptab->ctb_cpumask));
> +
> + cpumask_clear_cpu(cpu, cptab->ctb_parts[cpt].cpt_cpumask);
> + cpumask_clear_cpu(cpu, cptab->ctb_cpumask);
> + cptab->ctb_cpu2cpt[cpu] = -1;
> +
> + node = cpu_to_node(cpu);
> +
> + LASSERT(node_isset(node, *cptab->ctb_parts[cpt].cpt_nodemask));
> + LASSERT(node_isset(node, *cptab->ctb_nodemask));
> +
> + for_each_cpu(i, cptab->ctb_parts[cpt].cpt_cpumask) {
> + /* this CPT has other CPU belonging to this node? */
> + if (cpu_to_node(i) == node)
> + break;
> + }
> +
> + if (i >= nr_cpu_ids)
> + node_clear(node, *cptab->ctb_parts[cpt].cpt_nodemask);
> +
> + for_each_cpu(i, cptab->ctb_cpumask) {
> + /* this CPT-table has other CPU belonging to this node? */
> + if (cpu_to_node(i) == node)
> + break;
> + }
> +
> + if (i >= nr_cpu_ids)
> + node_clear(node, *cptab->ctb_nodemask);
> }
> EXPORT_SYMBOL(cfs_cpt_unset_cpu);
>
> int
> cfs_cpt_set_cpumask(struct cfs_cpt_table *cptab, int cpt, cpumask_t *mask)
> {
> + int i;
> +
> + if (!cpumask_weight(mask) ||
> + cpumask_any_and(mask, cpu_online_mask) >= nr_cpu_ids) {
> + CDEBUG(D_INFO, "No online CPU is found in the CPU mask for CPU partition %d\n",
> + cpt);
> + return 0;
> + }
> +
> + for_each_cpu(i, mask) {
> + if (!cfs_cpt_set_cpu(cptab, cpt, i))
> + return 0;
> + }
> +
> return 1;
> }
> EXPORT_SYMBOL(cfs_cpt_set_cpumask);
> @@ -138,25 +427,65 @@ EXPORT_SYMBOL(cfs_cpt_set_cpumask);
> void
> cfs_cpt_unset_cpumask(struct cfs_cpt_table *cptab, int cpt, cpumask_t *mask)
> {
> + int i;
> +
> + for_each_cpu(i, mask)
> + cfs_cpt_unset_cpu(cptab, cpt, i);
> }
> EXPORT_SYMBOL(cfs_cpt_unset_cpumask);
>
> int
> cfs_cpt_set_node(struct cfs_cpt_table *cptab, int cpt, int node)
> {
> - return 1;
> + int rc;
> +
> + if (node < 0 || node >= MAX_NUMNODES) {
> + CDEBUG(D_INFO,
> + "Invalid NUMA id %d for CPU partition %d\n", node, cpt);
> + return 0;
> + }
> +
> + mutex_lock(&cpt_data.cpt_mutex);
> +
> + cfs_node_to_cpumask(node, cpt_data.cpt_cpumask);
> +
> + rc = cfs_cpt_set_cpumask(cptab, cpt, cpt_data.cpt_cpumask);
> +
> + mutex_unlock(&cpt_data.cpt_mutex);
> +
> + return rc;
> }
> EXPORT_SYMBOL(cfs_cpt_set_node);
>
> void
> cfs_cpt_unset_node(struct cfs_cpt_table *cptab, int cpt, int node)
> {
> + if (node < 0 || node >= MAX_NUMNODES) {
> + CDEBUG(D_INFO,
> + "Invalid NUMA id %d for CPU partition %d\n", node, cpt);
> + return;
> + }
> +
> + mutex_lock(&cpt_data.cpt_mutex);
> +
> + cfs_node_to_cpumask(node, cpt_data.cpt_cpumask);
> +
> + cfs_cpt_unset_cpumask(cptab, cpt, cpt_data.cpt_cpumask);
> +
> + mutex_unlock(&cpt_data.cpt_mutex);
> }
> EXPORT_SYMBOL(cfs_cpt_unset_node);
>
> int
> cfs_cpt_set_nodemask(struct cfs_cpt_table *cptab, int cpt, nodemask_t *mask)
> {
> + int i;
> +
> + for_each_node_mask(i, *mask) {
> + if (!cfs_cpt_set_node(cptab, cpt, i))
> + return 0;
> + }
> +
> return 1;
> }
> EXPORT_SYMBOL(cfs_cpt_set_nodemask);
> @@ -164,50 +493,638 @@ EXPORT_SYMBOL(cfs_cpt_set_nodemask);
> void
> cfs_cpt_unset_nodemask(struct cfs_cpt_table *cptab, int cpt, nodemask_t *mask)
> {
> + int i;
> +
> + for_each_node_mask(i, *mask)
> + cfs_cpt_unset_node(cptab, cpt, i);
> }
> EXPORT_SYMBOL(cfs_cpt_unset_nodemask);
>
> void
> cfs_cpt_clear(struct cfs_cpt_table *cptab, int cpt)
> {
> + int last;
> + int i;
> +
> + if (cpt == CFS_CPT_ANY) {
> + last = cptab->ctb_nparts - 1;
> + cpt = 0;
> + } else {
> + last = cpt;
> + }
> +
> + for (; cpt <= last; cpt++) {
> + for_each_cpu(i, cptab->ctb_parts[cpt].cpt_cpumask)
> + cfs_cpt_unset_cpu(cptab, cpt, i);
> + }
> }
> EXPORT_SYMBOL(cfs_cpt_clear);
>
> int
> cfs_cpt_spread_node(struct cfs_cpt_table *cptab, int cpt)
> {
> + nodemask_t *mask;
> + int weight;
> + int rotor;
> + int node;
> +
> + /* convert CPU partition ID to HW node id */
> +
> + if (cpt < 0 || cpt >= cptab->ctb_nparts) {
> + mask = cptab->ctb_nodemask;
> + rotor = cptab->ctb_spread_rotor++;
> + } else {
> + mask = cptab->ctb_parts[cpt].cpt_nodemask;
> + rotor = cptab->ctb_parts[cpt].cpt_spread_rotor++;
> + }
> +
> + weight = nodes_weight(*mask);
> + LASSERT(weight > 0);
> +
> + rotor %= weight;
> +
> + for_each_node_mask(node, *mask) {
> + if (!rotor--)
> + return node;
> + }
> +
> + LBUG();
> return 0;
> }
> EXPORT_SYMBOL(cfs_cpt_spread_node);
>
> -int
> -cfs_cpu_ht_nsiblings(int cpu)
> -{
> - return 1;
> -}
> -EXPORT_SYMBOL(cfs_cpu_ht_nsiblings);
> -
> int
> cfs_cpt_current(struct cfs_cpt_table *cptab, int remap)
> {
> - return 0;
> + int cpu;
> + int cpt;
> +
> + preempt_disable();
> + cpu = smp_processor_id();
> + cpt = cptab->ctb_cpu2cpt[cpu];
> +
> + if (cpt < 0 && remap) {
> + /* don't return negative value for safety of upper layer,
> + * instead we shadow the unknown cpu to a valid partition ID
> + */
> + cpt = cpu % cptab->ctb_nparts;
> + }
> + preempt_enable();
> + return cpt;
> }
> EXPORT_SYMBOL(cfs_cpt_current);
>
> int
> cfs_cpt_of_cpu(struct cfs_cpt_table *cptab, int cpu)
> {
> - return 0;
> + LASSERT(cpu >= 0 && cpu < nr_cpu_ids);
> +
> + return cptab->ctb_cpu2cpt[cpu];
> }
> EXPORT_SYMBOL(cfs_cpt_of_cpu);
>
> int
> cfs_cpt_bind(struct cfs_cpt_table *cptab, int cpt)
> {
> + cpumask_var_t *cpumask;
> + nodemask_t *nodemask;
> + int rc;
> + int i;
> +
> + LASSERT(cpt == CFS_CPT_ANY || (cpt >= 0 && cpt < cptab->ctb_nparts));
> +
> + if (cpt == CFS_CPT_ANY) {
> + cpumask = &cptab->ctb_cpumask;
> + nodemask = cptab->ctb_nodemask;
> + } else {
> + cpumask = &cptab->ctb_parts[cpt].cpt_cpumask;
> + nodemask = cptab->ctb_parts[cpt].cpt_nodemask;
> + }
> +
> + if (cpumask_any_and(*cpumask, cpu_online_mask) >= nr_cpu_ids) {
> + CERROR("No online CPU found in CPU partition %d, did someone do CPU hotplug on system? You might need to reload Lustre modules to keep system working well.\n",
> + cpt);
> + return -EINVAL;
> + }
> +
> + for_each_online_cpu(i) {
> + if (cpumask_test_cpu(i, *cpumask))
> + continue;
> +
> + rc = set_cpus_allowed_ptr(current, *cpumask);
> + set_mems_allowed(*nodemask);
> + if (!rc)
> + schedule(); /* switch to allowed CPU */
> +
> + return rc;
> + }
> +
> + /* don't need to set affinity because all online CPUs are covered */
> return 0;
> }
> EXPORT_SYMBOL(cfs_cpt_bind);
>
> +#endif
> +
> +#ifdef CONFIG_SMP
> +
> +/**
> + * Choose max to \a number CPUs from \a node and set them in \a cpt.
> + * We always prefer to choose CPU in the same core/socket.
> + */
> +static int
> +cfs_cpt_choose_ncpus(struct cfs_cpt_table *cptab, int cpt,
> + cpumask_t *node, int number)
> +{
> + cpumask_var_t socket;
> + cpumask_var_t core;
> + int rc = 0;
> + int cpu;
> +
> + LASSERT(number > 0);
> +
> + if (number >= cpumask_weight(node)) {
> + while (!cpumask_empty(node)) {
> + cpu = cpumask_first(node);
> +
> + rc = cfs_cpt_set_cpu(cptab, cpt, cpu);
> + if (!rc)
> + return -EINVAL;
> + cpumask_clear_cpu(cpu, node);
> + }
> + return 0;
> + }
> +
> + /*
> + * Allocate scratch buffers
> + * As we cannot initialize a cpumask_var_t, we need
> + * to alloc both before we can risk trying to free either
> + */
> + if (!zalloc_cpumask_var(&socket, GFP_NOFS))
> + rc = -ENOMEM;
> + if (!zalloc_cpumask_var(&core, GFP_NOFS))
> + rc = -ENOMEM;
> + if (rc)
> + goto out;
> +
> + while (!cpumask_empty(node)) {
> + cpu = cpumask_first(node);
> +
> + /* get cpumask for cores in the same socket */
> + cpumask_copy(socket, topology_core_cpumask(cpu));
> + cpumask_and(socket, socket, node);
> +
> + LASSERT(!cpumask_empty(socket));
> +
> + while (!cpumask_empty(socket)) {
> + int i;
> +
> + /* get cpumask for hts in the same core */
> + cpumask_copy(core, topology_sibling_cpumask(cpu));
> + cpumask_and(core, core, node);
> +
> + LASSERT(!cpumask_empty(core));
> +
> + for_each_cpu(i, core) {
> + cpumask_clear_cpu(i, socket);
> + cpumask_clear_cpu(i, node);
> +
> + rc = cfs_cpt_set_cpu(cptab, cpt, i);
> + if (!rc) {
> + rc = -EINVAL;
> + goto out;
> + }
> +
> + if (!--number)
> + goto out;
> + }
> + cpu = cpumask_first(socket);
> + }
> + }
> +
> +out:
> + free_cpumask_var(socket);
> + free_cpumask_var(core);
> + return rc;
> +}
> +
> +#define CPT_WEIGHT_MIN 4u
> +
> +static unsigned int
> +cfs_cpt_num_estimate(void)
> +{
> + unsigned int nnode = num_online_nodes();
> + unsigned int ncpu = num_online_cpus();
> + unsigned int ncpt;
> +
> + if (ncpu <= CPT_WEIGHT_MIN) {
> + ncpt = 1;
> + goto out;
> + }
> +
> + /* generate reasonable number of CPU partitions based on total number
> + * of CPUs, Preferred N should be power2 and match this condition:
> + * 2 * (N - 1)^2 < NCPUS <= 2 * N^2
> + */
> + for (ncpt = 2; ncpu > 2 * ncpt * ncpt; ncpt <<= 1)
> + ;
> +
> + if (ncpt <= nnode) { /* fat numa system */
> + while (nnode > ncpt)
> + nnode >>= 1;
> +
> + } else { /* ncpt > nnode */
> + while ((nnode << 1) <= ncpt)
> + nnode <<= 1;
> + }
> +
> + ncpt = nnode;
> +
> +out:
> +#if (BITS_PER_LONG == 32)
> + /* config many CPU partitions on 32-bit system could consume
> + * too much memory
> + */
> + ncpt = min(2U, ncpt);
> +#endif
> + while (ncpu % ncpt)
> + ncpt--; /* worst case is 1 */
> +
> + return ncpt;
> +}
> +
> +static struct cfs_cpt_table *
> +cfs_cpt_table_create(int ncpt)
> +{
> + struct cfs_cpt_table *cptab = NULL;
> + cpumask_var_t mask;
> + int cpt = 0;
> + int num;
> + int rc;
> + int i;
> +
> + rc = cfs_cpt_num_estimate();
> + if (ncpt <= 0)
> + ncpt = rc;
> +
> + if (ncpt > num_online_cpus() || ncpt > 4 * rc) {
> + CWARN("CPU partition number %d is larger than suggested value (%d), your system may have performance issue or run out of memory while under pressure\n",
> + ncpt, rc);
> + }
> +
> + if (num_online_cpus() % ncpt) {
> + CERROR("CPU number %d is not multiple of cpu_npartition %d, please try different cpu_npartitions value or set pattern string by cpu_pattern=STRING\n",
> + (int)num_online_cpus(), ncpt);
> + goto failed;
> + }
> +
> + cptab = cfs_cpt_table_alloc(ncpt);
> + if (!cptab) {
> + CERROR("Failed to allocate CPU map(%d)\n", ncpt);
> + goto failed;
> + }
> +
> + num = num_online_cpus() / ncpt;
> + if (!num) {
> + CERROR("CPU changed while setting CPU partition\n");
> + goto failed;
> + }
> +
> + if (!zalloc_cpumask_var(&mask, GFP_NOFS)) {
> + CERROR("Failed to allocate scratch cpumask\n");
> + goto failed;
> + }
> +
> + for_each_online_node(i) {
> + cfs_node_to_cpumask(i, mask);
> +
> + while (!cpumask_empty(mask)) {
> + struct cfs_cpu_partition *part;
> + int n;
> +
> + /*
> + * Each emulated NUMA node has all allowed CPUs in
> + * the mask.
> + * End loop when all partitions have assigned CPUs.
> + */
> + if (cpt == ncpt)
> + break;
> +
> + part = &cptab->ctb_parts[cpt];
> +
> + n = num - cpumask_weight(part->cpt_cpumask);
> + LASSERT(n > 0);
> +
> + rc = cfs_cpt_choose_ncpus(cptab, cpt, mask, n);
> + if (rc < 0)
> + goto failed_mask;
> +
> + LASSERT(num >= cpumask_weight(part->cpt_cpumask));
> + if (num == cpumask_weight(part->cpt_cpumask))
> + cpt++;
> + }
> + }
> +
> + if (cpt != ncpt ||
> + num != cpumask_weight(cptab->ctb_parts[ncpt - 1].cpt_cpumask)) {
> + CERROR("Expect %d(%d) CPU partitions but got %d(%d), CPU hotplug/unplug while setting?\n",
> + cptab->ctb_nparts, num, cpt,
> + cpumask_weight(cptab->ctb_parts[ncpt - 1].cpt_cpumask));
> + goto failed_mask;
> + }
> +
> + free_cpumask_var(mask);
> +
> + return cptab;
> +
> + failed_mask:
> + free_cpumask_var(mask);
> + failed:
> + CERROR("Failed to setup CPU-partition-table with %d CPU-partitions, online HW nodes: %d, HW cpus: %d.\n",
> + ncpt, num_online_nodes(), num_online_cpus());
> +
> + if (cptab)
> + cfs_cpt_table_free(cptab);
> +
> + return NULL;
> +}
> +
> +static struct cfs_cpt_table *
> +cfs_cpt_table_create_pattern(char *pattern)
> +{
> + struct cfs_cpt_table *cptab;
> + char *str;
> + int node = 0;
> + int high;
> + int ncpt = 0;
> + int cpt;
> + int rc;
> + int c;
> + int i;
> +
> + str = strim(pattern);
> + if (*str == 'n' || *str == 'N') {
> + pattern = str + 1;
> + if (*pattern != '\0') {
> + node = 1;
> + } else { /* shortcut to create CPT from NUMA & CPU topology */
> + node = -1;
> + ncpt = num_online_nodes();
> + }
> + }
> +
> + if (!ncpt) { /* scanning bracket which is mark of partition */
> + for (str = pattern;; str++, ncpt++) {
> + str = strchr(str, '[');
> + if (!str)
> + break;
> + }
> + }
> +
> + if (!ncpt ||
> + (node && ncpt > num_online_nodes()) ||
> + (!node && ncpt > num_online_cpus())) {
> + CERROR("Invalid pattern %s, or too many partitions %d\n",
> + pattern, ncpt);
> + return NULL;
> + }
> +
> + cptab = cfs_cpt_table_alloc(ncpt);
> + if (!cptab) {
> + CERROR("Failed to allocate cpu partition table\n");
> + return NULL;
> + }
> +
> + if (node < 0) { /* shortcut to create CPT from NUMA & CPU topology */
> + cpt = 0;
> +
> + for_each_online_node(i) {
> + if (cpt >= ncpt) {
> + CERROR("CPU changed while setting CPU partition table, %d/%d\n",
> + cpt, ncpt);
> + goto failed;
> + }
> +
> + rc = cfs_cpt_set_node(cptab, cpt++, i);
> + if (!rc)
> + goto failed;
> + }
> + return cptab;
> + }
> +
> + high = node ? MAX_NUMNODES - 1 : nr_cpu_ids - 1;
> +
> + for (str = strim(pattern), c = 0;; c++) {
> + struct cfs_range_expr *range;
> + struct cfs_expr_list *el;
> + char *bracket = strchr(str, '[');
> + int n;
> +
> + if (!bracket) {
> + if (*str) {
> + CERROR("Invalid pattern %s\n", str);
> + goto failed;
> + }
> + if (c != ncpt) {
> + CERROR("expect %d partitions but found %d\n",
> + ncpt, c);
> + goto failed;
> + }
> + break;
> + }
> +
> + if (sscanf(str, "%d%n", &cpt, &n) < 1) {
> + CERROR("Invalid cpu pattern %s\n", str);
> + goto failed;
> + }
> +
> + if (cpt < 0 || cpt >= ncpt) {
> + CERROR("Invalid partition id %d, total partitions %d\n",
> + cpt, ncpt);
> + goto failed;
> + }
> +
> + if (cfs_cpt_weight(cptab, cpt)) {
> + CERROR("Partition %d has already been set.\n", cpt);
> + goto failed;
> + }
> +
> + str = strim(str + n);
> + if (str != bracket) {
> + CERROR("Invalid pattern %s\n", str);
> + goto failed;
> + }
> +
> + bracket = strchr(str, ']');
> + if (!bracket) {
> + CERROR("missing right bracket for cpt %d, %s\n",
> + cpt, str);
> + goto failed;
> + }
> +
> + if (cfs_expr_list_parse(str, (bracket - str) + 1,
> + 0, high, &el)) {
> + CERROR("Can't parse number range: %s\n", str);
> + goto failed;
> + }
> +
> + list_for_each_entry(range, &el->el_exprs, re_link) {
> + for (i = range->re_lo; i <= range->re_hi; i++) {
> + if ((i - range->re_lo) % range->re_stride)
> + continue;
> +
> + rc = node ? cfs_cpt_set_node(cptab, cpt, i) :
> + cfs_cpt_set_cpu(cptab, cpt, i);
> + if (!rc) {
> + cfs_expr_list_free(el);
> + goto failed;
> + }
> + }
> + }
> +
> + cfs_expr_list_free(el);
> +
> + if (!cfs_cpt_online(cptab, cpt)) {
> + CERROR("No online CPU is found on partition %d\n", cpt);
> + goto failed;
> + }
> +
> + str = strim(bracket + 1);
> + }
> +
> + return cptab;
> +
> + failed:
> + cfs_cpt_table_free(cptab);
> + return NULL;
> +}
> +
> +#ifdef CONFIG_HOTPLUG_CPU
> +static enum cpuhp_state lustre_cpu_online;
> +
> +static void cfs_cpu_incr_cpt_version(void)
> +{
> + spin_lock(&cpt_data.cpt_lock);
> + cpt_data.cpt_version++;
> + spin_unlock(&cpt_data.cpt_lock);
> +}
> +
> +static int cfs_cpu_online(unsigned int cpu)
> +{
> + cfs_cpu_incr_cpt_version();
> + return 0;
> +}
> +
> +static int cfs_cpu_dead(unsigned int cpu)
> +{
> + bool warn;
> +
> + cfs_cpu_incr_cpt_version();
> +
> + mutex_lock(&cpt_data.cpt_mutex);
> + /* if all HTs in a core are offline, it may break affinity */
> + cpumask_copy(cpt_data.cpt_cpumask, topology_sibling_cpumask(cpu));
> + warn = cpumask_any_and(cpt_data.cpt_cpumask,
> + cpu_online_mask) >= nr_cpu_ids;
> + mutex_unlock(&cpt_data.cpt_mutex);
> + CDEBUG(warn ? D_WARNING : D_INFO,
> + "Lustre: can't support CPU plug-out well now, performance and stability could be impacted [CPU %u]\n",
> + cpu);
> + return 0;
> +}
> +#endif
> +
> +void
> +cfs_cpu_fini(void)
> +{
> + if (cfs_cpt_table)
> + cfs_cpt_table_free(cfs_cpt_table);
> +
> +#ifdef CONFIG_HOTPLUG_CPU
> + if (lustre_cpu_online > 0)
> + cpuhp_remove_state_nocalls(lustre_cpu_online);
> + cpuhp_remove_state_nocalls(CPUHP_LUSTRE_CFS_DEAD);
> +#endif
> + free_cpumask_var(cpt_data.cpt_cpumask);
> +}
> +
> +int
> +cfs_cpu_init(void)
> +{
> + int ret = 0;
> +
> + LASSERT(!cfs_cpt_table);
> +
> + memset(&cpt_data, 0, sizeof(cpt_data));
> +
> + if (!zalloc_cpumask_var(&cpt_data.cpt_cpumask, GFP_NOFS)) {
> + CERROR("Failed to allocate scratch buffer\n");
> + return -1;
> + }
> +
> + spin_lock_init(&cpt_data.cpt_lock);
> + mutex_init(&cpt_data.cpt_mutex);
> +
> +#ifdef CONFIG_HOTPLUG_CPU
> + ret = cpuhp_setup_state_nocalls(CPUHP_LUSTRE_CFS_DEAD,
> + "staging/lustre/cfe:dead", NULL,
> + cfs_cpu_dead);
> + if (ret < 0)
> + goto failed;
> + ret = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN,
> + "staging/lustre/cfe:online",
> + cfs_cpu_online, NULL);
> + if (ret < 0)
> + goto failed;
> + lustre_cpu_online = ret;
> +#endif
> + ret = -EINVAL;
> +
> + if (*cpu_pattern) {
> + char *cpu_pattern_dup = kstrdup(cpu_pattern, GFP_KERNEL);
> +
> + if (!cpu_pattern_dup) {
> + CERROR("Failed to duplicate cpu_pattern\n");
> + goto failed;
> + }
> +
> + cfs_cpt_table = cfs_cpt_table_create_pattern(cpu_pattern_dup);
> + kfree(cpu_pattern_dup);
> + if (!cfs_cpt_table) {
> + CERROR("Failed to create cptab from pattern %s\n",
> + cpu_pattern);
> + goto failed;
> + }
> +
> + } else {
> + cfs_cpt_table = cfs_cpt_table_create(cpu_npartitions);
> + if (!cfs_cpt_table) {
> + CERROR("Failed to create ptable with npartitions %d\n",
> + cpu_npartitions);
> + goto failed;
> + }
> + }
> +
> + spin_lock(&cpt_data.cpt_lock);
> + if (cfs_cpt_table->ctb_version != cpt_data.cpt_version) {
> + spin_unlock(&cpt_data.cpt_lock);
> + CERROR("CPU hotplug/unplug during setup\n");
> + goto failed;
> + }
> + spin_unlock(&cpt_data.cpt_lock);
> +
> + LCONSOLE(0, "HW nodes: %d, HW CPU cores: %d, npartitions: %d\n",
> + num_online_nodes(), num_online_cpus(),
> + cfs_cpt_number(cfs_cpt_table));
> + return 0;
> +
> + failed:
> + cfs_cpu_fini();
> + return ret;
> +}
> +
> +#else /* ! CONFIG_SMP */
> +
> void
> cfs_cpu_fini(void)
> {
> diff --git a/drivers/staging/lustre/lnet/libcfs/linux/linux-cpu.c b/drivers/staging/lustre/lnet/libcfs/linux/linux-cpu.c
> deleted file mode 100644
> index 388521e4e354..000000000000
> --- a/drivers/staging/lustre/lnet/libcfs/linux/linux-cpu.c
> +++ /dev/null
> @@ -1,1079 +0,0 @@
> -// SPDX-License-Identifier: GPL-2.0
> -/*
> - * GPL HEADER START
> - *
> - * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
> - *
> - * This program is free software; you can redistribute it and/or modify
> - * it under the terms of the GNU General Public License version 2 only,
> - * as published by the Free Software Foundation.
> - *
> - * This program is distributed in the hope that it will be useful, but
> - * WITHOUT ANY WARRANTY; without even the implied warranty of
> - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> - * General Public License version 2 for more details (a copy is included
> - * in the LICENSE file that accompanied this code).
> - *
> - * GPL HEADER END
> - */
> -/*
> - * Copyright (c) 2010, Oracle and/or its affiliates. All rights reserved.
> - *
> - * Copyright (c) 2012, 2015 Intel Corporation.
> - */
> -/*
> - * This file is part of Lustre, http://www.lustre.org/
> - * Lustre is a trademark of Sun Microsystems, Inc.
> - *
> - * Author: [email protected]
> - */
> -
> -#define DEBUG_SUBSYSTEM S_LNET
> -
> -#include <linux/cpu.h>
> -#include <linux/sched.h>
> -#include <linux/libcfs/libcfs.h>
> -
> -#ifdef CONFIG_SMP
> -
> -/**
> - * modparam for setting number of partitions
> - *
> - * 0 : estimate best value based on cores or NUMA nodes
> - * 1 : disable multiple partitions
> - * >1 : specify number of partitions
> - */
> -static int cpu_npartitions;
> -module_param(cpu_npartitions, int, 0444);
> -MODULE_PARM_DESC(cpu_npartitions, "# of CPU partitions");
> -
> -/**
> - * modparam for setting CPU partitions patterns:
> - *
> - * i.e: "0[0,1,2,3] 1[4,5,6,7]", number before bracket is CPU partition ID,
> - * number in bracket is processor ID (core or HT)
> - *
> - * i.e: "N 0[0,1] 1[2,3]" the first character 'N' means numbers in bracket
> - * are NUMA node ID, number before bracket is CPU partition ID.
> - *
> - * i.e: "N", shortcut expression to create CPT from NUMA & CPU topology
> - *
> - * NB: If user specified cpu_pattern, cpu_npartitions will be ignored
> - */
> -static char *cpu_pattern = "N";
> -module_param(cpu_pattern, charp, 0444);
> -MODULE_PARM_DESC(cpu_pattern, "CPU partitions pattern");
> -
> -struct cfs_cpt_data {
> - /* serialize hotplug etc */
> - spinlock_t cpt_lock;
> - /* reserved for hotplug */
> - unsigned long cpt_version;
> - /* mutex to protect cpt_cpumask */
> - struct mutex cpt_mutex;
> - /* scratch buffer for set/unset_node */
> - cpumask_var_t cpt_cpumask;
> -};
> -
> -static struct cfs_cpt_data cpt_data;
> -
> -static void
> -cfs_node_to_cpumask(int node, cpumask_t *mask)
> -{
> - const cpumask_t *tmp = cpumask_of_node(node);
> -
> - if (tmp)
> - cpumask_copy(mask, tmp);
> - else
> - cpumask_clear(mask);
> -}
> -
> -void
> -cfs_cpt_table_free(struct cfs_cpt_table *cptab)
> -{
> - int i;
> -
> - kvfree(cptab->ctb_cpu2cpt);
> -
> - for (i = 0; cptab->ctb_parts && i < cptab->ctb_nparts; i++) {
> - struct cfs_cpu_partition *part = &cptab->ctb_parts[i];
> -
> - kfree(part->cpt_nodemask);
> - free_cpumask_var(part->cpt_cpumask);
> - }
> -
> - kvfree(cptab->ctb_parts);
> -
> - kfree(cptab->ctb_nodemask);
> - free_cpumask_var(cptab->ctb_cpumask);
> -
> - kfree(cptab);
> -}
> -EXPORT_SYMBOL(cfs_cpt_table_free);
> -
> -struct cfs_cpt_table *
> -cfs_cpt_table_alloc(unsigned int ncpt)
> -{
> - struct cfs_cpt_table *cptab;
> - int i;
> -
> - cptab = kzalloc(sizeof(*cptab), GFP_NOFS);
> - if (!cptab)
> - return NULL;
> -
> - cptab->ctb_nparts = ncpt;
> -
> - cptab->ctb_nodemask = kzalloc(sizeof(*cptab->ctb_nodemask),
> - GFP_NOFS);
> - if (!zalloc_cpumask_var(&cptab->ctb_cpumask, GFP_NOFS) ||
> - !cptab->ctb_nodemask)
> - goto failed;
> -
> - cptab->ctb_cpu2cpt = kvmalloc_array(num_possible_cpus(),
> - sizeof(cptab->ctb_cpu2cpt[0]),
> - GFP_KERNEL);
> - if (!cptab->ctb_cpu2cpt)
> - goto failed;
> -
> - memset(cptab->ctb_cpu2cpt, -1,
> - num_possible_cpus() * sizeof(cptab->ctb_cpu2cpt[0]));
> -
> - cptab->ctb_parts = kvmalloc_array(ncpt, sizeof(cptab->ctb_parts[0]),
> - GFP_KERNEL);
> - if (!cptab->ctb_parts)
> - goto failed;
> -
> - for (i = 0; i < ncpt; i++) {
> - struct cfs_cpu_partition *part = &cptab->ctb_parts[i];
> -
> - part->cpt_nodemask = kzalloc(sizeof(*part->cpt_nodemask),
> - GFP_NOFS);
> - if (!zalloc_cpumask_var(&part->cpt_cpumask, GFP_NOFS) ||
> - !part->cpt_nodemask)
> - goto failed;
> - }
> -
> - spin_lock(&cpt_data.cpt_lock);
> - /* Reserved for hotplug */
> - cptab->ctb_version = cpt_data.cpt_version;
> - spin_unlock(&cpt_data.cpt_lock);
> -
> - return cptab;
> -
> - failed:
> - cfs_cpt_table_free(cptab);
> - return NULL;
> -}
> -EXPORT_SYMBOL(cfs_cpt_table_alloc);
> -
> -int
> -cfs_cpt_table_print(struct cfs_cpt_table *cptab, char *buf, int len)
> -{
> - char *tmp = buf;
> - int rc = 0;
> - int i;
> - int j;
> -
> - for (i = 0; i < cptab->ctb_nparts; i++) {
> - if (len > 0) {
> - rc = snprintf(tmp, len, "%d\t: ", i);
> - len -= rc;
> - }
> -
> - if (len <= 0) {
> - rc = -EFBIG;
> - goto out;
> - }
> -
> - tmp += rc;
> - for_each_cpu(j, cptab->ctb_parts[i].cpt_cpumask) {
> - rc = snprintf(tmp, len, "%d ", j);
> - len -= rc;
> - if (len <= 0) {
> - rc = -EFBIG;
> - goto out;
> - }
> - tmp += rc;
> - }
> -
> - *tmp = '\n';
> - tmp++;
> - len--;
> - }
> -
> - out:
> - if (rc < 0)
> - return rc;
> -
> - return tmp - buf;
> -}
> -EXPORT_SYMBOL(cfs_cpt_table_print);
> -
> -int
> -cfs_cpt_number(struct cfs_cpt_table *cptab)
> -{
> - return cptab->ctb_nparts;
> -}
> -EXPORT_SYMBOL(cfs_cpt_number);
> -
> -int
> -cfs_cpt_weight(struct cfs_cpt_table *cptab, int cpt)
> -{
> - LASSERT(cpt == CFS_CPT_ANY || (cpt >= 0 && cpt < cptab->ctb_nparts));
> -
> - return cpt == CFS_CPT_ANY ?
> - cpumask_weight(cptab->ctb_cpumask) :
> - cpumask_weight(cptab->ctb_parts[cpt].cpt_cpumask);
> -}
> -EXPORT_SYMBOL(cfs_cpt_weight);
> -
> -int
> -cfs_cpt_online(struct cfs_cpt_table *cptab, int cpt)
> -{
> - LASSERT(cpt == CFS_CPT_ANY || (cpt >= 0 && cpt < cptab->ctb_nparts));
> -
> - return cpt == CFS_CPT_ANY ?
> - cpumask_any_and(cptab->ctb_cpumask,
> - cpu_online_mask) < nr_cpu_ids :
> - cpumask_any_and(cptab->ctb_parts[cpt].cpt_cpumask,
> - cpu_online_mask) < nr_cpu_ids;
> -}
> -EXPORT_SYMBOL(cfs_cpt_online);
> -
> -cpumask_var_t *
> -cfs_cpt_cpumask(struct cfs_cpt_table *cptab, int cpt)
> -{
> - LASSERT(cpt == CFS_CPT_ANY || (cpt >= 0 && cpt < cptab->ctb_nparts));
> -
> - return cpt == CFS_CPT_ANY ?
> - &cptab->ctb_cpumask : &cptab->ctb_parts[cpt].cpt_cpumask;
> -}
> -EXPORT_SYMBOL(cfs_cpt_cpumask);
> -
> -nodemask_t *
> -cfs_cpt_nodemask(struct cfs_cpt_table *cptab, int cpt)
> -{
> - LASSERT(cpt == CFS_CPT_ANY || (cpt >= 0 && cpt < cptab->ctb_nparts));
> -
> - return cpt == CFS_CPT_ANY ?
> - cptab->ctb_nodemask : cptab->ctb_parts[cpt].cpt_nodemask;
> -}
> -EXPORT_SYMBOL(cfs_cpt_nodemask);
> -
> -int
> -cfs_cpt_set_cpu(struct cfs_cpt_table *cptab, int cpt, int cpu)
> -{
> - int node;
> -
> - LASSERT(cpt >= 0 && cpt < cptab->ctb_nparts);
> -
> - if (cpu < 0 || cpu >= nr_cpu_ids || !cpu_online(cpu)) {
> - CDEBUG(D_INFO, "CPU %d is invalid or it's offline\n", cpu);
> - return 0;
> - }
> -
> - if (cptab->ctb_cpu2cpt[cpu] != -1) {
> - CDEBUG(D_INFO, "CPU %d is already in partition %d\n",
> - cpu, cptab->ctb_cpu2cpt[cpu]);
> - return 0;
> - }
> -
> - cptab->ctb_cpu2cpt[cpu] = cpt;
> -
> - LASSERT(!cpumask_test_cpu(cpu, cptab->ctb_cpumask));
> - LASSERT(!cpumask_test_cpu(cpu, cptab->ctb_parts[cpt].cpt_cpumask));
> -
> - cpumask_set_cpu(cpu, cptab->ctb_cpumask);
> - cpumask_set_cpu(cpu, cptab->ctb_parts[cpt].cpt_cpumask);
> -
> - node = cpu_to_node(cpu);
> -
> - /* first CPU of @node in this CPT table */
> - if (!node_isset(node, *cptab->ctb_nodemask))
> - node_set(node, *cptab->ctb_nodemask);
> -
> - /* first CPU of @node in this partition */
> - if (!node_isset(node, *cptab->ctb_parts[cpt].cpt_nodemask))
> - node_set(node, *cptab->ctb_parts[cpt].cpt_nodemask);
> -
> - return 1;
> -}
> -EXPORT_SYMBOL(cfs_cpt_set_cpu);
> -
> -void
> -cfs_cpt_unset_cpu(struct cfs_cpt_table *cptab, int cpt, int cpu)
> -{
> - int node;
> - int i;
> -
> - LASSERT(cpt == CFS_CPT_ANY || (cpt >= 0 && cpt < cptab->ctb_nparts));
> -
> - if (cpu < 0 || cpu >= nr_cpu_ids) {
> - CDEBUG(D_INFO, "Invalid CPU id %d\n", cpu);
> - return;
> - }
> -
> - if (cpt == CFS_CPT_ANY) {
> - /* caller doesn't know the partition ID */
> - cpt = cptab->ctb_cpu2cpt[cpu];
> - if (cpt < 0) { /* not set in this CPT-table */
> - CDEBUG(D_INFO, "Try to unset cpu %d which is not in CPT-table %p\n",
> - cpt, cptab);
> - return;
> - }
> -
> - } else if (cpt != cptab->ctb_cpu2cpt[cpu]) {
> - CDEBUG(D_INFO,
> - "CPU %d is not in cpu-partition %d\n", cpu, cpt);
> - return;
> - }
> -
> - LASSERT(cpumask_test_cpu(cpu, cptab->ctb_parts[cpt].cpt_cpumask));
> - LASSERT(cpumask_test_cpu(cpu, cptab->ctb_cpumask));
> -
> - cpumask_clear_cpu(cpu, cptab->ctb_parts[cpt].cpt_cpumask);
> - cpumask_clear_cpu(cpu, cptab->ctb_cpumask);
> - cptab->ctb_cpu2cpt[cpu] = -1;
> -
> - node = cpu_to_node(cpu);
> -
> - LASSERT(node_isset(node, *cptab->ctb_parts[cpt].cpt_nodemask));
> - LASSERT(node_isset(node, *cptab->ctb_nodemask));
> -
> - for_each_cpu(i, cptab->ctb_parts[cpt].cpt_cpumask) {
> - /* this CPT has other CPU belonging to this node? */
> - if (cpu_to_node(i) == node)
> - break;
> - }
> -
> - if (i >= nr_cpu_ids)
> - node_clear(node, *cptab->ctb_parts[cpt].cpt_nodemask);
> -
> - for_each_cpu(i, cptab->ctb_cpumask) {
> - /* this CPT-table has other CPU belonging to this node? */
> - if (cpu_to_node(i) == node)
> - break;
> - }
> -
> - if (i >= nr_cpu_ids)
> - node_clear(node, *cptab->ctb_nodemask);
> -}
> -EXPORT_SYMBOL(cfs_cpt_unset_cpu);
> -
> -int
> -cfs_cpt_set_cpumask(struct cfs_cpt_table *cptab, int cpt, cpumask_t *mask)
> -{
> - int i;
> -
> - if (!cpumask_weight(mask) ||
> - cpumask_any_and(mask, cpu_online_mask) >= nr_cpu_ids) {
> - CDEBUG(D_INFO, "No online CPU is found in the CPU mask for CPU partition %d\n",
> - cpt);
> - return 0;
> - }
> -
> - for_each_cpu(i, mask) {
> - if (!cfs_cpt_set_cpu(cptab, cpt, i))
> - return 0;
> - }
> -
> - return 1;
> -}
> -EXPORT_SYMBOL(cfs_cpt_set_cpumask);
> -
> -void
> -cfs_cpt_unset_cpumask(struct cfs_cpt_table *cptab, int cpt, cpumask_t *mask)
> -{
> - int i;
> -
> - for_each_cpu(i, mask)
> - cfs_cpt_unset_cpu(cptab, cpt, i);
> -}
> -EXPORT_SYMBOL(cfs_cpt_unset_cpumask);
> -
> -int
> -cfs_cpt_set_node(struct cfs_cpt_table *cptab, int cpt, int node)
> -{
> - int rc;
> -
> - if (node < 0 || node >= MAX_NUMNODES) {
> - CDEBUG(D_INFO,
> - "Invalid NUMA id %d for CPU partition %d\n", node, cpt);
> - return 0;
> - }
> -
> - mutex_lock(&cpt_data.cpt_mutex);
> -
> - cfs_node_to_cpumask(node, cpt_data.cpt_cpumask);
> -
> - rc = cfs_cpt_set_cpumask(cptab, cpt, cpt_data.cpt_cpumask);
> -
> - mutex_unlock(&cpt_data.cpt_mutex);
> -
> - return rc;
> -}
> -EXPORT_SYMBOL(cfs_cpt_set_node);
> -
> -void
> -cfs_cpt_unset_node(struct cfs_cpt_table *cptab, int cpt, int node)
> -{
> - if (node < 0 || node >= MAX_NUMNODES) {
> - CDEBUG(D_INFO,
> - "Invalid NUMA id %d for CPU partition %d\n", node, cpt);
> - return;
> - }
> -
> - mutex_lock(&cpt_data.cpt_mutex);
> -
> - cfs_node_to_cpumask(node, cpt_data.cpt_cpumask);
> -
> - cfs_cpt_unset_cpumask(cptab, cpt, cpt_data.cpt_cpumask);
> -
> - mutex_unlock(&cpt_data.cpt_mutex);
> -}
> -EXPORT_SYMBOL(cfs_cpt_unset_node);
> -
> -int
> -cfs_cpt_set_nodemask(struct cfs_cpt_table *cptab, int cpt, nodemask_t *mask)
> -{
> - int i;
> -
> - for_each_node_mask(i, *mask) {
> - if (!cfs_cpt_set_node(cptab, cpt, i))
> - return 0;
> - }
> -
> - return 1;
> -}
> -EXPORT_SYMBOL(cfs_cpt_set_nodemask);
> -
> -void
> -cfs_cpt_unset_nodemask(struct cfs_cpt_table *cptab, int cpt, nodemask_t *mask)
> -{
> - int i;
> -
> - for_each_node_mask(i, *mask)
> - cfs_cpt_unset_node(cptab, cpt, i);
> -}
> -EXPORT_SYMBOL(cfs_cpt_unset_nodemask);
> -
> -void
> -cfs_cpt_clear(struct cfs_cpt_table *cptab, int cpt)
> -{
> - int last;
> - int i;
> -
> - if (cpt == CFS_CPT_ANY) {
> - last = cptab->ctb_nparts - 1;
> - cpt = 0;
> - } else {
> - last = cpt;
> - }
> -
> - for (; cpt <= last; cpt++) {
> - for_each_cpu(i, cptab->ctb_parts[cpt].cpt_cpumask)
> - cfs_cpt_unset_cpu(cptab, cpt, i);
> - }
> -}
> -EXPORT_SYMBOL(cfs_cpt_clear);
> -
> -int
> -cfs_cpt_spread_node(struct cfs_cpt_table *cptab, int cpt)
> -{
> - nodemask_t *mask;
> - int weight;
> - int rotor;
> - int node;
> -
> - /* convert CPU partition ID to HW node id */
> -
> - if (cpt < 0 || cpt >= cptab->ctb_nparts) {
> - mask = cptab->ctb_nodemask;
> - rotor = cptab->ctb_spread_rotor++;
> - } else {
> - mask = cptab->ctb_parts[cpt].cpt_nodemask;
> - rotor = cptab->ctb_parts[cpt].cpt_spread_rotor++;
> - }
> -
> - weight = nodes_weight(*mask);
> - LASSERT(weight > 0);
> -
> - rotor %= weight;
> -
> - for_each_node_mask(node, *mask) {
> - if (!rotor--)
> - return node;
> - }
> -
> - LBUG();
> - return 0;
> -}
> -EXPORT_SYMBOL(cfs_cpt_spread_node);
> -
> -int
> -cfs_cpt_current(struct cfs_cpt_table *cptab, int remap)
> -{
> - int cpu;
> - int cpt;
> -
> - preempt_disable();
> - cpu = smp_processor_id();
> - cpt = cptab->ctb_cpu2cpt[cpu];
> -
> - if (cpt < 0 && remap) {
> - /* don't return negative value for safety of upper layer,
> - * instead we shadow the unknown cpu to a valid partition ID
> - */
> - cpt = cpu % cptab->ctb_nparts;
> - }
> - preempt_enable();
> - return cpt;
> -}
> -EXPORT_SYMBOL(cfs_cpt_current);
> -
> -int
> -cfs_cpt_of_cpu(struct cfs_cpt_table *cptab, int cpu)
> -{
> - LASSERT(cpu >= 0 && cpu < nr_cpu_ids);
> -
> - return cptab->ctb_cpu2cpt[cpu];
> -}
> -EXPORT_SYMBOL(cfs_cpt_of_cpu);
> -
> -int
> -cfs_cpt_bind(struct cfs_cpt_table *cptab, int cpt)
> -{
> - cpumask_var_t *cpumask;
> - nodemask_t *nodemask;
> - int rc;
> - int i;
> -
> - LASSERT(cpt == CFS_CPT_ANY || (cpt >= 0 && cpt < cptab->ctb_nparts));
> -
> - if (cpt == CFS_CPT_ANY) {
> - cpumask = &cptab->ctb_cpumask;
> - nodemask = cptab->ctb_nodemask;
> - } else {
> - cpumask = &cptab->ctb_parts[cpt].cpt_cpumask;
> - nodemask = cptab->ctb_parts[cpt].cpt_nodemask;
> - }
> -
> - if (cpumask_any_and(*cpumask, cpu_online_mask) >= nr_cpu_ids) {
> - CERROR("No online CPU found in CPU partition %d, did someone do CPU hotplug on system? You might need to reload Lustre modules to keep system working well.\n",
> - cpt);
> - return -EINVAL;
> - }
> -
> - for_each_online_cpu(i) {
> - if (cpumask_test_cpu(i, *cpumask))
> - continue;
> -
> - rc = set_cpus_allowed_ptr(current, *cpumask);
> - set_mems_allowed(*nodemask);
> - if (!rc)
> - schedule(); /* switch to allowed CPU */
> -
> - return rc;
> - }
> -
> - /* don't need to set affinity because all online CPUs are covered */
> - return 0;
> -}
> -EXPORT_SYMBOL(cfs_cpt_bind);
> -
> -/**
> - * Choose max to \a number CPUs from \a node and set them in \a cpt.
> - * We always prefer to choose CPU in the same core/socket.
> - */
> -static int
> -cfs_cpt_choose_ncpus(struct cfs_cpt_table *cptab, int cpt,
> - cpumask_t *node, int number)
> -{
> - cpumask_var_t socket;
> - cpumask_var_t core;
> - int rc = 0;
> - int cpu;
> -
> - LASSERT(number > 0);
> -
> - if (number >= cpumask_weight(node)) {
> - while (!cpumask_empty(node)) {
> - cpu = cpumask_first(node);
> -
> - rc = cfs_cpt_set_cpu(cptab, cpt, cpu);
> - if (!rc)
> - return -EINVAL;
> - cpumask_clear_cpu(cpu, node);
> - }
> - return 0;
> - }
> -
> - /*
> - * Allocate scratch buffers
> - * As we cannot initialize a cpumask_var_t, we need
> - * to alloc both before we can risk trying to free either
> - */
> - if (!zalloc_cpumask_var(&socket, GFP_NOFS))
> - rc = -ENOMEM;
> - if (!zalloc_cpumask_var(&core, GFP_NOFS))
> - rc = -ENOMEM;
> - if (rc)
> - goto out;
> -
> - while (!cpumask_empty(node)) {
> - cpu = cpumask_first(node);
> -
> - /* get cpumask for cores in the same socket */
> - cpumask_copy(socket, topology_core_cpumask(cpu));
> - cpumask_and(socket, socket, node);
> -
> - LASSERT(!cpumask_empty(socket));
> -
> - while (!cpumask_empty(socket)) {
> - int i;
> -
> - /* get cpumask for hts in the same core */
> - cpumask_copy(core, topology_sibling_cpumask(cpu));
> - cpumask_and(core, core, node);
> -
> - LASSERT(!cpumask_empty(core));
> -
> - for_each_cpu(i, core) {
> - cpumask_clear_cpu(i, socket);
> - cpumask_clear_cpu(i, node);
> -
> - rc = cfs_cpt_set_cpu(cptab, cpt, i);
> - if (!rc) {
> - rc = -EINVAL;
> - goto out;
> - }
> -
> - if (!--number)
> - goto out;
> - }
> - cpu = cpumask_first(socket);
> - }
> - }
> -
> -out:
> - free_cpumask_var(socket);
> - free_cpumask_var(core);
> - return rc;
> -}
> -
> -#define CPT_WEIGHT_MIN 4u
> -
> -static unsigned int
> -cfs_cpt_num_estimate(void)
> -{
> - unsigned int nnode = num_online_nodes();
> - unsigned int ncpu = num_online_cpus();
> - unsigned int ncpt;
> -
> - if (ncpu <= CPT_WEIGHT_MIN) {
> - ncpt = 1;
> - goto out;
> - }
> -
> - /* generate reasonable number of CPU partitions based on total number
> - * of CPUs, Preferred N should be power2 and match this condition:
> - * 2 * (N - 1)^2 < NCPUS <= 2 * N^2
> - */
> - for (ncpt = 2; ncpu > 2 * ncpt * ncpt; ncpt <<= 1)
> - ;
> -
> - if (ncpt <= nnode) { /* fat numa system */
> - while (nnode > ncpt)
> - nnode >>= 1;
> -
> - } else { /* ncpt > nnode */
> - while ((nnode << 1) <= ncpt)
> - nnode <<= 1;
> - }
> -
> - ncpt = nnode;
> -
> -out:
> -#if (BITS_PER_LONG == 32)
> - /* config many CPU partitions on 32-bit system could consume
> - * too much memory
> - */
> - ncpt = min(2U, ncpt);
> -#endif
> - while (ncpu % ncpt)
> - ncpt--; /* worst case is 1 */
> -
> - return ncpt;
> -}
> -
> -static struct cfs_cpt_table *
> -cfs_cpt_table_create(int ncpt)
> -{
> - struct cfs_cpt_table *cptab = NULL;
> - cpumask_var_t mask;
> - int cpt = 0;
> - int num;
> - int rc;
> - int i;
> -
> - rc = cfs_cpt_num_estimate();
> - if (ncpt <= 0)
> - ncpt = rc;
> -
> - if (ncpt > num_online_cpus() || ncpt > 4 * rc) {
> - CWARN("CPU partition number %d is larger than suggested value (%d), your system may have performance issue or run out of memory while under pressure\n",
> - ncpt, rc);
> - }
> -
> - if (num_online_cpus() % ncpt) {
> - CERROR("CPU number %d is not multiple of cpu_npartition %d, please try different cpu_npartitions value or set pattern string by cpu_pattern=STRING\n",
> - (int)num_online_cpus(), ncpt);
> - goto failed;
> - }
> -
> - cptab = cfs_cpt_table_alloc(ncpt);
> - if (!cptab) {
> - CERROR("Failed to allocate CPU map(%d)\n", ncpt);
> - goto failed;
> - }
> -
> - num = num_online_cpus() / ncpt;
> - if (!num) {
> - CERROR("CPU changed while setting CPU partition\n");
> - goto failed;
> - }
> -
> - if (!zalloc_cpumask_var(&mask, GFP_NOFS)) {
> - CERROR("Failed to allocate scratch cpumask\n");
> - goto failed;
> - }
> -
> - for_each_online_node(i) {
> - cfs_node_to_cpumask(i, mask);
> -
> - while (!cpumask_empty(mask)) {
> - struct cfs_cpu_partition *part;
> - int n;
> -
> - /*
> - * Each emulated NUMA node has all allowed CPUs in
> - * the mask.
> - * End loop when all partitions have assigned CPUs.
> - */
> - if (cpt == ncpt)
> - break;
> -
> - part = &cptab->ctb_parts[cpt];
> -
> - n = num - cpumask_weight(part->cpt_cpumask);
> - LASSERT(n > 0);
> -
> - rc = cfs_cpt_choose_ncpus(cptab, cpt, mask, n);
> - if (rc < 0)
> - goto failed_mask;
> -
> - LASSERT(num >= cpumask_weight(part->cpt_cpumask));
> - if (num == cpumask_weight(part->cpt_cpumask))
> - cpt++;
> - }
> - }
> -
> - if (cpt != ncpt ||
> - num != cpumask_weight(cptab->ctb_parts[ncpt - 1].cpt_cpumask)) {
> - CERROR("Expect %d(%d) CPU partitions but got %d(%d), CPU hotplug/unplug while setting?\n",
> - cptab->ctb_nparts, num, cpt,
> - cpumask_weight(cptab->ctb_parts[ncpt - 1].cpt_cpumask));
> - goto failed_mask;
> - }
> -
> - free_cpumask_var(mask);
> -
> - return cptab;
> -
> - failed_mask:
> - free_cpumask_var(mask);
> - failed:
> - CERROR("Failed to setup CPU-partition-table with %d CPU-partitions, online HW nodes: %d, HW cpus: %d.\n",
> - ncpt, num_online_nodes(), num_online_cpus());
> -
> - if (cptab)
> - cfs_cpt_table_free(cptab);
> -
> - return NULL;
> -}
> -
> -static struct cfs_cpt_table *
> -cfs_cpt_table_create_pattern(char *pattern)
> -{
> - struct cfs_cpt_table *cptab;
> - char *str;
> - int node = 0;
> - int high;
> - int ncpt = 0;
> - int cpt;
> - int rc;
> - int c;
> - int i;
> -
> - str = strim(pattern);
> - if (*str == 'n' || *str == 'N') {
> - pattern = str + 1;
> - if (*pattern != '\0') {
> - node = 1;
> - } else { /* shortcut to create CPT from NUMA & CPU topology */
> - node = -1;
> - ncpt = num_online_nodes();
> - }
> - }
> -
> - if (!ncpt) { /* scanning bracket which is mark of partition */
> - for (str = pattern;; str++, ncpt++) {
> - str = strchr(str, '[');
> - if (!str)
> - break;
> - }
> - }
> -
> - if (!ncpt ||
> - (node && ncpt > num_online_nodes()) ||
> - (!node && ncpt > num_online_cpus())) {
> - CERROR("Invalid pattern %s, or too many partitions %d\n",
> - pattern, ncpt);
> - return NULL;
> - }
> -
> - cptab = cfs_cpt_table_alloc(ncpt);
> - if (!cptab) {
> - CERROR("Failed to allocate cpu partition table\n");
> - return NULL;
> - }
> -
> - if (node < 0) { /* shortcut to create CPT from NUMA & CPU topology */
> - cpt = 0;
> -
> - for_each_online_node(i) {
> - if (cpt >= ncpt) {
> - CERROR("CPU changed while setting CPU partition table, %d/%d\n",
> - cpt, ncpt);
> - goto failed;
> - }
> -
> - rc = cfs_cpt_set_node(cptab, cpt++, i);
> - if (!rc)
> - goto failed;
> - }
> - return cptab;
> - }
> -
> - high = node ? MAX_NUMNODES - 1 : nr_cpu_ids - 1;
> -
> - for (str = strim(pattern), c = 0;; c++) {
> - struct cfs_range_expr *range;
> - struct cfs_expr_list *el;
> - char *bracket = strchr(str, '[');
> - int n;
> -
> - if (!bracket) {
> - if (*str) {
> - CERROR("Invalid pattern %s\n", str);
> - goto failed;
> - }
> - if (c != ncpt) {
> - CERROR("expect %d partitions but found %d\n",
> - ncpt, c);
> - goto failed;
> - }
> - break;
> - }
> -
> - if (sscanf(str, "%d%n", &cpt, &n) < 1) {
> - CERROR("Invalid cpu pattern %s\n", str);
> - goto failed;
> - }
> -
> - if (cpt < 0 || cpt >= ncpt) {
> - CERROR("Invalid partition id %d, total partitions %d\n",
> - cpt, ncpt);
> - goto failed;
> - }
> -
> - if (cfs_cpt_weight(cptab, cpt)) {
> - CERROR("Partition %d has already been set.\n", cpt);
> - goto failed;
> - }
> -
> - str = strim(str + n);
> - if (str != bracket) {
> - CERROR("Invalid pattern %s\n", str);
> - goto failed;
> - }
> -
> - bracket = strchr(str, ']');
> - if (!bracket) {
> - CERROR("missing right bracket for cpt %d, %s\n",
> - cpt, str);
> - goto failed;
> - }
> -
> - if (cfs_expr_list_parse(str, (bracket - str) + 1,
> - 0, high, &el)) {
> - CERROR("Can't parse number range: %s\n", str);
> - goto failed;
> - }
> -
> - list_for_each_entry(range, &el->el_exprs, re_link) {
> - for (i = range->re_lo; i <= range->re_hi; i++) {
> - if ((i - range->re_lo) % range->re_stride)
> - continue;
> -
> - rc = node ? cfs_cpt_set_node(cptab, cpt, i) :
> - cfs_cpt_set_cpu(cptab, cpt, i);
> - if (!rc) {
> - cfs_expr_list_free(el);
> - goto failed;
> - }
> - }
> - }
> -
> - cfs_expr_list_free(el);
> -
> - if (!cfs_cpt_online(cptab, cpt)) {
> - CERROR("No online CPU is found on partition %d\n", cpt);
> - goto failed;
> - }
> -
> - str = strim(bracket + 1);
> - }
> -
> - return cptab;
> -
> - failed:
> - cfs_cpt_table_free(cptab);
> - return NULL;
> -}
> -
> -#ifdef CONFIG_HOTPLUG_CPU
> -static enum cpuhp_state lustre_cpu_online;
> -
> -static void cfs_cpu_incr_cpt_version(void)
> -{
> - spin_lock(&cpt_data.cpt_lock);
> - cpt_data.cpt_version++;
> - spin_unlock(&cpt_data.cpt_lock);
> -}
> -
> -static int cfs_cpu_online(unsigned int cpu)
> -{
> - cfs_cpu_incr_cpt_version();
> - return 0;
> -}
> -
> -static int cfs_cpu_dead(unsigned int cpu)
> -{
> - bool warn;
> -
> - cfs_cpu_incr_cpt_version();
> -
> - mutex_lock(&cpt_data.cpt_mutex);
> - /* if all HTs in a core are offline, it may break affinity */
> - cpumask_copy(cpt_data.cpt_cpumask, topology_sibling_cpumask(cpu));
> - warn = cpumask_any_and(cpt_data.cpt_cpumask,
> - cpu_online_mask) >= nr_cpu_ids;
> - mutex_unlock(&cpt_data.cpt_mutex);
> - CDEBUG(warn ? D_WARNING : D_INFO,
> - "Lustre: can't support CPU plug-out well now, performance and stability could be impacted [CPU %u]\n",
> - cpu);
> - return 0;
> -}
> -#endif
> -
> -void
> -cfs_cpu_fini(void)
> -{
> - if (cfs_cpt_table)
> - cfs_cpt_table_free(cfs_cpt_table);
> -
> -#ifdef CONFIG_HOTPLUG_CPU
> - if (lustre_cpu_online > 0)
> - cpuhp_remove_state_nocalls(lustre_cpu_online);
> - cpuhp_remove_state_nocalls(CPUHP_LUSTRE_CFS_DEAD);
> -#endif
> - free_cpumask_var(cpt_data.cpt_cpumask);
> -}
> -
> -int
> -cfs_cpu_init(void)
> -{
> - int ret = 0;
> -
> - LASSERT(!cfs_cpt_table);
> -
> - memset(&cpt_data, 0, sizeof(cpt_data));
> -
> - if (!zalloc_cpumask_var(&cpt_data.cpt_cpumask, GFP_NOFS)) {
> - CERROR("Failed to allocate scratch buffer\n");
> - return -1;
> - }
> -
> - spin_lock_init(&cpt_data.cpt_lock);
> - mutex_init(&cpt_data.cpt_mutex);
> -
> -#ifdef CONFIG_HOTPLUG_CPU
> - ret = cpuhp_setup_state_nocalls(CPUHP_LUSTRE_CFS_DEAD,
> - "staging/lustre/cfe:dead", NULL,
> - cfs_cpu_dead);
> - if (ret < 0)
> - goto failed;
> - ret = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN,
> - "staging/lustre/cfe:online",
> - cfs_cpu_online, NULL);
> - if (ret < 0)
> - goto failed;
> - lustre_cpu_online = ret;
> -#endif
> - ret = -EINVAL;
> -
> - if (*cpu_pattern) {
> - char *cpu_pattern_dup = kstrdup(cpu_pattern, GFP_KERNEL);
> -
> - if (!cpu_pattern_dup) {
> - CERROR("Failed to duplicate cpu_pattern\n");
> - goto failed;
> - }
> -
> - cfs_cpt_table = cfs_cpt_table_create_pattern(cpu_pattern_dup);
> - kfree(cpu_pattern_dup);
> - if (!cfs_cpt_table) {
> - CERROR("Failed to create cptab from pattern %s\n",
> - cpu_pattern);
> - goto failed;
> - }
> -
> - } else {
> - cfs_cpt_table = cfs_cpt_table_create(cpu_npartitions);
> - if (!cfs_cpt_table) {
> - CERROR("Failed to create ptable with npartitions %d\n",
> - cpu_npartitions);
> - goto failed;
> - }
> - }
> -
> - spin_lock(&cpt_data.cpt_lock);
> - if (cfs_cpt_table->ctb_version != cpt_data.cpt_version) {
> - spin_unlock(&cpt_data.cpt_lock);
> - CERROR("CPU hotplug/unplug during setup\n");
> - goto failed;
> - }
> - spin_unlock(&cpt_data.cpt_lock);
> -
> - LCONSOLE(0, "HW nodes: %d, HW CPU cores: %d, npartitions: %d\n",
> - num_online_nodes(), num_online_cpus(),
> - cfs_cpt_number(cfs_cpt_table));
> - return 0;
> -
> - failed:
> - cfs_cpu_fini();
> - return ret;
> -}
> -
> -#endif
>
>
>

2018-04-16 15:36:30

by Patrick Farrell

[permalink] [raw]
Subject: Re: [lustre-devel] [PATCH 1/6] staging: lustre: move stack-check macros to libcfs_debug.h

James,

If I understand correctly, you're saying you want to be able to build without debug support...? I'm not convinced that building a client without debug support is interesting or useful. In fact, I think it would be harmful, and we shouldn't open up the possibility - this is switchable debug with very low overhead when not actually "on". It would be really awful to get a problem on a running system and discover there's no debug support - that you can't even enable debug without a reinstall.

If I've understood you correctly, then I would want to see proof of a significant performance cost when debug is built but *off* before agreeing to even exposing this option. (I know it's a choice they'd have to make, but if it's not really useful with a side order of potentially harmful, we shouldn't even give people the choice.)

- Patrick

On 4/15/18, 10:49 PM, "lustre-devel on behalf of James Simmons" <[email protected] on behalf of [email protected]> wrote:


> CDEBUG_STACK() and CHECK_STACK() are macros to help with
> debugging, so move them from
> drivers/staging/lustre/include/linux/libcfs/linux/libcfs.h
> to
> drivers/staging/lustre/include/linux/libcfs/libcfs_debug.h
>
> This seems a more fitting location, and is a step towards
> removing linux/libcfs.h and simplifying the include file structure.

Nak. Currently the lustre client always enables debugging but that
shouldn't be the case. What we do need is the ability to turn off the
crazy debugging stuff. In the development branch of lustre it is
done with CDEBUG_ENABLED. We need something like that in Kconfig
much like we have CONFIG_LUSTRE_DEBUG_EXPENSIVE_CHECK. Since we like
to be able to turn that off this should be moved to just after
LIBCFS_DEBUG_MSG_DATA_DECL. Then from CHECK_STACK down to CWARN()
it can be built out. When CDEBUG_ENABLED is disabled CDEBUG_LIMIT
would be empty.

> Signed-off-by: NeilBrown <[email protected]>
> ---
> .../lustre/include/linux/libcfs/libcfs_debug.h | 32 ++++++++++++++++++++
> .../lustre/include/linux/libcfs/linux/libcfs.h | 31 -------------------
> 2 files changed, 32 insertions(+), 31 deletions(-)
>
> diff --git a/drivers/staging/lustre/include/linux/libcfs/libcfs_debug.h b/drivers/staging/lustre/include/linux/libcfs/libcfs_debug.h
> index 9290a19429e7..0dc7b91efe7c 100644
> --- a/drivers/staging/lustre/include/linux/libcfs/libcfs_debug.h
> +++ b/drivers/staging/lustre/include/linux/libcfs/libcfs_debug.h
> @@ -62,6 +62,38 @@ int libcfs_debug_str2mask(int *mask, const char *str, int is_subsys);
> extern unsigned int libcfs_catastrophe;
> extern unsigned int libcfs_panic_on_lbug;
>
> +/* Enable debug-checks on stack size - except on x86_64 */
> +#if !defined(__x86_64__)
> +# ifdef __ia64__
> +# define CDEBUG_STACK() (THREAD_SIZE - \
> + ((unsigned long)__builtin_dwarf_cfa() & \
> + (THREAD_SIZE - 1)))
> +# else
> +# define CDEBUG_STACK() (THREAD_SIZE - \
> + ((unsigned long)__builtin_frame_address(0) & \
> + (THREAD_SIZE - 1)))
> +# endif /* __ia64__ */
> +
> +#define __CHECK_STACK(msgdata, mask, cdls) \
> +do { \
> + if (unlikely(CDEBUG_STACK() > libcfs_stack)) { \
> + LIBCFS_DEBUG_MSG_DATA_INIT(msgdata, D_WARNING, NULL); \
> + libcfs_stack = CDEBUG_STACK(); \
> + libcfs_debug_msg(msgdata, \
> + "maximum lustre stack %lu\n", \
> + CDEBUG_STACK()); \
> + (msgdata)->msg_mask = mask; \
> + (msgdata)->msg_cdls = cdls; \
> + dump_stack(); \
> + /*panic("LBUG");*/ \
> + } \
> +} while (0)
> +#define CFS_CHECK_STACK(msgdata, mask, cdls) __CHECK_STACK(msgdata, mask, cdls)
> +#else /* __x86_64__ */
> +#define CFS_CHECK_STACK(msgdata, mask, cdls) do {} while (0)
> +#define CDEBUG_STACK() (0L)
> +#endif /* __x86_64__ */
> +
> #ifndef DEBUG_SUBSYSTEM
> # define DEBUG_SUBSYSTEM S_UNDEFINED
> #endif
> diff --git a/drivers/staging/lustre/include/linux/libcfs/linux/libcfs.h b/drivers/staging/lustre/include/linux/libcfs/linux/libcfs.h
> index 07d3cb2217d1..83aec9c7698f 100644
> --- a/drivers/staging/lustre/include/linux/libcfs/linux/libcfs.h
> +++ b/drivers/staging/lustre/include/linux/libcfs/linux/libcfs.h
> @@ -80,35 +80,4 @@
> #include <stdarg.h>
> #include "linux-cpu.h"
>
> -#if !defined(__x86_64__)
> -# ifdef __ia64__
> -# define CDEBUG_STACK() (THREAD_SIZE - \
> - ((unsigned long)__builtin_dwarf_cfa() & \
> - (THREAD_SIZE - 1)))
> -# else
> -# define CDEBUG_STACK() (THREAD_SIZE - \
> - ((unsigned long)__builtin_frame_address(0) & \
> - (THREAD_SIZE - 1)))
> -# endif /* __ia64__ */
> -
> -#define __CHECK_STACK(msgdata, mask, cdls) \
> -do { \
> - if (unlikely(CDEBUG_STACK() > libcfs_stack)) { \
> - LIBCFS_DEBUG_MSG_DATA_INIT(msgdata, D_WARNING, NULL); \
> - libcfs_stack = CDEBUG_STACK(); \
> - libcfs_debug_msg(msgdata, \
> - "maximum lustre stack %lu\n", \
> - CDEBUG_STACK()); \
> - (msgdata)->msg_mask = mask; \
> - (msgdata)->msg_cdls = cdls; \
> - dump_stack(); \
> - /*panic("LBUG");*/ \
> - } \
> -} while (0)
> -#define CFS_CHECK_STACK(msgdata, mask, cdls) __CHECK_STACK(msgdata, mask, cdls)
> -#else /* __x86_64__ */
> -#define CFS_CHECK_STACK(msgdata, mask, cdls) do {} while (0)
> -#define CDEBUG_STACK() (0L)
> -#endif /* __x86_64__ */
> -
> #endif /* _LINUX_LIBCFS_H */
>
>
>
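
For readers following the macro being moved, the arithmetic in CDEBUG_STACK() can be sketched in plain userspace C. This is only an illustration under stated assumptions: the stack is aligned to its own size and grows downwards, and `EXAMPLE_THREAD_SIZE` and `example_stack_used()` are names made up here (the kernel's THREAD_SIZE is per-architecture).

```c
#include <stdint.h>

/* Assumed for the example only; not the value used on any particular arch. */
#define EXAMPLE_THREAD_SIZE 8192UL

/*
 * Bytes between an address inside the stack and the top of that stack,
 * assuming the stack is EXAMPLE_THREAD_SIZE-aligned and grows downwards.
 * This mirrors THREAD_SIZE - (addr & (THREAD_SIZE - 1)) from the patch.
 */
static unsigned long example_stack_used(uintptr_t frame_addr)
{
	return EXAMPLE_THREAD_SIZE -
	       ((unsigned long)frame_addr & (EXAMPLE_THREAD_SIZE - 1));
}
```

A frame address close to the top of the aligned region gives a small result, and one close to the bottom gives a result near the full stack size, which is what __CHECK_STACK() compares against libcfs_stack.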
_______________________________________________
lustre-devel mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org


2018-04-16 22:44:15

by James Simmons

[permalink] [raw]
Subject: Re: [lustre-devel] [PATCH 1/6] staging: lustre: move stack-check macros to libcfs_debug.h


> James,
>
> If I understand correctly, you're saying you want to be able to build without debug support...? I'm not convinced that building a client without debug support is interesting or useful. In fact, I think it would be harmful, and we shouldn't open up the possibility - this is switchable debug with very low overhead when not actually "on". It would be really awful to get a problem on a running system and discover there's no debug support - that you can't even enable debug without a reinstall.
>
> If I've understood you correctly, then I would want to see proof of a significant performance cost when debug is built but *off* before agreeing to even exposing this option. (I know it's a choice they'd have to make, but if it's not really useful with a side order of potentially harmful, we shouldn't even give people the choice.)

I'm not saying add the option today but this is more for the long game.
While the Intel lustre developers deeply love lustre's debugging
infrastructure I see a future where something better will come along to
replace it. When that day comes we will have a period where both
debugging infrastructures will exist and some deployers of lustre will
want to turn off the old debugging infrastructure and just use the new.
That is what I have in mind. A switch to flip between options.
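
As a rough illustration of the kind of build-time switch being discussed (not code from the lustre tree), the sketch below uses a `CDEBUG_ENABLED` preprocessor symbol to decide whether a debug macro expands to a runtime mask check or to nothing. The names `my_cdebug`, `debug_mask`, `debug_emitted` and the `D_*` values are made up for the example, and a real Kconfig option would define the symbol rather than a `#define` in the file.

```c
#include <stdio.h>

/* Hypothetical build-time switch; remove it to compile debugging away. */
#define CDEBUG_ENABLED 1

#define D_INFO    0x1u
#define D_WARNING 0x2u

static unsigned int debug_mask = D_WARNING; /* runtime-selectable mask */
static int debug_emitted;                   /* messages actually printed */

#ifdef CDEBUG_ENABLED
/* Built in: cheap mask test, message formatted only when the bit is set. */
# define my_cdebug(mask, ...)                 \
do {                                          \
	if ((mask) & debug_mask) {            \
		debug_emitted++;              \
		fprintf(stderr, __VA_ARGS__); \
	}                                     \
} while (0)
#else
/* Built out: expands to an empty statement, arguments are not evaluated. */
# define my_cdebug(mask, ...) do { } while (0)
#endif
```

With the switch on, the cost of a disabled message is one mask test; with it off, the call sites vanish entirely and debugging cannot be enabled at runtime, which is the trade-off argued over in this thread.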

> - Patrick
>
> On 4/15/18, 10:49 PM, "lustre-devel on behalf of James Simmons" <[email protected] on behalf of [email protected]> wrote:
>
>
> > CDEBUG_STACK() and CHECK_STACK() are macros to help with
> > debugging, so move them from
> > drivers/staging/lustre/include/linux/libcfs/linux/libcfs.h
> > to
> > drivers/staging/lustre/include/linux/libcfs/libcfs_debug.h
> >
> > This seems a more fitting location, and is a step towards
> > removing linux/libcfs.h and simplifying the include file structure.
>
> Nak. Currently the lustre client always enables debugging but that
> shouldn't be the case. What we do need is the ability to turn off the
> crazy debugging stuff. In the development branch of lustre it is
> done with CDEBUG_ENABLED. We need something like that in Kconfig
> much like we have CONFIG_LUSTRE_DEBUG_EXPENSIVE_CHECK. Since we like
> to be able to turn that off this should be moved to just after
> LIBCFS_DEBUG_MSG_DATA_DECL. Then from CHECK_STACK down to CWARN()
> it can be built out. When CDEBUG_ENABLED is disabled CDEBUG_LIMIT
> would be empty.
>
> > Signed-off-by: NeilBrown <[email protected]>
> > ---
> > .../lustre/include/linux/libcfs/libcfs_debug.h | 32 ++++++++++++++++++++
> > .../lustre/include/linux/libcfs/linux/libcfs.h | 31 -------------------
> > 2 files changed, 32 insertions(+), 31 deletions(-)
> >
> > diff --git a/drivers/staging/lustre/include/linux/libcfs/libcfs_debug.h b/drivers/staging/lustre/include/linux/libcfs/libcfs_debug.h
> > index 9290a19429e7..0dc7b91efe7c 100644
> > --- a/drivers/staging/lustre/include/linux/libcfs/libcfs_debug.h
> > +++ b/drivers/staging/lustre/include/linux/libcfs/libcfs_debug.h
> > @@ -62,6 +62,38 @@ int libcfs_debug_str2mask(int *mask, const char *str, int is_subsys);
> > extern unsigned int libcfs_catastrophe;
> > extern unsigned int libcfs_panic_on_lbug;
> >
> > +/* Enable debug-checks on stack size - except on x86_64 */
> > +#if !defined(__x86_64__)
> > +# ifdef __ia64__
> > +# define CDEBUG_STACK() (THREAD_SIZE - \
> > + ((unsigned long)__builtin_dwarf_cfa() & \
> > + (THREAD_SIZE - 1)))
> > +# else
> > +# define CDEBUG_STACK() (THREAD_SIZE - \
> > + ((unsigned long)__builtin_frame_address(0) & \
> > + (THREAD_SIZE - 1)))
> > +# endif /* __ia64__ */
> > +
> > +#define __CHECK_STACK(msgdata, mask, cdls) \
> > +do { \
> > + if (unlikely(CDEBUG_STACK() > libcfs_stack)) { \
> > + LIBCFS_DEBUG_MSG_DATA_INIT(msgdata, D_WARNING, NULL); \
> > + libcfs_stack = CDEBUG_STACK(); \
> > + libcfs_debug_msg(msgdata, \
> > + "maximum lustre stack %lu\n", \
> > + CDEBUG_STACK()); \
> > + (msgdata)->msg_mask = mask; \
> > + (msgdata)->msg_cdls = cdls; \
> > + dump_stack(); \
> > + /*panic("LBUG");*/ \
> > + } \
> > +} while (0)
> > +#define CFS_CHECK_STACK(msgdata, mask, cdls) __CHECK_STACK(msgdata, mask, cdls)
> > +#else /* __x86_64__ */
> > +#define CFS_CHECK_STACK(msgdata, mask, cdls) do {} while (0)
> > +#define CDEBUG_STACK() (0L)
> > +#endif /* __x86_64__ */
> > +
> > #ifndef DEBUG_SUBSYSTEM
> > # define DEBUG_SUBSYSTEM S_UNDEFINED
> > #endif
> > diff --git a/drivers/staging/lustre/include/linux/libcfs/linux/libcfs.h b/drivers/staging/lustre/include/linux/libcfs/linux/libcfs.h
> > index 07d3cb2217d1..83aec9c7698f 100644
> > --- a/drivers/staging/lustre/include/linux/libcfs/linux/libcfs.h
> > +++ b/drivers/staging/lustre/include/linux/libcfs/linux/libcfs.h
> > @@ -80,35 +80,4 @@
> > #include <stdarg.h>
> > #include "linux-cpu.h"
> >
> > -#if !defined(__x86_64__)
> > -# ifdef __ia64__
> > -# define CDEBUG_STACK() (THREAD_SIZE - \
> > - ((unsigned long)__builtin_dwarf_cfa() & \
> > - (THREAD_SIZE - 1)))
> > -# else
> > -# define CDEBUG_STACK() (THREAD_SIZE - \
> > - ((unsigned long)__builtin_frame_address(0) & \
> > - (THREAD_SIZE - 1)))
> > -# endif /* __ia64__ */
> > -
> > -#define __CHECK_STACK(msgdata, mask, cdls) \
> > -do { \
> > - if (unlikely(CDEBUG_STACK() > libcfs_stack)) { \
> > - LIBCFS_DEBUG_MSG_DATA_INIT(msgdata, D_WARNING, NULL); \
> > - libcfs_stack = CDEBUG_STACK(); \
> > - libcfs_debug_msg(msgdata, \
> > - "maximum lustre stack %lu\n", \
> > - CDEBUG_STACK()); \
> > - (msgdata)->msg_mask = mask; \
> > - (msgdata)->msg_cdls = cdls; \
> > - dump_stack(); \
> > - /*panic("LBUG");*/ \
> > - } \
> > -} while (0)
> > -#define CFS_CHECK_STACK(msgdata, mask, cdls) __CHECK_STACK(msgdata, mask, cdls)
> > -#else /* __x86_64__ */
> > -#define CFS_CHECK_STACK(msgdata, mask, cdls) do {} while (0)
> > -#define CDEBUG_STACK() (0L)
> > -#endif /* __x86_64__ */
> > -
> > #endif /* _LINUX_LIBCFS_H */
> >
> >
> >
>
>
>

2018-04-16 22:59:41

by Doug Oucharek

[permalink] [raw]
Subject: Re: [lustre-devel] [PATCH 1/6] staging: lustre: move stack-check macros to libcfs_debug.h


> On Apr 16, 2018, at 3:42 PM, James Simmons <[email protected]> wrote:
>
>
>> James,
>>
>> If I understand correctly, you're saying you want to be able to build without debug support...? I'm not convinced that building a client without debug support is interesting or useful. In fact, I think it would be harmful, and we shouldn't open up the possibility - this is switchable debug with very low overhead when not actually "on". It would be really awful to get a problem on a running system and discover there's no debug support - that you can't even enable debug without a reinstall.
>>
>> If I've understood you correctly, then I would want to see proof of a significant performance cost when debug is built but *off* before agreeing to even exposing this option. (I know it's a choice they'd have to make, but if it's not really useful with a side order of potentially harmful, we shouldn't even give people the choice.)
>
> I'm not saying add the option today but this is more for the long game.
> While the Intel lustre developers deeply love lustre's debugging
> infrastructure I see a future where something better will come along to
> replace it. When that day comes we will have a period where both
> debugging infrastructures will exist and some deployers of lustre will
> want to turn off the old debugging infrastructure and just use the new.
> That is what I have in mind. A switch to flip between options.

Yes please!! An option for users which says “no, you do not have the right to panic my system via LASSERT whenever you like” would be a blessing.

Doug

>
>> - Patrick
>>
>> On 4/15/18, 10:49 PM, "lustre-devel on behalf of James Simmons" <[email protected] on behalf of [email protected]> wrote:
>>
>>
>>> CDEBUG_STACK() and CHECK_STACK() are macros to help with
>>> debugging, so move them from
>>> drivers/staging/lustre/include/linux/libcfs/linux/libcfs.h
>>> to
>>> drivers/staging/lustre/include/linux/libcfs/libcfs_debug.h
>>>
>>> This seems a more fitting location, and is a step towards
>>> removing linux/libcfs.h and simplifying the include file structure.
>>
>> Nak. Currently the lustre client always enables debugging but that
>> shouldn't be the case. What we do need is the ability to turn off the
>> crazy debugging stuff. In the development branch of lustre it is
>> done with CDEBUG_ENABLED. We need something like that in Kconfig
>> much like we have CONFIG_LUSTRE_DEBUG_EXPENSIVE_CHECK. Since we like
>> to be able to turn that off this should be moved to just after
>> LIBCFS_DEBUG_MSG_DATA_DECL. Then from CHECK_STACK down to CWARN()
>> it can be built out. When CDEBUG_ENABLED is disabled CDEBUG_LIMIT
>> would be empty.
>>
>>> Signed-off-by: NeilBrown <[email protected]>
>>> ---
>>> .../lustre/include/linux/libcfs/libcfs_debug.h | 32 ++++++++++++++++++++
>>> .../lustre/include/linux/libcfs/linux/libcfs.h | 31 -------------------
>>> 2 files changed, 32 insertions(+), 31 deletions(-)
>>>
>>> diff --git a/drivers/staging/lustre/include/linux/libcfs/libcfs_debug.h b/drivers/staging/lustre/include/linux/libcfs/libcfs_debug.h
>>> index 9290a19429e7..0dc7b91efe7c 100644
>>> --- a/drivers/staging/lustre/include/linux/libcfs/libcfs_debug.h
>>> +++ b/drivers/staging/lustre/include/linux/libcfs/libcfs_debug.h
>>> @@ -62,6 +62,38 @@ int libcfs_debug_str2mask(int *mask, const char *str, int is_subsys);
>>> extern unsigned int libcfs_catastrophe;
>>> extern unsigned int libcfs_panic_on_lbug;
>>>
>>> +/* Enable debug-checks on stack size - except on x86_64 */
>>> +#if !defined(__x86_64__)
>>> +# ifdef __ia64__
>>> +# define CDEBUG_STACK() (THREAD_SIZE - \
>>> + ((unsigned long)__builtin_dwarf_cfa() & \
>>> + (THREAD_SIZE - 1)))
>>> +# else
>>> +# define CDEBUG_STACK() (THREAD_SIZE - \
>>> + ((unsigned long)__builtin_frame_address(0) & \
>>> + (THREAD_SIZE - 1)))
>>> +# endif /* __ia64__ */
>>> +
>>> +#define __CHECK_STACK(msgdata, mask, cdls) \
>>> +do { \
>>> + if (unlikely(CDEBUG_STACK() > libcfs_stack)) { \
>>> + LIBCFS_DEBUG_MSG_DATA_INIT(msgdata, D_WARNING, NULL); \
>>> + libcfs_stack = CDEBUG_STACK(); \
>>> + libcfs_debug_msg(msgdata, \
>>> + "maximum lustre stack %lu\n", \
>>> + CDEBUG_STACK()); \
>>> + (msgdata)->msg_mask = mask; \
>>> + (msgdata)->msg_cdls = cdls; \
>>> + dump_stack(); \
>>> + /*panic("LBUG");*/ \
>>> + } \
>>> +} while (0)
>>> +#define CFS_CHECK_STACK(msgdata, mask, cdls) __CHECK_STACK(msgdata, mask, cdls)
>>> +#else /* __x86_64__ */
>>> +#define CFS_CHECK_STACK(msgdata, mask, cdls) do {} while (0)
>>> +#define CDEBUG_STACK() (0L)
>>> +#endif /* __x86_64__ */
>>> +
>>> #ifndef DEBUG_SUBSYSTEM
>>> # define DEBUG_SUBSYSTEM S_UNDEFINED
>>> #endif
>>> diff --git a/drivers/staging/lustre/include/linux/libcfs/linux/libcfs.h b/drivers/staging/lustre/include/linux/libcfs/linux/libcfs.h
>>> index 07d3cb2217d1..83aec9c7698f 100644
>>> --- a/drivers/staging/lustre/include/linux/libcfs/linux/libcfs.h
>>> +++ b/drivers/staging/lustre/include/linux/libcfs/linux/libcfs.h
>>> @@ -80,35 +80,4 @@
>>> #include <stdarg.h>
>>> #include "linux-cpu.h"
>>>
>>> -#if !defined(__x86_64__)
>>> -# ifdef __ia64__
>>> -# define CDEBUG_STACK() (THREAD_SIZE - \
>>> - ((unsigned long)__builtin_dwarf_cfa() & \
>>> - (THREAD_SIZE - 1)))
>>> -# else
>>> -# define CDEBUG_STACK() (THREAD_SIZE - \
>>> - ((unsigned long)__builtin_frame_address(0) & \
>>> - (THREAD_SIZE - 1)))
>>> -# endif /* __ia64__ */
>>> -
>>> -#define __CHECK_STACK(msgdata, mask, cdls) \
>>> -do { \
>>> - if (unlikely(CDEBUG_STACK() > libcfs_stack)) { \
>>> - LIBCFS_DEBUG_MSG_DATA_INIT(msgdata, D_WARNING, NULL); \
>>> - libcfs_stack = CDEBUG_STACK(); \
>>> - libcfs_debug_msg(msgdata, \
>>> - "maximum lustre stack %lu\n", \
>>> - CDEBUG_STACK()); \
>>> - (msgdata)->msg_mask = mask; \
>>> - (msgdata)->msg_cdls = cdls; \
>>> - dump_stack(); \
>>> - /*panic("LBUG");*/ \
>>> - } \
>>> -} while (0)
>>> -#define CFS_CHECK_STACK(msgdata, mask, cdls) __CHECK_STACK(msgdata, mask, cdls)
>>> -#else /* __x86_64__ */
>>> -#define CFS_CHECK_STACK(msgdata, mask, cdls) do {} while (0)
>>> -#define CDEBUG_STACK() (0L)
>>> -#endif /* __x86_64__ */
>>> -
>>> #endif /* _LINUX_LIBCFS_H */
>>>
>>>
>>>

2018-04-17 05:28:13

by Dilger, Andreas

[permalink] [raw]
Subject: Re: [lustre-devel] [PATCH 1/6] staging: lustre: move stack-check macros to libcfs_debug.h

On Apr 16, 2018, at 16:48, Doug Oucharek <[email protected]> wrote:
>
>>
>> On Apr 16, 2018, at 3:42 PM, James Simmons <[email protected]> wrote:
>>
>>
>>> James,
>>>
>>> If I understand correctly, you're saying you want to be able to build without debug support...? I'm not convinced that building a client without debug support is interesting or useful. In fact, I think it would be harmful, and we shouldn't open up the possibility - this is switchable debug with very low overhead when not actually "on". It would be really awful to get a problem on a running system and discover there's no debug support - that you can't even enable debug without a reinstall.
>>>
>>> If I've understood you correctly, then I would want to see proof of a significant performance cost when debug is built but *off* before agreeing to even exposing this option. (I know it's a choice they'd have to make, but if it's not really useful with a side order of potentially harmful, we shouldn't even give people the choice.)
>>
>> I'm not saying add the option today but this is more for the long game.
>> While the Intel lustre developers deeply love lustre's debugging
>> infrastructure I see a future where something better will come along to
>> replace it. When that day comes we will have a period where both
>> debugging infrastructures will exist and some deployers of lustre will
>> want to turn off the old debugging infrastructure and just use the new.
>> That is what I have in mind. A switch to flip between options.
>
> Yes please!! An option for users which says “no, you do not have the right to panic my system via LASSERT whenever you like” would be a blessing.

Note that LASSERT() itself does not panic the system, unless you configure it
with panic_on_lbug=1. Otherwise, it just blocks that thread (though this can
also have an impact on other threads if you are holding locks at that time).

That said, the LASSERT() should not be hit unless there is bad code, data
corruption, or the LASSERT() itself is incorrect (essentially bad code also).

So "whenever you like" is "whenever the system is about to corrupt your data",
and people are not very forgiving if a filesystem corrupts their data...
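
The policy described above can be modelled in a few lines of userspace C. This is a sketch of the decision only, not the libcfs implementation: `model_lassert()` and `enum lbug_outcome` are invented for the example, and the `panic_on_lbug` variable here merely stands in for the setting of the same name.

```c
#include <stdbool.h>

enum lbug_outcome {
	LBUG_NONE,          /* assertion held, nothing happened */
	LBUG_PANIC,         /* the whole system would be brought down */
	LBUG_THREAD_PARKED  /* only the failing thread stops making progress */
};

static bool panic_on_lbug; /* models the panic_on_lbug=1 setting */

/* Returns what a failed assertion would lead to under the current setting. */
static enum lbug_outcome model_lassert(bool cond)
{
	if (cond)
		return LBUG_NONE;
	return panic_on_lbug ? LBUG_PANIC : LBUG_THREAD_PARKED;
}
```

The model leaves out the side effect mentioned above: a parked thread that still holds locks can stall other threads as well.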

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Intel Corporation







2018-04-18 02:19:25

by NeilBrown

[permalink] [raw]
Subject: Re: [PATCH 1/6] staging: lustre: move stack-check macros to libcfs_debug.h

On Mon, Apr 16 2018, James Simmons wrote:

>> CDEBUG_STACK() and CHECK_STACK() are macros to help with
>> debugging, so move them from
>> drivers/staging/lustre/include/linux/libcfs/linux/libcfs.h
>> to
>> drivers/staging/lustre/include/linux/libcfs/libcfs_debug.h
>>
>> This seems a more fitting location, and is a step towards
>> removing linux/libcfs.h and simplifying the include file structure.
>
> Nak. Currently the lustre client always enables debugging but that
> shouldn't be the case. What we do need is the ability to turn off the
> crazy debugging stuff. In the development branch of lustre it is
> done with CDEBUG_ENABLED. We need something like that in Kconfig
> much like we have CONFIG_LUSTRE_DEBUG_EXPENSIVE_CHECK. Since we like
> to be able to turn that off this should be moved to just after
> LIBCFS_DEBUG_MSG_DATA_DECL. Then from CHECK_STACK down to CWARN()
> it can be built out. When CDEBUG_ENABLED is disabled CDEBUG_LIMIT
> would be empty.

So why, exactly, is this an argument to justify a NAK?
Are you just saying that the code I moved into libcfs_debug.h should be
moved to somewhere a bit later in the file?
That can easily be done when it is needed. It isn't needed now so why
insist on it?

Each patch should do one thing and make clear forward progress. This
patch gets rid of an unnecessary file and brings related code together.
I think that qualifies.

Thanks,
NeilBrown


>
>> Signed-off-by: NeilBrown <[email protected]>
>> ---
>> .../lustre/include/linux/libcfs/libcfs_debug.h | 32 ++++++++++++++++++++
>> .../lustre/include/linux/libcfs/linux/libcfs.h | 31 -------------------
>> 2 files changed, 32 insertions(+), 31 deletions(-)
>>
>> diff --git a/drivers/staging/lustre/include/linux/libcfs/libcfs_debug.h b/drivers/staging/lustre/include/linux/libcfs/libcfs_debug.h
>> index 9290a19429e7..0dc7b91efe7c 100644
>> --- a/drivers/staging/lustre/include/linux/libcfs/libcfs_debug.h
>> +++ b/drivers/staging/lustre/include/linux/libcfs/libcfs_debug.h
>> @@ -62,6 +62,38 @@ int libcfs_debug_str2mask(int *mask, const char *str, int is_subsys);
>> extern unsigned int libcfs_catastrophe;
>> extern unsigned int libcfs_panic_on_lbug;
>>
>> +/* Enable debug-checks on stack size - except on x86_64 */
>> +#if !defined(__x86_64__)
>> +# ifdef __ia64__
>> +# define CDEBUG_STACK() (THREAD_SIZE - \
>> + ((unsigned long)__builtin_dwarf_cfa() & \
>> + (THREAD_SIZE - 1)))
>> +# else
>> +# define CDEBUG_STACK() (THREAD_SIZE - \
>> + ((unsigned long)__builtin_frame_address(0) & \
>> + (THREAD_SIZE - 1)))
>> +# endif /* __ia64__ */
>> +
>> +#define __CHECK_STACK(msgdata, mask, cdls) \
>> +do { \
>> + if (unlikely(CDEBUG_STACK() > libcfs_stack)) { \
>> + LIBCFS_DEBUG_MSG_DATA_INIT(msgdata, D_WARNING, NULL); \
>> + libcfs_stack = CDEBUG_STACK(); \
>> + libcfs_debug_msg(msgdata, \
>> + "maximum lustre stack %lu\n", \
>> + CDEBUG_STACK()); \
>> + (msgdata)->msg_mask = mask; \
>> + (msgdata)->msg_cdls = cdls; \
>> + dump_stack(); \
>> + /*panic("LBUG");*/ \
>> + } \
>> +} while (0)
>> +#define CFS_CHECK_STACK(msgdata, mask, cdls) __CHECK_STACK(msgdata, mask, cdls)
>> +#else /* __x86_64__ */
>> +#define CFS_CHECK_STACK(msgdata, mask, cdls) do {} while (0)
>> +#define CDEBUG_STACK() (0L)
>> +#endif /* __x86_64__ */
>> +
>> #ifndef DEBUG_SUBSYSTEM
>> # define DEBUG_SUBSYSTEM S_UNDEFINED
>> #endif
>> diff --git a/drivers/staging/lustre/include/linux/libcfs/linux/libcfs.h b/drivers/staging/lustre/include/linux/libcfs/linux/libcfs.h
>> index 07d3cb2217d1..83aec9c7698f 100644
>> --- a/drivers/staging/lustre/include/linux/libcfs/linux/libcfs.h
>> +++ b/drivers/staging/lustre/include/linux/libcfs/linux/libcfs.h
>> @@ -80,35 +80,4 @@
>> #include <stdarg.h>
>> #include "linux-cpu.h"
>>
>> -#if !defined(__x86_64__)
>> -# ifdef __ia64__
>> -# define CDEBUG_STACK() (THREAD_SIZE - \
>> - ((unsigned long)__builtin_dwarf_cfa() & \
>> - (THREAD_SIZE - 1)))
>> -# else
>> -# define CDEBUG_STACK() (THREAD_SIZE - \
>> - ((unsigned long)__builtin_frame_address(0) & \
>> - (THREAD_SIZE - 1)))
>> -# endif /* __ia64__ */
>> -
>> -#define __CHECK_STACK(msgdata, mask, cdls) \
>> -do { \
>> - if (unlikely(CDEBUG_STACK() > libcfs_stack)) { \
>> - LIBCFS_DEBUG_MSG_DATA_INIT(msgdata, D_WARNING, NULL); \
>> - libcfs_stack = CDEBUG_STACK(); \
>> - libcfs_debug_msg(msgdata, \
>> - "maximum lustre stack %lu\n", \
>> - CDEBUG_STACK()); \
>> - (msgdata)->msg_mask = mask; \
>> - (msgdata)->msg_cdls = cdls; \
>> - dump_stack(); \
>> - /*panic("LBUG");*/ \
>> - } \
>> -} while (0)
>> -#define CFS_CHECK_STACK(msgdata, mask, cdls) __CHECK_STACK(msgdata, mask, cdls)
>> -#else /* __x86_64__ */
>> -#define CFS_CHECK_STACK(msgdata, mask, cdls) do {} while (0)
>> -#define CDEBUG_STACK() (0L)
>> -#endif /* __x86_64__ */
>> -
>> #endif /* _LINUX_LIBCFS_H */
>>
>>
>>


Attachments:
signature.asc (847.00 B)

2018-04-18 02:30:38

by NeilBrown

[permalink] [raw]
Subject: Re: [lustre-devel] [PATCH 1/6] staging: lustre: move stack-check macros to libcfs_debug.h

On Mon, Apr 16 2018, James Simmons wrote:

>> James,
>>
>> If I understand correctly, you're saying you want to be able to build without debug support...? I'm not convinced that building a client without debug support is interesting or useful. In fact, I think it would be harmful, and we shouldn't open up the possibility - this is switchable debug with very low overhead when not actually "on". It would be really awful to get a problem on a running system and discover there's no debug support - that you can't even enable debug without a reinstall.
>>
>> If I've understood you correctly, then I would want to see proof of a significant performance cost when debug is built but *off* before agreeing to even exposing this option. (I know it's a choice they'd have to make, but if it's not really useful with a side order of potentially harmful, we shouldn't even give people the choice.)
>
> I'm not saying add the option today but this is more for the long game.
> While the Intel lustre developers deeply love lustre's debugging
> infrastructure I see a future where something better will come along to
> replace it. When that day comes we will have a period where both
> debugging infrastructures will exist and some deployers of lustre will
> want to turn off the old debugging infrastructure and just use the new.
> That is what I have in mind. A switch to flip between options.

My position on this is that lustre's debugging infrastructure (in
mainline) *will* be changed to use something that the rest of the kernel
can and does use. Quite possibly that "something" will first be
enhanced so that it is as powerful and useful as what lustre has.
I suspect this will partly be pr_debug(), partly WARN_ON(), partly trace
points. But I'm not very familiar with tracepoints or with lustre
debugging yet so this is far from certain.
pr_debug() and tracepoints can be compiled out, but only kernel-wide.
There is no reason for lustre to be special there. WARN_ON() and
BUG_ON() cannot be compiled out, but BUG_ON() must only be used when
proceeding is unarguably worse than crashing the machine. In recent
years a lot of BUG_ON()s have been removed or changed to warnings. We
need to maintain that attitude.
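
To make the warn-and-continue attitude concrete, here is a small userspace sketch. It is not kernel code: `model_warn_on()` is a stand-in written for this example that, like a WARN_ON()-style check, reports the problem and hands the condition back to the caller, and `set_stride()` is a hypothetical caller.

```c
#include <errno.h>
#include <stdio.h>

static int warnings_seen;

/* Userspace stand-in: report the problem, then return the condition. */
static int model_warn_on(int cond, const char *what)
{
	if (cond) {
		warnings_seen++;
		fprintf(stderr, "WARNING: %s\n", what);
	}
	return cond != 0;
}

/* Hypothetical caller: fail the one request instead of stopping the machine. */
static int set_stride(int *stride, int value)
{
	if (model_warn_on(value <= 0, "stride must be positive"))
		return -EINVAL;
	*stride = value;
	return 0;
}
```

The bad input is logged and rejected, existing state is left untouched, and everything else keeps running, which is the behaviour a BUG_ON()-style check would not give.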

I don't like the idea of having two parallel debugging infrastructures that
you can choose between - it encourages confusion and brings no benefits.

Thanks,
NeilBrown



2018-04-18 02:33:36

by NeilBrown

[permalink] [raw]
Subject: Re: [PATCH 2/6] staging: lustre: remove libcfs/linux/libcfs.h

On Mon, Apr 16 2018, James Simmons wrote:

>> This include file is only included in one place,
>> and only contains a list of other include directives.
>> So just move all those to the place where this file
>> is included, and discard the file.
>>
>> One include directive uses a local name ("linux-cpu.h"), so
>> that needs to be given a proper path.
>>
>> Probably many of these should be removed from here, and moved to
>> just the files that need them.
>
> Nak. Dumping all the extra headers from linux/libcfs.h to libcfs.h is
> the wrong approach. Having the one header, libcfs.h, be the only header
> in all lustre files is the wrong approach. I have been looking to
> unroll that mess. I have a patch that I need to polish up that I can
> submit.

I think we both have the same goal - maybe just different paths to get
there. If you have something nearly ready to submit, I'm happy to wait
for it, then proceed on top of it.

Thanks,
NeilBrown



2018-04-18 02:36:03

by NeilBrown

[permalink] [raw]
Subject: Re: [PATCH 3/6] staging: lustre: remove include/linux/libcfs/linux/linux-cpu.h

On Mon, Apr 16 2018, James Simmons wrote:

>> This include file contains definitions used when CONFIG_SMP
>> is in effect. Other includes contain corresponding definitions
>> for when it isn't.
>> This can be hard to follow, so move the definitions to the one place.
>>
>> As HAVE_LIBCFS_CPT is defined precisely when CONFIG_SMP, we discard
>> that macro and just use CONFIG_SMP when needed.
>
> Nak. The lustre SMP code is broken and badly needs to be reworked. I have it
> ready and can push it. I was waiting to see if I had to rebase it once
> the rc1 stuff landed, but since there is a push to get everything out there I will
> push it.
>

Great - thanks for posting those. I might wait until they land in
Greg's tree, then see if there is anything else I want to add.

Thanks,
NeilBrown



2018-04-23 13:05:30

by Greg KH

Subject: Re: [PATCH 1/6] staging: lustre: move stack-check macros to libcfs_debug.h

On Wed, Apr 18, 2018 at 12:17:37PM +1000, NeilBrown wrote:
> On Mon, Apr 16 2018, James Simmons wrote:
>
> >> CDEBUG_STACK() and CHECK_STACK() are macros to help with
> >> debugging, so move them from
> >> drivers/staging/lustre/include/linux/libcfs/linux/libcfs.h
> >> to
> >> drivers/staging/lustre/include/linux/libcfs/libcfs_debug.h
> >>
> >> This seems a more fitting location, and is a step towards
> >> removing linux/libcfs.h and simplifying the include file structure.
> >
> > Nak. Currently the lustre client always enables debugging but that
> > shouldn't be the case. What we do need is the ability to turn off the
> > crazy debugging stuff. In the development branch of lustre it is
> > done with CDEBUG_ENABLED. We need something like that in Kconfig
> > much like we have CONFIG_LUSTRE_DEBUG_EXPENSIVE_CHECK. Since we like
> > to be able to turn that off this should be moved to just after
> > LIBCFS_DEBUG_MSG_DATA_DECL. Then from CHECK_STACK down to CWARN()
> > it can be built out. When CDEBUG_ENABLED is disabled CDEBUG_LIMIT
> > would be empty.
>
> So why, exactly, is this an argument to justify a NAK?
> Are you just saying that the code I moved into libcfs_debug.h should be
> moved to somewhere a bit later in the file?
> That can easily be done when it is needed. It isn't needed now so why
> insist on it?
>
> Each patch should do one thing and make clear forward progress. This
> patch gets rid of an unnecessary file and brings related code together.
> I think that qualifies.

I agree, this just deletes an unused file, it changes no functionality
at all. Now applied.

greg k-h

2018-04-23 13:05:45

by Greg KH

Subject: Re: [PATCH 2/6] staging: lustre: remove libcfs/linux/libcfs.h

On Wed, Apr 18, 2018 at 12:32:01PM +1000, NeilBrown wrote:
> On Mon, Apr 16 2018, James Simmons wrote:
>
> >> This include file is only included in one place,
> >> and only contains a list of other include directives.
> >> So just move all those to the place where this file
> >> is included, and discard the file.
> >>
> >> One include directive uses a local name ("linux-cpu.h"), so
> >> that needs to be given a proper path.
> >>
> >> Probably many of these should be removed from here, and moved to
> >> just the files that need them.
> >
> > Nak. Dumping all the extra headers from linux/libcfs.h to libcfs.h is
> > the wrong approach. The one header, libcfs.h, to be the only header
> > in all lustre files is the wrong approach. I have been looking to
> > unroll that mess. I have a patch that I need to polish up that I can
> > submit.
>
> I think we both have the same goal - maybe just different paths to get
> there. If you have something nearly ready to submit, I'm happy to wait
> for it, then proceed on top of it.

I've taken this patch as it doesn't make anything worse than the total
mess we have now :)

thanks,

greg k-h

2018-04-23 13:13:54

by Greg KH

Subject: Re: [PATCH 5/6] staging: lustre: move misc-device registration closer to related code.

On Mon, Apr 16, 2018 at 10:42:37AM +1000, NeilBrown wrote:
> The ioctl handler for the misc device is in lnet/libcfs/module.c
> but it is registered in lnet/libcfs/linux/linux-module.c.
>
> Keeping related code together makes maintenance easier, so move the
> code.
>
> Signed-off-by: NeilBrown <[email protected]>

This and patch 6/6 did not apply due to something else changing these
files before these were sent in.

Can you rebase and resend?

thanks,

greg k-h

2018-04-23 13:15:32

by Greg KH

Subject: Re: [PATCH 3/6] staging: lustre: remove include/linux/libcfs/linux/linux-cpu.h

On Mon, Apr 16, 2018 at 04:52:55AM +0100, James Simmons wrote:
>
> > This include file contains definitions used when CONFIG_SMP
> > is in effect. Other includes contain corresponding definitions
> > for when it isn't.
> > This can be hard to follow, so move the definitions to the one place.
> >
> > As HAVE_LIBCFS_CPT is defined precisely when CONFIG_SMP, we discard
> > that macro and just use CONFIG_SMP when needed.
>
> Nak. The lustre SMP is broken and badly needs to be reworked. I have it
> ready and can push it. I was waiting to see if I had to rebase it once
> the rc1 stuff but since there is a push to get everything out there I will
> push it.

A NAK on some future code that might show up someday is not how we work
at all, sorry. That has caused other open source projects to die.
First patch submitted "wins".

And these are good patches. Nothing wrong with them, they clean stuff
up, remove more lines than they added and that's the proper thing to do
here. So I've applied the first 4, the last two didn't apply due to
changes from you that were accepted. See, you benefit also from this :)

thanks,

greg k-h