Hi there-
I know of at least two platforms (ppc64 and ia64) which allow cpus to
be physically or logically added and removed from a running system.
These are distinct operations from onlining or offlining, which is
well supported already. Right now there is little support in the core
cpu "driver" for dynamic addition or removal. The patch series which
follows implements support for this in a way which will (hopefully)
reduce code duplication and enforce some uniformity across the
relevant architectures.
For starters, the current situation is that cpu sysdevs are registered
from architecture code at boot. Already we have inconsistencies
between the arches -- ia64 registers only online cpus, ppc64
registers all "possible" cpus. I propose to move the initial cpu
sysdev registrations to the cpu "driver" itself (drivers/base/cpu.c),
and to register only "present" cpus at boot.
But that breaks all the arch code which explicitly registers cpu
sysdevs. For instance, ppc64 wants to hang all kinds of attributes
off of the cpu devices for performance counter stuff. So code such as
this needs to be converted to register a sysdev_driver with the cpu
device class, which will allow the ppc64 code to be notified when a
cpu is added or removed. In the patches that follow I include the
changes necessary for ppc64, as an example. (An arch sweep or
temporary compatibility hack can come later if I get positive
responses to this approach.)
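
As a rough illustration of the notification idea above, here is a minimal
userspace sketch, not kernel code. Every name in it (model_class,
model_driver, demo_driver and so on) is made up for the example; it only
assumes that a class calls each registered driver's add/remove hook when a
device comes or goes, and that a late-registering driver is told about
devices that already exist.

```c
#include <stddef.h>

#define MAX_DRIVERS 4
#define MAX_CPUS    8

/* Hypothetical stand-ins for sys_device, sysdev_driver and sysdev_class. */
struct model_dev {
	unsigned int id;
	int registered;
};

struct model_driver {
	int (*add)(struct model_dev *dev);
	int (*remove)(struct model_dev *dev);
};

struct model_class {
	struct model_driver *drivers[MAX_DRIVERS];
	size_t ndrivers;
	struct model_dev devs[MAX_CPUS];
};

/* Attach a driver and replay add() for devices that already exist. */
int model_driver_register(struct model_class *cls, struct model_driver *drv)
{
	unsigned int i;

	if (cls->ndrivers == MAX_DRIVERS)
		return -1;
	cls->drivers[cls->ndrivers++] = drv;
	for (i = 0; i < MAX_CPUS; i++)
		if (cls->devs[i].registered && drv->add)
			drv->add(&cls->devs[i]);
	return 0;
}

/* Register a device and notify every attached driver. */
int model_dev_register(struct model_class *cls, unsigned int id)
{
	size_t i;

	if (id >= MAX_CPUS || cls->devs[id].registered)
		return -1;
	cls->devs[id].id = id;
	cls->devs[id].registered = 1;
	for (i = 0; i < cls->ndrivers; i++)
		if (cls->drivers[i]->add)
			cls->drivers[i]->add(&cls->devs[id]);
	return 0;
}

/* Notify drivers first, then drop the device. */
void model_dev_unregister(struct model_class *cls, unsigned int id)
{
	size_t i;

	if (id >= MAX_CPUS || !cls->devs[id].registered)
		return;
	for (i = 0; i < cls->ndrivers; i++)
		if (cls->drivers[i]->remove)
			cls->drivers[i]->remove(&cls->devs[id]);
	cls->devs[id].registered = 0;
}

/* Example "arch" driver: tracks which cpus have its attributes attached. */
int demo_attrs[MAX_CPUS];

int demo_add(struct model_dev *dev)
{
	demo_attrs[dev->id] = 1;
	return 0;
}

int demo_remove(struct model_dev *dev)
{
	demo_attrs[dev->id] = 0;
	return 0;
}

struct model_driver demo_driver = { .add = demo_add, .remove = demo_remove };
```

The point of the shape is that the arch code never registers cpu devices
itself; it only reacts to them.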
Also, there is the matter of the base numa "node" driver. Currently
the cpu driver makes symlinks from nodes to their cpus. This seems
backwards to me, so I have changed the node driver to create or remove
the symlinks upon cpu addition or removal, respectively, also using
the sysdev_driver approach. I've also converted drivers/base/node.c
to doing the boot-time node registration itself, like the cpu code.
Finally, I've added two new interfaces which wrap all this up --
cpu_add() and cpu_remove(). These carry out the necessary update to
cpu_present_map and take care of the cpu device registration. These
are meant to be invoked from the platform-specific code which
discovers and removes processors.
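
To make the intended calling convention concrete, here is a small userspace
model, not kernel code. The maps are plain bitmasks, locking is left out, and
model_cpu_add(), model_cpu_remove() and platform_cpu_discovered() are
hypothetical names chosen for this sketch; only the "return NR_CPUS when no
slot is free" convention is taken from the patch.

```c
#define NR_CPUS 8

/* Hypothetical stand-ins for cpu_possible_map and cpu_present_map. */
unsigned long possible_map = 0x0fUL;	/* cpus 0-3 may ever exist */
unsigned long present_map  = 0x03UL;	/* cpus 0-1 present at boot */

/* Claim the lowest possible-but-absent cpu, or return NR_CPUS. */
unsigned int model_cpu_add(void)
{
	unsigned int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		unsigned long bit = 1UL << cpu;

		if ((possible_map & bit) && !(present_map & bit)) {
			present_map |= bit;
			return cpu;
		}
	}
	return NR_CPUS;
}

/* Drop a cpu from the present map; -1 if it was not present. */
int model_cpu_remove(unsigned int cpu)
{
	if (cpu >= NR_CPUS || !(present_map & (1UL << cpu)))
		return -1;
	present_map &= ~(1UL << cpu);
	return 0;
}

/* What a platform layer might do when firmware reports a new processor. */
int platform_cpu_discovered(void)
{
	unsigned int cpu = model_cpu_add();

	if (cpu == NR_CPUS)
		return -1;	/* no free slot: reject the addition */
	return (int)cpu;	/* logical number assigned to the new cpu */
}
```

The platform code only needs to compare the result against NR_CPUS; it never
touches the present map directly.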
This is the first real device model-related hacking I've done. I'm
hoping Greg or Patrick will tell me whether I'm on the right track or
abusing the APIs :)
These patches have been boot-tested on ppc64. I haven't gotten to
test the removal paths yet.
Nathan
Register cpu system devices in the core code instead of leaving it to
the architecture. At boot, allocate an array of num_possible_cpus()
cpu sysdevs, and register sysdevs for cpus which are marked present.
Also, leave to the node "driver" the creation of symlinks from node to
cpu devices.
Change register_cpu so that it no longer requires struct cpu* and
struct node * arguments, only a logical cpu number. Break the weird
cpu->no_control semantics (for now). Introduce unregister_cpu, which
removes the cpu entry from sysfs.
Signed-off-by: Nathan Lynch <[email protected]>
---
diff -puN drivers/base/cpu.c~dynamic-cpu-registration drivers/base/cpu.c
--- 2.6.10-rc1/drivers/base/cpu.c~dynamic-cpu-registration 2004-10-24 00:09:39.000000000 -0500
+++ 2.6.10-rc1-nathanl/drivers/base/cpu.c 2004-10-24 03:50:13.000000000 -0500
@@ -56,35 +56,60 @@ static inline void register_cpu_control(
}
#endif /* CONFIG_HOTPLUG_CPU */
+static struct cpu *cpu_devices;
+
/*
* register_cpu - Setup a driverfs device for a CPU.
- * @cpu - Callers can set the cpu->no_control field to 1, to indicate not to
- * generate a control file in sysfs for this CPU.
* @num - CPU number to use when creating the device.
*
* Initialize and register the CPU device.
*/
-int __init register_cpu(struct cpu *cpu, int num, struct node *root)
+int register_cpu(int num)
{
int error;
+ struct cpu *cpu = &cpu_devices[num];
+
+ memset(cpu, 0, sizeof(*cpu));
cpu->node_id = cpu_to_node(num);
cpu->sysdev.id = num;
cpu->sysdev.cls = &cpu_sysdev_class;
error = sysdev_register(&cpu->sysdev);
- if (!error && root)
- error = sysfs_create_link(&root->sysdev.kobj,
- &cpu->sysdev.kobj,
- kobject_name(&cpu->sysdev.kobj));
+
+ /* XXX FIXME: cpu->no_control is always zero...
+ * Maybe should introduce an arch-overridable "hotpluggable" map.
+ */
if (!error && !cpu->no_control)
register_cpu_control(cpu);
return error;
}
+void unregister_cpu(int num)
+{
+ struct cpu *cpu = &cpu_devices[num];
+ sysdev_remove_file(&cpu->sysdev, &attr_online);
+ sysdev_unregister(&cpu->sysdev);
+}
int __init cpu_dev_init(void)
{
- return sysdev_class_register(&cpu_sysdev_class);
+ unsigned int cpu;
+ int ret = -ENOMEM;
+ size_t size = sizeof(*cpu_devices) * num_possible_cpus();
+
+ cpu_devices = kmalloc(size, GFP_KERNEL);
+ if (!cpu_devices)
+ goto out;
+
+ sysdev_class_register(&cpu_sysdev_class);
+
+ for_each_present_cpu(cpu) {
+ ret = register_cpu(cpu);
+ if (ret)
+ goto out;
+ }
+out:
+ return ret;
}
diff -puN include/linux/cpu.h~dynamic-cpu-registration include/linux/cpu.h
--- 2.6.10-rc1/include/linux/cpu.h~dynamic-cpu-registration 2004-10-24 00:09:39.000000000 -0500
+++ 2.6.10-rc1-nathanl/include/linux/cpu.h 2004-10-24 03:52:43.000000000 -0500
@@ -31,7 +31,8 @@ struct cpu {
struct sys_device sysdev;
};
-extern int register_cpu(struct cpu *, int, struct node *);
+extern int register_cpu(int);
+extern void unregister_cpu(int);
struct notifier_block;
#ifdef CONFIG_SMP
_
These functions safely update cpu_present_map (i.e. with the
cpucontrol semaphore held) and register or unregister the cpu device
as needed. These are needed by systems which can add or remove cpus
from the system after boot (e.g. ppc64 and ia64), and are intended to
be called from the platform-specific code such as the ACPI or Open
Firmware layers.
Signed-off-by: Nathan Lynch <[email protected]>
---
diff -puN include/linux/cpu.h~introduce-cpu_add-and-cpu_remove include/linux/cpu.h
--- 2.6.10-rc1/include/linux/cpu.h~introduce-cpu_add-and-cpu_remove 2004-10-24 03:52:59.000000000 -0500
+++ 2.6.10-rc1-nathanl/include/linux/cpu.h 2004-10-24 03:52:59.000000000 -0500
@@ -67,6 +67,8 @@ extern struct semaphore cpucontrol;
register_cpu_notifier(&fn##_nb); \
}
int cpu_down(unsigned int cpu);
+unsigned int cpu_add(void);
+void cpu_remove(unsigned int);
#define cpu_is_offline(cpu) unlikely(!cpu_online(cpu))
#else
#define lock_cpu_hotplug() do { } while (0)
diff -puN kernel/cpu.c~introduce-cpu_add-and-cpu_remove kernel/cpu.c
--- 2.6.10-rc1/kernel/cpu.c~introduce-cpu_add-and-cpu_remove 2004-10-24 03:52:59.000000000 -0500
+++ 2.6.10-rc1-nathanl/kernel/cpu.c 2004-10-24 03:52:59.000000000 -0500
@@ -180,6 +180,49 @@ out:
unlock_cpu_hotplug();
return err;
}
+
+/*
+ * Add a cpu to the system. Return the number of the cpu added,
+ * or NR_CPUS if no more slots available.
+ */
+unsigned int cpu_add(void)
+{
+ unsigned int cpu = NR_CPUS;
+
+ lock_cpu_hotplug();
+
+ if (num_present_cpus() == num_possible_cpus())
+ goto out;
+
+ for_each_cpu(cpu)
+ if (!cpu_present(cpu))
+ break;
+
+ if (register_cpu(cpu)) {
+ cpu = NR_CPUS;
+ goto out;
+ }
+ cpu_set(cpu, cpu_present_map);
+out:
+ unlock_cpu_hotplug();
+ return cpu;
+}
+
+/*
+ * Remove a cpu from the system.
+ */
+void cpu_remove(unsigned int cpu)
+{
+ lock_cpu_hotplug();
+
+ BUG_ON(cpu_present(cpu));
+
+ unregister_cpu(cpu);
+
+ cpu_clear(cpu, cpu_present_map);
+
+ unlock_cpu_hotplug();
+}
#else
static inline int cpu_run_sbin_hotplug(unsigned int cpu, const char *action)
{
_
Register numa node system devices in the core code instead of leaving
it to the architecture. Add an array of MAX_NUMNODES node devices and
register those which are online at boot. Create sysfs symlinks to
each node's cpu devices.
Signed-off-by: Nathan Lynch <[email protected]>
---
diff -puN drivers/base/node.c~move-node-sysdev-registration-to-core drivers/base/node.c
--- 2.6.10-rc1/drivers/base/node.c~move-node-sysdev-registration-to-core 2004-10-24 03:52:53.000000000 -0500
+++ 2.6.10-rc1-nathanl/drivers/base/node.c 2004-10-24 03:52:53.000000000 -0500
@@ -9,7 +9,9 @@
#include <linux/node.h>
#include <linux/hugetlb.h>
#include <linux/cpumask.h>
+#include <linux/nodemask.h>
#include <linux/topology.h>
+#include <linux/cpu.h>
static struct sysdev_class node_class = {
set_kset_name("node"),
@@ -133,9 +135,65 @@ int __init register_node(struct node *no
return error;
}
+static struct node *node_devices;
+
+static int node_cpu_add_dev (struct sys_device * sys_dev)
+{
+ unsigned int cpu = sys_dev->id;
+ int ret, node = cpu_to_node(cpu);
+
+ ret = sysfs_create_link(&node_devices[node].sysdev.kobj,
+ &sys_dev->kobj,
+ kobject_name(&sys_dev->kobj));
+ return ret;
+}
+
+static int node_cpu_remove_dev (struct sys_device * sys_dev)
+{
+ unsigned int cpu = sys_dev->id;
+ int node = cpu_to_node(cpu);
+
+ sysfs_remove_link(&node_devices[node].sysdev.kobj,
+ kobject_name(&sys_dev->kobj));
+ return 0;
+
+}
+
+/* Methods for notifying us when cpus are added and removed */
+static struct sysdev_driver node_cpu_sysdev_driver = {
+ .add = node_cpu_add_dev,
+ .remove = node_cpu_remove_dev,
+};
int __init register_node_type(void)
{
- return sysdev_class_register(&node_class);
+ int i, ret = 0;
+ size_t size = sizeof(*node_devices) * num_online_nodes();
+
+ sysdev_class_register(&node_class);
+
+ node_devices = kmalloc(size, GFP_KERNEL);
+ if (!node_devices)
+ return -ENOMEM;
+
+ memset(node_devices, 0, size);
+
+ for_each_online_node(i) {
+ int pnum = parent_node(i);
+ struct node *parent = NULL;
+
+ if (pnum != i)
+ parent = &node_devices[pnum];
+
+ ret = register_node(&node_devices[i], i, parent);
+
+ if (ret)
+ goto out;
+ }
+
+ ret = sysdev_driver_register(&cpu_sysdev_class,
+ &node_cpu_sysdev_driver);
+out:
+ return ret;
}
postcore_initcall(register_node_type);
_
Convert arch/ppc64/kernel/sysfs.c to use a sysdev_driver for setting
up platform-specific cpu attributes.
Signed-off-by: Nathan Lynch <[email protected]>
---
diff -puN arch/ppc64/kernel/sysfs.c~ppc64-convert-to-sysdev_driver arch/ppc64/kernel/sysfs.c
--- 2.6.10-rc1/arch/ppc64/kernel/sysfs.c~ppc64-convert-to-sysdev_driver 2004-10-24 03:57:05.000000000 -0500
+++ 2.6.10-rc1-nathanl/arch/ppc64/kernel/sysfs.c 2004-10-24 03:57:05.000000000 -0500
@@ -6,7 +6,6 @@
#include <linux/init.h>
#include <linux/sched.h>
#include <linux/module.h>
-#include <linux/nodemask.h>
#include <asm/current.h>
#include <asm/processor.h>
@@ -261,7 +260,7 @@ static SYSDEV_ATTR(pmc7, 0600, show_pmc7
static SYSDEV_ATTR(pmc8, 0600, show_pmc8, store_pmc8);
static SYSDEV_ATTR(purr, 0600, show_purr, NULL);
-static void __init register_cpu_pmc(struct sys_device *s)
+static void register_cpu_pmc(struct sys_device *s)
{
sysdev_create_file(s, &attr_mmcr0);
sysdev_create_file(s, &attr_mmcr1);
@@ -285,37 +284,32 @@ static void __init register_cpu_pmc(stru
sysdev_create_file(s, &attr_purr);
}
-
-/* NUMA stuff */
-
-#ifdef CONFIG_NUMA
-static struct node node_devices[MAX_NUMNODES];
-
-static void register_nodes(void)
+#ifdef CONFIG_HOTPLUG_CPU
+static void unregister_cpu_pmc(struct sys_device *s)
{
- int i;
+ sysdev_remove_file(s, &attr_mmcr0);
+ sysdev_remove_file(s, &attr_mmcr1);
- for (i = 0; i < MAX_NUMNODES; i++) {
- if (node_online(i)) {
- int p_node = parent_node(i);
- struct node *parent = NULL;
+ if (cur_cpu_spec->cpu_features & CPU_FTR_MMCRA)
+ sysdev_remove_file(s, &attr_mmcra);
- if (p_node != i)
- parent = &node_devices[p_node];
+ sysdev_remove_file(s, &attr_pmc1);
+ sysdev_remove_file(s, &attr_pmc2);
+ sysdev_remove_file(s, &attr_pmc3);
+ sysdev_remove_file(s, &attr_pmc4);
+ sysdev_remove_file(s, &attr_pmc5);
+ sysdev_remove_file(s, &attr_pmc6);
- register_node(&node_devices[i], i, parent);
- }
+ if (cur_cpu_spec->cpu_features & CPU_FTR_PMC8) {
+ sysdev_remove_file(s, &attr_pmc7);
+ sysdev_remove_file(s, &attr_pmc8);
}
-}
-#else
-static void register_nodes(void)
-{
- return;
-}
-#endif
+ if (cur_cpu_spec->cpu_features & CPU_FTR_SMT)
+ sysdev_remove_file(s, &attr_purr);
+}
+#endif /* CONFIG_HOTPLUG_CPU */
-/* Only valid if CPU is online. */
static ssize_t show_physical_id(struct sys_device *dev, char *buf)
{
struct cpu *cpu = container_of(dev, struct cpu, sysdev);
@@ -324,44 +318,44 @@ static ssize_t show_physical_id(struct s
}
static SYSDEV_ATTR(physical_id, 0444, show_physical_id, NULL);
-
-static DEFINE_PER_CPU(struct cpu, cpu_devices);
-
-static int __init topology_init(void)
+static int ppc64_cpu_add_dev(struct sys_device *sys_dev)
{
- int cpu;
- struct node *parent = NULL;
-
- register_nodes();
+ register_cpu_pmc(sys_dev);
- for_each_cpu(cpu) {
- struct cpu *c = &per_cpu(cpu_devices, cpu);
+ sysdev_create_file(sys_dev, &attr_physical_id);
-#ifdef CONFIG_NUMA
- parent = &node_devices[cpu_to_node(cpu)];
+#ifndef CONFIG_PPC_ISERIES
+ if (cur_cpu_spec->cpu_features & CPU_FTR_SMT)
+ sysdev_create_file(sys_dev, &attr_smt_snooze_delay);
#endif
- /*
- * For now, we just see if the system supports making
- * the RTAS calls for CPU hotplug. But, there may be a
- * more comprehensive way to do this for an individual
- * CPU. For instance, the boot cpu might never be valid
- * for hotplugging.
- */
- if (systemcfg->platform != PLATFORM_PSERIES_LPAR)
- c->no_control = 1;
-
- register_cpu(c, cpu, parent);
+ return 0;
+}
- register_cpu_pmc(&c->sysdev);
+#ifdef CONFIG_HOTPLUG_CPU
+static int ppc64_cpu_remove_dev (struct sys_device * sys_dev)
+{
+ unregister_cpu_pmc(sys_dev);
- sysdev_create_file(&c->sysdev, &attr_physical_id);
+ sysdev_remove_file(sys_dev, &attr_physical_id);
#ifndef CONFIG_PPC_ISERIES
- if (cur_cpu_spec->cpu_features & CPU_FTR_SMT)
- sysdev_create_file(&c->sysdev, &attr_smt_snooze_delay);
+ if (cur_cpu_spec->cpu_features & CPU_FTR_SMT)
+ sysdev_remove_file(sys_dev, &attr_smt_snooze_delay);
#endif
- }
-
return 0;
}
+#else
+#define ppc64_cpu_remove_dev NULL
+#endif /* CONFIG_HOTPLUG_CPU */
+
+static struct sysdev_driver ppc64_cpu_sysdev_driver = {
+ .add = ppc64_cpu_add_dev,
+ .remove = ppc64_cpu_remove_dev,
+};
+
+static int __init topology_init(void)
+{
+ return sysdev_driver_register(&cpu_sysdev_class,
+ &ppc64_cpu_sysdev_driver);
+}
__initcall(topology_init);
_
On Sun, 2004-10-24 at 03:42 -0600, Nathan Lynch wrote:
> For starters, the current situation is that cpu sysdevs are registered
> from architecture code at boot. Already we have inconsistencies
> between the arches -- ia64 registers only online cpus, ppc64
> registers all "possible" cpus.
Um, how does ia64 bring up a new CPU without
a /sys/devices/system/cpu/cpuX/online?
I have no problem with unification, though.
Rusty.
--
A bad analogy is like a leaky screwdriver -- Richard Braakman
On Mon, 2004-10-25 at 16:12 +1000, Rusty Russell wrote:
> On Sun, 2004-10-24 at 03:42 -0600, Nathan Lynch wrote:
> > For starters, the current situation is that cpu sysdevs are registered
> > from architecture code at boot. Already we have inconsistencies
> > between the arches -- ia64 registers only online cpus, ppc64
> > registers all "possible" cpus.
>
> Um, how does ia64 bring up a new CPU without
> a /sys/devices/system/cpu/cpuX/online?
I don't think they have that capability merged yet, but I have seen a
few patches for ACPI-based physical hotplug support go by, e.g.
http://lkml.org/lkml/2004/9/20/126
Nathan
On Sun, Oct 24, 2004 at 03:42:10AM -0600, Nathan Lynch wrote:
Hi Nathan,
This has been sitting for a while and I didn't pay attention... sorry for the
late response.
>
> I know of at least two platforms (ppc64 and ia64) which allow cpus to
> be physically or logically added and removed from a running system.
> These are distinct operations from onlining or offlining, which is
> well supported already. Right now there is little support in the core
> cpu "driver" for dynamic addition or removal. The patch series which
> follows implements support for this in a way which will (hopefully)
> reduce code duplication and enforce some uniformity across the
> relevant architectures.
I think unifying is very good. I have some minor suggestions and will respond
to those patches separately.
>
> For starters, the current situation is that cpu sysdevs are registered
> from architecture code at boot. Already we have inconsistencies
> between the arches -- ia64 registers only online cpus, ppc64
On ia64 we register all present cpus in the NUMA case. For example, if you
boot with maxcpus=2 on a 4-way system, we create sysfs entries for all 4 cpus.
(This is with the ACPI patches submitted; before that we were doing this for
all possible cpus.) We probably registered only online cpus on NUMA systems
because the cpu-to-node association is not known until the node is present
(as you mention below), so we probably took a shortcut for NUMA since that
work was still to be done.
> registers all "possible" cpus. I propose to move the initial cpu
> sysdev registrations to the cpu "driver" itself (drivers/base/cpu.c),
> and to register only "present" cpus at boot.
>
> But that breaks all the arch code which explicitly registers cpu
> sysdevs. For instance, ppc64 wants to hang all kinds of attributes
> off of the cpu devices for performance counter stuff. So code such as
> this needs to be converted to register a sysdev_driver with the cpu
> device class, which will allow the ppc64 code to be notified when a
> cpu is added or removed. In the patches that follow I include the
> changes necessary for ppc64, as an example. (An arch sweep or
> temporary compatibility hack can come later if I get positive
> responses to this approach.)
>
> Also, there is the matter of the base numa "node" driver. Currently
> the cpu driver makes symlinks from nodes to their cpus. This seems
> backwards to me, so I have changed the node driver to create or remove
> the symlinks upon cpu addition or removal, respectively, also using
> the sysdev_driver approach. I've also converted drivers/base/node.c
> to doing the boot-time node registration itself, like the cpu code.
>
> Finally, I've added two new interfaces which wrap all this up --
> cpu_add() and cpu_remove(). These carry out the necessary update to
> cpu_present_map and take care of the cpu device registration. These
> are meant to be invoked from the platform-specific code which
> discovers and removes processors.
I think you want to leave the device registration that creates the sysfs file
to the arch code. If you look at the ACPI extensions that support physical cpu
hotplug, we need to keep track of the ACPI->logical association. So all we
really need at that point is a bit from the bitmap, since the cpu is not yet
ready for operation. A helper that gets an index out of cpu_present_map can be
a common routine, but the arch may have to massage this data before general
consumption.
--
Cheers,
Ashok Raj
- Linux OS & Technology Team
On Sun, Oct 24, 2004 at 05:42:17AM -0400, Nathan Lynch wrote:
>
> Register cpu system devices in the core code instead of leaving it to
> the architecture. At boot, allocate an array of num_possible_cpus()
> cpu sysdevs, and register sysdevs for cpus which are marked present.
> Also, leave to the node "driver" the creation of symlinks from node to
> cpu devices.
>
> Change register_cpu so that it no longer requires struct cpu* and
> struct node * arguments, only a logical cpu number. Break the weird
> cpu->no_control semantics (for now). Introduce unregister_cpu, which
> removes the cpu entry from sysfs.
>
> Signed-off-by: Nathan Lynch <[email protected]>
>
>
> - &cpu->sysdev.kobj,
> - kobject_name(&cpu->sysdev.kobj));
> +
> + /* XXX FIXME: cpu->no_control is always zero...
> + * Maybe should introduce an arch-overridable "hotpluggable" map.
> + */
I am getting obsessed with __attribute__((weak)) these days... :-)
A simple solution might be to declare platform_prefilter() and post_filter()
in the core with the weak attribute, and let a platform that cares about this
provide an override function. That would also be handy if you need to hang
additional files off the device for a platform. So for ppc64, depending on
whether it is an LPAR or not, you could set the no_control flag before the
file is created?
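
For readers unfamiliar with the mechanism, a minimal sketch of a weak default
follows. It relies on the GCC/Clang __attribute__((weak)) extension, and
arch_cpu_no_control() and the model_* names are hypothetical, invented for
this example rather than taken from the patch. A platform would override the
default simply by providing a normal (non-weak) definition of the same
function in another object file.

```c
/* Hypothetical, cut-down stand-in for struct cpu. */
struct model_cpu {
	unsigned int id;
	int no_control;
};

/*
 * Core default: every cpu gets a control file. Because the symbol is
 * weak, the linker prefers a strong definition from platform code if
 * one exists; with no override, this body is used.
 */
__attribute__((weak)) int arch_cpu_no_control(unsigned int cpu)
{
	(void)cpu;
	return 0;
}

/* Core registration path consults the hook before creating files. */
void model_register_cpu(struct model_cpu *c, unsigned int num)
{
	c->id = num;
	c->no_control = arch_cpu_no_control(num);
}
```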
> if (!error && !cpu->no_control)
> register_cpu_control(cpu);
> return error;
> }
>
> +void unregister_cpu(int num)
> +{
> + struct cpu *cpu = &cpu_devices[num];
>
> + sysdev_remove_file(&cpu->sysdev, &attr_online);
> + sysdev_unregister(&cpu->sysdev);
> +}
>
> int __init cpu_dev_init(void)
> {
> - return sysdev_class_register(&cpu_sysdev_class);
> + unsigned int cpu;
> + int ret = -ENOMEM;
> + size_t size = sizeof(*cpu_devices) * num_possible_cpus();
> +
> + cpu_devices = kmalloc(size, GFP_KERNEL);
> + if (!cpu_devices)
> + goto out;
> +
> + sysdev_class_register(&cpu_sysdev_class);
> +
> + for_each_present_cpu(cpu) {
> + ret = register_cpu(cpu);
> + if (ret)
> + goto out;
> + }
> +out:
> + return ret;
> }
> diff -puN include/linux/cpu.h~dynamic-cpu-registration include/linux/cpu.h
> --- 2.6.10-rc1/include/linux/cpu.h~dynamic-cpu-registration 2004-10-24 00:09:39.000000000 -0500
> +++ 2.6.10-rc1-nathanl/include/linux/cpu.h 2004-10-24 03:52:43.000000000 -0500
> @@ -31,7 +31,8 @@ struct cpu {
> struct sys_device sysdev;
> };
>
> -extern int register_cpu(struct cpu *, int, struct node *);
> +extern int register_cpu(int);
> +extern void unregister_cpu(int);
> struct notifier_block;
>
> #ifdef CONFIG_SMP
>
> _
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
--
Cheers,
Ashok Raj
- Linux OS & Technology Team
On Sun, Oct 24, 2004 at 03:42:10AM -0600, Nathan Lynch wrote:
Hi Nathan,
Sorry I am replying to your mail so late; I only got to see it now :)
Firstly, it is good to see that an architecture other than ia64 is planning to
support physical CPU hotplug. Recently I submitted some patches for supporting
ACPI-based physical cpu hotplug on the IA64 arch. I will take a look at your
patches and give more comments later.
Thanks for your efforts.
-Anil
> Hi there-
>
> I know of at least two platforms (ppc64 and ia64) which allow cpus to
> be physically or logically added and removed from a running system.
> These are distinct operations from onlining or offlining, which is
> well supported already. Right now there is little support in the core
> cpu "driver" for dynamic addition or removal. The patch series which
> follows implements support for this in a way which will (hopefully)
> reduce code duplication and enforce some uniformity across the
> relevant architectures.
>
> For starters, the current situation is that cpu sysdevs are registered
> from architecture code at boot. Already we have inconsistencies
> between the arches -- ia64 registers only online cpus, ppc64
> registers all "possible" cpus. I propose to move the initial cpu
> sysdev registrations to the cpu "driver" itself (drivers/base/cpu.c),
> and to register only "present" cpus at boot.
>
> But that breaks all the arch code which explicitly registers cpu
> sysdevs. For instance, ppc64 wants to hang all kinds of attributes
> off of the cpu devices for performance counter stuff. So code such as
> this needs to be converted to register a sysdev_driver with the cpu
> device class, which will allow the ppc64 code to be notified when a
> cpu is added or removed. In the patches that follow I include the
> changes necessary for ppc64, as an example. (An arch sweep or
> temporary compatibility hack can come later if I get positive
> responses to this approach.)
>
> Also, there is the matter of the base numa "node" driver. Currently
> the cpu driver makes symlinks from nodes to their cpus. This seems
> backwards to me, so I have changed the node driver to create or remove
> the symlinks upon cpu addition or removal, respectively, also using
> the sysdev_driver approach. I've also converted drivers/base/node.c
> to doing the boot-time node registration itself, like the cpu code.
>
> Finally, I've added two new interfaces which wrap all this up --
> cpu_add() and cpu_remove(). These carry out the necessary update to
> cpu_present_map and take care of the cpu device registration. These
> are meant to be invoked from the platform-specific code which
> discovers and removes processors.
>
> This is the first real device model-related hacking I've done. I'm
> hoping Greg or Patrick will tell me whether I'm on the right track or
> abusing the APIs :)
>
> These patches have been boot-tested on ppc64. I haven't gotten to
> test the removal paths yet.
>
>
> Nathan
On Sun, Oct 24, 2004 at 05:42:31AM -0400, Nathan Lynch wrote:
>
> These functions safely update cpu_present_map (i.e. with the
> cpucontrol semaphore held) and register or unregister the cpu device
> as needed. These are needed by systems which can add or remove cpus
> from the system after boot (e.g. ppc64 and ia64), and are intended to
> be called from the platform-specific code such as the ACPI or Open
> Firmware layers.
>
> Signed-off-by: Nathan Lynch <[email protected]>
>
>
> ---
>
>
> +
> +/*
> + * Add a cpu to the system. Return the number of the cpu added,
> + * or NR_CPUS if no more slots available.
> + */
> +unsigned int cpu_add(void)
> +{
> + unsigned int cpu = NR_CPUS;
> +
> + lock_cpu_hotplug();
> +
> + if (num_present_cpus() == num_possible_cpus())
> +goto out;
> +
> + for_each_cpu(cpu)
> + if (!cpu_present(cpu))
> + break;
Could we simplify this with:
cpus_complement(cpu_complement_map, cpu_present_map);
cpu = first_cpu(cpu_complement_map);
> +
> + if (register_cpu(cpu)) {
> + cpu = NR_CPUS;
> + goto out;
> + }
> + cpu_set(cpu, cpu_present_map);
I would prefer that register_cpu be performed on the arch side, as there may be
other setup necessary to capture the hardware->logical associations before
these are consumed.
> +out:
> + unlock_cpu_hotplug();
> + return cpu;
> +}
> +
> +/*
> + * Remove a cpu from the system.
> + */
> +void cpu_remove(unsigned int cpu)
> +{
> + lock_cpu_hotplug();
> +
> + BUG_ON(cpu_present(cpu));
> +
> + unregister_cpu(cpu);
> +
> + cpu_clear(cpu, cpu_present_map);
> +
> + unlock_cpu_hotplug();
> +}
> #else
> static inline int cpu_run_sbin_hotplug(unsigned int cpu, const char *action)
> {
>
> _
--
Cheers,
Ashok Raj
- Linux OS & Technology Team
On Thu, 2004-11-04 at 17:57, Ashok Raj wrote:
> On Sun, Oct 24, 2004 at 05:42:31AM -0400, Nathan Lynch wrote:
> >
> > These functions safely update cpu_present_map (i.e. with the
> > cpucontrol semaphore held) and register or unregister the cpu device
> > as needed. These are needed by systems which can add or remove cpus
> > from the system after boot (e.g. ppc64 and ia64), and are intended to
> > be called from the platform-specific code such as the ACPI or Open
> > Firmware layers.
> >
> > Signed-off-by: Nathan Lynch <[email protected]>
> >
> >
> > ---
> >
> >
> > +
> > +/*
> > + * Add a cpu to the system. Return the number of the cpu added,
> > + * or NR_CPUS if no more slots available.
> > + */
> > +unsigned int cpu_add(void)
> > +{
> > + unsigned int cpu = NR_CPUS;
> > +
> > + lock_cpu_hotplug();
> > +
> > + if (num_present_cpus() == num_possible_cpus())
> > +goto out;
> > +
> > + for_each_cpu(cpu)
> > + if (!cpu_present(cpu))
> > + break;
>
> could we simplify this by
>
> cpus_complement(cpu_complement_map, cpu_present_map);
> cpu = first_cpu(cpu_complement_map);
Well, since for_each_cpu() is defined like this:
#define for_each_cpu(cpu) for_each_cpu_mask((cpu), cpu_possible_map)
We could do:
cpus_andnot(new_cpu_map, cpu_possible_map, cpu_present_map);
cpu = first_cpu(new_cpu_map);
Or maybe even:
unsigned int cpu_add(void)
{
	unsigned int cpu = NR_CPUS;
	cpumask_t new_cpu_map;

	lock_cpu_hotplug();
	if (num_present_cpus() == num_possible_cpus())
		goto out;
	/* candidates: possible but not yet present */
	cpus_andnot(new_cpu_map, cpu_possible_map, cpu_present_map);
	for_each_cpu_mask(cpu, new_cpu_map)
		if (!register_cpu(cpu)) {
			cpu_set(cpu, cpu_present_map);
			goto out;
		}
	cpu = NR_CPUS;
out:
	unlock_cpu_hotplug();
	return cpu;
}
since we want to try all possible but !present CPUs until we exhaust
them all or find one to bring online.
Simply complementing the cpu_present_map could easily return CPUs which
aren't 'present' and aren't even 'possible' since cpu_possible_map
doesn't necessarily equal 0xFFFFFFFF (or however many FF's for your
particular platform! ;)
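
A small userspace sketch of that difference, using 32-bit masks in place of
cpumask_t. The helper names (first_set, pick_by_complement, pick_by_andnot)
are invented for this example and are not kernel functions.

```c
#include <stdint.h>

#define NR_CPUS 32

/* Index of the lowest set bit, or NR_CPUS if the mask is empty. */
unsigned int first_set(uint32_t mask)
{
	unsigned int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		if ((mask >> cpu) & 1u)
			return cpu;
	return NR_CPUS;
}

/* Complement of the present map: ignores which cpus are possible. */
unsigned int pick_by_complement(uint32_t present)
{
	return first_set((uint32_t)~present);
}

/* possible AND NOT present: only cpus that could actually be added. */
unsigned int pick_by_andnot(uint32_t possible, uint32_t present)
{
	return first_set(possible & (uint32_t)~present);
}
```

With possible = 0x0f and present = 0x0f (a full 4-cpu system), the complement
version hands back cpu 4, which is not a possible cpu, while the andnot
version correctly reports that nothing is left by returning NR_CPUS.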
And FWIW, I like where you're going with this, Nathan. I wrote a lot of
the original system topology code for sysfs, most of which is thankfully
gone now! ;) I had hoped to clean up some of the ickier stuff, but
haven't gotten around to it.
-Matt
On Thu, 2004-11-04 at 17:09 -0800, Ashok Raj wrote:
> On Sun, Oct 24, 2004 at 03:42:10AM -0600, Nathan Lynch wrote:
> >
> > Finally, I've added two new interfaces which wrap all this up --
> > cpu_add() and cpu_remove(). These carry out the necessary update to
> > cpu_present_map and take care of the cpu device registration. These
> > are meant to be invoked from the platform-specific code which
> > discovers and removes processors.
>
> I think you want the device registration that create the sysfs file to the
> arch code.
No, I don't think the arch code should be registering the cpu devices
(or the node devices). There is very little that is arch-specific about
these, and the same code is more or less duplicated between the
architectures.
> If you look at the ACPI extensions to support physical cpu hotplug
> we need to keep track of the acpi->logical association. so all we really need
> is a bit off the bitmap, but the cpu is not yet ready for operation yet.
I see your point here, though, and I'm slightly embarrassed I forgot
that ppc64 has similar needs. What is needed is an arch-specific
__cpu_add which is called from cpu_add after the new cpu's bit has been
reserved, and which sets up the architecture's physical<->logical
associations or whatever. This follows the convention established in
the existing cpu code and keeps the manipulation of cpu_present_map in
one place.
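
Roughly, the split could look like the following userspace model. This is
only a sketch of the idea, not the next version of the patch: the model_*
names are hypothetical, locking is omitted, and releasing the bit when the
arch hook fails is an assumption of this example rather than something
decided in this thread.

```c
#define NR_CPUS 4

unsigned long model_present_map = 0x1UL;	/* boot cpu only */
int model_hw_id[NR_CPUS] = { 100, -1, -1, -1 };	/* logical -> hardware id */
int model_next_hw_id = 101;	/* id "firmware" reports next; < 0 means failure */

/* Arch-specific half (the __cpu_add role): record the hw<->logical mapping. */
int model_arch_cpu_add(unsigned int cpu)
{
	if (model_next_hw_id < 0)
		return -1;
	model_hw_id[cpu] = model_next_hw_id++;
	return 0;
}

/* Generic half: the only place that touches the present map. */
unsigned int model_cpu_add(void)
{
	unsigned int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		if (!(model_present_map & (1UL << cpu)))
			break;
	if (cpu == NR_CPUS)
		return NR_CPUS;

	model_present_map |= 1UL << cpu;		/* reserve the bit */
	if (model_arch_cpu_add(cpu)) {
		model_present_map &= ~(1UL << cpu);	/* assumed rollback */
		return NR_CPUS;
	}
	return cpu;
}
```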
I'll incorporate this in my next attempt.
Nathan
On Thu, 2004-11-04 at 17:51 -0800, Ashok Raj wrote:
> On Sun, Oct 24, 2004 at 05:42:17AM -0400, Nathan Lynch wrote:
> >
> > + /* XXX FIXME: cpu->no_control is always zero...
> > + * Maybe should introduce an arch-overridable "hotpluggable" map.
> > + */
> I am getting obsessed with these __attribute__((weak)) these days...:-)
>
> simple solution seems like you can have a platform_prefilter() and post_filter() declared
> in the core with weak attribute, and let the platform that cares about this provide an override
> function. So if you need to hang off additional files for platform this can be handy. so for
> ppc64, based on LPAR or not, you can add these no_control flag before the file is created?
I'm not sure using weak symbols is the way to take care of the
'no_control' field. I think having the arch implement a
__register_cpu(struct cpu*) helper which sets the 'no_control'
attribute should be sufficient. E.g. IA64 and i386 implementations of
__register_cpu would set no_control=1 if the cpu is the boot processor.
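
Something along these lines, as a userspace model rather than real kernel
code. The struct, MODEL_BOOT_CPU and the model_* functions are hypothetical
names for this sketch, and the has_online_file field merely stands in for the
sysfs "online" control file.

```c
#include <string.h>

#define MODEL_BOOT_CPU 0u

/* Hypothetical, cut-down stand-in for struct cpu. */
struct model_cpu {
	unsigned int id;
	int no_control;
	int has_online_file;
};

/* Arch helper (the __register_cpu role): the boot cpu is not hotpluggable. */
void model_arch_register_cpu(struct model_cpu *c)
{
	c->no_control = (c->id == MODEL_BOOT_CPU);
}

/* Core path: let the arch set no_control, then decide about the file. */
int model_register_cpu(struct model_cpu *c, unsigned int num)
{
	memset(c, 0, sizeof(*c));
	c->id = num;
	model_arch_register_cpu(c);
	if (!c->no_control)
		c->has_online_file = 1;
	return 0;
}
```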
With respect to the general issue of adding sysfs attributes to the cpu
devices, that's simply a matter of coding up a sysdev_driver as I did in
the node and ppc64 code in the other patches.
Nathan