2008-07-15 02:33:56

by Alex Chiang

Subject: [PATCH 00/14] Introduce cpu_enabled_map and friends

The following series presents cpu_enabled_map, which captures
the concept of a physically present and enabled CPU.

The complement of this map, taken within cpu_present_map, gives
us present but disabled CPUs.
Perhaps the CPU was disabled by firmware for whatever reason.

For the most part, cpu_enabled_map will follow cpu_present_map.
On x86 and ia64, ACPI can tell us if a CPU is present, enabled,
or both. On those two archs, cpu_enabled_map is a subset of
cpu_present_map. All CPUs described in the MADT are now added
to cpu_present_map, but only ones with ACPI_MADT_ENABLED are
added to cpu_enabled_map.
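
To make the relationship between the two maps concrete, here is a
minimal userspace sketch. It is not kernel code: mask_t, struct
madt_cpu, build_maps() and disabled_cpus() are hypothetical names
invented for illustration, and a 64-bit integer stands in for
cpumask_t.

```c
#include <stdint.h>

/* Hypothetical stand-in for the kernel's cpumask_t: one bit per CPU. */
typedef uint64_t mask_t;

/* Illustrative MADT-like entry; 'enabled' mirrors ACPI_MADT_ENABLED. */
struct madt_cpu {
	unsigned int cpu;	/* logical CPU number, must be < 64 here */
	int enabled;		/* nonzero if firmware marked the CPU enabled */
};

struct cpu_maps {
	mask_t present;
	mask_t enabled;
};

/* Every described CPU is present; only flagged ones are enabled. */
static struct cpu_maps build_maps(const struct madt_cpu *e, unsigned int n)
{
	struct cpu_maps m = { 0, 0 };
	unsigned int i;

	for (i = 0; i < n; i++) {
		if (e[i].cpu >= 64)
			continue;	/* outside this sketch's mask width */
		m.present |= (mask_t)1 << e[i].cpu;
		if (e[i].enabled)
			m.enabled |= (mask_t)1 << e[i].cpu;
	}
	return m;
}

/* Present but disabled CPUs: complement of enabled, within present. */
static mask_t disabled_cpus(const struct cpu_maps *m)
{
	return m->present & ~m->enabled;
}
```

Because a bit is only ever set in 'enabled' together with the same
bit in 'present', the subset property falls out by construction.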

Other archs may wish to do something similar, so some minimum
levels of support are included with this patch series, enabling
others to fill things in as desired. I certainly acknowledge
that a better job could be done.

Patch 01 introduces the actual map and interface.

Patches 02 -- 12 modify arch code in an attempt to populate the
map correctly.

Patch 12 is the most interesting of these, in that it
demonstrates how an arch [ia64] would actually use cpu_enabled_map
to decide whether to bring up a given CPU.

Patch 13 fixes a potential overflow bug in ia64 ACPI code.

Patch 14 is the money patch. It demonstrates why we might
want to go through all these gyrations. Now that ia64 presents
*all* physically present CPUs in sysfs, even if they have been
disabled by firmware, we give userspace a way to poke at those
CPUs.

This is possible because present CPUs are in the ACPI namespace.
Even though we might not have a valid per_cpu pointer for a
disabled CPU (because we never called cpu_up() on it), we can
still obtain a valid ACPI handle for that CPU by walking the
namespace and executing methods underneath it.

The big picture implication is that we can allow userspace
to interact with disabled CPUs. In this particular example,
we provide a knob that lets a sysadmin schedule any present
CPU for firmware deconfiguration or enablement.

I have compile tested on arm, sparc, powerpc, alpha, x86, and
ia64.

I have boot tested on x86 and ia64.

None of my x86 platforms have firmware support for disabling
CPUs, but my ia64 ones do, so that's where I've done my most
extensive testing.

Thanks.

/ac

---

Alex Chiang (14):
ACPI: Provide /sys/devices/system/cpu/cpuN/deconfigure
[IA64] Avoid overflowing ia64_cpu_to_sapicid in acpi_map_lsapic()
[IA64] Populate and use cpu_enabled_map
x86: Populate cpu_enabled_map
[SPARC] Populate cpu_enabled_map
[SH] Populate cpu_enabled_map
[S390] Populate cpu_enabled_map
[POWERPC] Populate cpu_enabled_map
[PARISC] Populate cpu_enabled_map
[MIPS] Populate cpu_enabled_map
[ARM] Populate cpu_enabled_map
[ALPHA] Populate cpu_enabled_map
[M32R] Populate cpu_enabled_map
Introduce cpu_enabled_map and friends


arch/alpha/kernel/process.c | 2
arch/alpha/kernel/smp.c | 2
arch/arm/mach-realview/platsmp.c | 4
arch/ia64/kernel/acpi.c | 21 +-
arch/ia64/kernel/smpboot.c | 7 +
arch/m32r/kernel/smpboot.c | 1
arch/mips/kernel/smp.c | 1
arch/mips/kernel/smtc.c | 1
arch/parisc/kernel/processor.c | 1
arch/parisc/kernel/smp.c | 3
arch/powerpc/kernel/setup-common.c | 1
arch/powerpc/platforms/powermac/smp.c | 1
arch/powerpc/platforms/pseries/hotplug-cpu.c | 2
arch/s390/kernel/smp.c | 7 +
arch/sh/kernel/smp.c | 1
arch/sparc/kernel/smp.c | 1
arch/sparc64/kernel/mdesc.c | 1
arch/sparc64/kernel/smp.c | 1
arch/x86/kernel/acpi/boot.c | 7 -
arch/x86/kernel/apic_32.c | 1
arch/x86/kernel/apic_64.c | 1
arch/x86/kernel/smpboot.c | 2
arch/x86/mach-voyager/voyager_smp.c | 2
arch/x86/xen/smp.c | 1
drivers/acpi/Kconfig | 18 ++
drivers/acpi/Makefile | 4
drivers/acpi/processor_core.c | 8 +
drivers/acpi/processor_deconfigure.c | 275 ++++++++++++++++++++++++++
drivers/base/cpu.c | 9 +
include/acpi/processor.h | 6 +
include/asm-ia64/smp.h | 1
include/asm-m32r/smp.h | 1
include/linux/cpumask.h | 31 ++-
init/main.c | 1
kernel/sched.c | 16 +-
35 files changed, 415 insertions(+), 27 deletions(-)
create mode 100644 drivers/acpi/processor_deconfigure.c


2008-07-15 02:34:31

by Alex Chiang

Subject: [PATCH 02/14] [M32R] Populate cpu_enabled_map

Populate the cpu_enabled_map correctly.

Note that this patch does not actually make any decisions based
on the contents of the map.

However, as the map is presented via sysfs in:

/sys/devices/system/cpu/

it should be populated correctly.

Signed-off-by: Alex Chiang <[email protected]>
Cc: Hirokazu Takata <[email protected]>
---

arch/m32r/kernel/smpboot.c | 1 +
include/asm-m32r/smp.h | 1 +
2 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/arch/m32r/kernel/smpboot.c b/arch/m32r/kernel/smpboot.c
index 2c03ac1..4f2bbde 100644
--- a/arch/m32r/kernel/smpboot.c
+++ b/arch/m32r/kernel/smpboot.c
@@ -184,6 +184,7 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
physid_set(phys_id, phys_cpu_present_map);
#ifndef CONFIG_HOTPLUG_CPU
cpu_present_map = cpu_possible_map;
+ cpu_enabled_map = cpu_possible_map;
#endif

show_mp_info(nr_cpu);
diff --git a/include/asm-m32r/smp.h b/include/asm-m32r/smp.h
index 078e1a5..4ea1845 100644
--- a/include/asm-m32r/smp.h
+++ b/include/asm-m32r/smp.h
@@ -65,6 +65,7 @@ extern volatile int cpu_2_physid[NR_CPUS];
extern cpumask_t cpu_callout_map;
extern cpumask_t cpu_possible_map;
extern cpumask_t cpu_present_map;
+extern cpumask_t cpu_enabled_map;

static __inline__ int hard_smp_processor_id(void)
{

2008-07-15 02:35:25

by Alex Chiang

Subject: [PATCH 03/14] [ALPHA] Populate cpu_enabled_map

Populate the cpu_enabled_map correctly.

Note that this patch does not actually make any decisions based
on the contents of the map.

However, as the map is presented via sysfs in:

/sys/devices/system/cpu/

it should be populated correctly.

Signed-off-by: Alex Chiang <[email protected]>
Cc: Richard Henderson <[email protected]>
---

arch/alpha/kernel/process.c | 2 ++
arch/alpha/kernel/smp.c | 2 ++
2 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/arch/alpha/kernel/process.c b/arch/alpha/kernel/process.c
index 96ed82f..7a261e2 100644
--- a/arch/alpha/kernel/process.c
+++ b/arch/alpha/kernel/process.c
@@ -94,6 +94,7 @@ common_shutdown_1(void *generic_ptr)
flags |= 0x00040000UL; /* "remain halted" */
*pflags = flags;
cpu_clear(cpuid, cpu_present_map);
+ cpu_clear(cpuid, cpu_enabled_map);
halt();
}
#endif
@@ -120,6 +121,7 @@ common_shutdown_1(void *generic_ptr)
#ifdef CONFIG_SMP
/* Wait for the secondaries to halt. */
cpu_clear(boot_cpuid, cpu_present_map);
+ cpu_clear(boot_cpuid, cpu_enabled_map);
while (cpus_weight(cpu_present_map))
barrier();
#endif
diff --git a/arch/alpha/kernel/smp.c b/arch/alpha/kernel/smp.c
index 2525692..ec53061 100644
--- a/arch/alpha/kernel/smp.c
+++ b/arch/alpha/kernel/smp.c
@@ -436,6 +436,7 @@ setup_smp(void)
if ((cpu->flags & 0x1cc) == 0x1cc) {
smp_num_probed++;
cpu_set(i, cpu_present_map);
+ cpu_set(i, cpu_enabled_map);
cpu->pal_revision = boot_cpu_palrev;
}

@@ -469,6 +470,7 @@ smp_prepare_cpus(unsigned int max_cpus)
/* Nothing to do on a UP box, or when told not to. */
if (smp_num_probed == 1 || max_cpus == 0) {
cpu_present_map = cpumask_of_cpu(boot_cpuid);
+ cpu_enabled_map = cpumask_of_cpu(boot_cpuid);
printk(KERN_INFO "SMP mode deactivated.\n");
return;
}

2008-07-15 02:35:48

by Alex Chiang

Subject: [PATCH 04/14] [ARM] Populate cpu_enabled_map

Populate the cpu_enabled_map correctly.

Note that this patch does not actually make any decisions based
on the contents of the map.

However, as the map is presented via sysfs in:

/sys/devices/system/cpu/

it should be populated correctly.

Signed-off-by: Alex Chiang <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Russell King <[email protected]>
---

arch/arm/mach-realview/platsmp.c | 4 +++-
1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/arch/arm/mach-realview/platsmp.c b/arch/arm/mach-realview/platsmp.c
index 8e813ed..c33f298 100644
--- a/arch/arm/mach-realview/platsmp.c
+++ b/arch/arm/mach-realview/platsmp.c
@@ -241,8 +241,10 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
* Initialise the present map, which describes the set of CPUs
* actually populated at the present time.
*/
- for (i = 0; i < max_cpus; i++)
+ for (i = 0; i < max_cpus; i++) {
cpu_set(i, cpu_present_map);
+ cpu_set(i, cpu_enabled_map);
+ }

/*
* Initialise the SCU if there are more than one CPU and let

2008-07-15 02:36:31

by Alex Chiang

Subject: [PATCH 05/14] [MIPS] Populate cpu_enabled_map

Populate the cpu_enabled_map correctly.

Note that this patch does not actually make any decisions based
on the contents of the map.

However, as the map is presented via sysfs in:

/sys/devices/system/cpu/

it should be populated correctly.

Signed-off-by: Alex Chiang <[email protected]>
Cc: Ralf Baechle <[email protected]>
---

arch/mips/kernel/smp.c | 1 +
arch/mips/kernel/smtc.c | 1 +
2 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/arch/mips/kernel/smp.c b/arch/mips/kernel/smp.c
index cdf87a9..8f7f742 100644
--- a/arch/mips/kernel/smp.c
+++ b/arch/mips/kernel/smp.c
@@ -304,6 +304,7 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
set_cpu_sibling_map(0);
#ifndef CONFIG_HOTPLUG_CPU
cpu_present_map = cpu_possible_map;
+ cpu_enabled_map = cpu_possible_map;
#endif
}

diff --git a/arch/mips/kernel/smtc.c b/arch/mips/kernel/smtc.c
index 3e86318..b0c16e0 100644
--- a/arch/mips/kernel/smtc.c
+++ b/arch/mips/kernel/smtc.c
@@ -498,6 +498,7 @@ void mipsmt_prepare_cpus(void)
while (tc < (((val & MVPCONF0_PTC) >> MVPCONF0_PTC_SHIFT) + 1)) {
cpu_clear(tc, phys_cpu_present_map);
cpu_clear(tc, cpu_present_map);
+ cpu_clear(tc, cpu_enabled_map);
tc++;
}

2008-07-15 02:36:51

by Alex Chiang

Subject: [PATCH 06/14] [PARISC] Populate cpu_enabled_map

Populate the cpu_enabled_map correctly.

Note that this patch does not actually make any decisions based
on the contents of the map.

However, as the map is presented via sysfs in:

/sys/devices/system/cpu/

it should be populated correctly.

Signed-off-by: Alex Chiang <[email protected]>
Cc: Kyle McMartin <[email protected]>
---

arch/parisc/kernel/processor.c | 1 +
arch/parisc/kernel/smp.c | 3 +++
2 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/arch/parisc/kernel/processor.c b/arch/parisc/kernel/processor.c
index 370086f..fb81222 100644
--- a/arch/parisc/kernel/processor.c
+++ b/arch/parisc/kernel/processor.c
@@ -201,6 +201,7 @@ static int __cpuinit processor_probe(struct parisc_device *dev)
#ifdef CONFIG_SMP
if (cpuid) {
cpu_set(cpuid, cpu_present_map);
+ cpu_set(cpuid, cpu_enabled_map);
cpu_up(cpuid);
}
#endif
diff --git a/arch/parisc/kernel/smp.c b/arch/parisc/kernel/smp.c
index 85fc775..a31b0ad 100644
--- a/arch/parisc/kernel/smp.c
+++ b/arch/parisc/kernel/smp.c
@@ -535,6 +535,7 @@ void __devinit smp_prepare_boot_cpu(void)

cpu_set(bootstrap_processor, cpu_online_map);
cpu_set(bootstrap_processor, cpu_present_map);
+ cpu_set(bootstrap_processor, cpu_enabled_map);
}


@@ -546,7 +547,9 @@ void __devinit smp_prepare_boot_cpu(void)
void __init smp_prepare_cpus(unsigned int max_cpus)
{
cpus_clear(cpu_present_map);
+ cpus_clear(cpu_enabled_map);
cpu_set(0, cpu_present_map);
+ cpu_set(0, cpu_enabled_map);

parisc_max_cpus = max_cpus;
if (!max_cpus)

2008-07-15 02:37:30

by Alex Chiang

Subject: [PATCH 08/14] [S390] Populate cpu_enabled_map

Populate the cpu_enabled_map correctly.

Note that this patch does not actually make any decisions based
on the contents of the map.

However, as the map is presented via sysfs in:

/sys/devices/system/cpu/

it should be populated correctly.

Signed-off-by: Alex Chiang <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Cc: Heiko Carstens <[email protected]>
---

arch/s390/kernel/smp.c | 7 ++++++-
1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/arch/s390/kernel/smp.c b/arch/s390/kernel/smp.c
index 5d4fa4b..f7ae20b 100644
--- a/arch/s390/kernel/smp.c
+++ b/arch/s390/kernel/smp.c
@@ -458,6 +458,7 @@ static int smp_rescan_cpus_sigp(cpumask_t avail)
if (!cpu_stopped(logical_cpu))
continue;
cpu_set(logical_cpu, cpu_present_map);
+ cpu_set(logical_cpu, cpu_enabled_map);
smp_cpu_state[logical_cpu] = CPU_STATE_CONFIGURED;
logical_cpu = next_cpu(logical_cpu, avail);
if (logical_cpu == NR_CPUS)
@@ -490,6 +491,7 @@ static int smp_rescan_cpus_sclp(cpumask_t avail)
__cpu_logical_map[logical_cpu] = cpu_id;
smp_cpu_polarization[logical_cpu] = POLARIZATION_UNKNWN;
cpu_set(logical_cpu, cpu_present_map);
+ cpu_set(logical_cpu, cpu_enabled_map);
if (cpu >= info->configured)
smp_cpu_state[logical_cpu] = CPU_STATE_STANDBY;
else
@@ -844,6 +846,7 @@ void __init smp_prepare_boot_cpu(void)

current_thread_info()->cpu = 0;
cpu_set(0, cpu_present_map);
+ cpu_set(0, cpu_enabled_map);
cpu_set(0, cpu_online_map);
S390_lowcore.percpu_offset = __per_cpu_offset[0];
current_set[0] = current;
@@ -1104,8 +1107,10 @@ int __ref smp_rescan_cpus(void)
cpus_andnot(newcpus, cpu_present_map, newcpus);
for_each_cpu_mask(cpu, newcpus) {
rc = smp_add_present_cpu(cpu);
- if (rc)
+ if (rc) {
cpu_clear(cpu, cpu_present_map);
+ cpu_clear(cpu, cpu_enabled_map);
+ }
}
rc = 0;
out:

2008-07-15 02:37:57

by Alex Chiang

Subject: [PATCH 09/14] [SH] Populate cpu_enabled_map

Populate the cpu_enabled_map correctly.

Note that this patch does not actually make any decisions based
on the contents of the map.

However, as the map is presented via sysfs in:

/sys/devices/system/cpu/

it should be populated correctly.

Signed-off-by: Alex Chiang <[email protected]>
Cc: Paul Mundt <[email protected]>
---

arch/sh/kernel/smp.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/arch/sh/kernel/smp.c b/arch/sh/kernel/smp.c
index 5d039d1..b91fdfa 100644
--- a/arch/sh/kernel/smp.c
+++ b/arch/sh/kernel/smp.c
@@ -60,6 +60,7 @@ void __init smp_prepare_cpus(unsigned int max_cpus)

#ifndef CONFIG_HOTPLUG_CPU
cpu_present_map = cpu_possible_map;
+ cpu_enabled_map = cpu_possible_map;
#endif
}

2008-07-15 02:38:24

by Alex Chiang

Subject: [PATCH 12/14] [IA64] Populate and use cpu_enabled_map

Modify MADT local SAPIC parsing such that:

- all present CPUs added to cpu_present_map
- all enabled CPUs added to cpu_enabled_map

This change allows us to check during __cpu_up() if we should
actually bring up the CPU. That is, if a CPU is present, but not
enabled by firmware, we should not bring it up.

Contrariwise, by creating a sysfs interface for a disabled CPU,
we provide a hook that allows a user to interact with the CPU.

The most visible user interface change with this patch is that more
sysfs entries will appear in /sys/devices/system/cpu/cpuN/ on
multi-threaded systems with threads turned off.

The actual directories will be empty. A later patch in this series
will provide an example of using this new hook to provide a user
interface for disabled CPUs.
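
As a rough userspace model of the flow described above (not the
actual ia64 code), the sketch below records every local SAPIC entry
at parse time and sets the enabled bit only for flagged entries when
logical CPU numbers are assigned. MADT_ENABLED, struct lsapic,
boot_data, parse_lsapic() and build_cpu_map() are simplified
stand-ins, and 32-bit integers stand in for cpumask_t. The sketch
reads the flag array by table slot, since that is how the parse step
fills it.

```c
#include <stdint.h>

#define NR_CPUS		8	/* small illustrative value */
#define MADT_ENABLED	0x1	/* stand-in for ACPI_MADT_ENABLED */

/* Simplified local SAPIC entry. */
struct lsapic {
	uint8_t id;
	uint8_t eid;
	uint32_t flags;
};

/* Simplified stand-in for smp_boot_data. */
static struct {
	int cpu_count;
	int cpu_phys_id[NR_CPUS];
	int cpu_enabled[NR_CPUS];
} boot_data;

static uint32_t present_map, enabled_map;

/* Record every entry, enabled or not; keep the flag in a parallel array. */
static void parse_lsapic(const struct lsapic *l)
{
	int n = boot_data.cpu_count;

	if (n >= NR_CPUS)
		return;
	boot_data.cpu_phys_id[n] = (l->id << 8) | l->eid;
	boot_data.cpu_enabled[n] = (l->flags & MADT_ENABLED) != 0;
	boot_data.cpu_count = n + 1;
}

/* Assign logical CPU numbers; the boot CPU is always logical CPU 0. */
static void build_cpu_map(int boot_phys_id)
{
	int i, cpu = 1;

	present_map = enabled_map = 1u;
	for (i = 0; i < boot_data.cpu_count; i++) {
		if (boot_data.cpu_phys_id[i] == boot_phys_id)
			continue;
		present_map |= 1u << cpu;
		/* Slot i pairs cpu_enabled[i] with cpu_phys_id[i]. */
		if (boot_data.cpu_enabled[i])
			enabled_map |= 1u << cpu;
		cpu++;
	}
}
```

With four table entries of which only two are flagged enabled, all
four logical CPUs end up present, but only the boot CPU and the one
other flagged CPU end up enabled.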

Signed-off-by: Alex Chiang <[email protected]>
Cc: Tony Luck <[email protected]>
---

arch/ia64/kernel/acpi.c | 16 +++++++++-------
arch/ia64/kernel/smpboot.c | 7 +++++++
include/asm-ia64/smp.h | 1 +
3 files changed, 17 insertions(+), 7 deletions(-)

diff --git a/arch/ia64/kernel/acpi.c b/arch/ia64/kernel/acpi.c
index 43687cc..1b4b338 100644
--- a/arch/ia64/kernel/acpi.c
+++ b/arch/ia64/kernel/acpi.c
@@ -212,13 +212,14 @@ acpi_parse_lsapic(struct acpi_subtable_header * header, const unsigned long end)

/*Skip BAD_MADT_ENTRY check, as lsapic size could vary */

- if (lsapic->lapic_flags & ACPI_MADT_ENABLED) {
#ifdef CONFIG_SMP
- smp_boot_data.cpu_phys_id[available_cpus] =
- (lsapic->id << 8) | lsapic->eid;
+ smp_boot_data.cpu_phys_id[available_cpus] =
+ (lsapic->id << 8) | lsapic->eid;
+
+ smp_boot_data.cpu_enabled[available_cpus] =
+ lsapic->lapic_flags & ACPI_MADT_ENABLED;
#endif
- ++available_cpus;
- }
+ ++available_cpus;

total_cpus++;
return 0;
@@ -872,8 +873,7 @@ int acpi_map_lsapic(acpi_handle handle, int *pcpu)

lsapic = (struct acpi_madt_local_sapic *)obj->buffer.pointer;

- if ((lsapic->header.type != ACPI_MADT_TYPE_LOCAL_SAPIC) ||
- (!(lsapic->lapic_flags & ACPI_MADT_ENABLED))) {
+ if (lsapic->header.type != ACPI_MADT_TYPE_LOCAL_SAPIC) {
kfree(buffer.pointer);
return -EINVAL;
}
@@ -892,6 +892,8 @@ int acpi_map_lsapic(acpi_handle handle, int *pcpu)
acpi_map_cpu2node(handle, cpu, physid);

cpu_set(cpu, cpu_present_map);
+ if (lsapic->lapic_flags & ACPI_MADT_ENABLED)
+ cpu_set(cpu, cpu_enabled_map);
ia64_cpu_to_sapicid[cpu] = physid;

*pcpu = cpu;
diff --git a/arch/ia64/kernel/smpboot.c b/arch/ia64/kernel/smpboot.c
index d7ad42b..caa1a44 100644
--- a/arch/ia64/kernel/smpboot.c
+++ b/arch/ia64/kernel/smpboot.c
@@ -582,14 +582,18 @@ smp_build_cpu_map (void)

ia64_cpu_to_sapicid[0] = boot_cpu_id;
cpus_clear(cpu_present_map);
+ cpus_clear(cpu_enabled_map);
cpu_set(0, cpu_present_map);
cpu_set(0, cpu_possible_map);
+ cpu_set(0, cpu_enabled_map);
for (cpu = 1, i = 0; i < smp_boot_data.cpu_count; i++) {
sapicid = smp_boot_data.cpu_phys_id[i];
if (sapicid == boot_cpu_id)
continue;
cpu_set(cpu, cpu_present_map);
cpu_set(cpu, cpu_possible_map);
+ if (smp_boot_data.cpu_enabled[i])
+ cpu_set(cpu, cpu_enabled_map);
ia64_cpu_to_sapicid[cpu] = sapicid;
cpu++;
}
@@ -820,6 +824,9 @@ __cpu_up (unsigned int cpu)
if (cpu_isset(cpu, cpu_callin_map))
return -EINVAL;

+ if (!cpu_isset(cpu, cpu_enabled_map))
+ return -EINVAL;
+
per_cpu(cpu_state, cpu) = CPU_UP_PREPARE;
/* Processor goes to start_secondary(), sets online flag */
ret = do_boot_cpu(sapicid, cpu);
diff --git a/include/asm-ia64/smp.h b/include/asm-ia64/smp.h
index ec5f355..e5f60e8 100644
--- a/include/asm-ia64/smp.h
+++ b/include/asm-ia64/smp.h
@@ -55,6 +55,7 @@ extern int smp_call_function_mask(cpumask_t mask, void (*func)(void *),
extern struct smp_boot_data {
int cpu_count;
int cpu_phys_id[NR_CPUS];
+ int cpu_enabled[NR_CPUS];
} smp_boot_data __initdata;

extern char no_int_routing __devinitdata;

2008-07-15 02:38:43

by Alex Chiang

Subject: [PATCH 13/14] [IA64] Avoid overflowing ia64_cpu_to_sapicid in acpi_map_lsapic()

acpi_map_lsapic tries to stuff a long into ia64_cpu_to_sapicid[],
which can only hold ints, so let's fix that.

We need to update the signature of acpi_map_cpu2node() too.
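
A small sketch of why an int is wide enough, assuming the physical
ID here is composed the same way as in acpi_parse_lsapic(), that is
(id << 8) | eid from two 8-bit fields. make_physid() and
fits_in_int() are hypothetical helpers for illustration only.

```c
#include <limits.h>
#include <stdint.h>

/*
 * Compose a physical ID from two 8-bit fields:
 * id in bits 15..8, eid in bits 7..0.
 */
static int make_physid(uint8_t id, uint8_t eid)
{
	return (id << 8) | eid;
}

/* Would this long survive being stored in an int array slot? */
static int fits_in_int(long v)
{
	return v >= INT_MIN && v <= INT_MAX;
}
```

Under that assumption the largest possible ID is 0xFFFF, which fits
in an int, whereas an arbitrary long need not fit where long is
wider than int.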

Signed-off-by: Alex Chiang <[email protected]>
Cc: Tony Luck <[email protected]>
---

arch/ia64/kernel/acpi.c | 5 ++---
1 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/ia64/kernel/acpi.c b/arch/ia64/kernel/acpi.c
index 1b4b338..a928b94 100644
--- a/arch/ia64/kernel/acpi.c
+++ b/arch/ia64/kernel/acpi.c
@@ -775,7 +775,7 @@ int acpi_gsi_to_irq(u32 gsi, unsigned int *irq)
*/
#ifdef CONFIG_ACPI_HOTPLUG_CPU
static
-int acpi_map_cpu2node(acpi_handle handle, int cpu, long physid)
+int acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
{
#ifdef CONFIG_ACPI_NUMA
int pxm_id;
@@ -855,8 +855,7 @@ int acpi_map_lsapic(acpi_handle handle, int *pcpu)
union acpi_object *obj;
struct acpi_madt_local_sapic *lsapic;
cpumask_t tmp_map;
- long physid;
- int cpu;
+ int cpu, physid;

if (ACPI_FAILURE(acpi_evaluate_object(handle, "_MAT", NULL, &buffer)))
return -EINVAL;

2008-07-15 02:39:05

by Alex Chiang

Subject: [PATCH 01/14] Introduce cpu_enabled_map and friends

Currently, the following cpu maps exist:

cpu_possible_map - map of populatable CPUs
cpu_present_map - map of populated CPUs
cpu_online_map - map of schedulable CPUs

These maps do not provide the concept of populated, but disabled CPUs.

That is, a system may contain CPU modules that are physically plugged
in, but disabled by system firmware. The existence of this class of
CPUs breaks the following assumption in smp_init():

for_each_present_cpu(cpu) {

.../...

if (!cpu_online(cpu))
cpu_up(cpu);
}

The assumption is that the kernel should attempt cpu_up() on every
physically populated CPU, which may not be desirable for present but
disabled CPUs.

By providing cpu_enabled_map, we can keep the above [simplifying]
assumption in smp_init(), and push both the knowledge of disabled
CPUs and the decision whether to bring them up down into
arch-specific code.
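
The division of labour can be modelled in userspace roughly as
follows. This is only a sketch: arch_cpu_up() and
bring_up_present_cpus() are hypothetical stand-ins for an arch's
__cpu_up() and the smp_init() loop, and 32-bit integers stand in for
cpumask_t.

```c
#include <errno.h>
#include <stdint.h>

#define NR_CPUS 8	/* small illustrative value */

static uint32_t present_map, enabled_map, online_map;

/* Stand-in for an arch __cpu_up(): refuse CPUs firmware has disabled. */
static int arch_cpu_up(unsigned int cpu)
{
	if (!(enabled_map & (1u << cpu)))
		return -EINVAL;
	online_map |= 1u << cpu;
	return 0;
}

/*
 * Stand-in for the smp_init() loop: it still walks every present
 * CPU and knows nothing about disabled ones. Returns the number of
 * CPUs the arch code declined to bring up.
 */
static unsigned int bring_up_present_cpus(void)
{
	unsigned int cpu, failed = 0;

	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		if (!(present_map & (1u << cpu)))
			continue;
		if (!(online_map & (1u << cpu)) && arch_cpu_up(cpu) != 0)
			failed++;
	}
	return failed;
}
```

The generic loop stays untouched; only the arch hook consults the
enabled map.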

Signed-off-by: Alex Chiang <[email protected]>
---

drivers/base/cpu.c | 9 +++++++--
include/linux/cpumask.h | 31 ++++++++++++++++++++++++-------
init/main.c | 1 +
kernel/sched.c | 16 ++++++++++++----
4 files changed, 44 insertions(+), 13 deletions(-)

diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
index e38dfed..bc300df 100644
--- a/drivers/base/cpu.c
+++ b/drivers/base/cpu.c
@@ -57,7 +57,10 @@ static SYSDEV_ATTR(online, 0644, show_online, store_online);

static void __cpuinit register_cpu_control(struct cpu *cpu)
{
- sysdev_create_file(&cpu->sysdev, &attr_online);
+ int logical_cpu = cpu->sysdev.id;
+
+ if (cpu_isset(logical_cpu, cpu_enabled_map))
+ sysdev_create_file(&cpu->sysdev, &attr_online);
}
void unregister_cpu(struct cpu *cpu)
{
@@ -125,11 +128,13 @@ struct sysdev_class_attribute attr_##type##_map = \
print_cpus_func(online);
print_cpus_func(possible);
print_cpus_func(present);
+print_cpus_func(enabled);

struct sysdev_class_attribute *cpu_state_attr[] = {
&attr_online_map,
&attr_possible_map,
&attr_present_map,
+ &attr_enabled_map,
};

static int cpu_states_init(void)
@@ -172,7 +177,7 @@ int __cpuinit register_cpu(struct cpu *cpu, int num)
register_cpu_under_node(num, cpu_to_node(num));

#ifdef CONFIG_KEXEC
- if (!error)
+ if ((!error) && cpu_isset(num, cpu_enabled_map))
error = sysdev_create_file(&cpu->sysdev, &attr_crash_notes);
#endif
return error;
diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
index c24875b..bba31aa 100644
--- a/include/linux/cpumask.h
+++ b/include/linux/cpumask.h
@@ -64,16 +64,19 @@
* int num_online_cpus() Number of online CPUs
* int num_possible_cpus() Number of all possible CPUs
* int num_present_cpus() Number of present CPUs
+ * int num_enabled_cpus() Number of enabled CPUs
*
* int cpu_online(cpu) Is some cpu online?
* int cpu_possible(cpu) Is some cpu possible?
* int cpu_present(cpu) Is some cpu present (can schedule)?
+ * int cpu_enabled(cpu) Is some cpu enabled (by firmware)?
*
* int any_online_cpu(mask) First online cpu in mask
*
* for_each_possible_cpu(cpu) for-loop cpu over cpu_possible_map
* for_each_online_cpu(cpu) for-loop cpu over cpu_online_map
* for_each_present_cpu(cpu) for-loop cpu over cpu_present_map
+ * for_each_enabled_cpu(cpu) for-loop cpu over cpu_enabled_map
*
* Subtlety:
* 1) The 'type-checked' form of cpu_isset() causes gcc (3.3.2, anyway)
@@ -359,16 +362,18 @@ static inline void __cpus_fold(cpumask_t *dstp, const cpumask_t *origp,

/*
* The following particular system cpumasks and operations manage
- * possible, present and online cpus. Each of them is a fixed size
- * bitmap of size NR_CPUS.
+ * possible, present, enabled and online cpus. Each of them is a fixed
+ * size bitmap of size NR_CPUS.
*
* #ifdef CONFIG_HOTPLUG_CPU
* cpu_possible_map - has bit 'cpu' set iff cpu is populatable
* cpu_present_map - has bit 'cpu' set iff cpu is populated
+ * cpu_enabled_map - has bit 'cpu' set iff cpu is enabled by firmware
* cpu_online_map - has bit 'cpu' set iff cpu available to scheduler
* #else
* cpu_possible_map - has bit 'cpu' set iff cpu is populated
* cpu_present_map - copy of cpu_possible_map
+ * cpu_enabled_map - copy of cpu_possible_map
* cpu_online_map - has bit 'cpu' set iff cpu available to scheduler
* #endif
*
@@ -377,9 +382,10 @@ static inline void __cpus_fold(cpumask_t *dstp, const cpumask_t *origp,
* time, as the set of CPU id's that it is possible might ever
* be plugged in at anytime during the life of that system boot.
* The cpu_present_map is dynamic(*), representing which CPUs
- * are currently plugged in. And cpu_online_map is the dynamic
- * subset of cpu_present_map, indicating those CPUs available
- * for scheduling.
+ * are currently plugged in. The cpu_enabled_map is also dynamic(*),
+ * and represents CPUs both plugged in and enabled by firmware.
+ * And cpu_online_map is the dynamic subset of cpu_present_map,
+ * indicating those CPUs available for scheduling.
*
* If HOTPLUG is enabled, then cpu_possible_map is forced to have
* all NR_CPUS bits set, otherwise it is just the set of CPUs that
@@ -389,8 +395,13 @@ static inline void __cpus_fold(cpumask_t *dstp, const cpumask_t *origp,
* depending on what ACPI reports as currently plugged in, otherwise
* cpu_present_map is just a copy of cpu_possible_map.
*
- * (*) Well, cpu_present_map is dynamic in the hotplug case. If not
- * hotplug, it's a copy of cpu_possible_map, hence fixed at boot.
+ * If HOTPLUG is enabled, then cpu_enabled_map varies dynamically,
+ * depending on what ACPI reports as currently enabled by firmware,
+ * otherwise cpu_enabled_map is just a copy of cpu_possible_map.
+ *
+ * (*) Well, cpu_present_map and cpu_enabled_map are dynamic in the
+ * hotplug case. If not hotplug, they're copies of cpu_possible_map,
+ * hence fixed at boot.
*
* Subtleties:
* 1) UP arch's (NR_CPUS == 1, CONFIG_SMP not defined) hardcode
@@ -416,21 +427,26 @@ static inline void __cpus_fold(cpumask_t *dstp, const cpumask_t *origp,
extern cpumask_t cpu_possible_map;
extern cpumask_t cpu_online_map;
extern cpumask_t cpu_present_map;
+extern cpumask_t cpu_enabled_map;

#if NR_CPUS > 1
#define num_online_cpus() cpus_weight(cpu_online_map)
#define num_possible_cpus() cpus_weight(cpu_possible_map)
#define num_present_cpus() cpus_weight(cpu_present_map)
+#define num_enabled_cpus() cpus_weight(cpu_enabled_map)
#define cpu_online(cpu) cpu_isset((cpu), cpu_online_map)
#define cpu_possible(cpu) cpu_isset((cpu), cpu_possible_map)
#define cpu_present(cpu) cpu_isset((cpu), cpu_present_map)
+#define cpu_enabled(cpu) cpu_isset((cpu), cpu_enabled_map)
#else
#define num_online_cpus() 1
#define num_possible_cpus() 1
#define num_present_cpus() 1
+#define num_enabled_cpus() 1
#define cpu_online(cpu) ((cpu) == 0)
#define cpu_possible(cpu) ((cpu) == 0)
#define cpu_present(cpu) ((cpu) == 0)
+#define cpu_enabled(cpu) ((cpu) == 0)
#endif

#define cpu_is_offline(cpu) unlikely(!cpu_online(cpu))
@@ -447,5 +463,6 @@ int __any_online_cpu(const cpumask_t *mask);
#define for_each_possible_cpu(cpu) for_each_cpu_mask((cpu), cpu_possible_map)
#define for_each_online_cpu(cpu) for_each_cpu_mask((cpu), cpu_online_map)
#define for_each_present_cpu(cpu) for_each_cpu_mask((cpu), cpu_present_map)
+#define for_each_enabled_cpu(cpu) for_each_cpu_mask((cpu), cpu_enabled_map)

#endif /* __LINUX_CPUMASK_H */
diff --git a/init/main.c b/init/main.c
index f7fb200..1fe50c6 100644
--- a/init/main.c
+++ b/init/main.c
@@ -520,6 +520,7 @@ static void __init boot_cpu_init(void)
/* Mark the boot cpu "present", "online" etc for SMP and UP case */
cpu_set(cpu, cpu_online_map);
cpu_set(cpu, cpu_present_map);
+ cpu_set(cpu, cpu_enabled_map);
cpu_set(cpu, cpu_possible_map);
}

diff --git a/kernel/sched.c b/kernel/sched.c
index 4e2f603..b04eb61 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -5071,15 +5071,23 @@ asmlinkage long sys_sched_setaffinity(pid_t pid, unsigned int len,
}

/*
- * Represents all cpu's present in the system
+ * Represents all CPUs present in the system
* In systems capable of hotplug, this map could dynamically grow
- * as new cpu's are detected in the system via any platform specific
- * method, such as ACPI for e.g.
+ * as new CPUs are detected in the system via any platform specific
+ * method, such as ACPI.
*/
-
cpumask_t cpu_present_map __read_mostly;
EXPORT_SYMBOL(cpu_present_map);

+/*
+ * Represents all CPUs enabled by firmware in the system
+ * In systems capable of hotplug, this map could dynamically grow
+ * as new CPUs are detected in the system via any platform specific
+ * method, such as ACPI.
+ */
+cpumask_t cpu_enabled_map __read_mostly;
+EXPORT_SYMBOL(cpu_enabled_map);
+
#ifndef CONFIG_SMP
cpumask_t cpu_online_map __read_mostly = CPU_MASK_ALL;
EXPORT_SYMBOL(cpu_online_map);

2008-07-15 02:39:31

by Alex Chiang

Subject: [PATCH 07/14] [POWERPC] Populate cpu_enabled_map

Populate the cpu_enabled_map correctly.

Note that this patch does not actually make any decisions based
on the contents of the map.

However, as the map is presented via sysfs in:

/sys/devices/system/cpu/

it should be populated correctly.

Signed-off-by: Alex Chiang <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
---

arch/powerpc/kernel/setup-common.c | 1 +
arch/powerpc/platforms/powermac/smp.c | 1 +
arch/powerpc/platforms/pseries/hotplug-cpu.c | 2 ++
3 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c
index db540ea..a4c894a 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -413,6 +413,7 @@ void __init smp_setup_cpu_maps(void)
DBG(" thread %d -> cpu %d (hard id %d)\n",
j, cpu, intserv[j]);
cpu_set(cpu, cpu_present_map);
+ cpu_set(cpu, cpu_enabled_map);
set_hard_smp_processor_id(cpu, intserv[j]);
cpu_set(cpu, cpu_possible_map);
cpu++;
diff --git a/arch/powerpc/platforms/powermac/smp.c b/arch/powerpc/platforms/powermac/smp.c
index cb2d894..a74dada 100644
--- a/arch/powerpc/platforms/powermac/smp.c
+++ b/arch/powerpc/platforms/powermac/smp.c
@@ -317,6 +317,7 @@ static int __init smp_psurge_probe(void)
ncpus = NR_CPUS;
for (i = 1; i < ncpus ; ++i) {
cpu_set(i, cpu_present_map);
+ cpu_set(i, cpu_enabled_map);
set_hard_smp_processor_id(i, i);
}

diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c
index 1f03248..e738b07 100644
--- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
+++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
@@ -186,6 +186,7 @@ static int pseries_add_processor(struct device_node *np)
for_each_cpu_mask(cpu, tmp) {
BUG_ON(cpu_isset(cpu, cpu_present_map));
cpu_set(cpu, cpu_present_map);
+ cpu_set(cpu, cpu_enabled_map);
set_hard_smp_processor_id(cpu, *intserv++);
}
err = 0;
@@ -218,6 +219,7 @@ static void pseries_remove_processor(struct device_node *np)
continue;
BUG_ON(cpu_online(cpu));
cpu_clear(cpu, cpu_present_map);
+ cpu_clear(cpu, cpu_enabled_map);
set_hard_smp_processor_id(cpu, -1);
break;
}

2008-07-15 02:39:54

by Alex Chiang

Subject: [PATCH 14/14] ACPI: Provide /sys/devices/system/cpu/cpuN/deconfigure

Provide a new sysfs interface for CPU deconfiguration.

Since no vendors can agree on terminology for related but slightly
different features, provide a method for a platform to implement
its own version of what it thinks 'deconfiguring' a CPU might be.

Provide an HP-specific CPU deconfiguration implementation.
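
The platform override mechanism can be pictured with the userspace
sketch below. It is not the driver code, and the demo_* functions
are a made-up policy rather than the HP implementation: a generic
layer dispatches through function pointers that stay NULL until a
platform installs its own handlers.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Hooks a platform would install; NULL means "not supported". */
static int (*show_hook)(unsigned int cpu, char *buf, size_t len);
static int (*store_hook)(unsigned int cpu, const char *buf);

/* Generic entry points: fail cleanly when no platform registered. */
static int deconfigure_show(unsigned int cpu, char *buf, size_t len)
{
	if (!show_hook)
		return -ENODEV;
	return show_hook(cpu, buf, len);
}

static int deconfigure_store(unsigned int cpu, const char *buf)
{
	if (!store_hook)
		return -ENODEV;
	return store_hook(cpu, buf);
}

/* Hypothetical vendor policy: one "pending deconfigure" bit per CPU. */
static unsigned int pending_mask;

static int demo_show(unsigned int cpu, char *buf, size_t len)
{
	if (cpu >= 32)
		return -EINVAL;
	return snprintf(buf, len, "%u\n", (pending_mask >> cpu) & 1u);
}

static int demo_store(unsigned int cpu, const char *buf)
{
	if (cpu >= 32)
		return -EINVAL;
	if (buf[0] == '1')
		pending_mask |= 1u << cpu;
	else if (buf[0] == '0')
		pending_mask &= ~(1u << cpu);
	else
		return -EINVAL;
	return 0;
}

/* A platform opts in by overwriting the hooks. */
static void demo_register(void)
{
	show_hook = demo_show;
	store_hook = demo_store;
}
```

Until demo_register() runs, both entry points return -ENODEV, which
keeps the generic layer free of any vendor semantics.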

Signed-off-by: Alex Chiang <[email protected]>
Cc: Andi Kleen <[email protected]>
---

drivers/acpi/Kconfig | 18 ++
drivers/acpi/Makefile | 4
drivers/acpi/processor_core.c | 8 +
drivers/acpi/processor_deconfigure.c | 275 ++++++++++++++++++++++++++++++++++
include/acpi/processor.h | 6 +
5 files changed, 311 insertions(+), 0 deletions(-)
create mode 100644 drivers/acpi/processor_deconfigure.c

diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig
index c52fca8..36ad177 100644
--- a/drivers/acpi/Kconfig
+++ b/drivers/acpi/Kconfig
@@ -188,6 +188,24 @@ config ACPI_HOTPLUG_CPU
select ACPI_CONTAINER
default y

+config ACPI_DECONFIGURE_CPU
+ bool "Processor deconfiguration"
+ depends on ACPI_PROCESSOR
+ default n
+ help
+ This processor driver submodule allows a user to mark a CPU
+ for firmware disabling/enabling. It will create the following
+ sysfs file:
+
+ /sys/devices/system/cpu/cpuN/deconfigure
+
+ Behavior of this interface is highly vendor-dependent and
+ requires firmware support.
+
+ This option is NOT required for CPU hotplug support.
+
+ If unsure, say N.
+
config ACPI_THERMAL
tristate "Thermal Zone"
depends on ACPI_PROCESSOR
diff --git a/drivers/acpi/Makefile b/drivers/acpi/Makefile
index 40b0fca..92a5037 100644
--- a/drivers/acpi/Makefile
+++ b/drivers/acpi/Makefile
@@ -35,6 +35,10 @@ ifdef CONFIG_CPU_FREQ
processor-objs += processor_perflib.o
endif

+ifdef CONFIG_ACPI_DECONFIGURE_CPU
+processor-objs += processor_deconfigure.o
+endif
+
obj-y += sleep/
obj-y += bus.o glue.o
obj-y += scan.o
diff --git a/drivers/acpi/processor_core.c b/drivers/acpi/processor_core.c
index 9dd0fa9..ef582ca 100644
--- a/drivers/acpi/processor_core.c
+++ b/drivers/acpi/processor_core.c
@@ -1099,6 +1099,10 @@ static int __init acpi_processor_init(void)

acpi_processor_throttling_init();

+#ifdef CONFIG_ACPI_DECONFIGURE_CPU
+ acpi_processor_deconfigure_init();
+#endif
+
return 0;

out_cpuidle:
@@ -1112,6 +1116,10 @@ out_proc:

static void __exit acpi_processor_exit(void)
{
+
+#ifdef CONFIG_ACPI_DECONFIGURE_CPU
+ acpi_processor_deconfigure_exit();
+#endif
acpi_processor_ppc_exit();

acpi_thermal_cpufreq_exit();
diff --git a/drivers/acpi/processor_deconfigure.c b/drivers/acpi/processor_deconfigure.c
new file mode 100644
index 0000000..e656f97
--- /dev/null
+++ b/drivers/acpi/processor_deconfigure.c
@@ -0,0 +1,275 @@
+/*
+ * processor_deconfigure.c - CPU deconfiguration submodule of the
+ * ACPI processor driver
+ *
+ * (c) Copyright 2008 Hewlett-Packard Development Company, L.P.
+ * Alex Chiang <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ */
+
+#include <acpi/acpi.h>
+#include <acpi/acpi_drivers.h>
+#include <acpi/processor.h>
+
+static int supports_cpu_deconfigure;
+
+/*
+ * These function pointers must be overwritten by platforms supporting
+ * cpu deconfigure.
+ */
+static ssize_t (*show_deconfigure)(struct sys_device *, char *);
+static ssize_t (*store_deconfigure)(struct sys_device *, const char *, size_t);
+
+/*
+ * Under HP semantics, CPU deconfiguration is defined as removing a
+ * processor core or socket from operation at boot time, typically
+ * due to manageability concerns, such as excessive detected errors.
+ *
+ * The HP semantics of 'deconfigure' are defined as:
+ *
+ * Mark CPU for deconfiguration at next boot.
+ * # echo 1 > /sys/devices/system/cpu/cpuN/deconfigure
+ *
+ * Mark CPU as enabled at next boot.
+ * # echo 0 > /sys/devices/system/cpu/cpuN/deconfigure
+ *
+ * Display next boot's deconfigure status
+ * 0x0 - not marked for deconfiguration
+ * 0x1 - scheduled deconfig at next boot
+ * 0x3 - scheduled, OS-requested deconfig at next boot
+ * 0x4 - thread disabled by firmware
+ * # cat /sys/devices/system/cpu/cpuN/deconfigure
+ *
+ * After echo'ing 0 or 1 into deconfigure, cat'ing the file will
+ * return the next boot's status. However, the CPU will not actually
+ * be deconfigured until the next boot.
+ *
+ * Attempting to configure or deconfigure a disabled thread is disallowed.
+ */
+struct hp_deconfigure_cb_args {
+ int cpu;
+ char *method;
+};
+
+static acpi_status hp_deconfigure_cb(acpi_handle handle,
+ u32 lvl,
+ void *context,
+ void **rv)
+{
+ int cpu;
+ acpi_status status;
+ acpi_integer scfg;
+ struct hp_deconfigure_cb_args *args = context;
+ union acpi_object object = { 0 };
+ struct acpi_buffer buffer = { sizeof(union acpi_object), &object };
+
+ status = acpi_evaluate_object(handle, NULL, NULL, &buffer);
+ if (ACPI_FAILURE(status))
+ return AE_OK;
+
+ cpu = object.processor.proc_id;
+ if (cpu != args->cpu)
+ return AE_OK;
+
+ /*
+ * Always check SCFG. If this is what the user actually wanted,
+ * great, just return the answer. If the user wanted something
+ * else, check to see if they were trying to poke a disabled
+ * hardware thread and disallow it if so.
+ */
+ status = acpi_evaluate_object(handle, "SCFG", NULL, &buffer);
+ scfg = object.integer.value;
+ if (!strcmp(args->method, "SCFG"))
+ **(int **)rv = ACPI_SUCCESS(status) ? scfg : -1;
+ /*
+ * Disallow E/DCFG on disabled threads
+ */
+ else if (scfg == 0x4)
+ **(int **)rv = -1;
+ else {
+ status = acpi_evaluate_object(handle, args->method,
+ NULL, &buffer);
+ **(int **)rv = ACPI_SUCCESS(status) ? status : -1;
+ }
+
+ return AE_CTRL_TERMINATE;
+}
+
+/*
+ * We can do this the easy way or the hard way. The easy way is,
+ * if the CPU is online, we have easy access to its ACPI handle
+ * via its per_cpu() data area, and we can call SCFG directly.
+ *
+ * The hard way is when the CPU is not online, and does not have
+ * a valid per_cpu() data area. In that case, we have to walk the
+ * ACPI namespace, looking for the CPU and calling SCFG that way.
+ */
+static ssize_t hp_show_deconfigure(struct sys_device *dev, char *buf)
+{
+ int logical_cpu;
+ unsigned long cfg;
+ struct cpu *cpu = container_of(dev, struct cpu, sysdev);
+
+ logical_cpu = cpu->sysdev.id;
+
+ if (cpu_isset(logical_cpu, cpu_online_map)) {
+ unsigned long tmp;
+ acpi_status status;
+ struct acpi_processor *pr;
+
+ pr = processors[logical_cpu];
+ status = acpi_evaluate_integer(pr->handle, "SCFG", NULL, &tmp);
+ cfg = ACPI_SUCCESS(status) ? tmp : -1;
+ } else {
+ int ret;
+ void *ret_ptr = &ret;
+ struct hp_deconfigure_cb_args args;
+
+ args.cpu = logical_cpu;
+ args.method = "SCFG";
+ acpi_walk_namespace(ACPI_TYPE_PROCESSOR,
+ ACPI_ROOT_OBJECT,
+ ACPI_UINT32_MAX,
+ hp_deconfigure_cb,
+ &args,
+ (void *)&ret_ptr);
+ cfg = ret;
+ }
+
+ return sprintf(buf, "%#lx\n", cfg);
+}
+
+/*
+ * We can do this the easy way or the hard way. The easy way is,
+ * if the CPU is online, we have easy access to its ACPI handle
+ * via its per_cpu() data area, and we can call E/D-CFG directly.
+ *
+ * The hard way is when the CPU is not online, and does not have
+ * a valid per_cpu() data area. In that case, we have to walk the
+ * ACPI namespace, looking for the CPU and calling E/D-CFG that way.
+ */
+static ssize_t hp_store_deconfigure(struct sys_device *dev, const char *buf,
+ size_t count)
+{
+ ssize_t ret;
+ char *method;
+ int logical_cpu;
+ struct cpu *cpu = container_of(dev, struct cpu, sysdev);
+
+ logical_cpu = cpu->sysdev.id;
+ switch (buf[0]) {
+ case '0':
+ method = "ECFG";
+ break;
+ case '1':
+ method = "DCFG";
+ break;
+ default:
+ ret = -EINVAL;
+ goto out;
+ }
+
+ if (cpu_isset(logical_cpu, cpu_online_map)) {
+ struct acpi_processor *pr;
+ pr = processors[logical_cpu];
+ ret = acpi_evaluate_object(pr->handle, method, NULL, NULL);
+ } else {
+ int r;
+ void *ret_ptr = &r;
+ struct hp_deconfigure_cb_args args;
+
+ args.cpu = logical_cpu;
+ args.method = method;
+ acpi_walk_namespace(ACPI_TYPE_PROCESSOR,
+ ACPI_ROOT_OBJECT,
+ ACPI_UINT32_MAX,
+ hp_deconfigure_cb,
+ &args,
+ (void *)&ret_ptr);
+ ret = r;
+ }
+
+ if (ret == 0)
+ if (!strcmp(method, "ECFG"))
+ cpu_set(logical_cpu, cpu_enabled_map);
+ else
+ cpu_clear(logical_cpu, cpu_enabled_map);
+
+out:
+ if (ret >= 0)
+ ret = count;
+ return ret;
+}
+
+static int hp_check_cpu_deconfigure(const struct dmi_system_id *d)
+{
+ acpi_handle hnd;
+ struct acpi_processor *pr;
+
+ /*
+ * Operating assumption is that either all or none of the CPUs
+ * will support deconfiguration.
+ */
+ pr = processors[0];
+ if (ACPI_SUCCESS(acpi_get_handle(pr->handle, "SCFG", &hnd))) {
+ supports_cpu_deconfigure = 1;
+ show_deconfigure = hp_show_deconfigure;
+ store_deconfigure = hp_store_deconfigure;
+ }
+
+ return 0;
+}
+
+static struct dmi_system_id cpu_deconfigure_dmi_table[] __initdata = {
+ {
+ .callback = hp_check_cpu_deconfigure,
+ .ident = "Hewlett-Packard",
+ .matches = {
+ DMI_MATCH(DMI_BIOS_VENDOR, "HP"),
+ },
+ },
+ {
+ .callback = hp_check_cpu_deconfigure,
+ .ident = "Hewlett-Packard",
+ .matches = {
+ DMI_MATCH(DMI_BIOS_VENDOR, "Hewlett-Packard"),
+ },
+ },
+ {}
+};
+
+static SYSDEV_ATTR(deconfigure, 0644, NULL, NULL);
+
+void __init acpi_processor_deconfigure_init(void)
+{
+ int i;
+ struct sys_device *sysdev;
+
+ dmi_check_system(cpu_deconfigure_dmi_table);
+
+ if (supports_cpu_deconfigure) {
+ attr_deconfigure.show = show_deconfigure;
+ attr_deconfigure.store = store_deconfigure;
+
+ for_each_present_cpu(i) {
+ sysdev = get_cpu_sysdev(i);
+ sysdev_create_file(sysdev, &attr_deconfigure);
+ }
+ }
+}
+
+void acpi_processor_deconfigure_exit(void)
+{
+ int i;
+ struct sys_device *sysdev;
+
+ if (supports_cpu_deconfigure) {
+ for_each_present_cpu(i) {
+ sysdev = get_cpu_sysdev(i);
+ sysdev_remove_file(sysdev, &attr_deconfigure);
+ }
+ }
+}
diff --git a/include/acpi/processor.h b/include/acpi/processor.h
index 06ebb6e..071fd42 100644
--- a/include/acpi/processor.h
+++ b/include/acpi/processor.h
@@ -289,6 +289,12 @@ static inline void acpi_processor_ffh_cstate_enter(struct acpi_processor_cx
}
#endif

+#ifdef CONFIG_ACPI_DECONFIGURE_CPU
+/* in processor_deconfigure.c */
+void __init acpi_processor_deconfigure_init(void);
+void acpi_processor_deconfigure_exit(void);
+#endif
+
/* in processor_perflib.c */

#ifdef CONFIG_CPU_FREQ
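
For readers following the status codes documented in the comment block of processor_deconfigure.c above, a minimal userspace sketch of decoding them might look like the following. Only the four numeric values and their meanings come from the patch comment; the enum and function names are invented for illustration and are not part of the patch.

```c
/* Values taken from the patch comment; all identifiers are hypothetical. */
enum sketch_scfg_status {
	SKETCH_SCFG_NOT_MARKED      = 0x0,
	SKETCH_SCFG_SCHEDULED       = 0x1,
	SKETCH_SCFG_SCHEDULED_BY_OS = 0x3,
	SKETCH_SCFG_THREAD_DISABLED = 0x4,
};

/* Map an SCFG return value to the description used in the patch comment. */
const char *sketch_scfg_describe(unsigned long scfg)
{
	switch (scfg) {
	case SKETCH_SCFG_NOT_MARKED:
		return "not marked for deconfiguration";
	case SKETCH_SCFG_SCHEDULED:
		return "scheduled deconfig at next boot";
	case SKETCH_SCFG_SCHEDULED_BY_OS:
		return "scheduled, OS-requested deconfig at next boot";
	case SKETCH_SCFG_THREAD_DISABLED:
		return "thread disabled by firmware";
	default:
		return "unknown";
	}
}

/*
 * The namespace-walk path in the patch refuses ECFG/DCFG when SCFG
 * reports 0x4, so a write is only sensible for the other states.
 */
int sketch_scfg_write_allowed(unsigned long scfg)
{
	return scfg != SKETCH_SCFG_THREAD_DISABLED;
}
```

A management tool could call sketch_scfg_describe() on the value read from the deconfigure file and consult sketch_scfg_write_allowed() before attempting a write.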

2008-07-15 02:57:16

by Alex Chiang

Subject: [PATCH 11/14] x86: Populate cpu_enabled_map

Populate the cpu_enabled_map correctly.

Note that this patch does not actually make any decisions based
on the contents of the map.

However, as the map is presented via sysfs in:

/sys/devices/system/cpu/

It should be populated correctly.

There will be a user-visible change under the above directory.
cpuN/ entries for firmware-disabled CPUs will now appear, whereas
before, they did not due to a check against ACPI_MADT_ENABLED.

The cpuN/ entries will be empty, and the online file in the
above directory will reflect which CPUs are actually schedulable.

Signed-off-by: Alex Chiang <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: H. Peter Anvin <[email protected]>
---

arch/x86/kernel/acpi/boot.c | 7 +++++--
arch/x86/kernel/apic_32.c | 1 +
arch/x86/kernel/apic_64.c | 1 +
arch/x86/kernel/smpboot.c | 2 ++
arch/x86/mach-voyager/voyager_smp.c | 2 ++
arch/x86/xen/smp.c | 1 +
6 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index 33c5216..c6dc5da 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -559,8 +559,7 @@ static int __cpuinit _acpi_map_lsapic(acpi_handle handle, int *pcpu)

lapic = (struct acpi_madt_local_apic *)obj->buffer.pointer;

- if (lapic->header.type != ACPI_MADT_TYPE_LOCAL_APIC ||
- !(lapic->lapic_flags & ACPI_MADT_ENABLED)) {
+ if (lapic->header.type != ACPI_MADT_TYPE_LOCAL_APIC) {
kfree(buffer.pointer);
return -EINVAL;
}
@@ -584,6 +583,9 @@ static int __cpuinit _acpi_map_lsapic(acpi_handle handle, int *pcpu)
return -EINVAL;
}

cpu = first_cpu(new_map);
+
+ if (lapic->lapic_flags & ACPI_MADT_ENABLED)
+ cpu_set(cpu, cpu_enabled_map);

*pcpu = cpu;
@@ -601,6 +603,7 @@ int acpi_unmap_lsapic(int cpu)
{
per_cpu(x86_cpu_to_apicid, cpu) = -1;
cpu_clear(cpu, cpu_present_map);
+ cpu_clear(cpu, cpu_enabled_map);
num_processors--;

return (0);
diff --git a/arch/x86/kernel/apic_32.c b/arch/x86/kernel/apic_32.c
index 4b99b1b..787699e 100644
--- a/arch/x86/kernel/apic_32.c
+++ b/arch/x86/kernel/apic_32.c
@@ -1547,6 +1547,7 @@ void __cpuinit generic_processor_info(int apicid, int version)
#endif
cpu_set(cpu, cpu_possible_map);
cpu_set(cpu, cpu_present_map);
+ cpu_set(cpu, cpu_enabled_map);
}

/*
diff --git a/arch/x86/kernel/apic_64.c b/arch/x86/kernel/apic_64.c
index 0633cfd..ad21476 100644
--- a/arch/x86/kernel/apic_64.c
+++ b/arch/x86/kernel/apic_64.c
@@ -1104,6 +1104,7 @@ void __cpuinit generic_processor_info(int apicid, int version)

cpu_set(cpu, cpu_possible_map);
cpu_set(cpu, cpu_present_map);
+ cpu_set(cpu, cpu_enabled_map);
}

/*
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 3e1cece..3d378cf 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -997,6 +997,7 @@ do_rest:
cpu_clear(cpu, cpu_callout_map); /* was set by do_boot_cpu() */
cpu_clear(cpu, cpu_initialized); /* was set by cpu_init() */
cpu_clear(cpu, cpu_present_map);
+ cpu_clear(cpu, cpu_enabled_map);
per_cpu(x86_cpu_to_apicid, cpu) = BAD_APICID;
}

@@ -1086,6 +1087,7 @@ int __cpuinit native_cpu_up(unsigned int cpu)
static __init void disable_smp(void)
{
cpu_present_map = cpumask_of_cpu(0);
+ cpu_enabled_map = cpumask_of_cpu(0);
cpu_possible_map = cpumask_of_cpu(0);
#ifdef CONFIG_X86_32
smpboot_clear_io_apic_irqs();
diff --git a/arch/x86/mach-voyager/voyager_smp.c b/arch/x86/mach-voyager/voyager_smp.c
index 8acbf0c..8dee2b8 100644
--- a/arch/x86/mach-voyager/voyager_smp.c
+++ b/arch/x86/mach-voyager/voyager_smp.c
@@ -606,6 +606,7 @@ static void __init do_boot_cpu(__u8 cpu)
wmb();
cpu_set(cpu, cpu_callout_map);
cpu_set(cpu, cpu_present_map);
+ cpu_set(cpu, cpu_enabled_map);
} else {
printk("CPU%d FAILED TO BOOT: ", cpu);
if (*
@@ -1825,6 +1826,7 @@ static void __cpuinit voyager_smp_prepare_boot_cpu(void)
cpu_set(smp_processor_id(), cpu_callout_map);
cpu_set(smp_processor_id(), cpu_possible_map);
cpu_set(smp_processor_id(), cpu_present_map);
+ cpu_set(smp_processor_id(), cpu_enabled_map);
}

static int __cpuinit voyager_cpu_up(unsigned int cpu)
diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
index 94e6900..bc0a53d 100644
--- a/arch/x86/xen/smp.c
+++ b/arch/x86/xen/smp.c
@@ -209,6 +209,7 @@ void __init xen_smp_prepare_cpus(unsigned int max_cpus)
panic("failed fork for CPU %d", cpu);

cpu_set(cpu, cpu_present_map);
+ cpu_set(cpu, cpu_enabled_map);
}

//init_xenbus_allowed_cpumask();
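
The intent stated in the cover letter (every CPU described in the MADT goes into cpu_present_map, but only those flagged ACPI_MADT_ENABLED go into cpu_enabled_map) can be modelled in userspace roughly as below. This is a sketch under assumptions, not kernel code: the struct, the function names and the 64-bit bitmask are stand-ins, and it assumes fewer than 64 CPUs.

```c
#include <stdint.h>

/* Local stand-in mirroring the ACPI_MADT_ENABLED flag bit. */
#define SKETCH_MADT_ENABLED 0x1u

/* Hypothetical container for the two maps, one bit per logical CPU. */
struct sketch_cpu_maps {
	uint64_t present;
	uint64_t enabled;
};

/* Record one MADT local APIC entry. Assumes cpu < 64. */
void sketch_register_cpu(struct sketch_cpu_maps *m, unsigned int cpu,
			 uint32_t lapic_flags)
{
	/* Every described CPU is physically present. */
	m->present |= UINT64_C(1) << cpu;

	/* Only firmware-enabled CPUs are candidates for cpu_up(). */
	if (lapic_flags & SKETCH_MADT_ENABLED)
		m->enabled |= UINT64_C(1) << cpu;
}

/* Present but disabled CPUs: the complement of enabled within present. */
uint64_t sketch_disabled_cpus(const struct sketch_cpu_maps *m)
{
	return m->present & ~m->enabled;
}
```

With this model the enabled map is always a subset of the present map, which is the invariant the series relies on.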

2008-07-15 03:15:40

by Matthew Wilcox

Subject: Re: [PATCH 01/14] Introduce cpu_enabled_map and friends

On Mon, Jul 14, 2008 at 08:33:49PM -0600, Alex Chiang wrote:
> Currently, the following cpu maps exist:
>
> cpu_possible_map - map of populatable CPUs
> cpu_present_map - map of populated CPUs
> cpu_online_map - map of schedulable CPUs
>
> These maps do not provide the concept of populated, but disabled CPUs.
>
> That is, a system may contain CPU modules that are physically plugged
> in, but disabled by system firmware.

I don't understand why we want to know about these CPUs. Surely they
should be 'possible', but not 'present'? What useful thing can Linux do
with them?

--
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."

2008-07-15 05:52:00

by Benjamin Herrenschmidt

Subject: Re: [PATCH 07/14] [POWERPC] Populate cpu_enabled_map

On Mon, 2008-07-14 at 20:34 -0600, Alex Chiang wrote:
> Populate the cpu_enabled_map correctly.
>
> Note that this patch does not actually make any decisions based
> on the contents of the map.
>
> However, as the map is presented via sysfs in:
>
> /sys/devices/system/cpu/
>
> It should be populated correctly.

Care to educate me on the difference between online_map and
enabled_map ?

Cheers,
Ben.

> Signed-off-by: Alex Chiang <[email protected]>
> Cc: Paul Mackerras <[email protected]>
> Cc: Benjamin Herrenschmidt <[email protected]>
> ---
>
> arch/powerpc/kernel/setup-common.c | 1 +
> arch/powerpc/platforms/powermac/smp.c | 1 +
> arch/powerpc/platforms/pseries/hotplug-cpu.c | 2 ++
> 3 files changed, 4 insertions(+), 0 deletions(-)
>
> diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c
> index db540ea..a4c894a 100644
> --- a/arch/powerpc/kernel/setup-common.c
> +++ b/arch/powerpc/kernel/setup-common.c
> @@ -413,6 +413,7 @@ void __init smp_setup_cpu_maps(void)
> DBG(" thread %d -> cpu %d (hard id %d)\n",
> j, cpu, intserv[j]);
> cpu_set(cpu, cpu_present_map);
> + cpu_set(cpu, cpu_enabled_map);
> set_hard_smp_processor_id(cpu, intserv[j]);
> cpu_set(cpu, cpu_possible_map);
> cpu++;
> diff --git a/arch/powerpc/platforms/powermac/smp.c b/arch/powerpc/platforms/powermac/smp.c
> index cb2d894..a74dada 100644
> --- a/arch/powerpc/platforms/powermac/smp.c
> +++ b/arch/powerpc/platforms/powermac/smp.c
> @@ -317,6 +317,7 @@ static int __init smp_psurge_probe(void)
> ncpus = NR_CPUS;
> for (i = 1; i < ncpus ; ++i) {
> cpu_set(i, cpu_present_map);
> + cpu_set(i, cpu_enabled_map);
> set_hard_smp_processor_id(i, i);
> }
>
> diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c
> index 1f03248..e738b07 100644
> --- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
> +++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
> @@ -186,6 +186,7 @@ static int pseries_add_processor(struct device_node *np)
> for_each_cpu_mask(cpu, tmp) {
> BUG_ON(cpu_isset(cpu, cpu_present_map));
> cpu_set(cpu, cpu_present_map);
> + cpu_set(cpu, cpu_enabled_map);
> set_hard_smp_processor_id(cpu, *intserv++);
> }
> err = 0;
> @@ -218,6 +219,7 @@ static void pseries_remove_processor(struct device_node *np)
> continue;
> BUG_ON(cpu_online(cpu));
> cpu_clear(cpu, cpu_present_map);
> + cpu_clear(cpu, cpu_enabled_map);
> set_hard_smp_processor_id(cpu, -1);
> break;
> }

2008-07-15 10:04:16

by Andi Kleen

Subject: Re: [PATCH 01/14] Introduce cpu_enabled_map and friends

Matthew Wilcox <[email protected]> writes:
>
> I don't understand why we want to know about these CPUs. Surely they
> should be 'possible', but not 'present'? What useful thing can Linux do
> with them?

He explained it in the intro, near the end (I nearly complained about
this too when I hadn't finished reading it completely :):

|The big picture implication is that we can allow userspace
|to interact with disabled CPUs. In this particular example,
|we provide a knob that lets a sysadmin schedule any present
|CPU for firmware deconfiguration or enablement.

The reason sounds pretty exotic, but ok.

-Andi

2008-07-15 10:27:21

by Russell King

Subject: Re: [PATCH 01/14] Introduce cpu_enabled_map and friends

On Tue, Jul 15, 2008 at 12:03:27PM +0200, Andi Kleen wrote:
> Matthew Wilcox <[email protected]> writes:
> >
> > I don't understand why we want to know about these CPUs. Surely they
> > should be 'possible', but not 'present'? What useful thing can Linux do
> > with them?
>
> He explained it in the intro, near the end (I nearly complained about
> this too when I hadn't finished reading it completely :):
>
> |The big picture implication is that we can allow userspace
> |to interact with disabled CPUs. In this particular example,
> |we provide a knob that lets a sysadmin schedule any present
> |CPU for firmware deconfiguration or enablement.
>
> The reason sounds pretty exotic, but ok.

I don't see why this needs to be cross architecture then - shouldn't
the generic kernel only be concerning itself with things that are
possible, present and/or online?

If you have an interface which allows you to change the machine's
configuration in a machine-specific way, shouldn't that be something
for that machine to support, and not forced upon the entire kernel?

--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of:

2008-07-15 17:06:52

by Andi Kleen

Subject: Re: [PATCH 14/14] ACPI: Provide /sys/devices/system/cpu/cpuN/deconfigure

Alex Chiang wrote:
> Provide a new sysfs interface for CPU deconfiguration.
>
> Since no vendors can agree on terminology for related but slightly
> different features, provide a method for a platform to implement
> its own version of what it thinks 'deconfiguring' a CPU might be.
>
> Provide an HP-specific CPU deconfiguration implementation.

Why are you ccing this to linux-arch? Dropped.

What is the standard status of these new SCFG and ECFG tables? Have they
been submitted for possible inclusion in ACPI? And is there a spec
available? I can't say I'm really thrilled with having HP specific
support in there.

It would be better at least if you could reserve the table names and
then drop the HP DMI check. This is needed anyways, otherwise the
standard at some point could add different ECFG/SCFG tables.

> + * After echo'ing 0 or 1 into deconfigure, cat'ing the file will
> + * return the next boot's status. However, the CPU will not actually
> + * be deconfigured until the next boot.

Now that seems like weird semantics for a public fixed API. What happens
when some other vendor adds hot deconfiguration?

My feeling is that this seems to be overly specific to your BIOS
and might better belong into some separate management tool. At least
until we can define a nice general API for this with clear semantics.
For what systems is this anyways?

-Andi

2008-07-15 17:57:52

by Alex Chiang

Subject: Re: [PATCH 01/14] Introduce cpu_enabled_map and friends

* Russell King <[email protected]>:
> On Tue, Jul 15, 2008 at 12:03:27PM +0200, Andi Kleen wrote:
> > Matthew Wilcox <[email protected]> writes:
> > >
> > > I don't understand why we want to know about these CPUs.
> > > Surely they should be 'possible', but not 'present'? What
> > > useful thing can Linux do with them?
> >
> > He explained it in the intro, near the end (I nearly
> > complained about this too when I hadn't finished reading it
> > completely :):
> >
> > |The big picture implication is that we can allow userspace
> > |to interact with disabled CPUs. In this particular example,
> > |we provide a knob that lets a sysadmin schedule any present
> > |CPU for firmware deconfiguration or enablement.
> >
> > The reason sounds pretty exotic, but ok.
>
> I don't see why this needs to be cross architecture then -
> shouldn't the generic kernel only be concerning itself with
> things that are possible, present and/or online?

I suppose that's a fair statement. Touching all the archs for
something 'exotic' like this does seem to be a bit of an
overkill.

My thought was that big SMP systems like ia64, possibly sparc and
ppc, and increasingly, x86, might find something like this
useful, as systems get larger and larger, and vendors are going
to want to do RAS-ish features, like the ability to keep CPUs in
firmware across reboots until told otherwise by the sysadmin.

Right now, a 'present' CPU strongly implies 'online' as well,
since we're calling cpu_up() for all 'present' CPUs in
smp_init(). But this hurts if:

- you don't actually want to bring up all 'present' CPUs
- you still want to interact with these weirdo zombie
CPUs that are 'present' but not 'online'

That second item refers to creating a sysfs interface for each
'present' CPU in topology_init().

This feature puts a tax on smaller archs like arm, but maybe I
could be smarter about it by using a

#define cpu_enabled_mask cpu_online_mask

Hrm?
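
As a rough illustration of that #define idea, the userspace sketch below aliases the "enabled" mask to the "online" mask unless an architecture opts in. All identifiers here (SKETCH_ARCH_TRACKS_ENABLED, sketch_online_bits, and so on) are hypothetical stand-ins rather than kernel symbols, and the sketch assumes fewer than 32 CPUs.

```c
#include <stdint.h>

/* Stand-in for cpu_online_mask: CPUs 0 and 1 are online. */
uint32_t sketch_online_bits = 0x3;

#ifdef SKETCH_ARCH_TRACKS_ENABLED
/* An arch that cares about disabled CPUs maintains its own mask. */
uint32_t sketch_enabled_bits;
#else
/* Everyone else aliases "enabled" to "online" and pays nothing. */
#define sketch_enabled_bits sketch_online_bits
#endif

/* Returns 1 if the CPU is enabled, 0 otherwise. Assumes cpu < 32. */
int sketch_cpu_enabled(unsigned int cpu)
{
	return (sketch_enabled_bits >> cpu) & 1u;
}
```

On an arch that does not define the opt-in macro, the query simply reports the online state, so no arch code needs to be touched.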

> If you have an interface which allows you to change the
> machine's configuration in a machine-specific way, shouldn't
> that be something for that machine to support, and not forced
> upon the entire kernel?

I think that the generic kernel would be the appropriate place to
create a place for these zombie CPUs, and give the vendor specific
stuff a way to hook in.

I'd be interested in learning if any of the other 'big' archs
would have a use for something like this.

Thanks.

/ac

2008-07-15 18:16:58

by Matthew Wilcox

Subject: Re: [PATCH 01/14] Introduce cpu_enabled_map and friends

On Tue, Jul 15, 2008 at 11:57:40AM -0600, Alex Chiang wrote:
> My thought was that big SMP systems like ia64, possibly sparc and
> ppc, and increasingly, x86, might find something like this
> useful, as systems get larger and larger, and vendors are going
> to want to do RAS-ish features, like the ability to keep CPUs in
> firmware across reboots until told otherwise by the sysadmin.
>
> Right now, a 'present' CPU strongly implies 'online' as well,
> since we're calling cpu_up() for all 'present' CPUs in
> smp_init(). But this hurts if:
>
> - you don't actually want to bring up all 'present' CPUs
> - you still want to interact with these weirdo zombie
> CPUs that are 'present' but not 'online'

Have you considered simply failing __cpu_up() for CPUs that are
deconfigured by firmware?

--
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."

2008-07-15 18:40:50

by Alex Chiang

Subject: Re: [PATCH 14/14] ACPI: Provide /sys/devices/system/cpu/cpuN/deconfigure

* Andi Kleen <[email protected]>:
> Alex Chiang wrote:
>> Provide a new sysfs interface for CPU deconfiguration.
>>
>> Since no vendors can agree on terminology for related but slightly
>> different features, provide a method for a platform to implement
>> its own version of what it thinks 'deconfiguring' a CPU might be.
>>
>> Provide an HP-specific CPU deconfiguration implementation.
>
> Why are you ccing this to linux-arch? Dropped.

Hm, sorry.

I thought it would have been weird to send patches 1-13 / 14 to
linux-arch, but not send 14 / 14. Perhaps I should set a Reply-to:
in the future?

> What is the standard status of these new SCFG and ECFG tables?
> Have they been submitted for possible inclusion in ACPI? And is
> there a spec available? I can't say I'm really thrilled with
> having HP specific support in there.
>
> It would be better at least if you could reserve the table
> names and then drop the HP DMI check. This is needed anyways,
> otherwise the standard at some point could add different
> ECFG/SCFG tables.

These are not new tables -- they are methods that live underneath
processor objects in the namespace.

Yes, they are specific to HP, but because they are methods, and
because of the DMI check, they shouldn't collide with methods of
the same name defined by other vendors.

>> + * After echo'ing 0 or 1 into deconfigure, cat'ing the file will
>> + * return the next boot's status. However, the CPU will not actually
>> + * be deconfigured until the next boot.
>
> Now that seems like weird semantics for a public fixed API.
> What happens when some other vendor adds hot deconfiguration?

Yeah, I'm not totally happy with it either. But I'd like to
clarify -- are you concerned with the name of the interface
(deconfigure) or the "nothing happens until next boot" behavior?

Would it help if I renamed it to "enabled" and had something
like:

echo 0 > enabled
echo 1 > enabled
cat enabled

And then that would map to vendor-specific behavior?

Or is it really the "nothing happens until next boot" thing that
bothers you?

> My feeling is that this seems to be overly specific to your
> BIOS and might better belong into some separate management
> tool. At least until we can define a nice general API for this
> with clear semantics.

Well, life would be a lot easier if we had a generic way to poke
at ACPI methods, but dev_acpi has been rejected multiple times.
;)

On a more serious note, my fear is that an interface like this is
not going to have agreement / clear semantics for a long time
because one vendor is going to want to call it 'deconfigured' and
another one might want to call it 'disabled' and a third might
want to call it 'puppy_dogs' and they'll all be kinda related but
not exactly the same.

That's why I was going down the road of creating at least one
generic interface, but with vendor-specific semantics.

Calling it 'enabled' might be a better idea.

> For what systems is this anyways?

We have HP ia64 systems shipping today that support this.

Thanks.

/ac

2008-07-15 18:59:16

by Russell King

Subject: Re: [PATCH 01/14] Introduce cpu_enabled_map and friends

On Tue, Jul 15, 2008 at 12:16:32PM -0600, Matthew Wilcox wrote:
> On Tue, Jul 15, 2008 at 11:57:40AM -0600, Alex Chiang wrote:
> > My thought was that big SMP systems like ia64, possibly sparc and
> > ppc, and increasingly, x86, might find something like this
> > useful, as systems get larger and larger, and vendors are going
> > to want to do RAS-ish features, like the ability to keep CPUs in
> > firmware across reboots until told otherwise by the sysadmin.
> >
> > Right now, a 'present' CPU strongly implies 'online' as well,
> > since we're calling cpu_up() for all 'present' CPUs in
> > smp_init(). But this hurts if:
> >
> > - you don't actually want to bring up all 'present' CPUs
> > - you still want to interact with these weirdo zombie
> > CPUs that are 'present' but not 'online'
>
> Have you considered simply failing __cpu_up() for CPUs that are
> deconfigured by firmware?

But what if you want to have a system boot with, say, 4 CPUs and
then decide at run time to bring up another 4 CPUs when required?

How about having smp_init() call into arch code to query whether
it should bring up a not-already-online CPU? Architectures that
want to do something special can then make the decision there and
everyone else can define the test completely away.
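
A minimal userspace model of that suggestion, assuming a hypothetical hook named sketch_arch_cpu_may_boot() and plain bitmasks standing in for the kernel's CPU maps, could look like this:

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_NR_CPUS 8u

/* Stand-ins for the kernel maps, one bit per logical CPU. */
struct sketch_cpu_state {
	uint32_t present;
	uint32_t online;
	uint32_t firmware_disabled;
};

/*
 * Hypothetical arch hook. An arch with nothing special to do could
 * define this to a constant true so the test compiles away.
 */
static bool sketch_arch_cpu_may_boot(const struct sketch_cpu_state *s,
				     unsigned int cpu)
{
	return !(s->firmware_disabled & (1u << cpu));
}

/* Model of the smp_init() loop; returns how many CPUs were brought up. */
unsigned int sketch_smp_init(struct sketch_cpu_state *s)
{
	unsigned int cpu, booted = 0;

	for (cpu = 0; cpu < SKETCH_NR_CPUS; cpu++) {
		uint32_t bit = 1u << cpu;

		if (!(s->present & bit) || (s->online & bit))
			continue;	/* absent or already running */
		if (!sketch_arch_cpu_may_boot(s, cpu))
			continue;	/* arch vetoed this CPU */
		s->online |= bit;	/* stands in for cpu_up() */
		booted++;
	}
	return booted;
}
```

A CPU skipped here stays present but offline, so a later, explicit request could still bring it up at run time.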

--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of:

2008-07-15 19:15:39

by Alex Chiang

Subject: Re: [PATCH 01/14] Introduce cpu_enabled_map and friends

* Russell King <[email protected]>:
> On Tue, Jul 15, 2008 at 12:16:32PM -0600, Matthew Wilcox wrote:
> > On Tue, Jul 15, 2008 at 11:57:40AM -0600, Alex Chiang wrote:
> > > My thought was that big SMP systems like ia64, possibly sparc and
> > > ppc, and increasingly, x86, might find something like this
> > > useful, as systems get larger and larger, and vendors are going
> > > to want to do RAS-ish features, like the ability to keep CPUs in
> > > firmware across reboots until told otherwise by the sysadmin.
> > >
> > > Right now, a 'present' CPU strongly implies 'online' as well,
> > > since we're calling cpu_up() for all 'present' CPUs in
> > > smp_init(). But this hurts if:
> > >
> > > - you don't actually want to bring up all 'present' CPUs
> > > - you still want to interact with these weirdo zombie
> > > CPUs that are 'present' but not 'online'
> >
> > Have you considered simply failing __cpu_up() for CPUs that are
> > deconfigured by firmware?
>
> But what if you want to have a system boot with, say, 4 CPUs and
> then decide at run time to bring up another 4 CPUs when required?
>
> How about having smp_init() call into arch code to query whether
> it should bring up a not-already-online CPU? Architectures that
> want to do something special can then make the decision there and
> everyone else can define the test completely away.

So this is exactly what I'm doing. The ia64 patch has this hunk:

@@ -820,6 +824,9 @@ __cpu_up (unsigned int cpu)
if (cpu_isset(cpu, cpu_callin_map))
return -EINVAL;

+ if (!cpu_isset(cpu, cpu_enabled_map))
+ return -EINVAL;
+
per_cpu(cpu_state, cpu) = CPU_UP_PREPARE;
/* Processor goes to start_secondary(), sets online flag */
ret = do_boot_cpu(sapicid, cpu);

That was the easiest, most-straightforward solution I could think
of. If you have an idea for a version with lower taxes (doesn't
touch all the archs or can be #define'd out), I'm happy to hear
it.

Thanks.

/ac

2008-07-15 20:10:51

by Luck, Tony

Subject: RE: [PATCH 00/14] Introduce cpu_enabled_map and friends

> Patch 14 is the money patch. It demonstrates why we might
> want to go through all these gyrations. Now that ia64 presents
> *all* physically present CPUs in sysfs, even if they have been
> disabled by firmware, we give userspace a way to poke at those
> CPUs.

There's only the one bit for "disabled by firmware" ... no extra
space for any extra information. How would userspace know that
it was safe to poke at a disabled cpu? Perhaps firmware disabled
it for some very good reason, and poking at it could cause system
instability.

-Tony

2008-07-15 23:54:20

by Alex Chiang

Subject: Re: [PATCH 00/14] Introduce cpu_enabled_map and friends

I didn't include linux-ia64 originally. Sorry about that.

Here is the 00/14 cover email describing the patch series:

http://lkml.org/lkml/2008/7/14/468

Here is the 12/14 ia64 specific bit:

http://lkml.org/lkml/2008/7/14/478

Here is the 14/14 patch that Tony is referring to:

http://lkml.org/lkml/2008/7/14/482

* Luck, Tony <[email protected]>:
> > Patch 14 is the money patch. It demonstrates why we might
> > want to go through all these gyrations. Now that ia64 presents
> > *all* physically present CPUs in sysfs, even if they have been
> > disabled by firmware, we give userspace a way to poke at those
> > CPUs.
>
> There's only the one bit for "disabled by firmware" ... no extra
> space for any extra information. How would userspace know that
> it was safe to poke at a disabled cpu? Perhaps firmware disabled
> it for some very good reason, and poking at it could cause system
> instability.

My thought here was that it would be a vendor-specific thing. In
patch 14/14 I created:

/sys/devices/system/cpu/cpuN/deconfigure

(although /sys/devices/system/cpu/cpuN/enabled would probably be
better)

I set up 'deconfigure' to have different implementations based on
a DMI match, so it is very much opt-in (especially since it's a
Kconfig option).

It would be the responsibility of the vendor to provide something
safe to poke at. In the sample implementation I gave, nothing
happens to the system until the *next* reboot, so it shouldn't
cause the current boot any distress.

A different implementation of deconfigure/enabled could return
an error to userspace if an operation was unsafe.
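
A minimal sketch of how such a handler might refuse an unsafe request,
written as plain userspace C rather than real sysfs code. Everything here
is hypothetical: `struct toy_cpu`, `toy_deconfigure_store()`, and the
policy of refusing to deconfigure a CPU that is currently online are
assumptions made for illustration, not what patch 14/14 does.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-CPU state; not a kernel structure. */
struct toy_cpu {
    bool online;             /* currently schedulable */
    bool enabled_next_boot;  /* what firmware should do on the next boot */
};

/*
 * Model of a write to a hypothetical cpuN/deconfigure file.
 * "1" asks for the CPU to be deconfigured, "0" asks for it to be enabled.
 * Returns the number of bytes consumed, or a negative errno value.
 */
static long toy_deconfigure_store(struct toy_cpu *cpu, const char *buf,
                                  size_t count)
{
    assert(cpu != NULL && buf != NULL);

    if (count == 0 || (buf[0] != '0' && buf[0] != '1'))
        return -EINVAL;

    /* Assumed policy: deconfiguring an online CPU is treated as unsafe. */
    if (buf[0] == '1' && cpu->online)
        return -EBUSY;

    /* Only the setting for the next boot changes; the current boot is untouched. */
    cpu->enabled_next_boot = (buf[0] == '0');
    return (long)count;
}
```

With this shape, a write that the platform considers unsafe fails with an
error that userspace can see, and a successful write only records the
request for the next boot.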

/ac

2008-07-16 01:04:35

by Alex Chiang

[permalink] [raw]
Subject: Re: [PATCH 07/14] [POWERPC] Populate cpu_enabled_map

* Benjamin Herrenschmidt <[email protected]>:
> On Mon, 2008-07-14 at 20:34 -0600, Alex Chiang wrote:
> > Populate the cpu_enabled_map correctly.
> >
> > Note that this patch does not actually make any decisions based
> > on the contents of the map.
> >
> > However, as the map is presented via sysfs in:
> >
> > /sys/devices/system/cpu/
> >
> > It should be populated correctly.
>
> Care to educate me on the difference between online_map and
> enabled_map ?

enabled_map is closer conceptually to present_map than
online_map.

present_map contains CPUs that are actually plugged in.

online_map contains CPUs that have had cpu_up() called on them; i.e.,
they are schedulable.

enabled_map sits somewhere in between: it lets us describe CPUs that
are plugged in but that we don't want to cpu_up() (present, but not
enabled). On HP ia64 systems, these CPUs are disabled by system firmware.
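
The relationship between the three maps can be sketched with plain
bitmasks. This is a toy userspace model, not kernel code: `toy_cpumask`,
`struct toy_cpu_maps`, and the helper names are all made up for
illustration. The "enabled is a subset of present" rule comes from the
cover letter; "online is a subset of enabled" is an assumption that
follows from the ia64 `__cpu_up()` check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for cpumask_t: one bit per CPU, at most 64 CPUs. */
typedef uint64_t toy_cpumask;

struct toy_cpu_maps {
    toy_cpumask present;  /* physically plugged in */
    toy_cpumask enabled;  /* present and not disabled by firmware */
    toy_cpumask online;   /* cpu_up() succeeded; schedulable */
};

/* Test a single CPU's bit in a map. */
static bool toy_cpu_isset(unsigned int cpu, toy_cpumask map)
{
    assert(cpu < 64);
    return ((map >> cpu) & 1u) != 0;
}

/* Check online <= enabled <= present (as sets). */
static bool toy_maps_consistent(const struct toy_cpu_maps *m)
{
    return (m->online & ~m->enabled) == 0 &&
           (m->enabled & ~m->present) == 0;
}

/* Present but firmware-disabled CPUs: the complement of enabled within present. */
static toy_cpumask toy_disabled_cpus(const struct toy_cpu_maps *m)
{
    return m->present & ~m->enabled;
}
```

For example, with `present = 0xF`, `enabled = 0x7`, and `online = 0x3`,
CPU 3 is the only present-but-disabled CPU, and CPU 2 is enabled but has
not been brought online.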

Currently, a user can only configure/deconfigure the CPUs from
the system firmware interface. By providing a sysfs interface for
these CPUs, we can allow the user to configure/deconfigure them
from userspace. More realistically, higher level manageability
software now has an OS-level interface to interact with these
CPUs.

Might this be useful for ppc and your hypervisor based
architecture? I could imagine your hypervisor telling the kernel
about all the physically present CPUs, but then you would be able
to have finer grained control using the enabled_map.

I haven't studied your code in depth, so maybe you can just do
everything with pure online/offline, but at least on my
platforms, there are use-cases where we might want something
in-between.

Thanks.

/ac

2008-07-16 01:11:55

by Alex Chiang

[permalink] [raw]
Subject: Re: [PATCH 01/14] Introduce cpu_enabled_map and friends

* Russell King <[email protected]>:
> On Tue, Jul 15, 2008 at 12:16:32PM -0600, Matthew Wilcox wrote:
> > On Tue, Jul 15, 2008 at 11:57:40AM -0600, Alex Chiang wrote:
> > > My thought was that big SMP systems like ia64, possibly sparc and
> > > ppc, and increasingly, x86, might find something like this
> > > useful, as systems get larger and larger, and vendors are going
> > > to want to do RAS-ish features, like the ability to keep CPUs in
> > > firmware across reboots until told otherwise by the sysadmin.
> > >
> > > Right now, a 'present' CPU strongly implies 'online' as well,
> > > since we're calling cpu_up() for all 'present' CPUs in
> > > smp_init(). But this hurts if:
> > >
> > > - you don't actually want to bring up all 'present' CPUs
> > > - you still want to interact with these weirdo zombie
> > > CPUs that are 'present' but not 'online'
> >
> > Have you considered simply failing __cpu_up() for CPUs that are
> > deconfigured by firmware?
>
> But what if you want to have a system boot with, say, 4 CPUs and
> then decide at run time to bring up another 4 CPUs when required?
>
> How about having smp_init() call into arch code to query whether
> it should bring up a not-already-online CPU? Architectures that
> want to do something special can then make the decision there and
> everyone else can define the test completely away.

I experimented today with an ia64-only solution, keeping track of
'present' vs 'enabled' vs 'online' all in arch-specific code.

The arch-specific stuff turns out to be more or less a wash; that
is, it's not too hard to keep it all in ia64.

However, the problem is, I would still need a generic
'enabled_map' to control whether 'online' and 'crash_notes'
entries get created for /sys/devices/system/cpu/cpuN/.

So if other archs are at least neutral on this class of CPUs, I
can work on another patchset that lowers the tax to a simple
#define for archs that don't care.

But if people hate this idea of a new map, I'd like to know so
that I'm not wasting my time and can work on a different solution
(what that would be, I have no idea at the moment).

Thanks.

/ac

2008-07-18 20:01:13

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH 11/14] x86: Populate cpu_enabled_map

Alex Chiang wrote:
> Populate the cpu_enabled_map correctly.
>
> Note that this patch does not actually make any decisions based
> on the contents of the map.
>
> However, as the map is presented via sysfs in:
>
> /sys/devices/system/cpu/
>
> It should be populated correctly.
>
> There will be a user-visible change under the above directory.
> cpuN/ entries for firmware-disabled CPUs will now appear, whereas
> before, they did not due to a check against ACPI_MADT_ENABLED.
>
> The cpuN/ entries will be empty, and the online file in the
> above directory will reflect which CPUs are actually schedulable.
>
> Signed-off-by: Alex Chiang <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: H. Peter Anvin <[email protected]>

From an x86 standpoint this patchset seems reasonable to me.

Acked-by: H. Peter Anvin <[email protected]>

Since it is a panarch patchset it presumably should go via -mm rather
than in the arch trees, so I'm not going to add it to -tip.

Obviously, if the semantics of the operations don't make sense for other
architectures -- which I will leave up to the affected maintainers --
then whether the generic operations can be done better should be
carefully considered.

-hpa

2008-07-18 21:56:14

by Russell King

[permalink] [raw]
Subject: Re: [PATCH 01/14] Introduce cpu_enabled_map and friends

On Tue, Jul 15, 2008 at 01:15:15PM -0600, Alex Chiang wrote:
> * Russell King <[email protected]>:
> > On Tue, Jul 15, 2008 at 12:16:32PM -0600, Matthew Wilcox wrote:
> > > On Tue, Jul 15, 2008 at 11:57:40AM -0600, Alex Chiang wrote:
> > > > My thought was that big SMP systems like ia64, possibly sparc and
> > > > ppc, and increasingly, x86, might find something like this
> > > > useful, as systems get larger and larger, and vendors are going
> > > > to want to do RAS-ish features, like the ability to keep CPUs in
> > > > firmware across reboots until told otherwise by the sysadmin.
> > > >
> > > > Right now, a 'present' CPU strongly implies 'online' as well,
> > > > since we're calling cpu_up() for all 'present' CPUs in
> > > > smp_init(). But this hurts if:
> > > >
> > > > - you don't actually want to bring up all 'present' CPUs
> > > > - you still want to interact with these weirdo zombie
> > > > CPUs that are 'present' but not 'online'
> > >
> > > Have you considered simply failing __cpu_up() for CPUs that are
> > > deconfigured by firmware?
> >
> > But what if you want to have a system boot with, say, 4 CPUs and
> > then decide at run time to bring up another 4 CPUs when required?
> >
> > How about having smp_init() call into arch code to query whether
> > it should bring up a not-already-online CPU? Architectures that
> > want to do something special can then make the decision there and
> > everyone else can define the test completely away.
>
> So this is exactly what I'm doing. The ia64 patch has this hunk:
>
> @@ -820,6 +824,9 @@ __cpu_up (unsigned int cpu)
>          if (cpu_isset(cpu, cpu_callin_map))
>                  return -EINVAL;
>
> +        if (!cpu_isset(cpu, cpu_enabled_map))
> +                return -EINVAL;
> +
>          per_cpu(cpu_state, cpu) = CPU_UP_PREPARE;
>          /* Processor goes to start_secondary(), sets online flag */
>          ret = do_boot_cpu(sapicid, cpu);
>
> That was the easiest, most-straightforward solution I could think
> of. If you have an idea for a version with lower taxes (doesn't
> touch all the archs or can be #define'd out), I'm happy to hear
> it.

I think I did make a suggestion in the bit you quote from me above.

Let me be more explicit:

 static void __init smp_init(void)
 {
         unsigned int cpu;

         /* FIXME: This should be done in userspace --RR */
         for_each_present_cpu(cpu) {
                 if (num_online_cpus() >= setup_max_cpus)
                         break;
-                if (!cpu_online(cpu))
+                if (smp_cpu_enabled(cpu) && !cpu_online(cpu))
                         cpu_up(cpu);
         }

         /* Any cleanup work */
         printk(KERN_INFO "Brought up %ld CPUs\n", (long)num_online_cpus());
         smp_cpus_done(setup_max_cpus);
 }

and have architectures provide 'smp_cpu_enabled(cpu)' which can either
be a function, inline function or a macro (and therefore possible to be
completely eliminated.)
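
One way the "function, inline function or macro" choice could look is
sketched below as a self-contained userspace model. Only the name
`smp_cpu_enabled()` comes from the suggestion above; the `TOY_*` names,
the boolean arrays, `toy_cpu_up()`, and `toy_smp_init()` are hypothetical
stand-ins for the real cpumasks and boot code.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_CPUS 8

/* Remove this define to model an arch that does not care. */
#define TOY_ARCH_HAS_CPU_ENABLED 1

static bool toy_present[TOY_NR_CPUS];
static bool toy_enabled[TOY_NR_CPUS];
static bool toy_online[TOY_NR_CPUS];

#ifdef TOY_ARCH_HAS_CPU_ENABLED
/* An arch that cares consults its own notion of "enabled". */
static inline bool smp_cpu_enabled(unsigned int cpu)
{
    return toy_enabled[cpu];
}
#else
/* Everyone else gets a constant the compiler can fold away. */
#define smp_cpu_enabled(cpu) (true)
#endif

/* Model of cpu_up(): note that it does not look at the enabled state. */
static int toy_cpu_up(unsigned int cpu)
{
    assert(cpu < TOY_NR_CPUS);
    if (!toy_present[cpu] || toy_online[cpu])
        return -1;
    toy_online[cpu] = true;
    return 0;
}

/* Model of the smp_init() loop; returns how many CPUs it brought up. */
static unsigned int toy_smp_init(void)
{
    unsigned int cpu, brought_up = 0;

    for (cpu = 0; cpu < TOY_NR_CPUS; cpu++) {
        if (!toy_present[cpu])
            continue;
        if (smp_cpu_enabled(cpu) && !toy_online[cpu]) {
            if (toy_cpu_up(cpu) == 0)
                brought_up++;
        }
    }
    return brought_up;
}
```

Because the test lives in the boot-time loop rather than in the
bring-up path itself, a CPU that was skipped at boot can still be
brought up later by an explicit `toy_cpu_up()` call in this model, which
matches the "boot with 4, add 4 more at run time" case raised earlier in
the thread.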

--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of:

2008-07-18 23:06:54

by Alex Chiang

[permalink] [raw]
Subject: Re: [PATCH 11/14] x86: Populate cpu_enabled_map

* H. Peter Anvin <[email protected]>:
> Alex Chiang wrote:
>> Populate the cpu_enabled_map correctly.
>>
>> Note that this patch does not actually make any decisions based
>> on the contents of the map.
>>
>> However, as the map is presented via sysfs in:
>>
>> /sys/devices/system/cpu/
>>
>> It should be populated correctly.
>>
>> There will be a user-visible change under the above directory.
>> cpuN/ entries for firmware-disabled CPUs will now appear, whereas
>> before, they did not due to a check against ACPI_MADT_ENABLED.
>>
>> The cpuN/ entries will be empty, and the online file in the
>> above directory will reflect which CPUs are actually schedulable.
>>
>> Signed-off-by: Alex Chiang <[email protected]>
>> Cc: Ingo Molnar <[email protected]>
>> Cc: Thomas Gleixner <[email protected]>
>> Cc: H. Peter Anvin <[email protected]>
>
> From an x86 standpoint this patchset seems reasonable to me.
>
> Acked-by: H. Peter Anvin <[email protected]>

Thanks Peter. Let me try and rework the patchset according to
Russell's suggestion here:

http://lkml.org/lkml/2008/7/18/467

That approach seems cleaner to me.

> Obviously, if the semantics of the operations don't make sense
> for other architectures -- which I will leave up to the
> affected maintainers -- then whether the generic operations
> can be done better should be carefully considered.

Russell's solution avoids the issue with the ability to #define
the check away for archs that don't care.

cheers,

/ac

2008-07-18 23:08:47

by Alex Chiang

[permalink] [raw]
Subject: Re: [PATCH 01/14] Introduce cpu_enabled_map and friends

* Russell King <[email protected]>:
> On Tue, Jul 15, 2008 at 01:15:15PM -0600, Alex Chiang wrote:
> > * Russell King <[email protected]>:
> > >
> > > How about having smp_init() call into arch code to query whether
> > > it should bring up a not-already-online CPU? Architectures that
> > > want to do something special can then make the decision there and
> > > everyone else can define the test completely away.
> >
> > So this is exactly what I'm doing. The ia64 patch has this hunk:
> >
> > @@ -820,6 +824,9 @@ __cpu_up (unsigned int cpu)
> >          if (cpu_isset(cpu, cpu_callin_map))
> >                  return -EINVAL;
> >
> > +        if (!cpu_isset(cpu, cpu_enabled_map))
> > +                return -EINVAL;
> > +
> >          per_cpu(cpu_state, cpu) = CPU_UP_PREPARE;
> >          /* Processor goes to start_secondary(), sets online flag */
> >          ret = do_boot_cpu(sapicid, cpu);
> >
> > That was the easiest, most-straightforward solution I could think
> > of. If you have an idea for a version with lower taxes (doesn't
> > touch all the archs or can be #define'd out), I'm happy to hear
> > it.
>
> I think I did make a suggestion in the bit you quote from me above.
>
> Let me be more explicit:

Thanks, sorry for being dense.

>  static void __init smp_init(void)
>  {
>          unsigned int cpu;
>
>          /* FIXME: This should be done in userspace --RR */
>          for_each_present_cpu(cpu) {
>                  if (num_online_cpus() >= setup_max_cpus)
>                          break;
> -                if (!cpu_online(cpu))
> +                if (smp_cpu_enabled(cpu) && !cpu_online(cpu))
>                          cpu_up(cpu);
>          }
>
>          /* Any cleanup work */
>          printk(KERN_INFO "Brought up %ld CPUs\n", (long)num_online_cpus());
>          smp_cpus_done(setup_max_cpus);
>  }
>
> and have architectures provide 'smp_cpu_enabled(cpu)' which can either
> be a function, inline function or a macro (and therefore possible to be
> completely eliminated.)

Yup, this is nicer. I'll try this.

/ac