2023-12-13 12:48:12

by Russell King (Oracle)

Subject: [RFC PATCH v3 00/21] ACPI/arm64: add support for virtual cpu hotplug

Hi,

These are the remaining patches for ARM64 virtual cpu hotplug, which
follow on from the previous set of 21 patches that GregKH has
recently queued up, and "x86: intel_epb: Don't rely on link order",
which can be found at:

https://lore.kernel.org/r/[email protected]
https://lore.kernel.org/r/ZVyz/[email protected]

The entire series can be found at:

git://git.armlinux.org.uk/~rmk/linux-arm.git aarch64/hotplug-vcpu/head

The original cover message from the entire series is below the
diffstat.

Documentation/arch/arm64/cpu-hotplug.rst | 79 ++++++++++++++++
Documentation/arch/arm64/index.rst | 1 +
arch/arm64/include/asm/acpi.h | 11 +++
arch/arm64/kernel/acpi_numa.c | 11 ---
arch/arm64/kernel/psci.c | 2 +-
arch/arm64/kernel/smp.c | 3 +-
arch/loongarch/Kconfig | 2 +-
arch/loongarch/configs/loongson3_defconfig | 2 +-
arch/loongarch/kernel/acpi.c | 4 +-
arch/x86/Kconfig | 3 +-
arch/x86/kernel/acpi/boot.c | 4 +-
drivers/acpi/Kconfig | 13 ++-
drivers/acpi/acpi_processor.c | 141 ++++++++++++++++++++++++++---
drivers/acpi/bus.c | 16 ++++
drivers/acpi/device_pm.c | 2 +-
drivers/acpi/device_sysfs.c | 2 +-
drivers/acpi/internal.h | 1 -
drivers/acpi/property.c | 2 +-
drivers/acpi/scan.c | 140 ++++++++++++++++++----------
drivers/base/cpu.c | 16 +++-
drivers/irqchip/irq-gic-v3.c | 32 ++++---
include/acpi/acpi_bus.h | 1 +
include/acpi/actbl2.h | 1 +
include/linux/acpi.h | 10 +-
include/linux/cpumask.h | 25 +++++
kernel/cpu.c | 3 +
26 files changed, 421 insertions(+), 106 deletions(-)

On Tue, Oct 24, 2023 at 04:15:28PM +0100, Russell King (Oracle) wrote:
> Hi,
>
> I'm posting James' patch set updated with most of the review comments
> from his RFC v2 series back in September. Individual patches have a
> changelog attached at the bottom of the commit message. Those which
> I have finished updating have my S-o-b on them, those which still have
> outstanding review comments from RFC v2 do not. In some of these cases
> I've asked questions and am waiting for responses.
>
> I'm posting this as RFC v3 because there's still some unaddressed
> comments and it's clearly not ready for merging. Even if it was ready
> to be merged, it is too late in this development cycle to be taking
> this change in, so there would be little point posting it non-RFC.
> Also James stated that he's waiting for confirmation from the
> Kubernetes/Kata folk - I have no idea what the status is there.
>
> I will be sending each patch individually to a wider audience
> appropriate for that patch - apologies to those missing out on this
> cover message. I have added more mailing lists to the series, with the
> exception of the acpica list, in the hope of this cover message also
> reaching those folk.
>
> The changes that aren't included are:
>
> 1. Updates for my patch that was merged via Thomas (thanks!):
> c4dd854f740c cpu-hotplug: Provide prototypes for arch CPU registration
> rather than having this change spread through James' patches.
>
> 2. New patch - simplification of PA-RISC's smp_prepare_boot_cpu()
>
> 3. Moved "ACPI: Use the acpi_device_is_present() helper in more places"
> and "ACPI: Rename acpi_scan_device_not_present() to be about
> enumeration" to the beginning of the series - these two patches are
> already queued up for merging into 6.7.
>
> 4. Moved "arm64, irqchip/gic-v3, ACPI: Move MADT GICC enabled check into
> a helper" to the beginning of the series, which has been submitted,
> but as yet the fate of that posting isn't known.
>
> The first four patches in this series are provided for completeness only.
>
> There is an additional patch in James' git tree that isn't in the set
> of patches that James posted: "ACPI: processor: Only call
> arch_unregister_cpu() if HOTPLUG_CPU is selected" which looks to me to
> be a workaround for arch_unregister_cpu() being under the ifdef. I've
> commented on this on the RFC v2 posting making a suggestion, but as yet
> haven't had any response.
>
> I've included almost all of James' original covering body below the
> diffstat.
>
> The reason that I'm doing this is to help move this code forward so
> hopefully it can be merged - which is why I have been keen to dig out
> from James' patches anything that can be merged and submit it
> separately, since this is a feature for which some users have a
> definite need.
>
> Please note that I haven't tested this beyond building for aarch64 at
> the present time.
>
> The series can be found at:
>
> git://git.armlinux.org.uk/~rmk/linux-arm.git aarch64/hotplug-vcpu/v6.6-rc7
>
> Documentation/arch/arm64/cpu-hotplug.rst | 79 +++++++++++++++
> Documentation/arch/arm64/index.rst | 1 +
> arch/arm64/Kconfig | 1 +
> arch/arm64/include/asm/acpi.h | 11 +++
> arch/arm64/include/asm/cpu.h | 1 -
> arch/arm64/kernel/acpi_numa.c | 11 ---
> arch/arm64/kernel/psci.c | 2 +-
> arch/arm64/kernel/setup.c | 13 +--
> arch/arm64/kernel/smp.c | 5 +-
> arch/ia64/Kconfig | 3 +
> arch/ia64/include/asm/acpi.h | 2 +-
> arch/ia64/include/asm/cpu.h | 6 --
> arch/ia64/kernel/acpi.c | 6 +-
> arch/ia64/kernel/setup.c | 2 +-
> arch/ia64/kernel/topology.c | 35 +------
> arch/loongarch/Kconfig | 2 +
> arch/loongarch/configs/loongson3_defconfig | 2 +-
> arch/loongarch/kernel/acpi.c | 4 +-
> arch/loongarch/kernel/topology.c | 38 +-------
> arch/parisc/kernel/smp.c | 8 +-
> arch/riscv/Kconfig | 1 +
> arch/riscv/kernel/setup.c | 19 +---
> arch/x86/Kconfig | 3 +
> arch/x86/include/asm/cpu.h | 4 -
> arch/x86/kernel/acpi/boot.c | 4 +-
> arch/x86/kernel/cpu/intel_epb.c | 2 +-
> arch/x86/kernel/topology.c | 27 +-----
> drivers/acpi/Kconfig | 14 ++-
> drivers/acpi/acpi_processor.c | 151 +++++++++++++++++++++++------
> drivers/acpi/bus.c | 16 +++
> drivers/acpi/device_pm.c | 2 +-
> drivers/acpi/device_sysfs.c | 2 +-
> drivers/acpi/internal.h | 1 -
> drivers/acpi/processor_core.c | 2 +-
> drivers/acpi/property.c | 2 +-
> drivers/acpi/scan.c | 148 ++++++++++++++++++----------
> drivers/base/arch_topology.c | 38 +++++---
> drivers/base/cpu.c | 44 +++++++--
> drivers/base/init.c | 2 +-
> drivers/base/node.c | 7 --
> drivers/firmware/psci/psci.c | 2 +
> drivers/irqchip/irq-gic-v3.c | 38 +++++---
> include/acpi/acpi_bus.h | 1 +
> include/acpi/actbl2.h | 1 +
> include/linux/acpi.h | 13 ++-
> include/linux/cpu.h | 4 +
> include/linux/cpumask.h | 25 +++++
> kernel/cpu.c | 3 +
> 48 files changed, 516 insertions(+), 292 deletions(-)
>
>
> On Wed, Sep 13, 2023 at 04:37:48PM +0000, James Morse wrote:
> > Hello!
> >
> > Changes since RFC-v1:
> > * riscv is new, ia64 is gone
> > * The KVM support is different, and upstream - no need to patch the host.
> >
> > ---
> >
> > This series adds what looks like cpuhotplug support to arm64 for use in
> > virtual machines. It does this by moving the cpu_register() calls for
> > architectures that support ACPI out of the arch code by using
> > GENERIC_CPU_DEVICES, then into the ACPI processor driver.
> >
> > The kubernetes folk really want to be able to add CPUs to an existing VM,
> > in exactly the same way they do on x86. The use-case is pre-booting guests
> > with one CPU, then adding the number that were actually needed when the
> > workload is provisioned.
> >
> > Wait? Doesn't arm64 support cpuhotplug already!?
> > In the arm world, cpuhotplug gets used to mean removing the power from a CPU.
> > The CPU is offline, and remains present. For x86, and ACPI, cpuhotplug
> > has the additional step of physically removing the CPU, so that it isn't
> > present anymore.
> >
> > Arm64 doesn't support this, and can't support it: CPUs are really a slice
> > of the SoC, and there is not enough information in the existing ACPI tables
> > to describe which bits of the slice also got removed. Without a reference
> > machine: adding this support to the spec is a wild goose chase.
> >
> > Critically: everything described in the firmware tables must remain present.
> >
> > For a virtual machine this is easy as all the other bits of 'virtual SoC'
> > are emulated, so they can (and do) remain present when a vCPU is 'removed'.
> >
> > On a system that supports cpuhotplug the MADT has to describe every possible
> > CPU at boot. Under KVM, the vGIC needs to know about every possible vCPU before
> > the guest is started.
> > With these constraints, virtual-cpuhotplug is really just a hypervisor/firmware
> > policy about which CPUs can be brought online.
> >
> > This series adds support for virtual-cpuhotplug as exactly that: firmware
> > policy. This may even work on a physical machine too; for a guest the part of
> > firmware is played by the VMM (typically Qemu).
> >
> > PSCI support is modified to return 'DENIED' if the CPU can't be brought
> > online/enabled yet. The CPU object's _STA method's enabled bit is used to
> > indicate firmware's current disposition. If the CPU has its enabled bit clear,
> > it will not be registered with sysfs, and attempts to bring it online will
> > fail. The notifications that _STA has changed its value then work in the same
> > way as physical hotplug, and firmware can cause the CPU to be registered some
> > time later, allowing it to be brought online.
> >
> > This creates something that looks like cpuhotplug to user-space, as the sysfs
> > files appear and disappear, and the udev notifications look the same.
> >
> > One notable difference is the CPU present mask, which is exposed via sysfs.
> > Because the CPUs remain present throughout, they can still be seen in that mask.
> > This value does get used by web browsers to estimate the number of CPUs
> > as the CPU online mask is constantly changed on mobile phones.
> >
> > Linux is tolerant of PSCI returning errors, as it's always been allowed to do
> > that. To avoid confusing an OS that can't tolerate this, we needed an additional
> > bit in the MADT GICC flags. This series copies ACPI_MADT_ONLINE_CAPABLE, which
> > appears to be for this purpose, but calls it ACPI_MADT_GICC_CPU_CAPABLE as it
> > has a different bit position in the GICC.
> >
> > This code is unconditionally enabled for all ACPI architectures.
> > If there are problems with firmware tables on some devices, the CPUs will
> > already be online by the time the acpi_processor_make_enabled() is called.
> > A mismatch here causes a firmware-bug message and kernel taint. This should
> > only affect people with broken firmware who also boot with maxcpus=1, and
> > bring CPUs online later.
> >
> > I had a go at switching the remaining architectures over to GENERIC_CPU_DEVICES,
> > so that the Kconfig symbol can be removed, but I got stuck with powerpc
> > and s390.
> >
> > I've only build tested Loongarch and riscv. I've removed the ia64 specific
> > patches, but left the changes in other patches to make git-grep review of
> > renames easier.
> >
> > If folk want to play along at home, you'll need a copy of Qemu that supports this.
> > https://github.com/salil-mehta/qemu.git salil/virt-cpuhp-armv8/rfc-v2-rc6
> >
> > Replace your '-smp' argument with something like:
> > | -smp cpus=1,maxcpus=3,cores=3,threads=1,sockets=1
> >
> > then feed the following to the Qemu monitor:
> > | (qemu) device_add driver=host-arm-cpu,core-id=1,id=cpu1
> > | (qemu) device_del cpu1
> >
> >
> > Why is this still an RFC? I'm still looking for confirmation from the
> > kubernetes/kata folk that this works for them. Because of this I've culled
> > the CC list...
> >
> >
> > This series is based on v6.6-rc1, and can be retrieved from:
> > https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git/ virtual_cpu_hotplug/rfc/v2
> >
> >
> > Thanks,
> >
> > James Morse (34):
> > ACPI: Move ACPI_HOTPLUG_CPU to be disabled on arm64 and riscv
> > drivers: base: Use present CPUs in GENERIC_CPU_DEVICES
> > drivers: base: Allow parts of GENERIC_CPU_DEVICES to be overridden
> > drivers: base: Move cpu_dev_init() after node_dev_init()
> > drivers: base: Print a warning instead of panic() when register_cpu()
> > fails
> > arm64: setup: Switch over to GENERIC_CPU_DEVICES using
> > arch_register_cpu()
> > x86: intel_epb: Don't rely on link order
> > x86/topology: Switch over to GENERIC_CPU_DEVICES
> > LoongArch: Switch over to GENERIC_CPU_DEVICES
> > riscv: Switch over to GENERIC_CPU_DEVICES
> > arch_topology: Make register_cpu_capacity_sysctl() tolerant to late
> > CPUs
> > ACPI: Use the acpi_device_is_present() helper in more places
> > ACPI: Rename acpi_scan_device_not_present() to be about enumeration
> > ACPI: Only enumerate enabled (or functional) devices
> > ACPI: processor: Add support for processors described as container
> > packages
> > ACPI: processor: Register CPUs that are online, but not described in
> > the DSDT
> > ACPI: processor: Register all CPUs from acpi_processor_get_info()
> > ACPI: Rename ACPI_HOTPLUG_CPU to include 'present'
> > ACPI: Move acpi_bus_trim_one() before acpi_scan_hot_remove()
> > ACPI: Rename acpi_processor_hotadd_init and remove pre-processor
> > guards
> > ACPI: Add post_eject to struct acpi_scan_handler for cpu hotplug
> > ACPI: Check _STA present bit before making CPUs not present
> > ACPI: Warn when the present bit changes but the feature is not enabled
> > drivers: base: Implement weak arch_unregister_cpu()
> > LoongArch: Use the __weak version of arch_unregister_cpu()
> > arm64: acpi: Move get_cpu_for_acpi_id() to a header
> > ACPICA: Add new MADT GICC flags fields [code first?]
> > arm64, irqchip/gic-v3, ACPI: Move MADT GICC enabled check into a
> > helper
> > irqchip/gic-v3: Don't return errors from gic_acpi_match_gicc()
> > irqchip/gic-v3: Add support for ACPI's disabled but 'online capable'
> > CPUs
> > ACPI: add support to register CPUs based on the _STA enabled bit
> > arm64: document virtual CPU hotplug's expectations
> > ACPI: Add _OSC bits to advertise OS support for toggling CPU
> > present/enabled
> > cpumask: Add enabled cpumask for present CPUs that can be brought
> > online
> >
> > Jean-Philippe Brucker (1):
> > arm64: psci: Ignore DENIED CPUs
>
> --
> RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
> FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!
>

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!
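
For readers who want the firmware-policy idea from the quoted cover text in
a compact form, here is a minimal userspace sketch. It is not kernel code
and not part of the series. The struct, the model_* function names and the
errno values returned are all assumptions made for illustration; only the
PSCI DENIED return value (-3) is taken from the PSCI specification. The
vCPU is treated as present throughout, so the model only tracks the
firmware's enabled policy and the OS's registered view.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* PSCI return values as defined by the PSCI specification. */
#define PSCI_RET_SUCCESS 0
#define PSCI_RET_DENIED  (-3)

/* Hypothetical model of one vCPU that is present for the whole run. */
struct model_vcpu {
	bool fw_enabled;    /* firmware/VMM policy, i.e. the _STA enabled bit */
	bool os_registered; /* OS view: the CPU device exists in sysfs */
	bool online;
};

/* Firmware side: refuse CPU_ON while policy says the vCPU is disabled. */
int model_psci_cpu_on(const struct model_vcpu *c)
{
	assert(c != NULL);
	return c->fw_enabled ? PSCI_RET_SUCCESS : PSCI_RET_DENIED;
}

/*
 * OS side: a notification that _STA changed. The OS re-reads the enabled
 * bit and registers or unregisters the CPU to match. Clearing 'online'
 * here is a simplification of the real offline-then-remove sequence.
 */
void model_sta_notify(struct model_vcpu *c)
{
	assert(c != NULL);
	c->os_registered = c->fw_enabled;
	if (!c->os_registered)
		c->online = false;
}

/* OS side: an attempt to bring the CPU online. */
int model_cpu_up(struct model_vcpu *c)
{
	assert(c != NULL);
	if (!c->os_registered)
		return -ENODEV; /* no sysfs entry, nothing to online */
	if (model_psci_cpu_on(c) == PSCI_RET_DENIED)
		return -EPERM;  /* firmware policy says "not yet" */
	c->online = true;
	return 0;
}
```

The point of keeping fw_enabled and os_registered separate is that the OS
only learns about a policy change through the notification, so an online
attempt can fail either because the CPU was never registered or because
firmware denies the request.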


2023-12-13 12:49:50

by Russell King (Oracle)

Subject: [PATCH RFC v3 04/21] ACPI: processor: Register all CPUs from acpi_processor_get_info()

From: James Morse <[email protected]>

To allow ACPI to skip the call to arch_register_cpu() when the _STA
value indicates the CPU can't be brought online right now, move the
arch_register_cpu() call into acpi_processor_get_info().

Systems can still be booted with 'acpi=off', or not include an
ACPI description at all. For these, the CPUs continue to be
registered by cpu_dev_register_generic().

This moves the CPU register logic back to a subsys_initcall(),
while the memory nodes will have been registered earlier.

Signed-off-by: James Morse <[email protected]>
Reviewed-by: Gavin Shan <[email protected]>
Tested-by: Miguel Luis <[email protected]>
Tested-by: Vishnu Pajjuri <[email protected]>
Tested-by: Jianyong Wu <[email protected]>
Signed-off-by: Russell King (Oracle) <[email protected]>
---
Changes since RFC v2:
* Fixup comment in acpi_processor_get_info() (Gavin Shan)
* Add comment in cpu_dev_register_generic() (Gavin Shan)
---
drivers/acpi/acpi_processor.c | 12 ++++++++++++
drivers/base/cpu.c | 6 +++++-
2 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
index 0511f2bc10bc..e7ed4730cbbe 100644
--- a/drivers/acpi/acpi_processor.c
+++ b/drivers/acpi/acpi_processor.c
@@ -314,6 +314,18 @@ static int acpi_processor_get_info(struct acpi_device *device)
cpufreq_add_device("acpi-cpufreq");
}

+ /*
+ * Register CPUs that are present. get_cpu_device() is used to skip
+ * duplicate CPU descriptions from firmware.
+ */
+ if (!invalid_logical_cpuid(pr->id) && cpu_present(pr->id) &&
+ !get_cpu_device(pr->id)) {
+ int ret = arch_register_cpu(pr->id);
+
+ if (ret)
+ return ret;
+ }
+
/*
* Extra Processor objects may be enumerated on MP systems with
* less than the max # of CPUs. They should be ignored _iff
diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
index 47de0f140ba6..13d052bf13f4 100644
--- a/drivers/base/cpu.c
+++ b/drivers/base/cpu.c
@@ -553,7 +553,11 @@ static void __init cpu_dev_register_generic(void)
{
int i, ret;

- if (!IS_ENABLED(CONFIG_GENERIC_CPU_DEVICES))
+ /*
+ * When ACPI is enabled, CPUs are registered via
+ * acpi_processor_get_info().
+ */
+ if (!IS_ENABLED(CONFIG_GENERIC_CPU_DEVICES) || !acpi_disabled)
return;

for_each_present_cpu(i) {
--
2.30.2
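
The guard added to acpi_processor_get_info() above can be illustrated
outside the kernel with a small model. This is only a sketch: the
model_state struct, MODEL_NR_CPUS and the model_* functions are
hypothetical stand-ins for the present mask, get_cpu_device() and
arch_register_cpu(), and are not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NR_CPUS 8 /* arbitrary size for the model */

/* Hypothetical stand-in for the present mask and the registered CPU devices. */
struct model_state {
	bool present[MODEL_NR_CPUS];
	bool registered[MODEL_NR_CPUS];
	int register_calls; /* how often the "arch" hook actually ran */
};

/* Stand-in for arch_register_cpu(): always succeeds in this model. */
int model_register_cpu(struct model_state *s, int cpu)
{
	assert(s != NULL);
	s->registered[cpu] = true;
	s->register_calls++;
	return 0;
}

/*
 * Mirrors the three conditions in the hunk: the logical id must be valid,
 * the CPU must be present, and it must not already have a device (which
 * is how duplicate firmware descriptions are skipped).
 */
int model_get_info(struct model_state *s, int cpu)
{
	assert(s != NULL);
	if (cpu < 0 || cpu >= MODEL_NR_CPUS)
		return 0; /* invalid logical id: nothing to register */
	if (!s->present[cpu])
		return 0; /* not present: leave it alone */
	if (s->registered[cpu])
		return 0; /* duplicate description: already registered */
	return model_register_cpu(s, cpu);
}
```

Calling model_get_info() twice for the same present CPU registers it only
once, which is the behaviour the get_cpu_device() check is there to give.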

2023-12-13 12:49:56

by Russell King (Oracle)

Subject: [PATCH RFC v3 05/21] ACPI: Rename ACPI_HOTPLUG_CPU to include 'present'

From: James Morse <[email protected]>

The code behind ACPI_HOTPLUG_CPU allows a not-present CPU to become
present. This isn't the only use of HOTPLUG_CPU. On arm64 and riscv
CPUs can be taken offline as a power saving measure.

On arm64 an offline CPU may be disabled by firmware, preventing it from
being brought back online, but it remains present throughout.

Adding code to prevent user-space trying to online these disabled CPUs
needs some additional terminology.

Rename the Kconfig symbol to CONFIG_ACPI_HOTPLUG_PRESENT_CPU to reflect
that it makes possible CPUs present.

HOTPLUG_CPU is untouched as this is only about the ACPI mechanism.

Signed-off-by: James Morse <[email protected]>
Tested-by: Miguel Luis <[email protected]>
Tested-by: Vishnu Pajjuri <[email protected]>
Tested-by: Jianyong Wu <[email protected]>
Signed-off-by: Russell King (Oracle) <[email protected]>
---
Changes since RFC v2:
* Add Loongarch update
Changes since RFC v3:
* Dropped ia64 changes
---
arch/loongarch/Kconfig | 2 +-
arch/loongarch/configs/loongson3_defconfig | 2 +-
arch/loongarch/kernel/acpi.c | 4 ++--
arch/x86/Kconfig | 2 +-
arch/x86/kernel/acpi/boot.c | 4 ++--
drivers/acpi/Kconfig | 4 ++--
drivers/acpi/acpi_processor.c | 10 +++++-----
include/linux/acpi.h | 6 +++---
8 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
index 15d05dd2b7f3..b1e87b90468d 100644
--- a/arch/loongarch/Kconfig
+++ b/arch/loongarch/Kconfig
@@ -5,7 +5,7 @@ config LOONGARCH
select ACPI
select ACPI_GENERIC_GSI if ACPI
select ACPI_MCFG if ACPI
- select ACPI_HOTPLUG_CPU if ACPI_PROCESSOR && HOTPLUG_CPU
+ select ACPI_HOTPLUG_PRESENT_CPU if ACPI_PROCESSOR && HOTPLUG_CPU
select ACPI_PPTT if ACPI
select ACPI_SYSTEM_POWER_STATES_SUPPORT if ACPI
select ARCH_BINFMT_ELF_STATE
diff --git a/arch/loongarch/configs/loongson3_defconfig b/arch/loongarch/configs/loongson3_defconfig
index 33795e4a5bd6..85d37b143077 100644
--- a/arch/loongarch/configs/loongson3_defconfig
+++ b/arch/loongarch/configs/loongson3_defconfig
@@ -59,7 +59,7 @@ CONFIG_ACPI_SPCR_TABLE=y
CONFIG_ACPI_TAD=y
CONFIG_ACPI_DOCK=y
CONFIG_ACPI_IPMI=m
-CONFIG_ACPI_HOTPLUG_CPU=y
+CONFIG_ACPI_HOTPLUG_PRESENT_CPU=y
CONFIG_ACPI_PCI_SLOT=y
CONFIG_ACPI_HOTPLUG_MEMORY=y
CONFIG_EFI_ZBOOT=y
diff --git a/arch/loongarch/kernel/acpi.c b/arch/loongarch/kernel/acpi.c
index 8e00a754e548..dfa56119b56f 100644
--- a/arch/loongarch/kernel/acpi.c
+++ b/arch/loongarch/kernel/acpi.c
@@ -288,7 +288,7 @@ void __init arch_reserve_mem_area(acpi_physical_address addr, size_t size)
memblock_reserve(addr, size);
}

-#ifdef CONFIG_ACPI_HOTPLUG_CPU
+#ifdef CONFIG_ACPI_HOTPLUG_PRESENT_CPU

#include <acpi/processor.h>

@@ -340,4 +340,4 @@ int acpi_unmap_cpu(int cpu)
}
EXPORT_SYMBOL(acpi_unmap_cpu);

-#endif /* CONFIG_ACPI_HOTPLUG_CPU */
+#endif /* CONFIG_ACPI_HOTPLUG_PRESENT_CPU */
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 8330c4ac26b3..64fc7c475ab0 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -59,7 +59,7 @@ config X86
#
select ACPI_LEGACY_TABLES_LOOKUP if ACPI
select ACPI_SYSTEM_POWER_STATES_SUPPORT if ACPI
- select ACPI_HOTPLUG_CPU if ACPI_PROCESSOR && HOTPLUG_CPU
+ select ACPI_HOTPLUG_PRESENT_CPU if ACPI_PROCESSOR && HOTPLUG_CPU
select ARCH_32BIT_OFF_T if X86_32
select ARCH_CLOCKSOURCE_INIT
select ARCH_CORRECT_STACKTRACE_ON_KRETPROBE
diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index 1a0dd80d81ac..33d259ddd188 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -826,7 +826,7 @@ static void __init acpi_set_irq_model_ioapic(void)
/*
* ACPI based hotplug support for CPU
*/
-#ifdef CONFIG_ACPI_HOTPLUG_CPU
+#ifdef CONFIG_ACPI_HOTPLUG_PRESENT_CPU
#include <acpi/processor.h>

static int acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
@@ -875,7 +875,7 @@ int acpi_unmap_cpu(int cpu)
return (0);
}
EXPORT_SYMBOL(acpi_unmap_cpu);
-#endif /* CONFIG_ACPI_HOTPLUG_CPU */
+#endif /* CONFIG_ACPI_HOTPLUG_PRESENT_CPU */

int acpi_register_ioapic(acpi_handle handle, u64 phys_addr, u32 gsi_base)
{
diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig
index a3acfc750fce..9c5a43d0aff4 100644
--- a/drivers/acpi/Kconfig
+++ b/drivers/acpi/Kconfig
@@ -306,7 +306,7 @@ config ACPI_IPMI
To compile this driver as a module, choose M here:
the module will be called as acpi_ipmi.

-config ACPI_HOTPLUG_CPU
+config ACPI_HOTPLUG_PRESENT_CPU
bool
depends on ACPI_PROCESSOR && HOTPLUG_CPU
select ACPI_CONTAINER
@@ -400,7 +400,7 @@ config ACPI_PCI_SLOT

config ACPI_CONTAINER
bool "Container and Module Devices"
- default (ACPI_HOTPLUG_MEMORY || ACPI_HOTPLUG_CPU)
+ default (ACPI_HOTPLUG_MEMORY || ACPI_HOTPLUG_PRESENT_CPU)
help
This driver supports ACPI Container and Module devices (IDs
ACPI0004, PNP0A05, and PNP0A06).
diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
index e7ed4730cbbe..c8e960ff0aca 100644
--- a/drivers/acpi/acpi_processor.c
+++ b/drivers/acpi/acpi_processor.c
@@ -183,7 +183,7 @@ static void __init acpi_pcc_cpufreq_init(void) {}
#endif /* CONFIG_X86 */

/* Initialization */
-#ifdef CONFIG_ACPI_HOTPLUG_CPU
+#ifdef CONFIG_ACPI_HOTPLUG_PRESENT_CPU
static int acpi_processor_hotadd_init(struct acpi_processor *pr)
{
unsigned long long sta;
@@ -228,7 +228,7 @@ static inline int acpi_processor_hotadd_init(struct acpi_processor *pr)
{
return -ENODEV;
}
-#endif /* CONFIG_ACPI_HOTPLUG_CPU */
+#endif /* CONFIG_ACPI_HOTPLUG_PRESENT_CPU */

static int acpi_processor_get_info(struct acpi_device *device)
{
@@ -461,7 +461,7 @@ static int acpi_processor_add(struct acpi_device *device,
return result;
}

-#ifdef CONFIG_ACPI_HOTPLUG_CPU
+#ifdef CONFIG_ACPI_HOTPLUG_PRESENT_CPU
/* Removal */
static void acpi_processor_remove(struct acpi_device *device)
{
@@ -505,7 +505,7 @@ static void acpi_processor_remove(struct acpi_device *device)
free_cpumask_var(pr->throttling.shared_cpu_map);
kfree(pr);
}
-#endif /* CONFIG_ACPI_HOTPLUG_CPU */
+#endif /* CONFIG_ACPI_HOTPLUG_PRESENT_CPU */

#ifdef CONFIG_ARCH_MIGHT_HAVE_ACPI_PDC
bool __init processor_physically_present(acpi_handle handle)
@@ -630,7 +630,7 @@ static const struct acpi_device_id processor_device_ids[] = {
static struct acpi_scan_handler processor_handler = {
.ids = processor_device_ids,
.attach = acpi_processor_add,
-#ifdef CONFIG_ACPI_HOTPLUG_CPU
+#ifdef CONFIG_ACPI_HOTPLUG_PRESENT_CPU
.detach = acpi_processor_remove,
#endif
.hotplug = {
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index 4db54e928b36..36071bc11acd 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -301,12 +301,12 @@ static inline int acpi_processor_evaluate_cst(acpi_handle handle, u32 cpu,
}
#endif

-#ifdef CONFIG_ACPI_HOTPLUG_CPU
+#ifdef CONFIG_ACPI_HOTPLUG_PRESENT_CPU
/* Arch dependent functions for cpu hotplug support */
int acpi_map_cpu(acpi_handle handle, phys_cpuid_t physid, u32 acpi_id,
int *pcpu);
int acpi_unmap_cpu(int cpu);
-#endif /* CONFIG_ACPI_HOTPLUG_CPU */
+#endif /* CONFIG_ACPI_HOTPLUG_PRESENT_CPU */

#ifdef CONFIG_ACPI_HOTPLUG_IOAPIC
int acpi_get_ioapic_id(acpi_handle handle, u32 gsi_base, u64 *phys_addr);
@@ -629,7 +629,7 @@ static inline u32 acpi_osc_ctx_get_cxl_control(struct acpi_osc_context *context)
#define ACPI_GSB_ACCESS_ATTRIB_RAW_PROCESS 0x0000000F

/* Enable _OST when all relevant hotplug operations are enabled */
-#if defined(CONFIG_ACPI_HOTPLUG_CPU) && \
+#if defined(CONFIG_ACPI_HOTPLUG_PRESENT_CPU) && \
defined(CONFIG_ACPI_HOTPLUG_MEMORY) && \
defined(CONFIG_ACPI_CONTAINER)
#define ACPI_HOTPLUG_OST
--
2.30.2

2023-12-13 12:50:02

by Russell King (Oracle)

Subject: [PATCH RFC v3 01/21] ACPI: Only enumerate enabled (or functional) devices

From: James Morse <[email protected]>

Today the ACPI enumeration code 'visits' all devices that are present.

This is a problem for arm64, where CPUs are always present, but not
always enabled. When a device-check occurs because the firmware-policy
has changed and a CPU is now enabled, the following error occurs:
| acpi ACPI0007:48: Enumeration failure

This is ultimately because acpi_dev_ready_for_enumeration() returns
true for a device that is not enabled. The ACPI Processor driver
will not register such CPUs as they are not 'decoding their resources'.

Change acpi_dev_ready_for_enumeration() to also check the enabled bit.
ACPI allows a device to be functional instead of maintaining the
present and enabled bit. Make this behaviour an explicit check with
a reference to the spec, and then check the present and enabled bits.
This is needed to avoid enumerating present && functional devices that
are not enabled.

Signed-off-by: James Morse <[email protected]>
Tested-by: Miguel Luis <[email protected]>
Tested-by: Vishnu Pajjuri <[email protected]>
Tested-by: Jianyong Wu <[email protected]>
Signed-off-by: Russell King (Oracle) <[email protected]>
---
If this change causes problems on deployed hardware, I suggest an
arch opt-in: ACPI_IGNORE_STA_ENABLED, that causes
acpi_dev_ready_for_enumeration() to only check the present bit.

Changes since RFC v2:
* Incorporate comment suggestion by Gavin Shan.
Other review comments from Jonathan Cameron not yet addressed.
---
drivers/acpi/device_pm.c | 2 +-
drivers/acpi/device_sysfs.c | 2 +-
drivers/acpi/internal.h | 1 -
drivers/acpi/property.c | 2 +-
drivers/acpi/scan.c | 24 ++++++++++++++----------
5 files changed, 17 insertions(+), 14 deletions(-)

diff --git a/drivers/acpi/device_pm.c b/drivers/acpi/device_pm.c
index 3b4d048c4941..e3c80f3b3b57 100644
--- a/drivers/acpi/device_pm.c
+++ b/drivers/acpi/device_pm.c
@@ -313,7 +313,7 @@ int acpi_bus_init_power(struct acpi_device *device)
return -EINVAL;

device->power.state = ACPI_STATE_UNKNOWN;
- if (!acpi_device_is_present(device)) {
+ if (!acpi_dev_ready_for_enumeration(device)) {
device->flags.initialized = false;
return -ENXIO;
}
diff --git a/drivers/acpi/device_sysfs.c b/drivers/acpi/device_sysfs.c
index 23373faa35ec..a0256d2493a7 100644
--- a/drivers/acpi/device_sysfs.c
+++ b/drivers/acpi/device_sysfs.c
@@ -141,7 +141,7 @@ static int create_pnp_modalias(const struct acpi_device *acpi_dev, char *modalia
struct acpi_hardware_id *id;

/* Avoid unnecessarily loading modules for non present devices. */
- if (!acpi_device_is_present(acpi_dev))
+ if (!acpi_dev_ready_for_enumeration(acpi_dev))
return 0;

/*
diff --git a/drivers/acpi/internal.h b/drivers/acpi/internal.h
index 866c7c4ed233..a1b45e345bcc 100644
--- a/drivers/acpi/internal.h
+++ b/drivers/acpi/internal.h
@@ -107,7 +107,6 @@ int acpi_device_setup_files(struct acpi_device *dev);
void acpi_device_remove_files(struct acpi_device *dev);
void acpi_device_add_finalize(struct acpi_device *device);
void acpi_free_pnp_ids(struct acpi_device_pnp *pnp);
-bool acpi_device_is_present(const struct acpi_device *adev);
bool acpi_device_is_battery(struct acpi_device *adev);
bool acpi_device_is_first_physical_node(struct acpi_device *adev,
const struct device *dev);
diff --git a/drivers/acpi/property.c b/drivers/acpi/property.c
index 6979a3f9f90a..14d6948fd88a 100644
--- a/drivers/acpi/property.c
+++ b/drivers/acpi/property.c
@@ -1420,7 +1420,7 @@ static bool acpi_fwnode_device_is_available(const struct fwnode_handle *fwnode)
if (!is_acpi_device_node(fwnode))
return false;

- return acpi_device_is_present(to_acpi_device_node(fwnode));
+ return acpi_dev_ready_for_enumeration(to_acpi_device_node(fwnode));
}

static const void *
diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
index 02bb2cce423f..728649a2a251 100644
--- a/drivers/acpi/scan.c
+++ b/drivers/acpi/scan.c
@@ -304,7 +304,7 @@ static int acpi_scan_device_check(struct acpi_device *adev)
int error;

acpi_bus_get_status(adev);
- if (acpi_device_is_present(adev)) {
+ if (acpi_dev_ready_for_enumeration(adev)) {
/*
* This function is only called for device objects for which
* matching scan handlers exist. The only situation in which
@@ -338,7 +338,7 @@ static int acpi_scan_bus_check(struct acpi_device *adev, void *not_used)
int error;

acpi_bus_get_status(adev);
- if (!acpi_device_is_present(adev)) {
+ if (!acpi_dev_ready_for_enumeration(adev)) {
acpi_scan_device_not_enumerated(adev);
return 0;
}
@@ -1913,11 +1913,6 @@ static bool acpi_device_should_be_hidden(acpi_handle handle)
return true;
}

-bool acpi_device_is_present(const struct acpi_device *adev)
-{
- return adev->status.present || adev->status.functional;
-}
-
static bool acpi_scan_handler_matching(struct acpi_scan_handler *handler,
const char *idstr,
const struct acpi_device_id **matchid)
@@ -2381,16 +2376,25 @@ EXPORT_SYMBOL_GPL(acpi_dev_clear_dependencies);
* acpi_dev_ready_for_enumeration - Check if the ACPI device is ready for enumeration
* @device: Pointer to the &struct acpi_device to check
*
- * Check if the device is present and has no unmet dependencies.
+ * Check if the device is functional or enabled and has no unmet dependencies.
*
- * Return true if the device is ready for enumeratino. Otherwise, return false.
+ * Return true if the device is ready for enumeration. Otherwise, return false.
*/
bool acpi_dev_ready_for_enumeration(const struct acpi_device *device)
{
if (device->flags.honor_deps && device->dep_unmet)
return false;

- return acpi_device_is_present(device);
+ /*
+ * ACPI 6.5's 6.3.7 "_STA (Device Status)" allows firmware to return
+ * (!present && functional) for certain types of devices that should be
+ * enumerated. Note that the enabled bit can't be set until the present
+ * bit is set.
+ */
+ if (device->status.present)
+ return device->status.enabled;
+ else
+ return device->status.functional;
}
EXPORT_SYMBOL_GPL(acpi_dev_ready_for_enumeration);

--
2.30.2
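
To make the behavioural change in acpi_dev_ready_for_enumeration() easier
to see, the old and new rules can be compared in a standalone model. This
is a sketch, not the kernel implementation: struct model_dev and the
model_* functions are hypothetical, and only the _STA bit positions
(present = bit 0, enabled = bit 1, functional = bit 3) come from the ACPI
specification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* _STA result bits as defined by the ACPI specification. */
#define STA_PRESENT    (1u << 0)
#define STA_ENABLED    (1u << 1)
#define STA_FUNCTIONAL (1u << 3)

/* Hypothetical stand-in for the fields of struct acpi_device used here. */
struct model_dev {
	uint32_t sta;           /* raw _STA value */
	bool honor_deps;
	unsigned int dep_unmet;
};

/* Rule before the patch: present or functional is enough. */
bool model_old_ready(const struct model_dev *d)
{
	assert(d != NULL);
	if (d->honor_deps && d->dep_unmet)
		return false;
	return (d->sta & (STA_PRESENT | STA_FUNCTIONAL)) != 0;
}

/*
 * Rule after the patch: a present device must also be enabled, and a
 * device that is not present is enumerated only if it is functional.
 */
bool model_new_ready(const struct model_dev *d)
{
	assert(d != NULL);
	if (d->honor_deps && d->dep_unmet)
		return false;
	if (d->sta & STA_PRESENT)
		return (d->sta & STA_ENABLED) != 0;
	return (d->sta & STA_FUNCTIONAL) != 0;
}
```

The only input where the two rules disagree is a device that is present
but not enabled, which is the arm64 vCPU case described in the commit
message.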

2023-12-13 12:50:13

by Russell King (Oracle)

Subject: [PATCH RFC v3 06/21] ACPI: Move acpi_bus_trim_one() before acpi_scan_hot_remove()

From: James Morse <[email protected]>

A subsequent patch will change acpi_scan_hot_remove() to call
acpi_bus_trim_one() instead of acpi_bus_trim(), meaning it can no longer
rely on the prototype in the header file.

Move these functions further up the file.
No change in behaviour.

Signed-off-by: James Morse <[email protected]>
Reviewed-by: Jonathan Cameron <[email protected]>
Reviewed-by: Gavin Shan <[email protected]>
Tested-by: Miguel Luis <[email protected]>
Tested-by: Vishnu Pajjuri <[email protected]>
Tested-by: Jianyong Wu <[email protected]>
Signed-off-by: Russell King (Oracle) <[email protected]>
---
drivers/acpi/scan.c | 76 ++++++++++++++++++++++-----------------------
1 file changed, 38 insertions(+), 38 deletions(-)

diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
index 728649a2a251..ec42fe9d0611 100644
--- a/drivers/acpi/scan.c
+++ b/drivers/acpi/scan.c
@@ -244,6 +244,44 @@ static int acpi_scan_try_to_offline(struct acpi_device *device)
return 0;
}

+static int acpi_bus_trim_one(struct acpi_device *adev, void *not_used)
+{
+ struct acpi_scan_handler *handler = adev->handler;
+
+ acpi_dev_for_each_child_reverse(adev, acpi_bus_trim_one, NULL);
+
+ adev->flags.match_driver = false;
+ if (handler) {
+ if (handler->detach)
+ handler->detach(adev);
+
+ adev->handler = NULL;
+ } else {
+ device_release_driver(&adev->dev);
+ }
+ /*
+ * Most likely, the device is going away, so put it into D3cold before
+ * that.
+ */
+ acpi_device_set_power(adev, ACPI_STATE_D3_COLD);
+ adev->flags.initialized = false;
+ acpi_device_clear_enumerated(adev);
+
+ return 0;
+}
+
+/**
+ * acpi_bus_trim - Detach scan handlers and drivers from ACPI device objects.
+ * @adev: Root of the ACPI namespace scope to walk.
+ *
+ * Must be called under acpi_scan_lock.
+ */
+void acpi_bus_trim(struct acpi_device *adev)
+{
+ acpi_bus_trim_one(adev, NULL);
+}
+EXPORT_SYMBOL_GPL(acpi_bus_trim);
+
static int acpi_scan_hot_remove(struct acpi_device *device)
{
acpi_handle handle = device->handle;
@@ -2513,44 +2551,6 @@ int acpi_bus_scan(acpi_handle handle)
}
EXPORT_SYMBOL(acpi_bus_scan);

-static int acpi_bus_trim_one(struct acpi_device *adev, void *not_used)
-{
- struct acpi_scan_handler *handler = adev->handler;
-
- acpi_dev_for_each_child_reverse(adev, acpi_bus_trim_one, NULL);
-
- adev->flags.match_driver = false;
- if (handler) {
- if (handler->detach)
- handler->detach(adev);
-
- adev->handler = NULL;
- } else {
- device_release_driver(&adev->dev);
- }
- /*
- * Most likely, the device is going away, so put it into D3cold before
- * that.
- */
- acpi_device_set_power(adev, ACPI_STATE_D3_COLD);
- adev->flags.initialized = false;
- acpi_device_clear_enumerated(adev);
-
- return 0;
-}
-
-/**
- * acpi_bus_trim - Detach scan handlers and drivers from ACPI device objects.
- * @adev: Root of the ACPI namespace scope to walk.
- *
- * Must be called under acpi_scan_lock.
- */
-void acpi_bus_trim(struct acpi_device *adev)
-{
- acpi_bus_trim_one(adev, NULL);
-}
-EXPORT_SYMBOL_GPL(acpi_bus_trim);
-
int acpi_bus_register_early_device(int type)
{
struct acpi_device *device = NULL;
--
2.30.2

2023-12-13 12:51:02

by Russell King (Oracle)

[permalink] [raw]
Subject: [PATCH RFC v3 02/21] ACPI: processor: Add support for processors described as container packages

From: James Morse <[email protected]>

ACPI has two ways of describing processors in the DSDT. From ACPI v6.5,
5.2.12:

"Starting with ACPI Specification 6.3, the use of the Processor() object
was deprecated. Only legacy systems should continue with this usage. On
the Itanium architecture only, a _UID is provided for the Processor()
that is a string object. This usage of _UID is also deprecated since it
can preclude an OSPM from being able to match a processor to a
non-enumerable device, such as those defined in the MADT. From ACPI
Specification 6.3 onward, all processor objects for all architectures
except Itanium must now use Device() objects with an _HID of ACPI0007,
and use only integer _UID values."

Also see https://uefi.org/specs/ACPI/6.5/08_Processor_Configuration_and_Control.html#declaring-processors

Duplicate descriptions are not allowed; the ACPI processor driver already
parses the UID from both devices and containers. acpi_processor_get_info()
returns an error if the UID exists twice in the DSDT.

The missing probe for CPUs described as packages creates a problem for
moving the cpu_register() calls into the acpi_processor driver, as CPUs
described like this don't get registered, leading to errors from other
subsystems when they try to add new sysfs entries to the CPU node.
(e.g. topology_sysfs_init()'s use of topology_add_dev() via cpuhp)

To fix this, parse the processor container and call acpi_processor_add()
for each processor that is discovered like this. The processor container
handler is added with acpi_scan_add_handler(), so no detach call will
arrive.

Qemu TCG describes CPUs using processor devices in a processor container.
For more information, see build_cpus_aml() in Qemu hw/acpi/cpu.c and
https://uefi.org/specs/ACPI/6.5/08_Processor_Configuration_and_Control.html#processor-container-device

Signed-off-by: James Morse <[email protected]>
Tested-by: Miguel Luis <[email protected]>
Tested-by: Vishnu Pajjuri <[email protected]>
Tested-by: Jianyong Wu <[email protected]>
---
Outstanding comments:
https://lore.kernel.org/r/[email protected]
https://lore.kernel.org/r/[email protected]
---
drivers/acpi/acpi_processor.c | 22 ++++++++++++++++++++++
1 file changed, 22 insertions(+)

diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
index 4fe2ef54088c..6a542e0ce396 100644
--- a/drivers/acpi/acpi_processor.c
+++ b/drivers/acpi/acpi_processor.c
@@ -626,9 +626,31 @@ static struct acpi_scan_handler processor_handler = {
},
};

+static acpi_status acpi_processor_container_walk(acpi_handle handle,
+ u32 lvl,
+ void *context,
+ void **rv)
+{
+ struct acpi_device *adev;
+ acpi_status status;
+
+ adev = acpi_get_acpi_dev(handle);
+ if (!adev)
+ return AE_ERROR;
+
+ status = acpi_processor_add(adev, &processor_device_ids[0]);
+ acpi_put_acpi_dev(adev);
+
+ return status;
+}
+
static int acpi_processor_container_attach(struct acpi_device *dev,
const struct acpi_device_id *id)
{
+ acpi_walk_namespace(ACPI_TYPE_PROCESSOR, dev->handle,
+ ACPI_UINT32_MAX, acpi_processor_container_walk,
+ NULL, NULL, NULL);
+
return 1;
}

--
2.30.2

2023-12-13 12:51:04

by Russell King (Oracle)

[permalink] [raw]
Subject: [PATCH RFC v3 11/21] ACPI: Warn when the present bit changes but the feature is not enabled

From: James Morse <[email protected]>

ACPI firmware can trigger the events to add and remove CPUs, but the
OS may not support this.

Print an error message when this happens.

This gives early warning on arm64 systems that don't support
CONFIG_ACPI_HOTPLUG_PRESENT_CPU, as making CPUs not present has
side effects for other parts of the system.

Signed-off-by: James Morse <[email protected]>
Reviewed-by: Jonathan Cameron <[email protected]>
Reviewed-by: Gavin Shan <[email protected]>
Tested-by: Miguel Luis <[email protected]>
Tested-by: Vishnu Pajjuri <[email protected]>
Tested-by: Jianyong Wu <[email protected]>
Signed-off-by: Russell King (Oracle) <[email protected]>
---
Changes since RFC v2:
* Update commit message with suggestion from Gavin Shan
---
drivers/acpi/acpi_processor.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
index 19fceb3ec4e2..b7a94c1348b0 100644
--- a/drivers/acpi/acpi_processor.c
+++ b/drivers/acpi/acpi_processor.c
@@ -189,8 +189,10 @@ static int acpi_processor_make_present(struct acpi_processor *pr)
acpi_status status;
int ret;

- if (!IS_ENABLED(CONFIG_ACPI_HOTPLUG_PRESENT_CPU))
+ if (!IS_ENABLED(CONFIG_ACPI_HOTPLUG_PRESENT_CPU)) {
+ pr_err_once("Changing CPU present bit is not supported\n");
return -ENODEV;
+ }

if (invalid_phys_cpuid(pr->phys_id))
return -ENODEV;
@@ -462,8 +464,10 @@ static void acpi_processor_make_not_present(struct acpi_device *device)
{
struct acpi_processor *pr;

- if (!IS_ENABLED(CONFIG_ACPI_HOTPLUG_PRESENT_CPU))
+ if (!IS_ENABLED(CONFIG_ACPI_HOTPLUG_PRESENT_CPU)) {
+ pr_err_once("Changing CPU present bit is not supported\n");
return;
+ }

pr = acpi_driver_data(device);
if (pr->id >= nr_cpu_ids)
--
2.30.2

2023-12-13 12:51:05

by Russell King (Oracle)

[permalink] [raw]
Subject: [PATCH RFC v3 03/21] ACPI: processor: Register CPUs that are online, but not described in the DSDT

From: James Morse <[email protected]>

ACPI has two descriptions of CPUs, one in the MADT/APIC table, the other
in the DSDT. Both are required. (ACPI 6.5's 8.4 "Declaring Processors"
says "Each processor in the system must be declared in the ACPI
namespace"). Having two descriptions allows firmware authors to get
this wrong.

If CPUs are described in the MADT/APIC, they will be brought online
early during boot. Once the register_cpu() calls are moved to ACPI,
they will be based on the DSDT description of the CPUs. When CPUs are
missing from the DSDT description, they will end up online, but not
registered.

Add a helper that runs after acpi_init() has completed to register
CPUs that are online, but weren't found in the DSDT. Any CPU that
is registered by this code triggers a firmware-bug warning and kernel
taint.

Qemu TCG only describes the first CPU in the DSDT, unless cpu-hotplug
is configured.

Signed-off-by: James Morse <[email protected]>
Reviewed-by: Jonathan Cameron <[email protected]>
Reviewed-by: Gavin Shan <[email protected]>
Tested-by: Miguel Luis <[email protected]>
Tested-by: Vishnu Pajjuri <[email protected]>
Tested-by: Jianyong Wu <[email protected]>
Signed-off-by: Russell King (Oracle) <[email protected]>
---
drivers/acpi/acpi_processor.c | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)

diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
index 6a542e0ce396..0511f2bc10bc 100644
--- a/drivers/acpi/acpi_processor.c
+++ b/drivers/acpi/acpi_processor.c
@@ -791,6 +791,25 @@ void __init acpi_processor_init(void)
acpi_pcc_cpufreq_init();
}

+static int __init acpi_processor_register_missing_cpus(void)
+{
+ int cpu;
+
+ if (acpi_disabled)
+ return 0;
+
+ for_each_online_cpu(cpu) {
+ if (!get_cpu_device(cpu)) {
+ pr_err_once(FW_BUG "CPU %u has no ACPI namespace description!\n", cpu);
+ add_taint(TAINT_FIRMWARE_WORKAROUND, LOCKDEP_STILL_OK);
+ arch_register_cpu(cpu);
+ }
+ }
+
+ return 0;
+}
+subsys_initcall_sync(acpi_processor_register_missing_cpus);
+
#ifdef CONFIG_ACPI_PROCESSOR_CSTATE
/**
* acpi_processor_claim_cst_control - Request _CST control from the platform.
--
2.30.2

2023-12-13 12:51:12

by Russell King (Oracle)

[permalink] [raw]
Subject: [PATCH RFC v3 09/21] ACPI: convert acpi_processor_post_eject() to use IS_ENABLED()

Rather than ifdef'ing acpi_processor_post_eject() and its use site, use
IS_ENABLED() to increase compile coverage.

Signed-off-by: Russell King (Oracle) <[email protected]>
---
drivers/acpi/acpi_processor.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
index b6f5005985c3..01c460881662 100644
--- a/drivers/acpi/acpi_processor.c
+++ b/drivers/acpi/acpi_processor.c
@@ -457,12 +457,14 @@ static int acpi_processor_add(struct acpi_device *device,
return result;
}

-#ifdef CONFIG_ACPI_HOTPLUG_PRESENT_CPU
/* Removal */
static void acpi_processor_post_eject(struct acpi_device *device)
{
struct acpi_processor *pr;

+ if (!IS_ENABLED(CONFIG_ACPI_HOTPLUG_PRESENT_CPU))
+ return;
+
if (!device || !acpi_driver_data(device))
return;

@@ -501,7 +503,6 @@ static void acpi_processor_post_eject(struct acpi_device *device)
free_cpumask_var(pr->throttling.shared_cpu_map);
kfree(pr);
}
-#endif /* CONFIG_ACPI_HOTPLUG_PRESENT_CPU */

#ifdef CONFIG_ARCH_MIGHT_HAVE_ACPI_PDC
bool __init processor_physically_present(acpi_handle handle)
@@ -626,9 +627,7 @@ static const struct acpi_device_id processor_device_ids[] = {
static struct acpi_scan_handler processor_handler = {
.ids = processor_device_ids,
.attach = acpi_processor_add,
-#ifdef CONFIG_ACPI_HOTPLUG_PRESENT_CPU
.post_eject = acpi_processor_post_eject,
-#endif
.hotplug = {
.enabled = true,
},
--
2.30.2

2023-12-13 12:51:17

by Russell King (Oracle)

[permalink] [raw]
Subject: [PATCH RFC v3 08/21] ACPI: Add post_eject to struct acpi_scan_handler for cpu hotplug

From: James Morse <[email protected]>

struct acpi_scan_handler has a detach callback that is used to remove
a driver when a bus is changed. When interacting with an eject-request,
the detach callback is called before _EJ0.

This means the ACPI processor driver can't use _STA to determine if a
CPU has been made not-present, or some of the other _STA bits have been
changed. acpi_processor_remove() needs to know the value of _STA after
_EJ0 has been called.

Add a post_eject callback to struct acpi_scan_handler. This is called
after acpi_scan_hot_remove() has successfully called _EJ0. Because
acpi_bus_trim_one() also clears the handler pointer, it needs to be
told if the caller will go on to call acpi_bus_post_eject(), so
that acpi_device_clear_enumerated() and clearing the handler pointer
can be deferred. The existing not-used pointer is used for this.

Signed-off-by: James Morse <[email protected]>
Reviewed-by: Jonathan Cameron <[email protected]>
Reviewed-by: Gavin Shan <[email protected]>
Tested-by: Miguel Luis <[email protected]>
Tested-by: Vishnu Pajjuri <[email protected]>
Tested-by: Jianyong Wu <[email protected]>
---
Outstanding comments:
https://lore.kernel.org/r/[email protected]
https://lore.kernel.org/r/[email protected]
---
drivers/acpi/acpi_processor.c | 4 +--
drivers/acpi/scan.c | 52 ++++++++++++++++++++++++++++++-----
include/acpi/acpi_bus.h | 1 +
3 files changed, 48 insertions(+), 9 deletions(-)

diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
index 26e3efb74614..b6f5005985c3 100644
--- a/drivers/acpi/acpi_processor.c
+++ b/drivers/acpi/acpi_processor.c
@@ -459,7 +459,7 @@ static int acpi_processor_add(struct acpi_device *device,

#ifdef CONFIG_ACPI_HOTPLUG_PRESENT_CPU
/* Removal */
-static void acpi_processor_remove(struct acpi_device *device)
+static void acpi_processor_post_eject(struct acpi_device *device)
{
struct acpi_processor *pr;

@@ -627,7 +627,7 @@ static struct acpi_scan_handler processor_handler = {
.ids = processor_device_ids,
.attach = acpi_processor_add,
#ifdef CONFIG_ACPI_HOTPLUG_PRESENT_CPU
- .detach = acpi_processor_remove,
+ .post_eject = acpi_processor_post_eject,
#endif
.hotplug = {
.enabled = true,
diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
index ec42fe9d0611..6ffd65e9e512 100644
--- a/drivers/acpi/scan.c
+++ b/drivers/acpi/scan.c
@@ -244,18 +244,28 @@ static int acpi_scan_try_to_offline(struct acpi_device *device)
return 0;
}

-static int acpi_bus_trim_one(struct acpi_device *adev, void *not_used)
+/**
+ * acpi_bus_trim_one() - Detach scan handlers and drivers from ACPI device
+ * objects.
+ * @adev: Root of the ACPI namespace scope to walk.
+ * @eject: Pointer to a bool that indicates if this was due to an
+ * eject-request.
+ *
+ * Must be called under acpi_scan_lock.
+ * If @eject points to true, clearing the device enumeration is deferred until
+ * acpi_bus_post_eject() is called.
+ */
+static int acpi_bus_trim_one(struct acpi_device *adev, void *eject)
{
struct acpi_scan_handler *handler = adev->handler;
+ bool is_eject = *(bool *)eject;

- acpi_dev_for_each_child_reverse(adev, acpi_bus_trim_one, NULL);
+ acpi_dev_for_each_child_reverse(adev, acpi_bus_trim_one, eject);

adev->flags.match_driver = false;
if (handler) {
if (handler->detach)
handler->detach(adev);
-
- adev->handler = NULL;
} else {
device_release_driver(&adev->dev);
}
@@ -265,7 +275,12 @@ static int acpi_bus_trim_one(struct acpi_device *adev, void *not_used)
*/
acpi_device_set_power(adev, ACPI_STATE_D3_COLD);
adev->flags.initialized = false;
- acpi_device_clear_enumerated(adev);
+
+ /* For eject this is deferred to acpi_bus_post_eject() */
+ if (!is_eject) {
+ adev->handler = NULL;
+ acpi_device_clear_enumerated(adev);
+ }

return 0;
}
@@ -278,15 +293,36 @@ static int acpi_bus_trim_one(struct acpi_device *adev, void *not_used)
*/
void acpi_bus_trim(struct acpi_device *adev)
{
- acpi_bus_trim_one(adev, NULL);
+ bool eject = false;
+
+ acpi_bus_trim_one(adev, &eject);
}
EXPORT_SYMBOL_GPL(acpi_bus_trim);

+static int acpi_bus_post_eject(struct acpi_device *adev, void *not_used)
+{
+ struct acpi_scan_handler *handler = adev->handler;
+
+ acpi_dev_for_each_child_reverse(adev, acpi_bus_post_eject, NULL);
+
+ if (handler) {
+ if (handler->post_eject)
+ handler->post_eject(adev);
+
+ adev->handler = NULL;
+ }
+
+ acpi_device_clear_enumerated(adev);
+
+ return 0;
+}
+
static int acpi_scan_hot_remove(struct acpi_device *device)
{
acpi_handle handle = device->handle;
unsigned long long sta;
acpi_status status;
+ bool eject = true;

if (device->handler && device->handler->hotplug.demand_offline) {
if (!acpi_scan_is_offline(device, true))
@@ -299,7 +335,7 @@ static int acpi_scan_hot_remove(struct acpi_device *device)

acpi_handle_debug(handle, "Ejecting\n");

- acpi_bus_trim(device);
+ acpi_bus_trim_one(device, &eject);

acpi_evaluate_lck(handle, 0);
/*
@@ -322,6 +358,8 @@ static int acpi_scan_hot_remove(struct acpi_device *device)
} else if (sta & ACPI_STA_DEVICE_ENABLED) {
acpi_handle_warn(handle,
"Eject incomplete - status 0x%llx\n", sta);
+ } else {
+ acpi_bus_post_eject(device, NULL);
}

return 0;
diff --git a/include/acpi/acpi_bus.h b/include/acpi/acpi_bus.h
index 1216d72c650f..c887c2dfc5b5 100644
--- a/include/acpi/acpi_bus.h
+++ b/include/acpi/acpi_bus.h
@@ -130,6 +130,7 @@ struct acpi_scan_handler {
bool (*match)(const char *idstr, const struct acpi_device_id **matchid);
int (*attach)(struct acpi_device *dev, const struct acpi_device_id *id);
void (*detach)(struct acpi_device *dev);
+ void (*post_eject)(struct acpi_device *dev);
void (*bind)(struct device *phys_dev);
void (*unbind)(struct device *phys_dev);
struct acpi_hotplug_profile hotplug;
--
2.30.2
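
The ordering this patch sets up (trim, then _EJ0, then post_eject) can be sketched as a small userspace model. Everything here is an assumption made for illustration: the model_* names are hypothetical, a single device stands in for the namespace walk, and model_ej0() simply clears the enabled bit to imitate well-behaved firmware.

```c
#include <stdbool.h>
#include <stddef.h>

#define STA_PRESENT 0x01u
#define STA_ENABLED 0x02u

struct model_dev;

struct model_handler {
	void (*detach)(struct model_dev *dev);
	void (*post_eject)(struct model_dev *dev);
};

struct model_dev {
	const struct model_handler *handler;
	bool enumerated;
	unsigned int sta;               /* what _STA would return right now */
	unsigned int sta_at_post_eject; /* recorded by the callback below */
};

/* Models acpi_bus_trim_one(): on eject, keep handler and enumeration state */
static void model_trim(struct model_dev *dev, bool is_eject)
{
	if (dev->handler && dev->handler->detach)
		dev->handler->detach(dev);

	if (!is_eject) {
		dev->handler = NULL;
		dev->enumerated = false;
	}
}

/* Models acpi_bus_post_eject(): run the callback, then finish the cleanup */
static void model_post_eject(struct model_dev *dev)
{
	if (dev->handler) {
		if (dev->handler->post_eject)
			dev->handler->post_eject(dev);
		dev->handler = NULL;
	}
	dev->enumerated = false;
}

/* Stand-in for evaluating _EJ0: firmware clears the enabled bit */
static void model_ej0(struct model_dev *dev)
{
	dev->sta &= ~STA_ENABLED;
}

/* Models the tail of acpi_scan_hot_remove() */
static void model_hot_remove(struct model_dev *dev)
{
	model_trim(dev, true);
	model_ej0(dev);

	/* Only a completed eject reaches the post_eject callback */
	if (!(dev->sta & STA_ENABLED))
		model_post_eject(dev);
}

/* Example callback: records the _STA value it observes */
static void model_record_sta(struct model_dev *dev)
{
	dev->sta_at_post_eject = dev->sta;
}

static const struct model_handler model_cpu_handler = {
	.post_eject = model_record_sta,
};
```

The point of the model is that the callback runs after model_ej0(), so it observes the post-eject _STA value, which a detach callback invoked from the trim step could not.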

2023-12-13 12:51:30

by Russell King (Oracle)

[permalink] [raw]
Subject: [PATCH RFC v3 12/21] arm64: acpi: Move get_cpu_for_acpi_id() to a header

From: James Morse <[email protected]>

ACPI identifies CPUs by UID. get_cpu_for_acpi_id() maps the ACPI UID
to the Linux CPU number.

The helper to retrieve this mapping is only available in arm64's numa
code.

Move it to live next to get_acpi_id_for_cpu().

Signed-off-by: James Morse <[email protected]>
Reviewed-by: Jonathan Cameron <[email protected]>
Reviewed-by: Gavin Shan <[email protected]>
Tested-by: Miguel Luis <[email protected]>
Tested-by: Vishnu Pajjuri <[email protected]>
Tested-by: Jianyong Wu <[email protected]>
Signed-off-by: Russell King (Oracle) <[email protected]>
---
arch/arm64/include/asm/acpi.h | 11 +++++++++++
arch/arm64/kernel/acpi_numa.c | 11 -----------
2 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/arch/arm64/include/asm/acpi.h b/arch/arm64/include/asm/acpi.h
index 6792a1f83f2a..bc9a6656fc0c 100644
--- a/arch/arm64/include/asm/acpi.h
+++ b/arch/arm64/include/asm/acpi.h
@@ -119,6 +119,17 @@ static inline u32 get_acpi_id_for_cpu(unsigned int cpu)
return acpi_cpu_get_madt_gicc(cpu)->uid;
}

+static inline int get_cpu_for_acpi_id(u32 uid)
+{
+ int cpu;
+
+ for (cpu = 0; cpu < nr_cpu_ids; cpu++)
+ if (uid == get_acpi_id_for_cpu(cpu))
+ return cpu;
+
+ return -EINVAL;
+}
+
static inline void arch_fix_phys_package_id(int num, u32 slot) { }
void __init acpi_init_cpus(void);
int apei_claim_sea(struct pt_regs *regs);
diff --git a/arch/arm64/kernel/acpi_numa.c b/arch/arm64/kernel/acpi_numa.c
index e51535a5f939..0c036a9a3c33 100644
--- a/arch/arm64/kernel/acpi_numa.c
+++ b/arch/arm64/kernel/acpi_numa.c
@@ -34,17 +34,6 @@ int __init acpi_numa_get_nid(unsigned int cpu)
return acpi_early_node_map[cpu];
}

-static inline int get_cpu_for_acpi_id(u32 uid)
-{
- int cpu;
-
- for (cpu = 0; cpu < nr_cpu_ids; cpu++)
- if (uid == get_acpi_id_for_cpu(cpu))
- return cpu;
-
- return -EINVAL;
-}
-
static int __init acpi_parse_gicc_pxm(union acpi_subtable_headers *header,
const unsigned long end)
{
--
2.30.2

2023-12-13 12:52:06

by Russell King (Oracle)

[permalink] [raw]
Subject: [PATCH RFC v3 07/21] ACPI: Rename acpi_processor_hotadd_init and remove pre-processor guards

From: James Morse <[email protected]>

acpi_processor_hotadd_init() will make a CPU present by mapping it
based on its hardware id.

'hotadd_init' is ambiguous once there are two different behaviours
for cpu hotplug. This is for toggling the _STA present bit. Subsequent
patches will add support for toggling the _STA enabled bit, named
acpi_processor_make_enabled().

Rename it acpi_processor_make_present() to make it clear this is
for CPUs that were not previously present.

Expose the function prototypes it uses to allow the preprocessor
guards to be removed. The IS_ENABLED() check will let the compiler
dead-code elimination pass remove this if it isn't going to be
used.

Signed-off-by: James Morse <[email protected]>
Tested-by: Miguel Luis <[email protected]>
Tested-by: Vishnu Pajjuri <[email protected]>
Tested-by: Jianyong Wu <[email protected]>
---
Outstanding comments:
https://lore.kernel.org/r/[email protected]
https://lore.kernel.org/r/[email protected]
For this comment, we use IS_ENABLED() in multiple places in the kernel in
this way, and it isn't a problem.
---
drivers/acpi/acpi_processor.c | 14 +++++---------
include/linux/acpi.h | 2 --
2 files changed, 5 insertions(+), 11 deletions(-)

diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
index c8e960ff0aca..26e3efb74614 100644
--- a/drivers/acpi/acpi_processor.c
+++ b/drivers/acpi/acpi_processor.c
@@ -183,13 +183,15 @@ static void __init acpi_pcc_cpufreq_init(void) {}
#endif /* CONFIG_X86 */

/* Initialization */
-#ifdef CONFIG_ACPI_HOTPLUG_PRESENT_CPU
-static int acpi_processor_hotadd_init(struct acpi_processor *pr)
+static int acpi_processor_make_present(struct acpi_processor *pr)
{
unsigned long long sta;
acpi_status status;
int ret;

+ if (!IS_ENABLED(CONFIG_ACPI_HOTPLUG_PRESENT_CPU))
+ return -ENODEV;
+
if (invalid_phys_cpuid(pr->phys_id))
return -ENODEV;

@@ -223,12 +225,6 @@ static int acpi_processor_hotadd_init(struct acpi_processor *pr)
cpu_maps_update_done();
return ret;
}
-#else
-static inline int acpi_processor_hotadd_init(struct acpi_processor *pr)
-{
- return -ENODEV;
-}
-#endif /* CONFIG_ACPI_HOTPLUG_PRESENT_CPU */

static int acpi_processor_get_info(struct acpi_device *device)
{
@@ -335,7 +331,7 @@ static int acpi_processor_get_info(struct acpi_device *device)
* because cpuid <-> apicid mapping is persistent now.
*/
if (invalid_logical_cpuid(pr->id) || !cpu_present(pr->id)) {
- int ret = acpi_processor_hotadd_init(pr);
+ int ret = acpi_processor_make_present(pr);

if (ret)
return ret;
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index 36071bc11acd..19d009ca9e7a 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -301,12 +301,10 @@ static inline int acpi_processor_evaluate_cst(acpi_handle handle, u32 cpu,
}
#endif

-#ifdef CONFIG_ACPI_HOTPLUG_PRESENT_CPU
/* Arch dependent functions for cpu hotplug support */
int acpi_map_cpu(acpi_handle handle, phys_cpuid_t physid, u32 acpi_id,
int *pcpu);
int acpi_unmap_cpu(int cpu);
-#endif /* CONFIG_ACPI_HOTPLUG_PRESENT_CPU */

#ifdef CONFIG_ACPI_HOTPLUG_IOAPIC
int acpi_get_ioapic_id(acpi_handle handle, u32 gsi_base, u64 *phys_addr);
--
2.30.2

2023-12-13 12:52:44

by Russell King (Oracle)

[permalink] [raw]
Subject: [PATCH RFC v3 13/21] ACPICA: Add new MADT GICC flags fields

From: James Morse <[email protected]>

Add the new flag field to the MADT's GICC structure.

'Online Capable' indicates a disabled CPU can be enabled later. See
ACPI specification 6.5 Table 5.37: GICC CPU Interface Flags.

Signed-off-by: James Morse <[email protected]>
Tested-by: Miguel Luis <[email protected]>
Tested-by: Vishnu Pajjuri <[email protected]>
Tested-by: Jianyong Wu <[email protected]>
Signed-off-by: Russell King (Oracle) <[email protected]>
---
This patch probably needs to go via the upstream acpica project,
but is included here so the feature can be tested.

If the ACPICA header files are updated before merging this patch set,
this patch will need to be dropped.

Changes since RFC v2:
* Add ACPI specification reference.
---
include/acpi/actbl2.h | 1 +
1 file changed, 1 insertion(+)

diff --git a/include/acpi/actbl2.h b/include/acpi/actbl2.h
index 3751ae69432f..c433a079d8e1 100644
--- a/include/acpi/actbl2.h
+++ b/include/acpi/actbl2.h
@@ -1046,6 +1046,7 @@ struct acpi_madt_generic_interrupt {
/* ACPI_MADT_ENABLED (1) Processor is usable if set */
#define ACPI_MADT_PERFORMANCE_IRQ_MODE (1<<1) /* 01: Performance Interrupt Mode */
#define ACPI_MADT_VGIC_IRQ_MODE (1<<2) /* 02: VGIC Maintenance Interrupt mode */
+#define ACPI_MADT_GICC_CPU_CAPABLE (1<<3) /* 03: CPU is online capable */

/* 12: Generic Distributor (ACPI 5.0 + ACPI 6.0 changes) */

--
2.30.2

2023-12-13 12:53:03

by Russell King (Oracle)

[permalink] [raw]
Subject: [PATCH RFC v3 18/21] ACPI: processor: Only call arch_unregister_cpu() if HOTPLUG_CPU is selected

From: James Morse <[email protected]>

The kbuild robot points out that configurations without HOTPLUG_CPU
selected can try to build acpi_processor_post_eject() without success
as arch_unregister_cpu() is not defined.

Check this explicitly. This will be merged into:
| ACPI: Add post_eject to struct acpi_scan_handler for cpu hotplug
for any subsequent posting.

Reported-by: kbuild test robot <[email protected]>
Signed-off-by: James Morse <[email protected]>
Tested-by: Miguel Luis <[email protected]>
Tested-by: Vishnu Pajjuri <[email protected]>
Tested-by: Jianyong Wu <[email protected]>
---
This should probably be squashed into an earlier patch.
---
drivers/acpi/acpi_processor.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
index 5dabb426481f..ea12e70dfd39 100644
--- a/drivers/acpi/acpi_processor.c
+++ b/drivers/acpi/acpi_processor.c
@@ -537,7 +537,7 @@ static void acpi_processor_post_eject(struct acpi_device *device)
unsigned long long sta;
acpi_status status;

- if (!device)
+ if (!IS_ENABLED(CONFIG_HOTPLUG_CPU) || !device)
return;

pr = acpi_driver_data(device);
--
2.30.2

2023-12-13 12:53:07

by Russell King (Oracle)

[permalink] [raw]
Subject: [PATCH RFC v3 17/21] ACPI: add support to register CPUs based on the _STA enabled bit

From: James Morse <[email protected]>

acpi_processor_get_info() registers all present CPUs. Registering a
CPU is what creates the sysfs entries and triggers the udev
notifications.

arm64 virtual machines that support 'virtual cpu hotplug' use the
enabled bit to indicate whether the CPU can be brought online, as
the existing ACPI tables require all hardware to be described and
present.

If firmware describes a CPU as present, but disabled, skip the
registration. Such CPUs are present, but can't be brought online for
whatever reason (e.g. firmware/hypervisor policy).

Once firmware sets the enabled bit, the CPU can be registered and
brought online by user-space. Online CPUs, or CPUs that are missing
an _STA method, must always be registered.

Signed-off-by: James Morse <[email protected]>
Reviewed-by: Gavin Shan <[email protected]>
Reviewed-by: Jonathan Cameron <[email protected]>
Tested-by: Miguel Luis <[email protected]>
Tested-by: Vishnu Pajjuri <[email protected]>
Tested-by: Jianyong Wu <[email protected]>
Signed-off-by: Russell King (Oracle) <[email protected]>
---
drivers/acpi/acpi_processor.c | 31 ++++++++++++++++++++++++++++++-
1 file changed, 30 insertions(+), 1 deletion(-)

diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
index b7a94c1348b0..5dabb426481f 100644
--- a/drivers/acpi/acpi_processor.c
+++ b/drivers/acpi/acpi_processor.c
@@ -228,6 +228,32 @@ static int acpi_processor_make_present(struct acpi_processor *pr)
return ret;
}

+static int acpi_processor_make_enabled(struct acpi_processor *pr)
+{
+ unsigned long long sta;
+ acpi_status status;
+ bool present, enabled;
+
+ if (!acpi_has_method(pr->handle, "_STA"))
+ return arch_register_cpu(pr->id);
+
+ status = acpi_evaluate_integer(pr->handle, "_STA", NULL, &sta);
+ if (ACPI_FAILURE(status))
+ return -ENODEV;
+
+ present = sta & ACPI_STA_DEVICE_PRESENT;
+ enabled = sta & ACPI_STA_DEVICE_ENABLED;
+
+ if (cpu_online(pr->id) && (!present || !enabled)) {
+ pr_err_once(FW_BUG "CPU %u is online, but described as not present or disabled!\n", pr->id);
+ add_taint(TAINT_FIRMWARE_WORKAROUND, LOCKDEP_STILL_OK);
+ } else if (!present || !enabled) {
+ return -ENODEV;
+ }
+
+ return arch_register_cpu(pr->id);
+}
+
static int acpi_processor_get_info(struct acpi_device *device)
{
union acpi_object object = { 0 };
@@ -318,7 +344,7 @@ static int acpi_processor_get_info(struct acpi_device *device)
*/
if (!invalid_logical_cpuid(pr->id) && cpu_present(pr->id) &&
!get_cpu_device(pr->id)) {
- int ret = arch_register_cpu(pr->id);
+ int ret = acpi_processor_make_enabled(pr);

if (ret)
return ret;
@@ -526,6 +552,9 @@ static void acpi_processor_post_eject(struct acpi_device *device)
acpi_processor_make_not_present(device);
return;
}
+
+ if (cpu_present(pr->id) && !(sta & ACPI_STA_DEVICE_ENABLED))
+ arch_unregister_cpu(pr->id);
}

#ifdef CONFIG_ARCH_MIGHT_HAVE_ACPI_PDC
--
2.30.2

2023-12-13 12:53:13

by Russell King (Oracle)

[permalink] [raw]
Subject: [PATCH RFC v3 10/21] ACPI: Check _STA present bit before making CPUs not present

From: James Morse <[email protected]>

When called, acpi_processor_post_eject() unconditionally makes a CPU
not-present and unregisters it.

To add support for AML events where the CPU has become disabled, but
remains present, the _STA method should be checked before calling
acpi_processor_remove().

Rename the existing acpi_processor_post_eject() to
acpi_processor_make_not_present(), and check _STA before calling it.

Adding the function prototype for arch_unregister_cpu() allows the
preprocessor guards to be removed.

After this change CPUs will remain registered and visible to
user-space as offline if buggy firmware triggers an eject-request,
but doesn't clear the corresponding _STA bits after _EJ0 has been
called.

Signed-off-by: James Morse <[email protected]>
Tested-by: Miguel Luis <[email protected]>
Tested-by: Vishnu Pajjuri <[email protected]>
Tested-by: Jianyong Wu <[email protected]>
---
Changes since RFC v3:
* Move IS_ENABLED(CONFIG_ACPI_HOTPLUG_PRESENT_CPU) into separate patch.
Outstanding comments:
https://lore.kernel.org/r/[email protected]
https://lore.kernel.org/r/[email protected]
This contains a repeat of the IS_ENABLED() issue which we don't think
is a problem - but there is another issue mentioned in that comment.
---
drivers/acpi/acpi_processor.c | 28 ++++++++++++++++++++++++----
1 file changed, 24 insertions(+), 4 deletions(-)

diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
index 01c460881662..19fceb3ec4e2 100644
--- a/drivers/acpi/acpi_processor.c
+++ b/drivers/acpi/acpi_processor.c
@@ -458,16 +458,13 @@ static int acpi_processor_add(struct acpi_device *device,
}

/* Removal */
-static void acpi_processor_post_eject(struct acpi_device *device)
+static void acpi_processor_make_not_present(struct acpi_device *device)
{
struct acpi_processor *pr;

if (!IS_ENABLED(CONFIG_ACPI_HOTPLUG_PRESENT_CPU))
return;

- if (!device || !acpi_driver_data(device))
- return;
-
pr = acpi_driver_data(device);
if (pr->id >= nr_cpu_ids)
goto out;
@@ -504,6 +501,29 @@ static void acpi_processor_post_eject(struct acpi_device *device)
kfree(pr);
}

+static void acpi_processor_post_eject(struct acpi_device *device)
+{
+ struct acpi_processor *pr;
+ unsigned long long sta;
+ acpi_status status;
+
+ if (!device)
+ return;
+
+ pr = acpi_driver_data(device);
+ if (!pr || pr->id >= nr_cpu_ids || invalid_phys_cpuid(pr->phys_id))
+ return;
+
+ status = acpi_evaluate_integer(pr->handle, "_STA", NULL, &sta);
+ if (ACPI_FAILURE(status))
+ return;
+
+ if (cpu_present(pr->id) && !(sta & ACPI_STA_DEVICE_PRESENT)) {
+ acpi_processor_make_not_present(device);
+ return;
+ }
+}
+
#ifdef CONFIG_ARCH_MIGHT_HAVE_ACPI_PDC
bool __init processor_physically_present(acpi_handle handle)
{
--
2.30.2

2023-12-13 12:53:25

by Russell King (Oracle)

[permalink] [raw]
Subject: [PATCH RFC v3 15/21] irqchip/gic-v3: Add support for ACPI's disabled but 'online capable' CPUs

From: James Morse <[email protected]>

To support virtual CPU hotplug, ACPI has added an 'online capable' bit
to the MADT GICC entries. This indicates that a disabled CPU entry may
not be possible to bring online via PSCI until firmware has set the
enabled bit in _STA.

What about the redistributor in the GICC entry? ACPI doesn't want to say.
Assume the worst: When a redistributor is described in the GICC entry,
but the entry is marked as disabled at boot, assume the redistributor
is inaccessible.

The GICv3 driver doesn't support late online of redistributors, so this
means the corresponding CPU can't be brought online either. Clear the
possible and present bits.

Systems that want CPU hotplug in a VM can ensure their redistributors
are always-on, and describe them that way with a GICR entry in the MADT.

When mapping redistributors found via GICC entries, handle the case
where the arch code believes the CPU is present and possible, but it
does not have an accessible redistributor. Print a warning and clear
the present and possible bits.

Signed-off-by: James Morse <[email protected]>
Tested-by: Miguel Luis <[email protected]>
Tested-by: Vishnu Pajjuri <[email protected]>
Tested-by: Jianyong Wu <[email protected]>
Signed-off-by: Russell King (Oracle) <[email protected]>
---
Disabled but online-capable CPUs cause this message to be printed
if their redistributors are described via GICC:
| GICv3: CPU 3's redistributor is inaccessible: this CPU can't be brought online

If ACPI's _STA tries to make the cpu present later, this message is printed:
| Changing CPU present bit is not supported

Changes since RFC v2:
* use gicc->flags & (ACPI_MADT_ENABLED | ACPI_MADT_GICC_CPU_CAPABLE)
---
drivers/irqchip/irq-gic-v3.c | 14 ++++++++++++++
include/linux/acpi.h | 2 +-
2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
index ebecd4546830..6d0f98d3540e 100644
--- a/drivers/irqchip/irq-gic-v3.c
+++ b/drivers/irqchip/irq-gic-v3.c
@@ -2370,11 +2370,25 @@ gic_acpi_parse_madt_gicc(union acpi_subtable_headers *header,
(struct acpi_madt_generic_interrupt *)header;
u32 reg = readl_relaxed(acpi_data.dist_base + GICD_PIDR2) & GIC_PIDR2_ARCH_MASK;
u32 size = reg == GIC_PIDR2_ARCH_GICv4 ? SZ_64K * 4 : SZ_64K * 2;
+ int cpu = get_cpu_for_acpi_id(gicc->uid);
void __iomem *redist_base;

if (!acpi_gicc_is_usable(gicc))
return 0;

+ /*
+ * Capable but disabled CPUs can be brought online later. What about
+ * the redistributor? ACPI doesn't want to say!
+ * Virtual hotplug systems can use the MADT's "always-on" GICR entries.
+ * Otherwise, prevent such CPUs from being brought online.
+ */
+ if (!(gicc->flags & ACPI_MADT_ENABLED)) {
+ pr_warn_once("CPU %u's redistributor is inaccessible: this CPU can't be brought online\n", cpu);
+ set_cpu_present(cpu, false);
+ set_cpu_possible(cpu, false);
+ return 0;
+ }
+
redist_base = ioremap(gicc->gicr_base_address, size);
if (!redist_base)
return -ENOMEM;
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index 19d009ca9e7a..00be66683505 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -238,7 +238,7 @@ void acpi_table_print_madt_entry (struct acpi_subtable_header *madt);

static inline bool acpi_gicc_is_usable(struct acpi_madt_generic_interrupt *gicc)
{
- return gicc->flags & ACPI_MADT_ENABLED;
+ return gicc->flags & (ACPI_MADT_ENABLED | ACPI_MADT_GICC_CPU_CAPABLE);
}

/* the following numa functions are architecture-dependent */
--
2.30.2
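
The flag handling described in this patch can be sketched outside the kernel as a small classifier. This is only an illustration, not the driver code: `classify_gicc()` and the `gicc_verdict` enum are hypothetical names, and the bit positions are assumptions based on my reading of the ACPI 6.5 GICC flags (Enabled in bit 0, Online Capable in bit 3).

```c
#include <stdint.h>

/* Assumed bit positions (ACPI 6.5 GICC flags); not copied from the patch. */
#define GICC_ENABLED        (1u << 0)
#define GICC_ONLINE_CAPABLE (1u << 3)

/* Hypothetical verdicts mirroring the three outcomes the patch describes. */
enum gicc_verdict {
	GICC_SKIP,       /* neither enabled nor online capable: entry is ignored */
	GICC_NO_REDIST,  /* online capable only: redistributor assumed inaccessible */
	GICC_MAP_REDIST, /* enabled: redistributor can be mapped at boot */
};

static enum gicc_verdict classify_gicc(uint32_t flags)
{
	/* Same test as the updated acpi_gicc_is_usable() in the patch. */
	if (!(flags & (GICC_ENABLED | GICC_ONLINE_CAPABLE)))
		return GICC_SKIP;

	/* Usable but not enabled: the patch clears present and possible. */
	if (!(flags & GICC_ENABLED))
		return GICC_NO_REDIST;

	return GICC_MAP_REDIST;
}
```

Under these assumptions, an entry with only the online-capable bit set lands in `GICC_NO_REDIST`, which corresponds to the "redistributor is inaccessible" warning path.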

2023-12-13 12:53:29

by Russell King (Oracle)

Subject: [PATCH RFC v3 16/21] arm64: psci: Ignore DENIED CPUs

From: Jean-Philippe Brucker <[email protected]>

When a CPU is marked as disabled, but online capable in the MADT, PSCI
applies some firmware policy to control when it can be brought online.
PSCI returns DENIED to a CPU_ON request if this is not currently
permitted. The OS can learn the current policy from the _STA enabled bit.

Handle the PSCI DENIED return code gracefully instead of printing an
error.

See https://developer.arm.com/documentation/den0022/f/?lang=en page 58.

Signed-off-by: Jean-Philippe Brucker <[email protected]>
[ morse: Rewrote commit message ]
Signed-off-by: James Morse <[email protected]>
Tested-by: Miguel Luis <[email protected]>
Tested-by: Vishnu Pajjuri <[email protected]>
Tested-by: Jianyong Wu <[email protected]>
Signed-off-by: Russell King (Oracle) <[email protected]>
---
Changes since RFC v2
* Add specification reference
* Use EPERM rather than EPROBE_DEFER
Changes since RFC v3:
* Use EPERM everywhere
* Drop unnecessary changes to drivers/firmware/psci/psci.c
---
arch/arm64/kernel/psci.c | 2 +-
arch/arm64/kernel/smp.c | 3 ++-
2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kernel/psci.c b/arch/arm64/kernel/psci.c
index 29a8e444db83..fabd732d0a2d 100644
--- a/arch/arm64/kernel/psci.c
+++ b/arch/arm64/kernel/psci.c
@@ -40,7 +40,7 @@ static int cpu_psci_cpu_boot(unsigned int cpu)
{
phys_addr_t pa_secondary_entry = __pa_symbol(secondary_entry);
int err = psci_ops.cpu_on(cpu_logical_map(cpu), pa_secondary_entry);
- if (err)
+ if (err && err != -EPERM)
pr_err("failed to boot CPU%d (%d)\n", cpu, err);

return err;
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index defbab84e9e5..6bc9094feb19 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -132,7 +132,8 @@ int __cpu_up(unsigned int cpu, struct task_struct *idle)
/* Now bring the CPU into our world */
ret = boot_secondary(cpu, idle);
if (ret) {
- pr_err("CPU%u: failed to boot: %d\n", cpu, ret);
+ if (ret != -EPERM)
+ pr_err("CPU%u: failed to boot: %d\n", cpu, ret);
return ret;
}

--
2.30.2
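
As a rough user-space illustration of the behaviour this patch aims for, the sketch below maps a PSCI return value to an errno and decides whether a boot failure deserves an error message. The function names are hypothetical, and the PSCI return values and the errno mapping are my understanding of the PSCI specification and of the kernel's existing conversion, so treat them as assumptions.

```c
#include <errno.h>
#include <stdbool.h>

/* Assumed PSCI return values (per my reading of the PSCI specification). */
#define PSCI_RET_SUCCESS        (0)
#define PSCI_RET_NOT_SUPPORTED  (-1)
#define PSCI_RET_INVALID_PARAMS (-2)
#define PSCI_RET_DENIED         (-3)

/* Hypothetical helper: convert a PSCI return value to a negative errno. */
static int psci_ret_to_errno(int psci_ret)
{
	switch (psci_ret) {
	case PSCI_RET_SUCCESS:
		return 0;
	case PSCI_RET_NOT_SUPPORTED:
		return -EOPNOTSUPP;
	case PSCI_RET_DENIED:
		return -EPERM;
	default:
		return -EINVAL;
	}
}

/* DENIED is firmware policy, not a fault, so it is not worth an error log. */
static bool boot_failure_is_worth_logging(int err)
{
	return err != 0 && err != -EPERM;
}
```

The error is still returned to the caller in every case; only the log message is suppressed for `-EPERM`.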

2023-12-13 12:53:34

by Russell King (Oracle)

Subject: [PATCH RFC v3 19/21] arm64: document virtual CPU hotplug's expectations

From: James Morse <[email protected]>

Add a description of physical and virtual CPU hotplug, explain the
differences and elaborate on what is required in ACPI for a working
virtual hotplug system.

Signed-off-by: James Morse <[email protected]>
---
Outstanding comment:
https://lore.kernel.org/r/[email protected]
---
Documentation/arch/arm64/cpu-hotplug.rst | 79 ++++++++++++++++++++++++
Documentation/arch/arm64/index.rst | 1 +
2 files changed, 80 insertions(+)
create mode 100644 Documentation/arch/arm64/cpu-hotplug.rst

diff --git a/Documentation/arch/arm64/cpu-hotplug.rst b/Documentation/arch/arm64/cpu-hotplug.rst
new file mode 100644
index 000000000000..76ba8d932c72
--- /dev/null
+++ b/Documentation/arch/arm64/cpu-hotplug.rst
@@ -0,0 +1,79 @@
+.. SPDX-License-Identifier: GPL-2.0
+.. _cpuhp_index:
+
+====================
+CPU Hotplug and ACPI
+====================
+
+CPU hotplug in the arm64 world is commonly used to describe the kernel taking
+CPUs online/offline using PSCI. This document is about ACPI firmware allowing
+CPUs that were not available during boot to be added to the system later.
+
+``possible`` and ``present`` refer to the state of the CPU as seen by Linux.
+
+
+CPU Hotplug on physical systems - CPUs not present at boot
+----------------------------------------------------------
+
+Physical systems need to mark a CPU that is ``possible`` but not ``present`` as
+being ``present``. An example would be a dual socket machine, where the package
+in one of the sockets can be replaced while the system is running.
+
+This is not supported.
+
+In the arm64 world CPUs are not a single device but a slice of the system.
+There are no systems that support the physical addition (or removal) of CPUs
+while the system is running, and ACPI is not able to sufficiently describe
+them.
+
+e.g. New CPUs come with new caches, but the platform's cache topology is
+described in a static table, the PPTT. How caches are shared between CPUs is
+not discoverable, and must be described by firmware.
+
+e.g. The GIC redistributor for each CPU must be accessed by the driver during
+boot to discover the system wide supported features. ACPI's MADT GICC
+structures can describe a redistributor associated with a disabled CPU, but
+can't describe whether the redistributor is accessible, only that it is not
+'always on'.
+
+arm64's ACPI tables assume that everything described is ``present``.
+
+
+CPU Hotplug on virtual systems - CPUs not enabled at boot
+---------------------------------------------------------
+
+Virtual systems have the advantage that all the properties the system will
+ever have can be described at boot. There are no power-domain considerations
+as such devices are emulated.
+
+CPU Hotplug on virtual systems is supported. It is distinct from physical
+CPU Hotplug as all resources are described as ``present``, but CPUs may be
+marked as disabled by firmware. Only the CPU's online/offline behaviour is
+influenced by firmware. An example is where a virtual machine boots with a
+single CPU, and additional CPUs are added once a cloud orchestrator deploys
+the workload.
+
+For a virtual machine, the VMM (e.g. QEMU) plays the part of firmware.
+
+Virtual hotplug is implemented as a firmware policy affecting which CPUs can be
+brought online. Firmware can enforce its policy via PSCI's return codes. e.g.
+``DENIED``.
+
+The ACPI tables must describe all the resources of the virtual machine. CPUs
+that firmware wishes to disable, either from boot or later, should not be
+``enabled`` in the MADT GICC structures, but should have the ``online capable``
+bit set, to indicate they can be enabled later. The boot CPU must be marked as
+``enabled``. The 'always on' GICR structure must be used to describe the
+redistributors.
+
+CPUs described as ``online capable`` but not ``enabled`` can be set to enabled
+by the DSDT's Processor object's _STA method. On virtual systems the _STA method
+must always report the CPU as ``present``. Changes to the firmware policy can
+be notified to the OS via device-check or eject-request.
+
+CPUs described as ``enabled`` in the static table should not have their _STA
+modified dynamically by firmware. Soft-restart features such as kexec will
+re-read the static properties of the system from these static tables, and
+may malfunction if these no longer describe the running system. Linux will
+re-discover the dynamic properties of the system from the _STA method later
+during boot.
diff --git a/Documentation/arch/arm64/index.rst b/Documentation/arch/arm64/index.rst
index d08e924204bf..78544de0a8a9 100644
--- a/Documentation/arch/arm64/index.rst
+++ b/Documentation/arch/arm64/index.rst
@@ -13,6 +13,7 @@ ARM64 Architecture
asymmetric-32bit
booting
cpu-feature-registers
+ cpu-hotplug
elf_hwcaps
hugetlbpage
kdump
--
2.30.2

2023-12-13 12:53:38

by Russell King (Oracle)

Subject: [PATCH RFC v3 14/21] irqchip/gic-v3: Don't return errors from gic_acpi_match_gicc()

From: James Morse <[email protected]>

gic_acpi_match_gicc() is only called via gic_acpi_count_gicr_regions().
It should only count the number of enabled redistributors, but it
also tries to sanity check the GICC entry, currently returning an
error if the Enabled bit is set, but the gicr_base_address is zero.

Adding support for the online-capable bit to the sanity check
complicates it, for no benefit. The existing check implicitly
depends on gic_acpi_count_gicr_regions() previously failing to find
any GICR regions (as it is valid to have gicr_base_address of zero if
the redistributors are described via a GICR entry).

Instead of complicating the check, remove it. Failures that happen
at this point cause the irqchip not to register, meaning no irqs
can be requested. The kernel grinds to a panic() pretty quickly.

Without the check, MADT tables that exhibit this problem are still
caught by gic_populate_rdist(), which helpfully also prints what
went wrong:
| CPU4: mpidr 100 has no re-distributor!

Signed-off-by: James Morse <[email protected]>
Reviewed-by: Gavin Shan <[email protected]>
Tested-by: Miguel Luis <[email protected]>
Tested-by: Vishnu Pajjuri <[email protected]>
Tested-by: Jianyong Wu <[email protected]>
Signed-off-by: Russell King (Oracle) <[email protected]>
---
drivers/irqchip/irq-gic-v3.c | 18 ++++++------------
1 file changed, 6 insertions(+), 12 deletions(-)

diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
index 98b0329b7154..ebecd4546830 100644
--- a/drivers/irqchip/irq-gic-v3.c
+++ b/drivers/irqchip/irq-gic-v3.c
@@ -2420,21 +2420,15 @@ static int __init gic_acpi_match_gicc(union acpi_subtable_headers *header,

/*
* If GICC is enabled and has valid gicr base address, then it means
- * GICR base is presented via GICC
+ * GICR base is presented via GICC. The redistributor is only known to
+ * be accessible if the GICC is marked as enabled. If this bit is not
+ * set, we'd need to add the redistributor at runtime, which isn't
+ * supported.
*/
- if (acpi_gicc_is_usable(gicc) && gicc->gicr_base_address) {
+ if (gicc->flags & ACPI_MADT_ENABLED && gicc->gicr_base_address)
acpi_data.enabled_rdists++;
- return 0;
- }

- /*
- * It's perfectly valid firmware can pass disabled GICC entry, driver
- * should not treat as errors, skip the entry instead of probe fail.
- */
- if (!acpi_gicc_is_usable(gicc))
- return 0;
-
- return -ENODEV;
+ return 0;
}

static int __init gic_acpi_count_gicr_regions(void)
--
2.30.2
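
The simplified counting rule can be pictured with a stand-alone sketch. The `gicc_entry` struct and `count_enabled_rdists()` are hypothetical stand-ins for the MADT walk, and the Enabled bit position is an assumption; the point is only that the callback now counts and never fails.

```c
#include <stddef.h>
#include <stdint.h>

#define GICC_ENABLED (1u << 0) /* assumed position of the MADT Enabled flag */

/* Hypothetical, cut-down view of a MADT GICC entry. */
struct gicc_entry {
	uint32_t flags;
	uint64_t gicr_base_address;
};

/*
 * Count redistributors described via GICC entries. Only entries that are
 * enabled and carry a non-zero GICR base are counted; everything else is
 * skipped without reporting an error.
 */
static unsigned int count_enabled_rdists(const struct gicc_entry *entries,
					 size_t n)
{
	unsigned int count = 0;

	for (size_t i = 0; i < n; i++) {
		if ((entries[i].flags & GICC_ENABLED) &&
		    entries[i].gicr_base_address)
			count++;
	}
	return count;
}
```

An enabled entry with a zero base address simply contributes nothing here; per the commit message, a table that is genuinely broken in that way is still caught later by gic_populate_rdist().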

2023-12-13 12:53:44

by Russell King (Oracle)

Subject: [PATCH RFC v3 21/21] cpumask: Add enabled cpumask for present CPUs that can be brought online

From: James Morse <[email protected]>

The 'offline' file in sysfs shows all offline CPUs, including those
that aren't present. User-space is expected to remove not-present CPUs
from this list to learn which CPUs could be brought online.

CPUs can be present but not-enabled. These CPUs can't be brought online
until the firmware policy changes, which comes with an ACPI notification
that will register the CPUs.

With only the offline and present files, user-space is unable to
determine which CPUs it can try to bring online. Add a new CPU mask
that shows this based on all the registered CPUs.

Signed-off-by: James Morse <[email protected]>
Tested-by: Miguel Luis <[email protected]>
Tested-by: Vishnu Pajjuri <[email protected]>
Tested-by: Jianyong Wu <[email protected]>
---
Outstanding comment:
https://lore.kernel.org/r/[email protected]
---
drivers/base/cpu.c | 10 ++++++++++
include/linux/cpumask.h | 25 +++++++++++++++++++++++++
kernel/cpu.c | 3 +++
3 files changed, 38 insertions(+)

diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
index 13d052bf13f4..a6e96a0a92b7 100644
--- a/drivers/base/cpu.c
+++ b/drivers/base/cpu.c
@@ -95,6 +95,7 @@ void unregister_cpu(struct cpu *cpu)
{
int logical_cpu = cpu->dev.id;

+ set_cpu_enabled(logical_cpu, false);
unregister_cpu_under_node(logical_cpu, cpu_to_node(logical_cpu));

device_unregister(&cpu->dev);
@@ -273,6 +274,13 @@ static ssize_t print_cpus_offline(struct device *dev,
}
static DEVICE_ATTR(offline, 0444, print_cpus_offline, NULL);

+static ssize_t print_cpus_enabled(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ return sysfs_emit(buf, "%*pbl\n", cpumask_pr_args(cpu_enabled_mask));
+}
+static DEVICE_ATTR(enabled, 0444, print_cpus_enabled, NULL);
+
static ssize_t print_cpus_isolated(struct device *dev,
struct device_attribute *attr, char *buf)
{
@@ -413,6 +421,7 @@ int register_cpu(struct cpu *cpu, int num)
register_cpu_under_node(num, cpu_to_node(num));
dev_pm_qos_expose_latency_limit(&cpu->dev,
PM_QOS_RESUME_LATENCY_NO_CONSTRAINT);
+ set_cpu_enabled(num, true);

return 0;
}
@@ -494,6 +503,7 @@ static struct attribute *cpu_root_attrs[] = {
&cpu_attrs[2].attr.attr,
&dev_attr_kernel_max.attr,
&dev_attr_offline.attr,
+ &dev_attr_enabled.attr,
&dev_attr_isolated.attr,
#ifdef CONFIG_NO_HZ_FULL
&dev_attr_nohz_full.attr,
diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
index cfb545841a2c..cc72a0887f04 100644
--- a/include/linux/cpumask.h
+++ b/include/linux/cpumask.h
@@ -92,6 +92,7 @@ static inline void set_nr_cpu_ids(unsigned int nr)
*
* cpu_possible_mask- has bit 'cpu' set iff cpu is populatable
* cpu_present_mask - has bit 'cpu' set iff cpu is populated
+ * cpu_enabled_mask - has bit 'cpu' set iff cpu can be brought online
* cpu_online_mask - has bit 'cpu' set iff cpu available to scheduler
* cpu_active_mask - has bit 'cpu' set iff cpu available to migration
*
@@ -124,11 +125,13 @@ static inline void set_nr_cpu_ids(unsigned int nr)

extern struct cpumask __cpu_possible_mask;
extern struct cpumask __cpu_online_mask;
+extern struct cpumask __cpu_enabled_mask;
extern struct cpumask __cpu_present_mask;
extern struct cpumask __cpu_active_mask;
extern struct cpumask __cpu_dying_mask;
#define cpu_possible_mask ((const struct cpumask *)&__cpu_possible_mask)
#define cpu_online_mask ((const struct cpumask *)&__cpu_online_mask)
+#define cpu_enabled_mask ((const struct cpumask *)&__cpu_enabled_mask)
#define cpu_present_mask ((const struct cpumask *)&__cpu_present_mask)
#define cpu_active_mask ((const struct cpumask *)&__cpu_active_mask)
#define cpu_dying_mask ((const struct cpumask *)&__cpu_dying_mask)
@@ -993,6 +996,7 @@ extern const DECLARE_BITMAP(cpu_all_bits, NR_CPUS);
#else
#define for_each_possible_cpu(cpu) for_each_cpu((cpu), cpu_possible_mask)
#define for_each_online_cpu(cpu) for_each_cpu((cpu), cpu_online_mask)
+#define for_each_enabled_cpu(cpu) for_each_cpu((cpu), cpu_enabled_mask)
#define for_each_present_cpu(cpu) for_each_cpu((cpu), cpu_present_mask)
#endif

@@ -1015,6 +1019,15 @@ set_cpu_possible(unsigned int cpu, bool possible)
cpumask_clear_cpu(cpu, &__cpu_possible_mask);
}

+static inline void
+set_cpu_enabled(unsigned int cpu, bool can_be_onlined)
+{
+ if (can_be_onlined)
+ cpumask_set_cpu(cpu, &__cpu_enabled_mask);
+ else
+ cpumask_clear_cpu(cpu, &__cpu_enabled_mask);
+}
+
static inline void
set_cpu_present(unsigned int cpu, bool present)
{
@@ -1096,6 +1109,7 @@ static __always_inline unsigned int num_online_cpus(void)
return raw_atomic_read(&__num_online_cpus);
}
#define num_possible_cpus() cpumask_weight(cpu_possible_mask)
+#define num_enabled_cpus() cpumask_weight(cpu_enabled_mask)
#define num_present_cpus() cpumask_weight(cpu_present_mask)
#define num_active_cpus() cpumask_weight(cpu_active_mask)

@@ -1104,6 +1118,11 @@ static inline bool cpu_online(unsigned int cpu)
return cpumask_test_cpu(cpu, cpu_online_mask);
}

+static inline bool cpu_enabled(unsigned int cpu)
+{
+ return cpumask_test_cpu(cpu, cpu_enabled_mask);
+}
+
static inline bool cpu_possible(unsigned int cpu)
{
return cpumask_test_cpu(cpu, cpu_possible_mask);
@@ -1128,6 +1147,7 @@ static inline bool cpu_dying(unsigned int cpu)

#define num_online_cpus() 1U
#define num_possible_cpus() 1U
+#define num_enabled_cpus() 1U
#define num_present_cpus() 1U
#define num_active_cpus() 1U

@@ -1141,6 +1161,11 @@ static inline bool cpu_possible(unsigned int cpu)
return cpu == 0;
}

+static inline bool cpu_enabled(unsigned int cpu)
+{
+ return cpu == 0;
+}
+
static inline bool cpu_present(unsigned int cpu)
{
return cpu == 0;
diff --git a/kernel/cpu.c b/kernel/cpu.c
index a86972a91991..fe0a5189f8ae 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -3122,6 +3122,9 @@ EXPORT_SYMBOL(__cpu_possible_mask);
struct cpumask __cpu_online_mask __read_mostly;
EXPORT_SYMBOL(__cpu_online_mask);

+struct cpumask __cpu_enabled_mask __read_mostly;
+EXPORT_SYMBOL(__cpu_enabled_mask);
+
struct cpumask __cpu_present_mask __read_mostly;
EXPORT_SYMBOL(__cpu_present_mask);

--
2.30.2
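
To show how user-space might consume the new mask, here is a small sketch that parses the CPU list format printed by `%*pbl` (for example "0-3,6") and computes which CPUs could be brought online. It assumes the new file appears next to `offline` as `/sys/devices/system/cpu/enabled`, that CPU numbers fit in 64 bits, and that the helper names are purely illustrative.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse a CPU list such as "0-3,6\n" into a 64-bit mask.
 * Returns 0 on success, -1 on malformed input or a CPU number >= 64.
 */
static int parse_cpulist(const char *s, uint64_t *mask)
{
	*mask = 0;

	while (*s && *s != '\n') {
		char *end;
		unsigned long first = strtoul(s, &end, 10);
		unsigned long last = first;

		if (end == s)
			return -1;		/* no digits where a CPU was expected */

		if (*end == '-') {		/* range "first-last" */
			s = end + 1;
			last = strtoul(s, &end, 10);
			if (end == s)
				return -1;
		}

		if (last < first || last >= 64)
			return -1;

		for (unsigned long cpu = first; cpu <= last; cpu++)
			*mask |= UINT64_C(1) << cpu;

		if (*end == ',')
			end++;			/* another item follows */
		else if (*end && *end != '\n')
			return -1;		/* unexpected separator */
		s = end;
	}
	return 0;
}

/* CPUs that are registered (enabled) but not yet online. */
static uint64_t onlinable_cpus(uint64_t enabled, uint64_t online)
{
	return enabled & ~online;
}
```

With the contents of `enabled` and `online` parsed this way, the candidates for onlining are `onlinable_cpus(enabled, online)`, with no need to subtract the not-present CPUs from `offline` by hand.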

2023-12-13 12:54:22

by Russell King (Oracle)

Subject: [PATCH RFC v3 20/21] ACPI: Add _OSC bits to advertise OS support for toggling CPU present/enabled

From: James Morse <[email protected]>

Platform firmware can disable a CPU, or make it not-present, by making
an eject-request notification, then waiting for the OS to take it offline
and call _EJx. Afterwards, the firmware updates _STA with the new status.

Not all operating systems support this. For arm64 making CPUs not-present
has never been supported. For all ACPI architectures, making CPUs disabled
has recently been added. Firmware can't know what the OS has support for.

Add two new _OSC bits to advertise whether the OS supports the _STA enabled
or present bits being toggled for CPUs. This will be important for arm64
if systems that support physical CPU hotplug ever appear: arm64 Linux
doesn't currently support this, so firmware shouldn't try.

Advertising this support to firmware is useful for cloud orchestrators
to know whether they can scale a particular VM by adding CPUs.

Signed-off-by: James Morse <[email protected]>
Tested-by: Miguel Luis <[email protected]>
Tested-by: Vishnu Pajjuri <[email protected]>
Tested-by: Jianyong Wu <[email protected]>
---
I'm assuming Loongarch machines do not support physical CPU hotplug.

Changes since RFC v3:
* Drop ia64 changes
* Update James' comment below "---" to remove reference to ia64

Outstanding comment:
https://lore.kernel.org/r/[email protected]
---
arch/x86/Kconfig | 1 +
drivers/acpi/Kconfig | 9 +++++++++
drivers/acpi/acpi_processor.c | 14 +++++++++++++-
drivers/acpi/bus.c | 16 ++++++++++++++++
include/linux/acpi.h | 4 ++++
5 files changed, 43 insertions(+), 1 deletion(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 64fc7c475ab0..33fc4dcd950c 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -60,6 +60,7 @@ config X86
select ACPI_LEGACY_TABLES_LOOKUP if ACPI
select ACPI_SYSTEM_POWER_STATES_SUPPORT if ACPI
select ACPI_HOTPLUG_PRESENT_CPU if ACPI_PROCESSOR && HOTPLUG_CPU
+ select ACPI_HOTPLUG_IGNORE_OSC if ACPI && HOTPLUG_CPU
select ARCH_32BIT_OFF_T if X86_32
select ARCH_CLOCKSOURCE_INIT
select ARCH_CORRECT_STACKTRACE_ON_KRETPROBE
diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig
index 9c5a43d0aff4..020e7c0ab985 100644
--- a/drivers/acpi/Kconfig
+++ b/drivers/acpi/Kconfig
@@ -311,6 +311,15 @@ config ACPI_HOTPLUG_PRESENT_CPU
depends on ACPI_PROCESSOR && HOTPLUG_CPU
select ACPI_CONTAINER

+config ACPI_HOTPLUG_IGNORE_OSC
+ bool
+ depends on ACPI_HOTPLUG_PRESENT_CPU
+ help
+ Ignore whether firmware acknowledged support for toggling the CPU
+ present bit in _STA. Some architectures predate the _OSC bits, so
+ firmware doesn't know to do this.
+
+
config ACPI_PROCESSOR_AGGREGATOR
tristate "Processor Aggregator"
depends on ACPI_PROCESSOR
diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
index ea12e70dfd39..5bb207a7a1dd 100644
--- a/drivers/acpi/acpi_processor.c
+++ b/drivers/acpi/acpi_processor.c
@@ -182,6 +182,18 @@ static void __init acpi_pcc_cpufreq_init(void)
static void __init acpi_pcc_cpufreq_init(void) {}
#endif /* CONFIG_X86 */

+static bool acpi_processor_hotplug_present_supported(void)
+{
+ if (!IS_ENABLED(CONFIG_ACPI_HOTPLUG_PRESENT_CPU))
+ return false;
+
+ /* x86 systems pre-date the _OSC bit */
+ if (IS_ENABLED(CONFIG_ACPI_HOTPLUG_IGNORE_OSC))
+ return true;
+
+ return osc_sb_hotplug_present_support_acked;
+}
+
/* Initialization */
static int acpi_processor_make_present(struct acpi_processor *pr)
{
@@ -189,7 +201,7 @@ static int acpi_processor_make_present(struct acpi_processor *pr)
acpi_status status;
int ret;

- if (!IS_ENABLED(CONFIG_ACPI_HOTPLUG_PRESENT_CPU)) {
+ if (!acpi_processor_hotplug_present_supported()) {
pr_err_once("Changing CPU present bit is not supported\n");
return -ENODEV;
}
diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c
index 72e64c0718c9..7122450739d6 100644
--- a/drivers/acpi/bus.c
+++ b/drivers/acpi/bus.c
@@ -298,6 +298,13 @@ EXPORT_SYMBOL_GPL(osc_sb_native_usb4_support_confirmed);

bool osc_sb_cppc2_support_acked;

+/*
+ * ACPI 6.? Proposed Operating System Capabilities for modifying CPU
+ * present/enable.
+ */
+bool osc_sb_hotplug_enabled_support_acked;
+bool osc_sb_hotplug_present_support_acked;
+
static u8 sb_uuid_str[] = "0811B06E-4A27-44F9-8D60-3CBBC22E7B48";
static void acpi_bus_osc_negotiate_platform_control(void)
{
@@ -346,6 +353,11 @@ static void acpi_bus_osc_negotiate_platform_control(void)

if (!ghes_disable)
capbuf[OSC_SUPPORT_DWORD] |= OSC_SB_APEI_SUPPORT;
+
+ capbuf[OSC_SUPPORT_DWORD] |= OSC_SB_HOTPLUG_ENABLED_SUPPORT;
+ if (IS_ENABLED(CONFIG_ACPI_HOTPLUG_PRESENT_CPU))
+ capbuf[OSC_SUPPORT_DWORD] |= OSC_SB_HOTPLUG_PRESENT_SUPPORT;
+
if (ACPI_FAILURE(acpi_get_handle(NULL, "\\_SB", &handle)))
return;

@@ -383,6 +395,10 @@ static void acpi_bus_osc_negotiate_platform_control(void)
capbuf_ret[OSC_SUPPORT_DWORD] & OSC_SB_NATIVE_USB4_SUPPORT;
osc_cpc_flexible_adr_space_confirmed =
capbuf_ret[OSC_SUPPORT_DWORD] & OSC_SB_CPC_FLEXIBLE_ADR_SPACE;
+ osc_sb_hotplug_enabled_support_acked =
+ capbuf_ret[OSC_SUPPORT_DWORD] & OSC_SB_HOTPLUG_ENABLED_SUPPORT;
+ osc_sb_hotplug_present_support_acked =
+ capbuf_ret[OSC_SUPPORT_DWORD] & OSC_SB_HOTPLUG_PRESENT_SUPPORT;
}

kfree(context.ret.pointer);
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index 00be66683505..c572abac803c 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -559,12 +559,16 @@ acpi_status acpi_run_osc(acpi_handle handle, struct acpi_osc_context *context);
#define OSC_SB_NATIVE_USB4_SUPPORT 0x00040000
#define OSC_SB_PRM_SUPPORT 0x00200000
#define OSC_SB_FFH_OPR_SUPPORT 0x00400000
+#define OSC_SB_HOTPLUG_ENABLED_SUPPORT 0x00800000
+#define OSC_SB_HOTPLUG_PRESENT_SUPPORT 0x01000000

extern bool osc_sb_apei_support_acked;
extern bool osc_pc_lpi_support_confirmed;
extern bool osc_sb_native_usb4_support_confirmed;
extern bool osc_sb_cppc2_support_acked;
extern bool osc_cpc_flexible_adr_space_confirmed;
+extern bool osc_sb_hotplug_enabled_support_acked;
+extern bool osc_sb_hotplug_present_support_acked;

/* USB4 Capabilities */
#define OSC_USB_USB3_TUNNELING 0x00000001
--
2.30.2
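
The negotiation in this patch boils down to two small decisions, sketched below in plain C. The bit values are taken from the patch; `struct osc_config` and both function names are hypothetical, standing in for the Kconfig options and the globals set after _OSC is evaluated.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values as proposed in this patch. */
#define OSC_SB_HOTPLUG_ENABLED_SUPPORT 0x00800000u
#define OSC_SB_HOTPLUG_PRESENT_SUPPORT 0x01000000u

/* Hypothetical stand-in for the two Kconfig options involved. */
struct osc_config {
	bool hotplug_present_cpu; /* CONFIG_ACPI_HOTPLUG_PRESENT_CPU */
	bool ignore_osc;          /* CONFIG_ACPI_HOTPLUG_IGNORE_OSC */
};

/* Support bits the OS would advertise to firmware in the _OSC request. */
static uint32_t osc_request_bits(const struct osc_config *cfg)
{
	uint32_t bits = OSC_SB_HOTPLUG_ENABLED_SUPPORT;

	if (cfg->hotplug_present_cpu)
		bits |= OSC_SB_HOTPLUG_PRESENT_SUPPORT;
	return bits;
}

/* Whether toggling the present bit may be acted on, given firmware's reply. */
static bool present_hotplug_supported(const struct osc_config *cfg,
				      uint32_t acked)
{
	if (!cfg->hotplug_present_cpu)
		return false;
	if (cfg->ignore_osc)
		return true; /* firmware predates the bit, so do not wait for an ack */
	return (acked & OSC_SB_HOTPLUG_PRESENT_SUPPORT) != 0;
}
```

The `ignore_osc` shortcut is what keeps existing x86 behaviour unchanged: firmware written before these bits existed never acknowledges them, yet present-bit hotplug already works there.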

2023-12-14 17:32:54

by Jonathan Cameron

Subject: Re: [PATCH RFC v3 01/21] ACPI: Only enumerate enabled (or functional) devices

On Wed, 13 Dec 2023 12:49:16 +0000
Russell King (Oracle) <[email protected]> wrote:

> From: James Morse <[email protected]>
>
> Today the ACPI enumeration code 'visits' all devices that are present.
>
> This is a problem for arm64, where CPUs are always present, but not
> always enabled. When a device-check occurs because the firmware-policy
> has changed and a CPU is now enabled, the following error occurs:
> | acpi ACPI0007:48: Enumeration failure
>
> This is ultimately because acpi_dev_ready_for_enumeration() returns
> true for a device that is not enabled. The ACPI Processor driver
> will not register such CPUs as they are not 'decoding their resources'.
>
> Change acpi_dev_ready_for_enumeration() to also check the enabled bit.
> ACPI allows a device to be functional instead of maintaining the
> present and enabled bit. Make this behaviour an explicit check with
> a reference to the spec, and then check the present and enabled bits.
> This is needed to avoid enumerating present && functional devices that
> are not enabled.
>
> Signed-off-by: James Morse <[email protected]>
> Tested-by: Miguel Luis <[email protected]>
> Tested-by: Vishnu Pajjuri <[email protected]>
> Tested-by: Jianyong Wu <[email protected]>
> Signed-off-by: Russell King (Oracle) <[email protected]>
> ---
> If this change causes problems on deployed hardware, I suggest an
> arch opt-in: ACPI_IGNORE_STA_ENABLED, that causes
> acpi_dev_ready_for_enumeration() to only check the present bit.

My gut feeling (having made ACPI 'fixes' in the past that ran into
horribly broken firmware and had to be reverted) is to reduce the blast
radius preemptively from the start. I'd love to live in a world where
that wasn't necessary, but I don't trust all the generators of ACPI tables.
I'll leave it to Rafael and other ACPI experts to suggest how narrow we should
make it though - an arch opt-in might be narrow enough.

>
> Changes since RFC v2:
> * Incorporate comment suggestion by Gavin Shan.
> Other review comments from Jonathan Cameron not yet addressed.

Looking back, I think this was mainly a suggestion for a minor
possible optimization by ignoring the case of !present && enabled
when designing the logic because that's not allowed by the spec.

You made that change in v3.

Otherwise, comments were trivial comment clarifications that I'm not
that worried about.

One comment typo inline.

With the assumption that others will comment on when this change should be
chicken bit'd out.

Reviewed-by: Jonathan Cameron <[email protected]>


> ---
> drivers/acpi/device_pm.c | 2 +-
> drivers/acpi/device_sysfs.c | 2 +-
> drivers/acpi/internal.h | 1 -
> drivers/acpi/property.c | 2 +-
> drivers/acpi/scan.c | 24 ++++++++++++++----------
> 5 files changed, 17 insertions(+), 14 deletions(-)
>
> diff --git a/drivers/acpi/device_pm.c b/drivers/acpi/device_pm.c
> index 3b4d048c4941..e3c80f3b3b57 100644
> --- a/drivers/acpi/device_pm.c
> +++ b/drivers/acpi/device_pm.c
> @@ -313,7 +313,7 @@ int acpi_bus_init_power(struct acpi_device *device)
> return -EINVAL;
>
> device->power.state = ACPI_STATE_UNKNOWN;
> - if (!acpi_device_is_present(device)) {
> + if (!acpi_dev_ready_for_enumeration(device)) {
> device->flags.initialized = false;
> return -ENXIO;
> }
> diff --git a/drivers/acpi/device_sysfs.c b/drivers/acpi/device_sysfs.c
> index 23373faa35ec..a0256d2493a7 100644
> --- a/drivers/acpi/device_sysfs.c
> +++ b/drivers/acpi/device_sysfs.c
> @@ -141,7 +141,7 @@ static int create_pnp_modalias(const struct acpi_device *acpi_dev, char *modalia
> struct acpi_hardware_id *id;
>
> /* Avoid unnecessarily loading modules for non present devices. */
> - if (!acpi_device_is_present(acpi_dev))
> + if (!acpi_dev_ready_for_enumeration(acpi_dev))
> return 0;
>
> /*
> diff --git a/drivers/acpi/internal.h b/drivers/acpi/internal.h
> index 866c7c4ed233..a1b45e345bcc 100644
> --- a/drivers/acpi/internal.h
> +++ b/drivers/acpi/internal.h
> @@ -107,7 +107,6 @@ int acpi_device_setup_files(struct acpi_device *dev);
> void acpi_device_remove_files(struct acpi_device *dev);
> void acpi_device_add_finalize(struct acpi_device *device);
> void acpi_free_pnp_ids(struct acpi_device_pnp *pnp);
> -bool acpi_device_is_present(const struct acpi_device *adev);
> bool acpi_device_is_battery(struct acpi_device *adev);
> bool acpi_device_is_first_physical_node(struct acpi_device *adev,
> const struct device *dev);
> diff --git a/drivers/acpi/property.c b/drivers/acpi/property.c
> index 6979a3f9f90a..14d6948fd88a 100644
> --- a/drivers/acpi/property.c
> +++ b/drivers/acpi/property.c
> @@ -1420,7 +1420,7 @@ static bool acpi_fwnode_device_is_available(const struct fwnode_handle *fwnode)
> if (!is_acpi_device_node(fwnode))
> return false;
>
> - return acpi_device_is_present(to_acpi_device_node(fwnode));
> + return acpi_dev_ready_for_enumeration(to_acpi_device_node(fwnode));
> }
>
> static const void *
> diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
> index 02bb2cce423f..728649a2a251 100644
> --- a/drivers/acpi/scan.c
> +++ b/drivers/acpi/scan.c
> @@ -304,7 +304,7 @@ static int acpi_scan_device_check(struct acpi_device *adev)
> int error;
>
> acpi_bus_get_status(adev);
> - if (acpi_device_is_present(adev)) {
> + if (acpi_dev_ready_for_enumeration(adev)) {
> /*
> * This function is only called for device objects for which
> * matching scan handlers exist. The only situation in which
> @@ -338,7 +338,7 @@ static int acpi_scan_bus_check(struct acpi_device *adev, void *not_used)
> int error;
>
> acpi_bus_get_status(adev);
> - if (!acpi_device_is_present(adev)) {
> + if (!acpi_dev_ready_for_enumeration(adev)) {
> acpi_scan_device_not_enumerated(adev);
> return 0;
> }
> @@ -1913,11 +1913,6 @@ static bool acpi_device_should_be_hidden(acpi_handle handle)
> return true;
> }
>
> -bool acpi_device_is_present(const struct acpi_device *adev)
> -{
> - return adev->status.present || adev->status.functional;
> -}
> -
> static bool acpi_scan_handler_matching(struct acpi_scan_handler *handler,
> const char *idstr,
> const struct acpi_device_id **matchid)
> @@ -2381,16 +2376,25 @@ EXPORT_SYMBOL_GPL(acpi_dev_clear_dependencies);
> * acpi_dev_ready_for_enumeration - Check if the ACPI device is ready for enumeration
> * @device: Pointer to the &struct acpi_device to check
> *
> - * Check if the device is present and has no unmet dependencies.
> + * Check if the device is functional or enabled and has no unmet dependencies.
> *
> - * Return true if the device is ready for enumeratino. Otherwise, return false.
> + * Return true if the device is ready for enumeration. Otherwise, return false.
> */
> bool acpi_dev_ready_for_enumeration(const struct acpi_device *device)
> {
> if (device->flags.honor_deps && device->dep_unmet)
> return false;
>
> - return acpi_device_is_present(device);
> + /*
> + * ACPI 6.5's 6.3.7 "_STA (Device Status)" allows firmware to return
> + * (!present && functional) for certain types of devices that should be
> + * enumerated. Note that the enabled bit can't be sert until the present

set until

> + * bit is set.
> + */
> + if (device->status.present)
> + return device->status.enabled;
> + else
> + return device->status.functional;
> }
> EXPORT_SYMBOL_GPL(acpi_dev_ready_for_enumeration);
>
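
For reference, the effect of this hunk on the _STA bits can be sketched in user space. This is a minimal illustration and not kernel code: struct sta_bits, old_is_present() and new_ready() are hypothetical stand-ins for the status flags of struct acpi_device, the removed acpi_device_is_present(), and the new tail of acpi_dev_ready_for_enumeration(). The honor_deps/dep_unmet check is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the _STA flags kept in struct acpi_device. */
struct sta_bits {
	bool present;
	bool enabled;
	bool functional;
};

/* Old rule: the body of the removed acpi_device_is_present(). */
static bool old_is_present(struct sta_bits s)
{
	return s.present || s.functional;
}

/* New rule from the hunk above (dependency check omitted). */
static bool new_ready(struct sta_bits s)
{
	if (s.present)
		return s.enabled;
	return s.functional;
}

static void sta_demo(void)
{
	/* A CPU that is present but not enabled by firmware. */
	struct sta_bits disabled_cpu = {
		.present = true, .enabled = false, .functional = true,
	};
	/* A device that only reports the functional bit. */
	struct sta_bits functional_only = {
		.present = false, .enabled = false, .functional = true,
	};

	assert(old_is_present(disabled_cpu));	/* used to be enumerated */
	assert(!new_ready(disabled_cpu));	/* now skipped */
	assert(new_ready(functional_only));	/* still enumerated */
}
```

The only input for which the two rules differ is present && !enabled, which is the arm64 case described in the commit message.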

2023-12-14 17:37:15

by Jonathan Cameron

Subject: Re: [PATCH RFC v3 02/21] ACPI: processor: Add support for processors described as container packages

On Wed, 13 Dec 2023 12:49:21 +0000
Russell King (Oracle) <[email protected]> wrote:

> From: James Morse <[email protected]>
>
> ACPI has two ways of describing processors in the DSDT. From ACPI v6.5,
> 5.2.12:
>
> "Starting with ACPI Specification 6.3, the use of the Processor() object
> was deprecated. Only legacy systems should continue with this usage. On
> the Itanium architecture only, a _UID is provided for the Processor()
> that is a string object. This usage of _UID is also deprecated since it
> can preclude an OSPM from being able to match a processor to a
> non-enumerable device, such as those defined in the MADT. From ACPI
> Specification 6.3 onward, all processor objects for all architectures
> except Itanium must now use Device() objects with an _HID of ACPI0007,
> and use only integer _UID values."
>
> Also see https://uefi.org/specs/ACPI/6.5/08_Processor_Configuration_and_Control.html#declaring-processors
>
> Duplicate descriptions are not allowed, the ACPI processor driver already
> parses the UID from both devices and containers. acpi_processor_get_info()
> returns an error if the UID exists twice in the DSDT.
>
> The missing probe for CPUs described as packages creates a problem for
> moving the cpu_register() calls into the acpi_processor driver, as CPUs
> described like this don't get registered, leading to errors from other
> subsystems when they try to add new sysfs entries to the CPU node.
> (e.g. topology_sysfs_init()'s use of topology_add_dev() via cpuhp)
>
> To fix this, parse the processor container and call acpi_processor_add()
> for each processor that is discovered like this. The processor container
> handler is added with acpi_scan_add_handler(), so no detach call will
> arrive.
>
> Qemu TCG describes CPUs using processor devices in a processor container.
> For more information, see build_cpus_aml() in Qemu hw/acpi/cpu.c and
> https://uefi.org/specs/ACPI/6.5/08_Processor_Configuration_and_Control.html#processor-container-device
>
> Signed-off-by: James Morse <[email protected]>
> Tested-by: Miguel Luis <[email protected]>
> Tested-by: Vishnu Pajjuri <[email protected]>
> Tested-by: Jianyong Wu <[email protected]>
> ---
> Outstanding comments:
> https://lore.kernel.org/r/[email protected]
Looks like you resolved those (were all patch description things).

So I'm happy.
Reviewed-by: Jonathan Cameron <[email protected]>

Thanks,

J
> https://lore.kernel.org/r/[email protected]
> ---
> drivers/acpi/acpi_processor.c | 22 ++++++++++++++++++++++
> 1 file changed, 22 insertions(+)
>
> diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
> index 4fe2ef54088c..6a542e0ce396 100644
> --- a/drivers/acpi/acpi_processor.c
> +++ b/drivers/acpi/acpi_processor.c
> @@ -626,9 +626,31 @@ static struct acpi_scan_handler processor_handler = {
> },
> };
>
> +static acpi_status acpi_processor_container_walk(acpi_handle handle,
> + u32 lvl,
> + void *context,
> + void **rv)
> +{
> + struct acpi_device *adev;
> + acpi_status status;
> +
> + adev = acpi_get_acpi_dev(handle);
> + if (!adev)
> + return AE_ERROR;
> +
> + status = acpi_processor_add(adev, &processor_device_ids[0]);
> + acpi_put_acpi_dev(adev);
> +
> + return status;
> +}
> +
> static int acpi_processor_container_attach(struct acpi_device *dev,
> const struct acpi_device_id *id)
> {
> + acpi_walk_namespace(ACPI_TYPE_PROCESSOR, dev->handle,
> + ACPI_UINT32_MAX, acpi_processor_container_walk,
> + NULL, NULL, NULL);
> +
> return 1;
> }
>

2023-12-14 17:38:34

by Jonathan Cameron

Subject: Re: [PATCH RFC v3 04/21] ACPI: processor: Register all CPUs from acpi_processor_get_info()

On Wed, 13 Dec 2023 12:49:31 +0000
Russell King (Oracle) <[email protected]> wrote:

> From: James Morse <[email protected]>
>
> To allow ACPI to skip the call to arch_register_cpu() when the _STA
> value indicates the CPU can't be brought online right now, move the
> arch_register_cpu() call into acpi_processor_get_info().
>
> Systems can still be booted with 'acpi=off', or not include an
> ACPI description at all. For these, the CPUs continue to be
> registered by cpu_dev_register_generic().
>
> This moves the CPU register logic back to a subsys_initcall(),
> while the memory nodes will have been registered earlier.
>
> Signed-off-by: James Morse <[email protected]>
> Reviewed-by: Gavin Shan <[email protected]>
> Tested-by: Miguel Luis <[email protected]>
> Tested-by: Vishnu Pajjuri <[email protected]>
> Tested-by: Jianyong Wu <[email protected]>
> Signed-off-by: Russell King (Oracle) <[email protected]>
LGTM as well.

Reviewed-by: Jonathan Cameron <[email protected]>

2023-12-14 17:43:57

by Jonathan Cameron

Subject: Re: [PATCH RFC v3 07/21] ACPI: Rename acpi_processor_hotadd_init and remove pre-processor guards

On Wed, 13 Dec 2023 12:49:47 +0000
Russell King (Oracle) <[email protected]> wrote:

> From: James Morse <[email protected]>
>
> acpi_processor_hotadd_init() will make a CPU present by mapping it
> based on its hardware id.
>
> 'hotadd_init' is ambiguous once there are two different behaviours
> for cpu hotplug. This is for toggling the _STA present bit. Subsequent
> patches will add support for toggling the _STA enabled bit, named
> acpi_processor_make_enabled().
>
> Rename it acpi_processor_make_present() to make it clear this is
> for CPUs that were not previously present.
>
> Expose the function prototypes it uses to allow the preprocessor
> guards to be removed. The IS_ENABLED() check will let the compiler
> dead-code elimination pass remove this if it isn't going to be
> used.
>
> Signed-off-by: James Morse <[email protected]>
> Tested-by: Miguel Luis <[email protected]>
> Tested-by: Vishnu Pajjuri <[email protected]>
> Tested-by: Jianyong Wu <[email protected]>
> ---
> Outstanding comments:
> https://lore.kernel.org/r/[email protected]

If it's not caused a build warning yet, chances are high this is fine.

Reviewed-by: Jonathan Cameron <[email protected]>

> https://lore.kernel.org/r/[email protected]
> For this comment, we use IS_ENABLED() in multiple places in the kernel in
> this way, and it isn't a problem.
> ---
> drivers/acpi/acpi_processor.c | 14 +++++---------
> include/linux/acpi.h | 2 --
> 2 files changed, 5 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
> index c8e960ff0aca..26e3efb74614 100644
> --- a/drivers/acpi/acpi_processor.c
> +++ b/drivers/acpi/acpi_processor.c
> @@ -183,13 +183,15 @@ static void __init acpi_pcc_cpufreq_init(void) {}
> #endif /* CONFIG_X86 */
>
> /* Initialization */
> -#ifdef CONFIG_ACPI_HOTPLUG_PRESENT_CPU
> -static int acpi_processor_hotadd_init(struct acpi_processor *pr)
> +static int acpi_processor_make_present(struct acpi_processor *pr)
> {
> unsigned long long sta;
> acpi_status status;
> int ret;
>
> + if (!IS_ENABLED(CONFIG_ACPI_HOTPLUG_PRESENT_CPU))
> + return -ENODEV;
> +
> if (invalid_phys_cpuid(pr->phys_id))
> return -ENODEV;
>
> @@ -223,12 +225,6 @@ static int acpi_processor_hotadd_init(struct acpi_processor *pr)
> cpu_maps_update_done();
> return ret;
> }
> -#else
> -static inline int acpi_processor_hotadd_init(struct acpi_processor *pr)
> -{
> - return -ENODEV;
> -}
> -#endif /* CONFIG_ACPI_HOTPLUG_PRESENT_CPU */
>
> static int acpi_processor_get_info(struct acpi_device *device)
> {
> @@ -335,7 +331,7 @@ static int acpi_processor_get_info(struct acpi_device *device)
> * because cpuid <-> apicid mapping is persistent now.
> */
> if (invalid_logical_cpuid(pr->id) || !cpu_present(pr->id)) {
> - int ret = acpi_processor_hotadd_init(pr);
> + int ret = acpi_processor_make_present(pr);
>
> if (ret)
> return ret;
> diff --git a/include/linux/acpi.h b/include/linux/acpi.h
> index 36071bc11acd..19d009ca9e7a 100644
> --- a/include/linux/acpi.h
> +++ b/include/linux/acpi.h
> @@ -301,12 +301,10 @@ static inline int acpi_processor_evaluate_cst(acpi_handle handle, u32 cpu,
> }
> #endif
>
> -#ifdef CONFIG_ACPI_HOTPLUG_PRESENT_CPU
> /* Arch dependent functions for cpu hotplug support */
> int acpi_map_cpu(acpi_handle handle, phys_cpuid_t physid, u32 acpi_id,
> int *pcpu);
> int acpi_unmap_cpu(int cpu);
> -#endif /* CONFIG_ACPI_HOTPLUG_PRESENT_CPU */
>
> #ifdef CONFIG_ACPI_HOTPLUG_IOAPIC
> int acpi_get_ioapic_id(acpi_handle handle, u32 gsi_base, u64 *phys_addr);
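
The IS_ENABLED() pattern used in this patch can be tried outside the kernel with a small sketch. Everything named below is an assumption made for illustration: SKETCH_IS_ENABLED() is a simplified stand-in for the kernel macro (it only handles options defined to 1), and CONFIG_DEMO_ON / CONFIG_DEMO_OFF are made-up options. The point is that both function bodies are always compiled, which is why the prototypes they use have to be visible without an #ifdef, while the constant condition lets the compiler drop the unused path.

```c
#include <assert.h>
#include <errno.h>

#define CONFIG_DEMO_ON 1
/* CONFIG_DEMO_OFF is deliberately left undefined. */

/* Simplified stand-in for the kernel's IS_ENABLED(); hypothetical names. */
#define ARG_PLACEHOLDER_1 0,
#define TAKE_SECOND_ARG(ignored, val, ...) val
#define IS_DEFINED_3(arg1_or_junk) TAKE_SECOND_ARG(arg1_or_junk 1, 0, 0)
#define IS_DEFINED_2(val) IS_DEFINED_3(ARG_PLACEHOLDER_##val)
#define SKETCH_IS_ENABLED(option) IS_DEFINED_2(option)

/* Option is set: the early return is compiled out, the body runs. */
static int make_present_on(void)
{
	if (!SKETCH_IS_ENABLED(CONFIG_DEMO_ON))
		return -ENODEV;

	return 0;
}

/* Option is not set: the body is still parsed and type-checked,
 * but the function always returns -ENODEV. */
static int make_present_off(void)
{
	if (!SKETCH_IS_ENABLED(CONFIG_DEMO_OFF))
		return -ENODEV;

	return 0;
}

static void is_enabled_demo(void)
{
	assert(make_present_on() == 0);
	assert(make_present_off() == -ENODEV);
}
```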

2023-12-14 17:47:24

by Rafael J. Wysocki

Subject: Re: [PATCH RFC v3 01/21] ACPI: Only enumerate enabled (or functional) devices

On Thu, Dec 14, 2023 at 6:32 PM Jonathan Cameron
<[email protected]> wrote:
>
> On Wed, 13 Dec 2023 12:49:16 +0000
> Russell King (Oracle) <[email protected]> wrote:
>
> > From: James Morse <[email protected]>
> >
> > Today the ACPI enumeration code 'visits' all devices that are present.
> >
> > This is a problem for arm64, where CPUs are always present, but not
> > always enabled. When a device-check occurs because the firmware-policy
> > has changed and a CPU is now enabled, the following error occurs:
> > | acpi ACPI0007:48: Enumeration failure
> >
> > This is ultimately because acpi_dev_ready_for_enumeration() returns
> > true for a device that is not enabled. The ACPI Processor driver
> > will not register such CPUs as they are not 'decoding their resources'.
> >
> > Change acpi_dev_ready_for_enumeration() to also check the enabled bit.
> > ACPI allows a device to be functional instead of maintaining the
> > present and enabled bit. Make this behaviour an explicit check with
> > a reference to the spec, and then check the present and enabled bits.
> > This is needed to avoid enumerating present && functional devices that
> > are not enabled.
> >
> > Signed-off-by: James Morse <[email protected]>
> > Tested-by: Miguel Luis <[email protected]>
> > Tested-by: Vishnu Pajjuri <[email protected]>
> > Tested-by: Jianyong Wu <[email protected]>
> > Signed-off-by: Russell King (Oracle) <[email protected]>
> > ---
> > If this change causes problems on deployed hardware, I suggest an
> > arch opt-in: ACPI_IGNORE_STA_ENABLED, that causes
> > acpi_dev_ready_for_enumeration() to only check the present bit.
>
> My gut feeling (having made ACPI 'fixes' in the past that ran into
> horribly broken firmware and had to be reverted) is to reduce the blast
> radius preemptively from the start. I'd love to live in a world where
> that wasn't necessary, but I don't trust all the generators of ACPI tables.
> I'll leave it to Rafael and other ACPI experts to suggest how narrow we
> should make it though - arch opt-in might be narrow enough.

A chicken bit wouldn't help much IMO, especially in the cases when
working setups get broken.

I would very much prefer to limit the scope of it, say to processors
only, in the first place.

2023-12-14 17:55:41

by Russell King (Oracle)

Subject: Re: [PATCH RFC v3 01/21] ACPI: Only enumerate enabled (or functional) devices

On Thu, Dec 14, 2023 at 05:32:41PM +0000, Jonathan Cameron wrote:
> On Wed, 13 Dec 2023 12:49:16 +0000
> Russell King (Oracle) <[email protected]> wrote:
>
> > From: James Morse <[email protected]>
> >
> > Today the ACPI enumeration code 'visits' all devices that are present.
> >
> > This is a problem for arm64, where CPUs are always present, but not
> > always enabled. When a device-check occurs because the firmware-policy
> > has changed and a CPU is now enabled, the following error occurs:
> > | acpi ACPI0007:48: Enumeration failure
> >
> > This is ultimately because acpi_dev_ready_for_enumeration() returns
> > true for a device that is not enabled. The ACPI Processor driver
> > will not register such CPUs as they are not 'decoding their resources'.
> >
> > Change acpi_dev_ready_for_enumeration() to also check the enabled bit.
> > ACPI allows a device to be functional instead of maintaining the
> > present and enabled bit. Make this behaviour an explicit check with
> > a reference to the spec, and then check the present and enabled bits.
> > This is needed to avoid enumerating present && functional devices that
> > are not enabled.
> >
> > Signed-off-by: James Morse <[email protected]>
> > Tested-by: Miguel Luis <[email protected]>
> > Tested-by: Vishnu Pajjuri <[email protected]>
> > Tested-by: Jianyong Wu <[email protected]>
> > Signed-off-by: Russell King (Oracle) <[email protected]>
> > ---
> > If this change causes problems on deployed hardware, I suggest an
> > arch opt-in: ACPI_IGNORE_STA_ENABLED, that causes
> > acpi_dev_ready_for_enumeration() to only check the present bit.
>
> My gut feeling (having made ACPI 'fixes' in the past that ran into
> horribly broken firmware and had to be reverted) is to reduce the blast
> radius preemptively from the start. I'd love to live in a world where
> that wasn't necessary, but I don't trust all the generators of ACPI tables.
> I'll leave it to Rafael and other ACPI experts to suggest how narrow we
> should make it though - arch opt-in might be narrow enough.

Yes, I think an arch opt-in would be the most sensible way forward, if
Rafael concurs with that idea. I notice that what I wrote there was
actually an opt-out. I'll fix that.

> > + /*
> > + * ACPI 6.5's 6.3.7 "_STA (Device Status)" allows firmware to return
> > + * (!present && functional) for certain types of devices that should be
> > + * enumerated. Note that the enabled bit can't be sert until the present
>
> set until

Thanks for spotting that, fixed.

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

2023-12-14 17:58:39

by Russell King (Oracle)

Subject: Re: [PATCH RFC v3 02/21] ACPI: processor: Add support for processors described as container packages

On Thu, Dec 14, 2023 at 05:36:26PM +0000, Jonathan Cameron wrote:
> On Wed, 13 Dec 2023 12:49:21 +0000
> Russell King (Oracle) <[email protected]> wrote:
>
> > From: James Morse <[email protected]>
> >
> > ACPI has two ways of describing processors in the DSDT. From ACPI v6.5,
> > 5.2.12:
> >
> > "Starting with ACPI Specification 6.3, the use of the Processor() object
> > was deprecated. Only legacy systems should continue with this usage. On
> > the Itanium architecture only, a _UID is provided for the Processor()
> > that is a string object. This usage of _UID is also deprecated since it
> > can preclude an OSPM from being able to match a processor to a
> > non-enumerable device, such as those defined in the MADT. From ACPI
> > Specification 6.3 onward, all processor objects for all architectures
> > except Itanium must now use Device() objects with an _HID of ACPI0007,
> > and use only integer _UID values."
> >
> > Also see https://uefi.org/specs/ACPI/6.5/08_Processor_Configuration_and_Control.html#declaring-processors
> >
> > Duplicate descriptions are not allowed, the ACPI processor driver already
> > parses the UID from both devices and containers. acpi_processor_get_info()
> > returns an error if the UID exists twice in the DSDT.
> >
> > The missing probe for CPUs described as packages creates a problem for
> > moving the cpu_register() calls into the acpi_processor driver, as CPUs
> > described like this don't get registered, leading to errors from other
> > subsystems when they try to add new sysfs entries to the CPU node.
> > (e.g. topology_sysfs_init()'s use of topology_add_dev() via cpuhp)
> >
> > To fix this, parse the processor container and call acpi_processor_add()
> > for each processor that is discovered like this. The processor container
> > handler is added with acpi_scan_add_handler(), so no detach call will
> > arrive.
> >
> > Qemu TCG describes CPUs using processor devices in a processor container.
> > For more information, see build_cpus_aml() in Qemu hw/acpi/cpu.c and
> > https://uefi.org/specs/ACPI/6.5/08_Processor_Configuration_and_Control.html#processor-container-device
> >
> > Signed-off-by: James Morse <[email protected]>
> > Tested-by: Miguel Luis <[email protected]>
> > Tested-by: Vishnu Pajjuri <[email protected]>
> > Tested-by: Jianyong Wu <[email protected]>
> > ---
> > Outstanding comments:
> > https://lore.kernel.org/r/[email protected]
> Looks like you resolved those (were all patch description things).
>
> So I'm happy.
> Reviewed-by: Jonathan Cameron <[email protected]>

Great, I wasn't sure if I had resolved them to your satisfaction, so I
kept the reference to your original review. I've now removed it and
added your r-b. Thanks.


2023-12-14 18:00:55

by Russell King (Oracle)

Subject: Re: [PATCH RFC v3 05/21] ACPI: Rename ACPI_HOTPLUG_CPU to include 'present'

On Thu, Dec 14, 2023 at 05:41:07PM +0000, Jonathan Cameron wrote:
> > Signed-off-by: James Morse <[email protected]>
> > Tested-by: Miguel Luis <[email protected]>
> > Tested-by: Vishnu Pajjuri <[email protected]>
> > Tested-by: Jianyong Wu <[email protected]>
> > Signed-off-by: Russell King (Oracle) <[email protected]>
> Formatting nitpick inline. Either way FWIW:
> Reviewed-by: Jonathan Cameron <[email protected]>

Thanks, but you're absolutely correct about the nitpick, so I've fixed
that too!


2023-12-14 18:04:22

by Russell King (Oracle)

Subject: Re: [PATCH RFC v3 07/21] ACPI: Rename acpi_processor_hotadd_init and remove pre-processor guards

On Thu, Dec 14, 2023 at 05:43:37PM +0000, Jonathan Cameron wrote:
> On Wed, 13 Dec 2023 12:49:47 +0000
> Russell King (Oracle) <[email protected]> wrote:
>
> > From: James Morse <[email protected]>
> >
> > acpi_processor_hotadd_init() will make a CPU present by mapping it
> > based on its hardware id.
> >
> > 'hotadd_init' is ambiguous once there are two different behaviours
> > for cpu hotplug. This is for toggling the _STA present bit. Subsequent
> > patches will add support for toggling the _STA enabled bit, named
> > acpi_processor_make_enabled().
> >
> > Rename it acpi_processor_make_present() to make it clear this is
> > for CPUs that were not previously present.
> >
> > Expose the function prototypes it uses to allow the preprocessor
> > guards to be removed. The IS_ENABLED() check will let the compiler
> > dead-code elimination pass remove this if it isn't going to be
> > used.
> >
> > Signed-off-by: James Morse <[email protected]>
> > Tested-by: Miguel Luis <[email protected]>
> > Tested-by: Vishnu Pajjuri <[email protected]>
> > Tested-by: Jianyong Wu <[email protected]>
> > ---
> > Outstanding comments:
> > https://lore.kernel.org/r/[email protected]
>
> If it's not caused a build warning yet, chances are high this is fine.
>
> Reviewed-by: Jonathan Cameron <[email protected]>
>
> > https://lore.kernel.org/r/[email protected]
> > For this comment, we use IS_ENABLED() in multiple places in the kernel in
> > this way, and it isn't a problem.

Yes, for both of these comments, I think they aren't something that
needs any action - these patches have been published in my tree since
October, and that is subject to the kernel build bot which hasn't found
any issues.

So, I'll add your r-b, add my s-o-b, and remove the "outstanding
comments" from this patch.

Thanks.


2023-12-14 18:11:13

by Russell King (Oracle)

Subject: Re: [PATCH RFC v3 01/21] ACPI: Only enumerate enabled (or functional) devices

On Thu, Dec 14, 2023 at 06:47:00PM +0100, Rafael J. Wysocki wrote:
> On Thu, Dec 14, 2023 at 6:32 PM Jonathan Cameron
> <[email protected]> wrote:
> >
> > On Wed, 13 Dec 2023 12:49:16 +0000
> > Russell King (Oracle) <[email protected]> wrote:
> >
> > > From: James Morse <[email protected]>
> > >
> > > Today the ACPI enumeration code 'visits' all devices that are present.
> > >
> > > This is a problem for arm64, where CPUs are always present, but not
> > > always enabled. When a device-check occurs because the firmware-policy
> > > has changed and a CPU is now enabled, the following error occurs:
> > > | acpi ACPI0007:48: Enumeration failure
> > >
> > > This is ultimately because acpi_dev_ready_for_enumeration() returns
> > > true for a device that is not enabled. The ACPI Processor driver
> > > will not register such CPUs as they are not 'decoding their resources'.
> > >
> > > Change acpi_dev_ready_for_enumeration() to also check the enabled bit.
> > > ACPI allows a device to be functional instead of maintaining the
> > > present and enabled bit. Make this behaviour an explicit check with
> > > a reference to the spec, and then check the present and enabled bits.
> > > This is needed to avoid enumerating present && functional devices that
> > > are not enabled.
> > >
> > > Signed-off-by: James Morse <[email protected]>
> > > Tested-by: Miguel Luis <[email protected]>
> > > Tested-by: Vishnu Pajjuri <[email protected]>
> > > Tested-by: Jianyong Wu <[email protected]>
> > > Signed-off-by: Russell King (Oracle) <[email protected]>
> > > ---
> > > If this change causes problems on deployed hardware, I suggest an
> > > arch opt-in: ACPI_IGNORE_STA_ENABLED, that causes
> > > acpi_dev_ready_for_enumeration() to only check the present bit.
> >
> > My gut feeling (having made ACPI 'fixes' in the past that ran into
> > horribly broken firmware and had to be reverted) is to reduce the blast
> > radius preemptively from the start. I'd love to live in a world where
> > that wasn't necessary, but I don't trust all the generators of ACPI tables.
> > I'll leave it to Rafael and other ACPI experts to suggest how narrow we
> > should make it though - arch opt-in might be narrow enough.
>
> A chicken bit wouldn't help much IMO, especially in the cases when
> working setups get broken.
>
> I would very much prefer to limit the scope of it, say to processors
> only, in the first place.

Thanks for the feedback and the idea.

I guess we need something like:

	if (device->status.present)
		return device->device_type != ACPI_BUS_TYPE_PROCESSOR ||
		       device->status.enabled;
	else
		return device->status.functional;

so we only check device->status.enabled for processor-type devices?


2023-12-14 18:17:15

by Rafael J. Wysocki

Subject: Re: [PATCH RFC v3 01/21] ACPI: Only enumerate enabled (or functional) devices

On Thu, Dec 14, 2023 at 7:10 PM Russell King (Oracle)
<[email protected]> wrote:
>
> On Thu, Dec 14, 2023 at 06:47:00PM +0100, Rafael J. Wysocki wrote:
> > On Thu, Dec 14, 2023 at 6:32 PM Jonathan Cameron
> > <[email protected]> wrote:
> > >
> > > On Wed, 13 Dec 2023 12:49:16 +0000
> > > Russell King (Oracle) <[email protected]> wrote:
> > >
> > > > From: James Morse <[email protected]>
> > > >
> > > > Today the ACPI enumeration code 'visits' all devices that are present.
> > > >
> > > > This is a problem for arm64, where CPUs are always present, but not
> > > > always enabled. When a device-check occurs because the firmware-policy
> > > > has changed and a CPU is now enabled, the following error occurs:
> > > > | acpi ACPI0007:48: Enumeration failure
> > > >
> > > > This is ultimately because acpi_dev_ready_for_enumeration() returns
> > > > true for a device that is not enabled. The ACPI Processor driver
> > > > will not register such CPUs as they are not 'decoding their resources'.
> > > >
> > > > Change acpi_dev_ready_for_enumeration() to also check the enabled bit.
> > > > ACPI allows a device to be functional instead of maintaining the
> > > > present and enabled bit. Make this behaviour an explicit check with
> > > > a reference to the spec, and then check the present and enabled bits.
> > > > This is needed to avoid enumerating present && functional devices that
> > > > are not enabled.
> > > >
> > > > Signed-off-by: James Morse <[email protected]>
> > > > Tested-by: Miguel Luis <[email protected]>
> > > > Tested-by: Vishnu Pajjuri <[email protected]>
> > > > Tested-by: Jianyong Wu <[email protected]>
> > > > Signed-off-by: Russell King (Oracle) <[email protected]>
> > > > ---
> > > > If this change causes problems on deployed hardware, I suggest an
> > > > arch opt-in: ACPI_IGNORE_STA_ENABLED, that causes
> > > > acpi_dev_ready_for_enumeration() to only check the present bit.
> > >
> > > My gut feeling (having made ACPI 'fixes' in the past that ran into
> > > horribly broken firmware and had to be reverted) is to reduce the blast
> > > radius preemptively from the start. I'd love to live in a world where
> > > that wasn't necessary, but I don't trust all the generators of ACPI tables.
> > > I'll leave it to Rafael and other ACPI experts to suggest how narrow we
> > > should make it though - arch opt-in might be narrow enough.
> >
> > A chicken bit wouldn't help much IMO, especially in the cases when
> > working setups get broken.
> >
> > I would very much prefer to limit the scope of it, say to processors
> > only, in the first place.
>
> Thanks for the feedback and the idea.
>
> I guess we need something like:
>
> if (device->status.present)
> return device->device_type != ACPI_BUS_TYPE_PROCESSOR ||
> device->status.enabled;
> else
> return device->status.functional;
>
> so we only check device->status.enabled for processor-type devices?

Yes, something like this.

2023-12-14 18:37:31

by Rafael J. Wysocki

Subject: Re: [PATCH RFC v3 01/21] ACPI: Only enumerate enabled (or functional) devices

On Thu, Dec 14, 2023 at 7:16 PM Rafael J. Wysocki <[email protected]> wrote:
>
> On Thu, Dec 14, 2023 at 7:10 PM Russell King (Oracle)
> <[email protected]> wrote:
> >
> > On Thu, Dec 14, 2023 at 06:47:00PM +0100, Rafael J. Wysocki wrote:
> > > On Thu, Dec 14, 2023 at 6:32 PM Jonathan Cameron
> > > <[email protected]> wrote:
> > > >
> > > > On Wed, 13 Dec 2023 12:49:16 +0000
> > > > Russell King (Oracle) <[email protected]> wrote:
> > > >
> > > > > From: James Morse <[email protected]>
> > > > >
> > > > > Today the ACPI enumeration code 'visits' all devices that are present.
> > > > >
> > > > > This is a problem for arm64, where CPUs are always present, but not
> > > > > always enabled. When a device-check occurs because the firmware-policy
> > > > > has changed and a CPU is now enabled, the following error occurs:
> > > > > | acpi ACPI0007:48: Enumeration failure
> > > > >
> > > > > This is ultimately because acpi_dev_ready_for_enumeration() returns
> > > > > true for a device that is not enabled. The ACPI Processor driver
> > > > > will not register such CPUs as they are not 'decoding their resources'.
> > > > >
> > > > > Change acpi_dev_ready_for_enumeration() to also check the enabled bit.
> > > > > ACPI allows a device to be functional instead of maintaining the
> > > > > present and enabled bit. Make this behaviour an explicit check with
> > > > > a reference to the spec, and then check the present and enabled bits.
> > > > > This is needed to avoid enumerating present && functional devices that
> > > > > are not enabled.
> > > > >
> > > > > Signed-off-by: James Morse <[email protected]>
> > > > > Tested-by: Miguel Luis <[email protected]>
> > > > > Tested-by: Vishnu Pajjuri <[email protected]>
> > > > > Tested-by: Jianyong Wu <[email protected]>
> > > > > Signed-off-by: Russell King (Oracle) <[email protected]>
> > > > > ---
> > > > > If this change causes problems on deployed hardware, I suggest an
> > > > > arch opt-in: ACPI_IGNORE_STA_ENABLED, that causes
> > > > > acpi_dev_ready_for_enumeration() to only check the present bit.
> > > >
> > > > My gut feeling (having made ACPI 'fixes' in the past that ran into
> > > > horribly broken firmware and had to be reverted) is to reduce the blast
> > > > radius preemptively from the start. I'd love to live in a world where
> > > > that wasn't necessary, but I don't trust all the generators of ACPI tables.
> > > > I'll leave it to Rafael and other ACPI experts to suggest how narrow we
> > > > should make it though - arch opt-in might be narrow enough.
> > >
> > > A chicken bit wouldn't help much IMO, especially in the cases when
> > > working setups get broken.
> > >
> > > I would very much prefer to limit the scope of it, say to processors
> > > only, in the first place.
> >
> > Thanks for the feedback and the idea.
> >
> > I guess we need something like:
> >
> > if (device->status.present)
> > return device->device_type != ACPI_BUS_TYPE_PROCESSOR ||
> > device->status.enabled;
> > else
> > return device->status.functional;
> >
> > so we only check device->status.enabled for processor-type devices?
>
> Yes, something like this.

However, that is not sufficient, because there are
ACPI_BUS_TYPE_DEVICE devices representing processors.

I'm not sure about a clean way to do it ATM.
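
The gap can be illustrated with a small user-space sketch. It is only a model under stated assumptions: enum demo_bus_type, struct demo_dev and ready_type_only() are hypothetical stand-ins for the kernel's device types, struct acpi_device and the proposed check, and the hid field exists purely for labelling.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for ACPI_BUS_TYPE_DEVICE / ACPI_BUS_TYPE_PROCESSOR. */
enum demo_bus_type {
	DEMO_BUS_TYPE_DEVICE,
	DEMO_BUS_TYPE_PROCESSOR,
};

struct demo_dev {
	enum demo_bus_type device_type;
	const char *hid;	/* label only; not consulted by the check */
	bool present;
	bool enabled;
	bool functional;
};

/* The check proposed earlier in the thread: only the device type is tested. */
static bool ready_type_only(const struct demo_dev *d)
{
	if (d->present)
		return d->device_type != DEMO_BUS_TYPE_PROCESSOR ||
		       d->enabled;
	return d->functional;
}

static void type_only_demo(void)
{
	/* Legacy Processor() object: present but not enabled. */
	struct demo_dev legacy = {
		DEMO_BUS_TYPE_PROCESSOR, "Processor()", true, false, true,
	};
	/* Device() object with _HID ACPI0007: present but not enabled. */
	struct demo_dev modern = {
		DEMO_BUS_TYPE_DEVICE, "ACPI0007", true, false, true,
	};

	assert(!ready_type_only(&legacy));	/* skipped, as intended */
	assert(ready_type_only(&modern));	/* still enumerated: the gap */
}
```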

2023-12-14 19:44:11

by Jonathan Cameron

Subject: Re: [PATCH RFC v3 05/21] ACPI: Rename ACPI_HOTPLUG_CPU to include 'present'

On Wed, 13 Dec 2023 12:49:37 +0000
Russell King (Oracle) <[email protected]> wrote:

> From: James Morse <[email protected]>
>
> The code behind ACPI_HOTPLUG_CPU allows a not-present CPU to become
> present. This isn't the only use of HOTPLUG_CPU. On arm64 and riscv
> CPUs can be taken offline as a power saving measure.
>
> On arm64 an offline CPU may be disabled by firmware, preventing it from
> being brought back online, but it remains present throughout.
>
> Adding code to prevent user-space trying to online these disabled CPUs
> needs some additional terminology.
>
> Rename the Kconfig symbol to CONFIG_ACPI_HOTPLUG_PRESENT_CPU to reflect
> that it makes possible CPUs present.
>
> HOTPLUG_CPU is untouched as this is only about the ACPI mechanism.
>
> Signed-off-by: James Morse <[email protected]>
> Tested-by: Miguel Luis <[email protected]>
> Tested-by: Vishnu Pajjuri <[email protected]>
> Tested-by: Jianyong Wu <[email protected]>
> Signed-off-by: Russell King (Oracle) <[email protected]>
Formatting nitpick inline. Either way FWIW:
Reviewed-by: Jonathan Cameron <[email protected]>

> diff --git a/include/linux/acpi.h b/include/linux/acpi.h
> index 4db54e928b36..36071bc11acd 100644
> --- a/include/linux/acpi.h
> +++ b/include/linux/acpi.h

> #ifdef CONFIG_ACPI_HOTPLUG_IOAPIC
> int acpi_get_ioapic_id(acpi_handle handle, u32 gsi_base, u64 *phys_addr);
> @@ -629,7 +629,7 @@ static inline u32 acpi_osc_ctx_get_cxl_control(struct acpi_osc_context *context)
> #define ACPI_GSB_ACCESS_ATTRIB_RAW_PROCESS 0x0000000F
>
> /* Enable _OST when all relevant hotplug operations are enabled */
> -#if defined(CONFIG_ACPI_HOTPLUG_CPU) && \
> +#if defined(CONFIG_ACPI_HOTPLUG_PRESENT_CPU) && \

Trivial, but I think there is a tab too many before that \

> defined(CONFIG_ACPI_HOTPLUG_MEMORY) && \
> defined(CONFIG_ACPI_CONTAINER)
> #define ACPI_HOTPLUG_OST

2023-12-15 15:32:31

by Russell King (Oracle)

Subject: Re: [PATCH RFC v3 01/21] ACPI: Only enumerate enabled (or functional) devices

On Thu, Dec 14, 2023 at 07:37:10PM +0100, Rafael J. Wysocki wrote:
> On Thu, Dec 14, 2023 at 7:16 PM Rafael J. Wysocki <[email protected]> wrote:
> >
> > On Thu, Dec 14, 2023 at 7:10 PM Russell King (Oracle)
> > <[email protected]> wrote:
> > > I guess we need something like:
> > >
> > > if (device->status.present)
> > > return device->device_type != ACPI_BUS_TYPE_PROCESSOR ||
> > > device->status.enabled;
> > > else
> > > return device->status.functional;
> > >
> > > so we only check device->status.enabled for processor-type devices?
> >
> > Yes, something like this.
>
> However, that is not sufficient, because there are
> ACPI_BUS_TYPE_DEVICE devices representing processors.
>
> I'm not sure about a clean way to do it ATM.

Ok, how about:

static bool acpi_dev_is_processor(const struct acpi_device *device)
{
struct acpi_hardware_id *hwid;

if (device->device_type == ACPI_BUS_TYPE_PROCESSOR)
return true;

if (device->device_type != ACPI_BUS_TYPE_DEVICE)
return false;

list_for_each_entry(hwid, &device->pnp.ids, list)
if (!strcmp(ACPI_PROCESSOR_OBJECT_HID, hwid->id) ||
!strcmp(ACPI_PROCESSOR_DEVICE_HID, hwid->id))
return true;

return false;
}

and then:

if (device->status.present)
return !acpi_dev_is_processor(device) || device->status.enabled;
else
return device->status.functional;

?

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

2023-12-15 16:12:52

by Jonathan Cameron

Subject: Re: [PATCH RFC v3 09/21] ACPI: convert acpi_processor_post_eject() to use IS_ENABLED()

On Wed, 13 Dec 2023 12:49:57 +0000
"Russell King (Oracle)" <[email protected]> wrote:

> Rather than ifdef'ing acpi_processor_post_eject() and its use site, use
> IS_ENABLED() to increase compile coverage.
>
> Signed-off-by: Russell King (Oracle) <[email protected]>

Reviewed-by: Jonathan Cameron <[email protected]>

2023-12-15 16:17:08

by Jonathan Cameron

Subject: Re: [PATCH RFC v3 01/21] ACPI: Only enumerate enabled (or functional) devices

On Fri, 15 Dec 2023 15:31:55 +0000
"Russell King (Oracle)" <[email protected]> wrote:

> On Thu, Dec 14, 2023 at 07:37:10PM +0100, Rafael J. Wysocki wrote:
> > On Thu, Dec 14, 2023 at 7:16 PM Rafael J. Wysocki <[email protected]> wrote:
> > >
> > > On Thu, Dec 14, 2023 at 7:10 PM Russell King (Oracle)
> > > <[email protected]> wrote:
> > > > I guess we need something like:
> > > >
> > > > if (device->status.present)
> > > > return device->device_type != ACPI_BUS_TYPE_PROCESSOR ||
> > > > device->status.enabled;
> > > > else
> > > > return device->status.functional;
> > > >
> > > > so we only check device->status.enabled for processor-type devices?
> > >
> > > Yes, something like this.
> >
> > However, that is not sufficient, because there are
> > ACPI_BUS_TYPE_DEVICE devices representing processors.
> >
> > I'm not sure about a clean way to do it ATM.
>
> Ok, how about:
>
> static bool acpi_dev_is_processor(const struct acpi_device *device)
> {
> struct acpi_hardware_id *hwid;
>
> if (device->device_type == ACPI_BUS_TYPE_PROCESSOR)
> return true;
>
> if (device->device_type != ACPI_BUS_TYPE_DEVICE)
> return false;
>
> list_for_each_entry(hwid, &device->pnp.ids, list)
> if (!strcmp(ACPI_PROCESSOR_OBJECT_HID, hwid->id) ||
> !strcmp(ACPI_PROCESSOR_DEVICE_HID, hwid->id))
> return true;
>
> return false;
> }
>
> and then:
>
> if (device->status.present)
> return !acpi_dev_is_processor(device) || device->status.enabled;
> else
> return device->status.functional;
>
> ?
>
Changing it to CPU only for now makes sense to me and I think this code snippet should do the
job. Nice and simple.
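
For anyone wanting to poke at the truth table outside the kernel, here is a minimal stand-alone sketch of the proposed check. It is only a model: `struct model_device`, the `model_*` names and the single `hid` field are hypothetical stand-ins for struct acpi_device and its pnp.ids list, and the two HID strings are assumed to be the values behind ACPI_PROCESSOR_OBJECT_HID and ACPI_PROCESSOR_DEVICE_HID.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel types, for illustration only. */
enum model_bus_type { MODEL_TYPE_DEVICE, MODEL_TYPE_PROCESSOR, MODEL_TYPE_OTHER };

struct model_device {
	enum model_bus_type type;
	const char *hid;	/* one hardware ID, or NULL (the kernel walks a list) */
	bool present;		/* _STA present bit */
	bool enabled;		/* _STA enabled bit */
	bool functional;	/* _STA functional bit */
};

/* Assumed HID values for processor objects and processor devices. */
#define MODEL_PROCESSOR_OBJECT_HID "LNXCPU"
#define MODEL_PROCESSOR_DEVICE_HID "ACPI0007"

/* Processor-type objects, plus generic devices carrying a processor HID. */
static bool model_is_processor(const struct model_device *d)
{
	if (d->type == MODEL_TYPE_PROCESSOR)
		return true;
	if (d->type != MODEL_TYPE_DEVICE)
		return false;
	return d->hid && (!strcmp(d->hid, MODEL_PROCESSOR_OBJECT_HID) ||
			  !strcmp(d->hid, MODEL_PROCESSOR_DEVICE_HID));
}

/* Only processors are gated on the enabled bit; everything else as before. */
static bool model_should_enumerate(const struct model_device *d)
{
	if (d->present)
		return !model_is_processor(d) || d->enabled;
	return d->functional;
}
```

With this model, a present but disabled ACPI0007 device is skipped, while a present but disabled non-processor device is still enumerated, which is the "CPU only for now" behaviour being agreed on.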

2023-12-15 16:18:21

by Jonathan Cameron

Subject: Re: [PATCH RFC v3 10/21] ACPI: Check _STA present bit before making CPUs not present

On Wed, 13 Dec 2023 12:50:02 +0000
Russell King (Oracle) <[email protected]> wrote:

> From: James Morse <[email protected]>
>
> When called acpi_processor_post_eject() unconditionally make a CPU
> not-present and unregisters it.
>
> To add support for AML events where the CPU has become disabled, but
> remains present, the _STA method should be checked before calling
> acpi_processor_remove().
>
> Rename acpi_processor_post_eject() acpi_processor_remove_possible(), and
> check the _STA before calling.
>
> Adding the function prototype for arch_unregister_cpu() allows the
> preprocessor guards to be removed.
>
> After this change CPUs will remain registered and visible to
> user-space as offline if buggy firmware triggers an eject-request,
> but doesn't clear the corresponding _STA bits after _EJ0 has been
> called.
>
> Signed-off-by: James Morse <[email protected]>
> Tested-by: Miguel Luis <[email protected]>
> Tested-by: Vishnu Pajjuri <[email protected]>
> Tested-by: Jianyong Wu <[email protected]>
LGTM
Reviewed-by: Jonathan Cameron <[email protected]>

2023-12-15 16:23:42

by Jonathan Cameron

Subject: Re: [PATCH RFC v3 13/21] ACPICA: Add new MADT GICC flags fields

On Wed, 13 Dec 2023 12:50:18 +0000
Russell King (Oracle) <[email protected]> wrote:

> From: James Morse <[email protected]>
>
> Add the new flag field to the MADT's GICC structure.
>
> 'Online Capable' indicates a disabled CPU can be enabled later. See
> ACPI specification 6.5 Tabel 5.37: GICC CPU Interface Flags.
>
> Signed-off-by: James Morse <[email protected]>
> Tested-by: Miguel Luis <[email protected]>
> Tested-by: Vishnu Pajjuri <[email protected]>
> Tested-by: Jianyong Wu <[email protected]>
> Signed-off-by: Russell King (Oracle) <[email protected]>

I see there is an ACPICA pull request that includes this bit, but under a different name.
For reference:
https://github.com/acpica/acpica/pull/914/commits/453a5f67567786522021d5f6913f561f8b3cabf6

+CC Lorenzo who submitted that.

> ---
> This patch probably needs to go via the upstream acpica project,
> but is included here so the feature can be tested.
>
> If the ACPICA header files are updated before merging this patch set,
> this patch will need to be dropped.
>
> Changes since RFC v2:
> * Add ACPI specification reference.
> ---
> include/acpi/actbl2.h | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/include/acpi/actbl2.h b/include/acpi/actbl2.h
> index 3751ae69432f..c433a079d8e1 100644
> --- a/include/acpi/actbl2.h
> +++ b/include/acpi/actbl2.h
> @@ -1046,6 +1046,7 @@ struct acpi_madt_generic_interrupt {
> /* ACPI_MADT_ENABLED (1) Processor is usable if set */
> #define ACPI_MADT_PERFORMANCE_IRQ_MODE (1<<1) /* 01: Performance Interrupt Mode */
> #define ACPI_MADT_VGIC_IRQ_MODE (1<<2) /* 02: VGIC Maintenance Interrupt mode */
> +#define ACPI_MADT_GICC_CPU_CAPABLE (1<<3) /* 03: CPU is online capable */

ACPI_MADT_GICC_ONLINE_CAPABLE

>
> /* 12: Generic Distributor (ACPI 5.0 + ACPI 6.0 changes) */
>


2023-12-15 16:33:58

by Jonathan Cameron

Subject: Re: [PATCH RFC v3 14/21] irqchip/gic-v3: Don't return errors from gic_acpi_match_gicc()

On Wed, 13 Dec 2023 12:50:23 +0000
Russell King (Oracle) <[email protected]> wrote:

> From: James Morse <[email protected]>
>
> gic_acpi_match_gicc() is only called via gic_acpi_count_gicr_regions().
> It should only count the number of enabled redistributors, but it
> also tries to sanity check the GICC entry, currently returning an
> error if the Enabled bit is set, but the gicr_base_address is zero.
>
> Adding support for the online-capable bit to the sanity check
> complicates it, for no benefit. The existing check implicitly
> depends on gic_acpi_count_gicr_regions() previous failing to find
> any GICR regions (as it is valid to have gicr_base_address of zero if
> the redistributors are described via a GICR entry).
>
> Instead of complicating the check, remove it. Failures that happen
> at this point cause the irqchip not to register, meaning no irqs
> can be requested. The kernel grinds to a panic() pretty quickly.
>
> Without the check, MADT tables that exhibit this problem are still
> caught by gic_populate_rdist(), which helpfully also prints what
> went wrong:
> | CPU4: mpidr 100 has no re-distributor!
>
> Signed-off-by: James Morse <[email protected]>
> Reviewed-by: Gavin Shan <[email protected]>
> Tested-by: Miguel Luis <[email protected]>
> Tested-by: Vishnu Pajjuri <[email protected]>
> Tested-by: Jianyong Wu <[email protected]>
> Signed-off-by: Russell King (Oracle) <[email protected]>
> ---
> drivers/irqchip/irq-gic-v3.c | 18 ++++++------------
> 1 file changed, 6 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
> index 98b0329b7154..ebecd4546830 100644
> --- a/drivers/irqchip/irq-gic-v3.c
> +++ b/drivers/irqchip/irq-gic-v3.c
> @@ -2420,21 +2420,15 @@ static int __init gic_acpi_match_gicc(union acpi_subtable_headers *header,
>
> /*
> * If GICC is enabled and has valid gicr base address, then it means
> - * GICR base is presented via GICC
> + * GICR base is presented via GICC. The redistributor is only known to
> + * be accessible if the GICC is marked as enabled. If this bit is not
> + * set, we'd need to add the redistributor at runtime, which isn't
> + * supported.
> */
> - if (acpi_gicc_is_usable(gicc) && gicc->gicr_base_address) {
> + if (gicc->flags & ACPI_MADT_ENABLED && gicc->gicr_base_address)

I was very vague in my previous review. I think the reason you are switching
from acpi_gicc_is_usable(gicc) to gicc->flags & ACPI_MADT_ENABLED needs
calling out, as I'm fairly sure that at this point in the series at least,
acpi_gicc_is_usable() is the same as current upstream:

static inline bool acpi_gicc_is_usable(struct acpi_madt_generic_interrupt *gicc)
{
return gicc->flags & ACPI_MADT_ENABLED;
}

> acpi_data.enabled_rdists++;
> - return 0;
> - }
>
> - /*
> - * It's perfectly valid firmware can pass disabled GICC entry, driver
> - * should not treat as errors, skip the entry instead of probe fail.
> - */
> - if (!acpi_gicc_is_usable(gicc))
> - return 0;
> -
> - return -ENODEV;
> + return 0;
> }
>
> static int __init gic_acpi_count_gicr_regions(void)


2023-12-15 16:39:06

by Jonathan Cameron

Subject: Re: [PATCH RFC v3 15/21] irqchip/gic-v3: Add support for ACPI's disabled but 'online capable' CPUs

On Wed, 13 Dec 2023 12:50:28 +0000
Russell King (Oracle) <[email protected]> wrote:

> From: James Morse <[email protected]>
>
> To support virtual CPU hotplug, ACPI has added an 'online capable' bit
> to the MADT GICC entries. This indicates a disabled CPU entry may not
> be possible to online via PSCI until firmware has set enabled bit in
> _STA.
>
> What about the redistributor in the GICC entry? ACPI doesn't want to say.
> Assume the worst: When a redistributor is described in the GICC entry,
> but the entry is marked as disabled at boot, assume the redistributor
> is inaccessible.
>
> The GICv3 driver doesn't support late online of redistributors, so this
> means the corresponding CPU can't be brought online either. Clear the
> possible and present bits.
>
> Systems that want CPU hotplug in a VM can ensure their redistributors
> are always-on, and describe them that way with a GICR entry in the MADT.
>
> When mapping redistributors found via GICC entries, handle the case
> where the arch code believes the CPU is present and possible, but it
> does not have an accessible redistributor. Print a warning and clear
> the present and possible bits.
>
> Signed-off-by: James Morse <[email protected]>
> Tested-by: Miguel Luis <[email protected]>
> Tested-by: Vishnu Pajjuri <[email protected]>
> Tested-by: Jianyong Wu <[email protected]>
> Signed-off-by: Russell King (Oracle) <[email protected]>

Seems reasonable, but this contains the blob that makes the change I called
out in the previous patch relevant. With a forward reference added to that patch:

Reviewed-by: Jonathan Cameron <[email protected]>

> ----
> Disabled but online-capable CPUs cause this message to be printed
> if their redistributors are described via GICC:
> | GICv3: CPU 3's redistributor is inaccessible: this CPU can't be brought online
>
> If ACPI's _STA tries to make the cpu present later, this message is printed:
> | Changing CPU present bit is not supported
>
> Changes since RFC v2:
> * use gicc->flags & (ACPI_MADT_ENABLED | ACPI_MADT_GICC_CPU_CAPABLE)
> ---
> drivers/irqchip/irq-gic-v3.c | 14 ++++++++++++++
> include/linux/acpi.h | 2 +-
> 2 files changed, 15 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
> index ebecd4546830..6d0f98d3540e 100644
> --- a/drivers/irqchip/irq-gic-v3.c
> +++ b/drivers/irqchip/irq-gic-v3.c
> @@ -2370,11 +2370,25 @@ gic_acpi_parse_madt_gicc(union acpi_subtable_headers *header,
> (struct acpi_madt_generic_interrupt *)header;
> u32 reg = readl_relaxed(acpi_data.dist_base + GICD_PIDR2) & GIC_PIDR2_ARCH_MASK;
> u32 size = reg == GIC_PIDR2_ARCH_GICv4 ? SZ_64K * 4 : SZ_64K * 2;
> + int cpu = get_cpu_for_acpi_id(gicc->uid);
> void __iomem *redist_base;
>
> if (!acpi_gicc_is_usable(gicc))
> return 0;
>
> + /*
> + * Capable but disabled CPUs can be brought online later. What about
> + * the redistributor? ACPI doesn't want to say!
> + * Virtual hotplug systems can use the MADT's "always-on" GICR entries.
> + * Otherwise, prevent such CPUs from being brought online.
> + */
> + if (!(gicc->flags & ACPI_MADT_ENABLED)) {
> + pr_warn_once("CPU %u's redistributor is inaccessible: this CPU can't be brought online\n", cpu);
> + set_cpu_present(cpu, false);
> + set_cpu_possible(cpu, false);
> + return 0;
> + }
> +
> redist_base = ioremap(gicc->gicr_base_address, size);
> if (!redist_base)
> return -ENOMEM;
> diff --git a/include/linux/acpi.h b/include/linux/acpi.h
> index 19d009ca9e7a..00be66683505 100644
> --- a/include/linux/acpi.h
> +++ b/include/linux/acpi.h
> @@ -238,7 +238,7 @@ void acpi_table_print_madt_entry (struct acpi_subtable_header *madt);
>
> static inline bool acpi_gicc_is_usable(struct acpi_madt_generic_interrupt *gicc)
> {
> - return gicc->flags & ACPI_MADT_ENABLED;
> + return gicc->flags & (ACPI_MADT_ENABLED | ACPI_MADT_GICC_CPU_CAPABLE);

This is the change that would break the code path touched in the previous
patch. No problem with splitting that across patches, but maybe explain why
in the previous patch's description.

> }
>
> /* the following numa functions are architecture-dependent */
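
To make the interaction between the two patches concrete, here is a small stand-alone sketch. The bit positions are taken from the quoted actbl2.h hunk (enabled is bit 0, online capable is bit 3); the `model_*` names and the trimmed-down struct are hypothetical and only exist for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as in the quoted actbl2.h hunk; names are illustrative. */
#define MODEL_MADT_ENABLED		(1u << 0)
#define MODEL_MADT_ONLINE_CAPABLE	(1u << 3)

/* Hypothetical cut-down GICC entry with only the fields used here. */
struct model_gicc {
	uint32_t flags;
	uint64_t gicr_base_address;
};

/* acpi_gicc_is_usable() as it stands before patch 15. */
static bool model_usable_before(const struct model_gicc *g)
{
	return g->flags & MODEL_MADT_ENABLED;
}

/* acpi_gicc_is_usable() after patch 15 widens it. */
static bool model_usable_after(const struct model_gicc *g)
{
	return g->flags & (MODEL_MADT_ENABLED | MODEL_MADT_ONLINE_CAPABLE);
}

/* The explicit test patch 14 switches to when counting redistributors. */
static bool model_counts_as_enabled_rdist(const struct model_gicc *g)
{
	return (g->flags & MODEL_MADT_ENABLED) && g->gicr_base_address;
}
```

For a disabled but online-capable entry with a non-zero GICR base, the widened helper returns true while the explicit test still returns false, so the redistributor count is unaffected by the later change. That seems to be the reasoning worth spelling out in the earlier commit message.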


2023-12-15 16:40:49

by Jonathan Cameron

Subject: Re: [PATCH RFC v3 16/21] arm64: psci: Ignore DENIED CPUs

On Wed, 13 Dec 2023 12:50:33 +0000
Russell King (Oracle) <[email protected]> wrote:

> From: Jean-Philippe Brucker <[email protected]>
>
> When a CPU is marked as disabled, but online capable in the MADT, PSCI
> applies some firmware policy to control when it can be brought online.
> PSCI returns DENIED to a CPU_ON request if this is not currently
> permitted. The OS can learn the current policy from the _STA enabled bit.
>
> Handle the PSCI DENIED return code gracefully instead of printing an
> error.
>
> See https://developer.arm.com/documentation/den0022/f/?lang=en page 58.
>
> Signed-off-by: Jean-Philippe Brucker <[email protected]>
> [ morse: Rewrote commit message ]
> Signed-off-by: James Morse <[email protected]>
> Tested-by: Miguel Luis <[email protected]>
> Tested-by: Vishnu Pajjuri <[email protected]>
> Tested-by: Jianyong Wu <[email protected]>
> Signed-off-by: Russell King (Oracle) <[email protected]>
Reviewed-by: Jonathan Cameron <[email protected]>
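
For readers following along without the diff, a rough stand-alone sketch of the idea. It assumes, as I understand the existing PSCI code, that a DENIED return from CPU_ON is translated to -EPERM before the arm64 boot path sees it; the `model_*` names are hypothetical and this is not the patch itself. The PSCI return values are the ones from the specification.

```c
#include <errno.h>
#include <stdbool.h>

/* PSCI return codes used here, as defined by the PSCI specification. */
#define MODEL_PSCI_RET_SUCCESS		0
#define MODEL_PSCI_RET_NOT_SUPPORTED	(-1)
#define MODEL_PSCI_RET_INVALID_PARAMS	(-2)
#define MODEL_PSCI_RET_DENIED		(-3)

/* Assumed translation of PSCI return codes into errno values. */
static int model_psci_to_errno(int psci_ret)
{
	switch (psci_ret) {
	case MODEL_PSCI_RET_SUCCESS:
		return 0;
	case MODEL_PSCI_RET_NOT_SUPPORTED:
		return -EOPNOTSUPP;
	case MODEL_PSCI_RET_DENIED:
		return -EPERM;
	default:
		return -EINVAL;
	}
}

/*
 * DENIED is firmware policy rather than a failure, so it still fails the
 * boot attempt but does not warrant an error message.
 */
static bool model_should_log_boot_failure(int err)
{
	return err && err != -EPERM;
}
```

The caller still sees a non-zero error for a denied CPU, so the online attempt fails as before; only the log noise goes away.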

2023-12-15 16:50:33

by Jonathan Cameron

Subject: Re: [PATCH RFC v3 18/21] ACPI: processor: Only call arch_unregister_cpu() if HOTPLUG_CPU is selected

On Wed, 13 Dec 2023 12:50:43 +0000
Russell King (Oracle) <[email protected]> wrote:

> From: James Morse <[email protected]>
>
> The kbuild robot points out that configurations without HOTPLUG_CPU
> selected can try to build acpi_processor_post_eject() without success
> as arch_unregister_cpu() is not defined.
>
> Check this explicitly. This will be merged into:
> | ACPI: Add post_eject to struct acpi_scan_handler for cpu hotplug
> for any subsequent posting.
>
> Reported-by: kbuild test robot <[email protected]>
> Signed-off-by: James Morse <[email protected]>
> Tested-by: Miguel Luis <[email protected]>
> Tested-by: Vishnu Pajjuri <[email protected]>
> Tested-by: Jianyong Wu <[email protected]>
> ---
> This should probably be squashed into an earlier patch.

Agreed. If not
Reviewed-by: Jonathan Cameron <[email protected]>



2023-12-15 16:53:51

by Russell King (Oracle)

Subject: Re: [PATCH RFC v3 13/21] ACPICA: Add new MADT GICC flags fields

On Fri, Dec 15, 2023 at 04:23:22PM +0000, Jonathan Cameron wrote:
> On Wed, 13 Dec 2023 12:50:18 +0000
> Russell King (Oracle) <[email protected]> wrote:
>
> > From: James Morse <[email protected]>
> >
> > Add the new flag field to the MADT's GICC structure.
> >
> > 'Online Capable' indicates a disabled CPU can be enabled later. See
> > ACPI specification 6.5 Tabel 5.37: GICC CPU Interface Flags.
> >
> > Signed-off-by: James Morse <[email protected]>
> > Tested-by: Miguel Luis <[email protected]>
> > Tested-by: Vishnu Pajjuri <[email protected]>
> > Tested-by: Jianyong Wu <[email protected]>
> > Signed-off-by: Russell King (Oracle) <[email protected]>
>
> I see there is an acpica pull request including this bit but with a different name
> For reference.
> https://github.com/acpica/acpica/pull/914/commits/453a5f67567786522021d5f6913f561f8b3cabf6
>
> +CC Lorenzo who submitted that.

> > +#define ACPI_MADT_GICC_CPU_CAPABLE (1<<3) /* 03: CPU is online capable */
>
> ACPI_MADT_GICC_ONLINE_CAPABLE

It's somewhat disappointing, but no big deal. It's easy enough to change
"irqchip/gic-v3: Add support for ACPI's disabled but 'online capable' CPUs"
to use Lorenzo's name when that patch hits - and it becomes one less
patch in this patch set when Lorenzo's change eventually hits mainline.

Does anyone know how long it may take for Lorenzo's change to get into
mainline? Would it be by the 6.8 merge window or the following one?

Thanks.

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

2023-12-15 17:05:02

by Jonathan Cameron

Subject: Re: [PATCH RFC v3 19/21] arm64: document virtual CPU hotplug's expectations

On Wed, 13 Dec 2023 12:50:49 +0000
Russell King (Oracle) <[email protected]> wrote:

> From: James Morse <[email protected]>
>
> Add a description of physical and virtual CPU hotplug, explain the
> differences and elaborate on what is required in ACPI for a working
> virtual hotplug system.
>
> Signed-off-by: James Morse <[email protected]>
> ---
> Outstanding comment:
> https://lore.kernel.org/r/[email protected]

Hmm. This one is the comment that if we allow for a legacy, unaware guest, we
have no way of indicating that CPUs that were enabled at boot can ever be removed.

Effectively that means that unless the cloud is aware of the VM's capabilities
before it is booted (and can maybe use the proposed _OSC), there is no way of
knowing if a CPU can be removed. Sounds profitable :)

I'm fine with that, as long as people grasp the concern and we make sure that
the QEMU side doesn't change its legacy behavior (I think we are fine in Salil's
latest set).

Reviewed-by: Jonathan Cameron <[email protected]>


Jonathan


> ---
> Documentation/arch/arm64/cpu-hotplug.rst | 79 ++++++++++++++++++++++++
> Documentation/arch/arm64/index.rst | 1 +
> 2 files changed, 80 insertions(+)
> create mode 100644 Documentation/arch/arm64/cpu-hotplug.rst
>
> diff --git a/Documentation/arch/arm64/cpu-hotplug.rst b/Documentation/arch/arm64/cpu-hotplug.rst
> new file mode 100644
> index 000000000000..76ba8d932c72
> --- /dev/null
> +++ b/Documentation/arch/arm64/cpu-hotplug.rst
> @@ -0,0 +1,79 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +.. _cpuhp_index:
> +
> +====================
> +CPU Hotplug and ACPI
> +====================
> +
> +CPU hotplug in the arm64 world is commonly used to describe the kernel taking
> +CPUs online/offline using PSCI. This document is about ACPI firmware allowing
> +CPUs that were not available during boot to be added to the system later.
> +
> +``possible`` and ``present`` refer to the state of the CPU as seen by linux.
> +
> +
> +CPU Hotplug on physical systems - CPUs not present at boot
> +----------------------------------------------------------
> +
> +Physical systems need to mark a CPU that is ``possible`` but not ``present`` as
> +being ``present``. An example would be a dual socket machine, where the package
> +in one of the sockets can be replaced while the system is running.
> +
> +This is not supported.
> +
> +In the arm64 world CPUs are not a single device but a slice of the system.
> +There are no systems that support the physical addition (or removal) of CPUs
> +while the system is running, and ACPI is not able to sufficiently describe
> +them.
> +
> +e.g. New CPUs come with new caches, but the platform's cache toplogy is
> +described in a static table, the PPTT. How caches are shared between CPUs is
> +not discoverable, and must be described by firmware.
> +
> +e.g. The GIC redistributor for each CPU must be accessed by the driver during
> +boot to discover the system wide supported features. ACPI's MADT GICC
> +structures can describe a redistributor associated with a disabled CPU, but
> +can't describe whether the redistributor is accessible, only that it is not
> +'always on'.
> +
> +arm64's ACPI tables assume that everything described is ``present``.
> +
> +
> +CPU Hotplug on virtual systems - CPUs not enabled at boot
> +---------------------------------------------------------
> +
> +Virtual systems have the advantage that all the properties the system will
> +ever have can be described at boot. There are no power-domain considerations
> +as such devices are emulated.
> +
> +CPU Hotplug on virtual systems is supported. It is distinct from physical
> +CPU Hotplug as all resources are described as ``present``, but CPUs may be
> +marked as disabled by firmware. Only the CPU's online/offline behaviour is
> +influenced by firmware. An example is where a virtual machine boots with a
> +single CPU, and additional CPUs are added once a cloud orchestrator deploys
> +the workload.
> +
> +For a virtual machine, the VMM (e.g. Qemu) plays the part of firmware.
> +
> +Virtual hotplug is implemented as a firmware policy affecting which CPUs can be
> +brought online. Firmware can enforce its policy via PSCI's return codes. e.g.
> +``DENIED``.
> +
> +The ACPI tables must describe all the resources of the virtual machine. CPUs
> +that firmware wishes to disable either from boot (or later) should not be
> +``enabled`` in the MADT GICC structures, but should have the ``online capable``
> +bit set, to indicate they can be enabled later. The boot CPU must be marked as
> +``enabled``. The 'always on' GICR structure must be used to describe the
> +redistributors.
> +
> +CPUs described as ``online capable`` but not ``enabled`` can be set to enabled
> +by the DSDT's Processor object's _STA method. On virtual systems the _STA method
> +must always report the CPU as ``present``. Changes to the firmware policy can
> +be notified to the OS via device-check or eject-request.
> +
> +CPUs described as ``enabled`` in the static table, should not have their _STA
> +modified dynamically by firmware. Soft-restart features such as kexec will
> +re-read the static properties of the system from these static tables, and
> +may malfunction if these no longer describe the running system. Linux will
> +re-discover the dynamic properties of the system from the _STA method later
> +during boot.
> diff --git a/Documentation/arch/arm64/index.rst b/Documentation/arch/arm64/index.rst
> index d08e924204bf..78544de0a8a9 100644
> --- a/Documentation/arch/arm64/index.rst
> +++ b/Documentation/arch/arm64/index.rst
> @@ -13,6 +13,7 @@ ARM64 Architecture
> asymmetric-32bit
> booting
> cpu-feature-registers
> + cpu-hotplug
> elf_hwcaps
> hugetlbpage
> kdump


2023-12-15 17:20:56

by Jonathan Cameron

Subject: Re: [PATCH RFC v3 20/21] ACPI: Add _OSC bits to advertise OS support for toggling CPU present/enabled

On Wed, 13 Dec 2023 12:50:54 +0000
Russell King (Oracle) <[email protected]> wrote:

> From: James Morse <[email protected]>
>
> Platform firmware can disabled a CPU, or make it not-present by making
> an eject-request notification, then waiting for the os to make it offline
OS

> and call _EJx. After the firmware updates _STA with the new status.
>
> Not all operating systems support this. For arm64 making CPUs not-present
> has never been supported. For all ACPI architectures, making CPUs disabled
> has recently been added. Firmware can't know what the OS has support for.
>
> Add two new _OSC bits to advertise whether the OS supports the _STA enabled
> or present bits being toggled for CPUs. This will be important for arm64
> if systems that support physical CPU hotplug ever appear as arm64 linux
> doesn't currently support this, so firmware shouldn't try.
>
> Advertising this support to firmware is useful for cloud orchestrators
> to know whether they can scale a particular VM by adding CPUs.
>
> Signed-off-by: James Morse <[email protected]>
> Tested-by: Miguel Luis <[email protected]>
> Tested-by: Vishnu Pajjuri <[email protected]>
> Tested-by: Jianyong Wu <[email protected]>

I'm very much in favor of this _OSC but it hasn't been accepted yet I think...
https://bugzilla.tianocore.org/show_bug.cgi?id=4481

Jose? GitHub suggests you are the proposer of this.

Btw, v4 looks OK, but v5 in the tianocore GitHub seems to have lost the actual _OSC part.

Jonathan

> ---
> I'm assuming Loongarch machines do not support physical CPU hotplug.
>
> Changes since RFC v3:
> * Drop ia64 changes
> * Update James' comment below "---" to remove reference to ia64
>
> Outstanding comment:
> https://lore.kernel.org/r/[email protected]



> ---
> arch/x86/Kconfig | 1 +
> drivers/acpi/Kconfig | 9 +++++++++
> drivers/acpi/acpi_processor.c | 14 +++++++++++++-
> drivers/acpi/bus.c | 16 ++++++++++++++++
> include/linux/acpi.h | 4 ++++
> 5 files changed, 43 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 64fc7c475ab0..33fc4dcd950c 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -60,6 +60,7 @@ config X86
> select ACPI_LEGACY_TABLES_LOOKUP if ACPI
> select ACPI_SYSTEM_POWER_STATES_SUPPORT if ACPI
> select ACPI_HOTPLUG_PRESENT_CPU if ACPI_PROCESSOR && HOTPLUG_CPU
> + select ACPI_HOTPLUG_IGNORE_OSC if ACPI && HOTPLUG_CPU
> select ARCH_32BIT_OFF_T if X86_32
> select ARCH_CLOCKSOURCE_INIT
> select ARCH_CORRECT_STACKTRACE_ON_KRETPROBE
> diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig
> index 9c5a43d0aff4..020e7c0ab985 100644
> --- a/drivers/acpi/Kconfig
> +++ b/drivers/acpi/Kconfig
> @@ -311,6 +311,15 @@ config ACPI_HOTPLUG_PRESENT_CPU
> depends on ACPI_PROCESSOR && HOTPLUG_CPU
> select ACPI_CONTAINER
>
> +config ACPI_HOTPLUG_IGNORE_OSC
> + bool
> + depends on ACPI_HOTPLUG_PRESENT_CPU
> + help
> + Ignore whether firmware acknowledged support for toggling the CPU
> + present bit in _STA. Some architectures predate the _OSC bits, so
> + firmware doesn't know to do this.
> +
> +
> config ACPI_PROCESSOR_AGGREGATOR
> tristate "Processor Aggregator"
> depends on ACPI_PROCESSOR
> diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
> index ea12e70dfd39..5bb207a7a1dd 100644
> --- a/drivers/acpi/acpi_processor.c
> +++ b/drivers/acpi/acpi_processor.c
> @@ -182,6 +182,18 @@ static void __init acpi_pcc_cpufreq_init(void)
> static void __init acpi_pcc_cpufreq_init(void) {}
> #endif /* CONFIG_X86 */
>
> +static bool acpi_processor_hotplug_present_supported(void)
> +{
> + if (!IS_ENABLED(CONFIG_ACPI_HOTPLUG_PRESENT_CPU))
> + return false;
> +
> + /* x86 systems pre-date the _OSC bit */
> + if (IS_ENABLED(CONFIG_ACPI_HOTPLUG_IGNORE_OSC))
> + return true;
> +
> + return osc_sb_hotplug_present_support_acked;
> +}
> +
> /* Initialization */
> static int acpi_processor_make_present(struct acpi_processor *pr)
> {
> @@ -189,7 +201,7 @@ static int acpi_processor_make_present(struct acpi_processor *pr)
> acpi_status status;
> int ret;
>
> - if (!IS_ENABLED(CONFIG_ACPI_HOTPLUG_PRESENT_CPU)) {
> + if (!acpi_processor_hotplug_present_supported()) {
> pr_err_once("Changing CPU present bit is not supported\n");
> return -ENODEV;
> }
> diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c
> index 72e64c0718c9..7122450739d6 100644
> --- a/drivers/acpi/bus.c
> +++ b/drivers/acpi/bus.c
> @@ -298,6 +298,13 @@ EXPORT_SYMBOL_GPL(osc_sb_native_usb4_support_confirmed);
>
> bool osc_sb_cppc2_support_acked;
>
> +/*
> + * ACPI 6.? Proposed Operating System Capabilities for modifying CPU
> + * present/enable.
> + */
> +bool osc_sb_hotplug_enabled_support_acked;
> +bool osc_sb_hotplug_present_support_acked;
> +
> static u8 sb_uuid_str[] = "0811B06E-4A27-44F9-8D60-3CBBC22E7B48";
> static void acpi_bus_osc_negotiate_platform_control(void)
> {
> @@ -346,6 +353,11 @@ static void acpi_bus_osc_negotiate_platform_control(void)
>
> if (!ghes_disable)
> capbuf[OSC_SUPPORT_DWORD] |= OSC_SB_APEI_SUPPORT;
> +
> + capbuf[OSC_SUPPORT_DWORD] |= OSC_SB_HOTPLUG_ENABLED_SUPPORT;
> + if (IS_ENABLED(CONFIG_ACPI_HOTPLUG_PRESENT_CPU))
> + capbuf[OSC_SUPPORT_DWORD] |= OSC_SB_HOTPLUG_PRESENT_SUPPORT;
> +
> if (ACPI_FAILURE(acpi_get_handle(NULL, "\\_SB", &handle)))
> return;
>
> @@ -383,6 +395,10 @@ static void acpi_bus_osc_negotiate_platform_control(void)
> capbuf_ret[OSC_SUPPORT_DWORD] & OSC_SB_NATIVE_USB4_SUPPORT;
> osc_cpc_flexible_adr_space_confirmed =
> capbuf_ret[OSC_SUPPORT_DWORD] & OSC_SB_CPC_FLEXIBLE_ADR_SPACE;
> + osc_sb_hotplug_enabled_support_acked =
> + capbuf_ret[OSC_SUPPORT_DWORD] & OSC_SB_HOTPLUG_ENABLED_SUPPORT;
> + osc_sb_hotplug_present_support_acked =
> + capbuf_ret[OSC_SUPPORT_DWORD] & OSC_SB_HOTPLUG_PRESENT_SUPPORT;
> }
>
> kfree(context.ret.pointer);
> diff --git a/include/linux/acpi.h b/include/linux/acpi.h
> index 00be66683505..c572abac803c 100644
> --- a/include/linux/acpi.h
> +++ b/include/linux/acpi.h
> @@ -559,12 +559,16 @@ acpi_status acpi_run_osc(acpi_handle handle, struct acpi_osc_context *context);
> #define OSC_SB_NATIVE_USB4_SUPPORT 0x00040000
> #define OSC_SB_PRM_SUPPORT 0x00200000
> #define OSC_SB_FFH_OPR_SUPPORT 0x00400000
> +#define OSC_SB_HOTPLUG_ENABLED_SUPPORT 0x00800000
> +#define OSC_SB_HOTPLUG_PRESENT_SUPPORT 0x01000000
>
> extern bool osc_sb_apei_support_acked;
> extern bool osc_pc_lpi_support_confirmed;
> extern bool osc_sb_native_usb4_support_confirmed;
> extern bool osc_sb_cppc2_support_acked;
> extern bool osc_cpc_flexible_adr_space_confirmed;
> +extern bool osc_sb_hotplug_enabled_support_acked;
> +extern bool osc_sb_hotplug_present_support_acked;
>
> /* USB4 Capabilities */
> #define OSC_USB_USB3_TUNNELING 0x00000001
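
As a reading aid, the negotiation in this patch can be modelled in a few lines of stand-alone C. The two bit values are copied from the quoted include/linux/acpi.h hunk (and, as noted above, are still only proposed); the `model_*` names and the config struct standing in for the Kconfig symbols are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Proposed _OSC bits, values as in the quoted acpi.h hunk. */
#define MODEL_OSC_SB_HOTPLUG_ENABLED_SUPPORT	0x00800000u
#define MODEL_OSC_SB_HOTPLUG_PRESENT_SUPPORT	0x01000000u

/* Hypothetical stand-in for the two Kconfig symbols involved. */
struct model_config {
	bool hotplug_present_cpu;	/* CONFIG_ACPI_HOTPLUG_PRESENT_CPU */
	bool hotplug_ignore_osc;	/* CONFIG_ACPI_HOTPLUG_IGNORE_OSC */
};

/* Bits the OS advertises: enabled toggling always, present only if built in. */
static uint32_t model_osc_request(const struct model_config *cfg)
{
	uint32_t caps = MODEL_OSC_SB_HOTPLUG_ENABLED_SUPPORT;

	if (cfg->hotplug_present_cpu)
		caps |= MODEL_OSC_SB_HOTPLUG_PRESENT_SUPPORT;
	return caps;
}

/* Whether making a CPU present is allowed, given firmware's _OSC reply. */
static bool model_present_supported(const struct model_config *cfg,
				    uint32_t osc_reply)
{
	if (!cfg->hotplug_present_cpu)
		return false;
	if (cfg->hotplug_ignore_osc)	/* firmware that predates the bit */
		return true;
	return osc_reply & MODEL_OSC_SB_HOTPLUG_PRESENT_SUPPORT;
}
```

In this model, an architecture that selects the ignore option behaves as today regardless of the reply, while any other architecture only allows present-bit changes once firmware has acknowledged the bit.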


2023-12-15 17:28:17

by Jonathan Cameron

Subject: Re: [PATCH RFC v3 21/21] cpumask: Add enabled cpumask for present CPUs that can be brought online

On Wed, 13 Dec 2023 12:50:59 +0000
Russell King (Oracle) <[email protected]> wrote:

> From: James Morse <[email protected]>
>
> The 'offline' file in sysfs shows all offline CPUs, including those
> that aren't present. User-space is expected to remove not-present CPUs
> from this list to learn which CPUs could be brought online.
>
> CPUs can be present but not-enabled. These CPUs can't be brought online
> until the firmware policy changes, which comes with an ACPI notification
> that will register the CPUs.
>
> With only the offline and present files, user-space is unable to
> determine which CPUs it can try to bring online. Add a new CPU mask
> that shows this based on all the registered CPUs.
>
> Signed-off-by: James Morse <[email protected]>
> Tested-by: Miguel Luis <[email protected]>
> Tested-by: Vishnu Pajjuri <[email protected]>
> Tested-by: Jianyong Wu <[email protected]>
> ---

Needs docs.
Documentation/ABI/testing/sysfs-devices-system-cpu
seems to have the other similar entries.

> Outstanding comment:
> https://lore.kernel.org/r/[email protected]
Very fussy reviewer. I'd ignore him on this :)

Code is fine.

Thanks for taking this forwards. Maybe the end of this saga is
close!

Jonathan
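
For whoever writes those docs, a rough user-space sketch of how the new file might be consumed. It assumes the file contents are a CPU list such as "0-3,5" (which is what the "%*pbl" format in the patch produces) and, to keep it short, only handles CPUs 0 to 63. The `model_*` helpers are hypothetical and not part of any existing library.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse a CPU list such as "0-3,5" (optionally newline-terminated) into a
 * 64-bit mask. Returns 0 on success, -1 on malformed input or CPUs above 63.
 */
static int model_parse_cpulist(const char *s, uint64_t *mask)
{
	*mask = 0;
	while (*s && *s != '\n') {
		char *end;
		unsigned long lo = strtoul(s, &end, 10);
		unsigned long hi = lo;

		if (end == s)
			return -1;
		if (*end == '-') {
			s = end + 1;
			hi = strtoul(s, &end, 10);
			if (end == s)
				return -1;
		}
		if (lo > hi || hi > 63)
			return -1;
		for (unsigned long cpu = lo; cpu <= hi; cpu++)
			*mask |= UINT64_C(1) << cpu;
		s = end;
		if (*s == ',')
			s++;
		else if (*s && *s != '\n')
			return -1;
	}
	return 0;
}

/* CPUs worth trying to online: enabled (registered) but not yet online. */
static uint64_t model_onlineable(uint64_t enabled, uint64_t online)
{
	return enabled & ~online;
}
```

A tool would feed this the contents of the new "enabled" file and the existing "online" file, then only attempt to online the CPUs left in the result, rather than subtracting not-present CPUs from "offline" as it has to today.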

> ---
> drivers/base/cpu.c | 10 ++++++++++
> include/linux/cpumask.h | 25 +++++++++++++++++++++++++
> kernel/cpu.c | 3 +++
> 3 files changed, 38 insertions(+)
>
> diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
> index 13d052bf13f4..a6e96a0a92b7 100644
> --- a/drivers/base/cpu.c
> +++ b/drivers/base/cpu.c
> @@ -95,6 +95,7 @@ void unregister_cpu(struct cpu *cpu)
> {
> int logical_cpu = cpu->dev.id;
>
> + set_cpu_enabled(logical_cpu, false);
> unregister_cpu_under_node(logical_cpu, cpu_to_node(logical_cpu));
>
> device_unregister(&cpu->dev);
> @@ -273,6 +274,13 @@ static ssize_t print_cpus_offline(struct device *dev,
> }
> static DEVICE_ATTR(offline, 0444, print_cpus_offline, NULL);
>
> +static ssize_t print_cpus_enabled(struct device *dev,
> + struct device_attribute *attr, char *buf)
> +{
> + return sysfs_emit(buf, "%*pbl\n", cpumask_pr_args(cpu_enabled_mask));
> +}
> +static DEVICE_ATTR(enabled, 0444, print_cpus_enabled, NULL);
> +
> static ssize_t print_cpus_isolated(struct device *dev,
> struct device_attribute *attr, char *buf)
> {
> @@ -413,6 +421,7 @@ int register_cpu(struct cpu *cpu, int num)
> register_cpu_under_node(num, cpu_to_node(num));
> dev_pm_qos_expose_latency_limit(&cpu->dev,
> PM_QOS_RESUME_LATENCY_NO_CONSTRAINT);
> + set_cpu_enabled(num, true);
>
> return 0;
> }
> @@ -494,6 +503,7 @@ static struct attribute *cpu_root_attrs[] = {
> &cpu_attrs[2].attr.attr,
> &dev_attr_kernel_max.attr,
> &dev_attr_offline.attr,
> + &dev_attr_enabled.attr,
> &dev_attr_isolated.attr,
> #ifdef CONFIG_NO_HZ_FULL
> &dev_attr_nohz_full.attr,
> diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
> index cfb545841a2c..cc72a0887f04 100644
> --- a/include/linux/cpumask.h
> +++ b/include/linux/cpumask.h
> @@ -92,6 +92,7 @@ static inline void set_nr_cpu_ids(unsigned int nr)
> *
> * cpu_possible_mask- has bit 'cpu' set iff cpu is populatable
> * cpu_present_mask - has bit 'cpu' set iff cpu is populated
> + * cpu_enabled_mask - has bit 'cpu' set iff cpu can be brought online
> * cpu_online_mask - has bit 'cpu' set iff cpu available to scheduler
> * cpu_active_mask - has bit 'cpu' set iff cpu available to migration
> *
> @@ -124,11 +125,13 @@ static inline void set_nr_cpu_ids(unsigned int nr)
>
> extern struct cpumask __cpu_possible_mask;
> extern struct cpumask __cpu_online_mask;
> +extern struct cpumask __cpu_enabled_mask;
> extern struct cpumask __cpu_present_mask;
> extern struct cpumask __cpu_active_mask;
> extern struct cpumask __cpu_dying_mask;
> #define cpu_possible_mask ((const struct cpumask *)&__cpu_possible_mask)
> #define cpu_online_mask ((const struct cpumask *)&__cpu_online_mask)
> +#define cpu_enabled_mask ((const struct cpumask *)&__cpu_enabled_mask)
> #define cpu_present_mask ((const struct cpumask *)&__cpu_present_mask)
> #define cpu_active_mask ((const struct cpumask *)&__cpu_active_mask)
> #define cpu_dying_mask ((const struct cpumask *)&__cpu_dying_mask)
> @@ -993,6 +996,7 @@ extern const DECLARE_BITMAP(cpu_all_bits, NR_CPUS);
> #else
> #define for_each_possible_cpu(cpu) for_each_cpu((cpu), cpu_possible_mask)
> #define for_each_online_cpu(cpu) for_each_cpu((cpu), cpu_online_mask)
> +#define for_each_enabled_cpu(cpu) for_each_cpu((cpu), cpu_enabled_mask)
> #define for_each_present_cpu(cpu) for_each_cpu((cpu), cpu_present_mask)
> #endif
>
> @@ -1015,6 +1019,15 @@ set_cpu_possible(unsigned int cpu, bool possible)
> cpumask_clear_cpu(cpu, &__cpu_possible_mask);
> }
>
> +static inline void
> +set_cpu_enabled(unsigned int cpu, bool can_be_onlined)
> +{
> + if (can_be_onlined)
> + cpumask_set_cpu(cpu, &__cpu_enabled_mask);
> + else
> + cpumask_clear_cpu(cpu, &__cpu_enabled_mask);
> +}
> +
> static inline void
> set_cpu_present(unsigned int cpu, bool present)
> {
> @@ -1096,6 +1109,7 @@ static __always_inline unsigned int num_online_cpus(void)
> return raw_atomic_read(&__num_online_cpus);
> }
> #define num_possible_cpus() cpumask_weight(cpu_possible_mask)
> +#define num_enabled_cpus() cpumask_weight(cpu_enabled_mask)
> #define num_present_cpus() cpumask_weight(cpu_present_mask)
> #define num_active_cpus() cpumask_weight(cpu_active_mask)
>
> @@ -1104,6 +1118,11 @@ static inline bool cpu_online(unsigned int cpu)
> return cpumask_test_cpu(cpu, cpu_online_mask);
> }
>
> +static inline bool cpu_enabled(unsigned int cpu)
> +{
> + return cpumask_test_cpu(cpu, cpu_enabled_mask);
> +}
> +
> static inline bool cpu_possible(unsigned int cpu)
> {
> return cpumask_test_cpu(cpu, cpu_possible_mask);
> @@ -1128,6 +1147,7 @@ static inline bool cpu_dying(unsigned int cpu)
>
> #define num_online_cpus() 1U
> #define num_possible_cpus() 1U
> +#define num_enabled_cpus() 1U
> #define num_present_cpus() 1U
> #define num_active_cpus() 1U
>
> @@ -1141,6 +1161,11 @@ static inline bool cpu_possible(unsigned int cpu)
> return cpu == 0;
> }
>
> +static inline bool cpu_enabled(unsigned int cpu)
> +{
> + return cpu == 0;
> +}
> +
> static inline bool cpu_present(unsigned int cpu)
> {
> return cpu == 0;
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index a86972a91991..fe0a5189f8ae 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -3122,6 +3122,9 @@ EXPORT_SYMBOL(__cpu_possible_mask);
> struct cpumask __cpu_online_mask __read_mostly;
> EXPORT_SYMBOL(__cpu_online_mask);
>
> +struct cpumask __cpu_enabled_mask __read_mostly;
> +EXPORT_SYMBOL(__cpu_enabled_mask);
> +
> struct cpumask __cpu_present_mask __read_mostly;
> EXPORT_SYMBOL(__cpu_present_mask);
>
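
As an illustration of how user-space might consume the new file (a sketch only, not part of the patch): the 'enabled' attribute is printed with "%*pbl", i.e. as a CPU list such as "0-3,8". Assuming it shows up as /sys/devices/system/cpu/enabled next to 'offline' and 'present', a tool could parse it roughly as below. The helper names parse_cpulist() and cpulist_count_file() and the MAX_CPUS bound are made up for this example.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_CPUS 1024	/* arbitrary bound chosen for this sketch */

/*
 * Parse a CPU list string such as "0-3,8" into a byte-per-CPU map.
 * Returns the number of CPUs set, or -1 on a malformed or
 * out-of-range list. An empty list ("\n") yields 0.
 */
static int parse_cpulist(const char *s, unsigned char *map, size_t n)
{
	int count = 0;

	memset(map, 0, n);
	while (*s && *s != '\n') {
		char *end;
		unsigned long lo = strtoul(s, &end, 10);
		unsigned long hi = lo;

		if (end == s)
			return -1;	/* no digits where a CPU number should be */
		if (*end == '-') {
			s = end + 1;
			hi = strtoul(s, &end, 10);
			if (end == s)
				return -1;
		}
		if (lo > hi || hi >= n)
			return -1;
		for (unsigned long c = lo; c <= hi; c++) {
			if (!map[c])
				count++;
			map[c] = 1;
		}
		if (*end == ',')
			end++;
		s = end;
	}
	return count;
}

/* Read one CPU list from 'path'; return the CPU count or -1 on error. */
static int cpulist_count_file(const char *path)
{
	unsigned char map[MAX_CPUS];
	char buf[4096];
	FILE *f = fopen(path, "r");
	int ret = -1;

	if (!f)
		return -1;
	if (fgets(buf, sizeof(buf), f))
		ret = parse_cpulist(buf, map, sizeof(map));
	fclose(f);
	return ret;
}
```

With this, cpulist_count_file("/sys/devices/system/cpu/enabled") would give the number of CPUs user-space can try to bring online, without having to subtract 'present' from 'offline' by hand.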


2023-12-15 19:41:51

by Thomas Gleixner

Subject: Re: [PATCH RFC v3 21/21] cpumask: Add enabled cpumask for present CPUs that can be brought online

On Wed, Dec 13 2023 at 12:50, Russell King (Oracle) wrote:
> From: James Morse <[email protected]>
>
> The 'offline' file in sysfs shows all offline CPUs, including those
> that aren't present. User-space is expected to remove not-present CPUs
> from this list to learn which CPUs could be brought online.
>
> CPUs can be present but not-enabled. These CPUs can't be brought online
> until the firmware policy changes, which comes with an ACPI notification
> that will register the CPUs.
>
> With only the offline and present files, user-space is unable to
> determine which CPUs it can try to bring online. Add a new CPU mask
> that shows this based on all the registered CPUs.
>
> Signed-off-by: James Morse <[email protected]>
> Tested-by: Miguel Luis <[email protected]>
> Tested-by: Vishnu Pajjuri <[email protected]>
> Tested-by: Jianyong Wu <[email protected]>

Acked-by: Thomas Gleixner <[email protected]>

2023-12-15 19:47:54

by Rafael J. Wysocki

Subject: Re: [PATCH RFC v3 01/21] ACPI: Only enumerate enabled (or functional) devices

On Friday, December 15, 2023 5:15:39 PM CET Jonathan Cameron wrote:
> On Fri, 15 Dec 2023 15:31:55 +0000
> "Russell King (Oracle)" <[email protected]> wrote:
>
> > On Thu, Dec 14, 2023 at 07:37:10PM +0100, Rafael J. Wysocki wrote:
> > > On Thu, Dec 14, 2023 at 7:16 PM Rafael J. Wysocki <[email protected]> wrote:
> > > >
> > > > On Thu, Dec 14, 2023 at 7:10 PM Russell King (Oracle)
> > > > <[email protected]> wrote:
> > > > > I guess we need something like:
> > > > >
> > > > > if (device->status.present)
> > > > > return device->device_type != ACPI_BUS_TYPE_PROCESSOR ||
> > > > > device->status.enabled;
> > > > > else
> > > > > return device->status.functional;
> > > > >
> > > > > so we only check device->status.enabled for processor-type devices?
> > > >
> > > > Yes, something like this.
> > >
> > > However, that is not sufficient, because there are
> > > ACPI_BUS_TYPE_DEVICE devices representing processors.
> > >
> > > I'm not sure about a clean way to do it ATM.
> >
> > Ok, how about:
> >
> > static bool acpi_dev_is_processor(const struct acpi_device *device)
> > {
> > struct acpi_hardware_id *hwid;
> >
> > if (device->device_type == ACPI_BUS_TYPE_PROCESSOR)
> > return true;
> >
> > if (device->device_type != ACPI_BUS_TYPE_DEVICE)
> > return false;
> >
> > list_for_each_entry(hwid, &device->pnp.ids, list)
> > if (!strcmp(ACPI_PROCESSOR_OBJECT_HID, hwid->id) ||
> > !strcmp(ACPI_PROCESSOR_DEVICE_HID, hwid->id))
> > return true;
> >
> > return false;
> > }
> >
> > and then:
> >
> > if (device->status.present)
> > return !acpi_dev_is_processor(device) || device->status.enabled;
> > else
> > return device->status.functional;
> >
> > ?
> >
> Changing it to CPU only for now makes sense to me and I think this code snippet should do the
> job. Nice and simple.

Well, except that it does checks that are done elsewhere slightly
differently, which from the maintenance POV is not nice.

Maybe something like the appended patch (untested).

---
drivers/acpi/acpi_processor.c | 11 +++++++++++
drivers/acpi/internal.h | 3 +++
drivers/acpi/scan.c | 24 +++++++++++++++++++++++-
3 files changed, 37 insertions(+), 1 deletion(-)

Index: linux-pm/drivers/acpi/acpi_processor.c
===================================================================
--- linux-pm.orig/drivers/acpi/acpi_processor.c
+++ linux-pm/drivers/acpi/acpi_processor.c
@@ -644,6 +644,17 @@ static struct acpi_scan_handler processo
},
};

+bool acpi_device_is_processor(const struct acpi_device *adev)
+{
+ if (adev->device_type == ACPI_BUS_TYPE_PROCESSOR)
+ return true;
+
+ if (adev->device_type != ACPI_BUS_TYPE_DEVICE)
+ return false;
+
+ return acpi_scan_check_handler(adev, &processor_handler);
+}
+
static int acpi_processor_container_attach(struct acpi_device *dev,
const struct acpi_device_id *id)
{
Index: linux-pm/drivers/acpi/internal.h
===================================================================
--- linux-pm.orig/drivers/acpi/internal.h
+++ linux-pm/drivers/acpi/internal.h
@@ -62,6 +62,8 @@ void acpi_sysfs_add_hotplug_profile(stru
int acpi_scan_add_handler_with_hotplug(struct acpi_scan_handler *handler,
const char *hotplug_profile_name);
void acpi_scan_hotplug_enabled(struct acpi_hotplug_profile *hotplug, bool val);
+bool acpi_scan_check_handler(const struct acpi_device *adev,
+ struct acpi_scan_handler *handler);

#ifdef CONFIG_DEBUG_FS
extern struct dentry *acpi_debugfs_dir;
@@ -133,6 +135,7 @@ int acpi_bus_register_early_device(int t
const struct acpi_device *acpi_companion_match(const struct device *dev);
int __acpi_device_uevent_modalias(const struct acpi_device *adev,
struct kobj_uevent_env *env);
+bool acpi_device_is_processor(const struct acpi_device *adev);

/* --------------------------------------------------------------------------
Power Resource
Index: linux-pm/drivers/acpi/scan.c
===================================================================
--- linux-pm.orig/drivers/acpi/scan.c
+++ linux-pm/drivers/acpi/scan.c
@@ -1938,6 +1938,19 @@ static bool acpi_scan_handler_matching(s
return false;
}

+bool acpi_scan_check_handler(const struct acpi_device *adev,
+ struct acpi_scan_handler *handler)
+{
+ struct acpi_hardware_id *hwid;
+
+ list_for_each_entry(hwid, &adev->pnp.ids, list) {
+ if (acpi_scan_handler_matching(handler, hwid->id, NULL))
+ return true;
+ }
+
+ return false;
+}
+
static struct acpi_scan_handler *acpi_scan_match_handler(const char *idstr,
const struct acpi_device_id **matchid)
{
@@ -2410,7 +2423,16 @@ bool acpi_dev_ready_for_enumeration(cons
if (device->flags.honor_deps && device->dep_unmet)
return false;

- return acpi_device_is_present(device);
+ if (device->status.functional)
+ return true;
+
+ if (!device->status.present)
+ return false;
+
+ if (device->status.enabled)
+ return true; /* Fast path. */
+
+ return !acpi_device_is_processor(device);
}
EXPORT_SYMBOL_GPL(acpi_dev_ready_for_enumeration);
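
For anyone comparing the two variants of the check discussed in this mail, here is a small user-space model (a sketch under simplifying assumptions, not kernel code). It reduces a device to four booleans, ignores the honor_deps/dep_unmet test that precedes the check, and uses an invented struct dev_state in place of struct acpi_device. In this model the two variants disagree in exactly one state: a processor that is present and functional but not enabled. Whether firmware reports that combination in practice is a separate question.

```c
#include <stdbool.h>
#include <stdio.h>

/* Simplified stand-in for the _STA-derived bits plus the processor test. */
struct dev_state {
	bool present;
	bool enabled;
	bool functional;
	bool is_processor;
};

/* The check as proposed in the quoted snippet: test 'present' first. */
static bool ready_present_first(const struct dev_state *d)
{
	if (d->present)
		return !d->is_processor || d->enabled;
	return d->functional;
}

/* The check as written in the untested patch: test 'functional' first. */
static bool ready_functional_first(const struct dev_state *d)
{
	if (d->functional)
		return true;
	if (!d->present)
		return false;
	if (d->enabled)
		return true;	/* fast path */
	return !d->is_processor;
}

/* Walk all 16 states, print those where the variants differ, count them. */
static int count_differences(void)
{
	int diffs = 0;

	for (int bits = 0; bits < 16; bits++) {
		struct dev_state d = {
			.present = bits & 1,
			.enabled = bits & 2,
			.functional = bits & 4,
			.is_processor = bits & 8,
		};
		bool a = ready_present_first(&d);
		bool b = ready_functional_first(&d);

		if (a != b) {
			printf("present=%d enabled=%d functional=%d processor=%d: %d vs %d\n",
			       d.present, d.enabled, d.functional,
			       d.is_processor, a, b);
			diffs++;
		}
	}
	return diffs;
}
```

Calling count_differences() from a test program prints the single differing state, which may be worth keeping in mind when deciding which ordering is intended.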





2023-12-18 09:23:39

by Lorenzo Pieralisi

Subject: Re: [PATCH RFC v3 13/21] ACPICA: Add new MADT GICC flags fields

On Fri, Dec 15, 2023 at 04:53:28PM +0000, Russell King (Oracle) wrote:
> On Fri, Dec 15, 2023 at 04:23:22PM +0000, Jonathan Cameron wrote:
> > On Wed, 13 Dec 2023 12:50:18 +0000
> > Russell King (Oracle) <[email protected]> wrote:
> >
> > > From: James Morse <[email protected]>
> > >
> > > Add the new flag field to the MADT's GICC structure.
> > >
> > > 'Online Capable' indicates a disabled CPU can be enabled later. See
> > > > ACPI specification 6.5 Table 5.37: GICC CPU Interface Flags.
> > >
> > > Signed-off-by: James Morse <[email protected]>
> > > Tested-by: Miguel Luis <[email protected]>
> > > Tested-by: Vishnu Pajjuri <[email protected]>
> > > Tested-by: Jianyong Wu <[email protected]>
> > > Signed-off-by: Russell King (Oracle) <[email protected]>
> >
> > I see there is an acpica pull request including this bit but with a different name
> > For reference.
> > https://github.com/acpica/acpica/pull/914/commits/453a5f67567786522021d5f6913f561f8b3cabf6
> >
> > +CC Lorenzo who submitted that.
>
> > > +#define ACPI_MADT_GICC_CPU_CAPABLE (1<<3) /* 03: CPU is online capable */
> >
> > ACPI_MADT_GICC_ONLINE_CAPABLE
>
> It's somewhat disappointing, but no big deal. It's easy enough to change
> "irqchip/gic-v3: Add support for ACPI's disabled but 'online capable' CPUs"
> to use Lorenzo's name when that patch hits - and it becomes one less
> patch in this patch set when Lorenzo's change eventually hits mainline.
>
> Does anyone know how long it may take for Lorenzo's change to get into
> mainline? Would it be by the 6.8 merge window or the following one?

I wish I knew. I submitted ACPICA changes for the online capable bit
since I had to add additional flags on top (ie DMA coherent) and it
would not make sense to submit the latter without the former.

It'd be great if the ACPICA headers can make it into Linux for the upcoming
merge window, not sure what I can do to fast-track the process though
(I shall ping the maintainers).

Lorenzo

> Thanks.
>
> --
> RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
> FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!
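
As background on the flag being discussed, a minimal sketch of how the GICC flags could be interpreted. This is an illustration only: the macro and enum names are local to the example (the thread itself shows two candidate names for bit 3), and it assumes, per the ACPI GICC flags table, that bit 0 is the Enabled bit and that the Online Capable bit is only meaningful when Enabled is clear.

```c
#include <stdint.h>

/* Names local to this sketch; bit positions follow the GICC flags table. */
#define GICC_FLAG_ENABLED		(1u << 0)
#define GICC_FLAG_ONLINE_CAPABLE	(1u << 3)

enum gicc_cpu_state {
	GICC_CPU_UNUSABLE,	/* disabled and cannot be enabled later */
	GICC_CPU_USABLE_NOW,	/* enabled at boot */
	GICC_CPU_USABLE_LATER,	/* disabled, but may be enabled at runtime */
};

/* Classify a CPU from its MADT GICC flags word. */
static enum gicc_cpu_state gicc_classify(uint32_t flags)
{
	if (flags & GICC_FLAG_ENABLED)
		return GICC_CPU_USABLE_NOW;
	if (flags & GICC_FLAG_ONLINE_CAPABLE)
		return GICC_CPU_USABLE_LATER;
	return GICC_CPU_UNUSABLE;
}
```

The Enabled bit is tested first so that a table setting both bits is still treated as enabled; the Online Capable bit only matters for CPUs that start out disabled.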

2023-12-18 12:15:45

by Russell King (Oracle)

Subject: Re: [PATCH RFC v3 21/21] cpumask: Add enabled cpumask for present CPUs that can be brought online

On Fri, Dec 15, 2023 at 05:18:31PM +0000, Jonathan Cameron wrote:
> On Wed, 13 Dec 2023 12:50:59 +0000
> Russell King (Oracle) <[email protected]> wrote:
>
> > From: James Morse <[email protected]>
> >
> > The 'offline' file in sysfs shows all offline CPUs, including those
> > that aren't present. User-space is expected to remove not-present CPUs
> > from this list to learn which CPUs could be brought online.
> >
> > CPUs can be present but not-enabled. These CPUs can't be brought online
> > until the firmware policy changes, which comes with an ACPI notification
> > that will register the CPUs.
> >
> > With only the offline and present files, user-space is unable to
> > determine which CPUs it can try to bring online. Add a new CPU mask
> > that shows this based on all the registered CPUs.
> >
> > Signed-off-by: James Morse <[email protected]>
> > Tested-by: Miguel Luis <[email protected]>
> > Tested-by: Vishnu Pajjuri <[email protected]>
> > Tested-by: Jianyong Wu <[email protected]>
> > ---
>
> Needs docs
> Documentation/ABI/testing/sysfs-devices-system-cpu
> seems to have the rest of the similar entries.

Any idea what I should put in there as "Date"? It seems to me that we
have little idea when this might be merged. I could use the date of the
commit (Nov 2022).

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

2023-12-18 13:11:24

by Russell King (Oracle)

Subject: Re: [PATCH RFC v3 18/21] ACPI: processor: Only call arch_unregister_cpu() if HOTPLUG_CPU is selected

On Fri, Dec 15, 2023 at 04:50:09PM +0000, Jonathan Cameron wrote:
> On Wed, 13 Dec 2023 12:50:43 +0000
> Russell King (Oracle) <[email protected]> wrote:
>
> > From: James Morse <[email protected]>
> >
> > The kbuild robot points out that configurations without HOTPLUG_CPU
> > selected can try to build acpi_processor_post_eject() without success
> > as arch_unregister_cpu() is not defined.
> >
> > Check this explicitly. This will be merged into:
> > | ACPI: Add post_eject to struct acpi_scan_handler for cpu hotplug
> > for any subsequent posting.
> >
> > Reported-by: kbuild test robot <[email protected]>
> > Signed-off-by: James Morse <[email protected]>
> > Tested-by: Miguel Luis <[email protected]>
> > Tested-by: Vishnu Pajjuri <[email protected]>
> > Tested-by: Jianyong Wu <[email protected]>
> > ---
> > This should probably be squashed into an earlier patch.
>
> Agreed. If not
> Reviewed-by: Jonathan Cameron <[email protected]>

I'm not convinced that "ACPI: Add post_eject to struct acpi_scan_handler
for cpu hotplug" is the correct commit to squash this into.

As far as acpi_processor.c is concerned, this commit merely renames
acpi_processor_remove() to be acpi_processor_post_eject(). The function
references arch_unregister_cpu() before and after this change, and its
build is dependent on CONFIG_ACPI_HOTPLUG_PRESENT_CPU being defined.

Commit "ACPI: convert acpi_processor_post_eject() to use IS_ENABLED()"
removed the ifdef CONFIG_ACPI_HOTPLUG_PRESENT_CPU surrounding
acpi_processor_post_eject, and that symbol depends on
CONFIG_HOTPLUG_CPU, so I think this commit is also fine.

Commit "ACPI: Check _STA present bit before making CPUs not present"
rewrites the function - the original body gets called
acpi_processor_make_not_present() and a new acpi_processor_post_eject()
is created. At this point, it doesn't reference arch_unregister_cpu().

Commit "ACPI: add support to register CPUs based on the _STA enabled
bit" adds a reference to arch_unregister_cpu() in this new
acpi_processor_post_eject() - so I think this is the correct commit
this change should be merged into.

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

2023-12-18 13:13:33

by Russell King (Oracle)

Subject: Re: [PATCH RFC v3 17/21] ACPI: add support to register CPUs based on the _STA enabled bit

On Wed, Dec 13, 2023 at 12:50:38PM +0000, Russell King wrote:
> From: James Morse <[email protected]>
>
> acpi_processor_get_info() registers all present CPUs. Registering a
> CPU is what creates the sysfs entries and triggers the udev
> notifications.
>
> arm64 virtual machines that support 'virtual cpu hotplug' use the
> enabled bit to indicate whether the CPU can be brought online, as
> the existing ACPI tables require all hardware to be described and
> present.
>
> If firmware describes a CPU as present, but disabled, skip the
> registration. Such CPUs are present, but can't be brought online for
> whatever reason. (e.g. firmware/hypervisor policy).
>
> Once firmware sets the enabled bit, the CPU can be registered and
> brought online by user-space. Online CPUs, or CPUs that are missing
> an _STA method must always be registered.

...

> @@ -526,6 +552,9 @@ static void acpi_processor_post_eject(struct acpi_device *device)
> acpi_processor_make_not_present(device);
> return;
> }
> +
> + if (cpu_present(pr->id) && !(sta & ACPI_STA_DEVICE_ENABLED))
> + arch_unregister_cpu(pr->id);

This change isn't described in the commit log, but seems to be the cause
of the build error identified by the kernel build bot that is fixed
later in this series. I'm wondering whether this should be in a
different patch, maybe "ACPI: Check _STA present bit before making CPUs
not present" ?

Or maybe my brain isn't working properly (due to being Covid positive.)
Any thoughts, Jonathan?

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

2023-12-18 13:25:19

by Rafael J. Wysocki

Subject: Re: [PATCH RFC v3 13/21] ACPICA: Add new MADT GICC flags fields

On Mon, Dec 18, 2023 at 10:23 AM Lorenzo Pieralisi
<[email protected]> wrote:
>
> On Fri, Dec 15, 2023 at 04:53:28PM +0000, Russell King (Oracle) wrote:
> > On Fri, Dec 15, 2023 at 04:23:22PM +0000, Jonathan Cameron wrote:
> > > On Wed, 13 Dec 2023 12:50:18 +0000
> > > Russell King (Oracle) <[email protected]> wrote:
> > >
> > > > From: James Morse <[email protected]>
> > > >
> > > > Add the new flag field to the MADT's GICC structure.
> > > >
> > > > 'Online Capable' indicates a disabled CPU can be enabled later. See
> > > > > ACPI specification 6.5 Table 5.37: GICC CPU Interface Flags.
> > > >
> > > > Signed-off-by: James Morse <[email protected]>
> > > > Tested-by: Miguel Luis <[email protected]>
> > > > Tested-by: Vishnu Pajjuri <[email protected]>
> > > > Tested-by: Jianyong Wu <[email protected]>
> > > > Signed-off-by: Russell King (Oracle) <[email protected]>
> > >
> > > I see there is an acpica pull request including this bit but with a different name
> > > For reference.
> > > https://github.com/acpica/acpica/pull/914/commits/453a5f67567786522021d5f6913f561f8b3cabf6
> > >
> > > +CC Lorenzo who submitted that.
> >
> > > > +#define ACPI_MADT_GICC_CPU_CAPABLE (1<<3) /* 03: CPU is online capable */
> > >
> > > ACPI_MADT_GICC_ONLINE_CAPABLE
> >
> > It's somewhat disappointing, but no big deal. It's easy enough to change
> > "irqchip/gic-v3: Add support for ACPI's disabled but 'online capable' CPUs"
> > to use Lorenzo's name when that patch hits - and it becomes one less
> > patch in this patch set when Lorenzo's change eventually hits mainline.
> >
> > Does anyone know how long it may take for Lorenzo's change to get into
> > mainline? Would it be by the 6.8 merge window or the following one?
>
> I wish I knew. I submitted ACPICA changes for the online capable bit
> since I had to add additional flags on top (ie DMA coherent) and it
> would not make sense to submit the latter without the former.
>
> It'd be great if the ACPICA headers can make it into Linux for the upcoming
> merge window, not sure what I can do to fast-track the process though
> (I shall ping the maintainers).

If your upstream pull request has been merged, I can pick up Linux
patches carrying Link: tags pointing to the upstream ACPICA commits in
that pull request.

Thanks!

2023-12-18 16:33:58

by Lorenzo Pieralisi

Subject: Re: [PATCH RFC v3 13/21] ACPICA: Add new MADT GICC flags fields

On Mon, Dec 18, 2023 at 02:14:30PM +0100, Rafael J. Wysocki wrote:
> On Mon, Dec 18, 2023 at 10:23 AM Lorenzo Pieralisi
> <[email protected]> wrote:
> >
> > On Fri, Dec 15, 2023 at 04:53:28PM +0000, Russell King (Oracle) wrote:
> > > On Fri, Dec 15, 2023 at 04:23:22PM +0000, Jonathan Cameron wrote:
> > > > On Wed, 13 Dec 2023 12:50:18 +0000
> > > > Russell King (Oracle) <[email protected]> wrote:
> > > >
> > > > > From: James Morse <[email protected]>
> > > > >
> > > > > Add the new flag field to the MADT's GICC structure.
> > > > >
> > > > > 'Online Capable' indicates a disabled CPU can be enabled later. See
> > > > > > ACPI specification 6.5 Table 5.37: GICC CPU Interface Flags.
> > > > >
> > > > > Signed-off-by: James Morse <[email protected]>
> > > > > Tested-by: Miguel Luis <[email protected]>
> > > > > Tested-by: Vishnu Pajjuri <[email protected]>
> > > > > Tested-by: Jianyong Wu <[email protected]>
> > > > > Signed-off-by: Russell King (Oracle) <[email protected]>
> > > >
> > > > I see there is an acpica pull request including this bit but with a different name
> > > > For reference.
> > > > https://github.com/acpica/acpica/pull/914/commits/453a5f67567786522021d5f6913f561f8b3cabf6
> > > >
> > > > +CC Lorenzo who submitted that.
> > >
> > > > > +#define ACPI_MADT_GICC_CPU_CAPABLE (1<<3) /* 03: CPU is online capable */
> > > >
> > > > ACPI_MADT_GICC_ONLINE_CAPABLE
> > >
> > > It's somewhat disappointing, but no big deal. It's easy enough to change
> > > "irqchip/gic-v3: Add support for ACPI's disabled but 'online capable' CPUs"
> > > to use Lorenzo's name when that patch hits - and it becomes one less
> > > patch in this patch set when Lorenzo's change eventually hits mainline.
> > >
> > > Does anyone know how long it may take for Lorenzo's change to get into
> > > mainline? Would it be by the 6.8 merge window or the following one?
> >
> > I wish I knew. I submitted ACPICA changes for the online capable bit
> > since I had to add additional flags on top (ie DMA coherent) and it
> > would not make sense to submit the latter without the former.
> >
> > It'd be great if the ACPICA headers can make it into Linux for the upcoming
> > merge window, not sure what I can do to fast-track the process though
> > (I shall ping the maintainers).
>
> If your upstream pull request has been merged, I can pick up Linux
> patches carrying Link: tags pointing to the upstream ACPICA commits in
> that pull request.

Thank you, I don't think it has been merged yet (and it requires
review because I am not that familiar with the ACPICA code base).

Hopefully it should be an extended kernel cycle, so it might be
possible to get these headers in v6.8 (if you deem that reasonable,
of course) once the PR is merged.

Thanks,
Lorenzo

2023-12-18 20:18:02

by Rafael J. Wysocki

Subject: Re: [PATCH RFC v3 02/21] ACPI: processor: Add support for processors described as container packages

On Wed, Dec 13, 2023 at 1:49 PM Russell King <[email protected]> wrote:
>
> From: James Morse <[email protected]>
>
> ACPI has two ways of describing processors in the DSDT. From ACPI v6.5,
> 5.2.12:
>
> "Starting with ACPI Specification 6.3, the use of the Processor() object
> was deprecated. Only legacy systems should continue with this usage. On
> the Itanium architecture only, a _UID is provided for the Processor()
> that is a string object. This usage of _UID is also deprecated since it
> can preclude an OSPM from being able to match a processor to a
> non-enumerable device, such as those defined in the MADT. From ACPI
> Specification 6.3 onward, all processor objects for all architectures
> except Itanium must now use Device() objects with an _HID of ACPI0007,
> and use only integer _UID values."
>
> Also see https://uefi.org/specs/ACPI/6.5/08_Processor_Configuration_and_Control.html#declaring-processors
>
> Duplicate descriptions are not allowed, the ACPI processor driver already
> parses the UID from both devices and containers. acpi_processor_get_info()
> returns an error if the UID exists twice in the DSDT.

I'm not really sure how the above is related to the actual patch.

> The missing probe for CPUs described as packages

It is unclear what exactly is meant by "CPUs described as packages".

From the patch, it looks like those would be Processor() objects
defined under a processor container device.

> creates a problem for
> moving the cpu_register() calls into the acpi_processor driver, as CPUs
> described like this don't get registered, leading to errors from other
> subsystems when they try to add new sysfs entries to the CPU node.
> (e.g. topology_sysfs_init()'s use of topology_add_dev() via cpuhp)
>
> To fix this, parse the processor container and call acpi_processor_add()
> for each processor that is discovered like this.

Discovered like what?

> The processor container
> handler is added with acpi_scan_add_handler(), so no detach call will
> arrive.

The above requires clarification too.

> Qemu TCG describes CPUs using processor devices in a processor container.
> For more information, see build_cpus_aml() in Qemu hw/acpi/cpu.c and
> https://uefi.org/specs/ACPI/6.5/08_Processor_Configuration_and_Control.html#processor-container-device
>
> Signed-off-by: James Morse <[email protected]>
> Tested-by: Miguel Luis <[email protected]>
> Tested-by: Vishnu Pajjuri <[email protected]>
> Tested-by: Jianyong Wu <[email protected]>
> ---
> Outstanding comments:
> https://lore.kernel.org/r/[email protected]
> https://lore.kernel.org/r/[email protected]
> ---
> drivers/acpi/acpi_processor.c | 22 ++++++++++++++++++++++
> 1 file changed, 22 insertions(+)
>
> diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
> index 4fe2ef54088c..6a542e0ce396 100644
> --- a/drivers/acpi/acpi_processor.c
> +++ b/drivers/acpi/acpi_processor.c
> @@ -626,9 +626,31 @@ static struct acpi_scan_handler processor_handler = {
> },
> };
>
> +static acpi_status acpi_processor_container_walk(acpi_handle handle,
> + u32 lvl,
> + void *context,
> + void **rv)
> +{
> + struct acpi_device *adev;
> + acpi_status status;
> +
> + adev = acpi_get_acpi_dev(handle);
> + if (!adev)
> + return AE_ERROR;

Why is the reference counting needed here?

Wouldn't acpi_fetch_acpi_dev() suffice?

Also, should the walk really be terminated on the first error?

> +
> + status = acpi_processor_add(adev, &processor_device_ids[0]);
> + acpi_put_acpi_dev(adev);
> +
> + return status;
> +}
> +
> static int acpi_processor_container_attach(struct acpi_device *dev,
> const struct acpi_device_id *id)
> {
> + acpi_walk_namespace(ACPI_TYPE_PROCESSOR, dev->handle,
> + ACPI_UINT32_MAX, acpi_processor_container_walk,
> + NULL, NULL, NULL);

This covers processor objects only, so why is this not needed for
processor devices defined under a processor container object?

It is not obvious, so it would be nice to add a comment explaining the
difference.

> +
> return 1;
> }
>
> --
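
To make the question about terminating the walk on the first error concrete, here is a user-space model (a sketch, not ACPICA code). It assumes, as I understand the namespace walker, that a callback returning anything other than success aborts the walk. All names here (walk(), add_node(), struct walk_stats, the ST_* codes) are invented for the example.

```c
#include <stddef.h>

typedef int status_t;
#define ST_OK		0
#define ST_ERROR	1

typedef status_t (*walk_cb)(int node, void *ctx);

struct walk_stats {
	int added;
	int failed;
};

/* Visit nodes in order; stop at the first callback that reports failure. */
static status_t walk(const int *nodes, size_t n, walk_cb cb, void *ctx)
{
	for (size_t i = 0; i < n; i++) {
		status_t st = cb(nodes[i], ctx);

		if (st != ST_OK)
			return st;
	}
	return ST_OK;
}

/* Pretend that nodes with a negative id cannot be added. */
static status_t add_node(int node, struct walk_stats *s)
{
	if (node < 0) {
		s->failed++;
		return ST_ERROR;
	}
	s->added++;
	return ST_OK;
}

/* Variant 1: propagate the failure, which ends the walk early. */
static status_t cb_stop_on_error(int node, void *ctx)
{
	return add_node(node, ctx);
}

/* Variant 2: count the failure but let the walk continue. */
static status_t cb_keep_going(int node, void *ctx)
{
	add_node(node, ctx);
	return ST_OK;
}
```

For nodes {0, -1, 2, 3}, the first variant adds one node and stops, while the second adds three and skips only the bad one. Which behaviour is wanted for processors under a container is the open question in the review above.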

2023-12-18 20:22:31

by Rafael J. Wysocki

Subject: Re: [PATCH RFC v3 03/21] ACPI: processor: Register CPUs that are online, but not described in the DSDT

On Wed, Dec 13, 2023 at 1:49 PM Russell King <[email protected]> wrote:
>
> From: James Morse <[email protected]>
>
> ACPI has two descriptions of CPUs, one in the MADT/APIC table, the other
> in the DSDT. Both are required. (ACPI 6.5's 8.4 "Declaring Processors"
> says "Each processor in the system must be declared in the ACPI
> namespace"). Having two descriptions allows firmware authors to get
> this wrong.
>
> If CPUs are described in the MADT/APIC, they will be brought online
> early during boot. Once the register_cpu() calls are moved to ACPI,
> they will be based on the DSDT description of the CPUs. When CPUs are
> missing from the DSDT description, they will end up online, but not
> registered.
>
> Add a helper that runs after acpi_init() has completed to register
> CPUs that are online, but weren't found in the DSDT. Any CPU that
> is registered by this code triggers a firmware-bug warning and kernel
> taint.
>
> Qemu TCG only describes the first CPU in the DSDT, unless cpu-hotplug
> is configured.

So why is this a kernel problem?

> Signed-off-by: James Morse <[email protected]>
> Reviewed-by: Jonathan Cameron <[email protected]>
> Reviewed-by: Gavin Shan <[email protected]>
> Tested-by: Miguel Luis <[email protected]>
> Tested-by: Vishnu Pajjuri <[email protected]>
> Tested-by: Jianyong Wu <[email protected]>
> Signed-off-by: Russell King (Oracle) <[email protected]>
> ---
> drivers/acpi/acpi_processor.c | 19 +++++++++++++++++++
> 1 file changed, 19 insertions(+)
>
> diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
> index 6a542e0ce396..0511f2bc10bc 100644
> --- a/drivers/acpi/acpi_processor.c
> +++ b/drivers/acpi/acpi_processor.c
> @@ -791,6 +791,25 @@ void __init acpi_processor_init(void)
> acpi_pcc_cpufreq_init();
> }
>
> +static int __init acpi_processor_register_missing_cpus(void)
> +{
> + int cpu;
> +
> + if (acpi_disabled)
> + return 0;
> +
> + for_each_online_cpu(cpu) {
> + if (!get_cpu_device(cpu)) {
> + pr_err_once(FW_BUG "CPU %u has no ACPI namespace description!\n", cpu);
> + add_taint(TAINT_FIRMWARE_WORKAROUND, LOCKDEP_STILL_OK);
> + arch_register_cpu(cpu);

Which part of this code is related to ACPI?

> + }
> + }
> +
> + return 0;
> +}
> +subsys_initcall_sync(acpi_processor_register_missing_cpus);
> +
> #ifdef CONFIG_ACPI_PROCESSOR_CSTATE
> /**
> * acpi_processor_claim_cst_control - Request _CST control from the platform.
> --

2023-12-18 20:31:57

by Rafael J. Wysocki

Subject: Re: [PATCH RFC v3 04/21] ACPI: processor: Register all CPUs from acpi_processor_get_info()

On Wed, Dec 13, 2023 at 1:49 PM Russell King <[email protected]> wrote:
>
> From: James Morse <[email protected]>
>
> To allow ACPI to skip the call to arch_register_cpu() when the _STA
> value indicates the CPU can't be brought online right now, move the
> arch_register_cpu() call into acpi_processor_get_info().

This kind of looks backwards to me and has a potential to become
super-confusing.

I would instead add a way for the generic code to ask the platform
firmware whether or not the given CPU is enabled and so it can be
registered.
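
One way to read that suggestion, as a minimal user-space sketch rather than kernel code: the generic registration loop stays where it is and consults a platform-supplied callback before registering each present CPU. Every name here (`platform_cpu_enabled_fn`, `register_enabled_cpus`, `demo_even_cpus_enabled`) is hypothetical and does not exist in the kernel.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical hook: the platform (e.g. ACPI) says whether a CPU may be registered. */
typedef bool (*platform_cpu_enabled_fn)(unsigned int cpu);

/*
 * Model of a generic registration loop. For each present CPU it asks the
 * platform hook; a NULL hook means "no firmware opinion", so every present
 * CPU is registered. Returns the number of CPUs registered.
 */
static unsigned int register_enabled_cpus(const bool *present, bool *registered,
					  size_t ncpus,
					  platform_cpu_enabled_fn enabled)
{
	unsigned int count = 0;

	for (size_t cpu = 0; cpu < ncpus; cpu++) {
		registered[cpu] = false;
		if (!present[cpu])
			continue;
		if (enabled && !enabled((unsigned int)cpu))
			continue;	/* present, but disabled by firmware */
		registered[cpu] = true;
		count++;
	}

	return count;
}

/* Demo policy only: pretend firmware has enabled the even-numbered CPUs. */
static bool demo_even_cpus_enabled(unsigned int cpu)
{
	return (cpu % 2) == 0;
}
```

With this shape the ACPI side would only have to supply the callback, and the 'acpi=off' case falls out of passing NULL.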

> Systems can still be booted with 'acpi=off', or not include an
> ACPI description at all. For these, the CPUs continue to be
> registered by cpu_dev_register_generic().
>
> This moves the CPU register logic back to a subsys_initcall(),
> while the memory nodes will have been registered earlier.

Isn't this somewhat risky?

> Signed-off-by: James Morse <[email protected]>
> Reviewed-by: Gavin Shan <[email protected]>
> Tested-by: Miguel Luis <[email protected]>
> Tested-by: Vishnu Pajjuri <[email protected]>
> Tested-by: Jianyong Wu <[email protected]>
> Signed-off-by: Russell King (Oracle) <[email protected]>
> ---
> Changes since RFC v2:
> * Fixup comment in acpi_processor_get_info() (Gavin Shan)
> * Add comment in cpu_dev_register_generic() (Gavin Shan)
> ---
> drivers/acpi/acpi_processor.c | 12 ++++++++++++
> drivers/base/cpu.c | 6 +++++-
> 2 files changed, 17 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
> index 0511f2bc10bc..e7ed4730cbbe 100644
> --- a/drivers/acpi/acpi_processor.c
> +++ b/drivers/acpi/acpi_processor.c
> @@ -314,6 +314,18 @@ static int acpi_processor_get_info(struct acpi_device *device)
> cpufreq_add_device("acpi-cpufreq");
> }
>
> + /*
> + * Register CPUs that are present. get_cpu_device() is used to skip
> + * duplicate CPU descriptions from firmware.
> + */
> + if (!invalid_logical_cpuid(pr->id) && cpu_present(pr->id) &&
> + !get_cpu_device(pr->id)) {
> + int ret = arch_register_cpu(pr->id);
> +
> + if (ret)
> + return ret;
> + }
> +
> /*
> * Extra Processor objects may be enumerated on MP systems with
> * less than the max # of CPUs. They should be ignored _iff
> diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
> index 47de0f140ba6..13d052bf13f4 100644
> --- a/drivers/base/cpu.c
> +++ b/drivers/base/cpu.c
> @@ -553,7 +553,11 @@ static void __init cpu_dev_register_generic(void)
> {
> int i, ret;
>
> - if (!IS_ENABLED(CONFIG_GENERIC_CPU_DEVICES))
> + /*
> + * When ACPI is enabled, CPUs are registered via
> + * acpi_processor_get_info().
> + */
> + if (!IS_ENABLED(CONFIG_GENERIC_CPU_DEVICES) || !acpi_disabled)
> return;
>
> for_each_present_cpu(i) {
> --
> 2.30.2
>
>

2023-12-18 20:35:41

by Rafael J. Wysocki

Subject: Re: [PATCH RFC v3 05/21] ACPI: Rename ACPI_HOTPLUG_CPU to include 'present'

On Wed, Dec 13, 2023 at 1:49 PM Russell King <[email protected]> wrote:
>
> From: James Morse <[email protected]>
>
> The code behind ACPI_HOTPLUG_CPU allows a not-present CPU to become
> present.

Right.

> This isn't the only use of HOTPLUG_CPU. On arm64 and riscv
> CPUs can be taken offline as a power saving measure.

But there is still the case in which a non-present CPU can become
present, isn't there?

> On arm64 an offline CPU may be disabled by firmware, preventing it from
> being brought back online, but it remains present throughout.
>
> Adding code to prevent user-space trying to online these disabled CPUs
> needs some additional terminology.
>
> Rename the Kconfig symbol CONFIG_ACPI_HOTPLUG_PRESENT_CPU to reflect
> that it makes possible CPUs present.

Honestly, I don't think that this change is necessary or even useful.

2023-12-27 11:15:37

by Lorenzo Pieralisi

Subject: Re: [PATCH RFC v3 13/21] ACPICA: Add new MADT GICC flags fields

On Mon, Dec 18, 2023 at 02:14:30PM +0100, Rafael J. Wysocki wrote:
> On Mon, Dec 18, 2023 at 10:23 AM Lorenzo Pieralisi
> <[email protected]> wrote:
> >
> > On Fri, Dec 15, 2023 at 04:53:28PM +0000, Russell King (Oracle) wrote:
> > > On Fri, Dec 15, 2023 at 04:23:22PM +0000, Jonathan Cameron wrote:
> > > > On Wed, 13 Dec 2023 12:50:18 +0000
> > > > Russell King (Oracle) <[email protected]> wrote:
> > > >
> > > > > From: James Morse <[email protected]>
> > > > >
> > > > > Add the new flag field to the MADT's GICC structure.
> > > > >
> > > > > 'Online Capable' indicates a disabled CPU can be enabled later. See
> > > > > > ACPI specification 6.5 Table 5.37: GICC CPU Interface Flags.
> > > > >
> > > > > Signed-off-by: James Morse <[email protected]>
> > > > > Tested-by: Miguel Luis <[email protected]>
> > > > > Tested-by: Vishnu Pajjuri <[email protected]>
> > > > > Tested-by: Jianyong Wu <[email protected]>
> > > > > Signed-off-by: Russell King (Oracle) <[email protected]>
> > > >
> > > > I see there is an acpica pull request including this bit but with a different name
> > > > For reference.
> > > > https://github.com/acpica/acpica/pull/914/commits/453a5f67567786522021d5f6913f561f8b3cabf6
> > > >
> > > > +CC Lorenzo who submitted that.
> > >
> > > > > +#define ACPI_MADT_GICC_CPU_CAPABLE (1<<3) /* 03: CPU is online capable */
> > > >
> > > > ACPI_MADT_GICC_ONLINE_CAPABLE
> > >
> > > It's somewhat disappointing, but no big deal. It's easy enough to change
> > > "irqchip/gic-v3: Add support for ACPI's disabled but 'online capable' CPUs"
> > > to use Lorenzo's name when that patch hits - and it becomes one less
> > > patch in this patch set when Lorenzo's change eventually hits mainline.
> > >
> > > Does anyone know how long it may take for Lorenzo's change to get into
> > > mainline? Would it be by the 6.8 merge window or the following one?
> >
> > I wish I knew. I submitted ACPICA changes for the online capable bit
> > since I had to add additional flags on top (ie DMA coherent) and it
> > would not make sense to submit the latter without the former.
> >
> > It'd be great if the ACPICA headers can make it into Linux for the upcoming
> > merge window, not sure what I can do to fasttrack the process though
> > (I shall ping the maintainers).
>
> If your upstream pull request has been merged, I can pick up Linux
> patches carrying Link: tags pointing to the upstream ACPICA commits in
> that pull request.

The ACPICA PR was merged; I sent the Linuxized version along with the GIC changes
here:

https://lore.kernel.org/lkml/[email protected]

Thanks,
Lorenzo

2024-01-02 13:08:42

by Jose Marinho

Subject: RE: [PATCH RFC v3 20/21] ACPI: Add _OSC bits to advertise OS support for toggling CPU present/enabled

Hi Jonathan,

> -----Original Message-----
> From: Jonathan Cameron <[email protected]>
> Sent: Friday, December 15, 2023 5:12 PM
> To: Russell King (Oracle) <[email protected]>
> Cc: [email protected]; [email protected]; linux-
> [email protected]; [email protected]; linux-
> [email protected]; [email protected]; linux-
> [email protected]; [email protected]; [email protected]; acpica-
> [email protected]; [email protected]; linux-
> [email protected]; [email protected]; linux-
> [email protected]; Salil Mehta <[email protected]>; Jean-Philippe
> Brucker <[email protected]>; Jianyong Wu <[email protected]>;
> Justin He <[email protected]>; James Morse <[email protected]>;
> Jose Marinho <[email protected]>; Samer El-Haj-Mahmoud <Samer.El-
> [email protected]>
> Subject: Re: [PATCH RFC v3 20/21] ACPI: Add _OSC bits to advertise OS
> support for toggling CPU present/enabled
>
> On Wed, 13 Dec 2023 12:50:54 +0000
> Russell King (Oracle) <[email protected]> wrote:
>
> > From: James Morse <[email protected]>
> >
> > Platform firmware can disable a CPU, or make it not-present by making
> > an eject-request notification, then waiting for the os to make it
> > offline
> OS
>
> > and call _EJx. After the firmware updates _STA with the new status.
> >
> > Not all operating systems support this. For arm64 making CPUs
> > not-present has never been supported. For all ACPI architectures,
> > making CPUs disabled has recently been added. Firmware can't know what
> the OS has support for.
> >
> > Add two new _OSC bits to advertise whether the OS supports the _STA
> > enabled or present bits being toggled for CPUs. This will be important
> > for arm64 if systems that support physical CPU hotplug ever appear as
> > arm64 linux doesn't currently support this, so firmware shouldn't try.
> >
> > Advertising this support to firmware is useful for cloud orchestrators
> > to know whether they can scale a particular VM by adding CPUs.
> >
> > Signed-off-by: James Morse <[email protected]>
> > Tested-by: Miguel Luis <[email protected]>
> > Tested-by: Vishnu Pajjuri <[email protected]>
> > Tested-by: Jianyong Wu <[email protected]>
>
> I'm very much in favor of this _OSC but it hasn't been accepted yet I think...
> https://bugzilla.tianocore.org/show_bug.cgi?id=4481
>
> Jose? Github suggests you are the proposer on this.

The addition of these _OSC bits was proposed by us on the forum in question.
The forum opted to pause the definition until additional practical information could be provided on the use-cases.

If anyone is interested in progressing the _OSC bit definition, you are invited to express that interest in the Bugzilla ticket.
Information that you should provide to increase the chances of the ticket being reopened:
- use-case for the new _OSC bits,
- what breaks (if anything) without the proposed _OSC bits.

We did receive additional comments:
- the proposed _OSC bits are not generic: the bits simply convey whether the guest OS understands CPU hot-plug, but they say nothing about the number of CPUs that the OS supports.
- There could be alternate schemes that do not rely on spec changes. E.g. there could be a hypervisor IMPDEF mechanism to describe if an OS image supports CPU hot-plug.

>
> btw v4 looks ok but v5 in the tianocore github seems to have lost the actual
> OSC part.

Agree that, if we do progress with this spec change, v4 is the correct formulation we should adopt.

Regards,
Jose

>
> Jonathan
>
> > ---
> > I'm assuming Loongarch machines do not support physical CPU hotplug.
> >
> > Changes since RFC v3:
> > * Drop ia64 changes
> > * Update James' comment below "---" to remove reference to ia64
> >
> > Outstanding comment:
> > https://lore.kernel.org/r/[email protected]
>
>
>
> > ---
> > arch/x86/Kconfig | 1 +
> > drivers/acpi/Kconfig | 9 +++++++++
> > drivers/acpi/acpi_processor.c | 14 +++++++++++++-
> > drivers/acpi/bus.c | 16 ++++++++++++++++
> > include/linux/acpi.h | 4 ++++
> > 5 files changed, 43 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index
> > 64fc7c475ab0..33fc4dcd950c 100644
> > --- a/arch/x86/Kconfig
> > +++ b/arch/x86/Kconfig
> > @@ -60,6 +60,7 @@ config X86
> > select ACPI_LEGACY_TABLES_LOOKUP if ACPI
> > select ACPI_SYSTEM_POWER_STATES_SUPPORT if ACPI
> > select ACPI_HOTPLUG_PRESENT_CPU if ACPI_PROCESSOR
> && HOTPLUG_CPU
> > + select ACPI_HOTPLUG_IGNORE_OSC if ACPI &&
> HOTPLUG_CPU
> > select ARCH_32BIT_OFF_T if X86_32
> > select ARCH_CLOCKSOURCE_INIT
> > select ARCH_CORRECT_STACKTRACE_ON_KRETPROBE
> > diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig index
> > 9c5a43d0aff4..020e7c0ab985 100644
> > --- a/drivers/acpi/Kconfig
> > +++ b/drivers/acpi/Kconfig
> > @@ -311,6 +311,15 @@ config ACPI_HOTPLUG_PRESENT_CPU
> > depends on ACPI_PROCESSOR && HOTPLUG_CPU
> > select ACPI_CONTAINER
> >
> > +config ACPI_HOTPLUG_IGNORE_OSC
> > + bool
> > + depends on ACPI_HOTPLUG_PRESENT_CPU
> > + help
> > + Ignore whether firmware acknowledged support for toggling the CPU
> > + present bit in _STA. Some architectures predate the _OSC bits, so
> > + firmware doesn't know to do this.
> > +
> > +
> > config ACPI_PROCESSOR_AGGREGATOR
> > tristate "Processor Aggregator"
> > depends on ACPI_PROCESSOR
> > diff --git a/drivers/acpi/acpi_processor.c
> > b/drivers/acpi/acpi_processor.c index ea12e70dfd39..5bb207a7a1dd
> > 100644
> > --- a/drivers/acpi/acpi_processor.c
> > +++ b/drivers/acpi/acpi_processor.c
> > @@ -182,6 +182,18 @@ static void __init acpi_pcc_cpufreq_init(void)
> > static void __init acpi_pcc_cpufreq_init(void) {} #endif /*
> > CONFIG_X86 */
> >
> > +static bool acpi_processor_hotplug_present_supported(void)
> > +{
> > + if (!IS_ENABLED(CONFIG_ACPI_HOTPLUG_PRESENT_CPU))
> > + return false;
> > +
> > + /* x86 systems pre-date the _OSC bit */
> > + if (IS_ENABLED(CONFIG_ACPI_HOTPLUG_IGNORE_OSC))
> > + return true;
> > +
> > + return osc_sb_hotplug_present_support_acked;
> > +}
> > +
> > /* Initialization */
> > static int acpi_processor_make_present(struct acpi_processor *pr) {
> > @@ -189,7 +201,7 @@ static int acpi_processor_make_present(struct
> acpi_processor *pr)
> > acpi_status status;
> > int ret;
> >
> > - if (!IS_ENABLED(CONFIG_ACPI_HOTPLUG_PRESENT_CPU)) {
> > + if (!acpi_processor_hotplug_present_supported()) {
> > pr_err_once("Changing CPU present bit is not supported\n");
> > return -ENODEV;
> > }
> > diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c index
> > 72e64c0718c9..7122450739d6 100644
> > --- a/drivers/acpi/bus.c
> > +++ b/drivers/acpi/bus.c
> > @@ -298,6 +298,13 @@
> > EXPORT_SYMBOL_GPL(osc_sb_native_usb4_support_confirmed);
> >
> > bool osc_sb_cppc2_support_acked;
> >
> > +/*
> > + * ACPI 6.? Proposed Operating System Capabilities for modifying CPU
> > + * present/enable.
> > + */
> > +bool osc_sb_hotplug_enabled_support_acked;
> > +bool osc_sb_hotplug_present_support_acked;
> > +
> > static u8 sb_uuid_str[] = "0811B06E-4A27-44F9-8D60-3CBBC22E7B48";
> > static void acpi_bus_osc_negotiate_platform_control(void)
> > {
> > @@ -346,6 +353,11 @@ static void
> > acpi_bus_osc_negotiate_platform_control(void)
> >
> > if (!ghes_disable)
> > capbuf[OSC_SUPPORT_DWORD] |= OSC_SB_APEI_SUPPORT;
> > +
> > + capbuf[OSC_SUPPORT_DWORD] |=
> OSC_SB_HOTPLUG_ENABLED_SUPPORT;
> > + if (IS_ENABLED(CONFIG_ACPI_HOTPLUG_PRESENT_CPU))
> > + capbuf[OSC_SUPPORT_DWORD] |=
> OSC_SB_HOTPLUG_PRESENT_SUPPORT;
> > +
> > if (ACPI_FAILURE(acpi_get_handle(NULL, "\\_SB", &handle)))
> > return;
> >
> > @@ -383,6 +395,10 @@ static void
> acpi_bus_osc_negotiate_platform_control(void)
> > capbuf_ret[OSC_SUPPORT_DWORD] &
> OSC_SB_NATIVE_USB4_SUPPORT;
> > osc_cpc_flexible_adr_space_confirmed =
> > capbuf_ret[OSC_SUPPORT_DWORD] &
> OSC_SB_CPC_FLEXIBLE_ADR_SPACE;
> > + osc_sb_hotplug_enabled_support_acked =
> > + capbuf_ret[OSC_SUPPORT_DWORD] &
> OSC_SB_HOTPLUG_ENABLED_SUPPORT;
> > + osc_sb_hotplug_present_support_acked =
> > + capbuf_ret[OSC_SUPPORT_DWORD] &
> OSC_SB_HOTPLUG_PRESENT_SUPPORT;
> > }
> >
> > kfree(context.ret.pointer);
> > diff --git a/include/linux/acpi.h b/include/linux/acpi.h index
> > 00be66683505..c572abac803c 100644
> > --- a/include/linux/acpi.h
> > +++ b/include/linux/acpi.h
> > @@ -559,12 +559,16 @@ acpi_status acpi_run_osc(acpi_handle handle,
> struct acpi_osc_context *context);
> > #define OSC_SB_NATIVE_USB4_SUPPORT 0x00040000
> > #define OSC_SB_PRM_SUPPORT 0x00200000
> > #define OSC_SB_FFH_OPR_SUPPORT 0x00400000
> > +#define OSC_SB_HOTPLUG_ENABLED_SUPPORT 0x00800000
> > +#define OSC_SB_HOTPLUG_PRESENT_SUPPORT 0x01000000
> >
> > extern bool osc_sb_apei_support_acked; extern bool
> > osc_pc_lpi_support_confirmed; extern bool
> > osc_sb_native_usb4_support_confirmed;
> > extern bool osc_sb_cppc2_support_acked; extern bool
> > osc_cpc_flexible_adr_space_confirmed;
> > +extern bool osc_sb_hotplug_enabled_support_acked;
> > +extern bool osc_sb_hotplug_present_support_acked;
> >
> > /* USB4 Capabilities */
> > #define OSC_USB_USB3_TUNNELING 0x00000001


2024-01-02 14:46:00

by Jonathan Cameron

Subject: Re: [PATCH RFC v3 01/21] ACPI: Only enumerate enabled (or functional) devices

On Fri, 15 Dec 2023 20:47:31 +0100
"Rafael J. Wysocki" <[email protected]> wrote:

> On Friday, December 15, 2023 5:15:39 PM CET Jonathan Cameron wrote:
> > On Fri, 15 Dec 2023 15:31:55 +0000
> > "Russell King (Oracle)" <[email protected]> wrote:
> >
> > > On Thu, Dec 14, 2023 at 07:37:10PM +0100, Rafael J. Wysocki wrote:
> > > > On Thu, Dec 14, 2023 at 7:16 PM Rafael J. Wysocki <[email protected]> wrote:
> > > > >
> > > > > On Thu, Dec 14, 2023 at 7:10 PM Russell King (Oracle)
> > > > > <[email protected]> wrote:
> > > > > > I guess we need something like:
> > > > > >
> > > > > > if (device->status.present)
> > > > > > return device->device_type != ACPI_BUS_TYPE_PROCESSOR ||
> > > > > > device->status.enabled;
> > > > > > else
> > > > > > return device->status.functional;
> > > > > >
> > > > > > so we only check device->status.enabled for processor-type devices?
> > > > >
> > > > > Yes, something like this.
> > > >
> > > > However, that is not sufficient, because there are
> > > > ACPI_BUS_TYPE_DEVICE devices representing processors.
> > > >
> > > > I'm not sure about a clean way to do it ATM.
> > >
> > > Ok, how about:
> > >
> > > static bool acpi_dev_is_processor(const struct acpi_device *device)
> > > {
> > > struct acpi_hardware_id *hwid;
> > >
> > > if (device->device_type == ACPI_BUS_TYPE_PROCESSOR)
> > > return true;
> > >
> > > if (device->device_type != ACPI_BUS_TYPE_DEVICE)
> > > return false;
> > >
> > > list_for_each_entry(hwid, &device->pnp.ids, list)
> > > if (!strcmp(ACPI_PROCESSOR_OBJECT_HID, hwid->id) ||
> > > !strcmp(ACPI_PROCESSOR_DEVICE_HID, hwid->id))
> > > return true;
> > >
> > > return false;
> > > }
> > >
> > > and then:
> > >
> > > if (device->status.present)
> > > return !acpi_dev_is_processor(device) || device->status.enabled;
> > > else
> > > return device->status.functional;
> > >
> > > ?
> > >
> > Changing it to CPU only for now makes sense to me and I think this code snippet should do the
> > job. Nice and simple.
>
> Well, except that it does checks that are done elsewhere slightly
> differently, which from the maintenance POV is not nice.
>
> Maybe something like the appended patch (untested).

Hi Rafael,

As far as I can see that's functionally equivalent, so looks good to me.
I'm not set up to test this today though, so will defer to Russell on whether
there is anything missing.

Thanks for putting this together.
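
The comparison can also be done by brute force. The sketch below is a user-space model, not kernel code: `ready_snippet()` mirrors Russell's two-branch snippet quoted above, `ready_patch()` mirrors the tail of acpi_dev_ready_for_enumeration() in the appended patch, and `struct sta_model` and `count_disagreements()` are names made up for illustration. It assumes the four inputs are independent, which real firmware may not allow.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified model of the inputs both proposals look at. */
struct sta_model {
	bool present;
	bool functional;
	bool enabled;
	bool is_processor;
};

/* Model of the snippet: processors must be enabled, others only present. */
static bool ready_snippet(struct sta_model s)
{
	if (s.present)
		return !s.is_processor || s.enabled;
	return s.functional;
}

/* Model of the patch: functional first, then present, enabled, processor. */
static bool ready_patch(struct sta_model s)
{
	if (s.functional)
		return true;
	if (!s.present)
		return false;
	if (s.enabled)
		return true;	/* fast path */
	return !s.is_processor;
}

/*
 * Walk all 16 input combinations and count those where the two models
 * differ. If 'first' is non-NULL, the first differing combination is
 * stored there for inspection.
 */
static int count_disagreements(struct sta_model *first)
{
	int n = 0;

	for (int bits = 0; bits < 16; bits++) {
		struct sta_model s = {
			.present = (bits & 1) != 0,
			.functional = (bits & 2) != 0,
			.enabled = (bits & 4) != 0,
			.is_processor = (bits & 8) != 0,
		};

		if (ready_snippet(s) != ready_patch(s)) {
			if (n == 0 && first != NULL)
				*first = s;
			n++;
		}
	}

	return n;
}
```

In this model the two agree on every combination except a processor that is present and functional but not enabled, so that is the one case worth confirming against what firmware actually reports.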

Jonathan

>
> ---
> drivers/acpi/acpi_processor.c | 11 +++++++++++
> drivers/acpi/internal.h | 3 +++
> drivers/acpi/scan.c | 24 +++++++++++++++++++++++-
> 3 files changed, 37 insertions(+), 1 deletion(-)
>
> Index: linux-pm/drivers/acpi/acpi_processor.c
> ===================================================================
> --- linux-pm.orig/drivers/acpi/acpi_processor.c
> +++ linux-pm/drivers/acpi/acpi_processor.c
> @@ -644,6 +644,17 @@ static struct acpi_scan_handler processo
> },
> };
>
> +bool acpi_device_is_processor(const struct acpi_device *adev)
> +{
> + if (adev->device_type == ACPI_BUS_TYPE_PROCESSOR)
> + return true;
> +
> + if (adev->device_type != ACPI_BUS_TYPE_DEVICE)
> + return false;
> +
> + return acpi_scan_check_handler(adev, &processor_handler);
> +}
> +
> static int acpi_processor_container_attach(struct acpi_device *dev,
> const struct acpi_device_id *id)
> {
> Index: linux-pm/drivers/acpi/internal.h
> ===================================================================
> --- linux-pm.orig/drivers/acpi/internal.h
> +++ linux-pm/drivers/acpi/internal.h
> @@ -62,6 +62,8 @@ void acpi_sysfs_add_hotplug_profile(stru
> int acpi_scan_add_handler_with_hotplug(struct acpi_scan_handler *handler,
> const char *hotplug_profile_name);
> void acpi_scan_hotplug_enabled(struct acpi_hotplug_profile *hotplug, bool val);
> +bool acpi_scan_check_handler(const struct acpi_device *adev,
> + struct acpi_scan_handler *handler);
>
> #ifdef CONFIG_DEBUG_FS
> extern struct dentry *acpi_debugfs_dir;
> @@ -133,6 +135,7 @@ int acpi_bus_register_early_device(int t
> const struct acpi_device *acpi_companion_match(const struct device *dev);
> int __acpi_device_uevent_modalias(const struct acpi_device *adev,
> struct kobj_uevent_env *env);
> +bool acpi_device_is_processor(const struct acpi_device *adev);
>
> /* --------------------------------------------------------------------------
> Power Resource
> Index: linux-pm/drivers/acpi/scan.c
> ===================================================================
> --- linux-pm.orig/drivers/acpi/scan.c
> +++ linux-pm/drivers/acpi/scan.c
> @@ -1938,6 +1938,19 @@ static bool acpi_scan_handler_matching(s
> return false;
> }
>
> +bool acpi_scan_check_handler(const struct acpi_device *adev,
> + struct acpi_scan_handler *handler)
> +{
> + struct acpi_hardware_id *hwid;
> +
> + list_for_each_entry(hwid, &adev->pnp.ids, list) {
> + if (acpi_scan_handler_matching(handler, hwid->id, NULL))
> + return true;
> + }
> +
> + return false;
> +}
> +
> static struct acpi_scan_handler *acpi_scan_match_handler(const char *idstr,
> const struct acpi_device_id **matchid)
> {
> @@ -2410,7 +2423,16 @@ bool acpi_dev_ready_for_enumeration(cons
> if (device->flags.honor_deps && device->dep_unmet)
> return false;
>
> - return acpi_device_is_present(device);
> + if (device->status.functional)
> + return true;
> +
> + if (!device->status.present)
> + return false;
> +
> + if (device->status.enabled)
> + return true; /* Fast path. */
> +
> + return !acpi_device_is_processor(device);
> }
> EXPORT_SYMBOL_GPL(acpi_dev_ready_for_enumeration);
>
>
>
>


2024-01-02 14:53:51

by Jonathan Cameron

Subject: Re: [PATCH RFC v3 17/21] ACPI: add support to register CPUs based on the _STA enabled bit

On Mon, 18 Dec 2023 13:03:32 +0000
"Russell King (Oracle)" <[email protected]> wrote:

> On Wed, Dec 13, 2023 at 12:50:38PM +0000, Russell King wrote:
> > From: James Morse <[email protected]>
> >
> > acpi_processor_get_info() registers all present CPUs. Registering a
> > CPU is what creates the sysfs entries and triggers the udev
> > notifications.
> >
> > arm64 virtual machines that support 'virtual cpu hotplug' use the
> > enabled bit to indicate whether the CPU can be brought online, as
> > the existing ACPI tables require all hardware to be described and
> > present.
> >
> > If firmware describes a CPU as present, but disabled, skip the
> > registration. Such CPUs are present, but can't be brought online for
> > whatever reason. (e.g. firmware/hypervisor policy).
> >
> > Once firmware sets the enabled bit, the CPU can be registered and
> > brought online by user-space. Online CPUs, or CPUs that are missing
> > an _STA method must always be registered.
>
> ...
>
> > @@ -526,6 +552,9 @@ static void acpi_processor_post_eject(struct acpi_device *device)
> > acpi_processor_make_not_present(device);
> > return;
> > }
> > +
> > + if (cpu_present(pr->id) && !(sta & ACPI_STA_DEVICE_ENABLED))
> > + arch_unregister_cpu(pr->id);
>
> This change isn't described in the commit log, but seems to be the cause
> of the build error identified by the kernel build bot that is fixed
> later in this series. I'm wondering whether this should be in a
> different patch, maybe "ACPI: Check _STA present bit before making CPUs
> not present" ?

It would seem a bit odd to call arch_unregister_cpu() well before the code
is added to call the matching arch_register_cpu().

Mind you, I think this eject path applies to all CPUs, not just to those that
are registered later. So we run into the spec hole that
there is no way to identify initially 'enabled' CPUs that might be disabled
later.
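
To make that concrete, here is a small user-space model of the condition in the hunk quoted above. The ACPI_STA_DEVICE_ENABLED value matches the ACPICA definition; `should_unregister_cpu()` is a hypothetical helper used only for illustration.

```c
#include <stdbool.h>

/* _STA "enabled" bit, as defined by ACPICA. */
#define ACPI_STA_DEVICE_ENABLED 0x02

/*
 * Model of the post-eject check: the CPU is still present, but firmware
 * has cleared the enabled bit in _STA, so the CPU gets unregistered.
 * Nothing in the condition records whether the CPU was enabled at boot
 * or only enabled later.
 */
static bool should_unregister_cpu(bool cpu_present, unsigned long long sta)
{
	return cpu_present && !(sta & ACPI_STA_DEVICE_ENABLED);
}
```

The helper takes no "was hot-added" input, which is the point: a CPU that was enabled at boot takes the same path as one enabled later.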

>
> Or maybe my brain isn't working properly (due to being Covid positive.)
> Any thoughts, Jonathan?

I'll go with a resounding 'not sure' on where this change belongs.
I blame my non-existent start of the year hangover.
Hope you have recovered!

Jonathan


2024-01-02 15:17:13

by Jonathan Cameron

Subject: Re: [PATCH RFC v3 20/21] ACPI: Add _OSC bits to advertise OS support for toggling CPU present/enabled

On Tue, 2 Jan 2024 13:07:25 +0000
Jose Marinho <[email protected]> wrote:

> Hi Jonathan,
>
> > -----Original Message-----
> > From: Jonathan Cameron <[email protected]>
> > Sent: Friday, December 15, 2023 5:12 PM
> > To: Russell King (Oracle) <[email protected]>
> > Cc: [email protected]; [email protected]; linux-
> > [email protected]; [email protected]; linux-
> > [email protected]; [email protected]; linux-
> > [email protected]; [email protected]; [email protected]; acpica-
> > [email protected]; [email protected]; linux-
> > [email protected]; [email protected]; linux-
> > [email protected]; Salil Mehta <[email protected]>; Jean-Philippe
> > Brucker <[email protected]>; Jianyong Wu <[email protected]>;
> > Justin He <[email protected]>; James Morse <[email protected]>;
> > Jose Marinho <[email protected]>; Samer El-Haj-Mahmoud <Samer.El-
> > [email protected]>
> > Subject: Re: [PATCH RFC v3 20/21] ACPI: Add _OSC bits to advertise OS
> > support for toggling CPU present/enabled
> >
> > On Wed, 13 Dec 2023 12:50:54 +0000
> > Russell King (Oracle) <[email protected]> wrote:
> >
> > > From: James Morse <[email protected]>
> > >
> > > > Platform firmware can disable a CPU, or make it not-present by making
> > > an eject-request notification, then waiting for the os to make it
> > > offline
> > OS
> >
> > > and call _EJx. After the firmware updates _STA with the new status.
> > >
> > > Not all operating systems support this. For arm64 making CPUs
> > > not-present has never been supported. For all ACPI architectures,
> > > making CPUs disabled has recently been added. Firmware can't know what
> > the OS has support for.
> > >
> > > Add two new _OSC bits to advertise whether the OS supports the _STA
> > > enabled or present bits being toggled for CPUs. This will be important
> > > for arm64 if systems that support physical CPU hotplug ever appear as
> > > arm64 linux doesn't currently support this, so firmware shouldn't try.
> > >
> > > Advertising this support to firmware is useful for cloud orchestrators
> > > to know whether they can scale a particular VM by adding CPUs.
> > >
> > > Signed-off-by: James Morse <[email protected]>
> > > Tested-by: Miguel Luis <[email protected]>
> > > Tested-by: Vishnu Pajjuri <[email protected]>
> > > Tested-by: Jianyong Wu <[email protected]>
> >
> > I'm very much in favor of this _OSC but it hasn't been accepted yet I think...
> > https://bugzilla.tianocore.org/show_bug.cgi?id=4481
> >
> > Jose? Github suggests you are the proposer on this.
>
> The addition of these _OSC bits was proposed by us on the forum in question.
> The forum opted to pause the definition until additional practical information could be provided on the use-cases.
>
> If anyone is interested in progressing the _OSC bit definition, you are invited to express that interest in the Bugzilla ticket.

I've poked around a bit and can't find any reference to how to actually get a Bugzilla account
on bugzilla.tianocore.org. Any pointers? I'm sure I had one at one stage, but
trying every plausible email address and the forgotten password link got me nowhere.

> Information that you should provide to increase the chances of the ticket being reopened:
> - use-case for the new _OSC bits,

It is really annoying without it, as a hypervisor can't query whether a guest can do anything useful
if the host does virtual CPU hotplug via this newly added route.
Given this is new functionality and there is non-trivial effort required by the
host to instantiate such a CPU, it would be nice to be able to find out if the
feature is supported by the guest OS without having to basically suck it and see,
with hypervisors having to do a trial hotplug just to see if it 'might' work.
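
As a rough illustration of what the host side could do with the bits, here is a minimal sketch. The two bit values are the ones proposed in the quoted patch and are not part of any released ACPI specification; the helper names are hypothetical, and how a hypervisor actually gets at the guest's _OSC support dword is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values proposed in the quoted patch; not in a released ACPI spec. */
#define OSC_SB_HOTPLUG_ENABLED_SUPPORT 0x00800000u
#define OSC_SB_HOTPLUG_PRESENT_SUPPORT 0x01000000u

/* Did the guest advertise that it copes with the _STA enabled bit changing? */
static bool guest_supports_enabled_toggle(uint32_t support_dword)
{
	return (support_dword & OSC_SB_HOTPLUG_ENABLED_SUPPORT) != 0;
}

/* Did the guest advertise that it copes with the _STA present bit changing? */
static bool guest_supports_present_toggle(uint32_t support_dword)
{
	return (support_dword & OSC_SB_HOTPLUG_PRESENT_SUPPORT) != 0;
}

/*
 * Hypothetical host policy: only pay the cost of instantiating a vCPU
 * when the guest said it can handle the enabled bit being toggled.
 */
static bool host_should_hotplug_vcpu(uint32_t support_dword)
{
	return guest_supports_enabled_toggle(support_dword);
}
```

Without the bits the host has no such input and is back to a trial hotplug.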

> - what breaks (if anything) without the proposed _OSC bits.

Nothing breaks - you can merrily poke in hotplugged CPUs with the attendant creation
of resources in the host and have them disappear into a black hole.
That's ugly but not broken as such. Hopefully a hypervisor will hold off on further
attempts until the first one has either succeeded or failed.

>
> We did receive additional comments:
> - the proposed _OSC bits are not generic: the bits simply convey whether the guest OS understands CPU hot-plug, but they say nothing about the number of CPUs that the OS supports.

If a guest says it supports this feature, you would hope it supports it for the
number of CPUs that have the present bit set but not the enabled bit.
I'd clarify that in the text rather than provide a means of querying the number of CPUs supported.
Number wouldn't be sufficient anyway as it wouldn't indicate 'which' CPUs are supported.
Nothing says they have to be contiguous or lowest IDs etc.
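
A tiny sketch of why a count alone loses information, assuming (purely for illustration) that the set of present-but-not-enabled CPUs fits in a 64-bit mask with bit n standing for CPU n. `mask_weight()` is a hypothetical helper, not a kernel API.

```c
#include <stdint.h>

/* Count the CPUs in a mask where bit n set means CPU n is in the set. */
static unsigned int mask_weight(uint64_t mask)
{
	unsigned int n = 0;

	while (mask) {
		mask &= mask - 1;	/* clear the lowest set bit */
		n++;
	}

	return n;
}
```

For example, the masks 0x0F (CPUs 0-3) and 0x4000000000000222 (CPUs 1, 5, 9 and 62) both have a weight of four, yet describe different CPUs.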

> - There could be alternate schemes that do not rely on spec changes. E.g. there could be a hypervisor IMPDEF mechanism to describe if an OS image supports CPU hot-plug.

Sigh. Yes, that could be done at the cost of every guest having to be made aware of every
hypervisor impdef mechanism. Trying to avoid that mess is why I think an _OSC makes sense
as then everyone can use the same control.

No particular reason we should use ACPI at all for VMs :)

>
> >
> > btw v4 looks ok but v5 in the tianocore github seems to have lost the actual
> > OSC part.
>
> Agree that, if we do progress with this spec change, v4 is the correct formulation we should adopt.
>
Thanks for the update.

Overall this is a question we need to resolve soon. If this code goes into Linux
without the _OSC, we will always need to support the 'suck it and see' approach, as we'll never
know if the guest fell down the hole. Thus, if it is not added soon, we might as well not add it at
all, and we'll all be looking at the code and thinking "that's ugly and shouldn't
have been necessary" for years to come.

+CC Kangkang as he might be able to help get this started again.

Jonathan

> Regards,
> Jose
>
> >
> > Jonathan
> >
> > > ---
> > > I'm assuming Loongarch machines do not support physical CPU hotplug.
> > >
> > > Changes since RFC v3:
> > > * Drop ia64 changes
> > > * Update James' comment below "---" to remove reference to ia64
> > >
> > > Outstanding comment:
> > > https://lore.kernel.org/r/[email protected]
> >
> >
> >
> > > ---
> > > arch/x86/Kconfig | 1 +
> > > drivers/acpi/Kconfig | 9 +++++++++
> > > drivers/acpi/acpi_processor.c | 14 +++++++++++++-
> > > drivers/acpi/bus.c | 16 ++++++++++++++++
> > > include/linux/acpi.h | 4 ++++
> > > 5 files changed, 43 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index
> > > 64fc7c475ab0..33fc4dcd950c 100644
> > > --- a/arch/x86/Kconfig
> > > +++ b/arch/x86/Kconfig
> > > @@ -60,6 +60,7 @@ config X86
> > > select ACPI_LEGACY_TABLES_LOOKUP if ACPI
> > > select ACPI_SYSTEM_POWER_STATES_SUPPORT if ACPI
> > > select ACPI_HOTPLUG_PRESENT_CPU if ACPI_PROCESSOR
> > && HOTPLUG_CPU
> > > + select ACPI_HOTPLUG_IGNORE_OSC if ACPI &&
> > HOTPLUG_CPU
> > > select ARCH_32BIT_OFF_T if X86_32
> > > select ARCH_CLOCKSOURCE_INIT
> > > select ARCH_CORRECT_STACKTRACE_ON_KRETPROBE
> > > diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig index
> > > 9c5a43d0aff4..020e7c0ab985 100644
> > > --- a/drivers/acpi/Kconfig
> > > +++ b/drivers/acpi/Kconfig
> > > @@ -311,6 +311,15 @@ config ACPI_HOTPLUG_PRESENT_CPU
> > > depends on ACPI_PROCESSOR && HOTPLUG_CPU
> > > select ACPI_CONTAINER
> > >
> > > +config ACPI_HOTPLUG_IGNORE_OSC
> > > + bool
> > > + depends on ACPI_HOTPLUG_PRESENT_CPU
> > > + help
> > > + Ignore whether firmware acknowledged support for toggling the CPU
> > > + present bit in _STA. Some architectures predate the _OSC bits, so
> > > + firmware doesn't know to do this.
> > > +
> > > +
> > > config ACPI_PROCESSOR_AGGREGATOR
> > > tristate "Processor Aggregator"
> > > depends on ACPI_PROCESSOR
> > > diff --git a/drivers/acpi/acpi_processor.c
> > > b/drivers/acpi/acpi_processor.c index ea12e70dfd39..5bb207a7a1dd
> > > 100644
> > > --- a/drivers/acpi/acpi_processor.c
> > > +++ b/drivers/acpi/acpi_processor.c
> > > @@ -182,6 +182,18 @@ static void __init acpi_pcc_cpufreq_init(void)
> > > static void __init acpi_pcc_cpufreq_init(void) {} #endif /*
> > > CONFIG_X86 */
> > >
> > > +static bool acpi_processor_hotplug_present_supported(void)
> > > +{
> > > + if (!IS_ENABLED(CONFIG_ACPI_HOTPLUG_PRESENT_CPU))
> > > + return false;
> > > +
> > > + /* x86 systems pre-date the _OSC bit */
> > > + if (IS_ENABLED(CONFIG_ACPI_HOTPLUG_IGNORE_OSC))
> > > + return true;
> > > +
> > > + return osc_sb_hotplug_present_support_acked;
> > > +}
> > > +
> > > /* Initialization */
> > > static int acpi_processor_make_present(struct acpi_processor *pr) {
> > > @@ -189,7 +201,7 @@ static int acpi_processor_make_present(struct
> > acpi_processor *pr)
> > > acpi_status status;
> > > int ret;
> > >
> > > - if (!IS_ENABLED(CONFIG_ACPI_HOTPLUG_PRESENT_CPU)) {
> > > + if (!acpi_processor_hotplug_present_supported()) {
> > > pr_err_once("Changing CPU present bit is not supported\n");
> > > return -ENODEV;
> > > }
> > > diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c index
> > > 72e64c0718c9..7122450739d6 100644
> > > --- a/drivers/acpi/bus.c
> > > +++ b/drivers/acpi/bus.c
> > > @@ -298,6 +298,13 @@
> > > EXPORT_SYMBOL_GPL(osc_sb_native_usb4_support_confirmed);
> > >
> > > bool osc_sb_cppc2_support_acked;
> > >
> > > +/*
> > > + * ACPI 6.? Proposed Operating System Capabilities for modifying CPU
> > > + * present/enable.
> > > + */
> > > +bool osc_sb_hotplug_enabled_support_acked;
> > > +bool osc_sb_hotplug_present_support_acked;
> > > +
> > > static u8 sb_uuid_str[] = "0811B06E-4A27-44F9-8D60-3CBBC22E7B48";
> > > static void acpi_bus_osc_negotiate_platform_control(void)
> > > {
> > > @@ -346,6 +353,11 @@ static void
> > > acpi_bus_osc_negotiate_platform_control(void)
> > >
> > > if (!ghes_disable)
> > > capbuf[OSC_SUPPORT_DWORD] |= OSC_SB_APEI_SUPPORT;
> > > +
> > > + capbuf[OSC_SUPPORT_DWORD] |=
> > OSC_SB_HOTPLUG_ENABLED_SUPPORT;
> > > + if (IS_ENABLED(CONFIG_ACPI_HOTPLUG_PRESENT_CPU))
> > > + capbuf[OSC_SUPPORT_DWORD] |=
> > OSC_SB_HOTPLUG_PRESENT_SUPPORT;
> > > +
> > > if (ACPI_FAILURE(acpi_get_handle(NULL, "\\_SB", &handle)))
> > > return;
> > >
> > > @@ -383,6 +395,10 @@ static void
> > acpi_bus_osc_negotiate_platform_control(void)
> > > capbuf_ret[OSC_SUPPORT_DWORD] &
> > OSC_SB_NATIVE_USB4_SUPPORT;
> > > osc_cpc_flexible_adr_space_confirmed =
> > > capbuf_ret[OSC_SUPPORT_DWORD] &
> > OSC_SB_CPC_FLEXIBLE_ADR_SPACE;
> > > + osc_sb_hotplug_enabled_support_acked =
> > > + capbuf_ret[OSC_SUPPORT_DWORD] &
> > OSC_SB_HOTPLUG_ENABLED_SUPPORT;
> > > + osc_sb_hotplug_present_support_acked =
> > > + capbuf_ret[OSC_SUPPORT_DWORD] &
> > OSC_SB_HOTPLUG_PRESENT_SUPPORT;
> > > }
> > >
> > > kfree(context.ret.pointer);
> > > diff --git a/include/linux/acpi.h b/include/linux/acpi.h index
> > > 00be66683505..c572abac803c 100644
> > > --- a/include/linux/acpi.h
> > > +++ b/include/linux/acpi.h
> > > @@ -559,12 +559,16 @@ acpi_status acpi_run_osc(acpi_handle handle,
> > struct acpi_osc_context *context);
> > > #define OSC_SB_NATIVE_USB4_SUPPORT 0x00040000
> > > #define OSC_SB_PRM_SUPPORT 0x00200000
> > > #define OSC_SB_FFH_OPR_SUPPORT 0x00400000
> > > +#define OSC_SB_HOTPLUG_ENABLED_SUPPORT 0x00800000
> > > +#define OSC_SB_HOTPLUG_PRESENT_SUPPORT 0x01000000
> > >
> > > extern bool osc_sb_apei_support_acked; extern bool
> > > osc_pc_lpi_support_confirmed; extern bool
> > > osc_sb_native_usb4_support_confirmed;
> > > extern bool osc_sb_cppc2_support_acked; extern bool
> > > osc_cpc_flexible_adr_space_confirmed;
> > > +extern bool osc_sb_hotplug_enabled_support_acked;
> > > +extern bool osc_sb_hotplug_present_support_acked;
> > >
> > > /* USB4 Capabilities */
> > > #define OSC_USB_USB3_TUNNELING 0x00000001
>


2024-01-02 15:20:12

by Jonathan Cameron

Subject: Re: [PATCH RFC v3 21/21] cpumask: Add enabled cpumask for present CPUs that can be brought online

On Mon, 18 Dec 2023 12:14:14 +0000
"Russell King (Oracle)" <[email protected]> wrote:

> On Fri, Dec 15, 2023 at 05:18:31PM +0000, Jonathan Cameron wrote:
> > On Wed, 13 Dec 2023 12:50:59 +0000
> > Russell King (Oracle) <[email protected]> wrote:
> >
> > > From: James Morse <[email protected]>
> > >
> > > The 'offline' file in sysfs shows all offline CPUs, including those
> > > that aren't present. User-space is expected to remove not-present CPUs
> > > from this list to learn which CPUs could be brought online.
> > >
> > > CPUs can be present but not-enabled. These CPUs can't be brought online
> > > until the firmware policy changes, which comes with an ACPI notification
> > > that will register the CPUs.
> > >
> > > With only the offline and present files, user-space is unable to
> > > determine which CPUs it can try to bring online. Add a new CPU mask
> > > that shows this based on all the registered CPUs.
> > >
> > > Signed-off-by: James Morse <[email protected]>
> > > Tested-by: Miguel Luis <[email protected]>
> > > Tested-by: Vishnu Pajjuri <[email protected]>
> > > Tested-by: Jianyong Wu <[email protected]>
> > > ---
> >
> > Needs docs
> > Documentation/ABI/testing/sysfs-devices-system-cpu
> > seems to have the rest of the similar entries.
>
> Any ideas what I should put in there as "Date"? It seems to me that we have
> little idea when this might be merged. I could use the date of the
> commit (Nov 2022).
>

That's always a guess at best. Hopefully whoever picks this up
fixes the date up or asks for a new version with it fixed just before
they do.

J

2024-01-02 15:36:29

by Jose Marinho

Subject: RE: [PATCH RFC v3 20/21] ACPI: Add _OSC bits to advertise OS support for toggling CPU present/enabled



> -----Original Message-----
> From: Jonathan Cameron <[email protected]>
> Sent: Tuesday, January 2, 2024 3:17 PM
> To: Jose Marinho <[email protected]>
> Cc: Russell King (Oracle) <[email protected]>; linux-
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; linux-arm-
> [email protected]; [email protected];
> [email protected]; [email protected]; acpica-
> [email protected]; [email protected]; linux-
> [email protected]; [email protected]; [email protected];
> Salil Mehta <[email protected]>; Jean-Philippe Brucker <jean-
> [email protected]>; Jianyong Wu <[email protected]>; Justin He
> <[email protected]>; James Morse <[email protected]>; Samer El-Haj-
> Mahmoud <[email protected]>; nd <[email protected]>; Kangkang
> Shen <[email protected]>
> Subject: Re: [PATCH RFC v3 20/21] ACPI: Add _OSC bits to advertise OS support
> for toggling CPU present/enabled
>
> On Tue, 2 Jan 2024 13:07:25 +0000
> Jose Marinho <[email protected]> wrote:
>
> > Hi Jonathan,
> >
> > > -----Original Message-----
> > > From: Jonathan Cameron <[email protected]>
> > > Sent: Friday, December 15, 2023 5:12 PM
> > > To: Russell King (Oracle) <[email protected]>
> > > Cc: [email protected]; [email protected]; linux-
> > > [email protected]; [email protected]; linux-
> > > [email protected]; [email protected]; linux-
> > > [email protected]; [email protected]; [email protected];
> > > acpica- [email protected]; [email protected];
> > > linux- [email protected]; [email protected]; linux-
> > > [email protected]; Salil Mehta <[email protected]>;
> > > Jean-Philippe Brucker <[email protected]>; Jianyong Wu
> > > <[email protected]>; Justin He <[email protected]>; James Morse
> > > <[email protected]>; Jose Marinho <[email protected]>; Samer
> > > El-Haj-Mahmoud <Samer.El- [email protected]>
> > > Subject: Re: [PATCH RFC v3 20/21] ACPI: Add _OSC bits to advertise
> > > OS support for toggling CPU present/enabled
> > >
> > > On Wed, 13 Dec 2023 12:50:54 +0000
> > > Russell King (Oracle) <[email protected]> wrote:
> > >
> > > > From: James Morse <[email protected]>
> > > >
> > > > > Platform firmware can disable a CPU, or make it not-present by
> > > > making an eject-request notification, then waiting for the os to
> > > > make it offline
> > > OS
> > >
> > > > and call _EJx. After the firmware updates _STA with the new status.
> > > >
> > > > Not all operating systems support this. For arm64 making CPUs
> > > > not-present has never been supported. For all ACPI architectures,
> > > > making CPUs disabled has recently been added. Firmware can't know
> > > > what
> > > the OS has support for.
> > > >
> > > > Add two new _OSC bits to advertise whether the OS supports the
> > > > _STA enabled or present bits being toggled for CPUs. This will be
> > > > important for arm64 if systems that support physical CPU hotplug
> > > > ever appear as
> > > > arm64 linux doesn't currently support this, so firmware shouldn't try.
> > > >
> > > > Advertising this support to firmware is useful for cloud
> > > > orchestrators to know whether they can scale a particular VM by adding
> CPUs.
> > > >
> > > > Signed-off-by: James Morse <[email protected]>
> > > > Tested-by: Miguel Luis <[email protected]>
> > > > Tested-by: Vishnu Pajjuri <[email protected]>
> > > > Tested-by: Jianyong Wu <[email protected]>
> > >
> > > I'm very much in favor of this _OSC but it hasn't been accepted yet I think...
> > > https://bugzilla.tianocore.org/show_bug.cgi?id=4481
> > >
> > > Jose? Github suggests you are the proposer on this.
> >
> > The addition of these _OSC bits was proposed by us on the forum in question.
> > The forum opted to pause the definition until additional practical information
> could be provided on the use-cases.
> >
> > If anyone is interested in progressing the _OSC bit definition, you are invited to
> express that interest in the Bugzilla ticket.
>
> I've poked around a bit and can't find any reference to how to actually get a
> bugzilla account bugzilla.tianocore.org. Any pointers? I'm sure I had one at one
> stage, but trying every plausible email address and the forgotten password link
> got me nowhere.
>

The procedure to get a new account is described here: https://github.com/tianocore/tianocore.github.io/wiki/Reporting-Issues
The immediate next steps are:
- Join https://edk2.groups.io/g/devel, and subscribe to the edk2 | devel group.
- Send an email with a detailed reason to the Bugzilla admin ([email protected]); this email address will be used to create the Bugzilla account.

> > Information that you should provide to increase the chances of the ticket being
> reopened:
> > - use-case for the new _OSC bits,
>
> Really annoying without it as a hypervisor can't query if a guest can do anything
> useful if the host does virtual CPU hotplug via this newly added route.
> Given this is new functionality and there is non trivial effort required by the host
> to instantiate such a CPU it would be nice to be able to find out if the feature is
> supported by the Guest OS without having to basically suck it and see with
> hypervisors having to do a trial hotplug just to see if it 'might' work.
>
> > - what breaks (if anything) without the proposed _OSC bits.
>
> Nothing breaks - you can merrily poke in hotplugged CPUs with the attendant
> creation of resources in the host and have them disappear into a black hole.
> That's ugly but not broken as such. Hopefully a hypervisor will not keep trying
> until the first attempt either succeeds or fails.
>
> >
> > We did receive additional comments:
> > - the proposed _OSC bits are not generic: the bits simply convey whether the
> guest OS understands CPU hot-plug, but it says nothing about the number of CPUs
> that the OS supports.
>
> If a guest says it supports this feature, you would hope it supports it for the
> number of CPUs that have the present bit set but the enabled not.
> I'd clarify that in the text rather than provide a means of querying the number of
> CPUs supported.
> Number wouldn't be sufficient anyway as it wouldn't indicate 'which' CPUs are
> supported.
> Nothing says they have to be contiguous or lowest IDs etc.
>
> > - There could be alternate schemes that do not rely on spec changes. E.g. there
> could be a hypervisor IMPDEF mechanism to describe if an OS image supports
> CPU hot-plug.
>
> Sigh. Yes, that could be done at the cost of every guest having to be made aware
> of every hypervisor impdef mechanism. Trying to avoid that mess is why I think
> an _OSC makes sense as then everyone can use the same control.
>
> No particular reason we should use ACPI at all for VMs :)
>
> >
> > >
> > > btw v4 looks ok but v5 in the tianocore github seems to have lost
> > > the actual OSC part.
> >
> > Agree that, if we do progress with this spec change, v4 is the correct formulation
> we should adopt.
> >
> Thanks for the update.
>
> Overall this is a question we need to resolve soon. If this code otherwise goes in
> linux without the OSC we will always need to support the 'suck it and see'
> approach as we'll never know if the guest fell down the hole. Thus if not added
> soon we might as well not add it at all and we'll all be looking at the code and
> thinking "that's ugly and shouldn't have been necessary" for years to come.
>
> +CC Kangkang as he might be able to help get this started again.

We're keen to support the progress of this ECR.

Regards,
Jose

>
> Jonathan
>
> > Regards,
> > Jose
> >
> > >
> > > Jonathan
> > >
> > > > ---
> > > > I'm assuming Loongarch machines do not support physical CPU hotplug.
> > > >
> > > > Changes since RFC v3:
> > > > * Drop ia64 changes
> > > > * Update James' comment below "---" to remove reference to ia64
> > > >
> > > > Outstanding comment:
> > > > https://lore.kernel.org/r/[email protected]
> > >
> > >
> > >
> > > > ---
> > > > arch/x86/Kconfig | 1 +
> > > > drivers/acpi/Kconfig | 9 +++++++++
> > > > drivers/acpi/acpi_processor.c | 14 +++++++++++++-
> > > > drivers/acpi/bus.c | 16 ++++++++++++++++
> > > > include/linux/acpi.h | 4 ++++
> > > > 5 files changed, 43 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index
> > > > 64fc7c475ab0..33fc4dcd950c 100644
> > > > --- a/arch/x86/Kconfig
> > > > +++ b/arch/x86/Kconfig
> > > > @@ -60,6 +60,7 @@ config X86
> > > > select ACPI_LEGACY_TABLES_LOOKUP if ACPI
> > > > select ACPI_SYSTEM_POWER_STATES_SUPPORT if ACPI
> > > > select ACPI_HOTPLUG_PRESENT_CPU if ACPI_PROCESSOR
> > > && HOTPLUG_CPU
> > > > + select ACPI_HOTPLUG_IGNORE_OSC if ACPI &&
> > > HOTPLUG_CPU
> > > > select ARCH_32BIT_OFF_T if X86_32
> > > > select ARCH_CLOCKSOURCE_INIT
> > > > select ARCH_CORRECT_STACKTRACE_ON_KRETPROBE
> > > > diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig index
> > > > 9c5a43d0aff4..020e7c0ab985 100644
> > > > --- a/drivers/acpi/Kconfig
> > > > +++ b/drivers/acpi/Kconfig
> > > > @@ -311,6 +311,15 @@ config ACPI_HOTPLUG_PRESENT_CPU
> > > > depends on ACPI_PROCESSOR && HOTPLUG_CPU
> > > > select ACPI_CONTAINER
> > > >
> > > > +config ACPI_HOTPLUG_IGNORE_OSC
> > > > + bool
> > > > + depends on ACPI_HOTPLUG_PRESENT_CPU
> > > > + help
> > > > + Ignore whether firmware acknowledged support for toggling the CPU
> > > > + present bit in _STA. Some architectures predate the _OSC bits, so
> > > > + firmware doesn't know to do this.
> > > > +
> > > > +
> > > > config ACPI_PROCESSOR_AGGREGATOR
> > > > tristate "Processor Aggregator"
> > > > depends on ACPI_PROCESSOR
> > > > diff --git a/drivers/acpi/acpi_processor.c
> > > > b/drivers/acpi/acpi_processor.c index ea12e70dfd39..5bb207a7a1dd
> > > > 100644
> > > > --- a/drivers/acpi/acpi_processor.c
> > > > +++ b/drivers/acpi/acpi_processor.c
> > > > @@ -182,6 +182,18 @@ static void __init
> > > > acpi_pcc_cpufreq_init(void) static void __init
> > > > acpi_pcc_cpufreq_init(void) {} #endif /*
> > > > CONFIG_X86 */
> > > >
> > > > +static bool acpi_processor_hotplug_present_supported(void)
> > > > +{
> > > > + if (!IS_ENABLED(CONFIG_ACPI_HOTPLUG_PRESENT_CPU))
> > > > + return false;
> > > > +
> > > > + /* x86 systems pre-date the _OSC bit */
> > > > + if (IS_ENABLED(CONFIG_ACPI_HOTPLUG_IGNORE_OSC))
> > > > + return true;
> > > > +
> > > > + return osc_sb_hotplug_present_support_acked;
> > > > +}
> > > > +
> > > > /* Initialization */
> > > > static int acpi_processor_make_present(struct acpi_processor *pr)
> > > > { @@ -189,7 +201,7 @@ static int
> > > > acpi_processor_make_present(struct
> > > acpi_processor *pr)
> > > > acpi_status status;
> > > > int ret;
> > > >
> > > > - if (!IS_ENABLED(CONFIG_ACPI_HOTPLUG_PRESENT_CPU)) {
> > > > + if (!acpi_processor_hotplug_present_supported()) {
> > > > pr_err_once("Changing CPU present bit is not supported\n");
> > > > return -ENODEV;
> > > > }
> > > > diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c index
> > > > 72e64c0718c9..7122450739d6 100644
> > > > --- a/drivers/acpi/bus.c
> > > > +++ b/drivers/acpi/bus.c
> > > > @@ -298,6 +298,13 @@
> > > > EXPORT_SYMBOL_GPL(osc_sb_native_usb4_support_confirmed);
> > > >
> > > > bool osc_sb_cppc2_support_acked;
> > > >
> > > > +/*
> > > > + * ACPI 6.? Proposed Operating System Capabilities for modifying
> > > > +CPU
> > > > + * present/enable.
> > > > + */
> > > > +bool osc_sb_hotplug_enabled_support_acked;
> > > > +bool osc_sb_hotplug_present_support_acked;
> > > > +
> > > > static u8 sb_uuid_str[] = "0811B06E-4A27-44F9-8D60-3CBBC22E7B48";
> > > > static void acpi_bus_osc_negotiate_platform_control(void)
> > > > {
> > > > @@ -346,6 +353,11 @@ static void
> > > > acpi_bus_osc_negotiate_platform_control(void)
> > > >
> > > > if (!ghes_disable)
> > > > capbuf[OSC_SUPPORT_DWORD] |= OSC_SB_APEI_SUPPORT;
> > > > +
> > > > + capbuf[OSC_SUPPORT_DWORD] |=
> > > OSC_SB_HOTPLUG_ENABLED_SUPPORT;
> > > > + if (IS_ENABLED(CONFIG_ACPI_HOTPLUG_PRESENT_CPU))
> > > > + capbuf[OSC_SUPPORT_DWORD] |=
> > > OSC_SB_HOTPLUG_PRESENT_SUPPORT;
> > > > +
> > > > if (ACPI_FAILURE(acpi_get_handle(NULL, "\\_SB", &handle)))
> > > > return;
> > > >
> > > > @@ -383,6 +395,10 @@ static void
> > > acpi_bus_osc_negotiate_platform_control(void)
> > > > capbuf_ret[OSC_SUPPORT_DWORD] &
> > > OSC_SB_NATIVE_USB4_SUPPORT;
> > > > osc_cpc_flexible_adr_space_confirmed =
> > > > capbuf_ret[OSC_SUPPORT_DWORD] &
> > > OSC_SB_CPC_FLEXIBLE_ADR_SPACE;
> > > > + osc_sb_hotplug_enabled_support_acked =
> > > > + capbuf_ret[OSC_SUPPORT_DWORD] &
> > > OSC_SB_HOTPLUG_ENABLED_SUPPORT;
> > > > + osc_sb_hotplug_present_support_acked =
> > > > + capbuf_ret[OSC_SUPPORT_DWORD] &
> > > OSC_SB_HOTPLUG_PRESENT_SUPPORT;
> > > > }
> > > >
> > > > kfree(context.ret.pointer);
> > > > diff --git a/include/linux/acpi.h b/include/linux/acpi.h index
> > > > 00be66683505..c572abac803c 100644
> > > > --- a/include/linux/acpi.h
> > > > +++ b/include/linux/acpi.h
> > > > @@ -559,12 +559,16 @@ acpi_status acpi_run_osc(acpi_handle handle,
> > > struct acpi_osc_context *context);
> > > > #define OSC_SB_NATIVE_USB4_SUPPORT 0x00040000
> > > > #define OSC_SB_PRM_SUPPORT 0x00200000
> > > > #define OSC_SB_FFH_OPR_SUPPORT 0x00400000
> > > > +#define OSC_SB_HOTPLUG_ENABLED_SUPPORT 0x00800000
> > > > +#define OSC_SB_HOTPLUG_PRESENT_SUPPORT 0x01000000
> > > >
> > > > extern bool osc_sb_apei_support_acked; extern bool
> > > > osc_pc_lpi_support_confirmed; extern bool
> > > > osc_sb_native_usb4_support_confirmed;
> > > > extern bool osc_sb_cppc2_support_acked; extern bool
> > > > osc_cpc_flexible_adr_space_confirmed;
> > > > +extern bool osc_sb_hotplug_enabled_support_acked;
> > > > +extern bool osc_sb_hotplug_present_support_acked;
> > > >
> > > > /* USB4 Capabilities */
> > > > #define OSC_USB_USB3_TUNNELING 0x00000001
> >
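
To make the "which CPUs" point from the discussion above concrete, here is a hypervisor-side sketch. It assumes the proposed OSC_SB_HOTPLUG_ENABLED_SUPPORT value from the quoted patch and the standard _STA bit layout (bit 0 present, bit 1 enabled); struct vcpu, guest_handles_enabled_toggle() and hotadd_candidates() are hypothetical names invented for illustration, not part of any hypervisor or of the ECR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Proposed bit, value taken from the quoted patch. */
#define OSC_SB_HOTPLUG_ENABLED_SUPPORT 0x00800000u

/* _STA bits as laid out in the ACPI specification. */
#define STA_PRESENT 0x01u
#define STA_ENABLED 0x02u

/* Hypothetical hypervisor-side record for one virtual CPU. */
struct vcpu {
	uint32_t uid;	/* ACPI _UID of the processor device */
	uint32_t sta;	/* value the guest would read from _STA */
};

static bool guest_handles_enabled_toggle(uint32_t guest_support_dword)
{
	return (guest_support_dword & OSC_SB_HOTPLUG_ENABLED_SUPPORT) != 0;
}

/*
 * Collect the _UIDs of vCPUs that are present but not enabled, i.e. the
 * ones a host could try to hot-add. Returns 0 without scanning if the
 * guest never advertised support, avoiding a trial hotplug.
 */
static size_t hotadd_candidates(uint32_t guest_support_dword,
				const struct vcpu *cpus, size_t n,
				uint32_t *out, size_t out_len)
{
	size_t found = 0;

	assert(cpus != NULL && out != NULL);
	if (!guest_handles_enabled_toggle(guest_support_dword))
		return 0;

	for (size_t i = 0; i < n && found < out_len; i++) {
		bool present = (cpus[i].sta & STA_PRESENT) != 0;
		bool enabled = (cpus[i].sta & STA_ENABLED) != 0;

		if (present && !enabled)
			out[found++] = cpus[i].uid;
	}
	return found;
}
```

The candidates are identified by _UID rather than by a count, so nothing in this model requires them to be contiguous or to have the lowest IDs.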


2024-01-09 15:53:15

by Russell King (Oracle)

Subject: Re: [PATCH RFC v3 02/21] ACPI: processor: Add support for processors described as container packages

On Mon, Dec 18, 2023 at 09:17:34PM +0100, Rafael J. Wysocki wrote:
> On Wed, Dec 13, 2023 at 1:49 PM Russell King <[email protected]> wrote:
> >
> > From: James Morse <[email protected]>
> >
> > ACPI has two ways of describing processors in the DSDT. From ACPI v6.5,
> > 5.2.12:
> >
> > "Starting with ACPI Specification 6.3, the use of the Processor() object
> > was deprecated. Only legacy systems should continue with this usage. On
> > the Itanium architecture only, a _UID is provided for the Processor()
> > that is a string object. This usage of _UID is also deprecated since it
> > can preclude an OSPM from being able to match a processor to a
> > non-enumerable device, such as those defined in the MADT. From ACPI
> > Specification 6.3 onward, all processor objects for all architectures
> > except Itanium must now use Device() objects with an _HID of ACPI0007,
> > and use only integer _UID values."
> >
> > Also see https://uefi.org/specs/ACPI/6.5/08_Processor_Configuration_and_Control.html#declaring-processors
> >
> > Duplicate descriptions are not allowed, the ACPI processor driver already
> > parses the UID from both devices and containers. acpi_processor_get_info()
> > returns an error if the UID exists twice in the DSDT.
>
> I'm not really sure how the above is related to the actual patch.
>
> > The missing probe for CPUs described as packages
>
> It is unclear what exactly is meant by "CPUs described as packages".
>
> From the patch, it looks like those would be Processor() objects
> defined under a processor container device.
>
> > creates a problem for
> > moving the cpu_register() calls into the acpi_processor driver, as CPUs
> > described like this don't get registered, leading to errors from other
> > subsystems when they try to add new sysfs entries to the CPU node.
> > (e.g. topology_sysfs_init()'s use of topology_add_dev() via cpuhp)
> >
> > To fix this, parse the processor container and call acpi_processor_add()
> > for each processor that is discovered like this.
>
> Discovered like what?
>
> > The processor container
> > handler is added with acpi_scan_add_handler(), so no detach call will
> > arrive.
>
> The above requires clarification too.

The above comments... yea. As I didn't write the commit description, but
James did, and James has basically vanished, I don't think these can be
answered, short of rewriting the entire commit message, with me spending
a lot of time with the ACPI specification trying to get the terminology
right - because a lot of the above on the face of it seems to be things
to do with wrong terminology being used.

I wasn't expecting this level of issues with this patch set, and I now
feel completely out of my depth with this series. I'm wondering whether
I should even continue with it, since I don't have the ACPI knowledge
to address a lot of these comments.

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

2024-01-09 16:05:41

by Rafael J. Wysocki

Subject: Re: [PATCH RFC v3 02/21] ACPI: processor: Add support for processors described as container packages

On Tue, Jan 9, 2024 at 4:49 PM Russell King (Oracle)
<[email protected]> wrote:
>
> On Mon, Dec 18, 2023 at 09:17:34PM +0100, Rafael J. Wysocki wrote:
> > On Wed, Dec 13, 2023 at 1:49 PM Russell King <[email protected]> wrote:
> > >
> > > From: James Morse <[email protected]>
> > >
> > > ACPI has two ways of describing processors in the DSDT. From ACPI v6.5,
> > > 5.2.12:
> > >
> > > "Starting with ACPI Specification 6.3, the use of the Processor() object
> > > was deprecated. Only legacy systems should continue with this usage. On
> > > the Itanium architecture only, a _UID is provided for the Processor()
> > > that is a string object. This usage of _UID is also deprecated since it
> > > can preclude an OSPM from being able to match a processor to a
> > > non-enumerable device, such as those defined in the MADT. From ACPI
> > > Specification 6.3 onward, all processor objects for all architectures
> > > except Itanium must now use Device() objects with an _HID of ACPI0007,
> > > and use only integer _UID values."
> > >
> > > Also see https://uefi.org/specs/ACPI/6.5/08_Processor_Configuration_and_Control.html#declaring-processors
> > >
> > > Duplicate descriptions are not allowed, the ACPI processor driver already
> > > parses the UID from both devices and containers. acpi_processor_get_info()
> > > returns an error if the UID exists twice in the DSDT.
> >
> > I'm not really sure how the above is related to the actual patch.
> >
> > > The missing probe for CPUs described as packages
> >
> > It is unclear what exactly is meant by "CPUs described as packages".
> >
> > From the patch, it looks like those would be Processor() objects
> > defined under a processor container device.
> >
> > > creates a problem for
> > > moving the cpu_register() calls into the acpi_processor driver, as CPUs
> > > described like this don't get registered, leading to errors from other
> > > subsystems when they try to add new sysfs entries to the CPU node.
> > > (e.g. topology_sysfs_init()'s use of topology_add_dev() via cpuhp)
> > >
> > > To fix this, parse the processor container and call acpi_processor_add()
> > > for each processor that is discovered like this.
> >
> > Discovered like what?
> >
> > > The processor container
> > > handler is added with acpi_scan_add_handler(), so no detach call will
> > > arrive.
> >
> > The above requires clarification too.
>
> The above comments... yea. As I didn't write the commit description, but
> James did, and James has basically vanished, I don't think these can be
> answered, short of rewriting the entire commit message, with me spending
> a lot of time with the ACPI specification trying to get the terminology
> right - because a lot of the above on the face of it seems to be things
> to do with wrong terminology being used.
>
> I wasn't expecting this level of issues with this patch set, and I now
> feel completely out of my depth with this series. I'm wondering whether
> I should even continue with it, since I don't have the ACPI knowledge
> to address a lot of these comments.

Well, sorry about this.

I met James at the LPC last year, so he seems to be still around, in
some way at least..

2024-01-09 16:13:43

by Russell King (Oracle)

Subject: Re: [PATCH RFC v3 02/21] ACPI: processor: Add support for processors described as container packages

On Tue, Jan 09, 2024 at 05:05:15PM +0100, Rafael J. Wysocki wrote:
> On Tue, Jan 9, 2024 at 4:49 PM Russell King (Oracle)
> <[email protected]> wrote:
> >
> > On Mon, Dec 18, 2023 at 09:17:34PM +0100, Rafael J. Wysocki wrote:
> > > On Wed, Dec 13, 2023 at 1:49 PM Russell King <[email protected]> wrote:
> > > >
> > > > From: James Morse <[email protected]>
> > > >
> > > > ACPI has two ways of describing processors in the DSDT. From ACPI v6.5,
> > > > 5.2.12:
> > > >
> > > > "Starting with ACPI Specification 6.3, the use of the Processor() object
> > > > was deprecated. Only legacy systems should continue with this usage. On
> > > > the Itanium architecture only, a _UID is provided for the Processor()
> > > > that is a string object. This usage of _UID is also deprecated since it
> > > > can preclude an OSPM from being able to match a processor to a
> > > > non-enumerable device, such as those defined in the MADT. From ACPI
> > > > Specification 6.3 onward, all processor objects for all architectures
> > > > except Itanium must now use Device() objects with an _HID of ACPI0007,
> > > > and use only integer _UID values."
> > > >
> > > > Also see https://uefi.org/specs/ACPI/6.5/08_Processor_Configuration_and_Control.html#declaring-processors
> > > >
> > > > Duplicate descriptions are not allowed, the ACPI processor driver already
> > > > parses the UID from both devices and containers. acpi_processor_get_info()
> > > > returns an error if the UID exists twice in the DSDT.
> > >
> > > I'm not really sure how the above is related to the actual patch.
> > >
> > > > The missing probe for CPUs described as packages
> > >
> > > It is unclear what exactly is meant by "CPUs described as packages".
> > >
> > > From the patch, it looks like those would be Processor() objects
> > > defined under a processor container device.
> > >
> > > > creates a problem for
> > > > moving the cpu_register() calls into the acpi_processor driver, as CPUs
> > > > described like this don't get registered, leading to errors from other
> > > > subsystems when they try to add new sysfs entries to the CPU node.
> > > > (e.g. topology_sysfs_init()'s use of topology_add_dev() via cpuhp)
> > > >
> > > > To fix this, parse the processor container and call acpi_processor_add()
> > > > for each processor that is discovered like this.
> > >
> > > Discovered like what?
> > >
> > > > The processor container
> > > > handler is added with acpi_scan_add_handler(), so no detach call will
> > > > arrive.
> > >
> > > The above requires clarification too.
> >
> > The above comments... yea. As I didn't write the commit description, but
> > James did, and James has basically vanished, I don't think these can be
> > answered, short of rewriting the entire commit message, with me spending
> > a lot of time with the ACPI specification trying to get the terminology
> > right - because a lot of the above on the face of it seems to be things
> > to do with wrong terminology being used.
> >
> > I wasn't expecting this level of issues with this patch set, and I now
> > feel completely out of my depth with this series. I'm wondering whether
> > I should even continue with it, since I don't have the ACPI knowledge
> > to address a lot of these comments.
>
> Well, sorry about this.
>
> I met James at the LPC last year, so he seems to be still around, in
> some way at least..

On the previous posting, I wanted James to comment on some of the
feedback from Jonathan, and despite explicitly asking, there has been
nothing but radio silence ever since James' last post of this series.

So, I now deem this work to be completely dead in the water, and not
going to happen - not unless others can input on your comments.

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

2024-01-09 19:27:55

by Russell King (Oracle)

Subject: Re: [PATCH RFC v3 14/21] irqchip/gic-v3: Don't return errors from gic_acpi_match_gicc()

On Fri, Dec 15, 2023 at 04:33:01PM +0000, Jonathan Cameron wrote:
> On Wed, 13 Dec 2023 12:50:23 +0000
> Russell King (Oracle) <[email protected]> wrote:
>
> > From: James Morse <[email protected]>
> >
> > gic_acpi_match_gicc() is only called via gic_acpi_count_gicr_regions().
> > It should only count the number of enabled redistributors, but it
> > also tries to sanity check the GICC entry, currently returning an
> > error if the Enabled bit is set, but the gicr_base_address is zero.
> >
> > Adding support for the online-capable bit to the sanity check
> > complicates it, for no benefit. The existing check implicitly
> > depends on gic_acpi_count_gicr_regions() previously failing to find
> > any GICR regions (as it is valid to have gicr_base_address of zero if
> > the redistributors are described via a GICR entry).
> >
> > Instead of complicating the check, remove it. Failures that happen
> > at this point cause the irqchip not to register, meaning no irqs
> > can be requested. The kernel grinds to a panic() pretty quickly.
> >
> > Without the check, MADT tables that exhibit this problem are still
> > caught by gic_populate_rdist(), which helpfully also prints what
> > went wrong:
> > | CPU4: mpidr 100 has no re-distributor!
> >
> > Signed-off-by: James Morse <[email protected]>
> > Reviewed-by: Gavin Shan <[email protected]>
> > Tested-by: Miguel Luis <[email protected]>
> > Tested-by: Vishnu Pajjuri <[email protected]>
> > Tested-by: Jianyong Wu <[email protected]>
> > Signed-off-by: Russell King (Oracle) <[email protected]>
> > ---
> > drivers/irqchip/irq-gic-v3.c | 18 ++++++------------
> > 1 file changed, 6 insertions(+), 12 deletions(-)
> >
> > diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
> > index 98b0329b7154..ebecd4546830 100644
> > --- a/drivers/irqchip/irq-gic-v3.c
> > +++ b/drivers/irqchip/irq-gic-v3.c
> > @@ -2420,21 +2420,15 @@ static int __init gic_acpi_match_gicc(union acpi_subtable_headers *header,
> >
> > /*
> > * If GICC is enabled and has valid gicr base address, then it means
> > - * GICR base is presented via GICC
> > + * GICR base is presented via GICC. The redistributor is only known to
> > + * be accessible if the GICC is marked as enabled. If this bit is not
> > + * set, we'd need to add the redistributor at runtime, which isn't
> > + * supported.
> > */
> > - if (acpi_gicc_is_usable(gicc) && gicc->gicr_base_address) {
> > + if (gicc->flags & ACPI_MADT_ENABLED && gicc->gicr_base_address)
>
> I was very vague in previous review. I think the reason you are switching
> from acpi_gicc_is_usable(gicc) to gicc->flags & ACPI_MADT_ENABLED
> needs calling out, as I'm fairly sure that at this point in the series at least
> acpi_gicc_is_usable() is the same as current upstream:
>
> static inline bool acpi_gicc_is_usable(struct acpi_madt_generic_interrupt *gicc)
> {
> return gicc->flags & ACPI_MADT_ENABLED;
> }

In a previous patch adding acpi_gicc_is_usable() c54e52f84d7a ("arm64,
irqchip/gic-v3, ACPI: Move MADT GICC enabled check into a helper") this
was:

- if ((gicc->flags & ACPI_MADT_ENABLED) && gicc->gicr_base_address) {
+ if (acpi_gicc_is_usable(gicc) && gicc->gicr_base_address) {

so effectively this is undoing that particular change. That raises the
question of why the change was made in the first place if it's just
going to be reverted in a later patch. It is being reverted because,
in a following patch, acpi_gicc_is_usable() has an additional condition
added to it that isn't applicable here: it effectively makes
acpi_gicc_is_usable() return true if either ACPI_MADT_ENABLED _or_
ACPI_MADT_GICC_ONLINE_CAPABLE (as it is now known) is set.

However, if ACPI_MADT_GICC_ONLINE_CAPABLE is set, does that actually
mean that the GICC is usable? I'm not sure it does. ACPI v6.5 says that
this bit indicates that the system supports enabling this processor
later. Is the GICC of a currently disabled processor "usable"?

Clearly, the intention of this change is not to count this GICC entry
if it is marked ACPI_MADT_GICC_ONLINE_CAPABLE, but I feel that isn't
described in the commit message.
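
A minimal userspace sketch of the three predicates being compared,
assuming a cut-down stand-in for struct acpi_madt_generic_interrupt.
The struct, the function names and the example base address are
hypothetical; only the two flag names come from the discussion, and the
bit values are written out here purely for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit values; the kernel's own definitions are in actbl2.h. */
#define ACPI_MADT_ENABLED              (1u << 0)
#define ACPI_MADT_GICC_ONLINE_CAPABLE  (1u << 3)

/* Hypothetical cut-down stand-in for struct acpi_madt_generic_interrupt. */
struct gicc_entry {
	uint32_t flags;
	uint64_t gicr_base_address;
};

/* The helper as quoted above: usable means the enabled bit is set. */
bool gicc_is_usable_now(const struct gicc_entry *gicc)
{
	return gicc->flags & ACPI_MADT_ENABLED;
}

/* The helper as described for the following patch: enabled OR online-capable. */
bool gicc_is_usable_later(const struct gicc_entry *gicc)
{
	return gicc->flags & (ACPI_MADT_ENABLED | ACPI_MADT_GICC_ONLINE_CAPABLE);
}

/* The open-coded test from the hunk: only an enabled GICC presents a GICR. */
bool gicc_presents_gicr(const struct gicc_entry *gicc)
{
	return (gicc->flags & ACPI_MADT_ENABLED) && gicc->gicr_base_address;
}
```

For an entry that is online-capable but not enabled, the later helper
returns true while the open-coded test returns false, which is the
difference the commit message would need to call out.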

Moreover, I am getting the feeling that there are _two_ changes going
on here - there's the change that's talked about in the commit message
(the complex validation that seems unnecessary) and then there's the
preparation for the change to acpi_gicc_is_usable() - which maybe
should be in the following patch where it would be less confusing.

Would you agree?

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

2024-01-11 10:20:43

by Jonathan Cameron

[permalink] [raw]
Subject: Re: [PATCH RFC v3 01/21] ACPI: Only enumerate enabled (or functional) devices

On Tue, 2 Jan 2024 14:39:25 +0000
Jonathan Cameron <[email protected]> wrote:

> On Fri, 15 Dec 2023 20:47:31 +0100
> "Rafael J. Wysocki" <[email protected]> wrote:
>
> > On Friday, December 15, 2023 5:15:39 PM CET Jonathan Cameron wrote:
> > > On Fri, 15 Dec 2023 15:31:55 +0000
> > > "Russell King (Oracle)" <[email protected]> wrote:
> > >
> > > > On Thu, Dec 14, 2023 at 07:37:10PM +0100, Rafael J. Wysocki wrote:
> > > > > On Thu, Dec 14, 2023 at 7:16 PM Rafael J. Wysocki <[email protected]> wrote:
> > > > > >
> > > > > > On Thu, Dec 14, 2023 at 7:10 PM Russell King (Oracle)
> > > > > > <[email protected]> wrote:
> > > > > > > I guess we need something like:
> > > > > > >
> > > > > > > if (device->status.present)
> > > > > > > return device->device_type != ACPI_BUS_TYPE_PROCESSOR ||
> > > > > > > device->status.enabled;
> > > > > > > else
> > > > > > > return device->status.functional;
> > > > > > >
> > > > > > > so we only check device->status.enabled for processor-type devices?
> > > > > >
> > > > > > Yes, something like this.
> > > > >
> > > > > However, that is not sufficient, because there are
> > > > > ACPI_BUS_TYPE_DEVICE devices representing processors.
> > > > >
> > > > > I'm not sure about a clean way to do it ATM.
> > > >
> > > > Ok, how about:
> > > >
> > > > static bool acpi_dev_is_processor(const struct acpi_device *device)
> > > > {
> > > > struct acpi_hardware_id *hwid;
> > > >
> > > > if (device->device_type == ACPI_BUS_TYPE_PROCESSOR)
> > > > return true;
> > > >
> > > > if (device->device_type != ACPI_BUS_TYPE_DEVICE)
> > > > return false;
> > > >
> > > > list_for_each_entry(hwid, &device->pnp.ids, list)
> > > > if (!strcmp(ACPI_PROCESSOR_OBJECT_HID, hwid->id) ||
> > > > !strcmp(ACPI_PROCESSOR_DEVICE_HID, hwid->id))
> > > > return true;
> > > >
> > > > return false;
> > > > }
> > > >
> > > > and then:
> > > >
> > > > if (device->status.present)
> > > > return !acpi_dev_is_processor(device) || device->status.enabled;
> > > > else
> > > > return device->status.functional;
> > > >
> > > > ?
> > > >
> > > Changing it to CPU only for now makes sense to me and I think this code snippet should do the
> > > job. Nice and simple.
> >
> > Well, except that it does checks that are done elsewhere slightly
> > differently, which from the maintenance POV is not nice.
> >
> > Maybe something like the appended patch (untested).
>
> Hi Rafael,
>
> As far as I can see that's functionally equivalent, so looks good to me.
> I'm not set up to test this today though, so will defer to Russell on whether
> there is anything missing
>
> Thanks for putting this together.

This is rather embarrassing...

I spun this up on a QEMU instance with some prints and found that we
do need the !acpi_device_is_processor() restriction.
On my 'random' test setup it fails on one device. ACPI0017 - which I
happen to know rather well. It's the weird pseudo device that lets
a CXL aware OS know there is a CEDT table to probe.

Whilst I really don't like that hack (it is all about making software
distribution of out of tree modules easier rather than something
fundamental), I'm the CXL QEMU maintainer :(

Will fix that, but it shows there is at least one broken firmware out
there.

On the plus side, Rafael's code seems to work as expected and lets that
buggy firmware carry on working :) So let's pretend the bug in QEMU
is a deliberate test case!
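
The decision order in Rafael's version can be modelled in userspace
roughly as below. This is only a sketch: struct sta_bits and
ready_for_enumeration() are hypothetical names, is_processor stands in
for the result of acpi_device_is_processor(), and the unmet-dependency
check is left out.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the _STA bits cached in struct acpi_device. */
struct sta_bits {
	bool present;
	bool enabled;
	bool functional;
};

/* Same order of checks as the appended patch, minus the dependency test. */
bool ready_for_enumeration(struct sta_bits sta, bool is_processor)
{
	if (sta.functional)
		return true;

	if (!sta.present)
		return false;

	if (sta.enabled)
		return true;	/* fast path, no processor lookup needed */

	/* Present but not enabled: only processors are held back. */
	return !is_processor;
}
```

The exact _STA value reported for ACPI0017 is not given here, but
assuming it is present and not enabled, the last line is what lets it
carry on being enumerated while a processor in the same state is
skipped.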

Jonathan

p.s. My test setup blows up later for an unrelated reason with latest
kernel, so I'll be off debugging that for a while :(


>
> Jonathan
>
> >
> > ---
> > drivers/acpi/acpi_processor.c | 11 +++++++++++
> > drivers/acpi/internal.h | 3 +++
> > drivers/acpi/scan.c | 24 +++++++++++++++++++++++-
> > 3 files changed, 37 insertions(+), 1 deletion(-)
> >
> > Index: linux-pm/drivers/acpi/acpi_processor.c
> > ===================================================================
> > --- linux-pm.orig/drivers/acpi/acpi_processor.c
> > +++ linux-pm/drivers/acpi/acpi_processor.c
> > @@ -644,6 +644,17 @@ static struct acpi_scan_handler processo
> > },
> > };
> >
> > +bool acpi_device_is_processor(const struct acpi_device *adev)
> > +{
> > + if (adev->device_type == ACPI_BUS_TYPE_PROCESSOR)
> > + return true;
> > +
> > + if (adev->device_type != ACPI_BUS_TYPE_DEVICE)
> > + return false;
> > +
> > + return acpi_scan_check_handler(adev, &processor_handler);
> > +}
> > +
> > static int acpi_processor_container_attach(struct acpi_device *dev,
> > const struct acpi_device_id *id)
> > {
> > Index: linux-pm/drivers/acpi/internal.h
> > ===================================================================
> > --- linux-pm.orig/drivers/acpi/internal.h
> > +++ linux-pm/drivers/acpi/internal.h
> > @@ -62,6 +62,8 @@ void acpi_sysfs_add_hotplug_profile(stru
> > int acpi_scan_add_handler_with_hotplug(struct acpi_scan_handler *handler,
> > const char *hotplug_profile_name);
> > void acpi_scan_hotplug_enabled(struct acpi_hotplug_profile *hotplug, bool val);
> > +bool acpi_scan_check_handler(const struct acpi_device *adev,
> > + struct acpi_scan_handler *handler);
> >
> > #ifdef CONFIG_DEBUG_FS
> > extern struct dentry *acpi_debugfs_dir;
> > @@ -133,6 +135,7 @@ int acpi_bus_register_early_device(int t
> > const struct acpi_device *acpi_companion_match(const struct device *dev);
> > int __acpi_device_uevent_modalias(const struct acpi_device *adev,
> > struct kobj_uevent_env *env);
> > +bool acpi_device_is_processor(const struct acpi_device *adev);
> >
> > /* --------------------------------------------------------------------------
> > Power Resource
> > Index: linux-pm/drivers/acpi/scan.c
> > ===================================================================
> > --- linux-pm.orig/drivers/acpi/scan.c
> > +++ linux-pm/drivers/acpi/scan.c
> > @@ -1938,6 +1938,19 @@ static bool acpi_scan_handler_matching(s
> > return false;
> > }
> >
> > +bool acpi_scan_check_handler(const struct acpi_device *adev,
> > + struct acpi_scan_handler *handler)
> > +{
> > + struct acpi_hardware_id *hwid;
> > +
> > + list_for_each_entry(hwid, &adev->pnp.ids, list) {
> > + if (acpi_scan_handler_matching(handler, hwid->id, NULL))
> > + return true;
> > + }
> > +
> > + return false;
> > +}
> > +
> > static struct acpi_scan_handler *acpi_scan_match_handler(const char *idstr,
> > const struct acpi_device_id **matchid)
> > {
> > @@ -2410,7 +2423,16 @@ bool acpi_dev_ready_for_enumeration(cons
> > if (device->flags.honor_deps && device->dep_unmet)
> > return false;
> >
> > - return acpi_device_is_present(device);
> > + if (device->status.functional)
> > + return true;
> > +
> > + if (!device->status.present)
> > + return false;
> > +
> > + if (device->status.enabled)
> > + return true; /* Fast path. */
> > +
> > + return !acpi_device_is_processor(device);
> > }
> > EXPORT_SYMBOL_GPL(acpi_dev_ready_for_enumeration);
> >
> >
> >
> >
>
>
> _______________________________________________
> linux-arm-kernel mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel


2024-01-11 10:26:41

by Russell King (Oracle)

[permalink] [raw]
Subject: Re: [PATCH RFC v3 01/21] ACPI: Only enumerate enabled (or functional) devices

On Thu, Jan 11, 2024 at 10:19:49AM +0000, Jonathan Cameron wrote:
> On Tue, 2 Jan 2024 14:39:25 +0000
> Jonathan Cameron <[email protected]> wrote:
>
> > On Fri, 15 Dec 2023 20:47:31 +0100
> > "Rafael J. Wysocki" <[email protected]> wrote:
> >
> > > On Friday, December 15, 2023 5:15:39 PM CET Jonathan Cameron wrote:
> > > > On Fri, 15 Dec 2023 15:31:55 +0000
> > > > "Russell King (Oracle)" <[email protected]> wrote:
> > > >
> > > > > On Thu, Dec 14, 2023 at 07:37:10PM +0100, Rafael J. Wysocki wrote:
> > > > > > On Thu, Dec 14, 2023 at 7:16 PM Rafael J. Wysocki <[email protected]> wrote:
> > > > > > >
> > > > > > > On Thu, Dec 14, 2023 at 7:10 PM Russell King (Oracle)
> > > > > > > <[email protected]> wrote:
> > > > > > > > I guess we need something like:
> > > > > > > >
> > > > > > > > if (device->status.present)
> > > > > > > > return device->device_type != ACPI_BUS_TYPE_PROCESSOR ||
> > > > > > > > device->status.enabled;
> > > > > > > > else
> > > > > > > > return device->status.functional;
> > > > > > > >
> > > > > > > > so we only check device->status.enabled for processor-type devices?
> > > > > > >
> > > > > > > Yes, something like this.
> > > > > >
> > > > > > However, that is not sufficient, because there are
> > > > > > ACPI_BUS_TYPE_DEVICE devices representing processors.
> > > > > >
> > > > > > I'm not sure about a clean way to do it ATM.
> > > > >
> > > > > Ok, how about:
> > > > >
> > > > > static bool acpi_dev_is_processor(const struct acpi_device *device)
> > > > > {
> > > > > struct acpi_hardware_id *hwid;
> > > > >
> > > > > if (device->device_type == ACPI_BUS_TYPE_PROCESSOR)
> > > > > return true;
> > > > >
> > > > > if (device->device_type != ACPI_BUS_TYPE_DEVICE)
> > > > > return false;
> > > > >
> > > > > list_for_each_entry(hwid, &device->pnp.ids, list)
> > > > > if (!strcmp(ACPI_PROCESSOR_OBJECT_HID, hwid->id) ||
> > > > > !strcmp(ACPI_PROCESSOR_DEVICE_HID, hwid->id))
> > > > > return true;
> > > > >
> > > > > return false;
> > > > > }
> > > > >
> > > > > and then:
> > > > >
> > > > > if (device->status.present)
> > > > > return !acpi_dev_is_processor(device) || device->status.enabled;
> > > > > else
> > > > > return device->status.functional;
> > > > >
> > > > > ?
> > > > >
> > > > Changing it to CPU only for now makes sense to me and I think this code snippet should do the
> > > > job. Nice and simple.
> > >
> > > Well, except that it does checks that are done elsewhere slightly
> > > differently, which from the maintenance POV is not nice.
> > >
> > > Maybe something like the appended patch (untested).
> >
> > Hi Rafael,
> >
> > As far as I can see that's functionally equivalent, so looks good to me.
> > I'm not set up to test this today though, so will defer to Russell on whether
> > there is anything missing
> >
> > Thanks for putting this together.
>
> This is rather embarrassing...
>
> I spun this up on a QEMU instance with some prints and found that we
> do need the !acpi_device_is_processor() restriction.
> On my 'random' test setup it fails on one device. ACPI0017 - which I
> happen to know rather well. It's the weird pseudo device that lets
> a CXL aware OS know there is a CEDT table to probe.
>
> Whilst I really don't like that hack (it is all about making software
> distribution of out of tree modules easier rather than something
> fundamental), I'm the CXL QEMU maintainer :(
>
> Will fix that, but it shows there is at least one broken firmware out
> there.
>
> On the plus side, Rafael's code seems to work as expected and lets that
> buggy firmware carry on working :) So let's pretend the bug in QEMU
> is a deliberate test case!

Lol, thanks for a test case and showing that Rafael's approach is
indeed necessary.

Would your test qualify for a Tested-by for this? For reference, this
is my current version below with Rafael's update:

8<====
From: Russell King (Oracle) <[email protected]>
Subject: [PATCH] ACPI: Only enumerate enabled (or functional) processor
devices

From: James Morse <[email protected]>

Today the ACPI enumeration code 'visits' all devices that are present.

This is a problem for arm64, where CPUs are always present, but not
always enabled. When a device-check occurs because the firmware-policy
has changed and a CPU is now enabled, the following error occurs:
| acpi ACPI0007:48: Enumeration failure

This is ultimately because acpi_dev_ready_for_enumeration() returns
true for a device that is not enabled. The ACPI Processor driver
will not register such CPUs as they are not 'decoding their resources'.

ACPI allows a device to be functional instead of maintaining the
present and enabled bit, but we can't simply check the enabled bit
for all devices since firmware can be buggy.

If ACPI indicates that the device is present and enabled, then all well
and good, we can enumerate it. However, if the device is present and not
enabled, then we also check whether the device is a processor device
to limit the impact of this new check to just processor devices.

This avoids enumerating present && functional processor devices that
are not enabled.

Signed-off-by: James Morse <[email protected]>
Co-developed-by: Rafael J. Wysocki <[email protected]>
Signed-off-by: Russell King (Oracle) <[email protected]>
---
Changes since RFC v2:
* Incorporate comment suggestion by Gavin Shan.
Changes since RFC v3:
* Fixed "sert" typo.
Changes since RFC v3 (smaller series):
* Restrict checking the enabled bit to processor devices, update
commit comments.
* Use Rafael's suggestion in
https://lore.kernel.org/r/5760569.DvuYhMxLoT@kreacher
---
drivers/acpi/acpi_processor.c | 11 ++++++++
drivers/acpi/device_pm.c | 2 +-
drivers/acpi/device_sysfs.c | 2 +-
drivers/acpi/internal.h | 4 ++-
drivers/acpi/property.c | 2 +-
drivers/acpi/scan.c | 49 ++++++++++++++++++++++++++++-------
6 files changed, 56 insertions(+), 14 deletions(-)

diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
index 4fe2ef54088c..cf7c1cca69dd 100644
--- a/drivers/acpi/acpi_processor.c
+++ b/drivers/acpi/acpi_processor.c
@@ -626,6 +626,17 @@ static struct acpi_scan_handler processor_handler = {
},
};

+bool acpi_device_is_processor(const struct acpi_device *adev)
+{
+ if (adev->device_type == ACPI_BUS_TYPE_PROCESSOR)
+ return true;
+
+ if (adev->device_type != ACPI_BUS_TYPE_DEVICE)
+ return false;
+
+ return acpi_scan_check_handler(adev, &processor_handler);
+}
+
static int acpi_processor_container_attach(struct acpi_device *dev,
const struct acpi_device_id *id)
{
diff --git a/drivers/acpi/device_pm.c b/drivers/acpi/device_pm.c
index 3b4d048c4941..e3c80f3b3b57 100644
--- a/drivers/acpi/device_pm.c
+++ b/drivers/acpi/device_pm.c
@@ -313,7 +313,7 @@ int acpi_bus_init_power(struct acpi_device *device)
return -EINVAL;

device->power.state = ACPI_STATE_UNKNOWN;
- if (!acpi_device_is_present(device)) {
+ if (!acpi_dev_ready_for_enumeration(device)) {
device->flags.initialized = false;
return -ENXIO;
}
diff --git a/drivers/acpi/device_sysfs.c b/drivers/acpi/device_sysfs.c
index 23373faa35ec..a0256d2493a7 100644
--- a/drivers/acpi/device_sysfs.c
+++ b/drivers/acpi/device_sysfs.c
@@ -141,7 +141,7 @@ static int create_pnp_modalias(const struct acpi_device *acpi_dev, char *modalia
struct acpi_hardware_id *id;

/* Avoid unnecessarily loading modules for non present devices. */
- if (!acpi_device_is_present(acpi_dev))
+ if (!acpi_dev_ready_for_enumeration(acpi_dev))
return 0;

/*
diff --git a/drivers/acpi/internal.h b/drivers/acpi/internal.h
index 866c7c4ed233..9388d4c8674a 100644
--- a/drivers/acpi/internal.h
+++ b/drivers/acpi/internal.h
@@ -62,6 +62,8 @@ void acpi_sysfs_add_hotplug_profile(struct acpi_hotplug_profile *hotplug,
int acpi_scan_add_handler_with_hotplug(struct acpi_scan_handler *handler,
const char *hotplug_profile_name);
void acpi_scan_hotplug_enabled(struct acpi_hotplug_profile *hotplug, bool val);
+bool acpi_scan_check_handler(const struct acpi_device *adev,
+ struct acpi_scan_handler *handler);

#ifdef CONFIG_DEBUG_FS
extern struct dentry *acpi_debugfs_dir;
@@ -107,7 +109,6 @@ int acpi_device_setup_files(struct acpi_device *dev);
void acpi_device_remove_files(struct acpi_device *dev);
void acpi_device_add_finalize(struct acpi_device *device);
void acpi_free_pnp_ids(struct acpi_device_pnp *pnp);
-bool acpi_device_is_present(const struct acpi_device *adev);
bool acpi_device_is_battery(struct acpi_device *adev);
bool acpi_device_is_first_physical_node(struct acpi_device *adev,
const struct device *dev);
@@ -119,6 +120,7 @@ int acpi_bus_register_early_device(int type);
const struct acpi_device *acpi_companion_match(const struct device *dev);
int __acpi_device_uevent_modalias(const struct acpi_device *adev,
struct kobj_uevent_env *env);
+bool acpi_device_is_processor(const struct acpi_device *adev);

/* --------------------------------------------------------------------------
Power Resource
diff --git a/drivers/acpi/property.c b/drivers/acpi/property.c
index 6979a3f9f90a..14d6948fd88a 100644
--- a/drivers/acpi/property.c
+++ b/drivers/acpi/property.c
@@ -1420,7 +1420,7 @@ static bool acpi_fwnode_device_is_available(const struct fwnode_handle *fwnode)
if (!is_acpi_device_node(fwnode))
return false;

- return acpi_device_is_present(to_acpi_device_node(fwnode));
+ return acpi_dev_ready_for_enumeration(to_acpi_device_node(fwnode));
}

static const void *
diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
index 02bb2cce423f..f94d1f744bcc 100644
--- a/drivers/acpi/scan.c
+++ b/drivers/acpi/scan.c
@@ -304,7 +304,7 @@ static int acpi_scan_device_check(struct acpi_device *adev)
int error;

acpi_bus_get_status(adev);
- if (acpi_device_is_present(adev)) {
+ if (acpi_dev_ready_for_enumeration(adev)) {
/*
* This function is only called for device objects for which
* matching scan handlers exist. The only situation in which
@@ -338,7 +338,7 @@ static int acpi_scan_bus_check(struct acpi_device *adev, void *not_used)
int error;

acpi_bus_get_status(adev);
- if (!acpi_device_is_present(adev)) {
+ if (!acpi_dev_ready_for_enumeration(adev)) {
acpi_scan_device_not_enumerated(adev);
return 0;
}
@@ -1913,11 +1913,6 @@ static bool acpi_device_should_be_hidden(acpi_handle handle)
return true;
}

-bool acpi_device_is_present(const struct acpi_device *adev)
-{
- return adev->status.present || adev->status.functional;
-}
-
static bool acpi_scan_handler_matching(struct acpi_scan_handler *handler,
const char *idstr,
const struct acpi_device_id **matchid)
@@ -1938,6 +1933,18 @@ static bool acpi_scan_handler_matching(struct acpi_scan_handler *handler,
return false;
}

+bool acpi_scan_check_handler(const struct acpi_device *adev,
+ struct acpi_scan_handler *handler)
+{
+ struct acpi_hardware_id *hwid;
+
+ list_for_each_entry(hwid, &adev->pnp.ids, list)
+ if (acpi_scan_handler_matching(handler, hwid->id, NULL))
+ return true;
+
+ return false;
+}
+
static struct acpi_scan_handler *acpi_scan_match_handler(const char *idstr,
const struct acpi_device_id **matchid)
{
@@ -2381,16 +2388,38 @@ EXPORT_SYMBOL_GPL(acpi_dev_clear_dependencies);
* acpi_dev_ready_for_enumeration - Check if the ACPI device is ready for enumeration
* @device: Pointer to the &struct acpi_device to check
*
- * Check if the device is present and has no unmet dependencies.
+ * Check if the device is functional or enabled and has no unmet dependencies.
*
- * Return true if the device is ready for enumeratino. Otherwise, return false.
+ * Return true if the device is ready for enumeration. Otherwise, return false.
*/
bool acpi_dev_ready_for_enumeration(const struct acpi_device *device)
{
if (device->flags.honor_deps && device->dep_unmet)
return false;

- return acpi_device_is_present(device);
+ /*
+ * ACPI 6.5's 6.3.7 "_STA (Device Status)" allows firmware to return
+ * (!present && functional) for certain types of devices that should be
+ * enumerated. Note that the enabled bit should not be set unless the
+ * present bit is set.
+ *
+ * However, limit this only to processor devices to reduce possible
+ * regressions with firmware.
+ */
+ if (device->status.functional)
+ return true;
+
+ if (!device->status.present)
+ return false;
+
+ /*
+ * Fast path - if enabled is set, avoid the more expensive test to
+ * check whether this device is a processor.
+ */
+ if (device->status.enabled)
+ return true;
+
+ return !acpi_device_is_processor(device);
}
EXPORT_SYMBOL_GPL(acpi_dev_ready_for_enumeration);

--
2.30.2


--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

2024-01-11 16:18:17

by Jonathan Cameron

[permalink] [raw]
Subject: Re: [PATCH RFC v3 02/21] ACPI: processor: Add support for processors described as container packages

On Tue, 9 Jan 2024 16:13:21 +0000
"Russell King (Oracle)" <[email protected]> wrote:

> On Tue, Jan 09, 2024 at 05:05:15PM +0100, Rafael J. Wysocki wrote:
> > On Tue, Jan 9, 2024 at 4:49 PM Russell King (Oracle)
> > <[email protected]> wrote:
> > >
> > > On Mon, Dec 18, 2023 at 09:17:34PM +0100, Rafael J. Wysocki wrote:
> > > > On Wed, Dec 13, 2023 at 1:49 PM Russell King <[email protected]> wrote:
> > > > >
> > > > > From: James Morse <[email protected]>
> > > > >
> > > > > ACPI has two ways of describing processors in the DSDT. From ACPI v6.5,
> > > > > 5.2.12:
> > > > >
> > > > > "Starting with ACPI Specification 6.3, the use of the Processor() object
> > > > > was deprecated. Only legacy systems should continue with this usage. On
> > > > > the Itanium architecture only, a _UID is provided for the Processor()
> > > > > that is a string object. This usage of _UID is also deprecated since it
> > > > > can preclude an OSPM from being able to match a processor to a
> > > > > non-enumerable device, such as those defined in the MADT. From ACPI
> > > > > Specification 6.3 onward, all processor objects for all architectures
> > > > > except Itanium must now use Device() objects with an _HID of ACPI0007,
> > > > > and use only integer _UID values."
> > > > >
> > > > > Also see https://uefi.org/specs/ACPI/6.5/08_Processor_Configuration_and_Control.html#declaring-processors
> > > > >
> > > > > Duplicate descriptions are not allowed, the ACPI processor driver already
> > > > > parses the UID from both devices and containers. acpi_processor_get_info()
> > > > > returns an error if the UID exists twice in the DSDT.
> > > >
> > > > I'm not really sure how the above is related to the actual patch.
> > > >
> > > > > The missing probe for CPUs described as packages
> > > >
> > > > It is unclear what exactly is meant by "CPUs described as packages".
> > > >
> > > > From the patch, it looks like those would be Processor() objects
> > > > defined under a processor container device.
> > > >
> > > > > creates a problem for
> > > > > moving the cpu_register() calls into the acpi_processor driver, as CPUs
> > > > > described like this don't get registered, leading to errors from other
> > > > > subsystems when they try to add new sysfs entries to the CPU node.
> > > > > (e.g. topology_sysfs_init()'s use of topology_add_dev() via cpuhp)
> > > > >
> > > > > To fix this, parse the processor container and call acpi_processor_add()
> > > > > for each processor that is discovered like this.
> > > >
> > > > Discovered like what?
> > > >
> > > > > The processor container
> > > > > handler is added with acpi_scan_add_handler(), so no detach call will
> > > > > arrive.
> > > >
> > > > The above requires clarification too.
> > >
> > > The above comments... yea. As I didn't write the commit description, but
> > > James did, and James has basically vanished, I don't think these can be
> > > answered, short of rewriting the entire commit message, with me spending
> > > a lot of time with the ACPI specification trying to get the terminology
> > > right - because at lot of the above on the face of it seems to be things
> > > to do with wrong terminology being used.
> > >
> > > I wasn't expecting this level of issues with this patch set, and I now
> > > feel completely out of my depth with this series. I'm wondering whether
> > > I should even continue with it, since I don't have the ACPI knowledge
> > > to address a lot of these comments.
> >
> > Well, sorry about this.
> >
> > I met James at the LPC last year, so he seems to be still around, in
> > some way at least..
>
> On the previous posting, I wanted James to comment on some of the
> feedback from Jonathan, and despite explicitly asking, there has been
> nothing but radio silence ever since James' last post of this series.
>
> So, I now deem this work to be completely dead in the water, and not
> going to happen - not unless others can input on your comments.
>
I'll take another pass at this and see which comments I can resolve.
Will need a few additional test setups so may take a few days.

So far I've established that QEMU uses Processor for x86 and
ACPI0007 for arm64. Goody, at least that simplifies testing
the various options.

Jonathan


2024-01-11 18:00:03

by Jonathan Cameron

[permalink] [raw]
Subject: Re: [PATCH RFC v3 02/21] ACPI: processor: Add support for processors described as container packages

On Mon, 18 Dec 2023 21:17:34 +0100
"Rafael J. Wysocki" <[email protected]> wrote:

> On Wed, Dec 13, 2023 at 1:49 PM Russell King <[email protected]> wrote:
> >
> > From: James Morse <[email protected]>

Done some digging + machine faking. These are mid-stage results at best.

Summary: I don't think this patch is necessary. If anyone happens to be in
the mood for testing on various platforms, can you drop this patch and
see if everything still works?

With this patch in place, and a processor container containing
Processor() objects, acpi_processor_add() is called twice - once via
the path added here and once via acpi_bus_attach() etc.

Maybe it's a leftover from earlier approaches to some of this?


> >
> > ACPI has two ways of describing processors in the DSDT. From ACPI v6.5,
> > 5.2.12:
> >
> > "Starting with ACPI Specification 6.3, the use of the Processor() object
> > was deprecated. Only legacy systems should continue with this usage. On
> > the Itanium architecture only, a _UID is provided for the Processor()
> > that is a string object. This usage of _UID is also deprecated since it
> > can preclude an OSPM from being able to match a processor to a
> > non-enumerable device, such as those defined in the MADT. From ACPI
> > Specification 6.3 onward, all processor objects for all architectures
> > except Itanium must now use Device() objects with an _HID of ACPI0007,
> > and use only integer _UID values."

Well, we definitely don't care about Itanium any more so most of this is irrelevant
and can be scrubbed going forwards!

Otherwise I think we only care about Device() and Processor() being two things
that might be seen to describe CPUs and they may or may not be in a
Processor container.

> >
> > Also see https://uefi.org/specs/ACPI/6.5/08_Processor_Configuration_and_Control.html#declaring-processors
> >
> > Duplicate descriptions are not allowed, the ACPI processor driver already
> > parses the UID from both devices and containers. acpi_processor_get_info()
> > returns an error if the UID exists twice in the DSDT.
>
> I'm not really sure how the above is related to the actual patch.

This is nasty. The key is that with this patch in place, we are actually
adding them twice if they are instantiated via Processor() in a processor
container. So this reference is explaining why we don't get two lots registered.

This patch should call out explicitly why we want to do it twice
(I'm assuming on a temporary basis).
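
To illustrate why the second add does not produce a second
registration, here is a toy model of a UID check. Everything in it is
hypothetical (struct uid_registry, registry_add(), the -1 return
value); it is not how acpi_processor_get_info() is implemented, only a
sketch of "reject a UID that has already been seen".

```c
#include <stddef.h>

#define MAX_CPUS 8

/* Hypothetical table of processor UIDs that have already been added. */
struct uid_registry {
	unsigned int uids[MAX_CPUS];
	size_t count;
};

/* Returns 0 on success, -1 if the UID is a duplicate or the table is full. */
int registry_add(struct uid_registry *reg, unsigned int uid)
{
	for (size_t i = 0; i < reg->count; i++)
		if (reg->uids[i] == uid)
			return -1;

	if (reg->count == MAX_CPUS)
		return -1;

	reg->uids[reg->count++] = uid;
	return 0;
}
```

With two call paths reaching the same add function, the first call for
a given UID succeeds and the second one fails, so only one entry ends
up in the table.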

>
> > The missing probe for CPUs described as packages
>
> It is unclear what exactly is meant by "CPUs described as packages".
>
> From the patch, it looks like those would be Processor() objects
> defined under a processor container device.
Agreed.

>
> > creates a problem for
> > moving the cpu_register() calls into the acpi_processor driver, as CPUs
> > described like this don't get registered, leading to errors from other
> > subsystems when they try to add new sysfs entries to the CPU node.
> > (e.g. topology_sysfs_init()'s use of topology_add_dev() via cpuhp)
> >
> > To fix this, parse the processor container and call acpi_processor_add()
> > for each processor that is discovered like this.
>
> Discovered like what?
Doesn't add any info.

"To fix this, parse the processor container and call acpi_processor_add() for
each processor found."

>
> > The processor container
> > handler is added with acpi_scan_add_handler(), so no detach call will
> > arrive.
>
> The above requires clarification too.
>
> > Qemu TCG describes CPUs using processor devices in a processor container.

Hmm. This isn't so clear cut.

For ARM it does it nicely with ACPI0007 etc. For x86 it is still
Processor() under some circumstances... (why exactly doesn't matter here
- it's all legacy mess).

To poke this I hacked the arm virt qemu platform to use Processor() in a
container so I could do like-for-like comparisons.

The logic that injects a HID into Processor() objects means the existing
handlers get fired without this patch. I'm going to assume that might
not be the case later in this patch set, but I've not found where it
is broken yet :(
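
As a rough sketch of why an injected HID is enough for the existing
handler to fire: matching boils down to comparing each hardware ID on
the device against the handler's ID table. The function name below is
hypothetical and the matching is simplified to plain string compares;
the two ID strings are the ones behind ACPI_PROCESSOR_OBJECT_HID and
ACPI_PROCESSOR_DEVICE_HID.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* IDs for Processor() objects (injected) and processor Device() objects. */
static const char *const processor_ids[] = { "LNXCPU", "ACPI0007" };

/* Simplified stand-in for checking a device's ID list against a handler. */
bool ids_match_processor_handler(const char *const *hwids, size_t n)
{
	size_t n_ids = sizeof(processor_ids) / sizeof(processor_ids[0]);

	for (size_t i = 0; i < n; i++)
		for (size_t j = 0; j < n_ids; j++)
			if (strcmp(hwids[i], processor_ids[j]) == 0)
				return true;

	return false;
}
```

A Processor() object carrying the injected ID and a Device() with
ACPI0007 both match, while something like ACPI0017 does not.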


> > For more information, see build_cpus_aml() in Qemu hw/acpi/cpu.c and
> > https://uefi.org/specs/ACPI/6.5/08_Processor_Configuration_and_Control.html#processor-container-device
> >
> > Signed-off-by: James Morse <[email protected]>
> > Tested-by: Miguel Luis <[email protected]>
> > Tested-by: Vishnu Pajjuri <[email protected]>
> > Tested-by: Jianyong Wu <[email protected]>
> > ---
> > Outstanding comments:
> > https://lore.kernel.org/r/[email protected]
> > https://lore.kernel.org/r/[email protected]
> > ---
> > drivers/acpi/acpi_processor.c | 22 ++++++++++++++++++++++
> > 1 file changed, 22 insertions(+)
> >
> > diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
> > index 4fe2ef54088c..6a542e0ce396 100644
> > --- a/drivers/acpi/acpi_processor.c
> > +++ b/drivers/acpi/acpi_processor.c
> > @@ -626,9 +626,31 @@ static struct acpi_scan_handler processor_handler = {
> > },
> > };
> >
> > +static acpi_status acpi_processor_container_walk(acpi_handle handle,
> > + u32 lvl,
> > + void *context,
> > + void **rv)
> > +{
> > + struct acpi_device *adev;
> > + acpi_status status;
> > +
> > + adev = acpi_get_acpi_dev(handle);
> > + if (!adev)
> > + return AE_ERROR;
>
> Why is the reference counting needed here?
>
> Wouldn't acpi_fetch_acpi_dev() suffice?
You are the expert here :) I can't see why the reference is needed
so would be fine with dropping it.

>
> Also, should the walk really be terminated on the first error?

If this patch makes sense things will probably blow up later but no
worse than before so sure, keep going.

>
> > +
> > + status = acpi_processor_add(adev, &processor_device_ids[0]);
> > + acpi_put_acpi_dev(adev);
> > +
> > + return status;
> > +}
> > +
> > static int acpi_processor_container_attach(struct acpi_device *dev,
> > const struct acpi_device_id *id)
> > {
> > + acpi_walk_namespace(ACPI_TYPE_PROCESSOR, dev->handle,
> > + ACPI_UINT32_MAX, acpi_processor_container_walk,
> > + NULL, NULL, NULL);
>
> This covers processor objects only, so why is this not needed for
> processor devices defined under a processor container object?

Both cases are covered by the existing handling without this.

I'm far from clear on why we need this patch. Presumably
it's the reference in the description on it breaking for
Processor Package containing Processor() objects that matters
after a move... I'm struggling to find that move though!



>
> It is not obvious, so it would be nice to add a comment explaining the
> difference.
>
> > +
> > return 1;
> > }
> >
> > --
>
> _______________________________________________
> linux-arm-kernel mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel


2024-01-11 18:47:15

by Russell King (Oracle)

Subject: Re: [PATCH RFC v3 02/21] ACPI: processor: Add support for processors described as container packages

On Thu, Jan 11, 2024 at 05:59:08PM +0000, Jonathan Cameron wrote:
> On Mon, 18 Dec 2023 21:17:34 +0100
> "Rafael J. Wysocki" <[email protected]> wrote:
>
> > On Wed, Dec 13, 2023 at 1:49 PM Russell King <[email protected]> wrote:
> > >
> > > From: James Morse <[email protected]>
>
> Done some digging + machine faking. This is mid stage results at best.
>
> Summary: I don't think this patch is necessary. If anyone happens to be in
> the mood for testing on various platforms, can you drop this patch and
> see if everything still works.
>
> With this patch in place, and a processor container containing
> Processor() objects, acpi_processor_add() is called twice - once via
> the path added here and once via acpi_bus_attach etc.
>
> Maybe it's a left over from earlier approaches to some of this?

From what you're saying, it seems that way. It would be really good to
get a reply from James to see whether he agrees - or at least get the
reason why this patch is in the series... but I suspect that will never
come.

> Both cases are covered by the existing handling without this.
>
> I'm far from clear on why we need this patch. Presumably
> it's the reference in the description on it breaking for
> Processor Package containing Processor() objects that matters
> after a move... I'm struggling to find that move though!

I do know that James did a lot of testing, so maybe he found some
corner case somewhere which made this necessary - but without input
from James, we can't know that.

So, maybe the right way forward on this is to re-test the series
with this patch dropped, and see whether there's any ill effects.
It should be possible to resurrect the patch if it does turn out to
be necessary.

Does that sound like a good way forward?

Thanks.

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

2024-01-12 09:25:48

by Jonathan Cameron

Subject: Re: [PATCH RFC v3 02/21] ACPI: processor: Add support for processors described as container packages

On Thu, 11 Jan 2024 18:46:47 +0000
"Russell King (Oracle)" <[email protected]> wrote:

> On Thu, Jan 11, 2024 at 05:59:08PM +0000, Jonathan Cameron wrote:
> > On Mon, 18 Dec 2023 21:17:34 +0100
> > "Rafael J. Wysocki" <[email protected]> wrote:
> >
> > > On Wed, Dec 13, 2023 at 1:49 PM Russell King <[email protected]> wrote:
> > > >
> > > > From: James Morse <[email protected]>
> >
> > Done some digging + machine faking. This is mid stage results at best.
> >
> > Summary: I don't think this patch is necessary. If anyone happens to be in
> > the mood for testing on various platforms, can you drop this patch and
> > see if everything still works.
> >
> > With this patch in place, and a processor container containing
> > Processor() objects, acpi_processor_add() is called twice - once via
> > the path added here and once via acpi_bus_attach etc.
> >
> > Maybe it's a left over from earlier approaches to some of this?
>
> From what you're saying, it seems that way. It would be really good to
> get a reply from James to see whether he agrees - or at least get the
> reason why this patch is in the series... but I suspect that will never
> come.
>
> > Both cases are covered by the existing handling without this.
> >
> > I'm far from clear on why we need this patch. Presumably
> > it's the reference in the description on it breaking for
> > Processor Package containing Processor() objects that matters
> > after a move... I'm struggling to find that move though!
>
> I do know that James did a lot of testing, so maybe he found some
> corner case somewhere which made this necessary - but without input
> from James, we can't know that.
>
> So, maybe the right way forward on this is to re-test the series
> with this patch dropped, and see whether there's any ill effects.
> It should be possible to resurrect the patch if it does turn out to
> be necessary.
>
> Does that sound like a good way forward?
>
> Thanks.
>

Yes that sounds like the best plan. Note this patch can only make a
difference on non arm64 arches because it's a firmware bug to combine
Processor() with a GICC entry in APIC/MADT. To even test on ARM64
you have to skip the bug check.

https://elixir.bootlin.com/linux/latest/source/drivers/acpi/processor_core.c#L101

/* device_declaration means Device object in DSDT, in the
* GIC interrupt model, logical processors are required to
* have a Processor Device object in the DSDT, so we should
* check device_declaration here
*/
// if (device_declaration && (gicc->uid == acpi_id)) {
if (gicc->uid == acpi_id) {
*mpidr = gicc->arm_mpidr;
return 0;
}

Only alternative is probably to go history diving and try and
find another change that would have required this and is now gone.

The ACPI scanning code has had a lot of changes whilst this work has
been underway. More than possible that this was papering over some
issue that has long since been fixed. I can't find any deliberate
functional changes, but there is some code generalization that 'might'
have side effects in this area. Rafael, any expectation that anything
changed in how scanning processor containers works?

Jonathan



2024-01-12 11:52:25

by Jonathan Cameron

Subject: Re: [PATCH RFC v3 01/21] ACPI: Only enumerate enabled (or functional) devices

On Thu, 11 Jan 2024 10:26:15 +0000
"Russell King (Oracle)" <[email protected]> wrote:

> On Thu, Jan 11, 2024 at 10:19:49AM +0000, Jonathan Cameron wrote:
> > On Tue, 2 Jan 2024 14:39:25 +0000
> > Jonathan Cameron <[email protected]> wrote:
> >
> > > On Fri, 15 Dec 2023 20:47:31 +0100
> > > "Rafael J. Wysocki" <[email protected]> wrote:
> > >
> > > > On Friday, December 15, 2023 5:15:39 PM CET Jonathan Cameron wrote:
> > > > > On Fri, 15 Dec 2023 15:31:55 +0000
> > > > > "Russell King (Oracle)" <[email protected]> wrote:
> > > > >
> > > > > > On Thu, Dec 14, 2023 at 07:37:10PM +0100, Rafael J. Wysocki wrote:
> > > > > > > On Thu, Dec 14, 2023 at 7:16 PM Rafael J. Wysocki <[email protected]> wrote:
> > > > > > > >
> > > > > > > > On Thu, Dec 14, 2023 at 7:10 PM Russell King (Oracle)
> > > > > > > > <[email protected]> wrote:
> > > > > > > > > I guess we need something like:
> > > > > > > > >
> > > > > > > > > if (device->status.present)
> > > > > > > > > return device->device_type != ACPI_BUS_TYPE_PROCESSOR ||
> > > > > > > > > device->status.enabled;
> > > > > > > > > else
> > > > > > > > > return device->status.functional;
> > > > > > > > >
> > > > > > > > > so we only check device->status.enabled for processor-type devices?
> > > > > > > >
> > > > > > > > Yes, something like this.
> > > > > > >
> > > > > > > However, that is not sufficient, because there are
> > > > > > > ACPI_BUS_TYPE_DEVICE devices representing processors.
> > > > > > >
> > > > > > > I'm not sure about a clean way to do it ATM.
> > > > > >
> > > > > > Ok, how about:
> > > > > >
> > > > > > static bool acpi_dev_is_processor(const struct acpi_device *device)
> > > > > > {
> > > > > > struct acpi_hardware_id *hwid;
> > > > > >
> > > > > > if (device->device_type == ACPI_BUS_TYPE_PROCESSOR)
> > > > > > return true;
> > > > > >
> > > > > > if (device->device_type != ACPI_BUS_TYPE_DEVICE)
> > > > > > return false;
> > > > > >
> > > > > > list_for_each_entry(hwid, &device->pnp.ids, list)
> > > > > > if (!strcmp(ACPI_PROCESSOR_OBJECT_HID, hwid->id) ||
> > > > > > !strcmp(ACPI_PROCESSOR_DEVICE_HID, hwid->id))
> > > > > > return true;
> > > > > >
> > > > > > return false;
> > > > > > }
> > > > > >
> > > > > > and then:
> > > > > >
> > > > > > if (device->status.present)
> > > > > > return !acpi_dev_is_processor(device) || device->status.enabled;
> > > > > > else
> > > > > > return device->status.functional;
> > > > > >
> > > > > > ?
> > > > > >
> > > > > Changing it to CPU only for now makes sense to me and I think this code snippet should do the
> > > > > job. Nice and simple.
> > > >
> > > > Well, except that it does checks that are done elsewhere slightly
> > > > differently, which from the maintenance POV is not nice.
> > > >
> > > > Maybe something like the appended patch (untested).
> > >
> > > Hi Rafael,
> > >
> > > As far as I can see that's functionally equivalent, so looks good to me.
> > > I'm not set up to test this today though, so will defer to Russell on whether
> > > there is anything missing
> > >
> > > Thanks for putting this together.
> >
> > This is rather embarrassing...
> >
> > I spun this up on a QEMU instance with some prints to find out we need
> > the !acpi_device_is_processor() restriction.
> > On my 'random' test setup it fails on one device. ACPI0017 - which I
> > happen to know rather well. It's the weird pseudo device that lets
> > a CXL aware OS know there is a CEDT table to probe.
> >
> > Whilst I really don't like that hack (it is all about making software
> > distribution of out of tree modules easier rather than something
> > fundamental), I'm the CXL QEMU maintainer :(
> >
> > Will fix that, but it shows there is at least one broken firmware out
> > there.
> >
> > On plus side, Rafael's code seems to work as expected and lets that
> > buggy firmware carry on working :) So let's pretend the bug in qemu
> > is a deliberate test case!
>
> Lol, thanks for a test case and showing that Rafael's approach is
> indeed necessary.
>
> Would your test qualify for a tested-by for this? For reference, this
> is my current version below with Rafael's update:

Sure. This matches what I have.

Tested-by: Jonathan Cameron <[email protected]>
Reviewed-by: Jonathan Cameron <[email protected]>


>
> 8<====
> From: Russell King (Oracle) <[email protected]>
> Subject: [PATCH] ACPI: Only enumerate enabled (or functional) processor
> devices
>
> From: James Morse <[email protected]>
>
> Today the ACPI enumeration code 'visits' all devices that are present.
>
> This is a problem for arm64, where CPUs are always present, but not
> always enabled. When a device-check occurs because the firmware-policy
> has changed and a CPU is now enabled, the following error occurs:
> | acpi ACPI0007:48: Enumeration failure
>
> This is ultimately because acpi_dev_ready_for_enumeration() returns
> true for a device that is not enabled. The ACPI Processor driver
> will not register such CPUs as they are not 'decoding their resources'.
>
> ACPI allows a device to be functional instead of maintaining the
> present and enabled bit, but we can't simply check the enabled bit
> for all devices since firmware can be buggy.
>
> If ACPI indicates that the device is present and enabled, then all well
> and good, we can enumerate it. However, if the device is present and not
> enabled, then we also check whether the device is a processor device
> to limit the impact of this new check to just processor devices.
>
> This avoids enumerating present && functional processor devices that
> are not enabled.
>
> Signed-off-by: James Morse <[email protected]>
> Co-developed-by: Rafael J. Wysocki <[email protected]>
> Signed-off-by: Russell King (Oracle) <[email protected]>
> ---
> Changes since RFC v2:
> * Incorporate comment suggestion by Gavin Shan.
> Changes since RFC v3:
> * Fixed "sert" typo.
> Changes since RFC v3 (smaller series):
> * Restrict checking the enabled bit to processor devices, update
> commit comments.
> * Use Rafael's suggestion in
> https://lore.kernel.org/r/5760569.DvuYhMxLoT@kreacher
> ---
> drivers/acpi/acpi_processor.c | 11 ++++++++
> drivers/acpi/device_pm.c | 2 +-
> drivers/acpi/device_sysfs.c | 2 +-
> drivers/acpi/internal.h | 4 ++-
> drivers/acpi/property.c | 2 +-
> drivers/acpi/scan.c | 49 ++++++++++++++++++++++++++++-------
> 6 files changed, 56 insertions(+), 14 deletions(-)
>
> diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
> index 4fe2ef54088c..cf7c1cca69dd 100644
> --- a/drivers/acpi/acpi_processor.c
> +++ b/drivers/acpi/acpi_processor.c
> @@ -626,6 +626,17 @@ static struct acpi_scan_handler processor_handler = {
> },
> };
>
> +bool acpi_device_is_processor(const struct acpi_device *adev)
> +{
> + if (adev->device_type == ACPI_BUS_TYPE_PROCESSOR)
> + return true;
> +
> + if (adev->device_type != ACPI_BUS_TYPE_DEVICE)
> + return false;
> +
> + return acpi_scan_check_handler(adev, &processor_handler);
> +}
> +
> static int acpi_processor_container_attach(struct acpi_device *dev,
> const struct acpi_device_id *id)
> {
> diff --git a/drivers/acpi/device_pm.c b/drivers/acpi/device_pm.c
> index 3b4d048c4941..e3c80f3b3b57 100644
> --- a/drivers/acpi/device_pm.c
> +++ b/drivers/acpi/device_pm.c
> @@ -313,7 +313,7 @@ int acpi_bus_init_power(struct acpi_device *device)
> return -EINVAL;
>
> device->power.state = ACPI_STATE_UNKNOWN;
> - if (!acpi_device_is_present(device)) {
> + if (!acpi_dev_ready_for_enumeration(device)) {
> device->flags.initialized = false;
> return -ENXIO;
> }
> diff --git a/drivers/acpi/device_sysfs.c b/drivers/acpi/device_sysfs.c
> index 23373faa35ec..a0256d2493a7 100644
> --- a/drivers/acpi/device_sysfs.c
> +++ b/drivers/acpi/device_sysfs.c
> @@ -141,7 +141,7 @@ static int create_pnp_modalias(const struct acpi_device *acpi_dev, char *modalia
> struct acpi_hardware_id *id;
>
> /* Avoid unnecessarily loading modules for non present devices. */
> - if (!acpi_device_is_present(acpi_dev))
> + if (!acpi_dev_ready_for_enumeration(acpi_dev))
> return 0;
>
> /*
> diff --git a/drivers/acpi/internal.h b/drivers/acpi/internal.h
> index 866c7c4ed233..9388d4c8674a 100644
> --- a/drivers/acpi/internal.h
> +++ b/drivers/acpi/internal.h
> @@ -62,6 +62,8 @@ void acpi_sysfs_add_hotplug_profile(struct acpi_hotplug_profile *hotplug,
> int acpi_scan_add_handler_with_hotplug(struct acpi_scan_handler *handler,
> const char *hotplug_profile_name);
> void acpi_scan_hotplug_enabled(struct acpi_hotplug_profile *hotplug, bool val);
> +bool acpi_scan_check_handler(const struct acpi_device *adev,
> + struct acpi_scan_handler *handler);
>
> #ifdef CONFIG_DEBUG_FS
> extern struct dentry *acpi_debugfs_dir;
> @@ -107,7 +109,6 @@ int acpi_device_setup_files(struct acpi_device *dev);
> void acpi_device_remove_files(struct acpi_device *dev);
> void acpi_device_add_finalize(struct acpi_device *device);
> void acpi_free_pnp_ids(struct acpi_device_pnp *pnp);
> -bool acpi_device_is_present(const struct acpi_device *adev);
> bool acpi_device_is_battery(struct acpi_device *adev);
> bool acpi_device_is_first_physical_node(struct acpi_device *adev,
> const struct device *dev);
> @@ -119,6 +120,7 @@ int acpi_bus_register_early_device(int type);
> const struct acpi_device *acpi_companion_match(const struct device *dev);
> int __acpi_device_uevent_modalias(const struct acpi_device *adev,
> struct kobj_uevent_env *env);
> +bool acpi_device_is_processor(const struct acpi_device *adev);
>
> /* --------------------------------------------------------------------------
> Power Resource
> diff --git a/drivers/acpi/property.c b/drivers/acpi/property.c
> index 6979a3f9f90a..14d6948fd88a 100644
> --- a/drivers/acpi/property.c
> +++ b/drivers/acpi/property.c
> @@ -1420,7 +1420,7 @@ static bool acpi_fwnode_device_is_available(const struct fwnode_handle *fwnode)
> if (!is_acpi_device_node(fwnode))
> return false;
>
> - return acpi_device_is_present(to_acpi_device_node(fwnode));
> + return acpi_dev_ready_for_enumeration(to_acpi_device_node(fwnode));
> }
>
> static const void *
> diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
> index 02bb2cce423f..f94d1f744bcc 100644
> --- a/drivers/acpi/scan.c
> +++ b/drivers/acpi/scan.c
> @@ -304,7 +304,7 @@ static int acpi_scan_device_check(struct acpi_device *adev)
> int error;
>
> acpi_bus_get_status(adev);
> - if (acpi_device_is_present(adev)) {
> + if (acpi_dev_ready_for_enumeration(adev)) {
> /*
> * This function is only called for device objects for which
> * matching scan handlers exist. The only situation in which
> @@ -338,7 +338,7 @@ static int acpi_scan_bus_check(struct acpi_device *adev, void *not_used)
> int error;
>
> acpi_bus_get_status(adev);
> - if (!acpi_device_is_present(adev)) {
> + if (!acpi_dev_ready_for_enumeration(adev)) {
> acpi_scan_device_not_enumerated(adev);
> return 0;
> }
> @@ -1913,11 +1913,6 @@ static bool acpi_device_should_be_hidden(acpi_handle handle)
> return true;
> }
>
> -bool acpi_device_is_present(const struct acpi_device *adev)
> -{
> - return adev->status.present || adev->status.functional;
> -}
> -
> static bool acpi_scan_handler_matching(struct acpi_scan_handler *handler,
> const char *idstr,
> const struct acpi_device_id **matchid)
> @@ -1938,6 +1933,18 @@ static bool acpi_scan_handler_matching(struct acpi_scan_handler *handler,
> return false;
> }
>
> +bool acpi_scan_check_handler(const struct acpi_device *adev,
> + struct acpi_scan_handler *handler)
> +{
> + struct acpi_hardware_id *hwid;
> +
> + list_for_each_entry(hwid, &adev->pnp.ids, list)
> + if (acpi_scan_handler_matching(handler, hwid->id, NULL))
> + return true;
> +
> + return false;
> +}
> +
> static struct acpi_scan_handler *acpi_scan_match_handler(const char *idstr,
> const struct acpi_device_id **matchid)
> {
> @@ -2381,16 +2388,38 @@ EXPORT_SYMBOL_GPL(acpi_dev_clear_dependencies);
> * acpi_dev_ready_for_enumeration - Check if the ACPI device is ready for enumeration
> * @device: Pointer to the &struct acpi_device to check
> *
> - * Check if the device is present and has no unmet dependencies.
> + * Check if the device is functional or enabled and has no unmet dependencies.
> *
> - * Return true if the device is ready for enumeratino. Otherwise, return false.
> + * Return true if the device is ready for enumeration. Otherwise, return false.
> */
> bool acpi_dev_ready_for_enumeration(const struct acpi_device *device)
> {
> if (device->flags.honor_deps && device->dep_unmet)
> return false;
>
> - return acpi_device_is_present(device);
> + /*
> + * ACPI 6.5's 6.3.7 "_STA (Device Status)" allows firmware to return
> + * (!present && functional) for certain types of devices that should be
> + * enumerated. Note that the enabled bit should not be set unless the
> + * present bit is set.
> + *
> + * However, limit this only to processor devices to reduce possible
> + * regressions with firmware.
> + */
> + if (device->status.functional)
> + return true;
> +
> + if (!device->status.present)
> + return false;
> +
> + /*
> + * Fast path - if enabled is set, avoid the more expensive test to
> + * check whether this device is a processor.
> + */
> + if (device->status.enabled)
> + return true;
> +
> + return !acpi_device_is_processor(device);
> }
> EXPORT_SYMBOL_GPL(acpi_dev_ready_for_enumeration);
>


2024-01-12 15:02:09

by Rafael J. Wysocki

Subject: Re: [PATCH RFC v3 02/21] ACPI: processor: Add support for processors described as container packages

On Fri, Jan 12, 2024 at 10:25 AM Jonathan Cameron
<[email protected]> wrote:
>
> On Thu, 11 Jan 2024 18:46:47 +0000
> "Russell King (Oracle)" <[email protected]> wrote:
>
> > On Thu, Jan 11, 2024 at 05:59:08PM +0000, Jonathan Cameron wrote:
> > > On Mon, 18 Dec 2023 21:17:34 +0100
> > > "Rafael J. Wysocki" <[email protected]> wrote:
> > >
> > > > On Wed, Dec 13, 2023 at 1:49 PM Russell King <[email protected]> wrote:
> > > > >
> > > > > From: James Morse <[email protected]>
> > >
> > > Done some digging + machine faking. This is mid stage results at best.
> > >
> > > Summary: I don't think this patch is necessary. If anyone happens to be in
> > > the mood for testing on various platforms, can you drop this patch and
> > > see if everything still works.
> > >
> > > With this patch in place, and a processor container containing
> > > Processor() objects, acpi_processor_add() is called twice - once via
> > > the path added here and once via acpi_bus_attach etc.
> > >
> > > Maybe it's a left over from earlier approaches to some of this?
> >
> > From what you're saying, it seems that way. It would be really good to
> > get a reply from James to see whether he agrees - or at least get the
> > reason why this patch is in the series... but I suspect that will never
> > come.
> >
> > > Both cases are covered by the existing handling without this.
> > >
> > > I'm far from clear on why we need this patch. Presumably
> > > it's the reference in the description on it breaking for
> > > Processor Package containing Processor() objects that matters
> > > after a move... I'm struggling to find that move though!
> >
> > I do know that James did a lot of testing, so maybe he found some
> > corner case somewhere which made this necessary - but without input
> > from James, we can't know that.
> >
> > So, maybe the right way forward on this is to re-test the series
> > with this patch dropped, and see whether there's any ill effects.
> > It should be possible to resurrect the patch if it does turn out to
> > be necessary.
> >
> > Does that sound like a good way forward?
> >
> > Thanks.
> >
>
> Yes that sounds like the best plan. Note this patch can only make a
> difference on non arm64 arches because it's a firmware bug to combine
> Processor() with a GICC entry in APIC/MADT. To even test on ARM64
> you have to skip the bug check.
>
> https://elixir.bootlin.com/linux/latest/source/drivers/acpi/processor_core.c#L101
>
> /* device_declaration means Device object in DSDT, in the
> * GIC interrupt model, logical processors are required to
> * have a Processor Device object in the DSDT, so we should
> * check device_declaration here
> */
> // if (device_declaration && (gicc->uid == acpi_id)) {
> if (gicc->uid == acpi_id) {
> *mpidr = gicc->arm_mpidr;
> return 0;
> }
>
> Only alternative is probably to go history diving and try and
> find another change that would have required this and is now gone.
>
> The ACPI scanning code has had a lot of changes whilst this work has
> been underway. More than possible that this was papering over some
> issue that has long since been fixed. I can't find any deliberate
> functional changes, but there is some code generalization that 'might'
> have side effects in this area. Rafael, any expectation that anything
> changed in how scanning processor containers works?

There have been changes, but I can't recall when exactly without some
git history research.

In any case, it is always better to work on top of the current
mainline code IMO.

2024-01-12 15:04:18

by Jonathan Cameron

Subject: Re: [PATCH RFC v3 02/21] ACPI: processor: Add support for processors described as container packages

On Fri, 12 Jan 2024 16:01:40 +0100
"Rafael J. Wysocki" <[email protected]> wrote:

> On Fri, Jan 12, 2024 at 10:25 AM Jonathan Cameron
> <[email protected]> wrote:
> >
> > On Thu, 11 Jan 2024 18:46:47 +0000
> > "Russell King (Oracle)" <[email protected]> wrote:
> >
> > > On Thu, Jan 11, 2024 at 05:59:08PM +0000, Jonathan Cameron wrote:
> > > > On Mon, 18 Dec 2023 21:17:34 +0100
> > > > "Rafael J. Wysocki" <[email protected]> wrote:
> > > >
> > > > > On Wed, Dec 13, 2023 at 1:49 PM Russell King <[email protected]> wrote:
> > > > > >
> > > > > > From: James Morse <[email protected]>
> > > >
> > > > Done some digging + machine faking. This is mid stage results at best.
> > > >
> > > > Summary: I don't think this patch is necessary. If anyone happens to be in
> > > > the mood for testing on various platforms, can you drop this patch and
> > > > see if everything still works.
> > > >
> > > > With this patch in place, and a processor container containing
> > > > Processor() objects, acpi_processor_add() is called twice - once via
> > > > the path added here and once via acpi_bus_attach etc.
> > > >
> > > > Maybe it's a left over from earlier approaches to some of this?
> > >
> > > From what you're saying, it seems that way. It would be really good to
> > > get a reply from James to see whether he agrees - or at least get the
> > > reason why this patch is in the series... but I suspect that will never
> > > come.
> > >
> > > > Both cases are covered by the existing handling without this.
> > > >
> > > > I'm far from clear on why we need this patch. Presumably
> > > > it's the reference in the description on it breaking for
> > > > Processor Package containing Processor() objects that matters
> > > > after a move... I'm struggling to find that move though!
> > >
> > > I do know that James did a lot of testing, so maybe he found some
> > > corner case somewhere which made this necessary - but without input
> > > from James, we can't know that.
> > >
> > > So, maybe the right way forward on this is to re-test the series
> > > with this patch dropped, and see whether there's any ill effects.
> > > It should be possible to resurrect the patch if it does turn out to
> > > be necessary.
> > >
> > > Does that sound like a good way forward?
> > >
> > > Thanks.
> > >
> >
> > Yes that sounds like the best plan. Note this patch can only make a
> > difference on non arm64 arches because it's a firmware bug to combine
> > Processor() with a GICC entry in APIC/MADT. To even test on ARM64
> > you have to skip the bug check.
> >
> > https://elixir.bootlin.com/linux/latest/source/drivers/acpi/processor_core.c#L101
> >
> > /* device_declaration means Device object in DSDT, in the
> > * GIC interrupt model, logical processors are required to
> > * have a Processor Device object in the DSDT, so we should
> > * check device_declaration here
> > */
> > // if (device_declaration && (gicc->uid == acpi_id)) {
> > if (gicc->uid == acpi_id) {
> > *mpidr = gicc->arm_mpidr;
> > return 0;
> > }
> >
> > Only alternative is probably to go history diving and try and
> > find another change that would have required this and is now gone.
> >
> > The ACPI scanning code has had a lot of changes whilst this work has
> > been underway. More than possible that this was papering over some
> > issue that has long since been fixed. I can't find any deliberate
> > functional changes, but there is some code generalization that 'might'
> > have side effects in this area. Rafael, any expectation that anything
> > changed in how scanning processor containers works?
>
> There have been changes, but I can't recall when exactly without some
> git history research.
>
> In any case, it is always better to work on top of the current
> mainline code IMO.

Absolutely - just in this case the series has been rebased for
a few years because the standards discussions took far far too long!

Jonathan



2024-01-15 10:47:41

by Russell King (Oracle)

Subject: Re: [PATCH RFC v3 02/21] ACPI: processor: Add support for processors described as container packages

On Fri, Jan 12, 2024 at 04:01:40PM +0100, Rafael J. Wysocki wrote:
> In any case, it is always better to work on top of the current
> mainline code IMO.

That's fine if one is starting to do some work now, but that is not the
case with this. The first posting was almost a year ago:

https://lwn.net/Articles/922127/

which likely means that it's been around for at least 18 months, and
we can also see that this patch was in that original patch set. What
the history of this patch is before the first posting... only James
would be able to answer that and I feel that we're highly unlikely to
get any kind of response.

Anyway, consider this patch dropped from the series.

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

2024-01-15 11:07:00

by Russell King (Oracle)

Subject: Re: [PATCH RFC v3 03/21] ACPI: processor: Register CPUs that are online, but not described in the DSDT

On Mon, Dec 18, 2023 at 09:22:03PM +0100, Rafael J. Wysocki wrote:
> On Wed, Dec 13, 2023 at 1:49 PM Russell King <[email protected]> wrote:
> >
> > From: James Morse <[email protected]>
> >
> > ACPI has two descriptions of CPUs, one in the MADT/APIC table, the other
> > in the DSDT. Both are required. (ACPI 6.5's 8.4 "Declaring Processors"
> > says "Each processor in the system must be declared in the ACPI
> > namespace"). Having two descriptions allows firmware authors to get
> > this wrong.
> >
> > If CPUs are described in the MADT/APIC, they will be brought online
> > early during boot. Once the register_cpu() calls are moved to ACPI,
> > they will be based on the DSDT description of the CPUs. When CPUs are
> > missing from the DSDT description, they will end up online, but not
> > registered.
> >
> > Add a helper that runs after acpi_init() has completed to register
> > CPUs that are online, but weren't found in the DSDT. Any CPU that
> > is registered by this code triggers a firmware-bug warning and kernel
> > taint.
> >
> > Qemu TCG only describes the first CPU in the DSDT, unless cpu-hotplug
> > is configured.
>
> So why is this a kernel problem?

So what are you proposing should be the behaviour here? What this
statement seems to be saying is that QEMU as it exists today only
describes the first CPU in DSDT.

As this patch series changes when arch_register_cpu() gets called (as
described in the paragraph above) we obviously need to preserve the
_existing_ behaviour to avoid causing regressions. So, if changing the
kernel causes user visible regressions (e.g. sysfs entries to
disappear) then it obviously _is_ a kernel problem that needs to be
solved.

We can't say "well fix QEMU then" without invoking the wrath of Linus.

> > Signed-off-by: James Morse <[email protected]>
> > Reviewed-by: Jonathan Cameron <[email protected]>
> > Reviewed-by: Gavin Shan <[email protected]>
> > Tested-by: Miguel Luis <[email protected]>
> > Tested-by: Vishnu Pajjuri <[email protected]>
> > Tested-by: Jianyong Wu <[email protected]>
> > Signed-off-by: Russell King (Oracle) <[email protected]>
> > ---
> > drivers/acpi/acpi_processor.c | 19 +++++++++++++++++++
> > 1 file changed, 19 insertions(+)
> >
> > diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
> > index 6a542e0ce396..0511f2bc10bc 100644
> > --- a/drivers/acpi/acpi_processor.c
> > +++ b/drivers/acpi/acpi_processor.c
> > @@ -791,6 +791,25 @@ void __init acpi_processor_init(void)
> > acpi_pcc_cpufreq_init();
> > }
> >
> > +static int __init acpi_processor_register_missing_cpus(void)
> > +{
> > + int cpu;
> > +
> > + if (acpi_disabled)
> > + return 0;
> > +
> > + for_each_online_cpu(cpu) {
> > + if (!get_cpu_device(cpu)) {
> > + pr_err_once(FW_BUG "CPU %u has no ACPI namespace description!\n", cpu);
> > + add_taint(TAINT_FIRMWARE_WORKAROUND, LOCKDEP_STILL_OK);
> > + arch_register_cpu(cpu);
>
> Which part of this code is related to ACPI?

That's a good question, and I suspect it would be more suited to being
placed in drivers/base/cpu.c except for the problem that the error
message refers to ACPI.

As long as we keep the acpi_disabled test, I guess that's fine.
cpu_dev_register_generic() there already tests acpi_disabled.
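
To illustrate what such a fallback would look like wherever it ends up living, here is a user-space sketch of the loop. struct cpu_model and register_missing_cpus() are hypothetical; in the patch the equivalent steps are get_cpu_device(), add_taint() and arch_register_cpu(), and the acpi_disabled test is kept as discussed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-CPU state: online from MADT, registered from DSDT. */
struct cpu_model {
	bool online;
	bool registered;
};

/*
 * Register every CPU that is online but was never registered.
 * Returns how many CPUs needed the fallback and sets *tainted if any did.
 */
static int register_missing_cpus(struct cpu_model *cpus, size_t n,
				 bool acpi_disabled, bool *tainted)
{
	int fixed = 0;

	assert(cpus != NULL && tainted != NULL);

	/* Without ACPI there is no namespace description to be missing. */
	if (acpi_disabled)
		return 0;

	for (size_t i = 0; i < n; i++) {
		if (cpus[i].online && !cpus[i].registered) {
			*tainted = true;		/* models add_taint() */
			cpus[i].registered = true;	/* models arch_register_cpu() */
			fixed++;
		}
	}
	return fixed;
}
```

Running it twice on the same array is harmless in this model: the second pass finds nothing left to register.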

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

2024-01-22 07:36:51

by Gavin Shan

Subject: Re: [PATCH RFC v3 01/21] ACPI: Only enumerate enabled (or functional) devices

On 1/11/24 20:26, Russell King (Oracle) wrote:
> On Thu, Jan 11, 2024 at 10:19:49AM +0000, Jonathan Cameron wrote:
>> On Tue, 2 Jan 2024 14:39:25 +0000
>> Jonathan Cameron <[email protected]> wrote:
>>
>>> On Fri, 15 Dec 2023 20:47:31 +0100
>>> "Rafael J. Wysocki" <[email protected]> wrote:
>>>
>>>> On Friday, December 15, 2023 5:15:39 PM CET Jonathan Cameron wrote:
>>>>> On Fri, 15 Dec 2023 15:31:55 +0000
>>>>> "Russell King (Oracle)" <[email protected]> wrote:
>>>>>
>>>>>> On Thu, Dec 14, 2023 at 07:37:10PM +0100, Rafael J. Wysocki wrote:
>>>>>>> On Thu, Dec 14, 2023 at 7:16 PM Rafael J. Wysocki <[email protected]> wrote:
>>>>>>>>
>>>>>>>> On Thu, Dec 14, 2023 at 7:10 PM Russell King (Oracle)
>>>>>>>> <[email protected]> wrote:
>>>>>>>>> I guess we need something like:
>>>>>>>>>
>>>>>>>>> if (device->status.present)
>>>>>>>>> return device->device_type != ACPI_BUS_TYPE_PROCESSOR ||
>>>>>>>>> device->status.enabled;
>>>>>>>>> else
>>>>>>>>> return device->status.functional;
>>>>>>>>>
>>>>>>>>> so we only check device->status.enabled for processor-type devices?
>>>>>>>>
>>>>>>>> Yes, something like this.
>>>>>>>
>>>>>>> However, that is not sufficient, because there are
>>>>>>> ACPI_BUS_TYPE_DEVICE devices representing processors.
>>>>>>>
>>>>>>> I'm not sure about a clean way to do it ATM.
>>>>>>
>>>>>> Ok, how about:
>>>>>>
>>>>>> static bool acpi_dev_is_processor(const struct acpi_device *device)
>>>>>> {
>>>>>> struct acpi_hardware_id *hwid;
>>>>>>
>>>>>> if (device->device_type == ACPI_BUS_TYPE_PROCESSOR)
>>>>>> return true;
>>>>>>
>>>>>> if (device->device_type != ACPI_BUS_TYPE_DEVICE)
>>>>>> return false;
>>>>>>
>>>>>> list_for_each_entry(hwid, &device->pnp.ids, list)
>>>>>> if (!strcmp(ACPI_PROCESSOR_OBJECT_HID, hwid->id) ||
>>>>>> !strcmp(ACPI_PROCESSOR_DEVICE_HID, hwid->id))
>>>>>> return true;
>>>>>>
>>>>>> return false;
>>>>>> }
>>>>>>
>>>>>> and then:
>>>>>>
>>>>>> if (device->status.present)
>>>>>> return !acpi_dev_is_processor(device) || device->status.enabled;
>>>>>> else
>>>>>> return device->status.functional;
>>>>>>
>>>>>> ?
>>>>>>
>>>>> Changing it to CPU only for now makes sense to me and I think this code snippet should do the
>>>>> job. Nice and simple.
>>>>
>>>> Well, except that it does checks that are done elsewhere slightly
>>>> differently, which from the maintenance POV is not nice.
>>>>
>>>> Maybe something like the appended patch (untested).
>>>
>>> Hi Rafael,
>>>
>>> As far as I can see that's functionally equivalent, so looks good to me.
>>> I'm not set up to test this today though, so will defer to Russell on whether
>>> there is anything missing
>>>
>>> Thanks for putting this together.
>>
>> This is rather embarrassing...
>>
>> I spun this up on a QEMU instance with some prints to find out we need
>> the !acpi_device_is_processor() restriction.
>> On my 'random' test setup it fails on one device. ACPI0017 - which I
>> happen to know rather well. It's the weird pseudo device that lets
>> a CXL aware OS know there is a CEDT table to probe.
>>
>> Whilst I really don't like that hack (it is all about making software
>> distribution of out of tree modules easier rather than something
>> fundamental), I'm the CXL QEMU maintainer :(
>>
>> Will fix that, but it shows there is at least one broken firmware out
>> there.
>>
>> On the plus side, Rafael's code seems to work as expected and lets that
>> buggy firmware carry on working :) So let's pretend the bug in QEMU
>> is a deliberate test case!
>
> Lol, thanks for a test case and showing that Rafael's approach is
> indeed necessary.
>
> Would your test qualify for a tested-by for this? For reference, this
> is my current version below with Rafael's update:
>
> 8<====
> From: Russell King (Oracle) <[email protected]>
> Subject: [PATCH] ACPI: Only enumerate enabled (or functional) processor
> devices
>
> From: James Morse <[email protected]>
>
> Today the ACPI enumeration code 'visits' all devices that are present.
>
> This is a problem for arm64, where CPUs are always present, but not
> always enabled. When a device-check occurs because the firmware-policy
> has changed and a CPU is now enabled, the following error occurs:
> | acpi ACPI0007:48: Enumeration failure
>
> This is ultimately because acpi_dev_ready_for_enumeration() returns
> true for a device that is not enabled. The ACPI Processor driver
> will not register such CPUs as they are not 'decoding their resources'.
>
> ACPI allows a device to be functional instead of maintaining the
> present and enabled bit, but we can't simply check the enabled bit
> for all devices since firmware can be buggy.
>
> If ACPI indicates that the device is present and enabled, then all well
> and good, we can enumerate it. However, if the device is present and not
> enabled, then we also check whether the device is a processor device
> to limit the impact of this new check to just processor devices.
>
> This avoids enumerating present && functional processor devices that
> are not enabled.
>
> Signed-off-by: James Morse <[email protected]>
> Co-developed-by: Rafael J. Wysocki <[email protected]>
> Signed-off-by: Russell King (Oracle) <[email protected]>
> ---
> Changes since RFC v2:
> * Incorporate comment suggestion by Gavin Shan.
> Changes since RFC v3:
> * Fixed "sert" typo.
> Changes since RFC v3 (smaller series):
> * Restrict checking the enabled bit to processor devices, update
> commit comments.
> * Use Rafael's suggestion in
> https://lore.kernel.org/r/5760569.DvuYhMxLoT@kreacher
> ---
> drivers/acpi/acpi_processor.c | 11 ++++++++
> drivers/acpi/device_pm.c | 2 +-
> drivers/acpi/device_sysfs.c | 2 +-
> drivers/acpi/internal.h | 4 ++-
> drivers/acpi/property.c | 2 +-
> drivers/acpi/scan.c | 49 ++++++++++++++++++++++++++++-------
> 6 files changed, 56 insertions(+), 14 deletions(-)
>
> diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
> index 4fe2ef54088c..cf7c1cca69dd 100644
> --- a/drivers/acpi/acpi_processor.c
> +++ b/drivers/acpi/acpi_processor.c
> @@ -626,6 +626,17 @@ static struct acpi_scan_handler processor_handler = {
> },
> };
>
> +bool acpi_device_is_processor(const struct acpi_device *adev)
> +{
> + if (adev->device_type == ACPI_BUS_TYPE_PROCESSOR)
> + return true;
> +
> + if (adev->device_type != ACPI_BUS_TYPE_DEVICE)
> + return false;
> +
> + return acpi_scan_check_handler(adev, &processor_handler);
> +}
> +
> static int acpi_processor_container_attach(struct acpi_device *dev,
> const struct acpi_device_id *id)
> {
> diff --git a/drivers/acpi/device_pm.c b/drivers/acpi/device_pm.c
> index 3b4d048c4941..e3c80f3b3b57 100644
> --- a/drivers/acpi/device_pm.c
> +++ b/drivers/acpi/device_pm.c
> @@ -313,7 +313,7 @@ int acpi_bus_init_power(struct acpi_device *device)
> return -EINVAL;
>
> device->power.state = ACPI_STATE_UNKNOWN;
> - if (!acpi_device_is_present(device)) {
> + if (!acpi_dev_ready_for_enumeration(device)) {
> device->flags.initialized = false;
> return -ENXIO;
> }
> diff --git a/drivers/acpi/device_sysfs.c b/drivers/acpi/device_sysfs.c
> index 23373faa35ec..a0256d2493a7 100644
> --- a/drivers/acpi/device_sysfs.c
> +++ b/drivers/acpi/device_sysfs.c
> @@ -141,7 +141,7 @@ static int create_pnp_modalias(const struct acpi_device *acpi_dev, char *modalia
> struct acpi_hardware_id *id;
>
> /* Avoid unnecessarily loading modules for non present devices. */
> - if (!acpi_device_is_present(acpi_dev))
> + if (!acpi_dev_ready_for_enumeration(acpi_dev))
> return 0;
>
> /*
> diff --git a/drivers/acpi/internal.h b/drivers/acpi/internal.h
> index 866c7c4ed233..9388d4c8674a 100644
> --- a/drivers/acpi/internal.h
> +++ b/drivers/acpi/internal.h
> @@ -62,6 +62,8 @@ void acpi_sysfs_add_hotplug_profile(struct acpi_hotplug_profile *hotplug,
> int acpi_scan_add_handler_with_hotplug(struct acpi_scan_handler *handler,
> const char *hotplug_profile_name);
> void acpi_scan_hotplug_enabled(struct acpi_hotplug_profile *hotplug, bool val);
> +bool acpi_scan_check_handler(const struct acpi_device *adev,
> + struct acpi_scan_handler *handler);
>
> #ifdef CONFIG_DEBUG_FS
> extern struct dentry *acpi_debugfs_dir;
> @@ -107,7 +109,6 @@ int acpi_device_setup_files(struct acpi_device *dev);
> void acpi_device_remove_files(struct acpi_device *dev);
> void acpi_device_add_finalize(struct acpi_device *device);
> void acpi_free_pnp_ids(struct acpi_device_pnp *pnp);
> -bool acpi_device_is_present(const struct acpi_device *adev);
> bool acpi_device_is_battery(struct acpi_device *adev);
> bool acpi_device_is_first_physical_node(struct acpi_device *adev,
> const struct device *dev);
> @@ -119,6 +120,7 @@ int acpi_bus_register_early_device(int type);
> const struct acpi_device *acpi_companion_match(const struct device *dev);
> int __acpi_device_uevent_modalias(const struct acpi_device *adev,
> struct kobj_uevent_env *env);
> +bool acpi_device_is_processor(const struct acpi_device *adev);
>
> /* --------------------------------------------------------------------------
> Power Resource
> diff --git a/drivers/acpi/property.c b/drivers/acpi/property.c
> index 6979a3f9f90a..14d6948fd88a 100644
> --- a/drivers/acpi/property.c
> +++ b/drivers/acpi/property.c
> @@ -1420,7 +1420,7 @@ static bool acpi_fwnode_device_is_available(const struct fwnode_handle *fwnode)
> if (!is_acpi_device_node(fwnode))
> return false;
>
> - return acpi_device_is_present(to_acpi_device_node(fwnode));
> + return acpi_dev_ready_for_enumeration(to_acpi_device_node(fwnode));
> }
>
> static const void *
> diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
> index 02bb2cce423f..f94d1f744bcc 100644
> --- a/drivers/acpi/scan.c
> +++ b/drivers/acpi/scan.c
> @@ -304,7 +304,7 @@ static int acpi_scan_device_check(struct acpi_device *adev)
> int error;
>
> acpi_bus_get_status(adev);
> - if (acpi_device_is_present(adev)) {
> + if (acpi_dev_ready_for_enumeration(adev)) {
> /*
> * This function is only called for device objects for which
> * matching scan handlers exist. The only situation in which
> @@ -338,7 +338,7 @@ static int acpi_scan_bus_check(struct acpi_device *adev, void *not_used)
> int error;
>
> acpi_bus_get_status(adev);
> - if (!acpi_device_is_present(adev)) {
> + if (!acpi_dev_ready_for_enumeration(adev)) {
> acpi_scan_device_not_enumerated(adev);
> return 0;
> }
> @@ -1913,11 +1913,6 @@ static bool acpi_device_should_be_hidden(acpi_handle handle)
> return true;
> }
>
> -bool acpi_device_is_present(const struct acpi_device *adev)
> -{
> - return adev->status.present || adev->status.functional;
> -}
> -
> static bool acpi_scan_handler_matching(struct acpi_scan_handler *handler,
> const char *idstr,
> const struct acpi_device_id **matchid)
> @@ -1938,6 +1933,18 @@ static bool acpi_scan_handler_matching(struct acpi_scan_handler *handler,
> return false;
> }
>
> +bool acpi_scan_check_handler(const struct acpi_device *adev,
> + struct acpi_scan_handler *handler)
> +{
> + struct acpi_hardware_id *hwid;
> +
> + list_for_each_entry(hwid, &adev->pnp.ids, list)
> + if (acpi_scan_handler_matching(handler, hwid->id, NULL))
> + return true;
> +
> + return false;
> +}
> +
> static struct acpi_scan_handler *acpi_scan_match_handler(const char *idstr,
> const struct acpi_device_id **matchid)
> {
> @@ -2381,16 +2388,38 @@ EXPORT_SYMBOL_GPL(acpi_dev_clear_dependencies);
> * acpi_dev_ready_for_enumeration - Check if the ACPI device is ready for enumeration
> * @device: Pointer to the &struct acpi_device to check
> *
> - * Check if the device is present and has no unmet dependencies.
> + * Check if the device is functional or enabled and has no unmet dependencies.
> *
> - * Return true if the device is ready for enumeratino. Otherwise, return false.
> + * Return true if the device is ready for enumeration. Otherwise, return false.
> */
> bool acpi_dev_ready_for_enumeration(const struct acpi_device *device)
> {
> if (device->flags.honor_deps && device->dep_unmet)
> return false;
>
> - return acpi_device_is_present(device);
> + /*
> + * ACPI 6.5's 6.3.7 "_STA (Device Status)" allows firmware to return
> + * (!present && functional) for certain types of devices that should be
> + * enumerated. Note that the enabled bit should not be set unless the
> + * present bit is set.
> + *
> + * However, limit this only to processor devices to reduce possible
> + * regressions with firmware.
> + */
> + if (device->status.functional)
> + return true;
> +
> + if (!device->status.present)
> + return false;
> +
> + /*
> + * Fast path - if enabled is set, avoid the more expensive test to
> + * check whether this device is a processor.
> + */
> + if (device->status.enabled)
> + return true;
> +

It may be worth replacing 'if enabled is set' with 'if the enabled bit is set',
to be consistent with the terminology used in the comments above.

Apart from that, the patch itself looks good to me.

> + return !acpi_device_is_processor(device);
> }
> EXPORT_SYMBOL_GPL(acpi_dev_ready_for_enumeration);
>

Thanks,
Gavin
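
To make the order of the checks in the quoted
acpi_dev_ready_for_enumeration() hunk easier to follow, here is a small
userspace model of it. It is only a sketch: struct model_device and its
is_processor flag are assumptions made for illustration, whereas in the patch
that answer comes from acpi_device_is_processor().

```c
#include <stdbool.h>

/* The three _STA bits the quoted hunk looks at. */
struct dev_status {
	bool present;
	bool enabled;
	bool functional;
};

/* Hypothetical, cut-down stand-in for struct acpi_device. */
struct model_device {
	struct dev_status status;
	bool honor_deps;
	unsigned int dep_unmet;
	bool is_processor;	/* stands in for acpi_device_is_processor() */
};

/* Same sequence of tests as the quoted hunk, in the same order. */
static bool ready_for_enumeration(const struct model_device *dev)
{
	if (dev->honor_deps && dev->dep_unmet)
		return false;

	/* A functional device is enumerated regardless of the other bits. */
	if (dev->status.functional)
		return true;

	if (!dev->status.present)
		return false;

	/* Fast path: the enabled bit is set, no need for the processor test. */
	if (dev->status.enabled)
		return true;

	/* Present but not enabled: only processors are held back. */
	return !dev->is_processor;
}
```

In this model a present, not enabled, non-functional processor is skipped,
while a non-processor device with the same status bits is still enumerated,
which appears to be the situation described for ACPI0017 above.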


2024-01-22 17:13:47

by Jonathan Cameron

Subject: Re: [PATCH RFC v3 03/21] ACPI: processor: Register CPUs that are online, but not described in the DSDT

On Mon, 15 Jan 2024 11:06:29 +0000
"Russell King (Oracle)" <[email protected]> wrote:

> On Mon, Dec 18, 2023 at 09:22:03PM +0100, Rafael J. Wysocki wrote:
> > On Wed, Dec 13, 2023 at 1:49 PM Russell King <[email protected]> wrote:
> > >
> > > From: James Morse <[email protected]>
> > >
> > > ACPI has two descriptions of CPUs, one in the MADT/APIC table, the other
> > > in the DSDT. Both are required. (ACPI 6.5's 8.4 "Declaring Processors"
> > > says "Each processor in the system must be declared in the ACPI
> > > namespace"). Having two descriptions allows firmware authors to get
> > > this wrong.
> > >
> > > If CPUs are described in the MADT/APIC, they will be brought online
> > > early during boot. Once the register_cpu() calls are moved to ACPI,
> > > they will be based on the DSDT description of the CPUs. When CPUs are
> > > missing from the DSDT description, they will end up online, but not
> > > registered.
> > >
> > > Add a helper that runs after acpi_init() has completed to register
> > > CPUs that are online, but weren't found in the DSDT. Any CPU that
> > > is registered by this code triggers a firmware-bug warning and kernel
> > > taint.
> > >
> > > Qemu TCG only describes the first CPU in the DSDT, unless cpu-hotplug
> > > is configured.
> >
> > So why is this a kernel problem?
>
> So what are you proposing should be the behaviour here? What this
> statement seems to be saying is that QEMU as it exists today only
> describes the first CPU in DSDT.

This confuses me somewhat, because I'm far from sure which machines this
is true for in QEMU. I'm guessing it's a legacy thing with
some old distro version of QEMU - so we'll have to paper over it anyway
but for current QEMU I'm not sure it's true.

Helpfully there are a bunch of ACPI table tests so I've been checking
through all the multi CPU cases.

CPU hotplug not enabled.
pc/DSDT.dimmpxm - 4x Processor entries. -smp 4
pc/DSDT.acpihmat - 2x Processor entries. -smp 2
q35/DSDT.acpihmat - 2x Processor entries. -smp 2
virt/DSDT.acpihmatvirt - 4x ACPI0007 entries -smp 4
q35/DSDT.acpihmat-noinitiator - 4 x Processor () entries -smp 4
virt/DSDT.topology - 8x ACPI0007 entries

I've also looked at the code and we have various types of
CPU hotplug on x86 but they all build appropriate numbers of
Processor() entries in DSDT.
Arm likewise seems to build the right number of ACPI0007 entries
(and doesn't yet have CPU HP support).

If anyone can add a reference on why this is needed that would be very
helpful.

>
> As this patch series changes when arch_register_cpu() gets called (as
> described in the paragraph above) we obviously need to preserve the
> _existing_ behaviour to avoid causing regressions. So, if changing the
> kernel causes user visible regressions (e.g. sysfs entries to
> disappear) then it obviously _is_ a kernel problem that needs to be
> solved.
>
> We can't say "well fix QEMU then" without invoking the wrath of Linus.

Overall I'm fine with the defensive nature of this patch as there
'might' be firmware out there with this problem - I just can't establish
that there is! If anyone else recalls the history of this then give
a shout. I vaguely wondered if this was an ia64 thing but nope, QEMU
never generated tables for ia64 before dropping support back in QEMU 2.11.


>
> > > Signed-off-by: James Morse <[email protected]>
> > > Reviewed-by: Jonathan Cameron <[email protected]>
> > > Reviewed-by: Gavin Shan <[email protected]>
> > > Tested-by: Miguel Luis <[email protected]>
> > > Tested-by: Vishnu Pajjuri <[email protected]>
> > > Tested-by: Jianyong Wu <[email protected]>
> > > Signed-off-by: Russell King (Oracle) <[email protected]>
> > > ---
> > > drivers/acpi/acpi_processor.c | 19 +++++++++++++++++++
> > > 1 file changed, 19 insertions(+)
> > >
> > > diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
> > > index 6a542e0ce396..0511f2bc10bc 100644
> > > --- a/drivers/acpi/acpi_processor.c
> > > +++ b/drivers/acpi/acpi_processor.c
> > > @@ -791,6 +791,25 @@ void __init acpi_processor_init(void)
> > > acpi_pcc_cpufreq_init();
> > > }
> > >
> > > +static int __init acpi_processor_register_missing_cpus(void)
> > > +{
> > > + int cpu;
> > > +
> > > + if (acpi_disabled)
> > > + return 0;
> > > +
> > > + for_each_online_cpu(cpu) {
> > > + if (!get_cpu_device(cpu)) {
> > > + pr_err_once(FW_BUG "CPU %u has no ACPI namespace description!\n", cpu);
> > > + add_taint(TAINT_FIRMWARE_WORKAROUND, LOCKDEP_STILL_OK);
> > > + arch_register_cpu(cpu);
> >
> > Which part of this code is related to ACPI?
>
> That's a good question, and I suspect it would be more suited to being
> placed in drivers/base/cpu.c except for the problem that the error
> message refers to ACPI.
>
> As long as we keep the acpi_disabled test, I guess that's fine.
> cpu_dev_register_generic() there already tests acpi_disabled.
>
Moving it seems fine to me.

Jonathan


2024-01-22 17:21:43

by Rafael J. Wysocki

Subject: Re: [PATCH RFC v3 03/21] ACPI: processor: Register CPUs that are online, but not described in the DSDT

On Mon, Jan 22, 2024 at 5:02 PM Jonathan Cameron
<[email protected]> wrote:
>
> On Mon, 15 Jan 2024 11:06:29 +0000
> "Russell King (Oracle)" <[email protected]> wrote:
>
> > On Mon, Dec 18, 2023 at 09:22:03PM +0100, Rafael J. Wysocki wrote:
> > > On Wed, Dec 13, 2023 at 1:49 PM Russell King <[email protected]> wrote:
> > > >
> > > > From: James Morse <[email protected]>
> > > >
> > > > ACPI has two descriptions of CPUs, one in the MADT/APIC table, the other
> > > > in the DSDT. Both are required. (ACPI 6.5's 8.4 "Declaring Processors"
> > > > says "Each processor in the system must be declared in the ACPI
> > > > namespace"). Having two descriptions allows firmware authors to get
> > > > this wrong.
> > > >
> > > > If CPUs are described in the MADT/APIC, they will be brought online
> > > > early during boot. Once the register_cpu() calls are moved to ACPI,
> > > > they will be based on the DSDT description of the CPUs. When CPUs are
> > > > missing from the DSDT description, they will end up online, but not
> > > > registered.
> > > >
> > > > Add a helper that runs after acpi_init() has completed to register
> > > > CPUs that are online, but weren't found in the DSDT. Any CPU that
> > > > is registered by this code triggers a firmware-bug warning and kernel
> > > > taint.
> > > >
> > > > Qemu TCG only describes the first CPU in the DSDT, unless cpu-hotplug
> > > > is configured.
> > >
> > > So why is this a kernel problem?
> >
> > So what are you proposing should be the behaviour here? What this
> > statement seems to be saying is that QEMU as it exists today only
> > describes the first CPU in DSDT.
>
> This confuses me somewhat, because I'm far from sure which machines this
> is true for in QEMU. I'm guessing it's a legacy thing with
> some old distro version of QEMU - so we'll have to paper over it anyway
> but for current QEMU I'm not sure it's true.
>
> Helpfully there are a bunch of ACPI table tests so I've been checking
> through all the multi CPU cases.
>
> CPU hotplug not enabled.
> pc/DSDT.dimmpxm - 4x Processor entries. -smp 4
> pc/DSDT.acpihmat - 2x Processor entries. -smp 2
> q35/DSDT.acpihmat - 2x Processor entries. -smp 2
> virt/DSDT.acpihmatvirt - 4x ACPI0007 entries -smp 4
> q35/DSDT.acpihmat-noinitiator - 4 x Processor () entries -smp 4
> virt/DSDT.topology - 8x ACPI0007 entries
>
> I've also looked at the code and we have various types of
> CPU hotplug on x86 but they all build appropriate numbers of
> Processor() entries in DSDT.
> Arm likewise seems to build the right number of ACPI0007 entries
> (and doesn't yet have CPU HP support).
>
> If anyone can add a reference on why this is needed that would be very
> helpful.

Yes, it would.

Personally, I would prefer to assume that it is not necessary until it
turns out that (1) there is firmware with this issue actually in use
and (2) updating the firmware in question to follow the specification
is not practical.

Otherwise, we'd make it easier to ship non-compliant firmware for no
good reason.

2024-01-22 18:08:00

by Russell King (Oracle)

Subject: Re: [PATCH RFC v3 03/21] ACPI: processor: Register CPUs that are online, but not described in the DSDT

On Mon, Jan 22, 2024 at 04:02:27PM +0000, Jonathan Cameron wrote:
> On Mon, 15 Jan 2024 11:06:29 +0000
> "Russell King (Oracle)" <[email protected]> wrote:
>
> > On Mon, Dec 18, 2023 at 09:22:03PM +0100, Rafael J. Wysocki wrote:
> > > On Wed, Dec 13, 2023 at 1:49 PM Russell King <[email protected]> wrote:
> > > >
> > > > From: James Morse <[email protected]>
> > > >
> > > > ACPI has two descriptions of CPUs, one in the MADT/APIC table, the other
> > > > in the DSDT. Both are required. (ACPI 6.5's 8.4 "Declaring Processors"
> > > > says "Each processor in the system must be declared in the ACPI
> > > > namespace"). Having two descriptions allows firmware authors to get
> > > > this wrong.
> > > >
> > > > If CPUs are described in the MADT/APIC, they will be brought online
> > > > early during boot. Once the register_cpu() calls are moved to ACPI,
> > > > they will be based on the DSDT description of the CPUs. When CPUs are
> > > > missing from the DSDT description, they will end up online, but not
> > > > registered.
> > > >
> > > > Add a helper that runs after acpi_init() has completed to register
> > > > CPUs that are online, but weren't found in the DSDT. Any CPU that
> > > > is registered by this code triggers a firmware-bug warning and kernel
> > > > taint.
> > > >
> > > > Qemu TCG only describes the first CPU in the DSDT, unless cpu-hotplug
> > > > is configured.
> > >
> > > So why is this a kernel problem?
> >
> > So what are you proposing should be the behaviour here? What this
> > statement seems to be saying is that QEMU as it exists today only
> > describes the first CPU in DSDT.
>
> This confuses me somewhat, because I'm far from sure which machines this
> is true for in QEMU. I'm guessing it's a legacy thing with
> some old distro version of QEMU - so we'll have to paper over it anyway
> but for current QEMU I'm not sure it's true.
>
> Helpfully there are a bunch of ACPI table tests so I've been checking
> through all the multi CPU cases.
>
> CPU hotplug not enabled.
> pc/DSDT.dimmpxm - 4x Processor entries. -smp 4
> pc/DSDT.acpihmat - 2x Processor entries. -smp 2
> q35/DSDT.acpihmat - 2x Processor entries. -smp 2
> virt/DSDT.acpihmatvirt - 4x ACPI0007 entries -smp 4
> q35/DSDT.acpihmat-noinitiator - 4 x Processor () entries -smp 4
> virt/DSDT.topology - 8x ACPI0007 entries
>
> I've also looked at the code and we have various types of
> CPU hotplug on x86 but they all build appropriate numbers of
> Processor() entries in DSDT.
> Arm likewise seems to build the right number of ACPI0007 entries
> (and doesn't yet have CPU HP support).
>
> If anyone can add a reference on why this is needed that would be very
> helpful.

Maybe Salil can shed some light on this?

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

2024-01-22 18:28:18

by Jonathan Cameron

Subject: Re: [PATCH RFC v3 04/21] ACPI: processor: Register all CPUs from acpi_processor_get_info()

On Mon, 18 Dec 2023 21:30:50 +0100
"Rafael J. Wysocki" <[email protected]> wrote:

> On Wed, Dec 13, 2023 at 1:49 PM Russell King <[email protected]> wrote:
> >
> > From: James Morse <[email protected]>
> >
> > To allow ACPI to skip the call to arch_register_cpu() when the _STA
> > value indicates the CPU can't be brought online right now, move the
> > arch_register_cpu() call into acpi_processor_get_info().
>
> This kind of looks backwards to me and has a potential to become
> super-confusing.
>
> I would instead add a way for the generic code to ask the platform
> firmware whether or not the given CPU is enabled and so it can be
> registered.

Hi Rafael,

The ACPI interpreter isn't up at this stage so we'd need to pull that
forwards. I'm not sure if we can pull the interpreter init early enough.

Perhaps pushing the registration back in all cases is the way to go?
Given the ACPI interpreter is initialized via subsys_initcall() it would
need to be after that - I tried pushing cpu_dev_register_generic()
immediately after acpi_bus_init() and that seems fine.
We can't leave the rest of cpu_dev_init() that late because a bunch
of other stuff relies on it (CPU freq blows up first as a core_initcall()
on my setup).

So to make this work we need to always move the registration later
than the necessary infrastructure, perhaps to subsys_initcall_sync()
as is done for missing CPUs (we'd need to combine the two given that
needs to run after this, or potentially just stop checking for acpi_disabled
and don't taint the kernel!). I think this is probably the most consistent
option on the basis that it at least moves the registration to the same point
whatever is going on and can easily use the arch callback you suggest
to hide away the logic on deciding if a CPU is there or not.

What do you think is the best way to do this?
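
The ordering constraint described above can be sketched as a tiny model. The
enum lists initcall levels in the order the kernel runs them, as I understand
it; the names and helpers are hypothetical, and LVL_DRIVER_INIT stands for
work done from driver_init() before any initcall runs. Separate initcalls at
the same level are ordered by link order, so only a strictly later level is
treated as a guarantee here.

```c
#include <stdbool.h>

/* Hypothetical model of boot-time ordering (illustration only). */
enum init_level {
	LVL_DRIVER_INIT,	/* driver_init(): where cpu_dev_init() runs */
	LVL_CORE,		/* core_initcall(): e.g. the cpufreq core */
	LVL_POSTCORE,		/* postcore_initcall() */
	LVL_ARCH,		/* arch_initcall() */
	LVL_SUBSYS,		/* subsys_initcall(): ACPI interpreter comes up */
	LVL_SUBSYS_SYNC,	/* subsys_initcall_sync() */
	LVL_FS,			/* fs_initcall() */
	LVL_DEVICE,		/* device_initcall() */
	LVL_LATE,		/* late_initcall() */
};

/* Only a strictly later level guarantees ordering between two initcalls. */
static bool guaranteed_after(enum init_level later, enum init_level earlier)
{
	return later > earlier;
}

/* CPU registration has to wait until the ACPI interpreter is available. */
static bool registration_level_ok(enum init_level lvl)
{
	return guaranteed_after(lvl, LVL_SUBSYS);
}

/* The rest of cpu_dev_init() has to be done before core_initcall() users. */
static bool cpu_subsys_level_ok(enum init_level lvl)
{
	return guaranteed_after(LVL_CORE, lvl);
}
```

Under these assumptions subsys_initcall_sync() satisfies the first constraint
and no single level satisfies both, which is why only the registration is
moved while the rest of cpu_dev_init() stays where it is. Calling the
registration directly from the ACPI init path after acpi_bus_init(), as tried
above, gets its ordering from the call sequence rather than from the level.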


>
> > Systems can still be booted with 'acpi=off', or not include an
> > ACPI description at all. For these, the CPUs continue to be
> > registered by cpu_dev_register_generic().
> >
> > This moves the CPU register logic back to a subsys_initcall(),
> > while the memory nodes will have been registered earlier.
>
> Isn't this somewhat risky?
>
> > Signed-off-by: James Morse <[email protected]>
> > Reviewed-by: Gavin Shan <[email protected]>
> > Tested-by: Miguel Luis <[email protected]>
> > Tested-by: Vishnu Pajjuri <[email protected]>
> > Tested-by: Jianyong Wu <[email protected]>
> > Signed-off-by: Russell King (Oracle) <[email protected]>
> > ---
> > Changes since RFC v2:
> > * Fixup comment in acpi_processor_get_info() (Gavin Shan)
> > * Add comment in cpu_dev_register_generic() (Gavin Shan)
> > ---
> > drivers/acpi/acpi_processor.c | 12 ++++++++++++
> > drivers/base/cpu.c | 6 +++++-
> > 2 files changed, 17 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
> > index 0511f2bc10bc..e7ed4730cbbe 100644
> > --- a/drivers/acpi/acpi_processor.c
> > +++ b/drivers/acpi/acpi_processor.c
> > @@ -314,6 +314,18 @@ static int acpi_processor_get_info(struct acpi_device *device)
> > cpufreq_add_device("acpi-cpufreq");
> > }
> >
> > + /*
> > + * Register CPUs that are present. get_cpu_device() is used to skip
> > + * duplicate CPU descriptions from firmware.
> > + */
> > + if (!invalid_logical_cpuid(pr->id) && cpu_present(pr->id) &&
> > + !get_cpu_device(pr->id)) {
> > + int ret = arch_register_cpu(pr->id);
> > +
> > + if (ret)
> > + return ret;
> > + }
> > +
> > /*
> > * Extra Processor objects may be enumerated on MP systems with
> > * less than the max # of CPUs. They should be ignored _iff
> > diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
> > index 47de0f140ba6..13d052bf13f4 100644
> > --- a/drivers/base/cpu.c
> > +++ b/drivers/base/cpu.c
> > @@ -553,7 +553,11 @@ static void __init cpu_dev_register_generic(void)
> > {
> > int i, ret;
> >
> > - if (!IS_ENABLED(CONFIG_GENERIC_CPU_DEVICES))
> > + /*
> > + * When ACPI is enabled, CPUs are registered via
> > + * acpi_processor_get_info().
> > + */
> > + if (!IS_ENABLED(CONFIG_GENERIC_CPU_DEVICES) || !acpi_disabled)
> > return;
> >
> > for_each_present_cpu(i) {
> > --
> > 2.30.2
> >
> >
>
> _______________________________________________
> linux-arm-kernel mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel


2024-01-22 18:42:12

by Jonathan Cameron

Subject: Re: [PATCH RFC v3 05/21] ACPI: Rename ACPI_HOTPLUG_CPU to include 'present'

On Mon, 18 Dec 2023 21:35:16 +0100
"Rafael J. Wysocki" <[email protected]> wrote:

> On Wed, Dec 13, 2023 at 1:49 PM Russell King <[email protected]> wrote:
> >
> > From: James Morse <[email protected]>
> >
> > The code behind ACPI_HOTPLUG_CPU allows a not-present CPU to become
> > present.
>
> Right.
>
> > This isn't the only use of HOTPLUG_CPU. On arm64 and riscv
> > CPUs can be taken offline as a power saving measure.
>
> But still there is the case in which a non-present CPU can become
> present, isn't it there?

Not yet defined by the architectures (and I'm assuming it probably never will be).

The original proposal we took to ARM was to do exactly that - they pushed
back hard on the basis there was no architecturally safe way to implement it.
Too much of the ARM arch has to exist from the start of time.

https://lore.kernel.org/linux-arm-kernel/[email protected]/
is one of the relevant threads of the kernel side of that discussion.

Not to put specific words into the ARM architects' mouths, but the
short description is that there is currently no demand for working
out how to make physical CPU hotplug possible, as such they will not
provide an architecturally compliant way to do it for virtual CPU hotplug and
another means is needed (which is why this series doesn't use the present bit
for that purpose and we have the Online capable bit in MADT/GICC).

It was a 'fun' dance of several years to get to that clarification.
As another fun fact, the same is defined for x86, but I don't think
anyone has used it yet (GICC for ARM has an online capable bit in the flags to
enable this, which was remarkably similar to the online capable bit in the
flags of the Local APIC entries as added fairly recently).
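
As a rough illustration of the two flag layouts mentioned above, the sketch
below classifies a MADT CPU entry from its flags word. The bit positions are
my reading of the ACPI 6.5 tables (Enabled in bit 0 for both, Online Capable
in bit 3 for GICC and bit 1 for the Local APIC) and should be checked against
the specification; the macro and function names are hypothetical, not the
kernel's.

```c
#include <stdint.h>

/* Assumed bit positions, per my reading of ACPI 6.5 (verify before use). */
#define MADT_FLAG_ENABLED		(1u << 0)
#define MADT_GICC_ONLINE_CAPABLE	(1u << 3)
#define MADT_LAPIC_ONLINE_CAPABLE	(1u << 1)

enum cpu_avail {
	CPU_UNUSABLE,		/* neither bit set: OS must ignore the entry */
	CPU_ENABLED,		/* usable at boot */
	CPU_ONLINE_CAPABLE,	/* not usable now, may be enabled at runtime */
};

/*
 * Classify one MADT entry. online_capable_bit selects the layout:
 * MADT_GICC_ONLINE_CAPABLE or MADT_LAPIC_ONLINE_CAPABLE.
 */
static enum cpu_avail madt_cpu_avail(uint32_t flags, uint32_t online_capable_bit)
{
	/* Enabled takes precedence; Online Capable only matters when it is clear. */
	if (flags & MADT_FLAG_ENABLED)
		return CPU_ENABLED;

	if (flags & online_capable_bit)
		return CPU_ONLINE_CAPABLE;

	return CPU_UNUSABLE;
}
```

The same helper covers both architectures because only the position of the
Online Capable bit differs between the two structures.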

>
> > On arm64 an offline CPU may be disabled by firmware, preventing it from
> > being brought back online, but it remains present throughout.
> >
> > Adding code to prevent user-space trying to online these disabled CPUs
> > needs some additional terminology.
> >
> > Rename the Kconfig symbol CONFIG_ACPI_HOTPLUG_PRESENT_CPU to reflect
> > that it makes possible CPUs present.
>
> Honestly, I don't think that this change is necessary or even useful.

Whilst it's an attempt to avoid future confusion, the rename is
not something I really care about so my advice to Russell is drop
it unless you are attached to it!

Jonathan




2024-01-22 18:44:12

by Rafael J. Wysocki

Subject: Re: [PATCH RFC v3 04/21] ACPI: processor: Register all CPUs from acpi_processor_get_info()

On Mon, Jan 22, 2024 at 6:44 PM Jonathan Cameron
<[email protected]> wrote:
>
> On Mon, 18 Dec 2023 21:30:50 +0100
> "Rafael J. Wysocki" <[email protected]> wrote:
>
> > On Wed, Dec 13, 2023 at 1:49 PM Russell King <[email protected]> wrote:
> > >
> > > From: James Morse <[email protected]>
> > >
> > > To allow ACPI to skip the call to arch_register_cpu() when the _STA
> > > value indicates the CPU can't be brought online right now, move the
> > > arch_register_cpu() call into acpi_processor_get_info().
> >
> > This kind of looks backwards to me and has a potential to become
> > super-confusing.
> >
> > I would instead add a way for the generic code to ask the platform
> > firmware whether or not the given CPU is enabled and so it can be
> > registered.
>
> Hi Rafael,
>
> The ACPI interpreter isn't up at this stage so we'd need to pull that
> forwards. I'm not sure if we can pull the interpreter init early enough.

Well, this patch effectively defers the AP registration to the time
when acpi_processor_get_info() runs and the interpreter is up and
running then.

For consistency, it would be better to defer the AP registration in
general to that point.

> Perhaps pushing the registration back in all cases is the way to go?
> Given the ACPI interpreter is initialized via subsys_initcall() it would
> need to be after that - I tried pushing cpu_dev_register_generic()
> immediately after acpi_bus_init() and that seems fine.

Sounds promising.

> We can't leave the rest of cpu_dev_init() that late because a bunch
> of other stuff relies on it (CPU freq blows up first as a core_init()
> on my setup).

I see.

> So to make this work we need it to always move the registration later
> than the necessary infrastructure, perhaps to subsys_initcall_sync()
> as is done for missing CPUs (we'd need to combine the two given that
> needs to run after this, or potentially just stop checking for acpi_disabled
> and don't taint the kernel!). I think this is probably the most consistent
> option on the basis that it at least moves the registration to the same point
> whatever is going on and can easily use the arch callback you suggest
> to hide away the logic on deciding if a CPU is there or not.

I agree.
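
The ordering being agreed here can be modelled in a few lines of standalone C.
This is only a sketch, not kernel code: struct boot_state, subsys_level(),
subsys_sync_level() and the cpu_enabled_fn callback are hypothetical names that
stand in for the subsys_initcall() stage (where acpi_init() runs), the
subsys_initcall_sync() stage, and the arch callback suggested above.

```c
#include <stdbool.h>

#define NR_CPUS 4

/* Hypothetical model of the boot state relevant to CPU registration. */
struct boot_state {
	bool interpreter_up;        /* set once the ACPI interpreter is running */
	bool registered[NR_CPUS];   /* which CPUs have been registered */
};

/* Hypothetical arch callback: may this CPU be registered right now? */
typedef bool (*cpu_enabled_fn)(int cpu);

/* Stand-in for the subsys_initcall() level, where acpi_init() runs. */
static void subsys_level(struct boot_state *s)
{
	s->interpreter_up = true;
}

/*
 * Stand-in for the subsys_initcall_sync() level: every CPU is registered
 * from the same point, after the interpreter is available.
 * Returns the number of CPUs registered, or -1 if called too early.
 */
static int subsys_sync_level(struct boot_state *s, cpu_enabled_fn enabled)
{
	int n = 0;

	if (!s->interpreter_up)
		return -1;

	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (enabled(cpu)) {
			s->registered[cpu] = true;
			n++;
		}
	}
	return n;
}

/* Example policy (an assumption): CPUs 0 and 1 are enabled, the rest not yet. */
static bool demo_enabled(int cpu)
{
	return cpu < 2;
}
```

The point of the model is only that registration happens at one place for all
CPUs, and that the decision about whether a CPU is there is hidden behind the
callback.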

2024-01-23 09:27:46

by Jonathan Cameron

Subject: Re: [PATCH RFC v3 03/21] ACPI: processor: Register CPUs that are online, but not described in the DSDT

On Mon, 22 Jan 2024 17:30:05 +0000
"Russell King (Oracle)" <[email protected]> wrote:

> On Mon, Jan 22, 2024 at 05:22:46PM +0100, Rafael J. Wysocki wrote:
> > On Mon, Jan 22, 2024 at 5:02 PM Jonathan Cameron
> > <[email protected]> wrote:
> > >
> > > On Mon, 15 Jan 2024 11:06:29 +0000
> > > "Russell King (Oracle)" <[email protected]> wrote:
> > >
> > > > On Mon, Dec 18, 2023 at 09:22:03PM +0100, Rafael J. Wysocki wrote:
> > > > > On Wed, Dec 13, 2023 at 1:49 PM Russell King <[email protected]> wrote:
> > > > > >
> > > > > > From: James Morse <[email protected]>
> > > > > >
> > > > > > ACPI has two descriptions of CPUs, one in the MADT/APIC table, the other
> > > > > > in the DSDT. Both are required. (ACPI 6.5's 8.4 "Declaring Processors"
> > > > > > says "Each processor in the system must be declared in the ACPI
> > > > > > namespace"). Having two descriptions allows firmware authors to get
> > > > > > this wrong.
> > > > > >
> > > > > > If CPUs are described in the MADT/APIC, they will be brought online
> > > > > > early during boot. Once the register_cpu() calls are moved to ACPI,
> > > > > > they will be based on the DSDT description of the CPUs. When CPUs are
> > > > > > missing from the DSDT description, they will end up online, but not
> > > > > > registered.
> > > > > >
> > > > > > Add a helper that runs after acpi_init() has completed to register
> > > > > > CPUs that are online, but weren't found in the DSDT. Any CPU that
> > > > > > is registered by this code triggers a firmware-bug warning and kernel
> > > > > > taint.
> > > > > >
> > > > > > Qemu TCG only describes the first CPU in the DSDT, unless cpu-hotplug
> > > > > > is configured.
> > > > >
> > > > > So why is this a kernel problem?
> > > >
> > > > So what are you proposing should be the behaviour here? What this
> > > > statement seems to be saying is that QEMU as it exists today only
> > > > describes the first CPU in DSDT.
> > >
> > > This confuses me somewhat, because I'm far from sure which machines this
> > > is true for in QEMU. I'm guessing it's a legacy thing with
> > > some old distro version of QEMU - so we'll have to paper over it anyway
> > > but for current QEMU I'm not sure it's true.
> > >
> > > Helpfully there are a bunch of ACPI table tests so I've been checking
> > > through all the multi CPU cases.
> > >
> > > CPU hotplug not enabled.
> > > pc/DSDT.dimmpxm - 4x Processor entries. -smp 4
> > > pc/DSDT.acpihmat - 2x Processor entries. -smp 2
> > > q35/DSDT.acpihmat - 2x Processor entries. -smp 2
> > > virt/DSDT.acpihmatvirt - 4x ACPI0007 entries -smp 4
> > > q35/DSDT.acpihmat-noinitiator - 4 x Processor () entries -smp 4
> > > virt/DSDT.topology - 8x ACPI0007 entries
> > >
> > > I've also looked at the code and we have various types of
> > > CPU hotplug on x86 but they all build appropriate numbers of
> > > Processor() entries in DSDT.
> > > Arm likewise seems to build the right number of ACPI0007 entries
> > > (and doesn't yet have CPU HP support).
> > >
> > > If anyone can add a reference on why this is needed that would be very
> > > helpful.
> >
> > Yes, it would.
> >
> > Personally, I would prefer to assume that it is not necessary until it
> > turns out that (1) there is firmware with this issue actually in use
> > and (2) updating the firmware in question to follow the specification
> > is not practical.
> >
> > Otherwise, we'd make it easier to ship non-compliant firmware for no
> > good reason.
>
> If Salil can't come up with a reason, then I'm in favour of dropping
> the patch like already done for patch 2. If the code change serves no
> useful purpose, there's no point in making the change.
>

Salil's out today, but I've messaged him to follow up later in the week.

It 'might' be the odd cold plug path where QEMU half comes up, then extra
CPUs are added, then it boots. (used by some orchestration frameworks)
I don't have a set up for that and I won't get to creating one today anyway
(we all love start of the year planning workshops!)

I've +CC'd a few people who have run tests on the various iterations of this
work in the past. Maybe one of them can shed some light on this?

Jonathan





2024-01-23 10:10:01

by Jonathan Cameron

Subject: Re: [PATCH RFC v3 14/21] irqchip/gic-v3: Don't return errors from gic_acpi_match_gicc()

On Tue, 9 Jan 2024 19:27:20 +0000
"Russell King (Oracle)" <[email protected]> wrote:

> On Fri, Dec 15, 2023 at 04:33:01PM +0000, Jonathan Cameron wrote:
> > On Wed, 13 Dec 2023 12:50:23 +0000
> > Russell King (Oracle) <[email protected]> wrote:
> >
> > > From: James Morse <[email protected]>
> > >
> > > gic_acpi_match_gicc() is only called via gic_acpi_count_gicr_regions().
> > > It should only count the number of enabled redistributors, but it
> > > also tries to sanity check the GICC entry, currently returning an
> > > error if the Enabled bit is set, but the gicr_base_address is zero.
> > >
> > > Adding support for the online-capable bit to the sanity check
> > > complicates it, for no benefit. The existing check implicitly
> > > depends on gic_acpi_count_gicr_regions() previously failing to find
> > > any GICR regions (as it is valid to have gicr_base_address of zero if
> > > the redistributors are described via a GICR entry).
> > >
> > > Instead of complicating the check, remove it. Failures that happen
> > > at this point cause the irqchip not to register, meaning no irqs
> > > can be requested. The kernel grinds to a panic() pretty quickly.
> > >
> > > Without the check, MADT tables that exhibit this problem are still
> > > caught by gic_populate_rdist(), which helpfully also prints what
> > > went wrong:
> > > | CPU4: mpidr 100 has no re-distributor!
> > >
> > > Signed-off-by: James Morse <[email protected]>
> > > Reviewed-by: Gavin Shan <[email protected]>
> > > Tested-by: Miguel Luis <[email protected]>
> > > Tested-by: Vishnu Pajjuri <[email protected]>
> > > Tested-by: Jianyong Wu <[email protected]>
> > > Signed-off-by: Russell King (Oracle) <[email protected]>
> > > ---
> > > drivers/irqchip/irq-gic-v3.c | 18 ++++++------------
> > > 1 file changed, 6 insertions(+), 12 deletions(-)
> > >
> > > diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
> > > index 98b0329b7154..ebecd4546830 100644
> > > --- a/drivers/irqchip/irq-gic-v3.c
> > > +++ b/drivers/irqchip/irq-gic-v3.c
> > > @@ -2420,21 +2420,15 @@ static int __init gic_acpi_match_gicc(union acpi_subtable_headers *header,
> > >
> > > /*
> > > * If GICC is enabled and has valid gicr base address, then it means
> > > - * GICR base is presented via GICC
> > > + * GICR base is presented via GICC. The redistributor is only known to
> > > + * be accessible if the GICC is marked as enabled. If this bit is not
> > > + * set, we'd need to add the redistributor at runtime, which isn't
> > > + * supported.
> > > */
> > > - if (acpi_gicc_is_usable(gicc) && gicc->gicr_base_address) {
> > > + if (gicc->flags & ACPI_MADT_ENABLED && gicc->gicr_base_address)
> >
> > I was very vague in the previous review. I think the reasons you are switching
> > from acpi_gicc_is_usable(gicc) to the gicc->flags & ACPI_MADT_ENABLED
> > need calling out as I'm fairly sure that at this point in the series at least
> > acpi_gicc_is_usable is the same as current upstream:
> >
> > static inline bool acpi_gicc_is_usable(struct acpi_madt_generic_interrupt *gicc)
> > {
> > return gicc->flags & ACPI_MADT_ENABLED;
> > }
>
> In a previous patch adding acpi_gicc_is_usable() c54e52f84d7a ("arm64,
> irqchip/gic-v3, ACPI: Move MADT GICC enabled check into a helper") this
> was:
>
> - if ((gicc->flags & ACPI_MADT_ENABLED) && gicc->gicr_base_address) {
> + if (acpi_gicc_is_usable(gicc) && gicc->gicr_base_address) {
>
> so effectively this is undoing that particular change, which raises in
> my mind why the change was made in the first place if it's just going
> to be reverted in a later patch (because in a following patch,
> acpi_gicc_is_usable() has an additional condition added to it that
> isn't applicable here.) which effectively makes acpi_gicc_is_usable()
> return true if either ACPI_MADT_ENABLED _or_
> ACPI_MADT_GICC_ONLINE_CAPABLE (as it is now known) are set.

Ok. So maybe just calling out that we are about to change the meaning
of acpi_gicc_is_usable() so need to partly revert that earlier patch
to make use of it everywhere.

Or perhaps introduce
acpi_gicc_is_enabled() which is called by acpi_gicc_is_usable()
along with the new conditions when they are added. Though, as you
say later, what does usable mean?

>
> However, if ACPI_MADT_GICC_ONLINE_CAPABLE is set, does that actually
> mean that the GICC is usable? I'm not sure it does. ACPI v6.5 says that
> this bit indicates that the system supports enabling this processor
> later. Is the GICC of a currently disabled processor "usable"...

I agree, this is confusing.

acpi_gicc_may_be_usable()?

Or invert it in all places to give a cleaner meaning
!acpi_gicc_never_usable()

Bit of a pain to change this throughout again, but maybe necessary
to avoid confusion in future.

>
> Clearly, the intention of this change is not to count this GICC entry
> if it is marked ACPI_MADT_GICC_ONLINE_CAPABLE, but I feel that isn't
> described in the commit message.

Agreed, though that only happens in the next patch so easier to describe
there or via a patch adding initially identical multiple helper functions
that then diverge in following patch?

Whilst a helper for this one location seems silly it would let us put
the two helpers next to each other where the distinction is obvious.

>
> Moreover, I am getting the feeling that there are _two_ changes going
> on here - there's the change that's talked about in the commit message
> (the complex validation that seems unnecessary) and then there's the
> preparation for the change to acpi_gicc_is_usable() - which maybe
> should be in the following patch where it would be less confusing.

Agreed.

>
> Would you agree?
>
Yes, the move would help as then it's obvious why this needs to change
and that is separate from the naming question.

So in conclusion, I agree with everything you've called out on this one,
up to you to pick which solution cleans this up. I think the options are:
1) Just move the change to the next patch where it's easier to describe.
Leaves the odd 'usable' behind.
2) Rename usable() to something else, maybe inverting the logic as
!never is easier than now_or_maybe_later.
3) Possibly add another helper for this new case which starts as matching
the existing one, but diverges in a later patch (it should still not be
in this patch which, as you observe, is doing something else and I think
is actually a bug fix anyway, albeit one that has never mattered for
any shipping firmware).
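
To make options 2 and 3 concrete, a minimal standalone sketch (not kernel
code) of how the helpers could sit next to each other. struct gicc_entry is a
cut-down stand-in for struct acpi_madt_generic_interrupt,
gicc_counts_as_gicr_region() is a hypothetical name for the check in
gic_acpi_match_gicc(), and the flag bit positions are written out from my
reading of the ACPI 6.5 GICC flags, so treat them as assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

/* GICC flag bits; values assumed from the ACPI 6.5 MADT GICC description. */
#define ACPI_MADT_ENABLED             (1u << 0)
#define ACPI_MADT_GICC_ONLINE_CAPABLE (1u << 3)

/* Cut-down stand-in for struct acpi_madt_generic_interrupt. */
struct gicc_entry {
	uint32_t flags;
	uint64_t gicr_base_address;
};

/* The CPU is usable right now. */
static bool acpi_gicc_is_enabled(const struct gicc_entry *gicc)
{
	return (gicc->flags & ACPI_MADT_ENABLED) != 0;
}

/* Neither enabled now nor able to be enabled later. */
static bool acpi_gicc_never_usable(const struct gicc_entry *gicc)
{
	return (gicc->flags &
		(ACPI_MADT_ENABLED | ACPI_MADT_GICC_ONLINE_CAPABLE)) == 0;
}

/*
 * Only redistributors of currently enabled CPUs are counted, because
 * adding a redistributor at runtime isn't supported.
 */
static bool gicc_counts_as_gicr_region(const struct gicc_entry *gicc)
{
	return acpi_gicc_is_enabled(gicc) && gicc->gicr_base_address != 0;
}
```

With the two predicates side by side, an online-capable but disabled entry is
visibly "not never usable" while still being skipped when counting GICR
regions.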

Jonathan



2024-01-23 10:27:42

by Jonathan Cameron

Subject: Re: [PATCH RFC v3 17/21] ACPI: add support to register CPUs based on the _STA enabled bit

On Tue, 2 Jan 2024 14:53:20 +0000
Jonathan Cameron <[email protected]> wrote:

> On Mon, 18 Dec 2023 13:03:32 +0000
> "Russell King (Oracle)" <[email protected]> wrote:
>
> > On Wed, Dec 13, 2023 at 12:50:38PM +0000, Russell King wrote:
> > > From: James Morse <[email protected]>
> > >
> > > acpi_processor_get_info() registers all present CPUs. Registering a
> > > CPU is what creates the sysfs entries and triggers the udev
> > > notifications.
> > >
> > > arm64 virtual machines that support 'virtual cpu hotplug' use the
> > > enabled bit to indicate whether the CPU can be brought online, as
> > > the existing ACPI tables require all hardware to be described and
> > > present.
> > >
> > > If firmware describes a CPU as present, but disabled, skip the
> > > registration. Such CPUs are present, but can't be brought online for
> > > whatever reason. (e.g. firmware/hypervisor policy).
> > >
> > > Once firmware sets the enabled bit, the CPU can be registered and
> > > brought online by user-space. Online CPUs, or CPUs that are missing
> > > an _STA method must always be registered.
> >
> > ...
> >
> > > @@ -526,6 +552,9 @@ static void acpi_processor_post_eject(struct acpi_device *device)
> > > acpi_processor_make_not_present(device);
> > > return;
> > > }
> > > +
> > > + if (cpu_present(pr->id) && !(sta & ACPI_STA_DEVICE_ENABLED))
> > > + arch_unregister_cpu(pr->id);
> >
> > This change isn't described in the commit log, but seems to be the cause
> > of the build error identified by the kernel build bot that is fixed
> > later in this series. I'm wondering whether this should be in a
> > different patch, maybe "ACPI: Check _STA present bit before making CPUs
> > not present" ?
>
> Would seem a bit odd to call arch_unregister_cpu() way before the code
> is added to call the matching arch_register_cpu().
>
> Mind you this eject doesn't just apply to those CPUs that are registered
> later I think, but instead to all. So we run into the spec hole that
> there is no way to identify initially 'enabled' CPUs that might be disabled
> later.
>
> >
> > Or maybe my brain isn't working properly (due to being Covid positive.)
> > Any thoughts, Jonathan?
>
> I'll go with a resounding 'not sure' on where this change belongs.
> I blame my non existent start of the year hangover.
> Hope you have recovered!

Looking again, I think you were right, move it to that earlier patch.
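
For reference, the decision the reworked acpi_processor_post_eject() ends up
making can be written as a small standalone function. This is a sketch only:
enum eject_action and post_eject_action() are hypothetical names, and the
first branch (the _STA present check) is an assumption taken from the title of
the earlier patch, as that condition isn't visible in the hunk quoted above.

```c
#include <stdbool.h>

/* _STA bits as defined by ACPI. */
#define ACPI_STA_DEVICE_PRESENT 0x01
#define ACPI_STA_DEVICE_ENABLED 0x02

/* Hypothetical result type describing what post-eject should do. */
enum eject_action {
	EJECT_NONE,             /* nothing to undo */
	EJECT_MAKE_NOT_PRESENT, /* acpi_processor_make_not_present() path */
	EJECT_UNREGISTER,       /* arch_unregister_cpu() path */
};

static enum eject_action post_eject_action(unsigned long long sta,
					   bool cpu_is_present)
{
	/* Assumed first check: firmware says the CPU is physically gone. */
	if (!(sta & ACPI_STA_DEVICE_PRESENT))
		return EJECT_MAKE_NOT_PRESENT;

	/* Still present but no longer enabled: only drop the registration. */
	if (cpu_is_present && !(sta & ACPI_STA_DEVICE_ENABLED))
		return EJECT_UNREGISTER;

	return EJECT_NONE;
}
```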

J
>
> Jonathan
>
>


2024-01-23 10:31:00

by Jonathan Cameron

Subject: Re: [PATCH RFC v3 18/21] ACPI: processor: Only call arch_unregister_cpu() if HOTPLUG_CPU is selected

On Mon, 18 Dec 2023 12:58:07 +0000
"Russell King (Oracle)" <[email protected]> wrote:

> On Fri, Dec 15, 2023 at 04:50:09PM +0000, Jonathan Cameron wrote:
> > On Wed, 13 Dec 2023 12:50:43 +0000
> > Russell King (Oracle) <[email protected]> wrote:
> >
> > > From: James Morse <[email protected]>
> > >
> > > The kbuild robot points out that configurations without HOTPLUG_CPU
> > > selected can try to build acpi_processor_post_eject() without success
> > > as arch_unregister_cpu() is not defined.
> > >
> > > Check this explicitly. This will be merged into:
> > > | ACPI: Add post_eject to struct acpi_scan_handler for cpu hotplug
> > > for any subsequent posting.
> > >
> > > Reported-by: kbuild test robot <[email protected]>
> > > Signed-off-by: James Morse <[email protected]>
> > > Tested-by: Miguel Luis <[email protected]>
> > > Tested-by: Vishnu Pajjuri <[email protected]>
> > > Tested-by: Jianyong Wu <[email protected]>
> > > ---
> > > This should probably be squashed into an earlier patch.
> >
> > Agreed. If not
> > Reviewed-by: Jonathan Cameron <[email protected]>
>
> I'm not convinced that "ACPI: Add post_eject to struct acpi_scan_handler
> for cpu hotplug" is the correct commit to squash this into.
>
> As far as acpi_processor.c is concerned, this commit merely renames
> acpi_processor_remove() to be acpi_processor_post_eject(). The function
> references arch_unregister_cpu() before and after this change, and its
> build is dependent on CONFIG_ACPI_HOTPLUG_PRESENT_CPU being defined.
>
> Commit "ACPI: convert acpi_processor_post_eject() to use IS_ENABLED()"
> removed the ifdef CONFIG_ACPI_HOTPLUG_PRESENT_CPU surrounding
> acpi_processor_post_eject, and that symbol depends on
> CONFIG_HOTPLUG_CPU, so I think this commit is also fine.
>
> Commit "ACPI: Check _STA present bit before making CPUs not present"
> rewrites the function - the original body gets called
> acpi_processor_make_not_present() and a new acpi_processor_post_eject()
> is created. At this point, it doesn't reference arch_unregister_cpu().
>
> Commit "ACPI: add support to register CPUs based on the _STA enabled
> bit" adds a reference to arch_unregister_cpu() in this new
> acpi_processor_post_eject() - so I think this is the correct commit
> this change should be merged into.

That or where that change ends up given your earlier suggestion to
move that change as well. I find it hard to care as long as
the bisection issue is squashed by the change. If we make the code
drop out before the build issue is introduced that's fine because
we are arguing we shouldn't be running it anyway so such protection
is fine if not necessary for build fix purposes.

J

>


2024-01-23 10:58:52

by Jonathan Cameron

Subject: Re: [PATCH RFC v3 20/21] ACPI: Add _OSC bits to advertise OS support for toggling CPU present/enabled

On Tue, 2 Jan 2024 15:35:45 +0000
Jose Marinho <[email protected]> wrote:

> > -----Original Message-----
> > From: Jonathan Cameron <[email protected]>
> > Sent: Tuesday, January 2, 2024 3:17 PM
> > To: Jose Marinho <[email protected]>
> > Cc: Russell King (Oracle) <[email protected]>; linux-
> > [email protected]; [email protected]; [email protected];
> > [email protected]; [email protected]; linux-arm-
> > [email protected]; [email protected];
> > [email protected]; [email protected]; acpica-
> > [email protected]; [email protected]; linux-
> > [email protected]; [email protected]; [email protected];
> > Salil Mehta <[email protected]>; Jean-Philippe Brucker <jean-
> > [email protected]>; Jianyong Wu <[email protected]>; Justin He
> > <[email protected]>; James Morse <[email protected]>; Samer El-Haj-
> > Mahmoud <[email protected]>; nd <[email protected]>; Kangkang
> > Shen <[email protected]>
> > Subject: Re: [PATCH RFC v3 20/21] ACPI: Add _OSC bits to advertise OS support
> > for toggling CPU present/enabled
> >
> > On Tue, 2 Jan 2024 13:07:25 +0000
> > Jose Marinho <[email protected]> wrote:
> >
> > > Hi Jonathan,
> > >
> > > > -----Original Message-----
> > > > From: Jonathan Cameron <[email protected]>
> > > > Sent: Friday, December 15, 2023 5:12 PM
> > > > To: Russell King (Oracle) <[email protected]>
> > > > Cc: [email protected]; [email protected]; linux-
> > > > [email protected]; [email protected]; linux-
> > > > [email protected]; [email protected]; linux-
> > > > [email protected]; [email protected]; [email protected];
> > > > acpica- [email protected]; [email protected];
> > > > linux- [email protected]; [email protected]; linux-
> > > > [email protected]; Salil Mehta <[email protected]>;
> > > > Jean-Philippe Brucker <[email protected]>; Jianyong Wu
> > > > <[email protected]>; Justin He <[email protected]>; James Morse
> > > > <[email protected]>; Jose Marinho <[email protected]>; Samer
> > > > El-Haj-Mahmoud <Samer.El- [email protected]>
> > > > Subject: Re: [PATCH RFC v3 20/21] ACPI: Add _OSC bits to advertise
> > > > OS support for toggling CPU present/enabled
> > > >
> > > > On Wed, 13 Dec 2023 12:50:54 +0000
> > > > Russell King (Oracle) <[email protected]> wrote:
> > > >
> > > > > From: James Morse <[email protected]>
> > > > >
> > > > > > Platform firmware can disable a CPU, or make it not-present by
> > > > > making an eject-request notification, then waiting for the os to
> > > > > make it offline
> > > > OS
> > > >
> > > > > and call _EJx. After the firmware updates _STA with the new status.
> > > > >
> > > > > > Not all operating systems support this. For arm64 making CPUs
> > > > > > not-present has never been supported. For all ACPI architectures,
> > > > > > making CPUs disabled has recently been added. Firmware can't know
> > > > > > what the OS has support for.
> > > > > >
> > > > > > Add two new _OSC bits to advertise whether the OS supports the
> > > > > > _STA enabled or present bits being toggled for CPUs. This will be
> > > > > > important for arm64 if systems that support physical CPU hotplug
> > > > > > ever appear as arm64 linux doesn't currently support this, so
> > > > > > firmware shouldn't try.
> > > > > >
> > > > > > Advertising this support to firmware is useful for cloud
> > > > > > orchestrators to know whether they can scale a particular VM by
> > > > > > adding CPUs.
> > > > >
> > > > > Signed-off-by: James Morse <[email protected]>
> > > > > Tested-by: Miguel Luis <[email protected]>
> > > > > Tested-by: Vishnu Pajjuri <[email protected]>
> > > > > Tested-by: Jianyong Wu <[email protected]>
> > > >
> > > > I'm very much in favor of this _OSC but it hasn't been accepted yet I think...
> > > > https://bugzilla.tianocore.org/show_bug.cgi?id=4481
> > > >
> > > > Jose? Github suggests you are the proposer on this.
> > >
> > > The addition of these _OSC bits was proposed by us on the forum in question.
> > > The forum opted to pause the definition until additional practical information
> > > could be provided on the use-cases.
> > >
> > > If anyone is interested in progressing the _OSC bit definition, you are invited to
> > > express that interest in the Bugzilla ticket.
> >
> > I've poked around a bit and can't find any reference to how to actually get a
> > bugzilla account bugzilla.tianocore.org. Any pointers? I'm sure I had one at one
> > stage, but trying every plausible email address and the forgotten password link
> > got me nowhere.
> >
>
> The procedure to get a new account is described here: https://github.com/tianocore/tianocore.github.io/wiki/Reporting-Issues
> The immediate next steps are:
> - Join https://edk2.groups.io/g/devel, and subscribe edk2 | devel group.
> - Send the email with the detail reason to Bugzilla Admin ([email protected]) , this email address will be created as Bugzilla account.
>
> > > Information that you should provide to increase the chances of the ticket being
> > > reopened:
> > > - use-case for the new _OSC bits,
> >
> > Really annoying without it as a hypervisor can't query if a guest can do anything
> > useful if the host does virtual CPU hotplug via this newly added route.
> > Given this is new functionality and there is non trivial effort required by the host
> > to instantiate such a CPU it would be nice to be able to find out if the feature is
> > supported by the Guest OS without having to basically suck it and see with
> > hypervisors having to do a trial hotplug just to see if it 'might' work.
> >
> > > - what breaks (if anything) without the proposed _OSC bits.
> >
> > Nothing breaks - you can merrily poke in hotplugged CPUs with the attendant
> > creation of resources in the host and have them disappear into a black hole.
> > That's ugly but not broken as such. Hopefully a hypervisor will not keep trying
> > until the first attempt either succeeds or fails.
> >
> > >
> > > We did receive additional comments:
> > > - the proposed _OSC bits are not generic: the bits simply convey whether the
> > > guest OS understands CPU hot-plug, but it says nothing about the number of CPUs
> > > that the OS supports.
> >
> > If a guest says it supports this feature, you would hope it supports it for the
> > number of CPUs that have the present bit set but the enabled not.
> > I'd clarify that in the text rather than provide a means of querying the number of
> > CPUs supported.
> > Number wouldn't be sufficient anyway as it wouldn't indicate 'which' CPUs are
> > supported.
> > Nothing says they have to be contiguous or lowest IDs etc.
> >
> > > - There could be alternate schemes that do not rely on spec changes. E.g. there
> > > could be a hypervisor IMPDEF mechanism to describe if an OS image supports
> > > CPU hot-plug.
> >
> > Sigh. Yes, that could be done at the cost of every guest having to be made aware
> > of every hypervisor impdef mechanism. Trying to avoid that mess is why I think
> > an _OSC makes sense as then everyone can use the same control.
> >
> > No particular reason we should use ACPI at all for VMs :)
> >
> > >
> > > >
> > > > btw v4 looks ok but v5 in the tianocore github seems to have lost
> > > > the actual OSC part.
> > >
> > > Agree that, if we do progress with this spec change, v4 is the correct formulation
> > > we should adopt.
> > >
> > Thanks for the update.
> >
> > Overall this is a question we need to resolve soon. If this code otherwise goes in
> > linux without the OSC we will always need to support the 'suck it and see'
> > approach as we'll never know if the guest fell down the hole. Thus if not added
> > soon we might as well not add it at all and we'll all be looking at the code and
> > thinking "that's ugly and shouldn't have been necessary" for years to come.
> >
> > +CC Kangkang as he might be able to help get this started again.
>
> We're keen to support the progress of this ECR.

So work is underway on kicking this off again, but I think we need a backup plan
(even if it is a bit ugly) as I really don't want the kernel code to get caught
behind an ASWG discussion that might not end up with the answer we want anyway.

So even if we eventually land this _OSC in the spec, I think we will have
systems where it's unknowable if they support this feature or not.
That is, the 'suck it and see' approach will be necessary. If an orchestrator
really wants to know if this is supported by the guest it will have to try
telling the guest the CPU is enabled, and if the guest turns it on we know it
supports this feature. So it'll have to have a tedious probe loop.

That can then be shortcut by an _OSC if we have one later.
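
Roughly, from the orchestrator side, that fallback looks like the standalone
sketch below. Everything in it is hypothetical: guest_supports_vcpu_hotplug(),
the trial_hotplug_fn hook and the two demo guests are illustrative names only,
not an existing hypervisor or kernel interface.

```c
#include <stdbool.h>

/*
 * Hypothetical hook: tell the guest the vCPU is enabled, wait, and report
 * whether the guest actually brought it online.
 */
typedef bool (*trial_hotplug_fn)(int vcpu);

/*
 * If the guest advertised the (proposed) _OSC bit we know the answer.
 * Otherwise it is unknowable up front, so fall back to a trial hotplug.
 */
static bool guest_supports_vcpu_hotplug(bool osc_bit_advertised,
					trial_hotplug_fn trial, int vcpu)
{
	if (osc_bit_advertised)
		return true;

	return trial(vcpu);
}

/* Demo guests, counting how often the expensive trial path is taken. */
static int trial_calls;

static bool demo_guest_onlines(int vcpu)
{
	(void)vcpu;
	trial_calls++;
	return true;
}

static bool demo_guest_ignores(int vcpu)
{
	(void)vcpu;
	trial_calls++;
	return false;
}
```

A guest that never sets the bit is treated as "unknown" rather than
"unsupported", which is why the trial path has to stay even if the _OSC lands.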

I really want to see this feature go into the kernel this cycle and this ugly
corner isn't to my mind a blocker.

So I suggest we drop this patch for now and we'll revisit later.


>
> Regards,
> Jose
>
> >
> > Jonathan
> >
> > > Regards,
> > > Jose
> > >
> > > >
> > > > Jonathan
> > > >
> > > > > ---
> > > > > I'm assuming Loongarch machines do not support physical CPU hotplug.
> > > > >
> > > > > Changes since RFC v3:
> > > > > * Drop ia64 changes
> > > > > * Update James' comment below "---" to remove reference to ia64
> > > > >
> > > > > Outstanding comment:
> > > > > https://lore.kernel.org/r/[email protected]
> > > >
> > > >
> > > >
> > > > > ---
> > > > > arch/x86/Kconfig | 1 +
> > > > > drivers/acpi/Kconfig | 9 +++++++++
> > > > > drivers/acpi/acpi_processor.c | 14 +++++++++++++-
> > > > > drivers/acpi/bus.c | 16 ++++++++++++++++
> > > > > include/linux/acpi.h | 4 ++++
> > > > > 5 files changed, 43 insertions(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> > > > > index 64fc7c475ab0..33fc4dcd950c 100644
> > > > > --- a/arch/x86/Kconfig
> > > > > +++ b/arch/x86/Kconfig
> > > > > @@ -60,6 +60,7 @@ config X86
> > > > > select ACPI_LEGACY_TABLES_LOOKUP if ACPI
> > > > > select ACPI_SYSTEM_POWER_STATES_SUPPORT if ACPI
> > > > > select ACPI_HOTPLUG_PRESENT_CPU if ACPI_PROCESSOR && HOTPLUG_CPU
> > > > > + select ACPI_HOTPLUG_IGNORE_OSC if ACPI && HOTPLUG_CPU
> > > > > select ARCH_32BIT_OFF_T if X86_32
> > > > > select ARCH_CLOCKSOURCE_INIT
> > > > > select ARCH_CORRECT_STACKTRACE_ON_KRETPROBE
> > > > > diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig
> > > > > index 9c5a43d0aff4..020e7c0ab985 100644
> > > > > --- a/drivers/acpi/Kconfig
> > > > > +++ b/drivers/acpi/Kconfig
> > > > > @@ -311,6 +311,15 @@ config ACPI_HOTPLUG_PRESENT_CPU
> > > > > depends on ACPI_PROCESSOR && HOTPLUG_CPU
> > > > > select ACPI_CONTAINER
> > > > >
> > > > > +config ACPI_HOTPLUG_IGNORE_OSC
> > > > > + bool
> > > > > + depends on ACPI_HOTPLUG_PRESENT_CPU
> > > > > + help
> > > > > + Ignore whether firmware acknowledged support for toggling the CPU
> > > > > + present bit in _STA. Some architectures predate the _OSC bits, so
> > > > > + firmware doesn't know to do this.
> > > > > +
> > > > > +
> > > > > config ACPI_PROCESSOR_AGGREGATOR
> > > > > tristate "Processor Aggregator"
> > > > > depends on ACPI_PROCESSOR
> > > > > diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
> > > > > index ea12e70dfd39..5bb207a7a1dd 100644
> > > > > --- a/drivers/acpi/acpi_processor.c
> > > > > +++ b/drivers/acpi/acpi_processor.c
> > > > > @@ -182,6 +182,18 @@ static void __init acpi_pcc_cpufreq_init(void)
> > > > > static void __init acpi_pcc_cpufreq_init(void) {}
> > > > > #endif /* CONFIG_X86 */
> > > > >
> > > > > +static bool acpi_processor_hotplug_present_supported(void)
> > > > > +{
> > > > > + if (!IS_ENABLED(CONFIG_ACPI_HOTPLUG_PRESENT_CPU))
> > > > > + return false;
> > > > > +
> > > > > + /* x86 systems pre-date the _OSC bit */
> > > > > + if (IS_ENABLED(CONFIG_ACPI_HOTPLUG_IGNORE_OSC))
> > > > > + return true;
> > > > > +
> > > > > + return osc_sb_hotplug_present_support_acked;
> > > > > +}
> > > > > +
> > > > > /* Initialization */
> > > > > static int acpi_processor_make_present(struct acpi_processor *pr)
> > > > > {
> > > > > @@ -189,7 +201,7 @@ static int acpi_processor_make_present(struct acpi_processor *pr)
> > > > > acpi_status status;
> > > > > int ret;
> > > > >
> > > > > - if (!IS_ENABLED(CONFIG_ACPI_HOTPLUG_PRESENT_CPU)) {
> > > > > + if (!acpi_processor_hotplug_present_supported()) {
> > > > > pr_err_once("Changing CPU present bit is not supported\n");
> > > > > return -ENODEV;
> > > > > }
> > > > > diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c
> > > > > index 72e64c0718c9..7122450739d6 100644
> > > > > --- a/drivers/acpi/bus.c
> > > > > +++ b/drivers/acpi/bus.c
> > > > > @@ -298,6 +298,13 @@ EXPORT_SYMBOL_GPL(osc_sb_native_usb4_support_confirmed);
> > > > >
> > > > > bool osc_sb_cppc2_support_acked;
> > > > >
> > > > > +/*
> > > > > + * ACPI 6.? Proposed Operating System Capabilities for modifying CPU
> > > > > + * present/enable.
> > > > > + */
> > > > > +bool osc_sb_hotplug_enabled_support_acked;
> > > > > +bool osc_sb_hotplug_present_support_acked;
> > > > > +
> > > > > static u8 sb_uuid_str[] = "0811B06E-4A27-44F9-8D60-3CBBC22E7B48";
> > > > > static void acpi_bus_osc_negotiate_platform_control(void)
> > > > > {
> > > > > @@ -346,6 +353,11 @@ static void acpi_bus_osc_negotiate_platform_control(void)
> > > > >
> > > > > if (!ghes_disable)
> > > > > capbuf[OSC_SUPPORT_DWORD] |= OSC_SB_APEI_SUPPORT;
> > > > > +
> > > > > + capbuf[OSC_SUPPORT_DWORD] |= OSC_SB_HOTPLUG_ENABLED_SUPPORT;
> > > > > + if (IS_ENABLED(CONFIG_ACPI_HOTPLUG_PRESENT_CPU))
> > > > > + capbuf[OSC_SUPPORT_DWORD] |= OSC_SB_HOTPLUG_PRESENT_SUPPORT;
> > > > > +
> > > > > if (ACPI_FAILURE(acpi_get_handle(NULL, "\\_SB", &handle)))
> > > > > return;
> > > > >
> > > > > @@ -383,6 +395,10 @@ static void
> > > > acpi_bus_osc_negotiate_platform_control(void)
> > > > > capbuf_ret[OSC_SUPPORT_DWORD] &
> > > > OSC_SB_NATIVE_USB4_SUPPORT;
> > > > > osc_cpc_flexible_adr_space_confirmed =
> > > > > capbuf_ret[OSC_SUPPORT_DWORD] &
> > > > OSC_SB_CPC_FLEXIBLE_ADR_SPACE;
> > > > > + osc_sb_hotplug_enabled_support_acked =
> > > > > + capbuf_ret[OSC_SUPPORT_DWORD] &
> > > > OSC_SB_HOTPLUG_ENABLED_SUPPORT;
> > > > > + osc_sb_hotplug_present_support_acked =
> > > > > + capbuf_ret[OSC_SUPPORT_DWORD] &
> > > > OSC_SB_HOTPLUG_PRESENT_SUPPORT;
> > > > > }
> > > > >
> > > > > kfree(context.ret.pointer);
> > > > > diff --git a/include/linux/acpi.h b/include/linux/acpi.h index
> > > > > 00be66683505..c572abac803c 100644
> > > > > --- a/include/linux/acpi.h
> > > > > +++ b/include/linux/acpi.h
> > > > > @@ -559,12 +559,16 @@ acpi_status acpi_run_osc(acpi_handle handle,
> > > > struct acpi_osc_context *context);
> > > > > #define OSC_SB_NATIVE_USB4_SUPPORT 0x00040000
> > > > > #define OSC_SB_PRM_SUPPORT 0x00200000
> > > > > #define OSC_SB_FFH_OPR_SUPPORT 0x00400000
> > > > > +#define OSC_SB_HOTPLUG_ENABLED_SUPPORT 0x00800000
> > > > > +#define OSC_SB_HOTPLUG_PRESENT_SUPPORT 0x01000000
> > > > >
> > > > > extern bool osc_sb_apei_support_acked; extern bool
> > > > > osc_pc_lpi_support_confirmed; extern bool
> > > > > osc_sb_native_usb4_support_confirmed;
> > > > > extern bool osc_sb_cppc2_support_acked; extern bool
> > > > > osc_cpc_flexible_adr_space_confirmed;
> > > > > +extern bool osc_sb_hotplug_enabled_support_acked;
> > > > > +extern bool osc_sb_hotplug_present_support_acked;
> > > > >
> > > > > /* USB4 Capabilities */
> > > > > #define OSC_USB_USB3_TUNNELING 0x00000001
> > >
>


2024-01-23 13:44:06

by Russell King (Oracle)

[permalink] [raw]
Subject: Re: [PATCH RFC v3 05/21] ACPI: Rename ACPI_HOTPLUG_CPU to include 'present'

On Mon, Jan 22, 2024 at 06:00:13PM +0000, Jonathan Cameron wrote:
> On Mon, 18 Dec 2023 21:35:16 +0100
> "Rafael J. Wysocki" <[email protected]> wrote:
>
> > On Wed, Dec 13, 2023 at 1:49 PM Russell King <[email protected]> wrote:
> > >
> > > From: James Morse <[email protected]>
> > >
> > > The code behind ACPI_HOTPLUG_CPU allows a not-present CPU to become
> > > present.
> >
> > Right.
> >
> > > This isn't the only use of HOTPLUG_CPU. On arm64 and riscv
> > > CPUs can be taken offline as a power saving measure.
> >
> > But still there is the case in which a non-present CPU can become
> > present, isn't it there?
>
> Not yet defined by the architectures (and I'm assuming it probably never will be).
>
> The original proposal we took to ARM was to do exactly that - they pushed
> back hard on the basis there was no architecturally safe way to implement it.
> Too much of the ARM arch has to exist from the start of time.
>
> https://lore.kernel.org/linux-arm-kernel/[email protected]/
> is one of the relevant threads of the kernel side of that discussion.
>
> Not to put specific words into the ARM architects mouths, but the
> short description is that there is currently no demand for working
> out how to make physical CPU hotplug possible, as such they will not
> provide an architecturally compliant way to do it for virtual CPU hotplug and
> another means is needed (which is why this series doesn't use the present bit
> for that purpose and we have the Online capable bit in MADT/GICC)
>
> It was a 'fun' dance of several years to get to that clarification.
> As another fun fact, the same is defined for x86, but I don't think
> anyone has used it yet (GICC for ARM has an online capable bit in the flags to
> enable this, which was remarkably similar to the online capable bit in the
> flags of the Local APIC entries as added fairly recently).
>
> >
> > > On arm64 an offline CPU may be disabled by firmware, preventing it from
> > > being brought back online, but it remains present throughout.
> > >
> > > Adding code to prevent user-space trying to online these disabled CPUs
> > > needs some additional terminology.
> > >
> > > Rename the Kconfig symbol CONFIG_ACPI_HOTPLUG_PRESENT_CPU to reflect
> > > that it makes possible CPUs present.
> >
> > Honestly, I don't think that this change is necessary or even useful.
>
> Whilst it's an attempt to avoid future confusion, the rename is
> not something I really care about so my advice to Russell is drop
> it unless you are attached to it!

While I agree that it isn't a necessity, I don't fully agree that it
isn't useful.

One of the issues will be that while Arm64 will support hotplug vCPU,
it won't be setting ACPI_HOTPLUG_CPU because it doesn't support
the present bit changing. So I can see why James decided to rename
it - because with Arm64's hotplug vCPU, the idea that ACPI_HOTPLUG_CPU
somehow enables hotplug CPU support is now no longer true.

Keeping it as ACPI_HOTPLUG_CPU makes the code less obvious, because it
leads one to assume that it ought to be enabled for Arm64's
implementation, and that could well cause issues in the future if
people make the assumption that "ACPI_HOTPLUG_CPU" means hotplug CPU
is supported in ACPI. It doesn't anymore.

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

2024-01-23 14:23:13

by Jonathan Cameron

[permalink] [raw]
Subject: Re: [PATCH RFC v3 17/21] ACPI: add support to register CPUs based on the _STA enabled bit

On Tue, 23 Jan 2024 13:10:44 +0000
"Russell King (Oracle)" <[email protected]> wrote:

> On Tue, Jan 23, 2024 at 10:26:03AM +0000, Jonathan Cameron wrote:
> > On Tue, 2 Jan 2024 14:53:20 +0000
> > Jonathan Cameron <[email protected]> wrote:
> >
> > > On Mon, 18 Dec 2023 13:03:32 +0000
> > > "Russell King (Oracle)" <[email protected]> wrote:
> > >
> > > > On Wed, Dec 13, 2023 at 12:50:38PM +0000, Russell King wrote:
> > > > > From: James Morse <[email protected]>
> > > > >
> > > > > acpi_processor_get_info() registers all present CPUs. Registering a
> > > > > CPU is what creates the sysfs entries and triggers the udev
> > > > > notifications.
> > > > >
> > > > > arm64 virtual machines that support 'virtual cpu hotplug' use the
> > > > > enabled bit to indicate whether the CPU can be brought online, as
> > > > > the existing ACPI tables require all hardware to be described and
> > > > > present.
> > > > >
> > > > > If firmware describes a CPU as present, but disabled, skip the
> > > > > registration. Such CPUs are present, but can't be brought online for
> > > > > whatever reason. (e.g. firmware/hypervisor policy).
> > > > >
> > > > > Once firmware sets the enabled bit, the CPU can be registered and
> > > > > brought online by user-space. Online CPUs, or CPUs that are missing
> > > > > an _STA method must always be registered.
> > > >
> > > > ...
> > > >
> > > > > @@ -526,6 +552,9 @@ static void acpi_processor_post_eject(struct acpi_device *device)
> > > > > acpi_processor_make_not_present(device);
> > > > > return;
> > > > > }
> > > > > +
> > > > > + if (cpu_present(pr->id) && !(sta & ACPI_STA_DEVICE_ENABLED))
> > > > > + arch_unregister_cpu(pr->id);
> > > >
> > > > This change isn't described in the commit log, but seems to be the cause
> > > > of the build error identified by the kernel build bot that is fixed
> > > > later in this series. I'm wondering whether this should be in a
> > > > different patch, maybe "ACPI: Check _STA present bit before making CPUs
> > > > not present" ?
> > >
> > > Would seem a bit odd to call arch_unregister_cpu() way before the code
> > > is added to call the matching arch_register_cpu().
> > >
> > > Mind you this eject doesn't just apply to those CPUs that are registered
> > > later I think, but instead to all. So we run into the spec hole that
> > > there is no way to identify initially 'enabled' CPUs that might be disabled
> > > later.
> > >
> > > >
> > > > Or maybe my brain isn't working properly (due to being Covid positive.)
> > > > Any thoughts, Jonathan?
> > >
> > > I'll go with a resounding 'not sure' on where this change belongs.
> > > I blame my non existent start of the year hangover.
> > > Hope you have recovered!
> >
> > Looking again, I think you were right, move it to that earlier patch.
>
> I'm having second thoughts - because this patch introduces the
> arch_register_cpu() into the acpi_processor_add() path (via
> acpi_processor_get_info() and acpi_processor_make_enabled()), so isn't
> it also correct to add arch_unregister_cpu() to the detach/post_eject
> path as well? If we add one without the other, doesn't stuff become
> a bit asymmetric?
>
> Looking more deeply at these changes, I'm finding it isn't easy to
> keep track of everything that's going on here.

I can sympathize.

>
> We had attach()/detach() callbacks that were nice and symmetrical.
> Now we have this post_eject() callback that makes things asymmetrical.
>
> We have the attach() method that registers the CPU, but no detach
> method, instead having the post_eject() method. On the face of it,
> arch_unregister_cpu() doesn't look symmetric unless one goes digging
> more in the code - by that, I mean arch_register_cpu() only gets
> called if present=1 _and_ enabled=1. However, arch_unregister_cpu()
> gets called buried in acpi_processor_make_not_present(), called when
> present=0, and then we have this new one to handle the case where
> enabled=0. It is not obvious that arch_unregister_cpu() is the reverse
> of what happens with arch_register_cpu() here.

One option would be to pull the arch_unregister_cpu() out so it
happens in one place in both the present = 0 and enabled = 0 cases, but
I'm not sure if it's safe to reorder the contents of
acpi_processor_make_not_present() as it's followed by a bunch of things.

It would look something like:

	if (cpu_present(pr->id)) {
		if (!(sta & ACPI_STA_DEVICE_PRESENT)) {
			/* with arch_unregister_cpu() removed from it */
			acpi_processor_make_not_present(device);
		} else if (!(sta & ACPI_STA_DEVICE_ENABLED)) {
			/* Nothing to do in this case */
		} else {
			return; /* Firmware did something silly - probably racing */
		}
		arch_unregister_cpu(pr->id);

		return;
	}
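
For what it's worth, that decision can be modelled in plain user-space C.
This is only a sketch: the enum, the function name and the STA_* macros
are made up for illustration (the bit values follow the ACPI _STA
definition), and none of it is code from the series.

```c
#include <stdbool.h>
#include <stdint.h>

/* _STA bits; values as in the ACPI specification. Names are illustrative. */
#define STA_PRESENT 0x01u
#define STA_ENABLED 0x02u

/* Hypothetical result type describing what post_eject would do. */
enum eject_action {
	EJECT_IGNORE,			/* CPU unknown, or still present+enabled */
	EJECT_UNREGISTER,		/* enabled=0: unregister only */
	EJECT_REMOVE_AND_UNREGISTER,	/* present=0: teardown, then unregister */
};

/*
 * Mirrors the branch structure of the snippet above: a single
 * unregister step is shared by the present=0 and enabled=0 cases.
 */
static enum eject_action post_eject_action(bool cpu_is_present, uint32_t sta)
{
	if (!cpu_is_present)
		return EJECT_IGNORE;
	if (!(sta & STA_PRESENT))
		return EJECT_REMOVE_AND_UNREGISTER;
	if (!(sta & STA_ENABLED))
		return EJECT_UNREGISTER;
	/* Firmware still reports present and enabled: do nothing. */
	return EJECT_IGNORE;
}
```

The point of writing it this way is that the unregister step has exactly
one call site, so it reads as the reverse of the register step.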

>
> Then we have the add() method allocating pr->throttling.shared_cpu_map,
> and acpi_processor_make_not_present() freeing it. From what I read in
> ACPI v6.5, enabled is not allowed to be set without present. So, if
> _STA reports that a CPU that had present=1 enabled=1, but then is
> later reported to be enabled=0 (which we handle by calling only
> arch_unregister_cpu()) then what happens when _STA changes to
> enabled=1 later? Does add() get called?

Yes, it does (I poked it to see), which indeed isn't good (unless I've
broken my setup in some obscure way).

Seems we need a few more things than arch_unregister_cpu() pulled out
in the above code.


> If it does, this would cause
> a new acpi_processor structure to be allocated and the old one to be
> leaked... I hope I'm wrong about add() being called - but if it isn't,
> how does enabled going from 0->1 get handled... and if we are handling
> its 1->0 transition separately from present, then surely we should be
> handling that.
>
> Maybe I'm just getting confused, but I've spent much of this morning
> trying to unravel all this... and I'm of the opinion that this isn't a
> sign of a good approach.

It's all annoyingly messy at the root of things, but indeed you've found
some issues in the current implementation. Feels like just ripping out
a bunch of stuff from acpi_processor_make_not_present() and calling it
for both paths will probably work, but I've not tested that yet.

Jonathan
>


2024-01-23 15:00:08

by Russell King (Oracle)

[permalink] [raw]
Subject: Re: [PATCH RFC v3 17/21] ACPI: add support to register CPUs based on the _STA enabled bit

On Tue, Jan 23, 2024 at 02:22:18PM +0000, Jonathan Cameron wrote:
> On Tue, 23 Jan 2024 13:10:44 +0000
> "Russell King (Oracle)" <[email protected]> wrote:
>
> > On Tue, Jan 23, 2024 at 10:26:03AM +0000, Jonathan Cameron wrote:
> > > On Tue, 2 Jan 2024 14:53:20 +0000
> > > Jonathan Cameron <[email protected]> wrote:
> > >
> > > > On Mon, 18 Dec 2023 13:03:32 +0000
> > > > "Russell King (Oracle)" <[email protected]> wrote:
> > > >
> > > > > On Wed, Dec 13, 2023 at 12:50:38PM +0000, Russell King wrote:
> > > > > > From: James Morse <[email protected]>
> > > > > >
> > > > > > acpi_processor_get_info() registers all present CPUs. Registering a
> > > > > > CPU is what creates the sysfs entries and triggers the udev
> > > > > > notifications.
> > > > > >
> > > > > > arm64 virtual machines that support 'virtual cpu hotplug' use the
> > > > > > enabled bit to indicate whether the CPU can be brought online, as
> > > > > > the existing ACPI tables require all hardware to be described and
> > > > > > present.
> > > > > >
> > > > > > If firmware describes a CPU as present, but disabled, skip the
> > > > > > registration. Such CPUs are present, but can't be brought online for
> > > > > > whatever reason. (e.g. firmware/hypervisor policy).
> > > > > >
> > > > > > Once firmware sets the enabled bit, the CPU can be registered and
> > > > > > brought online by user-space. Online CPUs, or CPUs that are missing
> > > > > > an _STA method must always be registered.
> > > > >
> > > > > ...
> > > > >
> > > > > > @@ -526,6 +552,9 @@ static void acpi_processor_post_eject(struct acpi_device *device)
> > > > > > acpi_processor_make_not_present(device);
> > > > > > return;
> > > > > > }
> > > > > > +
> > > > > > + if (cpu_present(pr->id) && !(sta & ACPI_STA_DEVICE_ENABLED))
> > > > > > + arch_unregister_cpu(pr->id);
> > > > >
> > > > > This change isn't described in the commit log, but seems to be the cause
> > > > > of the build error identified by the kernel build bot that is fixed
> > > > > later in this series. I'm wondering whether this should be in a
> > > > > different patch, maybe "ACPI: Check _STA present bit before making CPUs
> > > > > not present" ?
> > > >
> > > > Would seem a bit odd to call arch_unregister_cpu() way before the code
> > > > is added to call the matching arch_register_cpu().
> > > >
> > > > Mind you this eject doesn't just apply to those CPUs that are registered
> > > > later I think, but instead to all. So we run into the spec hole that
> > > > there is no way to identify initially 'enabled' CPUs that might be disabled
> > > > later.
> > > >
> > > > >
> > > > > Or maybe my brain isn't working properly (due to being Covid positive.)
> > > > > Any thoughts, Jonathan?
> > > >
> > > > I'll go with a resounding 'not sure' on where this change belongs.
> > > > I blame my non existent start of the year hangover.
> > > > Hope you have recovered!
> > >
> > > Looking again, I think you were right, move it to that earlier patch.
> >
> > I'm having second thoughts - because this patch introduces the
> > arch_register_cpu() into the acpi_processor_add() path (via
> > acpi_processor_get_info() and acpi_processor_make_enabled()), so isn't
> > it also correct to add arch_unregister_cpu() to the detach/post_eject
> > path as well? If we add one without the other, doesn't stuff become
> > a bit asymmetric?
> >
> > Looking more deeply at these changes, I'm finding it isn't easy to
> > keep track of everything that's going on here.
>
> I can sympathize.
>
> >
> > We had attach()/detach() callbacks that were nice and symmetrical.
> > Now we have this post_eject() callback that makes things asymmetrical.
> >
> > We have the attach() method that registers the CPU, but no detach
> > method, instead having the post_eject() method. On the face of it,
> > arch_unregister_cpu() doesn't look symmetric unless one goes digging
> > more in the code - by that, I mean arch_register_cpu() only gets
> > called if present=1 _and_ enabled=1. However, arch_unregister_cpu()
> > gets called buried in acpi_processor_make_not_present(), called when
> > present=0, and then we have this new one to handle the case where
> > enabled=0. It is not obvious that arch_unregister_cpu() is the reverse
> > of what happens with arch_register_cpu() here.
>
> One option would be to pull the arch_unregister_cpu() out so it
> happens in one place in both the present = 0 and enabled = 0 cases, but
> I'm not sure if it's safe to reorder the contents of
> acpi_processor_make_not_present() as it's followed by a bunch of things.
>
> It would look something like:
>
> 	if (cpu_present(pr->id)) {
> 		if (!(sta & ACPI_STA_DEVICE_PRESENT)) {
> 			/* with arch_unregister_cpu() removed from it */
> 			acpi_processor_make_not_present(device);
> 		} else if (!(sta & ACPI_STA_DEVICE_ENABLED)) {
> 			/* Nothing to do in this case */
> 		} else {
> 			return; /* Firmware did something silly - probably racing */
> 		}
> 		arch_unregister_cpu(pr->id);
>
> 		return;
> 	}
>
> >
> > Then we have the add() method allocating pr->throttling.shared_cpu_map,
> > and acpi_processor_make_not_present() freeing it. From what I read in
> > ACPI v6.5, enabled is not allowed to be set without present. So, if
> > _STA reports that a CPU that had present=1 enabled=1, but then is
> > later reported to be enabled=0 (which we handle by calling only
> > arch_unregister_cpu()) then what happens when _STA changes to
> > enabled=1 later? Does add() get called?
>
> Yes, it does (I poked it to see), which indeed isn't good (unless I've
> broken my setup in some obscure way).

Thanks for confirming - I haven't had a chance to do any testing (late
lunch because of spending so long looking at this...)

> Seems we need a few more things than arch_unregister_cpu() pulled out
> in the above code.

Yes, and I also wonder whether we should be doing any of that
unconditionally...

> > If it does, this would cause
> > a new acpi_processor structure to be allocated and the old one to be
> > leaked... I hope I'm wrong about add() being called - but if it isn't,
> > how does enabled going from 0->1 get handled... and if we are handling
> > its 1->0 transition separately from present, then surely we should be
> > handling that.
> >
> > Maybe I'm just getting confused, but I've spent much of this morning
> > trying to unravel all this... and I'm of the opinion that this isn't a
> > sign of a good approach.
>
> It's all annoyingly messy at the root of things, but indeed you've found
> some issues in the current implementation. Feels like just ripping out
> a bunch of stuff from acpi_processor_make_not_present() and calling it
> for both paths will probably work, but I've not tested that yet.

.. since surely if we've already got to the point of issuing a
post_eject() callback, the device has already been ejected
and thus has gone - and if it is ever "replaced" we will get an
attach() callback.

Moreover, looking at acpi_scan_hot_remove(), if we are the device
being ejected, then after _EJ0 is evaluated, _STA is checked, and
acpi_bus_post_eject() is called only if enabled=0. (This will also
end up calling post_eject() for any children as well, which won't
have their _STA evaluated.)

So this has got me wondering whether acpi_processor_post_eject()
should be doing all the cleanup in acpi_processor_make_not_present()
except if we believe the call is in error (e.g.
!ACPI_HOTPLUG_PRESENT_CPU and present=0) - thus preparing the way
for a future attach() callback.
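
As a rough user-space sketch of that idea (not kernel code: the function
names and the STA_* macros are invented for illustration, with bit values
following the ACPI _STA definition, and the config option is modelled as
a plain bool):

```c
#include <stdbool.h>
#include <stdint.h>

/* _STA bits; values as in the ACPI specification. Names are illustrative. */
#define STA_PRESENT 0x01u
#define STA_ENABLED 0x02u

/*
 * Models the check described for acpi_scan_hot_remove(): the
 * post_eject path is only taken once _STA reports enabled=0.
 */
static bool hot_remove_calls_post_eject(uint32_t sta_after_ej0)
{
	return !(sta_after_ej0 & STA_ENABLED);
}

/*
 * Models the proposed policy: always clean up in post_eject, unless
 * the call looks erroneous, i.e. firmware reports present=0 on a
 * kernel not built to handle the present bit changing.
 */
static bool post_eject_does_cleanup(bool hotplug_present_cpu, uint32_t sta)
{
	if (!(sta & STA_PRESENT) && !hotplug_present_cpu)
		return false;
	return true;
}
```

With that shape, the enabled=0 and present=0 cases share one cleanup
path, and a later attach() starts from a clean slate.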

Hmm. I wonder if Rafael has any input on this.

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

2024-01-23 16:16:28

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH RFC v3 05/21] ACPI: Rename ACPI_HOTPLUG_CPU to include 'present'

On Tue, Jan 23, 2024 at 2:28 PM Russell King (Oracle)
<[email protected]> wrote:
>
> On Mon, Jan 22, 2024 at 06:00:13PM +0000, Jonathan Cameron wrote:
> > On Mon, 18 Dec 2023 21:35:16 +0100
> > "Rafael J. Wysocki" <[email protected]> wrote:
> >
> > > On Wed, Dec 13, 2023 at 1:49 PM Russell King <[email protected]> wrote:
> > > >
> > > > From: James Morse <[email protected]>
> > > >
> > > > The code behind ACPI_HOTPLUG_CPU allows a not-present CPU to become
> > > > present.
> > >
> > > Right.
> > >
> > > > This isn't the only use of HOTPLUG_CPU. On arm64 and riscv
> > > > CPUs can be taken offline as a power saving measure.
> > >
> > > But still there is the case in which a non-present CPU can become
> > > present, isn't it there?
> >
> > Not yet defined by the architectures (and I'm assuming it probably never will be).
> >
> > The original proposal we took to ARM was to do exactly that - they pushed
> > back hard on the basis there was no architecturally safe way to implement it.
> > Too much of the ARM arch has to exist from the start of time.
> >
> > https://lore.kernel.org/linux-arm-kernel/[email protected]/
> > is one of the relevant threads of the kernel side of that discussion.
> >
> > Not to put specific words into the ARM architects mouths, but the
> > short description is that there is currently no demand for working
> > out how to make physical CPU hotplug possible, as such they will not
> > provide an architecturally compliant way to do it for virtual CPU hotplug and
> > another means is needed (which is why this series doesn't use the present bit
> > for that purpose and we have the Online capable bit in MADT/GICC)
> >
> > It was a 'fun' dance of several years to get to that clarification.
> > As another fun fact, the same is defined for x86, but I don't think
> > anyone has used it yet (GICC for ARM has an online capable bit in the flags to
> > enable this, which was remarkably similar to the online capable bit in the
> > flags of the Local APIC entries as added fairly recently).
> >
> > >
> > > > On arm64 an offline CPU may be disabled by firmware, preventing it from
> > > > being brought back online, but it remains present throughout.
> > > >
> > > > Adding code to prevent user-space trying to online these disabled CPUs
> > > > needs some additional terminology.
> > > >
> > > > Rename the Kconfig symbol CONFIG_ACPI_HOTPLUG_PRESENT_CPU to reflect
> > > > that it makes possible CPUs present.
> > >
> > > Honestly, I don't think that this change is necessary or even useful.
> >
> > Whilst it's an attempt to avoid future confusion, the rename is
> > not something I really care about so my advice to Russell is drop
> > it unless you are attached to it!
>
> While I agree that it isn't a necessity, I don't fully agree that it
> isn't useful.
>
> One of the issues will be that while Arm64 will support hotplug vCPU,
> it won't be setting ACPI_HOTPLUG_CPU because it doesn't support
> the present bit changing. So I can see why James decided to rename
> it - because with Arm64's hotplug vCPU, the idea that ACPI_HOTPLUG_CPU
> somehow enables hotplug CPU support is now no longer true.
>
> Keeping it as ACPI_HOTPLUG_CPU makes the code less obvious, because it
> leads one to assume that it ought to be enabled for Arm64's
> implementation, and that could well cause issues in the future if
> people make the assumption that "ACPI_HOTPLUG_CPU" means hotplug CPU
> is supported in ACPI. It doesn't anymore.

On x86 there is no confusion AFAICS. It's always meant "as long as
the platform supports it".

2024-01-23 16:58:06

by Russell King (Oracle)

[permalink] [raw]
Subject: Re: [PATCH RFC v3 05/21] ACPI: Rename ACPI_HOTPLUG_CPU to include 'present'

On Tue, Jan 23, 2024 at 05:15:54PM +0100, Rafael J. Wysocki wrote:
> On Tue, Jan 23, 2024 at 2:28 PM Russell King (Oracle)
> <[email protected]> wrote:
> >
> > On Mon, Jan 22, 2024 at 06:00:13PM +0000, Jonathan Cameron wrote:
> > > On Mon, 18 Dec 2023 21:35:16 +0100
> > > "Rafael J. Wysocki" <[email protected]> wrote:
> > >
> > > > On Wed, Dec 13, 2023 at 1:49 PM Russell King <[email protected]> wrote:
> > > > >
> > > > > From: James Morse <[email protected]>
> > > > >
> > > > > The code behind ACPI_HOTPLUG_CPU allows a not-present CPU to become
> > > > > present.
> > > >
> > > > Right.
> > > >
> > > > > This isn't the only use of HOTPLUG_CPU. On arm64 and riscv
> > > > > CPUs can be taken offline as a power saving measure.
> > > >
> > > > But still there is the case in which a non-present CPU can become
> > > > present, isn't it there?
> > >
> > > Not yet defined by the architectures (and I'm assuming it probably never will be).
> > >
> > > The original proposal we took to ARM was to do exactly that - they pushed
> > > back hard on the basis there was no architecturally safe way to implement it.
> > > Too much of the ARM arch has to exist from the start of time.
> > >
> > > https://lore.kernel.org/linux-arm-kernel/[email protected]/
> > > is one of the relevant threads of the kernel side of that discussion.
> > >
> > > Not to put specific words into the ARM architects mouths, but the
> > > short description is that there is currently no demand for working
> > > out how to make physical CPU hotplug possible, as such they will not
> > > provide an architecturally compliant way to do it for virtual CPU hotplug and
> > > another means is needed (which is why this series doesn't use the present bit
> > > for that purpose and we have the Online capable bit in MADT/GICC)
> > >
> > > It was a 'fun' dance of several years to get to that clarification.
> > > As another fun fact, the same is defined for x86, but I don't think
> > > anyone has used it yet (GICC for ARM has an online capable bit in the flags to
> > > enable this, which was remarkably similar to the online capable bit in the
> > > flags of the Local APIC entries as added fairly recently).
> > >
> > > >
> > > > > On arm64 an offline CPU may be disabled by firmware, preventing it from
> > > > > being brought back online, but it remains present throughout.
> > > > >
> > > > > Adding code to prevent user-space trying to online these disabled CPUs
> > > > > needs some additional terminology.
> > > > >
> > > > > Rename the Kconfig symbol CONFIG_ACPI_HOTPLUG_PRESENT_CPU to reflect
> > > > > that it makes possible CPUs present.
> > > >
> > > > Honestly, I don't think that this change is necessary or even useful.
> > >
> > > Whilst it's an attempt to avoid future confusion, the rename is
> > > not something I really care about so my advice to Russell is drop
> > > it unless you are attached to it!
> >
> > While I agree that it isn't a necessity, I don't fully agree that it
> > isn't useful.
> >
> > One of the issues will be that while Arm64 will support hotplug vCPU,
> > it won't be setting ACPI_HOTPLUG_CPU because it doesn't support
> > the present bit changing. So I can see why James decided to rename
> > it - because with Arm64's hotplug vCPU, the idea that ACPI_HOTPLUG_CPU
> > somehow enables hotplug CPU support is now no longer true.
> >
> > Keeping it as ACPI_HOTPLUG_CPU makes the code less obvious, because it
> > leads one to assume that it ought to be enabled for Arm64's
> > implementation, and that could well cause issues in the future if
> > people make the assumption that "ACPI_HOTPLUG_CPU" means hotplug CPU
> > is supported in ACPI. It doesn't anymore.
>
> On x86 there is no confusion AFAICS. It's always meant "as long as
> the platform supports it".

That's x86, which supports physical CPU hotplug. We're introducing
support for Arm64 here which doesn't support physical CPU hotplug.

                                      ACPI-based  Physical  Virtual
Arch   HOTPLUG_CPU  ACPI_HOTPLUG_CPU  Hotplug     Hotplug   Hotplug
Arm64  Y            N                 Y           N         Y
x86    Y            Y                 Y           Y         Y

So ACPI_HOTPLUG_CPU becomes totally misnamed with the introduction
of hotplug on Arm64.

If we want to just look at stuff from an x86 perspective, then yes,
it remains correct to call it ACPI_HOTPLUG_CPU. It isn't correct as
soon as we add Arm64, as I already said.

And honestly, a two line quip to my reasoned argument is not IMHO
an acceptable reply.

.. getting close to throwing the rag in over this.

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

2024-01-23 17:53:03

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH RFC v3 05/21] ACPI: Rename ACPI_HOTPLUG_CPU to include 'present'

On Tue, Jan 23, 2024 at 5:36 PM Russell King (Oracle)
<[email protected]> wrote:
>
> On Tue, Jan 23, 2024 at 05:15:54PM +0100, Rafael J. Wysocki wrote:
> > On Tue, Jan 23, 2024 at 2:28 PM Russell King (Oracle)
> > <[email protected]> wrote:
> > >
> > > On Mon, Jan 22, 2024 at 06:00:13PM +0000, Jonathan Cameron wrote:
> > > > On Mon, 18 Dec 2023 21:35:16 +0100
> > > > "Rafael J. Wysocki" <[email protected]> wrote:
> > > >
> > > > > On Wed, Dec 13, 2023 at 1:49 PM Russell King <[email protected]> wrote:
> > > > > >
> > > > > > From: James Morse <[email protected]>
> > > > > >
> > > > > > The code behind ACPI_HOTPLUG_CPU allows a not-present CPU to become
> > > > > > present.
> > > > >
> > > > > Right.
> > > > >
> > > > > > This isn't the only use of HOTPLUG_CPU. On arm64 and riscv
> > > > > > CPUs can be taken offline as a power saving measure.
> > > > >
> > > > > But still there is the case in which a non-present CPU can become
> > > > > present, isn't it there?
> > > >
> > > > Not yet defined by the architectures (and I'm assuming it probably never will be).
> > > >
> > > > The original proposal we took to ARM was to do exactly that - they pushed
> > > > back hard on the basis there was no architecturally safe way to implement it.
> > > > Too much of the ARM arch has to exist from the start of time.
> > > >
> > > > https://lore.kernel.org/linux-arm-kernel/[email protected]/
> > > > is one of the relevant threads of the kernel side of that discussion.
> > > >
> > > > Not to put specific words into the ARM architects mouths, but the
> > > > short description is that there is currently no demand for working
> > > > out how to make physical CPU hotplug possible, as such they will not
> > > > provide an architecturally compliant way to do it for virtual CPU hotplug and
> > > > another means is needed (which is why this series doesn't use the present bit
> > > > for that purpose and we have the Online capable bit in MADT/GICC)
> > > >
> > > > It was a 'fun' dance of several years to get to that clarification.
> > > > As another fun fact, the same is defined for x86, but I don't think
> > > > anyone has used it yet (GICC for ARM has an online capable bit in the flags to
> > > > enable this, which was remarkably similar to the online capable bit in the
> > > > flags of the Local APIC entries as added fairly recently).
> > > >
> > > > >
> > > > > > On arm64 an offline CPU may be disabled by firmware, preventing it from
> > > > > > being brought back online, but it remains present throughout.
> > > > > >
> > > > > > Adding code to prevent user-space trying to online these disabled CPUs
> > > > > > needs some additional terminology.
> > > > > >
> > > > > > Rename the Kconfig symbol CONFIG_ACPI_HOTPLUG_PRESENT_CPU to reflect
> > > > > > that it makes possible CPUs present.
> > > > >
> > > > > Honestly, I don't think that this change is necessary or even useful.
> > > >
> > > > Whilst it's an attempt to avoid future confusion, the rename is
> > > > not something I really care about so my advice to Russell is drop
> > > > it unless you are attached to it!
> > >
> > > While I agree that it isn't a necessity, I don't fully agree that it
> > > isn't useful.
> > >
> > > One of the issues will be that while Arm64 will support hotplug vCPU,
> > > it won't be setting ACPI_HOTPLUG_CPU because it doesn't support
> > > the present bit changing. So I can see why James decided to rename
> > > it - because with Arm64's hotplug vCPU, the idea that ACPI_HOTPLUG_CPU
> > > somehow enables hotplug CPU support is now no longer true.
> > >
> > > Keeping it as ACPI_HOTPLUG_CPU makes the code less obvious, because it
> > > leads one to assume that it ought to be enabled for Arm64's
> > > implementation, and that could well cause issues in the future if
> > > people make the assumption that "ACPI_HOTPLUG_CPU" means hotplug CPU
> > > is supported in ACPI. It doesn't anymore.
> >
> > On x86 there is no confusion AFAICS. It's always meant "as long as
> > the platform supports it".
>
> That's x86, which supports physical CPU hotplug. We're introducing
> support for Arm64 here which doesn't support physical CPU hotplug.
>
>                                       ACPI-based  Physical  Virtual
> Arch   HOTPLUG_CPU  ACPI_HOTPLUG_CPU  Hotplug     Hotplug   Hotplug
> Arm64  Y            N                 Y           N         Y
> x86    Y            Y                 Y           Y         Y
>
> So ACPI_HOTPLUG_CPU becomes totally misnamed with the introduction
> of hotplug on Arm64.
>
> If we want to just look at stuff from an x86 perspective, then yes,
> it remains correct to call it ACPI_HOTPLUG_CPU. It isn't correct as
> soon as we add Arm64, as I already said.

And if you rename it, it becomes less confusing for ARM64, but more
confusing for x86, which basically is my point.

IMO "hotplug" covers both cases well enough and "hotplug present" is
only accurate for one of them.

> And honestly, a two line quip to my reasoned argument is not IMHO
> an acceptable reply.

Well, I'm not even sure how to respond to this ...

2024-01-23 18:20:59

by Russell King (Oracle)

Subject: Re: [PATCH RFC v3 05/21] ACPI: Rename ACPI_HOTPLUG_CPU to include 'present'

On Tue, Jan 23, 2024 at 06:43:59PM +0100, Rafael J. Wysocki wrote:
> On Tue, Jan 23, 2024 at 5:36 PM Russell King (Oracle)
> <[email protected]> wrote:
> >
> > On Tue, Jan 23, 2024 at 05:15:54PM +0100, Rafael J. Wysocki wrote:
> > > On Tue, Jan 23, 2024 at 2:28 PM Russell King (Oracle)
> > > <[email protected]> wrote:
> > > >
> > > > On Mon, Jan 22, 2024 at 06:00:13PM +0000, Jonathan Cameron wrote:
> > > > > On Mon, 18 Dec 2023 21:35:16 +0100
> > > > > "Rafael J. Wysocki" <[email protected]> wrote:
> > > > >
> > > > > > On Wed, Dec 13, 2023 at 1:49 PM Russell King <[email protected]> wrote:
> > > > > > >
> > > > > > > From: James Morse <[email protected]>
> > > > > > >
> > > > > > > The code behind ACPI_HOTPLUG_CPU allows a not-present CPU to become
> > > > > > > present.
> > > > > >
> > > > > > Right.
> > > > > >
> > > > > > > This isn't the only use of HOTPLUG_CPU. On arm64 and riscv
> > > > > > > CPUs can be taken offline as a power saving measure.
> > > > > >
> > > > > > But still there is the case in which a non-present CPU can become
> > > > > > present, isn't it there?
> > > > >
> > > > > Not yet defined by the architectures (and I'm assuming it probably never will be).
> > > > >
> > > > > The original proposal we took to ARM was to do exactly that - they pushed
> > > > > back hard on the basis there was no architecturally safe way to implement it.
> > > > > Too much of the ARM arch has to exist from the start of time.
> > > > >
> > > > > https://lore.kernel.org/linux-arm-kernel/[email protected]/
> > > > > is one of the relevant threads of the kernel side of that discussion.
> > > > >
> > > > > Not to put specific words into the ARM architects mouths, but the
> > > > > short description is that there is currently no demand for working
> > > > > out how to make physical CPU hotplug possible, as such they will not
> > > > > provide an architecturally compliant way to do it for virtual CPU hotplug and
> > > > > another means is needed (which is why this series doesn't use the present bit
> > > > > for that purpose and we have the Online capable bit in MADT/GICC)
> > > > >
> > > > > It was a 'fun' dance of several years to get to that clarification.
> > > > > As another fun fact, the same is defined for x86, but I don't think
> > > > > anyone has used it yet (GICC for ARM has an online capable bit in the flags to
> > > > > enable this, which was remarkably similar to the online capable bit in the
> > > > > flags of the Local APIC entries as added fairly recently).
> > > > >
> > > > > >
> > > > > > > On arm64 an offline CPU may be disabled by firmware, preventing it from
> > > > > > > being brought back online, but it remains present throughout.
> > > > > > >
> > > > > > > Adding code to prevent user-space trying to online these disabled CPUs
> > > > > > > needs some additional terminology.
> > > > > > >
> > > > > > > Rename the Kconfig symbol CONFIG_ACPI_HOTPLUG_PRESENT_CPU to reflect
> > > > > > > that it makes possible CPUs present.
> > > > > >
> > > > > > Honestly, I don't think that this change is necessary or even useful.
> > > > >
> > > > > Whilst it's an attempt to avoid future confusion, the rename is
> > > > > not something I really care about so my advice to Russell is drop
> > > > > it unless you are attached to it!
> > > >
> > > > While I agree that it isn't a necessity, I don't fully agree that it
> > > > isn't useful.
> > > >
> > > > One of the issues will be that while Arm64 will support hotplug vCPU,
> > > > it won't be setting ACPI_HOTPLUG_CPU because it doesn't support
> > > > the present bit changing. So I can see why James decided to rename
> > > > it - because with Arm64's hotplug vCPU, the idea that ACPI_HOTPLUG_CPU
> > > > somehow enables hotplug CPU support is now no longer true.
> > > >
> > > > Keeping it as ACPI_HOTPLUG_CPU makes the code less obvious, because it
> > > > leads one to assume that it ought to be enabled for Arm64's
> > > > implementation, and that could well cause issues in the future if
> > > > people make the assumption that "ACPI_HOTPLUG_CPU" means hotplug CPU
> > > > is supported in ACPI. It doesn't anymore.
> > >
> > > On x86 there is no confusion AFAICS. It's always meant "as long as
> > > the platform supports it".
> >
> > That's x86, which supports physical CPU hotplug. We're introducing
> > support for Arm64 here which doesn't support physical CPU hotplug.
> >
> >                                       ACPI-based  Physical  Virtual
> > Arch   HOTPLUG_CPU  ACPI_HOTPLUG_CPU  Hotplug     Hotplug   Hotplug
> > Arm64  Y            N                 Y           N         Y
> > x86    Y            Y                 Y           Y         Y
> >
> > So ACPI_HOTPLUG_CPU becomes totally misnamed with the introduction
> > of hotplug on Arm64.
> >
> > If we want to just look at stuff from an x86 perspective, then yes,
> > it remains correct to call it ACPI_HOTPLUG_CPU. It isn't correct as
> > soon as we add Arm64, as I already said.
>
> And if you rename it, it becomes less confusing for ARM64, but more
> confusing for x86, which basically is my point.
>
> IMO "hotplug" covers both cases well enough and "hotplug present" is
> only accurate for one of them.
>
> > And honestly, a two line quip to my reasoned argument is not IMHO
> > an acceptable reply.
>
> Well, I'm not even sure how to respond to this ...

The above explanation you give would have been useful...

I don't see how "hotplug" covers both cases. As I've tried to point
out many times now, ACPI_HOTPLUG_CPU is N for Arm64, yet it supports
ACPI based hotplug. How does ACPI_HOTPLUG_CPU cover Arm64 if it's
N there? IMHO it totally doesn't, and moreover, it goes against what
one would logically expect - and this is why I have a problem with
your effective NAK for this change. I believe you are basically
wrong on this for the reasons I've given - that ACPI_HOTPLUG_CPU
will be N for Arm64 despite it supporting ACPI-based CPU hotplug.

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

2024-01-23 18:28:08

by Rafael J. Wysocki

Subject: Re: [PATCH RFC v3 05/21] ACPI: Rename ACPI_HOTPLUG_CPU to include 'present'

On Tue, Jan 23, 2024 at 7:20 PM Russell King (Oracle)
<[email protected]> wrote:
>
> On Tue, Jan 23, 2024 at 06:43:59PM +0100, Rafael J. Wysocki wrote:
> > On Tue, Jan 23, 2024 at 5:36 PM Russell King (Oracle)
> > <[email protected]> wrote:
> > >
> > > On Tue, Jan 23, 2024 at 05:15:54PM +0100, Rafael J. Wysocki wrote:
> > > > On Tue, Jan 23, 2024 at 2:28 PM Russell King (Oracle)
> > > > <[email protected]> wrote:
> > > > >
> > > > > On Mon, Jan 22, 2024 at 06:00:13PM +0000, Jonathan Cameron wrote:
> > > > > > On Mon, 18 Dec 2023 21:35:16 +0100
> > > > > > "Rafael J. Wysocki" <[email protected]> wrote:
> > > > > >
> > > > > > > On Wed, Dec 13, 2023 at 1:49 PM Russell King <[email protected]> wrote:
> > > > > > > >
> > > > > > > > From: James Morse <[email protected]>
> > > > > > > >
> > > > > > > > The code behind ACPI_HOTPLUG_CPU allows a not-present CPU to become
> > > > > > > > present.
> > > > > > >
> > > > > > > Right.
> > > > > > >
> > > > > > > > This isn't the only use of HOTPLUG_CPU. On arm64 and riscv
> > > > > > > > CPUs can be taken offline as a power saving measure.
> > > > > > >
> > > > > > > But still there is the case in which a non-present CPU can become
> > > > > > > present, isn't it there?
> > > > > >
> > > > > > Not yet defined by the architectures (and I'm assuming it probably never will be).
> > > > > >
> > > > > > The original proposal we took to ARM was to do exactly that - they pushed
> > > > > > back hard on the basis there was no architecturally safe way to implement it.
> > > > > > Too much of the ARM arch has to exist from the start of time.
> > > > > >
> > > > > > https://lore.kernel.org/linux-arm-kernel/[email protected]/
> > > > > > is one of the relevant threads of the kernel side of that discussion.
> > > > > >
> > > > > > Not to put specific words into the ARM architects mouths, but the
> > > > > > short description is that there is currently no demand for working
> > > > > > out how to make physical CPU hotplug possible, as such they will not
> > > > > > provide an architecturally compliant way to do it for virtual CPU hotplug and
> > > > > > another means is needed (which is why this series doesn't use the present bit
> > > > > > for that purpose and we have the Online capable bit in MADT/GICC)
> > > > > >
> > > > > > It was a 'fun' dance of several years to get to that clarification.
> > > > > > As another fun fact, the same is defined for x86, but I don't think
> > > > > > anyone has used it yet (GICC for ARM has an online capable bit in the flags to
> > > > > > enable this, which was remarkably similar to the online capable bit in the
> > > > > > flags of the Local APIC entries as added fairly recently).
> > > > > >
> > > > > > >
> > > > > > > > On arm64 an offline CPU may be disabled by firmware, preventing it from
> > > > > > > > being brought back online, but it remains present throughout.
> > > > > > > >
> > > > > > > > Adding code to prevent user-space trying to online these disabled CPUs
> > > > > > > > needs some additional terminology.
> > > > > > > >
> > > > > > > > Rename the Kconfig symbol CONFIG_ACPI_HOTPLUG_PRESENT_CPU to reflect
> > > > > > > > that it makes possible CPUs present.
> > > > > > >
> > > > > > > Honestly, I don't think that this change is necessary or even useful.
> > > > > >
> > > > > > Whilst it's an attempt to avoid future confusion, the rename is
> > > > > > not something I really care about so my advice to Russell is drop
> > > > > > it unless you are attached to it!
> > > > >
> > > > > While I agree that it isn't a necessity, I don't fully agree that it
> > > > > isn't useful.
> > > > >
> > > > > One of the issues will be that while Arm64 will support hotplug vCPU,
> > > > > it won't be setting ACPI_HOTPLUG_CPU because it doesn't support
> > > > > the present bit changing. So I can see why James decided to rename
> > > > > it - because with Arm64's hotplug vCPU, the idea that ACPI_HOTPLUG_CPU
> > > > > somehow enables hotplug CPU support is now no longer true.
> > > > >
> > > > > Keeping it as ACPI_HOTPLUG_CPU makes the code less obvious, because it
> > > > > leads one to assume that it ought to be enabled for Arm64's
> > > > > implementation, and that could well cause issues in the future if
> > > > > people make the assumption that "ACPI_HOTPLUG_CPU" means hotplug CPU
> > > > > is supported in ACPI. It doesn't anymore.
> > > >
> > > > On x86 there is no confusion AFAICS. It's always meant "as long as
> > > > the platform supports it".
> > >
> > > That's x86, which supports physical CPU hotplug. We're introducing
> > > support for Arm64 here which doesn't support physical CPU hotplug.
> > >
> > >                                       ACPI-based  Physical  Virtual
> > > Arch   HOTPLUG_CPU  ACPI_HOTPLUG_CPU  Hotplug     Hotplug   Hotplug
> > > Arm64  Y            N                 Y           N         Y
> > > x86    Y            Y                 Y           Y         Y
> > >
> > > So ACPI_HOTPLUG_CPU becomes totally misnamed with the introduction
> > > of hotplug on Arm64.
> > >
> > > If we want to just look at stuff from an x86 perspective, then yes,
> > > it remains correct to call it ACPI_HOTPLUG_CPU. It isn't correct as
> > > soon as we add Arm64, as I already said.
> >
> > And if you rename it, it becomes less confusing for ARM64, but more
> > confusing for x86, which basically is my point.
> >
> > IMO "hotplug" covers both cases well enough and "hotplug present" is
> > only accurate for one of them.
> >
> > > And honestly, a two line quip to my reasoned argument is not IMHO
> > > an acceptable reply.
> >
> > Well, I'm not even sure how to respond to this ...
>
> The above explanation you give would have been useful...
>
> I don't see how "hotplug" covers both cases. As I've tried to point
> out many times now, ACPI_HOTPLUG_CPU is N for Arm64, yet it supports
> ACPI based hotplug. How does ACPI_HOTPLUG_CPU cover Arm64 if it's
> N there?

But IIUC this change is preliminary for changing it (or equivalent
option with a different name) to Y, isn't it?

> IMHO it totally doesn't, and moreover, it goes against what
> one would logically expect - and this is why I have a problem with
> your effective NAK for this change. I believe you are basically
> wrong on this for the reasons I've given - that ACPI_HOTPLUG_CPU
> will be N for Arm64 despite it supporting ACPI-based CPU hotplug.

So I still have to understand how renaming it for all architectures
(including x86) is supposed to help.

It will still be the same option under a different name. How does
that change things technically?

2024-01-23 19:05:23

by Russell King (Oracle)

Subject: Re: [PATCH RFC v3 05/21] ACPI: Rename ACPI_HOTPLUG_CPU to include 'present'

On Tue, Jan 23, 2024 at 07:26:57PM +0100, Rafael J. Wysocki wrote:
> On Tue, Jan 23, 2024 at 7:20 PM Russell King (Oracle)
> <[email protected]> wrote:
> >
> > On Tue, Jan 23, 2024 at 06:43:59PM +0100, Rafael J. Wysocki wrote:
> > > On Tue, Jan 23, 2024 at 5:36 PM Russell King (Oracle)
> > > <[email protected]> wrote:
> > > >
> > > > On Tue, Jan 23, 2024 at 05:15:54PM +0100, Rafael J. Wysocki wrote:
> > > > > On Tue, Jan 23, 2024 at 2:28 PM Russell King (Oracle)
> > > > > <[email protected]> wrote:
> > > > > >
> > > > > > On Mon, Jan 22, 2024 at 06:00:13PM +0000, Jonathan Cameron wrote:
> > > > > > > On Mon, 18 Dec 2023 21:35:16 +0100
> > > > > > > "Rafael J. Wysocki" <[email protected]> wrote:
> > > > > > >
> > > > > > > > On Wed, Dec 13, 2023 at 1:49 PM Russell King <[email protected]> wrote:
> > > > > > > > >
> > > > > > > > > From: James Morse <[email protected]>
> > > > > > > > >
> > > > > > > > > The code behind ACPI_HOTPLUG_CPU allows a not-present CPU to become
> > > > > > > > > present.
> > > > > > > >
> > > > > > > > Right.
> > > > > > > >
> > > > > > > > > This isn't the only use of HOTPLUG_CPU. On arm64 and riscv
> > > > > > > > > CPUs can be taken offline as a power saving measure.
> > > > > > > >
> > > > > > > > But still there is the case in which a non-present CPU can become
> > > > > > > > present, isn't it there?
> > > > > > >
> > > > > > > Not yet defined by the architectures (and I'm assuming it probably never will be).
> > > > > > >
> > > > > > > The original proposal we took to ARM was to do exactly that - they pushed
> > > > > > > back hard on the basis there was no architecturally safe way to implement it.
> > > > > > > Too much of the ARM arch has to exist from the start of time.
> > > > > > >
> > > > > > > https://lore.kernel.org/linux-arm-kernel/[email protected]/
> > > > > > > is one of the relevant threads of the kernel side of that discussion.
> > > > > > >
> > > > > > > Not to put specific words into the ARM architects mouths, but the
> > > > > > > short description is that there is currently no demand for working
> > > > > > > out how to make physical CPU hotplug possible, as such they will not
> > > > > > > provide an architecturally compliant way to do it for virtual CPU hotplug and
> > > > > > > another means is needed (which is why this series doesn't use the present bit
> > > > > > > for that purpose and we have the Online capable bit in MADT/GICC)
> > > > > > >
> > > > > > > It was a 'fun' dance of several years to get to that clarification.
> > > > > > > As another fun fact, the same is defined for x86, but I don't think
> > > > > > > anyone has used it yet (GICC for ARM has an online capable bit in the flags to
> > > > > > > enable this, which was remarkably similar to the online capable bit in the
> > > > > > > flags of the Local APIC entries as added fairly recently).
> > > > > > >
> > > > > > > >
> > > > > > > > > On arm64 an offline CPU may be disabled by firmware, preventing it from
> > > > > > > > > being brought back online, but it remains present throughout.
> > > > > > > > >
> > > > > > > > > Adding code to prevent user-space trying to online these disabled CPUs
> > > > > > > > > needs some additional terminology.
> > > > > > > > >
> > > > > > > > > Rename the Kconfig symbol CONFIG_ACPI_HOTPLUG_PRESENT_CPU to reflect
> > > > > > > > > that it makes possible CPUs present.
> > > > > > > >
> > > > > > > > Honestly, I don't think that this change is necessary or even useful.
> > > > > > >
> > > > > > > Whilst it's an attempt to avoid future confusion, the rename is
> > > > > > > not something I really care about so my advice to Russell is drop
> > > > > > > it unless you are attached to it!
> > > > > >
> > > > > > While I agree that it isn't a necessity, I don't fully agree that it
> > > > > > isn't useful.
> > > > > >
> > > > > > One of the issues will be that while Arm64 will support hotplug vCPU,
> > > > > > it won't be setting ACPI_HOTPLUG_CPU because it doesn't support
> > > > > > the present bit changing. So I can see why James decided to rename
> > > > > > it - because with Arm64's hotplug vCPU, the idea that ACPI_HOTPLUG_CPU
> > > > > > somehow enables hotplug CPU support is now no longer true.
> > > > > >
> > > > > > Keeping it as ACPI_HOTPLUG_CPU makes the code less obvious, because it
> > > > > > leads one to assume that it ought to be enabled for Arm64's
> > > > > > implementation, and that could well cause issues in the future if
> > > > > > people make the assumption that "ACPI_HOTPLUG_CPU" means hotplug CPU
> > > > > > is supported in ACPI. It doesn't anymore.
> > > > >
> > > > > On x86 there is no confusion AFAICS. It's always meant "as long as
> > > > > the platform supports it".
> > > >
> > > > That's x86, which supports physical CPU hotplug. We're introducing
> > > > support for Arm64 here which doesn't support physical CPU hotplug.
> > > >
> > > >                                       ACPI-based  Physical  Virtual
> > > > Arch   HOTPLUG_CPU  ACPI_HOTPLUG_CPU  Hotplug     Hotplug   Hotplug
> > > > Arm64  Y            N                 Y           N         Y
> > > > x86    Y            Y                 Y           Y         Y
> > > >
> > > > So ACPI_HOTPLUG_CPU becomes totally misnamed with the introduction
> > > > of hotplug on Arm64.
> > > >
> > > > If we want to just look at stuff from an x86 perspective, then yes,
> > > > it remains correct to call it ACPI_HOTPLUG_CPU. It isn't correct as
> > > > soon as we add Arm64, as I already said.
> > >
> > > And if you rename it, it becomes less confusing for ARM64, but more
> > > confusing for x86, which basically is my point.
> > >
> > > IMO "hotplug" covers both cases well enough and "hotplug present" is
> > > only accurate for one of them.
> > >
> > > > And honestly, a two line quip to my reasoned argument is not IMHO
> > > > an acceptable reply.
> > >
> > > Well, I'm not even sure how to respond to this ...
> >
> > The above explanation you give would have been useful...
> >
> > I don't see how "hotplug" covers both cases. As I've tried to point
> > out many times now, ACPI_HOTPLUG_CPU is N for Arm64, yet it supports
> > ACPI based hotplug. How does ACPI_HOTPLUG_CPU cover Arm64 if it's
> > N there?
>
> But IIUC this change is preliminary for changing it (or equivalent
> option with a different name) to Y, isn't it?

No. As I keep saying, ACPI_HOTPLUG_CPU ends up N on Arm64 even when
it supports hotplug CPU via ACPI.

Even with the full Arm64 patch set here, under arch/ we still only
have:

arch/loongarch/Kconfig: select ACPI_HOTPLUG_PRESENT_CPU if ACPI_PROCESSOR && HOTPLUG_CPU
arch/x86/Kconfig: select ACPI_HOTPLUG_PRESENT_CPU if ACPI_PROCESSOR && HOTPLUG_CPU

To say it yet again, ACPI_HOTPLUG_(PRESENT_)CPU is *never* set on
Arm64.
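
To make the mismatch concrete, here is a minimal user-space sketch of the
table quoted earlier in this thread. It is not kernel code: the struct, the
field names and both helper functions are hypothetical and exist only for
illustration. The Y/N values are copied from that table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* One row of the table from the thread (hypothetical model, not a kernel type). */
struct arch_hotplug_row {
	const char *arch;
	bool hotplug_cpu;        /* HOTPLUG_CPU */
	bool acpi_hotplug_cpu;   /* ACPI_HOTPLUG_CPU (ACPI_HOTPLUG_PRESENT_CPU after the rename) */
	bool acpi_based_hotplug; /* ACPI-based hotplug works on this arch */
	bool physical_hotplug;
	bool virtual_hotplug;
};

static const struct arch_hotplug_row hotplug_table[] = {
	{ "Arm64", true, false, true, false, true },
	{ "x86",   true, true,  true, true,  true },
};

/* Look up a row by architecture name; returns NULL if the arch is not listed. */
static const struct arch_hotplug_row *find_arch(const char *arch)
{
	assert(arch != NULL);
	for (size_t i = 0; i < sizeof(hotplug_table) / sizeof(hotplug_table[0]); i++)
		if (strcmp(hotplug_table[i].arch, arch) == 0)
			return &hotplug_table[i];
	return NULL;
}

/*
 * True when ACPI-based hotplug works although the symbol is N, i.e. when
 * reading the symbol name literally gives the wrong answer.
 */
static bool symbol_name_misleads(const struct arch_hotplug_row *row)
{
	return row->acpi_based_hotplug && !row->acpi_hotplug_cpu;
}
```

With these values, symbol_name_misleads() is true for the Arm64 row and
false for the x86 row, which is the whole of the naming argument.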

> > IMHO it totally doesn't, and moreover, it goes against what
> > one would logically expect - and this is why I have a problem with
> > your effective NAK for this change. I believe you are basically
> > wrong on this for the reasons I've given - that ACPI_HOTPLUG_CPU
> > will be N for Arm64 despite it supporting ACPI-based CPU hotplug.
>
> So I still have to understand how renaming it for all architectures
> (including x86) is supposed to help.
>
> It will still be the same option under a different name. How does
> that change things technically?

Do you think that it makes any sense to have support for ACPI-based
hotplug CPU *and* having it functional with a configuration symbol
named "ACPI_HOTPLUG_CPU" to be set to N ? That's essentially what
you are advocating for...

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

2024-01-23 19:55:41

by Rafael J. Wysocki

Subject: Re: [PATCH RFC v3 05/21] ACPI: Rename ACPI_HOTPLUG_CPU to include 'present'

On Tue, Jan 23, 2024 at 7:59 PM Russell King (Oracle)
<[email protected]> wrote:
>
> On Tue, Jan 23, 2024 at 07:26:57PM +0100, Rafael J. Wysocki wrote:
> > On Tue, Jan 23, 2024 at 7:20 PM Russell King (Oracle)
> > <[email protected]> wrote:
> > >
> > > On Tue, Jan 23, 2024 at 06:43:59PM +0100, Rafael J. Wysocki wrote:
> > > > On Tue, Jan 23, 2024 at 5:36 PM Russell King (Oracle)
> > > > <[email protected]> wrote:
> > > > >
> > > > > On Tue, Jan 23, 2024 at 05:15:54PM +0100, Rafael J. Wysocki wrote:
> > > > > > On Tue, Jan 23, 2024 at 2:28 PM Russell King (Oracle)
> > > > > > <[email protected]> wrote:
> > > > > > >
> > > > > > > On Mon, Jan 22, 2024 at 06:00:13PM +0000, Jonathan Cameron wrote:
> > > > > > > > On Mon, 18 Dec 2023 21:35:16 +0100
> > > > > > > > "Rafael J. Wysocki" <[email protected]> wrote:
> > > > > > > >
> > > > > > > > > On Wed, Dec 13, 2023 at 1:49 PM Russell King <[email protected]> wrote:
> > > > > > > > > >
> > > > > > > > > > From: James Morse <[email protected]>
> > > > > > > > > >
> > > > > > > > > > The code behind ACPI_HOTPLUG_CPU allows a not-present CPU to become
> > > > > > > > > > present.
> > > > > > > > >
> > > > > > > > > Right.
> > > > > > > > >
> > > > > > > > > > This isn't the only use of HOTPLUG_CPU. On arm64 and riscv
> > > > > > > > > > CPUs can be taken offline as a power saving measure.
> > > > > > > > >
> > > > > > > > > But still there is the case in which a non-present CPU can become
> > > > > > > > > present, isn't it there?
> > > > > > > >
> > > > > > > > Not yet defined by the architectures (and I'm assuming it probably never will be).
> > > > > > > >
> > > > > > > > The original proposal we took to ARM was to do exactly that - they pushed
> > > > > > > > back hard on the basis there was no architecturally safe way to implement it.
> > > > > > > > Too much of the ARM arch has to exist from the start of time.
> > > > > > > >
> > > > > > > > https://lore.kernel.org/linux-arm-kernel/[email protected]/
> > > > > > > > is one of the relevant threads of the kernel side of that discussion.
> > > > > > > >
> > > > > > > > Not to put specific words into the ARM architects mouths, but the
> > > > > > > > short description is that there is currently no demand for working
> > > > > > > > out how to make physical CPU hotplug possible, as such they will not
> > > > > > > > provide an architecturally compliant way to do it for virtual CPU hotplug and
> > > > > > > > another means is needed (which is why this series doesn't use the present bit
> > > > > > > > for that purpose and we have the Online capable bit in MADT/GICC)
> > > > > > > >
> > > > > > > > It was a 'fun' dance of several years to get to that clarification.
> > > > > > > > As another fun fact, the same is defined for x86, but I don't think
> > > > > > > > anyone has used it yet (GICC for ARM has an online capable bit in the flags to
> > > > > > > > enable this, which was remarkably similar to the online capable bit in the
> > > > > > > > flags of the Local APIC entries as added fairly recently).
> > > > > > > >
> > > > > > > > >
> > > > > > > > > > On arm64 an offline CPU may be disabled by firmware, preventing it from
> > > > > > > > > > being brought back online, but it remains present throughout.
> > > > > > > > > >
> > > > > > > > > > Adding code to prevent user-space trying to online these disabled CPUs
> > > > > > > > > > needs some additional terminology.
> > > > > > > > > >
> > > > > > > > > > Rename the Kconfig symbol CONFIG_ACPI_HOTPLUG_PRESENT_CPU to reflect
> > > > > > > > > > that it makes possible CPUs present.
> > > > > > > > >
> > > > > > > > > Honestly, I don't think that this change is necessary or even useful.
> > > > > > > >
> > > > > > > > Whilst it's an attempt to avoid future confusion, the rename is
> > > > > > > > not something I really care about so my advice to Russell is drop
> > > > > > > > it unless you are attached to it!
> > > > > > >
> > > > > > > While I agree that it isn't a necessity, I don't fully agree that it
> > > > > > > isn't useful.
> > > > > > >
> > > > > > > One of the issues will be that while Arm64 will support hotplug vCPU,
> > > > > > > it won't be setting ACPI_HOTPLUG_CPU because it doesn't support
> > > > > > > the present bit changing. So I can see why James decided to rename
> > > > > > > it - because with Arm64's hotplug vCPU, the idea that ACPI_HOTPLUG_CPU
> > > > > > > somehow enables hotplug CPU support is now no longer true.
> > > > > > >
> > > > > > > Keeping it as ACPI_HOTPLUG_CPU makes the code less obvious, because it
> > > > > > > leads one to assume that it ought to be enabled for Arm64's
> > > > > > > implementation, and that could well cause issues in the future if
> > > > > > > people make the assumption that "ACPI_HOTPLUG_CPU" means hotplug CPU
> > > > > > > is supported in ACPI. It doesn't anymore.
> > > > > >
> > > > > > On x86 there is no confusion AFAICS. It's always meant "as long as
> > > > > > the platform supports it".
> > > > >
> > > > > That's x86, which supports physical CPU hotplug. We're introducing
> > > > > support for Arm64 here which doesn't support physical CPU hotplug.
> > > > >
> > > > >                                       ACPI-based  Physical  Virtual
> > > > > Arch   HOTPLUG_CPU  ACPI_HOTPLUG_CPU  Hotplug     Hotplug   Hotplug
> > > > > Arm64  Y            N                 Y           N         Y
> > > > > x86    Y            Y                 Y           Y         Y
> > > > >
> > > > > So ACPI_HOTPLUG_CPU becomes totally misnamed with the introduction
> > > > > of hotplug on Arm64.
> > > > >
> > > > > If we want to just look at stuff from an x86 perspective, then yes,
> > > > > it remains correct to call it ACPI_HOTPLUG_CPU. It isn't correct as
> > > > > soon as we add Arm64, as I already said.
> > > >
> > > > And if you rename it, it becomes less confusing for ARM64, but more
> > > > confusing for x86, which basically is my point.
> > > >
> > > > IMO "hotplug" covers both cases well enough and "hotplug present" is
> > > > only accurate for one of them.
> > > >
> > > > > And honestly, a two line quip to my reasoned argument is not IMHO
> > > > > an acceptable reply.
> > > >
> > > > Well, I'm not even sure how to respond to this ...
> > >
> > > The above explanation you give would have been useful...
> > >
> > > I don't see how "hotplug" covers both cases. As I've tried to point
> > > out many times now, ACPI_HOTPLUG_CPU is N for Arm64, yet it supports
> > > ACPI based hotplug. How does ACPI_HOTPLUG_CPU cover Arm64 if it's
> > > N there?
> >
> > But IIUC this change is preliminary for changing it (or equivalent
> > option with a different name) to Y, isn't it?
>
> No. As I keep saying, ACPI_HOTPLUG_CPU ends up N on Arm64 even when
> it supports hotplug CPU via ACPI.
>
> Even with the full Arm64 patch set here, under arch/ we still only
> have:
>
> arch/loongarch/Kconfig: select ACPI_HOTPLUG_PRESENT_CPU if ACPI_PROCESSOR && HOTPLUG_CPU
> arch/x86/Kconfig: select ACPI_HOTPLUG_PRESENT_CPU if ACPI_PROCESSOR && HOTPLUG_CPU
>
> To say it yet again, ACPI_HOTPLUG_(PRESENT_)CPU is *never* set on
> Arm64.

All right, so ARM64 is not going to use the code that is conditional on
ACPI_HOTPLUG_CPU today.

Fair enough.

> > > IMHO it totally doesn't, and moreover, it goes against what
> > > one would logically expect - and this is why I have a problem with
> > > your effective NAK for this change. I believe you are basically
> > > wrong on this for the reasons I've given - that ACPI_HOTPLUG_CPU
> > > will be N for Arm64 despite it supporting ACPI-based CPU hotplug.
> >
> > So I still have to understand how renaming it for all architectures
> > (including x86) is supposed to help.
> >
> > It will still be the same option under a different name. How does
> > that change things technically?
>
> Do you think that it makes any sense to have support for ACPI-based
> hotplug CPU

So this is all about what you and I mean by "ACPI-based hotplug CPU".

> *and* having it functional with a configuration symbol
> named "ACPI_HOTPLUG_CPU" to be set to N ? That's essentially what
> you are advocating for...

Setting ACPI_HOTPLUG_CPU to N means that you are not going to compile
the code that is conditional on it.

That code allows the processor driver to be removed from CPUs and
arch_unregister_cpu() to be called from within acpi_bus_trim() (among
other things). On the way up, it allows arch_register_cpu() to be
called from within acpi_bus_scan(). If these things are not done,
what I mean by "ACPI-based hotplug CPU" is not supported.
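
As a rough user-space illustration of that point (a sketch only, not the
kernel implementation): every name below is a hypothetical stand-in.
MODEL_ACPI_HOTPLUG_CPU models the Kconfig option, model_bus_scan() and
model_bus_trim() model the scan and trim paths, and the register/unregister
helpers model the arch hooks. With the option off, neither path ever reaches
the hooks.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the Kconfig option; defaults to "N" in this sketch. */
#ifndef MODEL_ACPI_HOTPLUG_CPU
#define MODEL_ACPI_HOTPLUG_CPU 0
#endif

#define MODEL_NR_CPUS 4

/* Tracks which CPUs the model considers registered. */
static bool model_cpu_registered[MODEL_NR_CPUS];

/* Stand-in for the arch hook that registers a CPU. */
static void model_register_cpu(int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	model_cpu_registered[cpu] = true;
}

/* Stand-in for the arch hook that unregisters a CPU. */
static void model_unregister_cpu(int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	model_cpu_registered[cpu] = false;
}

/* "Way up": registers the CPU only when the option is enabled. */
static bool model_bus_scan(int cpu)
{
	if (!MODEL_ACPI_HOTPLUG_CPU)
		return false;
	model_register_cpu(cpu);
	return true;
}

/* "Way down": unregisters the CPU only when the option is enabled. */
static bool model_bus_trim(int cpu)
{
	if (!MODEL_ACPI_HOTPLUG_CPU)
		return false;
	model_unregister_cpu(cpu);
	return true;
}
```

Building with -DMODEL_ACPI_HOTPLUG_CPU=1 models a configuration that selects
the option; the default build models one that does not.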

2024-01-23 20:17:48

by Rafael J. Wysocki

Subject: Re: [PATCH RFC v3 05/21] ACPI: Rename ACPI_HOTPLUG_CPU to include 'present'

On Tue, Jan 23, 2024 at 9:09 PM Russell King (Oracle)
<[email protected]> wrote:
>
> On Tue, Jan 23, 2024 at 08:27:05PM +0100, Rafael J. Wysocki wrote:
> > On Tue, Jan 23, 2024 at 7:59 PM Russell King (Oracle)
> > <[email protected]> wrote:
> > >
> > > On Tue, Jan 23, 2024 at 07:26:57PM +0100, Rafael J. Wysocki wrote:
> > > > On Tue, Jan 23, 2024 at 7:20 PM Russell King (Oracle)
> > > > <[email protected]> wrote:
> > > > >
> > > > > On Tue, Jan 23, 2024 at 06:43:59PM +0100, Rafael J. Wysocki wrote:
> > > > > > On Tue, Jan 23, 2024 at 5:36 PM Russell King (Oracle)
> > > > > > <[email protected]> wrote:
> > > > > > >
> > > > > > > On Tue, Jan 23, 2024 at 05:15:54PM +0100, Rafael J. Wysocki wrote:
> > > > > > > > On Tue, Jan 23, 2024 at 2:28 PM Russell King (Oracle)
> > > > > > > > <[email protected]> wrote:
> > > > > > > > >
> > > > > > > > > On Mon, Jan 22, 2024 at 06:00:13PM +0000, Jonathan Cameron wrote:
> > > > > > > > > > On Mon, 18 Dec 2023 21:35:16 +0100
> > > > > > > > > > "Rafael J. Wysocki" <[email protected]> wrote:
> > > > > > > > > >
> > > > > > > > > > > On Wed, Dec 13, 2023 at 1:49 PM Russell King <[email protected]> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > From: James Morse <[email protected]>
> > > > > > > > > > > >
> > > > > > > > > > > > The code behind ACPI_HOTPLUG_CPU allows a not-present CPU to become
> > > > > > > > > > > > present.
> > > > > > > > > > >
> > > > > > > > > > > Right.
> > > > > > > > > > >
> > > > > > > > > > > > This isn't the only use of HOTPLUG_CPU. On arm64 and riscv
> > > > > > > > > > > > CPUs can be taken offline as a power saving measure.
> > > > > > > > > > >
> > > > > > > > > > > But still there is the case in which a non-present CPU can become
> > > > > > > > > > > present, isn't it there?
> > > > > > > > > >
> > > > > > > > > > Not yet defined by the architectures (and I'm assuming it probably never will be).
> > > > > > > > > >
> > > > > > > > > > The original proposal we took to ARM was to do exactly that - they pushed
> > > > > > > > > > back hard on the basis there was no architecturally safe way to implement it.
> > > > > > > > > > Too much of the ARM arch has to exist from the start of time.
> > > > > > > > > >
> > > > > > > > > > https://lore.kernel.org/linux-arm-kernel/[email protected]/
> > > > > > > > > > is one of the relevant threads of the kernel side of that discussion.
> > > > > > > > > >
> > > > > > > > > > Not to put specific words into the ARM architects mouths, but the
> > > > > > > > > > short description is that there is currently no demand for working
> > > > > > > > > > out how to make physical CPU hotplug possible, as such they will not
> > > > > > > > > > provide an architecturally compliant way to do it for virtual CPU hotplug and
> > > > > > > > > > another means is needed (which is why this series doesn't use the present bit
> > > > > > > > > > for that purpose and we have the Online capable bit in MADT/GICC)
> > > > > > > > > >
> > > > > > > > > > It was a 'fun' dance of several years to get to that clarification.
> > > > > > > > > > As another fun fact, the same is defined for x86, but I don't think
> > > > > > > > > > anyone has used it yet (GICC for ARM has an online capable bit in the flags to
> > > > > > > > > > enable this, which was remarkably similar to the online capable bit in the
> > > > > > > > > > flags of the Local APIC entries as added fairly recently).
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > On arm64 an offline CPU may be disabled by firmware, preventing it from
> > > > > > > > > > > > being brought back online, but it remains present throughout.
> > > > > > > > > > > >
> > > > > > > > > > > > Adding code to prevent user-space trying to online these disabled CPUs
> > > > > > > > > > > > needs some additional terminology.
> > > > > > > > > > > >
> > > > > > > > > > > > Rename the Kconfig symbol CONFIG_ACPI_HOTPLUG_PRESENT_CPU to reflect
> > > > > > > > > > > > that it makes possible CPUs present.
> > > > > > > > > > >
> > > > > > > > > > > Honestly, I don't think that this change is necessary or even useful.
> > > > > > > > > >
> > > > > > > > > > Whilst it's an attempt to avoid future confusion, the rename is
> > > > > > > > > > not something I really care about so my advice to Russell is drop
> > > > > > > > > > it unless you are attached to it!
> > > > > > > > >
> > > > > > > > > While I agree that it isn't a necessity, I don't fully agree that it
> > > > > > > > > isn't useful.
> > > > > > > > >
> > > > > > > > > One of the issues will be that while Arm64 will support hotplug vCPU,
> > > > > > > > > it won't be setting ACPI_HOTPLUG_CPU because it doesn't support
> > > > > > > > > the present bit changing. So I can see why James decided to rename
> > > > > > > > > it - because with Arm64's hotplug vCPU, the idea that ACPI_HOTPLUG_CPU
> > > > > > > > > somehow enables hotplug CPU support is now no longer true.
> > > > > > > > >
> > > > > > > > > Keeping it as ACPI_HOTPLUG_CPU makes the code less obvious, because it
> > > > > > > > > leads one to assume that it ought to be enabled for Arm64's
> > > > > > > > > > implementation, and that could well cause issues in the future if
> > > > > > > > > people make the assumption that "ACPI_HOTPLUG_CPU" means hotplug CPU
> > > > > > > > > is supported in ACPI. It doesn't anymore.
> > > > > > > >
> > > > > > > > On x86 there is no confusion AFAICS. It's always meant "as long as
> > > > > > > > the platform supports it".
> > > > > > >
> > > > > > > That's x86, which supports physical CPU hotplug. We're introducing
> > > > > > > support for Arm64 here which doesn't support physical CPU hotplug.
> > > > > > >
> > > > > > > >                                       ACPI-based  Physical  Virtual
> > > > > > > > Arch   HOTPLUG_CPU  ACPI_HOTPLUG_CPU  Hotplug     Hotplug   Hotplug
> > > > > > > > Arm64  Y            N                 Y           N         Y
> > > > > > > > x86    Y            Y                 Y           Y         Y
> > > > > > >
> > > > > > > So ACPI_HOTPLUG_CPU becomes totally misnamed with the introduction
> > > > > > > of hotplug on Arm64.
> > > > > > >
> > > > > > > If we want to just look at stuff from an x86 perspective, then yes,
> > > > > > > it remains correct to call it ACPI_HOTPLUG_CPU. It isn't correct as
> > > > > > > soon as we add Arm64, as I already said.
> > > > > >
> > > > > > And if you rename it, it becomes less confusing for ARM64, but more
> > > > > > confusing for x86, which basically is my point.
> > > > > >
> > > > > > IMO "hotplug" covers both cases well enough and "hotplug present" is
> > > > > > only accurate for one of them.
> > > > > >
> > > > > > > And honestly, a two line quip to my reasoned argument is not IMHO
> > > > > > > an acceptable reply.
> > > > > >
> > > > > > Well, I'm not even sure how to respond to this ...
> > > > >
> > > > > The above explanation you give would have been useful...
> > > > >
> > > > > I don't see how "hotplug" covers both cases. As I've tried to point
> > > > > out many times now, ACPI_HOTPLUG_CPU is N for Arm64, yet it supports
> > > > > ACPI based hotplug. How does ACPI_HOTPLUG_CPU cover Arm64 if it's
> > > > > N there?
> > > >
> > > > But IIUC this change is preliminary for changing it (or equivalent
> > > > option with a different name) to Y, isn't it?
> > >
> > > No. As I keep saying, ACPI_HOTPLUG_CPU ends up N on Arm64 even when
> > > it supports hotplug CPU via ACPI.
> > >
> > > Even with the full Arm64 patch set here, under arch/ we still only
> > > have:
> > >
> > > arch/loongarch/Kconfig: select ACPI_HOTPLUG_PRESENT_CPU if ACPI_PROCESSOR && HOTPLUG_CPU
> > > arch/x86/Kconfig: select ACPI_HOTPLUG_PRESENT_CPU if ACPI_PROCESSOR && HOTPLUG_CPU
> > >
> > > To say it yet again, ACPI_HOTPLUG_(PRESENT_)CPU is *never* set on
> > > Arm64.
> >
> > All right, so ARM64 is not going to use the code that is conditional on
> > ACPI_HOTPLUG_CPU today.
> >
> > Fair enough.
> >
> > > > > IMHO it totally doesn't, and moreover, it goes against what
> > > > > one would logically expect - and this is why I have a problem with
> > > > > your effective NAK for this change. I believe you are basically
> > > > > wrong on this for the reasons I've given - that ACPI_HOTPLUG_CPU
> > > > > will be N for Arm64 despite it supporting ACPI-based CPU hotplug.
> > > >
> > > > So I still have to understand how renaming it for all architectures
> > > > (including x86) is supposed to help.
> > > >
> > > > It will still be the same option under a different name. How does
> > > > that change things technically?
> > >
> > > Do you think that it makes any sense to have support for ACPI-based
> > > hotplug CPU
> >
> > So this is all about what you and I mean by "ACPI-based hotplug CPU".
> >
> > > *and* having it functional with a configuration symbol
> > > named "ACPI_HOTPLUG_CPU" to be set to N ? That's essentially what
> > > you are advocating for...
> >
> > Setting ACPI_HOTPLUG_CPU to N means that you are not going to compile
> > the code that is conditional on it.
> >
> > That code allows the processor driver to be removed from CPUs and
> > arch_unregister_cpu() to be called from within acpi_bus_trim() (among
> > other things). On the way up, it allows arch_register_cpu() to be
> > called from within acpi_bus_scan(). If these things are not done,
> > what I mean by "ACPI-based hotplug CPU" is not supported.
>
> Even on Arm64, arch_register_cpu() and arch_unregister_cpu() will be
> called when the CPU in the VM is hot-removed or hot-added...

In a different way, however.

2024-01-23 20:50:38

by Russell King (Oracle)

Subject: Re: [PATCH RFC v3 05/21] ACPI: Rename ACPI_HOTPLUG_CPU to include 'present'

On Tue, Jan 23, 2024 at 08:27:05PM +0100, Rafael J. Wysocki wrote:
> On Tue, Jan 23, 2024 at 7:59 PM Russell King (Oracle)
> <[email protected]> wrote:
> >
> > On Tue, Jan 23, 2024 at 07:26:57PM +0100, Rafael J. Wysocki wrote:
> > > On Tue, Jan 23, 2024 at 7:20 PM Russell King (Oracle)
> > > <[email protected]> wrote:
> > > >
> > > > On Tue, Jan 23, 2024 at 06:43:59PM +0100, Rafael J. Wysocki wrote:
> > > > > On Tue, Jan 23, 2024 at 5:36 PM Russell King (Oracle)
> > > > > <[email protected]> wrote:
> > > > > >
> > > > > > On Tue, Jan 23, 2024 at 05:15:54PM +0100, Rafael J. Wysocki wrote:
> > > > > > > On Tue, Jan 23, 2024 at 2:28 PM Russell King (Oracle)
> > > > > > > <[email protected]> wrote:
> > > > > > > >
> > > > > > > > On Mon, Jan 22, 2024 at 06:00:13PM +0000, Jonathan Cameron wrote:
> > > > > > > > > On Mon, 18 Dec 2023 21:35:16 +0100
> > > > > > > > > "Rafael J. Wysocki" <[email protected]> wrote:
> > > > > > > > >
> > > > > > > > > > On Wed, Dec 13, 2023 at 1:49 PM Russell King <[email protected]> wrote:
> > > > > > > > > > >
> > > > > > > > > > > From: James Morse <[email protected]>
> > > > > > > > > > >
> > > > > > > > > > > The code behind ACPI_HOTPLUG_CPU allows a not-present CPU to become
> > > > > > > > > > > present.
> > > > > > > > > >
> > > > > > > > > > Right.
> > > > > > > > > >
> > > > > > > > > > > This isn't the only use of HOTPLUG_CPU. On arm64 and riscv
> > > > > > > > > > > CPUs can be taken offline as a power saving measure.
> > > > > > > > > >
> > > > > > > > > > But still there is the case in which a non-present CPU can become
> > > > > > > > > > present, isn't it there?
> > > > > > > > >
> > > > > > > > > Not yet defined by the architectures (and I'm assuming it probably never will be).
> > > > > > > > >
> > > > > > > > > The original proposal we took to ARM was to do exactly that - they pushed
> > > > > > > > > back hard on the basis there was no architecturally safe way to implement it.
> > > > > > > > > Too much of the ARM arch has to exist from the start of time.
> > > > > > > > >
> > > > > > > > > https://lore.kernel.org/linux-arm-kernel/[email protected]/
> > > > > > > > > is one of the relevant threads of the kernel side of that discussion.
> > > > > > > > >
> > > > > > > > > Not to put specific words into the ARM architects mouths, but the
> > > > > > > > > short description is that there is currently no demand for working
> > > > > > > > > out how to make physical CPU hotplug possible, as such they will not
> > > > > > > > > provide an architecturally compliant way to do it for virtual CPU hotplug and
> > > > > > > > > another means is needed (which is why this series doesn't use the present bit
> > > > > > > > > for that purpose and we have the Online capable bit in MADT/GICC)
> > > > > > > > >
> > > > > > > > > It was a 'fun' dance of several years to get to that clarification.
> > > > > > > > > As another fun fact, the same is defined for x86, but I don't think
> > > > > > > > > anyone has used it yet (GICC for ARM has an online capable bit in the flags to
> > > > > > > > > enable this, which was remarkably similar to the online capable bit in the
> > > > > > > > > flags of the Local APIC entries as added fairly recently).
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > On arm64 an offline CPU may be disabled by firmware, preventing it from
> > > > > > > > > > > being brought back online, but it remains present throughout.
> > > > > > > > > > >
> > > > > > > > > > > Adding code to prevent user-space trying to online these disabled CPUs
> > > > > > > > > > > needs some additional terminology.
> > > > > > > > > > >
> > > > > > > > > > > Rename the Kconfig symbol CONFIG_ACPI_HOTPLUG_PRESENT_CPU to reflect
> > > > > > > > > > > that it makes possible CPUs present.
> > > > > > > > > >
> > > > > > > > > > Honestly, I don't think that this change is necessary or even useful.
> > > > > > > > >
> > > > > > > > > Whilst it's an attempt to avoid future confusion, the rename is
> > > > > > > > > not something I really care about so my advice to Russell is drop
> > > > > > > > > it unless you are attached to it!
> > > > > > > >
> > > > > > > > While I agree that it isn't a necessity, I don't fully agree that it
> > > > > > > > isn't useful.
> > > > > > > >
> > > > > > > > One of the issues will be that while Arm64 will support hotplug vCPU,
> > > > > > > > it won't be setting ACPI_HOTPLUG_CPU because it doesn't support
> > > > > > > > the present bit changing. So I can see why James decided to rename
> > > > > > > > it - because with Arm64's hotplug vCPU, the idea that ACPI_HOTPLUG_CPU
> > > > > > > > somehow enables hotplug CPU support is now no longer true.
> > > > > > > >
> > > > > > > > Keeping it as ACPI_HOTPLUG_CPU makes the code less obvious, because it
> > > > > > > > leads one to assume that it ought to be enabled for Arm64's
> > > > > > > > > implementation, and that could well cause issues in the future if
> > > > > > > > people make the assumption that "ACPI_HOTPLUG_CPU" means hotplug CPU
> > > > > > > > is supported in ACPI. It doesn't anymore.
> > > > > > >
> > > > > > > On x86 there is no confusion AFAICS. It's always meant "as long as
> > > > > > > the platform supports it".
> > > > > >
> > > > > > That's x86, which supports physical CPU hotplug. We're introducing
> > > > > > support for Arm64 here which doesn't support physical CPU hotplug.
> > > > > >
> > > > > > >                                       ACPI-based  Physical  Virtual
> > > > > > > Arch   HOTPLUG_CPU  ACPI_HOTPLUG_CPU  Hotplug     Hotplug   Hotplug
> > > > > > > Arm64  Y            N                 Y           N         Y
> > > > > > > x86    Y            Y                 Y           Y         Y
> > > > > >
> > > > > > So ACPI_HOTPLUG_CPU becomes totally misnamed with the introduction
> > > > > > of hotplug on Arm64.
> > > > > >
> > > > > > If we want to just look at stuff from an x86 perspective, then yes,
> > > > > > it remains correct to call it ACPI_HOTPLUG_CPU. It isn't correct as
> > > > > > soon as we add Arm64, as I already said.
> > > > >
> > > > > And if you rename it, it becomes less confusing for ARM64, but more
> > > > > confusing for x86, which basically is my point.
> > > > >
> > > > > IMO "hotplug" covers both cases well enough and "hotplug present" is
> > > > > only accurate for one of them.
> > > > >
> > > > > > And honestly, a two line quip to my reasoned argument is not IMHO
> > > > > > an acceptable reply.
> > > > >
> > > > > Well, I'm not even sure how to respond to this ...
> > > >
> > > > The above explanation you give would have been useful...
> > > >
> > > > I don't see how "hotplug" covers both cases. As I've tried to point
> > > > out many times now, ACPI_HOTPLUG_CPU is N for Arm64, yet it supports
> > > > ACPI based hotplug. How does ACPI_HOTPLUG_CPU cover Arm64 if it's
> > > > N there?
> > >
> > > But IIUC this change is preliminary for changing it (or equivalent
> > > option with a different name) to Y, isn't it?
> >
> > No. As I keep saying, ACPI_HOTPLUG_CPU ends up N on Arm64 even when
> > it supports hotplug CPU via ACPI.
> >
> > Even with the full Arm64 patch set here, under arch/ we still only
> > have:
> >
> > arch/loongarch/Kconfig: select ACPI_HOTPLUG_PRESENT_CPU if ACPI_PROCESSOR && HOTPLUG_CPU
> > arch/x86/Kconfig: select ACPI_HOTPLUG_PRESENT_CPU if ACPI_PROCESSOR && HOTPLUG_CPU
> >
> > To say it yet again, ACPI_HOTPLUG_(PRESENT_)CPU is *never* set on
> > Arm64.
>
> All right, so ARM64 is not going to use the code that is conditional on
> ACPI_HOTPLUG_CPU today.
>
> Fair enough.
>
> > > > IMHO it totally doesn't, and moreover, it goes against what
> > > > one would logically expect - and this is why I have a problem with
> > > > your effective NAK for this change. I believe you are basically
> > > > wrong on this for the reasons I've given - that ACPI_HOTPLUG_CPU
> > > > will be N for Arm64 despite it supporting ACPI-based CPU hotplug.
> > >
> > > So I still have to understand how renaming it for all architectures
> > > (including x86) is supposed to help.
> > >
> > > It will still be the same option under a different name. How does
> > > that change things technically?
> >
> > Do you think that it makes any sense to have support for ACPI-based
> > hotplug CPU
>
> So this is all about what you and I mean by "ACPI-based hotplug CPU".
>
> > *and* having it functional with a configuration symbol
> > named "ACPI_HOTPLUG_CPU" to be set to N ? That's essentially what
> > you are advocating for...
>
> Setting ACPI_HOTPLUG_CPU to N means that you are not going to compile
> the code that is conditional on it.
>
> That code allows the processor driver to be removed from CPUs and
> arch_unregister_cpu() to be called from within acpi_bus_trim() (among
> other things). On the way up, it allows arch_register_cpu() to be
> called from within acpi_bus_scan(). If these things are not done,
> what I mean by "ACPI-based hotplug CPU" is not supported.

Even on Arm64, arch_register_cpu() and arch_unregister_cpu() will be
called when the CPU in the VM is hot-removed or hot-added...
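
As a rough illustration of the Kconfig point quoted above, here is a small Python model of the two `select` lines. It is a sketch only: `SELECTS` and `acpi_hotplug_present_cpu()` are hypothetical names, real Kconfig resolution does far more than this, and it assumes the symbol has no prompt or default of its own, so it can only be set through `select`.

```python
# Architectures whose Kconfig carries the select line quoted in the thread.
SELECTS = {"loongarch", "x86"}


def acpi_hotplug_present_cpu(arch: str, acpi_processor: bool, hotplug_cpu: bool) -> bool:
    """Model 'select ACPI_HOTPLUG_PRESENT_CPU if ACPI_PROCESSOR && HOTPLUG_CPU'.

    Assumption: the symbol is set only via select (no prompt, no default).
    """
    return arch in SELECTS and acpi_processor and hotplug_cpu


if __name__ == "__main__":
    for arch in ("arm64", "loongarch", "x86"):
        value = acpi_hotplug_present_cpu(arch, acpi_processor=True, hotplug_cpu=True)
        print(arch, "y" if value else "n")
```

Under that assumption, arm64 ends up with the symbol unset whatever the other options are, simply because nothing under arch/arm64 selects it.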

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

2024-01-23 20:57:45

by Russell King (Oracle)

Subject: Re: [PATCH RFC v3 05/21] ACPI: Rename ACPI_HOTPLUG_CPU to include 'present'

On Tue, Jan 23, 2024 at 09:17:18PM +0100, Rafael J. Wysocki wrote:
> On Tue, Jan 23, 2024 at 9:09 PM Russell King (Oracle)
> <[email protected]> wrote:
> >
> > On Tue, Jan 23, 2024 at 08:27:05PM +0100, Rafael J. Wysocki wrote:
> > > On Tue, Jan 23, 2024 at 7:59 PM Russell King (Oracle)
> > > <[email protected]> wrote:
> > > >
> > > > On Tue, Jan 23, 2024 at 07:26:57PM +0100, Rafael J. Wysocki wrote:
> > > > > On Tue, Jan 23, 2024 at 7:20 PM Russell King (Oracle)
> > > > > <[email protected]> wrote:
> > > > > >
> > > > > > On Tue, Jan 23, 2024 at 06:43:59PM +0100, Rafael J. Wysocki wrote:
> > > > > > > On Tue, Jan 23, 2024 at 5:36 PM Russell King (Oracle)
> > > > > > > <[email protected]> wrote:
> > > > > > > >
> > > > > > > > On Tue, Jan 23, 2024 at 05:15:54PM +0100, Rafael J. Wysocki wrote:
> > > > > > > > > On Tue, Jan 23, 2024 at 2:28 PM Russell King (Oracle)
> > > > > > > > > <[email protected]> wrote:
> > > > > > > > > >
> > > > > > > > > > On Mon, Jan 22, 2024 at 06:00:13PM +0000, Jonathan Cameron wrote:
> > > > > > > > > > > On Mon, 18 Dec 2023 21:35:16 +0100
> > > > > > > > > > > "Rafael J. Wysocki" <[email protected]> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > On Wed, Dec 13, 2023 at 1:49 PM Russell King <[email protected]> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > From: James Morse <[email protected]>
> > > > > > > > > > > > >
> > > > > > > > > > > > > The code behind ACPI_HOTPLUG_CPU allows a not-present CPU to become
> > > > > > > > > > > > > present.
> > > > > > > > > > > >
> > > > > > > > > > > > Right.
> > > > > > > > > > > >
> > > > > > > > > > > > > This isn't the only use of HOTPLUG_CPU. On arm64 and riscv
> > > > > > > > > > > > > CPUs can be taken offline as a power saving measure.
> > > > > > > > > > > >
> > > > > > > > > > > > But still there is the case in which a non-present CPU can become
> > > > > > > > > > > > present, isn't it there?
> > > > > > > > > > >
> > > > > > > > > > > Not yet defined by the architectures (and I'm assuming it probably never will be).
> > > > > > > > > > >
> > > > > > > > > > > The original proposal we took to ARM was to do exactly that - they pushed
> > > > > > > > > > > back hard on the basis there was no architecturally safe way to implement it.
> > > > > > > > > > > Too much of the ARM arch has to exist from the start of time.
> > > > > > > > > > >
> > > > > > > > > > > https://lore.kernel.org/linux-arm-kernel/[email protected]/
> > > > > > > > > > > is one of the relevant threads of the kernel side of that discussion.
> > > > > > > > > > >
> > > > > > > > > > > Not to put specific words into the ARM architects mouths, but the
> > > > > > > > > > > short description is that there is currently no demand for working
> > > > > > > > > > > out how to make physical CPU hotplug possible, as such they will not
> > > > > > > > > > > provide an architecturally compliant way to do it for virtual CPU hotplug and
> > > > > > > > > > > another means is needed (which is why this series doesn't use the present bit
> > > > > > > > > > > for that purpose and we have the Online capable bit in MADT/GICC)
> > > > > > > > > > >
> > > > > > > > > > > It was a 'fun' dance of several years to get to that clarification.
> > > > > > > > > > > As another fun fact, the same is defined for x86, but I don't think
> > > > > > > > > > > anyone has used it yet (GICC for ARM has an online capable bit in the flags to
> > > > > > > > > > > enable this, which was remarkably similar to the online capable bit in the
> > > > > > > > > > > flags of the Local APIC entries as added fairly recently).
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > On arm64 an offline CPU may be disabled by firmware, preventing it from
> > > > > > > > > > > > > being brought back online, but it remains present throughout.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Adding code to prevent user-space trying to online these disabled CPUs
> > > > > > > > > > > > > needs some additional terminology.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Rename the Kconfig symbol CONFIG_ACPI_HOTPLUG_PRESENT_CPU to reflect
> > > > > > > > > > > > > that it makes possible CPUs present.
> > > > > > > > > > > >
> > > > > > > > > > > > Honestly, I don't think that this change is necessary or even useful.
> > > > > > > > > > >
> > > > > > > > > > > Whilst it's an attempt to avoid future confusion, the rename is
> > > > > > > > > > > not something I really care about so my advice to Russell is drop
> > > > > > > > > > > it unless you are attached to it!
> > > > > > > > > >
> > > > > > > > > > While I agree that it isn't a necessity, I don't fully agree that it
> > > > > > > > > > isn't useful.
> > > > > > > > > >
> > > > > > > > > > One of the issues will be that while Arm64 will support hotplug vCPU,
> > > > > > > > > > it won't be setting ACPI_HOTPLUG_CPU because it doesn't support
> > > > > > > > > > the present bit changing. So I can see why James decided to rename
> > > > > > > > > > it - because with Arm64's hotplug vCPU, the idea that ACPI_HOTPLUG_CPU
> > > > > > > > > > somehow enables hotplug CPU support is now no longer true.
> > > > > > > > > >
> > > > > > > > > > Keeping it as ACPI_HOTPLUG_CPU makes the code less obvious, because it
> > > > > > > > > > leads one to assume that it ought to be enabled for Arm64's
> > > > > > > > > > > implementation, and that could well cause issues in the future if
> > > > > > > > > > people make the assumption that "ACPI_HOTPLUG_CPU" means hotplug CPU
> > > > > > > > > > is supported in ACPI. It doesn't anymore.
> > > > > > > > >
> > > > > > > > > On x86 there is no confusion AFAICS. It's always meant "as long as
> > > > > > > > > the platform supports it".
> > > > > > > >
> > > > > > > > That's x86, which supports physical CPU hotplug. We're introducing
> > > > > > > > support for Arm64 here which doesn't support physical CPU hotplug.
> > > > > > > >
> > > > > > > > >                                       ACPI-based  Physical  Virtual
> > > > > > > > > Arch   HOTPLUG_CPU  ACPI_HOTPLUG_CPU  Hotplug     Hotplug   Hotplug
> > > > > > > > > Arm64  Y            N                 Y           N         Y
> > > > > > > > > x86    Y            Y                 Y           Y         Y
> > > > > > > >
> > > > > > > > So ACPI_HOTPLUG_CPU becomes totally misnamed with the introduction
> > > > > > > > of hotplug on Arm64.
> > > > > > > >
> > > > > > > > If we want to just look at stuff from an x86 perspective, then yes,
> > > > > > > > it remains correct to call it ACPI_HOTPLUG_CPU. It isn't correct as
> > > > > > > > soon as we add Arm64, as I already said.
> > > > > > >
> > > > > > > And if you rename it, it becomes less confusing for ARM64, but more
> > > > > > > confusing for x86, which basically is my point.
> > > > > > >
> > > > > > > IMO "hotplug" covers both cases well enough and "hotplug present" is
> > > > > > > only accurate for one of them.
> > > > > > >
> > > > > > > > And honestly, a two line quip to my reasoned argument is not IMHO
> > > > > > > > an acceptable reply.
> > > > > > >
> > > > > > > Well, I'm not even sure how to respond to this ...
> > > > > >
> > > > > > The above explanation you give would have been useful...
> > > > > >
> > > > > > I don't see how "hotplug" covers both cases. As I've tried to point
> > > > > > out many times now, ACPI_HOTPLUG_CPU is N for Arm64, yet it supports
> > > > > > ACPI based hotplug. How does ACPI_HOTPLUG_CPU cover Arm64 if it's
> > > > > > N there?
> > > > >
> > > > > But IIUC this change is preliminary for changing it (or equivalent
> > > > > option with a different name) to Y, isn't it?
> > > >
> > > > No. As I keep saying, ACPI_HOTPLUG_CPU ends up N on Arm64 even when
> > > > it supports hotplug CPU via ACPI.
> > > >
> > > > Even with the full Arm64 patch set here, under arch/ we still only
> > > > have:
> > > >
> > > > arch/loongarch/Kconfig: select ACPI_HOTPLUG_PRESENT_CPU if ACPI_PROCESSOR && HOTPLUG_CPU
> > > > arch/x86/Kconfig: select ACPI_HOTPLUG_PRESENT_CPU if ACPI_PROCESSOR && HOTPLUG_CPU
> > > >
> > > > To say it yet again, ACPI_HOTPLUG_(PRESENT_)CPU is *never* set on
> > > > Arm64.
> > >
> > > All right, so ARM64 is not going to use the code that is conditional on
> > > ACPI_HOTPLUG_CPU today.
> > >
> > > Fair enough.
> > >
> > > > > > IMHO it totally doesn't, and moreover, it goes against what
> > > > > > one would logically expect - and this is why I have a problem with
> > > > > > your effective NAK for this change. I believe you are basically
> > > > > > wrong on this for the reasons I've given - that ACPI_HOTPLUG_CPU
> > > > > > will be N for Arm64 despite it supporting ACPI-based CPU hotplug.
> > > > >
> > > > > So I still have to understand how renaming it for all architectures
> > > > > (including x86) is supposed to help.
> > > > >
> > > > > It will still be the same option under a different name. How does
> > > > > that change things technically?
> > > >
> > > > Do you think that it makes any sense to have support for ACPI-based
> > > > hotplug CPU
> > >
> > > So this is all about what you and I mean by "ACPI-based hotplug CPU".
> > >
> > > > *and* having it functional with a configuration symbol
> > > > named "ACPI_HOTPLUG_CPU" to be set to N ? That's essentially what
> > > > you are advocating for...
> > >
> > > Setting ACPI_HOTPLUG_CPU to N means that you are not going to compile
> > > the code that is conditional on it.
> > >
> > > That code allows the processor driver to be removed from CPUs and
> > > arch_unregister_cpu() to be called from within acpi_bus_trim() (among
> > > other things). On the way up, it allows arch_register_cpu() to be
> > > called from within acpi_bus_scan(). If these things are not done,
> > > what I mean by "ACPI-based hotplug CPU" is not supported.
> >
> > Even on Arm64, arch_register_cpu() and arch_unregister_cpu() will be
> > called when the CPU in the VM is hot-removed or hot-added...
>
> In a different way, however.

This is getting tiresome. The goal posts keep moving. This isn't a
discussion, this is a "you're wrong and I'm going to keep changing my
argument if you agree with me to make you always wrong".

Sorry, no point continuing this.

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

2024-01-23 22:12:56

by Rafael J. Wysocki

Subject: Re: [PATCH RFC v3 05/21] ACPI: Rename ACPI_HOTPLUG_CPU to include 'present'

On Tue, Jan 23, 2024 at 10:12 PM Russell King (Oracle)
<[email protected]> wrote:
>
> On Tue, Jan 23, 2024 at 08:57:08PM +0000, Russell King (Oracle) wrote:
> > On Tue, Jan 23, 2024 at 09:17:18PM +0100, Rafael J. Wysocki wrote:
> > > On Tue, Jan 23, 2024 at 9:09 PM Russell King (Oracle)
> > > <[email protected]> wrote:

[cut]

> > Sorry, no point continuing this.
>
> Let me be clear why I'm exasperated by this.
>
> I've been giving you a technical argument (Arm64 supporting ACPI
> hotplug CPU, but ACPI_HOTPLUG_CPU=n) for many many emails. You
> seemed to misunderstand that, expecting ACPI_HOTPLUG_CPU to become
> Y later in the series.
>
> When it became clear that it wasn't, you've changed tack. It then
> became about whether two functions get called or not.
>
> When I pointed out that they are still going to be called, oh no,
> it's not about whether those two functions will be called but
> how they get called.

As I've said already in this thread, it is all about what "ACPI-based
CPU hotplug" means to each of us.

I know what it means to me: Running the code that is compiled when
ACPI_HOTPLUG_CPU is set via the processor scan handler.

I'm not entirely sure what it means to you.

You are saying that the config option name needs to be changed,
because it is going to stay N for ARM64 and it will support
"ACPI-based CPU hotplug" and I'm not sure what exactly you mean by
this.

To me, this just means that ARM64 is not going to use the processor
scan handler in the way it is used on x86.
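
To make the two readings concrete, the table quoted earlier in this thread can be written out as data and checked column by column. This is only a sketch in Python, not kernel code: the `ArchSupport` type, its field names and `symbol_tracks()` are made up for illustration, and the values are copied from the table as posted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArchSupport:
    """One row of the table from the thread (field names are illustrative)."""
    hotplug_cpu: bool         # HOTPLUG_CPU
    acpi_hotplug_cpu: bool    # ACPI_HOTPLUG_CPU
    acpi_based_hotplug: bool  # "ACPI-based Hotplug" column
    physical_hotplug: bool    # "Physical Hotplug" column
    virtual_hotplug: bool     # "Virtual Hotplug" column


TABLE = {
    "Arm64": ArchSupport(True, False, True, False, True),
    "x86": ArchSupport(True, True, True, True, True),
}


def symbol_tracks(column: str) -> bool:
    """True if ACPI_HOTPLUG_CPU equals `column` on every architecture."""
    return all(getattr(row, column) == row.acpi_hotplug_cpu for row in TABLE.values())


if __name__ == "__main__":
    for column in ("acpi_based_hotplug", "physical_hotplug", "virtual_hotplug"):
        print(f"{column}: {symbol_tracks(column)}")
```

Going by the table as posted, the symbol lines up with the physical-hotplug column and not with the ACPI-based one, which is the mismatch the rename was meant to address.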

> Essentially, what this comes down to is that _you_ have no technical
> argument against the change, just _you_ don't personally want it
> and it doesn't matter what justification I come up with, you're
> always going to tell me something different.

Sorry, but I'm just not convinced by your justification.

> So why not state that you personally don't want it in the first
> place? Why this game of cat and mouse and the constantly changing
> arguments. I guess it's to waste developers time.
>
> Well, I'm calling you out for this, because I'm that pissed off
> at the amount of time you're causing to be wasted.

And I don't have to suffer this kind of abuse. Sorry.

2024-01-24 08:45:59

by Russell King (Oracle)

Subject: Re: [PATCH RFC v3 05/21] ACPI: Rename ACPI_HOTPLUG_CPU to include 'present'

On Tue, Jan 23, 2024 at 11:05:43PM +0100, Rafael J. Wysocki wrote:
> > So why not state that you personally don't want it in the first
> > place? Why this game of cat and mouse and the constantly changing
> > arguments. I guess it's to waste developers time.
> >
> > Well, I'm calling you out for this, because I'm that pissed off
> > at the amount of time you're causing to be wasted.
>
> And I don't have to suffer this kind of abuse. Sorry.

And I've had enough of this crap, so I'm now walking away. Good
riddance.

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

2024-01-25 13:57:22

by Miguel Luis

Subject: Re: [PATCH RFC v3 03/21] ACPI: processor: Register CPUs that are online, but not described in the DSDT

Hi

> On 23 Jan 2024, at 08:27, Jonathan Cameron <[email protected]> wrote:
>
> On Mon, 22 Jan 2024 17:30:05 +0000
> "Russell King (Oracle)" <[email protected]> wrote:
>
>> On Mon, Jan 22, 2024 at 05:22:46PM +0100, Rafael J. Wysocki wrote:
>>> On Mon, Jan 22, 2024 at 5:02 PM Jonathan Cameron
>>> <[email protected]> wrote:
>>>>
>>>> On Mon, 15 Jan 2024 11:06:29 +0000
>>>> "Russell King (Oracle)" <[email protected]> wrote:
>>>>
>>>>> On Mon, Dec 18, 2023 at 09:22:03PM +0100, Rafael J. Wysocki wrote:
>>>>>> On Wed, Dec 13, 2023 at 1:49 PM Russell King <[email protected]> wrote:
>>>>>>>
>>>>>>> From: James Morse <[email protected]>
>>>>>>>
>>>>>>> ACPI has two descriptions of CPUs, one in the MADT/APIC table, the other
>>>>>>> in the DSDT. Both are required. (ACPI 6.5's 8.4 "Declaring Processors"
>>>>>>> says "Each processor in the system must be declared in the ACPI
>>>>>>> namespace"). Having two descriptions allows firmware authors to get
>>>>>>> this wrong.
>>>>>>>
>>>>>>> If CPUs are described in the MADT/APIC, they will be brought online
>>>>>>> early during boot. Once the register_cpu() calls are moved to ACPI,
>>>>>>> they will be based on the DSDT description of the CPUs. When CPUs are
>>>>>>> missing from the DSDT description, they will end up online, but not
>>>>>>> registered.
>>>>>>>
>>>>>>> Add a helper that runs after acpi_init() has completed to register
>>>>>>> CPUs that are online, but weren't found in the DSDT. Any CPU that
>>>>>>> is registered by this code triggers a firmware-bug warning and kernel
>>>>>>> taint.
>>>>>>>
>>>>>>> Qemu TCG only describes the first CPU in the DSDT, unless cpu-hotplug
>>>>>>> is configured.
>>>>>>
>>>>>> So why is this a kernel problem?
>>>>>
>>>>> So what are you proposing should be the behaviour here? What this
>>>>> statement seems to be saying is that QEMU as it exists today only
>>>>> describes the first CPU in DSDT.
>>>>
>>>> This confuses me somewhat, because I'm far from sure which machines this
>>>> is true for in QEMU. I'm guessing it's a legacy thing with
>>>> some old distro version of QEMU - so we'll have to paper over it anyway
>>>> but for current QEMU I'm not sure it's true.
>>>>
>>>> Helpfully there are a bunch of ACPI table tests so I've been checking
>>>> through all the multi CPU cases.
>>>>
>>>> CPU hotplug not enabled.
>>>> pc/DSDT.dimmpxm - 4x Processor entries. -smp 4
>>>> pc/DSDT.acpihmat - 2x Processor entries. -smp 2
>>>> q35/DSDT.acpihmat - 2x Processor entries. -smp 2
>>>> virt/DSDT.acpihmatvirt - 4x ACPI0007 entries -smp 4
>>>> q35/DSDT.acpihmat-noinitiator - 4 x Processor () entries -smp 4
>>>> virt/DSDT.topology - 8x ACPI0007 entries
>>>>
>>>> I've also looked at the code and we have various types of
>>>> CPU hotplug on x86 but they all build appropriate numbers of
>>>> Processor() entries in DSDT.
>>>> Arm likewise seems to build the right number of ACPI0007 entries
>>>> (and doesn't yet have CPU HP support).
>>>>
>>>> If anyone can add a reference on why this is needed that would be very
>>>> helpful.
>>>
>>> Yes, it would.
>>>
>>> Personally, I would prefer to assume that it is not necessary until it
>>> turns out that (1) there is firmware with this issue actually in use
>>> and (2) updating the firmware in question to follow the specification
>>> is not practical.
>>>
>>> Otherwise, we'd make it easier to ship non-compliant firmware for no
>>> good reason.
>>
>> If Salil can't come up with a reason, then I'm in favour of dropping
>> the patch like already done for patch 2. If the code change serves no
>> useful purpose, there's no point in making the change.
>>
>
> Salil's out today, but I've messaged him to follow up later in the week.
>
> It 'might' be the odd cold plug path where QEMU half comes up, then extra
> CPUs are added, then it boots. (used by some orchestration frameworks)
> I don't have a set up for that and I won't get to creating one today anyway
> (we all love start of the year planning workshops!)
>
> I've +CC'd a few people have run tests on the various iterations of this
> work in the past. Maybe one of them can shed some light on this?
>

IIUC, this patch covers a scenario with non-compliant firmware, and my tests
for AArch64 using RFC v2 have so far been unable to trigger its error message.
That does not mean, however, that this patch should not be taken forward.

It seems benign enough to detect non-compliant firmware and still proceed,
while letting whoever uses that firmware know about it.

I'm not sure, however, whether the reference to a specific VMM belongs in the
commit message. That may have nothing to do with the kernel, so rewording the
message to keep the two concerns separate could be useful.

Miguel

> Jonathan


2024-01-25 14:46:09

by Rafael J. Wysocki

Subject: Re: [PATCH RFC v3 03/21] ACPI: processor: Register CPUs that are online, but not described in the DSDT

Hi,

On Thu, Jan 25, 2024 at 2:56 PM Miguel Luis <[email protected]> wrote:
>
> Hi
>
> > On 23 Jan 2024, at 08:27, Jonathan Cameron <[email protected]> wrote:
> >
> > On Mon, 22 Jan 2024 17:30:05 +0000
> > "Russell King (Oracle)" <[email protected]> wrote:
> >
> >> On Mon, Jan 22, 2024 at 05:22:46PM +0100, Rafael J. Wysocki wrote:
> >>> On Mon, Jan 22, 2024 at 5:02 PM Jonathan Cameron
> >>> <[email protected]> wrote:
> >>>>
> >>>> On Mon, 15 Jan 2024 11:06:29 +0000
> >>>> "Russell King (Oracle)" <[email protected]> wrote:
> >>>>
> >>>>> On Mon, Dec 18, 2023 at 09:22:03PM +0100, Rafael J. Wysocki wrote:
> >>>>>> On Wed, Dec 13, 2023 at 1:49 PM Russell King <[email protected]> wrote:
> >>>>>>>
> >>>>>>> From: James Morse <[email protected]>
> >>>>>>>
> >>>>>>> ACPI has two descriptions of CPUs, one in the MADT/APIC table, the other
> >>>>>>> in the DSDT. Both are required. (ACPI 6.5's 8.4 "Declaring Processors"
> >>>>>>> says "Each processor in the system must be declared in the ACPI
> >>>>>>> namespace"). Having two descriptions allows firmware authors to get
> >>>>>>> this wrong.
> >>>>>>>
> >>>>>>> If CPUs are described in the MADT/APIC, they will be brought online
> >>>>>>> early during boot. Once the register_cpu() calls are moved to ACPI,
> >>>>>>> they will be based on the DSDT description of the CPUs. When CPUs are
> >>>>>>> missing from the DSDT description, they will end up online, but not
> >>>>>>> registered.
> >>>>>>>
> >>>>>>> Add a helper that runs after acpi_init() has completed to register
> >>>>>>> CPUs that are online, but weren't found in the DSDT. Any CPU that
> >>>>>>> is registered by this code triggers a firmware-bug warning and kernel
> >>>>>>> taint.
> >>>>>>>
> >>>>>>> Qemu TCG only describes the first CPU in the DSDT, unless cpu-hotplug
> >>>>>>> is configured.
> >>>>>>
> >>>>>> So why is this a kernel problem?
> >>>>>
> >>>>> So what are you proposing should be the behaviour here? What this
> >>>>> statement seems to be saying is that QEMU as it exists today only
> >>>>> describes the first CPU in DSDT.
> >>>>
> >>>> This confuses me somewhat, because I'm far from sure which machines this
> >>>> is true for in QEMU. I'm guessing it's a legacy thing with
> >>>> some old distro version of QEMU - so we'll have to paper over it anyway
> >>>> but for current QEMU I'm not sure it's true.
> >>>>
> >>>> Helpfully there are a bunch of ACPI table tests so I've been checking
> >>>> through all the multi CPU cases.
> >>>>
> >>>> CPU hotplug not enabled.
> >>>> pc/DSDT.dimmpxm - 4x Processor entries. -smp 4
> >>>> pc/DSDT.acpihmat - 2x Processor entries. -smp 2
> >>>> q35/DSDT.acpihmat - 2x Processor entries. -smp 2
> >>>> virt/DSDT.acpihmatvirt - 4x ACPI0007 entries -smp 4
> >>>> q35/DSDT.acpihmat-noinitiator - 4 x Processor () entries -smp 4
> >>>> virt/DSDT.topology - 8x ACPI0007 entries
> >>>>
> >>>> I've also looked at the code and we have various types of
> >>>> CPU hotplug on x86 but they all build appropriate numbers of
> >>>> Processor() entries in DSDT.
> >>>> Arm likewise seems to build the right number of ACPI0007 entries
> >>>> (and doesn't yet have CPU HP support).
> >>>>
> >>>> If anyone can add a reference on why this is needed that would be very
> >>>> helpful.
> >>>
> >>> Yes, it would.
> >>>
> >>> Personally, I would prefer to assume that it is not necessary until it
> >>> turns out that (1) there is firmware with this issue actually in use
> >>> and (2) updating the firmware in question to follow the specification
> >>> is not practical.
> >>>
> >>> Otherwise, we'd make it easier to ship non-compliant firmware for no
> >>> good reason.
> >>
> >> If Salil can't come up with a reason, then I'm in favour of dropping
> >> the patch like already done for patch 2. If the code change serves no
> >> useful purpose, there's no point in making the change.
> >>
> >
> > Salil's out today, but I've messaged him to follow up later in the week.
> >
> > It 'might' be the odd cold plug path where QEMU half comes up, then extra
> > CPUs are added, then it boots. (used by some orchestration frameworks)
> > I don't have a set up for that and I won't get to creating one today anyway
> > (we all love start of the year planning workshops!)
> >
> > I've +CC'd a few people have run tests on the various iterations of this
> > work in the past. Maybe one of them can shed some light on this?
> >
>
> IIUC, this patch covers a scenario for non compliant firmware and in which my
> tests for AArch64 using RFC v2 have been unable to trigger its error message so
> far. This does not mean, however, this patch should not be taken forward though.
>
> It seems benevolent enough detecting non compliant firmware and still proceed
> while having whoever uses that firmware to get to know that.

There is one issue with this approach, though.

If this is done by Linux and Linux is used as a main testing vehicle
for whoever produced that firmware, it may pass the tests and be
shipped causing a problem for the rest of the industry (because other
operating systems will not support that firmware and now they will be
put in an awkward position).

I've seen enough breakage resulting from a similar policy in some
other OS and with Linux on the receiving end that I'd rather avoid
doing this to someone else.

So if the firmware is not compliant, the best way to go is to ask
whoever ships it to please fix their stuff, or if other OSes already
work around the non-compliance, it's time to update the spec to
reflect the reality (aka "industry practice").

Thanks!

2024-01-29 13:04:17

by Jonathan Cameron

Subject: Re: [PATCH RFC v3 03/21] ACPI: processor: Register CPUs that are online, but not described in the DSDT

On Tue, 23 Jan 2024 09:27:25 +0000
Jonathan Cameron <[email protected]> wrote:

> On Mon, 22 Jan 2024 17:30:05 +0000
> "Russell King (Oracle)" <[email protected]> wrote:
>
> > On Mon, Jan 22, 2024 at 05:22:46PM +0100, Rafael J. Wysocki wrote:
> > > On Mon, Jan 22, 2024 at 5:02 PM Jonathan Cameron
> > > <[email protected]> wrote:
> > > >
> > > > On Mon, 15 Jan 2024 11:06:29 +0000
> > > > "Russell King (Oracle)" <[email protected]> wrote:
> > > >
> > > > > On Mon, Dec 18, 2023 at 09:22:03PM +0100, Rafael J. Wysocki wrote:
> > > > > > On Wed, Dec 13, 2023 at 1:49 PM Russell King <[email protected]> wrote:
> > > > > > >
> > > > > > > From: James Morse <[email protected]>
> > > > > > >
> > > > > > > ACPI has two descriptions of CPUs, one in the MADT/APIC table, the other
> > > > > > > in the DSDT. Both are required. (ACPI 6.5's 8.4 "Declaring Processors"
> > > > > > > says "Each processor in the system must be declared in the ACPI
> > > > > > > namespace"). Having two descriptions allows firmware authors to get
> > > > > > > this wrong.
> > > > > > >
> > > > > > > If CPUs are described in the MADT/APIC, they will be brought online
> > > > > > > early during boot. Once the register_cpu() calls are moved to ACPI,
> > > > > > > they will be based on the DSDT description of the CPUs. When CPUs are
> > > > > > > missing from the DSDT description, they will end up online, but not
> > > > > > > registered.
> > > > > > >
> > > > > > > Add a helper that runs after acpi_init() has completed to register
> > > > > > > CPUs that are online, but weren't found in the DSDT. Any CPU that
> > > > > > > is registered by this code triggers a firmware-bug warning and kernel
> > > > > > > taint.
> > > > > > >
> > > > > > > Qemu TCG only describes the first CPU in the DSDT, unless cpu-hotplug
> > > > > > > is configured.
> > > > > >
> > > > > > So why is this a kernel problem?
> > > > >
> > > > > So what are you proposing should be the behaviour here? What this
> > > > > statement seems to be saying is that QEMU as it exists today only
> > > > > describes the first CPU in DSDT.
> > > >
> > > > This confuses me somewhat, because I'm far from sure which machines this
> > > > is true for in QEMU. I'm guessing it's a legacy thing with
> > > > some old distro version of QEMU - so we'll have to paper over it anyway
> > > > but for current QEMU I'm not sure it's true.
> > > >
> > > > Helpfully there are a bunch of ACPI table tests so I've been checking
> > > > through all the multi CPU cases.
> > > >
> > > > CPU hotplug not enabled.
> > > > pc/DSDT.dimmpxm - 4x Processor entries. -smp 4
> > > > pc/DSDT.acpihmat - 2x Processor entries. -smp 2
> > > > q35/DSDT.acpihmat - 2x Processor entries. -smp 2
> > > > virt/DSDT.acpihmatvirt - 4x ACPI0007 entries -smp 4
> > > > q35/DSDT.acpihmat-noinitiator - 4 x Processor () entries -smp 4
> > > > virt/DSDT.topology - 8x ACPI0007 entries
> > > >
> > > > I've also looked at the code and we have various types of
> > > > CPU hotplug on x86 but they all build appropriate numbers of
> > > > Processor() entries in DSDT.
> > > > Arm likewise seems to build the right number of ACPI0007 entries
> > > > (and doesn't yet have CPU HP support).
> > > >
> > > > If anyone can add a reference on why this is needed that would be very
> > > > helpful.
> > >
> > > Yes, it would.
> > >
> > > Personally, I would prefer to assume that it is not necessary until it
> > > turns out that (1) there is firmware with this issue actually in use
> > > and (2) updating the firmware in question to follow the specification
> > > is not practical.
> > >
> > > Otherwise, we'd make it easier to ship non-compliant firmware for no
> > > good reason.
> >
> > If Salil can't come up with a reason, then I'm in favour of dropping
> > the patch like already done for patch 2. If the code change serves no
> > useful purpose, there's no point in making the change.
> >
>
> Salil's out today, but I've messaged him to follow up later in the week.
>
> It 'might' be the odd cold plug path where QEMU half comes up, then extra
> CPUs are added, then it boots. (used by some orchestration frameworks)

I poked this on x86 - it only applies with hotplug enabled anyway so
same result as doing the hotplug later - All possible Processor() entries
already exist in DSDT. Hence this isn't the source of the mysterious
broken configuration.

If anyone does poke this path, the old discussion between James
and Salil provides some instructions (mostly the thread is about
another issue).
https://op-lists.linaro.org/archives/list/[email protected]/thread/DNAGB2FB5ALVLV2BYWYOCLKGNF77PNXS/

Also on x86 a test involving -smp 2,maxcpus=4 and adding cpu-id 3
(so skipping 2) doesn't boot. (This is without Salil's QEMU patches.)
I guess there are some well known rules in there that I don't know about
and QEMU isn't preventing people shooting themselves in the foot.

As far as I'm concerned, drop this patch.
If there are platforms out there doing this wrong they'll surface once
we get this into more test farms (so linux-next). If we need this
'fix' we can apply it when we have a problem firmware to point at.

Thanks,

Jonathan

> I don't have a set up for that and I won't get to creating one today anyway
> (we all love start of the year planning workshops!)

>
> I've +CC'd a few people have run tests on the various iterations of this
> work in the past. Maybe one of them can shed some light on this?
>
> Jonathan


2024-01-29 15:06:10

by Rafael J. Wysocki

Subject: Re: [PATCH RFC v3 01/21] ACPI: Only enumerate enabled (or functional) devices

On Mon, Jan 29, 2024 at 3:55 PM Russell King (Oracle)
<[email protected]> wrote:
>
> Hi Jonathan,
>
> On Fri, Jan 12, 2024 at 11:52:05AM +0000, Jonathan Cameron wrote:
> > On Thu, 11 Jan 2024 10:26:15 +0000
> > "Russell King (Oracle)" <[email protected]> wrote:
> > > @@ -2381,16 +2388,38 @@ EXPORT_SYMBOL_GPL(acpi_dev_clear_dependencies);
> > > * acpi_dev_ready_for_enumeration - Check if the ACPI device is ready for enumeration
> > > * @device: Pointer to the &struct acpi_device to check
> > > *
> > > - * Check if the device is present and has no unmet dependencies.
> > > + * Check if the device is functional or enabled and has no unmet dependencies.
> > > *
> > > - * Return true if the device is ready for enumeratino. Otherwise, return false.
> > > + * Return true if the device is ready for enumeration. Otherwise, return false.
> > > */
> > > bool acpi_dev_ready_for_enumeration(const struct acpi_device *device)
> > > {
> > > if (device->flags.honor_deps && device->dep_unmet)
> > > return false;
> > >
> > > - return acpi_device_is_present(device);
> > > + /*
> > > + * ACPI 6.5's 6.3.7 "_STA (Device Status)" allows firmware to return
> > > + * (!present && functional) for certain types of devices that should be
> > > + * enumerated. Note that the enabled bit should not be set unless the
> > > + * present bit is set.
> > > + *
> > > + * However, limit this only to processor devices to reduce possible
> > > + * regressions with firmware.
> > > + */
> > > + if (device->status.functional)
> > > + return true;
>
> I have a report from within Oracle that this causes testing failures
> with QEMU using -smp cpus=2,maxcpus=4. I think it needs to be:
>
> if (!device->status.present)
> return device->status.functional;
>
> if (device->status.enabled)
> return true;
>
> return !acpi_device_is_processor(device);

The above is fine by me.

> So we can better understand the history here, let's list it as a
> truth table. P=present, F=functional, E=enabled, Orig=how the code
> is in mainline, James=James' original proposal, Rafael=the proposed
> replacement but seems to be buggy, Rmk=the fixed version that passes
> tests:
>
> P F E   Orig   James   Rafael       Rmk
> 0 0 0   0      0       0            0
> 0 0 1   0      0       0            0
> 0 1 0   1      1       1            1
> 0 1 1   1      0       1            1
> 1 0 0   1      0       !processor   !processor
> 1 0 1   1      1       1            1
> 1 1 0   1      0       1            !processor
> 1 1 1   1      1       1            1
>
> Any objections to this?

So AFAIAC it can return false if not enabled, but present and
functional. [Side note: I'm wondering what "functional" means then,
but whatever.]
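
For reference, the mainline check and the replacement quoted above can be
modelled outside the kernel as plain C. This is only a minimal sketch, not
kernel code: struct sta_bits, ready_orig(), ready_rmk() and the self-check
helper are hypothetical names, is_processor stands in for the result of
acpi_device_is_processor(), and the honor_deps/dep_unmet test is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the _STA bits the kernel keeps in device->status. */
struct sta_bits {
	bool present;
	bool functional;
	bool enabled;
};

/* "Orig" column: acpi_device_is_present() in mainline, i.e. P || F. */
static inline bool ready_orig(struct sta_bits s)
{
	return s.present || s.functional;
}

/* "Rmk" column: the replacement quoted above. */
static inline bool ready_rmk(struct sta_bits s, bool is_processor)
{
	if (!s.present)
		return s.functional;

	if (s.enabled)
		return true;

	return !is_processor;
}

/* Walk all eight P/F/E rows and compare with the Rmk column for a processor. */
static inline void check_rmk_column_for_processor(void)
{
	/* Rows in table order 000..111; "!processor" is 0 for a processor. */
	static const bool expected[8] = { 0, 0, 1, 1, 0, 1, 0, 1 };

	for (unsigned int row = 0; row < 8; row++) {
		struct sta_bits s = {
			.present = row & 4,
			.functional = row & 2,
			.enabled = row & 1,
		};

		assert(ready_rmk(s, true) == expected[row]);
	}
}
```

Calling check_rmk_column_for_processor() from a small test program is one
way to confirm that the three-branch form reproduces the Rmk column of the
table for processor devices.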

2024-01-29 15:35:23

by Rafael J. Wysocki

Subject: Re: [PATCH RFC v3 01/21] ACPI: Only enumerate enabled (or functional) devices

On Mon, Jan 29, 2024 at 4:17 PM Russell King (Oracle)
<[email protected]> wrote:
>
> On Mon, Jan 29, 2024 at 04:05:42PM +0100, Rafael J. Wysocki wrote:
> > On Mon, Jan 29, 2024 at 3:55 PM Russell King (Oracle)
> > <[email protected]> wrote:
> > >
> > > Hi Jonathan,
> > >
> > > On Fri, Jan 12, 2024 at 11:52:05AM +0000, Jonathan Cameron wrote:
> > > > On Thu, 11 Jan 2024 10:26:15 +0000
> > > > "Russell King (Oracle)" <[email protected]> wrote:
> > > > > @@ -2381,16 +2388,38 @@ EXPORT_SYMBOL_GPL(acpi_dev_clear_dependencies);
> > > > > * acpi_dev_ready_for_enumeration - Check if the ACPI device is ready for enumeration
> > > > > * @device: Pointer to the &struct acpi_device to check
> > > > > *
> > > > > - * Check if the device is present and has no unmet dependencies.
> > > > > + * Check if the device is functional or enabled and has no unmet dependencies.
> > > > > *
> > > > > - * Return true if the device is ready for enumeratino. Otherwise, return false.
> > > > > + * Return true if the device is ready for enumeration. Otherwise, return false.
> > > > > */
> > > > > bool acpi_dev_ready_for_enumeration(const struct acpi_device *device)
> > > > > {
> > > > > if (device->flags.honor_deps && device->dep_unmet)
> > > > > return false;
> > > > >
> > > > > - return acpi_device_is_present(device);
> > > > > + /*
> > > > > + * ACPI 6.5's 6.3.7 "_STA (Device Status)" allows firmware to return
> > > > > + * (!present && functional) for certain types of devices that should be
> > > > > + * enumerated. Note that the enabled bit should not be set unless the
> > > > > + * present bit is set.
> > > > > + *
> > > > > + * However, limit this only to processor devices to reduce possible
> > > > > + * regressions with firmware.
> > > > > + */
> > > > > + if (device->status.functional)
> > > > > + return true;
> > >
> > > I have a report from within Oracle that this causes testing failures
> > > with QEMU using -smp cpus=2,maxcpus=4. I think it needs to be:
> > >
> > > if (!device->status.present)
> > > return device->status.functional;
> > >
> > > if (device->status.enabled)
> > > return true;
> > >
> > > return !acpi_device_is_processor(device);
> >
> > The above is fine by me.
> >
> > > So we can better understand the history here, let's list it as a
> > > truth table. P=present, F=functional, E=enabled, Orig=how the code
> > > is in mainline, James=James' original proposal, Rafael=the proposed
> > > replacement but seems to be buggy, Rmk=the fixed version that passes
> > > tests:
> > >
> > > P F E   Orig   James   Rafael       Rmk
> > > 0 0 0   0      0       0            0
> > > 0 0 1   0      0       0            0
> > > 0 1 0   1      1       1            1
> > > 0 1 1   1      0       1            1
> > > 1 0 0   1      0       !processor   !processor
> > > 1 0 1   1      1       1            1
> > > 1 1 0   1      0       1            !processor
> > > 1 1 1   1      1       1            1
> > >
> > > Any objections to this?
> >
> > So AFAIAC it can return false if not enabled, but present and
> > functional. [Side note: I'm wondering what "functional" means then,
> > but whatever.]
>
> From ACPI v6.5 (bit 3 is our "status.functional"):
>
> _STA may return bit 0 clear (not present) with bit [3] set (device is
> functional). This case is used to indicate a valid device for which no
> device driver should be loaded (for example, a bridge device.) Children
> of this device may be present and valid. OSPM should continue
> enumeration below a device whose _STA returns this bit combination.
>
> So, for this case, acpi_dev_ready_for_enumeration() returning true is
> correct, since we're supposed to enumerate it and child devices.
>
> It's probably also worth pointing out that in the above table, the two
> combinations with P=0 E=1 go against the spec, but are included for
> completeness.

The difference between the last two columns is the present and
functional, but not enabled combination AFAICS, for which my patch
just returned true, but the firmware disagrees with that.

It is kind of analogous to the "not present and functional" case
covered by the spec, which is why it is fine by me to return "false"
then (for processors), but the spec is not crystal clear about it.
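
The single differing row can be checked mechanically against raw _STA
values. The sketch below is illustrative only: the STA_* macro names and
all function names are hypothetical, and ready_rafael() is reconstructed
from the "Rafael" column of the truth table rather than from the patch
itself, which is only partly visible in the quoted text. The bit positions
(bit 0 present, bit 1 enabled, bit 3 functional) are those of the ACPI _STA
definition.

```c
#include <stdbool.h>
#include <stdint.h>

/* _STA result bits; the macro names here are hypothetical. */
#define STA_PRESENT	(1u << 0)
#define STA_ENABLED	(1u << 1)
#define STA_FUNCTIONAL	(1u << 3)

/* "Rafael" column, reconstructed from the truth table (assumption). */
static inline bool ready_rafael(uint32_t sta, bool is_processor)
{
	if (sta & STA_FUNCTIONAL)
		return true;
	if (!(sta & STA_PRESENT))
		return false;
	if (sta & STA_ENABLED)
		return true;
	return !is_processor;
}

/* "Rmk" column, as quoted in the thread. */
static inline bool ready_rmk(uint32_t sta, bool is_processor)
{
	if (!(sta & STA_PRESENT))
		return !!(sta & STA_FUNCTIONAL);
	if (sta & STA_ENABLED)
		return true;
	return !is_processor;
}

/*
 * Count the P/F/E combinations on which the two variants disagree.
 * The first disagreeing _STA value, if any, is stored in *first.
 */
static inline unsigned int count_differences(bool is_processor, uint32_t *first)
{
	unsigned int count = 0;

	for (unsigned int row = 0; row < 8; row++) {
		uint32_t sta = 0;

		if (row & 4)
			sta |= STA_PRESENT;
		if (row & 2)
			sta |= STA_FUNCTIONAL;
		if (row & 1)
			sta |= STA_ENABLED;

		if (ready_rafael(sta, is_processor) != ready_rmk(sta, is_processor)) {
			if (count == 0)
				*first = sta;
			count++;
		}
	}

	return count;
}
```

Under these assumptions the two variants disagree only for a processor
device that is present and functional but not enabled, and agree on every
combination for non-processor devices.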