2013-07-19 17:48:48

by Toshi Kani

[permalink] [raw]
Subject: [PATCH v2] mm/hotplug, x86: Disable ARCH_MEMORY_PROBE by default

CONFIG_ARCH_MEMORY_PROBE enables /sys/devices/system/memory/probe
interface, which allows a given memory address to be hot-added as
follows. (See Documentation/memory-hotplug.txt for more detail.)

# echo start_address_of_new_memory > /sys/devices/system/memory/probe

This probe interface is required on powerpc. On x86, however, ACPI
notifies a memory hotplug event to the kernel, which performs its
hotplug operation as the result. Therefore, regular users do not need
this interface on x86. This probe interface is also error-prone and
misleading that the kernel blindly adds a given memory address without
checking if the memory is present on the system; no probing is done
despite of its name. The kernel crashes when a user requests to online
a memory block that is not present on the system. This interface is
currently used for testing as it can fake a hotplug event.

This patch disables CONFIG_ARCH_MEMORY_PROBE by default on x86, adds
its Kconfig menu entry on x86, and clarifies its use in Documentation/
memory-hotplug.txt.

Signed-off-by: Toshi Kani <[email protected]>
---
Documentation/memory-hotplug.txt | 16 +++++++++-------
arch/x86/Kconfig | 7 ++++++-
2 files changed, 15 insertions(+), 8 deletions(-)

diff --git a/Documentation/memory-hotplug.txt b/Documentation/memory-hotplug.txt
index 8e5eacb..8fd254c 100644
--- a/Documentation/memory-hotplug.txt
+++ b/Documentation/memory-hotplug.txt
@@ -210,13 +210,15 @@ If memory device is found, memory hotplug code will be called.

4.2 Notify memory hot-add event by hand
------------
-In some environments, especially virtualized environment, firmware will not
-notify memory hotplug event to the kernel. For such environment, "probe"
-interface is supported. This interface depends on CONFIG_ARCH_MEMORY_PROBE.
-
-Now, CONFIG_ARCH_MEMORY_PROBE is supported only by powerpc but it does not
-contain highly architecture codes. Please add config if you need "probe"
-interface.
+On powerpc, the firmware does not notify a memory hotplug event to the kernel.
+Therefore, "probe" interface is supported to notify the event to the kernel.
+This interface depends on CONFIG_ARCH_MEMORY_PROBE.
+
+CONFIG_ARCH_MEMORY_PROBE is supported on powerpc only. On x86, this config
+option is disabled by default since ACPI notifies a memory hotplug event to
+the kernel, which performs its hotplug operation as the result. Please
+enable this option if you need the "probe" interface for testing purposes
+on x86.

Probe interface is located at
/sys/devices/system/memory/probe
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index b32ebf9..408ef68 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1344,8 +1344,13 @@ config ARCH_SELECT_MEMORY_MODEL
depends on ARCH_SPARSEMEM_ENABLE

config ARCH_MEMORY_PROBE
- def_bool y
+ bool "Enable sysfs memory/probe interface"
+ default n
depends on X86_64 && MEMORY_HOTPLUG
+ help
+ This option enables a sysfs memory/probe interface for testing.
+ See Documentation/memory-hotplug.txt for more information.
+ If you are unsure how to answer this question, answer N.

config ARCH_PROC_KCORE_TEXT
def_bool y


2013-07-19 19:30:30

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [PATCH v2] mm/hotplug, x86: Disable ARCH_MEMORY_PROBE by default

> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index b32ebf9..408ef68 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -1344,8 +1344,13 @@ config ARCH_SELECT_MEMORY_MODEL
> depends on ARCH_SPARSEMEM_ENABLE
>
> config ARCH_MEMORY_PROBE
> - def_bool y
> + bool "Enable sysfs memory/probe interface"
> + default n
> depends on X86_64 && MEMORY_HOTPLUG
> + help
> + This option enables a sysfs memory/probe interface for testing.
> + See Documentation/memory-hotplug.txt for more information.
> + If you are unsure how to answer this question, answer N.

makes sense.

Acked-by: KOSAKI Motohiro <[email protected]>



2013-07-19 19:35:56

by Toshi Kani

[permalink] [raw]
Subject: Re: [PATCH v2] mm/hotplug, x86: Disable ARCH_MEMORY_PROBE by default

On Fri, 2013-07-19 at 15:30 -0400, KOSAKI Motohiro wrote:
> > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> > index b32ebf9..408ef68 100644
> > --- a/arch/x86/Kconfig
> > +++ b/arch/x86/Kconfig
> > @@ -1344,8 +1344,13 @@ config ARCH_SELECT_MEMORY_MODEL
> > depends on ARCH_SPARSEMEM_ENABLE
> >
> > config ARCH_MEMORY_PROBE
> > - def_bool y
> > + bool "Enable sysfs memory/probe interface"
> > + default n
> > depends on X86_64 && MEMORY_HOTPLUG
> > + help
> > + This option enables a sysfs memory/probe interface for testing.
> > + See Documentation/memory-hotplug.txt for more information.
> > + If you are unsure how to answer this question, answer N.
>
> makes sense.
>
> Acked-by: KOSAKI Motohiro <[email protected]>

Great! Thanks Motohiro!
-Toshi

2013-07-22 08:37:30

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH v2] mm/hotplug, x86: Disable ARCH_MEMORY_PROBE by default


* Toshi Kani <[email protected]> wrote:

> CONFIG_ARCH_MEMORY_PROBE enables /sys/devices/system/memory/probe
> interface, which allows a given memory address to be hot-added as
> follows. (See Documentation/memory-hotplug.txt for more detail.)
>
> # echo start_address_of_new_memory > /sys/devices/system/memory/probe
>
> This probe interface is required on powerpc. On x86, however, ACPI
> notifies a memory hotplug event to the kernel, which performs its
> hotplug operation as the result. Therefore, regular users do not need
> this interface on x86. This probe interface is also error-prone and
> misleading that the kernel blindly adds a given memory address without
> checking if the memory is present on the system; no probing is done
> despite of its name. The kernel crashes when a user requests to online
> a memory block that is not present on the system. This interface is
> currently used for testing as it can fake a hotplug event.
>
> This patch disables CONFIG_ARCH_MEMORY_PROBE by default on x86, adds
> its Kconfig menu entry on x86, and clarifies its use in Documentation/
> memory-hotplug.txt.

Could we please also fix it to never crash the kernel, even if stupid
ranges are provided?

Thanks,

Ingo

2013-07-22 17:12:57

by Toshi Kani

[permalink] [raw]
Subject: Re: [PATCH v2] mm/hotplug, x86: Disable ARCH_MEMORY_PROBE by default

On Mon, 2013-07-22 at 10:37 +0200, Ingo Molnar wrote:
> * Toshi Kani <[email protected]> wrote:
>
> > CONFIG_ARCH_MEMORY_PROBE enables /sys/devices/system/memory/probe
> > interface, which allows a given memory address to be hot-added as
> > follows. (See Documentation/memory-hotplug.txt for more detail.)
> >
> > # echo start_address_of_new_memory > /sys/devices/system/memory/probe
> >
> > This probe interface is required on powerpc. On x86, however, ACPI
> > notifies a memory hotplug event to the kernel, which performs its
> > hotplug operation as the result. Therefore, regular users do not need
> > this interface on x86. This probe interface is also error-prone and
> > misleading that the kernel blindly adds a given memory address without
> > checking if the memory is present on the system; no probing is done
> > despite of its name. The kernel crashes when a user requests to online
> > a memory block that is not present on the system. This interface is
> > currently used for testing as it can fake a hotplug event.
> >
> > This patch disables CONFIG_ARCH_MEMORY_PROBE by default on x86, adds
> > its Kconfig menu entry on x86, and clarifies its use in Documentation/
> > memory-hotplug.txt.
>
> Could we please also fix it to never crash the kernel, even if stupid
> ranges are provided?

Yes, this probe interface can be enhanced to verify the firmware
information before adding a given memory address. However, such change
would interfere its test use of "fake" hotplug, which is only the known
use-case of this interface on x86.

In order to verify if a given memory address is enabled at run-time (as
opposed to boot-time), we need to check with ACPI memory device objects
on x86. However, system vendors tend to not implement memory device
objects unless their systems support memory hotplug. Dave Hansen is
using this interface for his testing as a way to fake a hotplug event on
a system that does not support memory hotplug.

Thanks,
-Toshi





2013-07-22 20:57:30

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [PATCH v2] mm/hotplug, x86: Disable ARCH_MEMORY_PROBE by default

(7/22/13 1:12 PM), Toshi Kani wrote:
> On Mon, 2013-07-22 at 10:37 +0200, Ingo Molnar wrote:
>> * Toshi Kani <[email protected]> wrote:
>>
>>> CONFIG_ARCH_MEMORY_PROBE enables /sys/devices/system/memory/probe
>>> interface, which allows a given memory address to be hot-added as
>>> follows. (See Documentation/memory-hotplug.txt for more detail.)
>>>
>>> # echo start_address_of_new_memory > /sys/devices/system/memory/probe
>>>
>>> This probe interface is required on powerpc. On x86, however, ACPI
>>> notifies a memory hotplug event to the kernel, which performs its
>>> hotplug operation as the result. Therefore, regular users do not need
>>> this interface on x86. This probe interface is also error-prone and
>>> misleading that the kernel blindly adds a given memory address without
>>> checking if the memory is present on the system; no probing is done
>>> despite of its name. The kernel crashes when a user requests to online
>>> a memory block that is not present on the system. This interface is
>>> currently used for testing as it can fake a hotplug event.
>>>
>>> This patch disables CONFIG_ARCH_MEMORY_PROBE by default on x86, adds
>>> its Kconfig menu entry on x86, and clarifies its use in Documentation/
>>> memory-hotplug.txt.
>>
>> Could we please also fix it to never crash the kernel, even if stupid
>> ranges are provided?
>
> Yes, this probe interface can be enhanced to verify the firmware
> information before adding a given memory address. However, such change
> would interfere its test use of "fake" hotplug, which is only the known
> use-case of this interface on x86.
>
> In order to verify if a given memory address is enabled at run-time (as
> opposed to boot-time), we need to check with ACPI memory device objects
> on x86. However, system vendors tend to not implement memory device
> objects unless their systems support memory hotplug. Dave Hansen is
> using this interface for his testing as a way to fake a hotplug event on
> a system that does not support memory hotplug.

One of possible option is to return EINVAL when system has real hotplug device.
I mean this interface is only useful when system don't have proper hardware
feature and doesn't work correctly hardware property and this interface command
are not consistent.

Dave, What do you think?

2013-07-22 21:04:26

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH v2] mm/hotplug, x86: Disable ARCH_MEMORY_PROBE by default

On 07/22/2013 01:57 PM, KOSAKI Motohiro wrote:
>
> One of possible option is to return EINVAL when system has real hotplug
> device.
> I mean this interface is only useful when system don't have proper hardware
> feature and doesn't work correctly hardware property and this interface
> command
> are not consistent.
>
> Dave, What do you think?

Sounds reasonable to me.

2013-07-23 00:25:37

by Yasuaki Ishimatsu

[permalink] [raw]
Subject: Re: [PATCH v2] mm/hotplug, x86: Disable ARCH_MEMORY_PROBE by default

(2013/07/20 2:47), Toshi Kani wrote:
> CONFIG_ARCH_MEMORY_PROBE enables /sys/devices/system/memory/probe
> interface, which allows a given memory address to be hot-added as
> follows. (See Documentation/memory-hotplug.txt for more detail.)
>
> # echo start_address_of_new_memory > /sys/devices/system/memory/probe
>
> This probe interface is required on powerpc. On x86, however, ACPI
> notifies a memory hotplug event to the kernel, which performs its
> hotplug operation as the result. Therefore, regular users do not need
> this interface on x86. This probe interface is also error-prone and
> misleading that the kernel blindly adds a given memory address without
> checking if the memory is present on the system; no probing is done
> despite of its name. The kernel crashes when a user requests to online
> a memory block that is not present on the system. This interface is
> currently used for testing as it can fake a hotplug event.
>
> This patch disables CONFIG_ARCH_MEMORY_PROBE by default on x86, adds
> its Kconfig menu entry on x86, and clarifies its use in Documentation/
> memory-hotplug.txt.
>
> Signed-off-by: Toshi Kani <[email protected]>

Reviewed-by: Yasuaki Ishimatsu <[email protected]>

Thanks,
Yasuaki Ishimatsu

> ---
> Documentation/memory-hotplug.txt | 16 +++++++++-------
> arch/x86/Kconfig | 7 ++++++-
> 2 files changed, 15 insertions(+), 8 deletions(-)
>
> diff --git a/Documentation/memory-hotplug.txt b/Documentation/memory-hotplug.txt
> index 8e5eacb..8fd254c 100644
> --- a/Documentation/memory-hotplug.txt
> +++ b/Documentation/memory-hotplug.txt
> @@ -210,13 +210,15 @@ If memory device is found, memory hotplug code will be called.
>
> 4.2 Notify memory hot-add event by hand
> ------------
> -In some environments, especially virtualized environment, firmware will not
> -notify memory hotplug event to the kernel. For such environment, "probe"
> -interface is supported. This interface depends on CONFIG_ARCH_MEMORY_PROBE.
> -
> -Now, CONFIG_ARCH_MEMORY_PROBE is supported only by powerpc but it does not
> -contain highly architecture codes. Please add config if you need "probe"
> -interface.
> +On powerpc, the firmware does not notify a memory hotplug event to the kernel.
> +Therefore, "probe" interface is supported to notify the event to the kernel.
> +This interface depends on CONFIG_ARCH_MEMORY_PROBE.
> +
> +CONFIG_ARCH_MEMORY_PROBE is supported on powerpc only. On x86, this config
> +option is disabled by default since ACPI notifies a memory hotplug event to
> +the kernel, which performs its hotplug operation as the result. Please
> +enable this option if you need the "probe" interface for testing purposes
> +on x86.
>
> Probe interface is located at
> /sys/devices/system/memory/probe
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index b32ebf9..408ef68 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -1344,8 +1344,13 @@ config ARCH_SELECT_MEMORY_MODEL
> depends on ARCH_SPARSEMEM_ENABLE
>
> config ARCH_MEMORY_PROBE
> - def_bool y
> + bool "Enable sysfs memory/probe interface"
> + default n
> depends on X86_64 && MEMORY_HOTPLUG
> + help
> + This option enables a sysfs memory/probe interface for testing.
> + See Documentation/memory-hotplug.txt for more information.
> + If you are unsure how to answer this question, answer N.
>
> config ARCH_PROC_KCORE_TEXT
> def_bool y
>

2013-07-23 00:35:36

by Toshi Kani

[permalink] [raw]
Subject: Re: [PATCH v2] mm/hotplug, x86: Disable ARCH_MEMORY_PROBE by default

On Mon, 2013-07-22 at 16:57 -0400, KOSAKI Motohiro wrote:
> (7/22/13 1:12 PM), Toshi Kani wrote:
> > On Mon, 2013-07-22 at 10:37 +0200, Ingo Molnar wrote:
> >> * Toshi Kani <[email protected]> wrote:
> >>
> >>> CONFIG_ARCH_MEMORY_PROBE enables /sys/devices/system/memory/probe
> >>> interface, which allows a given memory address to be hot-added as
> >>> follows. (See Documentation/memory-hotplug.txt for more detail.)
> >>>
> >>> # echo start_address_of_new_memory > /sys/devices/system/memory/probe
> >>>
> >>> This probe interface is required on powerpc. On x86, however, ACPI
> >>> notifies a memory hotplug event to the kernel, which performs its
> >>> hotplug operation as the result. Therefore, regular users do not need
> >>> this interface on x86. This probe interface is also error-prone and
> >>> misleading that the kernel blindly adds a given memory address without
> >>> checking if the memory is present on the system; no probing is done
> >>> despite of its name. The kernel crashes when a user requests to online
> >>> a memory block that is not present on the system. This interface is
> >>> currently used for testing as it can fake a hotplug event.
> >>>
> >>> This patch disables CONFIG_ARCH_MEMORY_PROBE by default on x86, adds
> >>> its Kconfig menu entry on x86, and clarifies its use in Documentation/
> >>> memory-hotplug.txt.
> >>
> >> Could we please also fix it to never crash the kernel, even if stupid
> >> ranges are provided?
> >
> > Yes, this probe interface can be enhanced to verify the firmware
> > information before adding a given memory address. However, such change
> > would interfere its test use of "fake" hotplug, which is only the known
> > use-case of this interface on x86.
> >
> > In order to verify if a given memory address is enabled at run-time (as
> > opposed to boot-time), we need to check with ACPI memory device objects
> > on x86. However, system vendors tend to not implement memory device
> > objects unless their systems support memory hotplug. Dave Hansen is
> > using this interface for his testing as a way to fake a hotplug event on
> > a system that does not support memory hotplug.
>
> One of possible option is to return EINVAL when system has real hotplug device.
> I mean this interface is only useful when system don't have proper hardware
> feature and doesn't work correctly hardware property and this interface command
> are not consistent.

Thanks for the suggestion. Unfortunately, such check cannot be added
without some ugliness. There is no capability bit in ACPI that
indicates if the platform supports memory hotplug. Let's assume we can
consider that if ACPI has no memory object, then it has no memory
hotplug support. We still need to distinguish this case from a case
that ACPI has a memory device object but it is disabled (hence, it can
be added later). However, the two cases are basically handled in the
same way that neither cases call the ACPI memory handlers in
drivers/acpi/acpi_memhotplug.c, where memory-specific code can be added.
Therefore, it will require adding memory device-specific code into the
ACPI scan code itself, which is supposed to be device type-neutral. So,
it will be hard to sell without a very compelling reason...

I think a long term solution should be to remove this config option from
x86 Kconfig as you suggested before. For now, we can disable it by
default and document as "test-use" only, which this patch did.
-Toshi

2013-07-23 00:46:32

by Toshi Kani

[permalink] [raw]
Subject: Re: [PATCH v2] mm/hotplug, x86: Disable ARCH_MEMORY_PROBE by default

On Tue, 2013-07-23 at 09:24 +0900, Yasuaki Ishimatsu wrote:
> (2013/07/20 2:47), Toshi Kani wrote:
> > CONFIG_ARCH_MEMORY_PROBE enables /sys/devices/system/memory/probe
> > interface, which allows a given memory address to be hot-added as
> > follows. (See Documentation/memory-hotplug.txt for more detail.)
> >
> > # echo start_address_of_new_memory > /sys/devices/system/memory/probe
> >
> > This probe interface is required on powerpc. On x86, however, ACPI
> > notifies a memory hotplug event to the kernel, which performs its
> > hotplug operation as the result. Therefore, regular users do not need
> > this interface on x86. This probe interface is also error-prone and
> > misleading that the kernel blindly adds a given memory address without
> > checking if the memory is present on the system; no probing is done
> > despite of its name. The kernel crashes when a user requests to online
> > a memory block that is not present on the system. This interface is
> > currently used for testing as it can fake a hotplug event.
> >
> > This patch disables CONFIG_ARCH_MEMORY_PROBE by default on x86, adds
> > its Kconfig menu entry on x86, and clarifies its use in Documentation/
> > memory-hotplug.txt.
> >
> > Signed-off-by: Toshi Kani <[email protected]>
>
> Reviewed-by: Yasuaki Ishimatsu <[email protected]>

Great! Thanks Yasuaki!
-Toshi


>
> Thanks,
> Yasuaki Ishimatsu
>
> > ---
> > Documentation/memory-hotplug.txt | 16 +++++++++-------
> > arch/x86/Kconfig | 7 ++++++-
> > 2 files changed, 15 insertions(+), 8 deletions(-)
> >
> > diff --git a/Documentation/memory-hotplug.txt b/Documentation/memory-hotplug.txt
> > index 8e5eacb..8fd254c 100644
> > --- a/Documentation/memory-hotplug.txt
> > +++ b/Documentation/memory-hotplug.txt
> > @@ -210,13 +210,15 @@ If memory device is found, memory hotplug code will be called.
> >
> > 4.2 Notify memory hot-add event by hand
> > ------------
> > -In some environments, especially virtualized environment, firmware will not
> > -notify memory hotplug event to the kernel. For such environment, "probe"
> > -interface is supported. This interface depends on CONFIG_ARCH_MEMORY_PROBE.
> > -
> > -Now, CONFIG_ARCH_MEMORY_PROBE is supported only by powerpc but it does not
> > -contain highly architecture codes. Please add config if you need "probe"
> > -interface.
> > +On powerpc, the firmware does not notify a memory hotplug event to the kernel.
> > +Therefore, "probe" interface is supported to notify the event to the kernel.
> > +This interface depends on CONFIG_ARCH_MEMORY_PROBE.
> > +
> > +CONFIG_ARCH_MEMORY_PROBE is supported on powerpc only. On x86, this config
> > +option is disabled by default since ACPI notifies a memory hotplug event to
> > +the kernel, which performs its hotplug operation as the result. Please
> > +enable this option if you need the "probe" interface for testing purposes
> > +on x86.
> >
> > Probe interface is located at
> > /sys/devices/system/memory/probe
> > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> > index b32ebf9..408ef68 100644
> > --- a/arch/x86/Kconfig
> > +++ b/arch/x86/Kconfig
> > @@ -1344,8 +1344,13 @@ config ARCH_SELECT_MEMORY_MODEL
> > depends on ARCH_SPARSEMEM_ENABLE
> >
> > config ARCH_MEMORY_PROBE
> > - def_bool y
> > + bool "Enable sysfs memory/probe interface"
> > + default n
> > depends on X86_64 && MEMORY_HOTPLUG
> > + help
> > + This option enables a sysfs memory/probe interface for testing.
> > + See Documentation/memory-hotplug.txt for more information.
> > + If you are unsure how to answer this question, answer N.
> >
> > config ARCH_PROC_KCORE_TEXT
> > def_bool y
> >
>
>

Subject: [tip:x86/mm] mm/hotplug, x86: Disable ARCH_MEMORY_PROBE by default

Commit-ID: a0842b704b0f7c2d925473676f2012ea5dc39976
Gitweb: http://git.kernel.org/tip/a0842b704b0f7c2d925473676f2012ea5dc39976
Author: Toshi Kani <[email protected]>
AuthorDate: Fri, 19 Jul 2013 11:47:48 -0600
Committer: Ingo Molnar <[email protected]>
CommitDate: Mon, 22 Jul 2013 11:13:34 +0200

mm/hotplug, x86: Disable ARCH_MEMORY_PROBE by default

CONFIG_ARCH_MEMORY_PROBE enables the
/sys/devices/system/memory/probe interface, which allows a given
memory address to be hot-added as follows:

# echo start_address_of_new_memory > /sys/devices/system/memory/probe

(See Documentation/memory-hotplug.txt for more details.)

This probe interface is required on powerpc. On x86, however,
ACPI notifies a memory hotplug event to the kernel, which
performs its hotplug operation as the result.

Therefore, regular users do not need this interface on x86. This probe
interface is also error-prone and misleading that the kernel blindly
adds a given memory address without checking if the memory is present
on the system; no probing is done despite of its name.

The kernel crashes when a user requests to online a memory block
that is not present on the system. This interface is currently
used for testing as it can fake a hotplug event.

This patch disables CONFIG_ARCH_MEMORY_PROBE by default on x86,
adds its Kconfig menu entry on x86, and clarifies its use in
Documentation/ memory-hotplug.txt.

Signed-off-by: Toshi Kani <[email protected]>
Acked-by: KOSAKI Motohiro <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
[ Edited it slightly. ]
Signed-off-by: Ingo Molnar <[email protected]>
---
Documentation/memory-hotplug.txt | 16 +++++++++-------
arch/x86/Kconfig | 6 +++++-
2 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/Documentation/memory-hotplug.txt b/Documentation/memory-hotplug.txt
index 8e5eacb..8fd254c 100644
--- a/Documentation/memory-hotplug.txt
+++ b/Documentation/memory-hotplug.txt
@@ -210,13 +210,15 @@ If memory device is found, memory hotplug code will be called.

4.2 Notify memory hot-add event by hand
------------
-In some environments, especially virtualized environment, firmware will not
-notify memory hotplug event to the kernel. For such environment, "probe"
-interface is supported. This interface depends on CONFIG_ARCH_MEMORY_PROBE.
-
-Now, CONFIG_ARCH_MEMORY_PROBE is supported only by powerpc but it does not
-contain highly architecture codes. Please add config if you need "probe"
-interface.
+On powerpc, the firmware does not notify a memory hotplug event to the kernel.
+Therefore, "probe" interface is supported to notify the event to the kernel.
+This interface depends on CONFIG_ARCH_MEMORY_PROBE.
+
+CONFIG_ARCH_MEMORY_PROBE is supported on powerpc only. On x86, this config
+option is disabled by default since ACPI notifies a memory hotplug event to
+the kernel, which performs its hotplug operation as the result. Please
+enable this option if you need the "probe" interface for testing purposes
+on x86.

Probe interface is located at
/sys/devices/system/memory/probe
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index b32ebf9..2f9d957 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1344,8 +1344,12 @@ config ARCH_SELECT_MEMORY_MODEL
depends on ARCH_SPARSEMEM_ENABLE

config ARCH_MEMORY_PROBE
- def_bool y
+ bool "Enable sysfs memory/probe interface"
depends on X86_64 && MEMORY_HOTPLUG
+ help
+ This option enables a sysfs memory/probe interface for testing.
+ See Documentation/memory-hotplug.txt for more information.
+ If you are unsure how to answer this question, answer N.

config ARCH_PROC_KCORE_TEXT
def_bool y

2013-07-23 08:01:07

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH v2] mm/hotplug, x86: Disable ARCH_MEMORY_PROBE by default


* Toshi Kani <[email protected]> wrote:

> > Could we please also fix it to never crash the kernel, even if stupid
> > ranges are provided?
>
> Yes, this probe interface can be enhanced to verify the firmware
> information before adding a given memory address. However, such change
> would interfere its test use of "fake" hotplug, which is only the known
> use-case of this interface on x86.

Not crashing the kernel is not a novel concept even for test interfaces...

Where does the possible crash come from - from using invalid RAM ranges,
right? I.e. on x86 to fix the crash we need to check the RAM is present in
the e820 maps, is marked RAM there, and is not already registered with the
kernel, or so?

> In order to verify if a given memory address is enabled at run-time (as
> opposed to boot-time), we need to check with ACPI memory device objects
> on x86. However, system vendors tend to not implement memory device
> objects unless their systems support memory hotplug. Dave Hansen is
> using this interface for his testing as a way to fake a hotplug event on
> a system that does not support memory hotplug.

All vendors implement e820 maps for the memory present at boot time.

How is the testing done by Dave Hansen? If it's done by booting with less
RAM than available (via say the mem=1g boot parameter), and then
hot-adding some of the missing RAM, then this could be made safe via the
e820 maps and by consultig the physical memory maps (to avoid double
registry), right?

How does the hotplug event based approach solve double adds? Relies on the
hardware not sending a hot-add event twice for the same memory area or for
an invalid memory area, or does it include fail-safes and double checks as
well to avoid double adds and adding invalid memory? If yes then that
could be utilized here as well.

I.e. fragility of an interface is our choice, not some natural given
property.

Thanks,

Ingo

2013-07-23 20:46:00

by Toshi Kani

[permalink] [raw]
Subject: Re: [PATCH v2] mm/hotplug, x86: Disable ARCH_MEMORY_PROBE by default

On Tue, 2013-07-23 at 10:01 +0200, Ingo Molnar wrote:
> * Toshi Kani <[email protected]> wrote:
>
> > > Could we please also fix it to never crash the kernel, even if stupid
> > > ranges are provided?
> >
> > Yes, this probe interface can be enhanced to verify the firmware
> > information before adding a given memory address. However, such change
> > would interfere its test use of "fake" hotplug, which is only the known
> > use-case of this interface on x86.
>
> Not crashing the kernel is not a novel concept even for test interfaces...

Agreed.

> Where does the possible crash come from - from using invalid RAM ranges,
> right? I.e. on x86 to fix the crash we need to check the RAM is present in
> the e820 maps, is marked RAM there, and is not already registered with the
> kernel, or so?

Yes, the crash comes from using invalid RAM ranges. How to check if the
RAM is present is different if the system supports hotplug or not.

> > In order to verify if a given memory address is enabled at run-time (as
> > opposed to boot-time), we need to check with ACPI memory device objects
> > on x86. However, system vendors tend to not implement memory device
> > objects unless their systems support memory hotplug. Dave Hansen is
> > using this interface for his testing as a way to fake a hotplug event on
> > a system that does not support memory hotplug.
>
> All vendors implement e820 maps for the memory present at boot time.

Yes for boot time. At run-time, e820 is not guaranteed to represent a
new memory added. Here is a quote from ACPI spec.

===
15.1 INT 15H, E820H - Query System Address Map
:
The memory map conveyed by this interface is not required to reflect any
changes in available physical memory that have occurred after the BIOS
has initially passed control to the operating system. For example, if
memory is added dynamically, this interface is not required to reflect
the new system memory configuration.
===

By definition, the "probe" interface is used for the kernel to recognize
a new memory added at run-time. So, it should check ACPI memory device
objects (which represents run-time state) for the verification. On x86,
however, ACPI also sends a hotplug event to the kernel, which triggers
the kernel to recognize the new physical memory properly. Hence, users
do not need this "probe" interface.

> How is the testing done by Dave Hansen? If it's done by booting with less
> RAM than available (via say the mem=1g boot parameter), and then
> hot-adding some of the missing RAM, then this could be made safe via the
> e820 maps and by consultig the physical memory maps (to avoid double
> registry), right?

If we focus on this test scenario on a system that does not support
hotplug, yes, I agree that we can check with e820 since it is safe to
assume that the system has no change after boot. IOW, it is unsafe to
check with e820 if the system supports hotplug, but there is no use in
this interface for testing if the system supports hotplug. So, this may
be a good idea.

Dave, is this how you are testing? Do you always specify a valid memory
address for your testing?

> How does the hotplug event based approach solve double adds? Relies on the
> hardware not sending a hot-add event twice for the same memory area or for
> an invalid memory area, or does it include fail-safes and double checks as
> well to avoid double adds and adding invalid memory? If yes then that
> could be utilized here as well.

In high-level, here is how ACPI memory hotplug works:

1. ACPI sends a hotplug event to a new ACPI memory device object that is
hot-added.
2. The kernel is notified, and verifies if the new memory device object
has not been attached by any handler yet.
3. The memory handler is called, and obtains a new memory range from the
ACPI memory device object.
4. The memory handler calls add_memory() with the new address range.

The above step 1-4 proceeds automatically within the kernel. No user
input (nor sysfs interface) is necessary. Step 2 prevents double adds
and step 3 gets a valid address range from the firmware directly. Step
4 is basically the same as the "probe" interface, but with all the
verification up front, this step is safe.

Thanks,
-Toshi

2013-07-23 20:59:23

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH v2] mm/hotplug, x86: Disable ARCH_MEMORY_PROBE by default

On 07/23/2013 01:45 PM, Toshi Kani wrote:
> Dave, is this how you are testing? Do you always specify a valid memory
> address for your testing?

For the moment, yes.

I'm actually working on some other patches that add the kernel metadata
for memory ranges even if they're not backed by physical memory. But
_that_ is just for testing and I'll just have to modify whatever you do
here in those patches anyway.

It sounds like you're pretty confident that it has no users, so why
don't you just go ahead and axe it on x86 and config it out completely?
Folks that need it can just hack it back in.

2013-07-23 21:35:42

by Toshi Kani

[permalink] [raw]
Subject: Re: [PATCH v2] mm/hotplug, x86: Disable ARCH_MEMORY_PROBE by default

On Tue, 2013-07-23 at 13:59 -0700, Dave Hansen wrote:
> On 07/23/2013 01:45 PM, Toshi Kani wrote:
> > Dave, is this how you are testing? Do you always specify a valid memory
> > address for your testing?
>
> For the moment, yes.
>
> I'm actually working on some other patches that add the kernel metadata
> for memory ranges even if they're not backed by physical memory. But
> _that_ is just for testing and I'll just have to modify whatever you do
> here in those patches anyway.
>
> It sounds like you're pretty confident that it has no users, so why
> don't you just go ahead and axe it on x86 and config it out completely?
> Folks that need it can just hack it back in.

Well, I am only confident that this interface is not necessary for ACPI
hotplug. As we found you as a user of this interface for testing on a
system without hotplug support, it is prudent to assume that there may
be other users as well. So, I am willing to keep the interface
configurable (with default disabled) for now.

The question is what to do in the next step. There are two options:

1) Make the interface safe to use
2) Remove the config option from x86 Kconfig

Both options will achieve the same goal -- prevent the crash. Once this
first patch gets in, we will see if there are more users on the
interface. Then, we can decide if we go with 1) for keeping it in the
long term, or deprecate with 2).

Thanks,
-Toshi

2013-07-24 00:18:17

by Hush Bensen

[permalink] [raw]
Subject: Re: [PATCH v2] mm/hotplug, x86: Disable ARCH_MEMORY_PROBE by default

On 07/24/2013 04:45 AM, Toshi Kani wrote:
> On Tue, 2013-07-23 at 10:01 +0200, Ingo Molnar wrote:
>> * Toshi Kani <[email protected]> wrote:
>>
>>>> Could we please also fix it to never crash the kernel, even if stupid
>>>> ranges are provided?
>>> Yes, this probe interface can be enhanced to verify the firmware
>>> information before adding a given memory address. However, such change
>>> would interfere its test use of "fake" hotplug, which is only the known
>>> use-case of this interface on x86.
>> Not crashing the kernel is not a novel concept even for test interfaces...
> Agreed.
>
>> Where does the possible crash come from - from using invalid RAM ranges,
>> right? I.e. on x86 to fix the crash we need to check the RAM is present in
>> the e820 maps, is marked RAM there, and is not already registered with the
>> kernel, or so?
> Yes, the crash comes from using invalid RAM ranges. How to check if the
> RAM is present is different if the system supports hotplug or not.

Could you explain different methods to check the RAM is present if the
system supports hotplkug or not?

>
>>> In order to verify if a given memory address is enabled at run-time (as
>>> opposed to boot-time), we need to check with ACPI memory device objects
>>> on x86. However, system vendors tend to not implement memory device
>>> objects unless their systems support memory hotplug. Dave Hansen is
>>> using this interface for his testing as a way to fake a hotplug event on
>>> a system that does not support memory hotplug.
>> All vendors implement e820 maps for the memory present at boot time.
> Yes for boot time. At run-time, e820 is not guaranteed to represent a
> new memory added. Here is a quote from ACPI spec.
>
> ===
> 15.1 INT 15H, E820H - Query System Address Map
> :
> The memory map conveyed by this interface is not required to reflect any
> changes in available physical memory that have occurred after the BIOS
> has initially passed control to the operating system. For example, if
> memory is added dynamically, this interface is not required to reflect
> the new system memory configuration.
> ===
>
> By definition, the "probe" interface is used for the kernel to recognize
> a new memory added at run-time. So, it should check ACPI memory device
> objects (which represents run-time state) for the verification. On x86,
> however, ACPI also sends a hotplug event to the kernel, which triggers
> the kernel to recognize the new physical memory properly. Hence, users
> do not need this "probe" interface.
>
>> How is the testing done by Dave Hansen? If it's done by booting with less
>> RAM than available (via say the mem=1g boot parameter), and then
>> hot-adding some of the missing RAM, then this could be made safe via the
>> e820 maps and by consultig the physical memory maps (to avoid double
>> registry), right?
> If we focus on this test scenario on a system that does not support
> hotplug, yes, I agree that we can check with e820 since it is safe to
> assume that the system has no change after boot. IOW, it is unsafe to
> check with e820 if the system supports hotplug, but there is no use in
> this interface for testing if the system supports hotplug. So, this may
> be a good idea.
>
> Dave, is this how you are testing? Do you always specify a valid memory
> address for your testing?
>
>> How does the hotplug event based approach solve double adds? Relies on the
>> hardware not sending a hot-add event twice for the same memory area or for
>> an invalid memory area, or does it include fail-safes and double checks as
>> well to avoid double adds and adding invalid memory? If yes then that
>> could be utilized here as well.
> In high-level, here is how ACPI memory hotplug works:
>
> 1. ACPI sends a hotplug event to a new ACPI memory device object that is
> hot-added.
> 2. The kernel is notified, and verifies if the new memory device object
> has not been attached by any handler yet.
> 3. The memory handler is called, and obtains a new memory range from the
> ACPI memory device object.
> 4. The memory handler calls add_memory() with the new address range.
>
> The above step 1-4 proceeds automatically within the kernel. No user
> input (nor sysfs interface) is necessary. Step 2 prevents double adds
> and step 3 gets a valid address range from the firmware directly. Step
> 4 is basically the same as the "probe" interface, but with all the
> verification up front, this step is safe.

This is hot-added part, could you also explain how ACPI memory hotplug
works for hot-remove?

>
> Thanks,
> -Toshi
>
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to [email protected]. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"[email protected]"> [email protected] </a>

2013-07-24 04:20:47

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH v2] mm/hotplug, x86: Disable ARCH_MEMORY_PROBE by default


* Toshi Kani <[email protected]> wrote:

> On Tue, 2013-07-23 at 10:01 +0200, Ingo Molnar wrote:
> > * Toshi Kani <[email protected]> wrote:
> >
> > > > Could we please also fix it to never crash the kernel, even if stupid
> > > > ranges are provided?
> > >
> > > Yes, this probe interface can be enhanced to verify the firmware
> > > information before adding a given memory address. However, such change
> > > would interfere its test use of "fake" hotplug, which is only the known
> > > use-case of this interface on x86.
> >
> > Not crashing the kernel is not a novel concept even for test interfaces...
>
> Agreed.
>
> > Where does the possible crash come from - from using invalid RAM ranges,
> > right? I.e. on x86 to fix the crash we need to check the RAM is present in
> > the e820 maps, is marked RAM there, and is not already registered with the
> > kernel, or so?
>
> Yes, the crash comes from using invalid RAM ranges. How to check if the
> RAM is present is different if the system supports hotplug or not.
>
> > > In order to verify if a given memory address is enabled at run-time (as
> > > opposed to boot-time), we need to check with ACPI memory device objects
> > > on x86. However, system vendors tend to not implement memory device
> > > objects unless their systems support memory hotplug. Dave Hansen is
> > > using this interface for his testing as a way to fake a hotplug event on
> > > a system that does not support memory hotplug.
> >
> > All vendors implement e820 maps for the memory present at boot time.
>
> Yes for boot time. At run-time, e820 is not guaranteed to represent a
> new memory added. [...]

Yes I know that, the e820 map is boot only.

You claimed that the only purpose of this on x86 was that testing was done
on non-hotplug systems, using this interface. Non-hotplug systems have
e820 maps.

> > How does the hotplug event based approach solve double adds? Relies on
> > the hardware not sending a hot-add event twice for the same memory
> > area or for an invalid memory area, or does it include fail-safes and
> > double checks as well to avoid double adds and adding invalid memory?
> > If yes then that could be utilized here as well.
>
> In high-level, here is how ACPI memory hotplug works:
>
> 1. ACPI sends a hotplug event to a new ACPI memory device object that is
> hot-added.
> 2. The kernel is notified, and verifies if the new memory device object
> has not been attached by any handler yet.
> 3. The memory handler is called, and obtains a new memory range from the
> ACPI memory device object.
> 4. The memory handler calls add_memory() with the new address range.
>
> The above step 1-4 proceeds automatically within the kernel. No user
> input (nor sysfs interface) is necessary. Step 2 prevents double adds
> [...]

If this 'new memory device object' is some ACPI detail then I don't see
how it protects the kernel from a buggy ACPI implementation double adding
the same physical memory range.

> and step 3 gets a valid address range from the firmware directly. Step
> 4 is basically the same as the "probe" interface, but with all the
> verification up front, this step is safe.

So what verification does the kernel do to ensure that a buggy ACPI
implementation does not pass us a crappy memory range, such a double
physical range (represented via separate 'memory device objects'), or a
range overlapping with an existing physical memory range already known to
the kernel, or a totally nonsensical range the CPU cannot even access
physically, etc.?

Also, is there any verification done to make sure that the new memory
range is actually RAM - i.e. we could write the first and last word of it
and see whether it gets modified correctly [to keep the sanity check
fast]?

Thanks,

Ingo

2013-07-24 16:03:22

by Toshi Kani

[permalink] [raw]
Subject: Re: [PATCH v2] mm/hotplug, x86: Disable ARCH_MEMORY_PROBE by default

On Wed, 2013-07-24 at 08:18 +0800, Hush Bensen wrote:
> On 07/24/2013 04:45 AM, Toshi Kani wrote:
> > On Tue, 2013-07-23 at 10:01 +0200, Ingo Molnar wrote:
> >> * Toshi Kani <[email protected]> wrote:
> >>
> >>>> Could we please also fix it to never crash the kernel, even if stupid
> >>>> ranges are provided?
> >>> Yes, this probe interface can be enhanced to verify the firmware
> >>> information before adding a given memory address. However, such change
> >>> would interfere its test use of "fake" hotplug, which is only the known
> >>> use-case of this interface on x86.
> >> Not crashing the kernel is not a novel concept even for test interfaces...
> > Agreed.
> >
> >> Where does the possible crash come from - from using invalid RAM ranges,
> >> right? I.e. on x86 to fix the crash we need to check the RAM is present in
> >> the e820 maps, is marked RAM there, and is not already registered with the
> >> kernel, or so?
> > Yes, the crash comes from using invalid RAM ranges. How to check if the
> > RAM is present is different if the system supports hotplug or not.
>
> Could you explain different methods to check the RAM is present if the
> system supports hotplkug or not?

e820 and UEFI memory descriptor tables are the boot-time interfaces.
These interfaces are not required to reflect any run-time changes.

ACPI memory device objects can be used at both boot-time and run-time,
which reflect any run-time changes. But they are optional to implement.
They typically are not implemented unless the system supports hotplug.

> >>> In order to verify if a given memory address is enabled at run-time (as
> >>> opposed to boot-time), we need to check with ACPI memory device objects
> >>> on x86. However, system vendors tend to not implement memory device
> >>> objects unless their systems support memory hotplug. Dave Hansen is
> >>> using this interface for his testing as a way to fake a hotplug event on
> >>> a system that does not support memory hotplug.
> >> All vendors implement e820 maps for the memory present at boot time.
> > Yes for boot time. At run-time, e820 is not guaranteed to represent a
> > new memory added. Here is a quote from ACPI spec.
> >
> > ===
> > 15.1 INT 15H, E820H - Query System Address Map
> > :
> > The memory map conveyed by this interface is not required to reflect any
> > changes in available physical memory that have occurred after the BIOS
> > has initially passed control to the operating system. For example, if
> > memory is added dynamically, this interface is not required to reflect
> > the new system memory configuration.
> > ===
> >
> > By definition, the "probe" interface is used for the kernel to recognize
> > a new memory added at run-time. So, it should check ACPI memory device
> > objects (which represents run-time state) for the verification. On x86,
> > however, ACPI also sends a hotplug event to the kernel, which triggers
> > the kernel to recognize the new physical memory properly. Hence, users
> > do not need this "probe" interface.
> >
> >> How is the testing done by Dave Hansen? If it's done by booting with less
> >> RAM than available (via say the mem=1g boot parameter), and then
> >> hot-adding some of the missing RAM, then this could be made safe via the
> >> e820 maps and by consultig the physical memory maps (to avoid double
> >> registry), right?
> > If we focus on this test scenario on a system that does not support
> > hotplug, yes, I agree that we can check with e820 since it is safe to
> > assume that the system has no change after boot. IOW, it is unsafe to
> > check with e820 if the system supports hotplug, but there is no use in
> > this interface for testing if the system supports hotplug. So, this may
> > be a good idea.
> >
> > Dave, is this how you are testing? Do you always specify a valid memory
> > address for your testing?
> >
> >> How does the hotplug event based approach solve double adds? Relies on the
> >> hardware not sending a hot-add event twice for the same memory area or for
> >> an invalid memory area, or does it include fail-safes and double checks as
> >> well to avoid double adds and adding invalid memory? If yes then that
> >> could be utilized here as well.
> > In high-level, here is how ACPI memory hotplug works:
> >
> > 1. ACPI sends a hotplug event to a new ACPI memory device object that is
> > hot-added.
> > 2. The kernel is notified, and verifies if the new memory device object
> > has not been attached by any handler yet.
> > 3. The memory handler is called, and obtains a new memory range from the
> > ACPI memory device object.
> > 4. The memory handler calls add_memory() with the new address range.
> >
> > The above step 1-4 proceeds automatically within the kernel. No user
> > input (nor sysfs interface) is necessary. Step 2 prevents double adds
> > and step 3 gets a valid address range from the firmware directly. Step
> > 4 is basically the same as the "probe" interface, but with all the
> > verification up front, this step is safe.
>
> This is hot-added part, could you also explain how ACPI memory hotplug
> works for hot-remove?

Sure. Here is high-level.

1. ACPI sends a hotplug event to an ACPI memory device object that is
requested to hot-remove.
2. The kernel is notified, and verifies if the memory device object is
attached by a handler.
3. The memory handler is called (which is being attached), and obtains
its memory range.
4. The memory handler calls remove_memory() with the address range.
5. The kernel calls eject method of the ACPI memory device object.

Thanks,
-Toshi

2013-07-24 17:09:18

by Toshi Kani

[permalink] [raw]
Subject: Re: [PATCH v2] mm/hotplug, x86: Disable ARCH_MEMORY_PROBE by default

On Wed, 2013-07-24 at 06:20 +0200, Ingo Molnar wrote:
> * Toshi Kani <[email protected]> wrote:
>
> > On Tue, 2013-07-23 at 10:01 +0200, Ingo Molnar wrote:
> > > * Toshi Kani <[email protected]> wrote:
> > >
> > > > > Could we please also fix it to never crash the kernel, even if stupid
> > > > > ranges are provided?
> > > >
> > > > Yes, this probe interface can be enhanced to verify the firmware
> > > > information before adding a given memory address. However, such change
> > > > would interfere its test use of "fake" hotplug, which is only the known
> > > > use-case of this interface on x86.
> > >
> > > Not crashing the kernel is not a novel concept even for test interfaces...
> >
> > Agreed.
> >
> > > Where does the possible crash come from - from using invalid RAM ranges,
> > > right? I.e. on x86 to fix the crash we need to check the RAM is present in
> > > the e820 maps, is marked RAM there, and is not already registered with the
> > > kernel, or so?
> >
> > Yes, the crash comes from using invalid RAM ranges. How to check if the
> > RAM is present is different if the system supports hotplug or not.
> >
> > > > In order to verify if a given memory address is enabled at run-time (as
> > > > opposed to boot-time), we need to check with ACPI memory device objects
> > > > on x86. However, system vendors tend to not implement memory device
> > > > objects unless their systems support memory hotplug. Dave Hansen is
> > > > using this interface for his testing as a way to fake a hotplug event on
> > > > a system that does not support memory hotplug.
> > >
> > > All vendors implement e820 maps for the memory present at boot time.
> >
> > Yes for boot time. At run-time, e820 is not guaranteed to represent a
> > new memory added. [...]
>
> Yes I know that, the e820 map is boot only.
>
> You claimed that the only purpose of this on x86 was that testing was done
> on non-hotplug systems, using this interface. Non-hotplug systems have
> e820 maps.

Right. Sorry, I first thought that the interface needed to work as
defined, i.e. detect a new memory. But for the test purpose on
non-hotplug systems, that is not necessary. So, I agree that we can
check e820.

I summarized two options in the email below.
https://lkml.org/lkml/2013/7/23/602

Option 1) adds a check with e820. Option 2) deprecates the interface by
removing the config option from x86 Kconfig. I was thinking that we
could evaluate two options after this patch gets in. Does it make
sense?

> > > How does the hotplug event based approach solve double adds? Relies on
> > > the hardware not sending a hot-add event twice for the same memory
> > > area or for an invalid memory area, or does it include fail-safes and
> > > double checks as well to avoid double adds and adding invalid memory?
> > > If yes then that could be utilized here as well.
> >
> > In high-level, here is how ACPI memory hotplug works:
> >
> > 1. ACPI sends a hotplug event to a new ACPI memory device object that is
> > hot-added.
> > 2. The kernel is notified, and verifies if the new memory device object
> > has not been attached by any handler yet.
> > 3. The memory handler is called, and obtains a new memory range from the
> > ACPI memory device object.
> > 4. The memory handler calls add_memory() with the new address range.
> >
> > The above step 1-4 proceeds automatically within the kernel. No user
> > input (nor sysfs interface) is necessary. Step 2 prevents double adds
> > [...]
>
> If this 'new memory device object' is some ACPI detail then I don't see
> how it protects the kernel from a buggy ACPI implementation double adding
> the same physical memory range.

You are right that the kernel is not fully protected from buggy ACPI.
In case of double adding, though, such hot-add operation fails
gracefully since add_memory() returns with -EEXIST. But if buggy ACPI
returns an invalid RAM range, then it can crash the system, just like an
invalid address in e820 can crash the system as well.

> > and step 3 gets a valid address range from the firmware directly. Step
> > 4 is basically the same as the "probe" interface, but with all the
> > verification up front, this step is safe.
>
> So what verification does the kernel do to ensure that a buggy ACPI
> implementation does not pass us a crappy memory range, such a double
> physical range (represented via separate 'memory device objects'), or a
> range overlapping with an existing physical memory range already known to
> the kernel, or a totally nonsensical range the CPU cannot even access
> physically, etc.?

The kernel checks if the status of an ACPI memory device object is
marked as enabled. But it does not protect from buggy ACPI because
anything can be wrong...

Overlapping and double add cases are verified in add_memory(), i.e.
register_memory_resource() fails.

If an address range is unique & wrong, we have no protection from it.

> Also, is there any verification done to make sure that the new memory
> range is actually RAM - i.e. we could write the first and last word of it
> and see whether it gets modified correctly [to keep the sanity check
> fast]?

No such check is performed -- just like we don't at boot-time.

This may sound bad, but in my experience, such obvious bugs are quickly
found and fixed during the FW development phase.


Thanks,
-Toshi

2013-07-25 00:17:21

by Hush Bensen

[permalink] [raw]
Subject: Re: [PATCH v2] mm/hotplug, x86: Disable ARCH_MEMORY_PROBE by default

On 07/25/2013 12:02 AM, Toshi Kani wrote:
> On Wed, 2013-07-24 at 08:18 +0800, Hush Bensen wrote:
>> On 07/24/2013 04:45 AM, Toshi Kani wrote:
>>> On Tue, 2013-07-23 at 10:01 +0200, Ingo Molnar wrote:
>>>> * Toshi Kani <[email protected]> wrote:
>>>>
>>>>>> Could we please also fix it to never crash the kernel, even if stupid
>>>>>> ranges are provided?
>>>>> Yes, this probe interface can be enhanced to verify the firmware
>>>>> information before adding a given memory address. However, such change
>>>>> would interfere its test use of "fake" hotplug, which is only the known
>>>>> use-case of this interface on x86.
>>>> Not crashing the kernel is not a novel concept even for test interfaces...
>>> Agreed.
>>>
>>>> Where does the possible crash come from - from using invalid RAM ranges,
>>>> right? I.e. on x86 to fix the crash we need to check the RAM is present in
>>>> the e820 maps, is marked RAM there, and is not already registered with the
>>>> kernel, or so?
>>> Yes, the crash comes from using invalid RAM ranges. How to check if the
>>> RAM is present is different if the system supports hotplug or not.
>> Could you explain different methods to check the RAM is present if the
>> system supports hotplkug or not?
> e820 and UEFI memory descriptor tables are the boot-time interfaces.
> These interfaces are not required to reflect any run-time changes.
>
> ACPI memory device objects can be used at both boot-time and run-time,
> which reflect any run-time changes. But they are optional to implement.
> They typically are not implemented unless the system supports hotplug.
>
>>>>> In order to verify if a given memory address is enabled at run-time (as
>>>>> opposed to boot-time), we need to check with ACPI memory device objects
>>>>> on x86. However, system vendors tend to not implement memory device
>>>>> objects unless their systems support memory hotplug. Dave Hansen is
>>>>> using this interface for his testing as a way to fake a hotplug event on
>>>>> a system that does not support memory hotplug.
>>>> All vendors implement e820 maps for the memory present at boot time.
>>> Yes for boot time. At run-time, e820 is not guaranteed to represent a
>>> new memory added. Here is a quote from ACPI spec.
>>>
>>> ===
>>> 15.1 INT 15H, E820H - Query System Address Map
>>> :
>>> The memory map conveyed by this interface is not required to reflect any
>>> changes in available physical memory that have occurred after the BIOS
>>> has initially passed control to the operating system. For example, if
>>> memory is added dynamically, this interface is not required to reflect
>>> the new system memory configuration.
>>> ===
>>>
>>> By definition, the "probe" interface is used for the kernel to recognize
>>> a new memory added at run-time. So, it should check ACPI memory device
>>> objects (which represents run-time state) for the verification. On x86,
>>> however, ACPI also sends a hotplug event to the kernel, which triggers
>>> the kernel to recognize the new physical memory properly. Hence, users
>>> do not need this "probe" interface.
>>>
>>>> How is the testing done by Dave Hansen? If it's done by booting with less
>>>> RAM than available (via say the mem=1g boot parameter), and then
>>>> hot-adding some of the missing RAM, then this could be made safe via the
>>>> e820 maps and by consultig the physical memory maps (to avoid double
>>>> registry), right?
>>> If we focus on this test scenario on a system that does not support
>>> hotplug, yes, I agree that we can check with e820 since it is safe to
>>> assume that the system has no change after boot. IOW, it is unsafe to
>>> check with e820 if the system supports hotplug, but there is no use in
>>> this interface for testing if the system supports hotplug. So, this may
>>> be a good idea.
>>>
>>> Dave, is this how you are testing? Do you always specify a valid memory
>>> address for your testing?
>>>
>>>> How does the hotplug event based approach solve double adds? Relies on the
>>>> hardware not sending a hot-add event twice for the same memory area or for
>>>> an invalid memory area, or does it include fail-safes and double checks as
>>>> well to avoid double adds and adding invalid memory? If yes then that
>>>> could be utilized here as well.
>>> In high-level, here is how ACPI memory hotplug works:
>>>
>>> 1. ACPI sends a hotplug event to a new ACPI memory device object that is
>>> hot-added.
>>> 2. The kernel is notified, and verifies if the new memory device object
>>> has not been attached by any handler yet.
>>> 3. The memory handler is called, and obtains a new memory range from the
>>> ACPI memory device object.
>>> 4. The memory handler calls add_memory() with the new address range.
>>>
>>> The above step 1-4 proceeds automatically within the kernel. No user
>>> input (nor sysfs interface) is necessary. Step 2 prevents double adds
>>> and step 3 gets a valid address range from the firmware directly. Step
>>> 4 is basically the same as the "probe" interface, but with all the
>>> verification up front, this step is safe.
>> This is hot-added part, could you also explain how ACPI memory hotplug
>> works for hot-remove?
> Sure. Here is high-level.
>
> 1. ACPI sends a hotplug event to an ACPI memory device object that is
> requested to hot-remove.
> 2. The kernel is notified, and verifies if the memory device object is
> attached by a handler.
> 3. The memory handler is called (which is being attached), and obtains
> its memory range.
> 4. The memory handler calls remove_memory() with the address range.
> 5. The kernel calls eject method of the ACPI memory device object.

Thanks for your explaination, very useful for me. ;-) Btw, what's the
eject method done?

>
> Thanks,
> -Toshi
>
>

2013-07-25 00:44:57

by Hush Bensen

[permalink] [raw]
Subject: Re: [PATCH v2] mm/hotplug, x86: Disable ARCH_MEMORY_PROBE by default

On 07/25/2013 12:02 AM, Toshi Kani wrote:
> On Wed, 2013-07-24 at 08:18 +0800, Hush Bensen wrote:
>> On 07/24/2013 04:45 AM, Toshi Kani wrote:
>>> On Tue, 2013-07-23 at 10:01 +0200, Ingo Molnar wrote:
>>>> * Toshi Kani <[email protected]> wrote:
>>>>
>>>>>> Could we please also fix it to never crash the kernel, even if stupid
>>>>>> ranges are provided?
>>>>> Yes, this probe interface can be enhanced to verify the firmware
>>>>> information before adding a given memory address. However, such change
>>>>> would interfere its test use of "fake" hotplug, which is only the known
>>>>> use-case of this interface on x86.
>>>> Not crashing the kernel is not a novel concept even for test interfaces...
>>> Agreed.
>>>
>>>> Where does the possible crash come from - from using invalid RAM ranges,
>>>> right? I.e. on x86 to fix the crash we need to check the RAM is present in
>>>> the e820 maps, is marked RAM there, and is not already registered with the
>>>> kernel, or so?
>>> Yes, the crash comes from using invalid RAM ranges. How to check if the
>>> RAM is present is different if the system supports hotplug or not.
>> Could you explain different methods to check the RAM is present if the
>> system supports hotplkug or not?
> e820 and UEFI memory descriptor tables are the boot-time interfaces.
> These interfaces are not required to reflect any run-time changes.
>
> ACPI memory device objects can be used at both boot-time and run-time,
> which reflect any run-time changes. But they are optional to implement.
> They typically are not implemented unless the system supports hotplug.
>
>>>>> In order to verify if a given memory address is enabled at run-time (as
>>>>> opposed to boot-time), we need to check with ACPI memory device objects
>>>>> on x86. However, system vendors tend to not implement memory device
>>>>> objects unless their systems support memory hotplug. Dave Hansen is
>>>>> using this interface for his testing as a way to fake a hotplug event on
>>>>> a system that does not support memory hotplug.
>>>> All vendors implement e820 maps for the memory present at boot time.
>>> Yes for boot time. At run-time, e820 is not guaranteed to represent a
>>> new memory added. Here is a quote from ACPI spec.
>>>
>>> ===
>>> 15.1 INT 15H, E820H - Query System Address Map
>>> :
>>> The memory map conveyed by this interface is not required to reflect any
>>> changes in available physical memory that have occurred after the BIOS
>>> has initially passed control to the operating system. For example, if
>>> memory is added dynamically, this interface is not required to reflect
>>> the new system memory configuration.
>>> ===
>>>
>>> By definition, the "probe" interface is used for the kernel to recognize
>>> a new memory added at run-time. So, it should check ACPI memory device
>>> objects (which represents run-time state) for the verification. On x86,
>>> however, ACPI also sends a hotplug event to the kernel, which triggers
>>> the kernel to recognize the new physical memory properly. Hence, users
>>> do not need this "probe" interface.
>>>
>>>> How is the testing done by Dave Hansen? If it's done by booting with less
>>>> RAM than available (via say the mem=1g boot parameter), and then
>>>> hot-adding some of the missing RAM, then this could be made safe via the
>>>> e820 maps and by consultig the physical memory maps (to avoid double
>>>> registry), right?
>>> If we focus on this test scenario on a system that does not support
>>> hotplug, yes, I agree that we can check with e820 since it is safe to
>>> assume that the system has no change after boot. IOW, it is unsafe to
>>> check with e820 if the system supports hotplug, but there is no use in
>>> this interface for testing if the system supports hotplug. So, this may
>>> be a good idea.
>>>
>>> Dave, is this how you are testing? Do you always specify a valid memory
>>> address for your testing?
>>>
>>>> How does the hotplug event based approach solve double adds? Relies on the
>>>> hardware not sending a hot-add event twice for the same memory area or for
>>>> an invalid memory area, or does it include fail-safes and double checks as
>>>> well to avoid double adds and adding invalid memory? If yes then that
>>>> could be utilized here as well.
>>> In high-level, here is how ACPI memory hotplug works:
>>>
>>> 1. ACPI sends a hotplug event to a new ACPI memory device object that is
>>> hot-added.
>>> 2. The kernel is notified, and verifies if the new memory device object
>>> has not been attached by any handler yet.
>>> 3. The memory handler is called, and obtains a new memory range from the
>>> ACPI memory device object.
>>> 4. The memory handler calls add_memory() with the new address range.
>>>
>>> The above step 1-4 proceeds automatically within the kernel. No user
>>> input (nor sysfs interface) is necessary. Step 2 prevents double adds
>>> and step 3 gets a valid address range from the firmware directly. Step
>>> 4 is basically the same as the "probe" interface, but with all the
>>> verification up front, this step is safe.
>> This is hot-added part, could you also explain how ACPI memory hotplug
>> works for hot-remove?
> Sure. Here is high-level.
>
> 1. ACPI sends a hotplug event to an ACPI memory device object that is
> requested to hot-remove.
> 2. The kernel is notified, and verifies if the memory device object is
> attached by a handler.
> 3. The memory handler is called (which is being attached), and obtains
> its memory range.
> 4. The memory handler calls remove_memory() with the address range.
> 5. The kernel calls eject method of the ACPI memory device object.

Could you give me the calltrace of add_memory and remove_memory? I don't
have machine support hotplug, but I hope to investigate how ACPI part
works for memory hotplug. ;-)

>
> Thanks,
> -Toshi
>
>

2013-07-25 00:56:28

by Hush Bensen

[permalink] [raw]
Subject: Re: [PATCH v2] mm/hotplug, x86: Disable ARCH_MEMORY_PROBE by default

On 07/25/2013 12:02 AM, Toshi Kani wrote:
> On Wed, 2013-07-24 at 08:18 +0800, Hush Bensen wrote:
>> On 07/24/2013 04:45 AM, Toshi Kani wrote:
>>> On Tue, 2013-07-23 at 10:01 +0200, Ingo Molnar wrote:
>>>> * Toshi Kani <[email protected]> wrote:
>>>>
>>>>>> Could we please also fix it to never crash the kernel, even if stupid
>>>>>> ranges are provided?
>>>>> Yes, this probe interface can be enhanced to verify the firmware
>>>>> information before adding a given memory address. However, such change
>>>>> would interfere its test use of "fake" hotplug, which is only the known
>>>>> use-case of this interface on x86.
>>>> Not crashing the kernel is not a novel concept even for test interfaces...
>>> Agreed.
>>>
>>>> Where does the possible crash come from - from using invalid RAM ranges,
>>>> right? I.e. on x86 to fix the crash we need to check the RAM is present in
>>>> the e820 maps, is marked RAM there, and is not already registered with the
>>>> kernel, or so?
>>> Yes, the crash comes from using invalid RAM ranges. How to check if the
>>> RAM is present is different if the system supports hotplug or not.
>> Could you explain different methods to check the RAM is present if the
>> system supports hotplkug or not?
> e820 and UEFI memory descriptor tables are the boot-time interfaces.
> These interfaces are not required to reflect any run-time changes.
>
> ACPI memory device objects can be used at both boot-time and run-time,
> which reflect any run-time changes. But they are optional to implement.
> They typically are not implemented unless the system supports hotplug.
>
>>>>> In order to verify if a given memory address is enabled at run-time (as
>>>>> opposed to boot-time), we need to check with ACPI memory device objects
>>>>> on x86. However, system vendors tend to not implement memory device
>>>>> objects unless their systems support memory hotplug. Dave Hansen is
>>>>> using this interface for his testing as a way to fake a hotplug event on
>>>>> a system that does not support memory hotplug.
>>>> All vendors implement e820 maps for the memory present at boot time.
>>> Yes for boot time. At run-time, e820 is not guaranteed to represent a
>>> new memory added. Here is a quote from ACPI spec.
>>>
>>> ===
>>> 15.1 INT 15H, E820H - Query System Address Map
>>> :
>>> The memory map conveyed by this interface is not required to reflect any
>>> changes in available physical memory that have occurred after the BIOS
>>> has initially passed control to the operating system. For example, if
>>> memory is added dynamically, this interface is not required to reflect
>>> the new system memory configuration.
>>> ===
>>>
>>> By definition, the "probe" interface is used for the kernel to recognize
>>> a new memory added at run-time. So, it should check ACPI memory device
>>> objects (which represents run-time state) for the verification. On x86,
>>> however, ACPI also sends a hotplug event to the kernel, which triggers
>>> the kernel to recognize the new physical memory properly. Hence, users
>>> do not need this "probe" interface.
>>>
>>>> How is the testing done by Dave Hansen? If it's done by booting with less
>>>> RAM than available (via say the mem=1g boot parameter), and then
>>>> hot-adding some of the missing RAM, then this could be made safe via the
>>>> e820 maps and by consultig the physical memory maps (to avoid double
>>>> registry), right?
>>> If we focus on this test scenario on a system that does not support
>>> hotplug, yes, I agree that we can check with e820 since it is safe to
>>> assume that the system has no change after boot. IOW, it is unsafe to
>>> check with e820 if the system supports hotplug, but there is no use in
>>> this interface for testing if the system supports hotplug. So, this may
>>> be a good idea.
>>>
>>> Dave, is this how you are testing? Do you always specify a valid memory
>>> address for your testing?
>>>
>>>> How does the hotplug event based approach solve double adds? Relies on the
>>>> hardware not sending a hot-add event twice for the same memory area or for
>>>> an invalid memory area, or does it include fail-safes and double checks as
>>>> well to avoid double adds and adding invalid memory? If yes then that
>>>> could be utilized here as well.
>>> In high-level, here is how ACPI memory hotplug works:
>>>
>>> 1. ACPI sends a hotplug event to a new ACPI memory device object that is
>>> hot-added.
>>> 2. The kernel is notified, and verifies if the new memory device object
>>> has not been attached by any handler yet.
>>> 3. The memory handler is called, and obtains a new memory range from the
>>> ACPI memory device object.
>>> 4. The memory handler calls add_memory() with the new address range.
>>>
>>> The above step 1-4 proceeds automatically within the kernel. No user
>>> input (nor sysfs interface) is necessary. Step 2 prevents double adds
>>> and step 3 gets a valid address range from the firmware directly. Step
>>> 4 is basically the same as the "probe" interface, but with all the
>>> verification up front, this step is safe.
>> This is hot-added part, could you also explain how ACPI memory hotplug
>> works for hot-remove?
> Sure. Here is high-level.
>
> 1. ACPI sends a hotplug event to an ACPI memory device object that is
> requested to hot-remove.
> 2. The kernel is notified, and verifies if the memory device object is
> attached by a handler.
> 3. The memory handler is called (which is being attached), and obtains
> its memory range.
> 4. The memory handler calls remove_memory() with the address range.
> 5. The kernel calls eject method of the ACPI memory device object.

If hot remove the memory device by the hardware, or writing 1 to
/sys/bus/acpi/devices/PNP0C80:XX/eject both will call eject method?
What's the difference between these two methods? I guess the former will
send SCI and the latter won't.

>
> Thanks,
> -Toshi
>
>

2013-07-25 03:09:20

by Yasuaki Ishimatsu

[permalink] [raw]
Subject: Re: [PATCH v2] mm/hotplug, x86: Disable ARCH_MEMORY_PROBE by default

(2013/07/25 9:56), Hush Bensen wrote:
> On 07/25/2013 12:02 AM, Toshi Kani wrote:
>> On Wed, 2013-07-24 at 08:18 +0800, Hush Bensen wrote:
>>> On 07/24/2013 04:45 AM, Toshi Kani wrote:
>>>> On Tue, 2013-07-23 at 10:01 +0200, Ingo Molnar wrote:
>>>>> * Toshi Kani <[email protected]> wrote:
>>>>>
>>>>>>> Could we please also fix it to never crash the kernel, even if stupid
>>>>>>> ranges are provided?
>>>>>> Yes, this probe interface can be enhanced to verify the firmware
>>>>>> information before adding a given memory address. However, such change
>>>>>> would interfere its test use of "fake" hotplug, which is only the known
>>>>>> use-case of this interface on x86.
>>>>> Not crashing the kernel is not a novel concept even for test interfaces...
>>>> Agreed.
>>>>
>>>>> Where does the possible crash come from - from using invalid RAM ranges,
>>>>> right? I.e. on x86 to fix the crash we need to check the RAM is present in
>>>>> the e820 maps, is marked RAM there, and is not already registered with the
>>>>> kernel, or so?
>>>> Yes, the crash comes from using invalid RAM ranges. How to check if the
>>>> RAM is present is different if the system supports hotplug or not.
>>> Could you explain different methods to check the RAM is present if the
>>> system supports hotplkug or not?
>> e820 and UEFI memory descriptor tables are the boot-time interfaces.
>> These interfaces are not required to reflect any run-time changes.
>>
>> ACPI memory device objects can be used at both boot-time and run-time,
>> which reflect any run-time changes. But they are optional to implement.
>> They typically are not implemented unless the system supports hotplug.
>>
>>>>>> In order to verify if a given memory address is enabled at run-time (as
>>>>>> opposed to boot-time), we need to check with ACPI memory device objects
>>>>>> on x86. However, system vendors tend to not implement memory device
>>>>>> objects unless their systems support memory hotplug. Dave Hansen is
>>>>>> using this interface for his testing as a way to fake a hotplug event on
>>>>>> a system that does not support memory hotplug.
>>>>> All vendors implement e820 maps for the memory present at boot time.
>>>> Yes for boot time. At run-time, e820 is not guaranteed to represent a
>>>> new memory added. Here is a quote from ACPI spec.
>>>>
>>>> ===
>>>> 15.1 INT 15H, E820H - Query System Address Map
>>>> :
>>>> The memory map conveyed by this interface is not required to reflect any
>>>> changes in available physical memory that have occurred after the BIOS
>>>> has initially passed control to the operating system. For example, if
>>>> memory is added dynamically, this interface is not required to reflect
>>>> the new system memory configuration.
>>>> ===
>>>>
>>>> By definition, the "probe" interface is used for the kernel to recognize
>>>> a new memory added at run-time. So, it should check ACPI memory device
>>>> objects (which represents run-time state) for the verification. On x86,
>>>> however, ACPI also sends a hotplug event to the kernel, which triggers
>>>> the kernel to recognize the new physical memory properly. Hence, users
>>>> do not need this "probe" interface.
>>>>
>>>>> How is the testing done by Dave Hansen? If it's done by booting with less
>>>>> RAM than available (via say the mem=1g boot parameter), and then
>>>>> hot-adding some of the missing RAM, then this could be made safe via the
>>>>> e820 maps and by consultig the physical memory maps (to avoid double
>>>>> registry), right?
>>>> If we focus on this test scenario on a system that does not support
>>>> hotplug, yes, I agree that we can check with e820 since it is safe to
>>>> assume that the system has no change after boot. IOW, it is unsafe to
>>>> check with e820 if the system supports hotplug, but there is no use in
>>>> this interface for testing if the system supports hotplug. So, this may
>>>> be a good idea.
>>>>
>>>> Dave, is this how you are testing? Do you always specify a valid memory
>>>> address for your testing?
>>>>
>>>>> How does the hotplug event based approach solve double adds? Relies on the
>>>>> hardware not sending a hot-add event twice for the same memory area or for
>>>>> an invalid memory area, or does it include fail-safes and double checks as
>>>>> well to avoid double adds and adding invalid memory? If yes then that
>>>>> could be utilized here as well.
>>>> In high-level, here is how ACPI memory hotplug works:
>>>>
>>>> 1. ACPI sends a hotplug event to a new ACPI memory device object that is
>>>> hot-added.
>>>> 2. The kernel is notified, and verifies if the new memory device object
>>>> has not been attached by any handler yet.
>>>> 3. The memory handler is called, and obtains a new memory range from the
>>>> ACPI memory device object.
>>>> 4. The memory handler calls add_memory() with the new address range.
>>>>
>>>> The above step 1-4 proceeds automatically within the kernel. No user
>>>> input (nor sysfs interface) is necessary. Step 2 prevents double adds
>>>> and step 3 gets a valid address range from the firmware directly. Step
>>>> 4 is basically the same as the "probe" interface, but with all the
>>>> verification up front, this step is safe.
>>> This is hot-added part, could you also explain how ACPI memory hotplug
>>> works for hot-remove?
>> Sure. Here is high-level.
>>
>> 1. ACPI sends a hotplug event to an ACPI memory device object that is
>> requested to hot-remove.
>> 2. The kernel is notified, and verifies if the memory device object is
>> attached by a handler.
>> 3. The memory handler is called (which is being attached), and obtains
>> its memory range.
>> 4. The memory handler calls remove_memory() with the address range.
>> 5. The kernel calls eject method of the ACPI memory device object.
>
> If hot remove the memory device by the hardware, or writing 1 to
> /sys/bus/acpi/devices/PNP0C80:XX/eject both will call eject method?

Yes.
Both operations will call eject method.

> What's the difference between these two methods? I guess the former will send SCI and the latter won't.

Triggers are different. Former is triggered by SCI, latter is triggered by
writing sysfs.

Thanks,
Yasuaki Ishimatsu


>
>>
>> Thanks,
>> -Toshi
>>
>>
>

2013-07-25 03:35:00

by Hush Bensen

[permalink] [raw]
Subject: Re: [PATCH v2] mm/hotplug, x86: Disable ARCH_MEMORY_PROBE by default

On 07/25/2013 11:08 AM, Yasuaki Ishimatsu wrote:
> (2013/07/25 9:56), Hush Bensen wrote:
>> On 07/25/2013 12:02 AM, Toshi Kani wrote:
>>> On Wed, 2013-07-24 at 08:18 +0800, Hush Bensen wrote:
>>>> On 07/24/2013 04:45 AM, Toshi Kani wrote:
>>>>> On Tue, 2013-07-23 at 10:01 +0200, Ingo Molnar wrote:
>>>>>> * Toshi Kani <[email protected]> wrote:
>>>>>>
>>>>>>>> Could we please also fix it to never crash the kernel, even if
>>>>>>>> stupid
>>>>>>>> ranges are provided?
>>>>>>> Yes, this probe interface can be enhanced to verify the firmware
>>>>>>> information before adding a given memory address. However, such
>>>>>>> change
>>>>>>> would interfere its test use of "fake" hotplug, which is only
>>>>>>> the known
>>>>>>> use-case of this interface on x86.
>>>>>> Not crashing the kernel is not a novel concept even for test
>>>>>> interfaces...
>>>>> Agreed.
>>>>>
>>>>>> Where does the possible crash come from - from using invalid RAM
>>>>>> ranges,
>>>>>> right? I.e. on x86 to fix the crash we need to check the RAM is
>>>>>> present in
>>>>>> the e820 maps, is marked RAM there, and is not already registered
>>>>>> with the
>>>>>> kernel, or so?
>>>>> Yes, the crash comes from using invalid RAM ranges. How to check
>>>>> if the
>>>>> RAM is present is different if the system supports hotplug or not.
>>>> Could you explain different methods to check the RAM is present if the
>>>> system supports hotplkug or not?
>>> e820 and UEFI memory descriptor tables are the boot-time interfaces.
>>> These interfaces are not required to reflect any run-time changes.
>>>
>>> ACPI memory device objects can be used at both boot-time and run-time,
>>> which reflect any run-time changes. But they are optional to
>>> implement.
>>> They typically are not implemented unless the system supports hotplug.
>>>
>>>>>>> In order to verify if a given memory address is enabled at
>>>>>>> run-time (as
>>>>>>> opposed to boot-time), we need to check with ACPI memory device
>>>>>>> objects
>>>>>>> on x86. However, system vendors tend to not implement memory
>>>>>>> device
>>>>>>> objects unless their systems support memory hotplug. Dave Hansen is
>>>>>>> using this interface for his testing as a way to fake a hotplug
>>>>>>> event on
>>>>>>> a system that does not support memory hotplug.
>>>>>> All vendors implement e820 maps for the memory present at boot time.
>>>>> Yes for boot time. At run-time, e820 is not guaranteed to
>>>>> represent a
>>>>> new memory added. Here is a quote from ACPI spec.
>>>>>
>>>>> ===
>>>>> 15.1 INT 15H, E820H - Query System Address Map
>>>>> :
>>>>> The memory map conveyed by this interface is not required to
>>>>> reflect any
>>>>> changes in available physical memory that have occurred after the
>>>>> BIOS
>>>>> has initially passed control to the operating system. For example, if
>>>>> memory is added dynamically, this interface is not required to
>>>>> reflect
>>>>> the new system memory configuration.
>>>>> ===
>>>>>
>>>>> By definition, the "probe" interface is used for the kernel to
>>>>> recognize
>>>>> a new memory added at run-time. So, it should check ACPI memory
>>>>> device
>>>>> objects (which represents run-time state) for the verification.
>>>>> On x86,
>>>>> however, ACPI also sends a hotplug event to the kernel, which
>>>>> triggers
>>>>> the kernel to recognize the new physical memory properly. Hence,
>>>>> users
>>>>> do not need this "probe" interface.
>>>>>
>>>>>> How is the testing done by Dave Hansen? If it's done by booting
>>>>>> with less
>>>>>> RAM than available (via say the mem=1g boot parameter), and then
>>>>>> hot-adding some of the missing RAM, then this could be made safe
>>>>>> via the
>>>>>> e820 maps and by consultig the physical memory maps (to avoid double
>>>>>> registry), right?
>>>>> If we focus on this test scenario on a system that does not support
>>>>> hotplug, yes, I agree that we can check with e820 since it is safe to
>>>>> assume that the system has no change after boot. IOW, it is
>>>>> unsafe to
>>>>> check with e820 if the system supports hotplug, but there is no
>>>>> use in
>>>>> this interface for testing if the system supports hotplug. So,
>>>>> this may
>>>>> be a good idea.
>>>>>
>>>>> Dave, is this how you are testing? Do you always specify a valid
>>>>> memory
>>>>> address for your testing?
>>>>>
>>>>>> How does the hotplug event based approach solve double adds?
>>>>>> Relies on the
>>>>>> hardware not sending a hot-add event twice for the same memory
>>>>>> area or for
>>>>>> an invalid memory area, or does it include fail-safes and double
>>>>>> checks as
>>>>>> well to avoid double adds and adding invalid memory? If yes then
>>>>>> that
>>>>>> could be utilized here as well.
>>>>> In high-level, here is how ACPI memory hotplug works:
>>>>>
>>>>> 1. ACPI sends a hotplug event to a new ACPI memory device object
>>>>> that is
>>>>> hot-added.
>>>>> 2. The kernel is notified, and verifies if the new memory device
>>>>> object
>>>>> has not been attached by any handler yet.
>>>>> 3. The memory handler is called, and obtains a new memory range
>>>>> from the
>>>>> ACPI memory device object.
>>>>> 4. The memory handler calls add_memory() with the new address range.
>>>>>
>>>>> The above step 1-4 proceeds automatically within the kernel. No user
>>>>> input (nor sysfs interface) is necessary. Step 2 prevents double
>>>>> adds
>>>>> and step 3 gets a valid address range from the firmware directly.
>>>>> Step
>>>>> 4 is basically the same as the "probe" interface, but with all the
>>>>> verification up front, this step is safe.
>>>> This is hot-added part, could you also explain how ACPI memory hotplug
>>>> works for hot-remove?
>>> Sure. Here is high-level.
>>>
>>> 1. ACPI sends a hotplug event to an ACPI memory device object that is
>>> requested to hot-remove.
>>> 2. The kernel is notified, and verifies if the memory device object is
>>> attached by a handler.
>>> 3. The memory handler is called (which is being attached), and obtains
>>> its memory range.
>>> 4. The memory handler calls remove_memory() with the address range.
>>> 5. The kernel calls eject method of the ACPI memory device object.
>>
>> If hot remove the memory device by the hardware, or writing 1 to
>> /sys/bus/acpi/devices/PNP0C80:XX/eject both will call eject method?
>
> Yes.
> Both operations will call eject method.
>
>> What's the difference between these two methods? I guess the former
>> will send SCI and the latter won't.
>
> Triggers are different. Former is triggered by SCI, latter is
> triggered by
> writing sysfs.

Thanks, another question, what's the role of udev in memory hotplug?

>
> Thanks,
> Yasuaki Ishimatsu
>
>
>>
>>>
>>> Thanks,
>>> -Toshi
>>>
>>>
>>
>
>

2013-07-25 04:56:19

by Yasuaki Ishimatsu

[permalink] [raw]
Subject: Re: [PATCH v2] mm/hotplug, x86: Disable ARCH_MEMORY_PROBE by default

(2013/07/25 12:34), Hush Bensen wrote:
> On 07/25/2013 11:08 AM, Yasuaki Ishimatsu wrote:
>> (2013/07/25 9:56), Hush Bensen wrote:
>>> On 07/25/2013 12:02 AM, Toshi Kani wrote:
>>>> On Wed, 2013-07-24 at 08:18 +0800, Hush Bensen wrote:
>>>>> On 07/24/2013 04:45 AM, Toshi Kani wrote:
>>>>>> On Tue, 2013-07-23 at 10:01 +0200, Ingo Molnar wrote:
>>>>>>> * Toshi Kani <[email protected]> wrote:
>>>>>>>
>>>>>>>>> Could we please also fix it to never crash the kernel, even if stupid
>>>>>>>>> ranges are provided?
>>>>>>>> Yes, this probe interface can be enhanced to verify the firmware
>>>>>>>> information before adding a given memory address. However, such change
>>>>>>>> would interfere its test use of "fake" hotplug, which is only the known
>>>>>>>> use-case of this interface on x86.
>>>>>>> Not crashing the kernel is not a novel concept even for test interfaces...
>>>>>> Agreed.
>>>>>>
>>>>>>> Where does the possible crash come from - from using invalid RAM ranges,
>>>>>>> right? I.e. on x86 to fix the crash we need to check the RAM is present in
>>>>>>> the e820 maps, is marked RAM there, and is not already registered with the
>>>>>>> kernel, or so?
>>>>>> Yes, the crash comes from using invalid RAM ranges. How to check if the
>>>>>> RAM is present is different if the system supports hotplug or not.
>>>>> Could you explain different methods to check the RAM is present if the
>>>>> system supports hotplkug or not?
>>>> e820 and UEFI memory descriptor tables are the boot-time interfaces.
>>>> These interfaces are not required to reflect any run-time changes.
>>>>
>>>> ACPI memory device objects can be used at both boot-time and run-time,
>>>> which reflect any run-time changes. But they are optional to implement.
>>>> They typically are not implemented unless the system supports hotplug.
>>>>
>>>>>>>> In order to verify if a given memory address is enabled at run-time (as
>>>>>>>> opposed to boot-time), we need to check with ACPI memory device objects
>>>>>>>> on x86. However, system vendors tend to not implement memory device
>>>>>>>> objects unless their systems support memory hotplug. Dave Hansen is
>>>>>>>> using this interface for his testing as a way to fake a hotplug event on
>>>>>>>> a system that does not support memory hotplug.
>>>>>>> All vendors implement e820 maps for the memory present at boot time.
>>>>>> Yes for boot time. At run-time, e820 is not guaranteed to represent a
>>>>>> new memory added. Here is a quote from ACPI spec.
>>>>>>
>>>>>> ===
>>>>>> 15.1 INT 15H, E820H - Query System Address Map
>>>>>> :
>>>>>> The memory map conveyed by this interface is not required to reflect any
>>>>>> changes in available physical memory that have occurred after the BIOS
>>>>>> has initially passed control to the operating system. For example, if
>>>>>> memory is added dynamically, this interface is not required to reflect
>>>>>> the new system memory configuration.
>>>>>> ===
>>>>>>
>>>>>> By definition, the "probe" interface is used for the kernel to recognize
>>>>>> a new memory added at run-time. So, it should check ACPI memory device
>>>>>> objects (which represents run-time state) for the verification. On x86,
>>>>>> however, ACPI also sends a hotplug event to the kernel, which triggers
>>>>>> the kernel to recognize the new physical memory properly. Hence, users
>>>>>> do not need this "probe" interface.
>>>>>>
>>>>>>> How is the testing done by Dave Hansen? If it's done by booting with less
>>>>>>> RAM than available (via say the mem=1g boot parameter), and then
>>>>>>> hot-adding some of the missing RAM, then this could be made safe via the
>>>>>>> e820 maps and by consultig the physical memory maps (to avoid double
>>>>>>> registry), right?
>>>>>> If we focus on this test scenario on a system that does not support
>>>>>> hotplug, yes, I agree that we can check with e820 since it is safe to
>>>>>> assume that the system has no change after boot. IOW, it is unsafe to
>>>>>> check with e820 if the system supports hotplug, but there is no use in
>>>>>> this interface for testing if the system supports hotplug. So, this may
>>>>>> be a good idea.
>>>>>>
>>>>>> Dave, is this how you are testing? Do you always specify a valid memory
>>>>>> address for your testing?
>>>>>>
>>>>>>> How does the hotplug event based approach solve double adds? Relies on the
>>>>>>> hardware not sending a hot-add event twice for the same memory area or for
>>>>>>> an invalid memory area, or does it include fail-safes and double checks as
>>>>>>> well to avoid double adds and adding invalid memory? If yes then that
>>>>>>> could be utilized here as well.
>>>>>> In high-level, here is how ACPI memory hotplug works:
>>>>>>
>>>>>> 1. ACPI sends a hotplug event to a new ACPI memory device object that is
>>>>>> hot-added.
>>>>>> 2. The kernel is notified, and verifies if the new memory device object
>>>>>> has not been attached by any handler yet.
>>>>>> 3. The memory handler is called, and obtains a new memory range from the
>>>>>> ACPI memory device object.
>>>>>> 4. The memory handler calls add_memory() with the new address range.
>>>>>>
>>>>>> The above step 1-4 proceeds automatically within the kernel. No user
>>>>>> input (nor sysfs interface) is necessary. Step 2 prevents double adds
>>>>>> and step 3 gets a valid address range from the firmware directly. Step
>>>>>> 4 is basically the same as the "probe" interface, but with all the
>>>>>> verification up front, this step is safe.
>>>>> This is hot-added part, could you also explain how ACPI memory hotplug
>>>>> works for hot-remove?
>>>> Sure. Here is high-level.
>>>>
>>>> 1. ACPI sends a hotplug event to an ACPI memory device object that is
>>>> requested to hot-remove.
>>>> 2. The kernel is notified, and verifies if the memory device object is
>>>> attached by a handler.
>>>> 3. The memory handler is called (which is being attached), and obtains
>>>> its memory range.
>>>> 4. The memory handler calls remove_memory() with the address range.
>>>> 5. The kernel calls eject method of the ACPI memory device object.
>>>
>>> If hot remove the memory device by the hardware, or writing 1 to
>>> /sys/bus/acpi/devices/PNP0C80:XX/eject both will call eject method?
>>
>> Yes.
>> Both operations will call eject method.
>>
>>> What's the difference between these two methods? I guess the former will send SCI and the latter
>>> won't.
>>
>> Triggers are different. Former is triggered by SCI, latter is triggered by
>> writing sysfs.
>
> Thanks, another question, what's the role of udev in memory hotplug?

Udev notifies user land of hotplug events. So if you want to do something
after memory hotplug automatically, you can do it by registering your script
to udev.

Thanks,
Yasuaki Ishimatsu


>
>>
>> Thanks,
>> Yasuaki Ishimatsu
>>
>>
>>>
>>>>
>>>> Thanks,
>>>> -Toshi
>>>>
>>>>
>>>
>>
>>
>

2013-07-25 15:48:22

by Toshi Kani

[permalink] [raw]
Subject: Re: [PATCH v2] mm/hotplug, x86: Disable ARCH_MEMORY_PROBE by default

On Thu, 2013-07-25 at 08:17 +0800, Hush Bensen wrote:
:
> Thanks for your explaination, very useful for me. ;-) Btw, what's the
> eject method done?

On bare metal, the eject method makes a target device electrically
isolated and physically removable. On virtualized environment, the
eject method makes a target device unassigned and released.

> Could you give me the calltrace of add_memory and remove_memory? I
> don't have machine support hotplug, but I hope to investigate how
> ACPI part works for memory hotplug. ;-)

Here you go.

[ 84.883517] [<ffffffff8153e8a4>] add_memory+0x24/0x185
[ 84.883525] [<ffffffff812ca331>] ? acpi_get_pxm+0x2b/0x4e
[ 84.883533] [<ffffffff81303418>] acpi_memory_device_add+0x144/0x274
[ 84.883539] [<ffffffff812c1b25>] acpi_bus_device_attach+0x80/0xd9
[ 84.883545] [<ffffffff812c2b65>] acpi_bus_scan+0x65/0x9f
[ 84.883551] [<ffffffff812c2c1d>] acpi_scan_bus_device_check
+0x7e/0x10f
[ 84.883556] [<ffffffff812c2cc1>] acpi_scan_device_check+0x13/0x15
[ 84.883562] [<ffffffff812bbdab>] acpi_os_execute_deferred+0x25/0x32
[ 84.883568] [<ffffffff81057cbc>] process_one_work+0x229/0x3d3
[ 84.883573] [<ffffffff81057bfa>] ? process_one_work+0x167/0x3d3
[ 84.883579] [<ffffffff81058248>] worker_thread+0x133/0x200
[ 84.883584] [<ffffffff81058115>] ? rescuer_thread+0x280/0x280
[ 84.883591] [<ffffffff8105e0c3>] kthread+0xb1/0xb9
[ 84.883598] [<ffffffff8105e012>] ? __kthread_parkme+0x65/0x65
[ 84.883604] [<ffffffff81556c9c>] ret_from_fork+0x7c/0xb0
[ 84.883610] [<ffffffff8105e012>] ? __kthread_parkme+0x65/0x65

[ 129.586622] [<ffffffff8153f4b1>] remove_memory+0x22/0x85
[ 129.586627] [<ffffffff813031c6>] acpi_memory_device_remove+0x77/0xa7
[ 129.586631] [<ffffffff812c0f95>] acpi_bus_device_detach+0x3d/0x5e
[ 129.586634] [<ffffffff812c0ff8>] acpi_bus_trim+0x42/0x7a
[ 129.586637] [<ffffffff812c1587>] acpi_scan_hot_remove+0x1ba/0x261
[ 129.586641] [<ffffffff812c16ce>] acpi_bus_device_eject+0xa0/0xd0
[ 129.586644] [<ffffffff812bbdab>] acpi_os_execute_deferred+0x25/0x32
[ 129.586648] [<ffffffff81057cbc>] process_one_work+0x229/0x3d3
[ 129.586652] [<ffffffff81057bfa>] ? process_one_work+0x167/0x3d3
[ 129.586655] [<ffffffff81058248>] worker_thread+0x133/0x200
[ 129.586658] [<ffffffff81058115>] ? rescuer_thread+0x280/0x280
[ 129.586663] [<ffffffff8105e0c3>] kthread+0xb1/0xb9
[ 129.586667] [<ffffffff8105e012>] ? __kthread_parkme+0x65/0x65
[ 129.586671] [<ffffffff81556c9c>] ret_from_fork+0x7c/0xb0
[ 129.586675] [<ffffffff8105e012>] ? __kthread_parkme+0x65/0x65

-Toshi

2013-07-25 21:38:30

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH v2] mm/hotplug, x86: Disable ARCH_MEMORY_PROBE by default


* Toshi Kani <[email protected]> wrote:

> > You claimed that the only purpose of this on x86 was
> > that testing was done on non-hotplug systems, using
> > this interface. Non-hotplug systems have e820 maps.
>
> Right. Sorry, I first thought that the interface needed
> to work as defined, i.e. detect a new memory. But for
> the test purpose on non-hotplug systems, that is not
> necessary. So, I agree that we can check e820.
>
> I summarized two options in the email below.
> https://lkml.org/lkml/2013/7/23/602
>
> Option 1) adds a check with e820. Option 2) deprecates
> the interface by removing the config option from x86
> Kconfig. I was thinking that we could evaluate two
> options after this patch gets in. Does it make sense?

Yeah.

That having said, if the e820 check is too difficult to
pull off straight away, I also don't mind keeping it as-is
if it's useful for testing. Just make sure you document it
as "you need to be careful with this" (beyond it being a
root-only interface to begin with).

Thanks,

Ingo

2013-07-25 22:37:03

by Toshi Kani

[permalink] [raw]
Subject: Re: [PATCH v2] mm/hotplug, x86: Disable ARCH_MEMORY_PROBE by default

On Thu, 2013-07-25 at 23:38 +0200, Ingo Molnar wrote:
> * Toshi Kani <[email protected]> wrote:
>
> > > You claimed that the only purpose of this on x86 was
> > > that testing was done on non-hotplug systems, using
> > > this interface. Non-hotplug systems have e820 maps.
> >
> > Right. Sorry, I first thought that the interface needed
> > to work as defined, i.e. detect a new memory. But for
> > the test purpose on non-hotplug systems, that is not
> > necessary. So, I agree that we can check e820.
> >
> > I summarized two options in the email below.
> > https://lkml.org/lkml/2013/7/23/602
> >
> > Option 1) adds a check with e820. Option 2) deprecates
> > the interface by removing the config option from x86
> > Kconfig. I was thinking that we could evaluate two
> > options after this patch gets in. Does it make sense?
>
> Yeah.
>
> That having said, if the e820 check is too difficult to
> pull off straight away, I also don't mind keeping it as-is
> if it's useful for testing. Just make sure you document it
> as "you need to be careful with this" (beyond it being a
> root-only interface to begin with).

Sounds good. I will keep it as-is for now, and add more clarification
to the document. I will send an updated patch shortly.

I will make sure to come back after this patch gets in and we get more
info about users.

Thanks,
-Toshi