Currently defconfig selects NR_CPUS=256, but some vendors (e.g. Ampere
Computing) are planning to ship systems with 512 CPUs. So that all CPUs on
these systems can be used with defconfig, we'd like to bump NR_CPUS to 512.
Therefore this patch increases the default NR_CPUS from 256 to 512.
As increasing NR_CPUS will increase the size of cpumasks, there's a fear that
this might have a significant impact on stack usage due to code which places
cpumasks on the stack. To mitigate that concern, we can select
CPUMASK_OFFSTACK. As that doesn't seem to be a problem today with
NR_CPUS=256, we only select this when NR_CPUS > 256.
CPUMASK_OFFSTACK configures the cpumasks in the kernel to be
dynamically allocated. This was used in the X86 architecture in the
past to enable support for larger CPU configurations up to 8k cpus.
With that is becomes possible to dynamically size the allocation of
the cpu bitmaps depending on the quantity of processors detected on
bootup. Memory used for cpumasks will increase if the kernel is
run on a machine with more cores.
Further increases may be needed if ARM processor vendors start
supporting more processors. Given the current inflationary trends
in core counts from multiple processor manufacturers this may occur.
There are minor regressions for hackbench. The kernel data size
for 512 cpus is smaller with offstack than with onstack.
Benchmark results using hackbench average over 10 runs of
hackbench -s 512 -l 2000 -g 15 -f 25 -P
on Altra 80 Core
Support for 256 CPUs on stack. Baseline
7.8564 sec
Support for 512 CUs on stack.
7.8713 sec + 0.18%
512 CPUS offstack
7.8916 sec + 0.44%
Kernel size comparison:
text data filename Difference to onstack256 baseline
25755648 9589248 vmlinuz-6.8.0-rc4-onstack256
25755648 9607680 vmlinuz-6.8.0-rc4-onstack512 +0.19%
25755648 9603584 vmlinuz-6.8.0-rc4-offstack512 +0.14%
Tested-by: Eric Mackay <[email protected]>
Reviewed-by: Russell King (Oracle) <[email protected]>
Signed-off-by: Christoph Lameter (Ampere) <[email protected]>
---
Original post: https://www.spinics.net/lists/linux-mm/msg369701.html
V2: https://lkml.org/lkml/2024/2/7/505
V1->V2
- Keep quotation marks
- Remove whiltespace damage
- Add tested by
V2->V3:
- Add test results
- Rework descriptions
arch/arm64/Kconfig | 16 +++++++++++++++-
1 file changed, 15 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index aa7c1d435139..4e767dede47d 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1427,7 +1427,21 @@ config SCHED_SMT
config NR_CPUS
int "Maximum number of CPUs (2-4096)"
range 2 4096
- default "256"
+ default "512"
+
+#
+# Determines the placement of cpumasks.
+#
+# With CPUMASK_OFFSTACK the cpumasks are dynamically allocated.
+# Useful for machines with lots of core because it avoids increasing
+# the size of many of the data structures in the kernel.
+#
+# If this is off then the cpumasks have a static sizes and are
+# embedded within data structures.
+#
+ config CPUMASK_OFFSTACK
+ def_bool y
+ depends on NR_CPUS > 256
config HOTPLUG_CPU
bool "Support for hot-pluggable CPUs"
--
2.39.2
Hi Christoph,
On Wed, Mar 06, 2024 at 05:45:04PM -0800, Christoph Lameter (Ampere) wrote:
> Currently defconfig selects NR_CPUS=256, but some vendors (e.g. Ampere
> Computing) are planning to ship systems with 512 CPUs. So that all CPUs on
> these systems can be used with defconfig, we'd like to bump NR_CPUS to 512.
> Therefore this patch increases the default NR_CPUS from 256 to 512.
>
> As increasing NR_CPUS will increase the size of cpumasks, there's a fear that
> this might have a significant impact on stack usage due to code which places
> cpumasks on the stack. To mitigate that concern, we can select
> CPUMASK_OFFSTACK. As that doesn't seem to be a problem today with
> NR_CPUS=256, we only select this when NR_CPUS > 256.
>
> CPUMASK_OFFSTACK configures the cpumasks in the kernel to be
> dynamically allocated. This was used in the X86 architecture in the
> past to enable support for larger CPU configurations up to 8k cpus.
>
> With that is becomes possible to dynamically size the allocation of
> the cpu bitmaps depending on the quantity of processors detected on
> bootup. Memory used for cpumasks will increase if the kernel is
> run on a machine with more cores.
>
> Further increases may be needed if ARM processor vendors start
> supporting more processors. Given the current inflationary trends
> in core counts from multiple processor manufacturers this may occur.
>
> There are minor regressions for hackbench. The kernel data size
> for 512 cpus is smaller with offstack than with onstack.
>
> Benchmark results using hackbench average over 10 runs of
>
> hackbench -s 512 -l 2000 -g 15 -f 25 -P
>
> on Altra 80 Core
>
> Support for 256 CPUs on stack. Baseline
>
> 7.8564 sec
>
> Support for 512 CUs on stack.
>
> 7.8713 sec + 0.18%
>
> 512 CPUS offstack
>
> 7.8916 sec + 0.44%
>
> Kernel size comparison:
>
> text data filename Difference to onstack256 baseline
> 25755648 9589248 vmlinuz-6.8.0-rc4-onstack256
> 25755648 9607680 vmlinuz-6.8.0-rc4-onstack512 +0.19%
> 25755648 9603584 vmlinuz-6.8.0-rc4-offstack512 +0.14%
Thanks for this data; I think that's a strong justification that this isn't
likely to cause a big problem for us, and so I thing it makes sense to go with
this.
I have two minor comments below.
> Tested-by: Eric Mackay <[email protected]>
> Reviewed-by: Russell King (Oracle) <[email protected]>
> Signed-off-by: Christoph Lameter (Ampere) <[email protected]>
> ---
>
>
> Original post: https://www.spinics.net/lists/linux-mm/msg369701.html
> V2: https://lkml.org/lkml/2024/2/7/505
>
>
> V1->V2
>
> - Keep quotation marks
> - Remove whiltespace damage
> - Add tested by
>
> V2->V3:
> - Add test results
> - Rework descriptions
>
>
> arch/arm64/Kconfig | 16 +++++++++++++++-
> 1 file changed, 15 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index aa7c1d435139..4e767dede47d 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -1427,7 +1427,21 @@ config SCHED_SMT
> config NR_CPUS
> int "Maximum number of CPUs (2-4096)"
> range 2 4096
> - default "256"
> + default "512"
> +
> +#
> +# Determines the placement of cpumasks.
> +#
> +# With CPUMASK_OFFSTACK the cpumasks are dynamically allocated.
> +# Useful for machines with lots of core because it avoids increasing
> +# the size of many of the data structures in the kernel.
> +#
> +# If this is off then the cpumasks have a static sizes and are
> +# embedded within data structures.
> +#
> + config CPUMASK_OFFSTACK
> + def_bool y
> + depends on NR_CPUS > 256
As before, can we please delete the comment? That's the general semantic of
CPUMASK_OFFSTACK, not why we're selecting it.
That aside, this config option is defined in lib/Kconfig, so we should select
it rather than redefining it. i.e. this should be:
select CPUMASK_OFFSTACK if NR_CPUS > 256
Sorry for not spotting that before.
With those changes:
Acked-by: Mark Rutland <[email protected]>
Catalin, are you happy to fix that up when applying?
Mark.
>
> config HOTPLUG_CPU
> bool "Support for hot-pluggable CPUs"
> --
> 2.39.2
>
On Wed, 06 Mar 2024 17:45:04 -0800, Christoph Lameter (Ampere) wrote:
> Currently defconfig selects NR_CPUS=256, but some vendors (e.g. Ampere
> Computing) are planning to ship systems with 512 CPUs. So that all CPUs on
> these systems can be used with defconfig, we'd like to bump NR_CPUS to 512.
> Therefore this patch increases the default NR_CPUS from 256 to 512.
>
> As increasing NR_CPUS will increase the size of cpumasks, there's a fear that
> this might have a significant impact on stack usage due to code which places
> cpumasks on the stack. To mitigate that concern, we can select
> CPUMASK_OFFSTACK. As that doesn't seem to be a problem today with
> NR_CPUS=256, we only select this when NR_CPUS > 256.
>
> [...]
Applied to arm64 (for-next/misc), thanks!
I dropped the config entry and comment, replaced it with a select as per
Mark's suggestion.
[1/1] ARM64: Dynamically allocate cpumasks and increase supported CPUs to 512
https://git.kernel.org/arm64/c/0499a78369ad
--
Catalin
Dear All,
On 07.03.2024 02:45, Christoph Lameter (Ampere) wrote:
> Currently defconfig selects NR_CPUS=256, but some vendors (e.g. Ampere
> Computing) are planning to ship systems with 512 CPUs. So that all
> CPUs on
> these systems can be used with defconfig, we'd like to bump NR_CPUS to
> 512.
> Therefore this patch increases the default NR_CPUS from 256 to 512.
>
> As increasing NR_CPUS will increase the size of cpumasks, there's a
> fear that
> this might have a significant impact on stack usage due to code which
> places
> cpumasks on the stack. To mitigate that concern, we can select
> CPUMASK_OFFSTACK. As that doesn't seem to be a problem today with
> NR_CPUS=256, we only select this when NR_CPUS > 256.
>
> CPUMASK_OFFSTACK configures the cpumasks in the kernel to be
> dynamically allocated. This was used in the X86 architecture in the
> past to enable support for larger CPU configurations up to 8k cpus.
>
> With that is becomes possible to dynamically size the allocation of
> the cpu bitmaps depending on the quantity of processors detected on
> bootup. Memory used for cpumasks will increase if the kernel is
> run on a machine with more cores.
>
> Further increases may be needed if ARM processor vendors start
> supporting more processors. Given the current inflationary trends
> in core counts from multiple processor manufacturers this may occur.
>
> There are minor regressions for hackbench. The kernel data size
> for 512 cpus is smaller with offstack than with onstack.
>
> Benchmark results using hackbench average over 10 runs of
>
> hackbench -s 512 -l 2000 -g 15 -f 25 -P
>
> on Altra 80 Core
>
> Support for 256 CPUs on stack. Baseline
>
> 7.8564 sec
>
> Support for 512 CUs on stack.
>
> 7.8713 sec + 0.18%
>
> 512 CPUS offstack
>
> 7.8916 sec + 0.44%
>
> Kernel size comparison:
>
> text data filename Difference to
> onstack256 baseline
> 25755648 9589248 vmlinuz-6.8.0-rc4-onstack256
> 25755648 9607680 vmlinuz-6.8.0-rc4-onstack512 +0.19%
> 25755648 9603584 vmlinuz-6.8.0-rc4-offstack512 +0.14%
>
> Tested-by: Eric Mackay <[email protected]>
> Reviewed-by: Russell King (Oracle) <[email protected]>
> Signed-off-by: Christoph Lameter (Ampere) <[email protected]>
This patch landed in today's linux-next as commit 0499a78369ad ("ARM64:
Dynamically allocate cpumasks and increase supported CPUs to 512").
Unfortunately it triggers the following warning during boot on most of
my ARM64-based test boards. Here is an example from Odroid-N2 board:
------------[ cut here ]------------
WARNING: CPU: 4 PID: 63 at drivers/opp/core.c:2554
dev_pm_opp_set_config+0x390/0x710
Modules linked in: dw_hdmi_i2s_audio meson_gxl smsc onboard_usb_hub(+)
rtc_pcf8563 panfrost snd_soc_meson_axg_sound_card drm_shmem_helper
crct10dif_ce dwmac_generic snd_soc_meson_card_utils gpu_sched
snd_soc_meson_g12a_toacodec snd_soc_meson_g12a_tohdmitx rc_odroid
snd_soc_meson_codec_glue pwm_meson ao_cec_g12a meson_gxbb_wdt meson_ir
snd_soc_meson_axg_frddr snd_soc_meson_axg_toddr snd_soc_meson_axg_tdmin
meson_vdec(C) v4l2_mem2mem videobuf2_dma_contig snd_soc_meson_axg_tdmout
videobuf2_memops axg_audio videobuf2_v4l2 sclk_div videodev
reset_meson_audio_arb snd_soc_meson_axg_fifo clk_phase dwmac_meson8b
stmmac_platform videobuf2_common mdio_mux_meson_g12a meson_drm
meson_dw_hdmi rtc_meson_vrtc stmmac meson_ddr_pmu_g12 mc dw_hdmi
drm_display_helper pcs_xpcs snd_soc_meson_t9015 meson_canvas gpio_fan
display_connector snd_soc_meson_axg_tdm_interface
snd_soc_simple_amplifier snd_soc_meson_axg_tdm_formatter nvmem_meson_efuse
hub 1-1:1.0: USB hub found
CPU: 4 PID: 63 Comm: kworker/u12:5 Tainted: G C 6.8.0-rc3+ #14677
Hardware name: Hardkernel ODROID-N2 (DT)
Workqueue: events_unbound deferred_probe_work_func
pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : dev_pm_opp_set_config+0x390/0x710
lr : dev_pm_opp_set_config+0x5c/0x710
...
Call trace:
dev_pm_opp_set_config+0x390/0x710
dt_cpufreq_probe+0x268/0x480
platform_probe+0x68/0xd8
really_probe+0x148/0x2b4
__driver_probe_device+0x78/0x12c
driver_probe_device+0xdc/0x164
__device_attach_driver+0xb8/0x138
bus_for_each_drv+0x84/0xe0
__device_attach+0xa8/0x1b0
device_initial_probe+0x14/0x20
bus_probe_device+0xb0/0xb4
deferred_probe_work_func+0x8c/0xc8
process_one_work+0x1ec/0x53c
worker_thread+0x298/0x408
kthread+0x124/0x128
ret_from_fork+0x10/0x20
irq event stamp: 317178
hardirqs last enabled at (317177): [<ffff8000801788d4>]
ktime_get_coarse_real_ts64+0x10c/0x110
hardirqs last disabled at (317178): [<ffff800081222030>] el1_dbg+0x24/0x8c
softirqs last enabled at (315802): [<ffff800080010a60>]
__do_softirq+0x4a0/0x4e8
softirqs last disabled at (315793): [<ffff8000800169b0>]
____do_softirq+0x10/0x1c
---[ end trace 0000000000000000 ]---
cpu cpu2: error -EBUSY: failed to set regulators
cpufreq-dt: probe of cpufreq-dt failed with error -16
It looks that cpufreq-dt and/or opp drivers needs some adjustments
related with this change.
> ---
>
>
> Original post: https://www.spinics.net/lists/linux-mm/msg369701.html
> V2: https://lkml.org/lkml/2024/2/7/505
>
>
> V1->V2
>
> - Keep quotation marks
> - Remove whiltespace damage
> - Add tested by
>
> V2->V3:
> - Add test results
> - Rework descriptions
>
>
> arch/arm64/Kconfig | 16 +++++++++++++++-
> 1 file changed, 15 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index aa7c1d435139..4e767dede47d 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -1427,7 +1427,21 @@ config SCHED_SMT
> config NR_CPUS
> int "Maximum number of CPUs (2-4096)"
> range 2 4096
> - default "256"
> + default "512"
> +
> +#
> +# Determines the placement of cpumasks.
> +#
> +# With CPUMASK_OFFSTACK the cpumasks are dynamically allocated.
> +# Useful for machines with lots of core because it avoids increasing
> +# the size of many of the data structures in the kernel.
> +#
> +# If this is off then the cpumasks have a static sizes and are
> +# embedded within data structures.
> +#
> + config CPUMASK_OFFSTACK
> + def_bool y
> + depends on NR_CPUS > 256
>
> config HOTPLUG_CPU
> bool "Support for hot-pluggable CPUs"
Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland
On Fri, Mar 08, 2024 at 03:01:28PM +0100, Marek Szyprowski wrote:
> This patch landed in today's linux-next as commit 0499a78369ad ("ARM64:
> Dynamically allocate cpumasks and increase supported CPUs to 512").
> Unfortunately it triggers the following warning during boot on most of
> my ARM64-based test boards. Here is an example from Odroid-N2 board:
>
> ?------------[ cut here ]------------
> ?WARNING: CPU: 4 PID: 63 at drivers/opp/core.c:2554
> dev_pm_opp_set_config+0x390/0x710
> ?Modules linked in: dw_hdmi_i2s_audio meson_gxl smsc onboard_usb_hub(+)
> rtc_pcf8563 panfrost snd_soc_meson_axg_sound_card drm_shmem_helper
> crct10dif_ce dwmac_generic snd_soc_meson_card_utils gpu_sched
> snd_soc_meson_g12a_toacodec snd_soc_meson_g12a_tohdmitx rc_odroid
> snd_soc_meson_codec_glue pwm_meson ao_cec_g12a meson_gxbb_wdt meson_ir
> snd_soc_meson_axg_frddr snd_soc_meson_axg_toddr snd_soc_meson_axg_tdmin
> meson_vdec(C) v4l2_mem2mem videobuf2_dma_contig snd_soc_meson_axg_tdmout
> videobuf2_memops axg_audio videobuf2_v4l2 sclk_div videodev
> reset_meson_audio_arb snd_soc_meson_axg_fifo clk_phase dwmac_meson8b
> stmmac_platform videobuf2_common mdio_mux_meson_g12a meson_drm
> meson_dw_hdmi rtc_meson_vrtc stmmac meson_ddr_pmu_g12 mc dw_hdmi
> drm_display_helper pcs_xpcs snd_soc_meson_t9015 meson_canvas gpio_fan
> display_connector snd_soc_meson_axg_tdm_interface
> snd_soc_simple_amplifier snd_soc_meson_axg_tdm_formatter nvmem_meson_efuse
> ?hub 1-1:1.0: USB hub found
> ?CPU: 4 PID: 63 Comm: kworker/u12:5 Tainted: G???????? C 6.8.0-rc3+ #14677
> ?Hardware name: Hardkernel ODROID-N2 (DT)
> ?Workqueue: events_unbound deferred_probe_work_func
> ?pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> ?pc : dev_pm_opp_set_config+0x390/0x710
> ?lr : dev_pm_opp_set_config+0x5c/0x710
> ?...
> ?Call trace:
> ? dev_pm_opp_set_config+0x390/0x710
> ? dt_cpufreq_probe+0x268/0x480
> ? platform_probe+0x68/0xd8
> ? really_probe+0x148/0x2b4
> ? __driver_probe_device+0x78/0x12c
> ? driver_probe_device+0xdc/0x164
> ? __device_attach_driver+0xb8/0x138
> ? bus_for_each_drv+0x84/0xe0
> ? __device_attach+0xa8/0x1b0
> ? device_initial_probe+0x14/0x20
> ? bus_probe_device+0xb0/0xb4
> ? deferred_probe_work_func+0x8c/0xc8
> ? process_one_work+0x1ec/0x53c
> ? worker_thread+0x298/0x408
> ? kthread+0x124/0x128
> ? ret_from_fork+0x10/0x20
> ?irq event stamp: 317178
> ?hardirqs last? enabled at (317177): [<ffff8000801788d4>]
> ktime_get_coarse_real_ts64+0x10c/0x110
> ?hardirqs last disabled at (317178): [<ffff800081222030>] el1_dbg+0x24/0x8c
> ?softirqs last? enabled at (315802): [<ffff800080010a60>]
> __do_softirq+0x4a0/0x4e8
> ?softirqs last disabled at (315793): [<ffff8000800169b0>]
> ____do_softirq+0x10/0x1c
> ?---[ end trace 0000000000000000 ]---
> ?cpu cpu2: error -EBUSY: failed to set regulators
> ?cpufreq-dt: probe of cpufreq-dt failed with error -16
>
> It looks that cpufreq-dt and/or opp drivers needs some adjustments
> related with this change.
That's strange. Is this with defconfig? I wonder whether NR_CPUS being
larger caused the issue with this specific code. Otherwise
CPUMASK_OFFSTACK may not work that well on arm64.
>
>
> > ---
> >
> >
> > Original post: https://www.spinics.net/lists/linux-mm/msg369701.html
> > V2: https://lkml.org/lkml/2024/2/7/505
> >
> >
> > V1->V2
> >
> > - Keep quotation marks
> > - Remove whiltespace damage
> > - Add tested by
> >
> > V2->V3:
> > - Add test results
> > - Rework descriptions
> >
> >
> > ?arch/arm64/Kconfig | 16 +++++++++++++++-
> > ?1 file changed, 15 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> > index aa7c1d435139..4e767dede47d 100644
> > --- a/arch/arm64/Kconfig
> > +++ b/arch/arm64/Kconfig
> > @@ -1427,7 +1427,21 @@ config SCHED_SMT
> > ?config NR_CPUS
> > ???? int "Maximum number of CPUs (2-4096)"
> > ???? range 2 4096
> > -??? default "256"
> > +??? default "512"
> > +
> > +#
> > +# Determines the placement of cpumasks.
> > +#
> > +# With CPUMASK_OFFSTACK the cpumasks are dynamically allocated.
> > +# Useful for machines with lots of core because it avoids increasing
> > +# the size of many of the data structures in the kernel.
> > +#
> > +# If this is off then the cpumasks have a static sizes and are
> > +# embedded within data structures.
> > +#
> > +??? config CPUMASK_OFFSTACK
> > +??? def_bool y
> > +??? depends on NR_CPUS > 256
> >
> > ?config HOTPLUG_CPU
> > ???? bool "Support for hot-pluggable CPUs"
>
> Best regards
> --
> Marek Szyprowski, PhD
> Samsung R&D Institute Poland
>
--
Catalin
On 08.03.2024 15:51, Catalin Marinas wrote:
> On Fri, Mar 08, 2024 at 03:01:28PM +0100, Marek Szyprowski wrote:
>> This patch landed in today's linux-next as commit 0499a78369ad ("ARM64:
>> Dynamically allocate cpumasks and increase supported CPUs to 512").
>> Unfortunately it triggers the following warning during boot on most of
>> my ARM64-based test boards. Here is an example from Odroid-N2 board:
>>
>> ------------[ cut here ]------------
>> WARNING: CPU: 4 PID: 63 at drivers/opp/core.c:2554
>> dev_pm_opp_set_config+0x390/0x710
>> Modules linked in: dw_hdmi_i2s_audio meson_gxl smsc onboard_usb_hub(+)
>> rtc_pcf8563 panfrost snd_soc_meson_axg_sound_card drm_shmem_helper
>> crct10dif_ce dwmac_generic snd_soc_meson_card_utils gpu_sched
>> snd_soc_meson_g12a_toacodec snd_soc_meson_g12a_tohdmitx rc_odroid
>> snd_soc_meson_codec_glue pwm_meson ao_cec_g12a meson_gxbb_wdt meson_ir
>> snd_soc_meson_axg_frddr snd_soc_meson_axg_toddr snd_soc_meson_axg_tdmin
>> meson_vdec(C) v4l2_mem2mem videobuf2_dma_contig snd_soc_meson_axg_tdmout
>> videobuf2_memops axg_audio videobuf2_v4l2 sclk_div videodev
>> reset_meson_audio_arb snd_soc_meson_axg_fifo clk_phase dwmac_meson8b
>> stmmac_platform videobuf2_common mdio_mux_meson_g12a meson_drm
>> meson_dw_hdmi rtc_meson_vrtc stmmac meson_ddr_pmu_g12 mc dw_hdmi
>> drm_display_helper pcs_xpcs snd_soc_meson_t9015 meson_canvas gpio_fan
>> display_connector snd_soc_meson_axg_tdm_interface
>> snd_soc_simple_amplifier snd_soc_meson_axg_tdm_formatter nvmem_meson_efuse
>> hub 1-1:1.0: USB hub found
>> CPU: 4 PID: 63 Comm: kworker/u12:5 Tainted: G C 6.8.0-rc3+ #14677
>> Hardware name: Hardkernel ODROID-N2 (DT)
>> Workqueue: events_unbound deferred_probe_work_func
>> pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>> pc : dev_pm_opp_set_config+0x390/0x710
>> lr : dev_pm_opp_set_config+0x5c/0x710
>> ...
>> Call trace:
>> dev_pm_opp_set_config+0x390/0x710
>> dt_cpufreq_probe+0x268/0x480
>> platform_probe+0x68/0xd8
>> really_probe+0x148/0x2b4
>> __driver_probe_device+0x78/0x12c
>> driver_probe_device+0xdc/0x164
>> __device_attach_driver+0xb8/0x138
>> bus_for_each_drv+0x84/0xe0
>> __device_attach+0xa8/0x1b0
>> device_initial_probe+0x14/0x20
>> bus_probe_device+0xb0/0xb4
>> deferred_probe_work_func+0x8c/0xc8
>> process_one_work+0x1ec/0x53c
>> worker_thread+0x298/0x408
>> kthread+0x124/0x128
>> ret_from_fork+0x10/0x20
>> irq event stamp: 317178
>> hardirqs last enabled at (317177): [<ffff8000801788d4>]
>> ktime_get_coarse_real_ts64+0x10c/0x110
>> hardirqs last disabled at (317178): [<ffff800081222030>] el1_dbg+0x24/0x8c
>> softirqs last enabled at (315802): [<ffff800080010a60>]
>> __do_softirq+0x4a0/0x4e8
>> softirqs last disabled at (315793): [<ffff8000800169b0>]
>> ____do_softirq+0x10/0x1c
>> ---[ end trace 0000000000000000 ]---
>> cpu cpu2: error -EBUSY: failed to set regulators
>> cpufreq-dt: probe of cpufreq-dt failed with error -16
>>
>> It looks that cpufreq-dt and/or opp drivers needs some adjustments
>> related with this change.
> That's strange. Is this with defconfig? I wonder whether NR_CPUS being
> larger caused the issue with this specific code. Otherwise
> CPUMASK_OFFSTACK may not work that well on arm64.
I've used defconfig with some debug options enabled and some drivers
compiled-in:
make ARCH=arm64 defconfig
/scripts/config -e BLK_DEV_RAM --set-val BLK_DEV_RAM_COUNT 4 --set-val
BLK_DEV_RAM_SIZE 81920 --set-val CMA_SIZE_MBYTES 96 -e PROVE_LOCKING -e
DEBUG_ATOMIC_SLEEP -e STAGING -e I2C_GPIO -e PM_DEBUG -e
PM_ADVANCED_DEBUG -e USB_GADGET -e USB_ETH -e CONFIG_DEVFREQ_THERMAL -e
CONFIG_BRCMFMAC_PCIE -e CONFIG_NFC -d ARCH_SUNXI -d ARCH_ALPINE -d
DRM_NOUVEAU -d ARCH_BCM_IPROC -d ARCH_BERLIN -d ARCH_BRCMSTB -d
ARCH_LAYERSCAPE -d ARCH_LG1K -d ARCH_HISI -d ARCH_MEDIATEK -d ARCH_MVEBU
-d ARCH_SEATTLE -d ARCH_SYNQUACER -d ARCH_RENESAS -d ARCH_STRATIX10 -d
ARCH_TEGRA -d ARCH_SPRD -d ARCH_THUNDER -d ARCH_THUNDER2 -d
ARCH_UNIPHIER -d ARCH_XGENE -d ARCH_ZX -d ARCH_ZYNQMP -d HIBERNATION -d
CLK_SUNXI -d CONFIG_EFI -d CONFIG_TEE -e FW_CFG_SYSFS
Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland
On Fri, 8 Mar 2024, Marek Szyprowski wrote:
>>>
>>> It looks that cpufreq-dt and/or opp drivers needs some adjustments
>>> related with this change.
>> That's strange. Is this with defconfig? I wonder whether NR_CPUS being
>> larger caused the issue with this specific code. Otherwise
>> CPUMASK_OFFSTACK may not work that well on arm64.
cpumask handling must use the accessor functions provided in
include/linux/cpumask.h for declaring and accessing cpumasks. It is likely
related to the driver opencoding one of the accessors.
I.e. you must use alloc_cpumask_var() and not allocate yourself on the
stack.
On Fri, Mar 08, 2024 at 09:08:59AM -0800, Christoph Lameter (Ampere) wrote:
> On Fri, 8 Mar 2024, Marek Szyprowski wrote:
>
> > > >
> > > > It looks that cpufreq-dt and/or opp drivers needs some adjustments
> > > > related with this change.
> > > That's strange. Is this with defconfig? I wonder whether NR_CPUS being
> > > larger caused the issue with this specific code. Otherwise
> > > CPUMASK_OFFSTACK may not work that well on arm64.
>
> cpumask handling must use the accessor functions provided in
> include/linux/cpumask.h for declaring and accessing cpumasks. It is likely
> related to the driver opencoding one of the accessors.
I took a look at both the OPP code and the cpufreq-dt code and it looks like
those are doign the right thing w.r.t. cpumask manipulation (i.e. they only use
the cpumask accessors, and use the cpumask_var_*() functions to dynamically
allocate/free cpumasks). Maybe I've missed something, but superficially those
look right.
Marek, can you try reverting this commit and trying defconfig + NR_CPUS=512?
That'll have CPUMASK_OFFSTACK=n, and:
* If that blows up, we know the problem is independent of CPUMASK_OFFSTACK, and
has something to do with large cpumasks (either a driver bug, or elsewhere).
* If that doesn't blow up, it suggests the problem is related to
CPUMASK_OFFSTACK rather than with large cpumasks specifically.
Either way, we probably need to revert this patch for now, as this won't have
enough time to soak in linux-next in time for v6.9.
Mark.
On 11.03.2024 13:12, Mark Rutland wrote:
> On Fri, Mar 08, 2024 at 09:08:59AM -0800, Christoph Lameter (Ampere) wrote:
>> On Fri, 8 Mar 2024, Marek Szyprowski wrote:
>>>>> It looks that cpufreq-dt and/or opp drivers needs some adjustments
>>>>> related with this change.
>>>> That's strange. Is this with defconfig? I wonder whether NR_CPUS being
>>>> larger caused the issue with this specific code. Otherwise
>>>> CPUMASK_OFFSTACK may not work that well on arm64.
>> cpumask handling must use the accessor functions provided in
>> include/linux/cpumask.h for declaring and accessing cpumasks. It is likely
>> related to the driver opencoding one of the accessors.
> I took a look at both the OPP code and the cpufreq-dt code and it looks like
> those are doign the right thing w.r.t. cpumask manipulation (i.e. they only use
> the cpumask accessors, and use the cpumask_var_*() functions to dynamically
> allocate/free cpumasks). Maybe I've missed something, but superficially those
> look right.
>
> Marek, can you try reverting this commit and trying defconfig + NR_CPUS=512?
Yes, with $subject reverted and CONFIG_NR_CPUS=512 everything works
fine, so it must be something else broken.
> That'll have CPUMASK_OFFSTACK=n, and:
>
> * If that blows up, we know the problem is independent of CPUMASK_OFFSTACK, and
> has something to do with large cpumasks (either a driver bug, or elsewhere).
>
> * If that doesn't blow up, it suggests the problem is related to
> CPUMASK_OFFSTACK rather than with large cpumasks specifically.
>
> Either way, we probably need to revert this patch for now, as this won't have
> enough time to soak in linux-next in time for v6.9.
Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland
On Mon, Mar 11, 2024 at 03:56:37PM +0100, Marek Szyprowski wrote:
> On 11.03.2024 13:12, Mark Rutland wrote:
> > On Fri, Mar 08, 2024 at 09:08:59AM -0800, Christoph Lameter (Ampere) wrote:
> >> On Fri, 8 Mar 2024, Marek Szyprowski wrote:
> >>>>> It looks that cpufreq-dt and/or opp drivers needs some adjustments
> >>>>> related with this change.
> >>>> That's strange. Is this with defconfig? I wonder whether NR_CPUS being
> >>>> larger caused the issue with this specific code. Otherwise
> >>>> CPUMASK_OFFSTACK may not work that well on arm64.
> >> cpumask handling must use the accessor functions provided in
> >> include/linux/cpumask.h for declaring and accessing cpumasks. It is likely
> >> related to the driver opencoding one of the accessors.
> > I took a look at both the OPP code and the cpufreq-dt code and it looks like
> > those are doign the right thing w.r.t. cpumask manipulation (i.e. they only use
> > the cpumask accessors, and use the cpumask_var_*() functions to dynamically
> > allocate/free cpumasks). Maybe I've missed something, but superficially those
> > look right.
> >
> > Marek, can you try reverting this commit and trying defconfig + NR_CPUS=512?
>
> Yes, with $subject reverted and CONFIG_NR_CPUS=512 everything works
> fine, so it must be something else broken.
Thanks for confirming. Would you mind testing the problematic commit
with CONFIG_DEBUG_PER_CPU_MAPS enabled? If it doesn't show anything
obvious that can be fixed quickly, I'll revert the commit and queue it
again after -rc1 for 6.10 (I haven't sent 6.9 the pull request yet).
--
Catalin
Hi Catalin,
On 11.03.2024 16:22, Catalin Marinas wrote:
> On Mon, Mar 11, 2024 at 03:56:37PM +0100, Marek Szyprowski wrote:
>> On 11.03.2024 13:12, Mark Rutland wrote:
>>> On Fri, Mar 08, 2024 at 09:08:59AM -0800, Christoph Lameter (Ampere) wrote:
>>>> On Fri, 8 Mar 2024, Marek Szyprowski wrote:
>>>>>>> It looks that cpufreq-dt and/or opp drivers needs some adjustments
>>>>>>> related with this change.
>>>>>> That's strange. Is this with defconfig? I wonder whether NR_CPUS being
>>>>>> larger caused the issue with this specific code. Otherwise
>>>>>> CPUMASK_OFFSTACK may not work that well on arm64.
>>>> cpumask handling must use the accessor functions provided in
>>>> include/linux/cpumask.h for declaring and accessing cpumasks. It is likely
>>>> related to the driver opencoding one of the accessors.
>>> I took a look at both the OPP code and the cpufreq-dt code and it looks like
>>> those are doign the right thing w.r.t. cpumask manipulation (i.e. they only use
>>> the cpumask accessors, and use the cpumask_var_*() functions to dynamically
>>> allocate/free cpumasks). Maybe I've missed something, but superficially those
>>> look right.
>>>
>>> Marek, can you try reverting this commit and trying defconfig + NR_CPUS=512?
>> Yes, with $subject reverted and CONFIG_NR_CPUS=512 everything works
>> fine, so it must be something else broken.
> Thanks for confirming. Would you mind testing the problematic commit
> with CONFIG_DEBUG_PER_CPU_MAPS enabled? If it doesn't show anything
> obvious that can be fixed quickly, I'll revert the commit and queue it
> again after -rc1 for 6.10 (I haven't sent 6.9 the pull request yet).
I've enabled this option, but unfortunately it didn't reveal anything
more besides the warning and error I've posted in my initial report. I
will try to analyze this issue further, but I won't manage to do this today.
Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland
On Mon, Mar 11, 2024 at 05:51:04PM +0100, Marek Szyprowski wrote:
> On 11.03.2024 16:22, Catalin Marinas wrote:
> > On Mon, Mar 11, 2024 at 03:56:37PM +0100, Marek Szyprowski wrote:
> >> On 11.03.2024 13:12, Mark Rutland wrote:
> >>> On Fri, Mar 08, 2024 at 09:08:59AM -0800, Christoph Lameter (Ampere) wrote:
> >>>> On Fri, 8 Mar 2024, Marek Szyprowski wrote:
> >>>>>>> It looks that cpufreq-dt and/or opp drivers needs some adjustments
> >>>>>>> related with this change.
> >>>>>> That's strange. Is this with defconfig? I wonder whether NR_CPUS being
> >>>>>> larger caused the issue with this specific code. Otherwise
> >>>>>> CPUMASK_OFFSTACK may not work that well on arm64.
> >>>> cpumask handling must use the accessor functions provided in
> >>>> include/linux/cpumask.h for declaring and accessing cpumasks. It is likely
> >>>> related to the driver opencoding one of the accessors.
> >>> I took a look at both the OPP code and the cpufreq-dt code and it looks like
> >>> those are doign the right thing w.r.t. cpumask manipulation (i.e. they only use
> >>> the cpumask accessors, and use the cpumask_var_*() functions to dynamically
> >>> allocate/free cpumasks). Maybe I've missed something, but superficially those
> >>> look right.
> >>>
> >>> Marek, can you try reverting this commit and trying defconfig + NR_CPUS=512?
> >> Yes, with $subject reverted and CONFIG_NR_CPUS=512 everything works
> >> fine, so it must be something else broken.
> > Thanks for confirming. Would you mind testing the problematic commit
> > with CONFIG_DEBUG_PER_CPU_MAPS enabled? If it doesn't show anything
> > obvious that can be fixed quickly, I'll revert the commit and queue it
> > again after -rc1 for 6.10 (I haven't sent 6.9 the pull request yet).
>
> I've enabled this option, but unfortunately it didn't reveal anything
> more besides the warning and error I've posted in my initial report. I
> will try to analyze this issue further, but I won't manage to do this today.
No worries, thanks for giving this a try.
--
Catalin
On Fri, Mar 08, 2024 at 03:01:28PM +0100, Marek Szyprowski wrote:
> On 07.03.2024 02:45, Christoph Lameter (Ampere) wrote:
> > Currently defconfig selects NR_CPUS=256, but some vendors (e.g. Ampere
> > Computing) are planning to ship systems with 512 CPUs. So that all CPUs on
> > these systems can be used with defconfig, we'd like to bump NR_CPUS to 512.
> > Therefore this patch increases the default NR_CPUS from 256 to 512.
> >
> > As increasing NR_CPUS will increase the size of cpumasks, there's a fear that
> > this might have a significant impact on stack usage due to code which places
> > cpumasks on the stack. To mitigate that concern, we can select
> > CPUMASK_OFFSTACK. As that doesn't seem to be a problem today with
> > NR_CPUS=256, we only select this when NR_CPUS > 256.
> >
> > CPUMASK_OFFSTACK configures the cpumasks in the kernel to be
> > dynamically allocated. This was used in the X86 architecture in the
> > past to enable support for larger CPU configurations up to 8k cpus.
[...]
> This patch landed in today's linux-next as commit 0499a78369ad ("ARM64:
> Dynamically allocate cpumasks and increase supported CPUs to 512").
> Unfortunately it triggers the following warning during boot on most of
> my ARM64-based test boards. Here is an example from Odroid-N2 board:
I spent a big part of this afternoon going through the code paths but
there's nothing obvious that triggered this problem. My suspicion is
some memory corruption, algorithmically I can't see anything that could
go wrong with CPUMASK_OFFSTACK. Unfortunately I could not reproduce it
yet to be able to add some debug info.
So I decided to revert this patch. If we get to the bottom of it during
the merging window, I can still revive it. Otherwise we'll add it to
linux-next post -rc1.
Thanks for reporting it and subsequent debugging.
--
Catalin
On Tue, Mar 12, 2024 at 10:06:06AM -0700, Christoph Lameter (Ampere) wrote:
> On Mon, 11 Mar 2024, Christoph Lameter (Ampere) wrote:
>
> > This could be an issue in the ARM64 arch code itself where there maybe
> > an assumption elsewhere that a cpumask can always store up to NR_CPU
> > cpus and not only nr_cpu_ids as OFFSTACK does.
> >
> > How can I exercise the opp driver in order to recreate the problem?
> >
> > I assume the opp driver is ARM specific? x86 defaults to OFFSTACK so if
> > there is an issue with OFFSTACK in opp then it should fail with kernel
> > default configuration on that platform.
>
> I checked the ARM64 arch sources use of NR_CPUS and its all fine.
>
> Also verified in my testing logs that CONFIG_PM_OPP was set in all tests.
>
> No warnings in the kernel log during those tests.
>
> How to reproduce this?
I guess you need a platform with a dts that has an "operating-points-v2"
property. I don't have any around.
Sudeep was trying to trigger this code path earlier, not sure where he
got to.
--
Catalin
On Tue, Mar 12, 2024 at 05:55:49PM +0000, Catalin Marinas wrote:
> On Tue, Mar 12, 2024 at 10:06:06AM -0700, Christoph Lameter (Ampere) wrote:
> > On Mon, 11 Mar 2024, Christoph Lameter (Ampere) wrote:
> >
> > > This could be an issue in the ARM64 arch code itself where there maybe
> > > an assumption elsewhere that a cpumask can always store up to NR_CPU
> > > cpus and not only nr_cpu_ids as OFFSTACK does.
> > >
> > > How can I exercise the opp driver in order to recreate the problem?
> > >
> > > I assume the opp driver is ARM specific? x86 defaults to OFFSTACK so if
> > > there is an issue with OFFSTACK in opp then it should fail with kernel
> > > default configuration on that platform.
> >
> > I checked the ARM64 arch sources use of NR_CPUS and its all fine.
> >
> > Also verified in my testing logs that CONFIG_PM_OPP was set in all tests.
> >
> > No warnings in the kernel log during those tests.
> >
> > How to reproduce this?
>
> I guess you need a platform with a dts that has an "operating-points-v2"
> property. I don't have any around.
>
> Sudeep was trying to trigger this code path earlier, not sure where he
> got to.
I did try to trigger this on FVP by adding OPPs + some hacks to add dummy
clock provider to successfully probe this driver. I couldn't hit the issue
reported ????. It could be that with the hardware clock/regulator drivers, it
take a different path in OPP core.
--
Regards,
Sudeep
On 13.03.2024 15:35, Sudeep Holla wrote:
> On Tue, Mar 12, 2024 at 05:55:49PM +0000, Catalin Marinas wrote:
>> On Tue, Mar 12, 2024 at 10:06:06AM -0700, Christoph Lameter (Ampere) wrote:
>>> On Mon, 11 Mar 2024, Christoph Lameter (Ampere) wrote:
>>>
>>>> This could be an issue in the ARM64 arch code itself where there maybe
>>>> an assumption elsewhere that a cpumask can always store up to NR_CPU
>>>> cpus and not only nr_cpu_ids as OFFSTACK does.
>>>>
>>>> How can I exercise the opp driver in order to recreate the problem?
>>>>
>>>> I assume the opp driver is ARM specific? x86 defaults to OFFSTACK so if
>>>> there is an issue with OFFSTACK in opp then it should fail with kernel
>>>> default configuration on that platform.
>>> I checked the ARM64 arch sources use of NR_CPUS and its all fine.
>>>
>>> Also verified in my testing logs that CONFIG_PM_OPP was set in all tests.
>>>
>>> No warnings in the kernel log during those tests.
>>>
>>> How to reproduce this?
>> I guess you need a platform with a dts that has an "operating-points-v2"
>> property. I don't have any around.
>>
>> Sudeep was trying to trigger this code path earlier, not sure where he
>> got to.
> I did try to trigger this on FVP by adding OPPs + some hacks to add dummy
> clock provider to successfully probe this driver. I couldn't hit the issue
> reported ????. It could be that with the hardware clock/regulator drivers, it
> take a different path in OPP core.
I can fully reproduce this issue on Khadas VIM3 and Odroid-N2 boards.
Both Meson A311D SoC based.
Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland
On Wed, Mar 13, 2024 at 05:22:33PM +0100, Marek Szyprowski wrote:
> On 13.03.2024 15:35, Sudeep Holla wrote:
> > On Tue, Mar 12, 2024 at 05:55:49PM +0000, Catalin Marinas wrote:
> >> On Tue, Mar 12, 2024 at 10:06:06AM -0700, Christoph Lameter (Ampere) wrote:
> >>> On Mon, 11 Mar 2024, Christoph Lameter (Ampere) wrote:
> >>>
> >>>> This could be an issue in the ARM64 arch code itself where there maybe
> >>>> an assumption elsewhere that a cpumask can always store up to NR_CPU
> >>>> cpus and not only nr_cpu_ids as OFFSTACK does.
> >>>>
> >>>> How can I exercise the opp driver in order to recreate the problem?
> >>>>
> >>>> I assume the opp driver is ARM specific? x86 defaults to OFFSTACK so if
> >>>> there is an issue with OFFSTACK in opp then it should fail with kernel
> >>>> default configuration on that platform.
> >>> I checked the ARM64 arch sources use of NR_CPUS and its all fine.
> >>>
> >>> Also verified in my testing logs that CONFIG_PM_OPP was set in all tests.
> >>>
> >>> No warnings in the kernel log during those tests.
> >>>
> >>> How to reproduce this?
> >> I guess you need a platform with a dts that has an "operating-points-v2"
> >> property. I don't have any around.
> >>
> >> Sudeep was trying to trigger this code path earlier, not sure where he
> >> got to.
> > I did try to trigger this on FVP by adding OPPs + some hacks to add dummy
> > clock provider to successfully probe this driver. I couldn't hit the issue
> > reported ????. It could be that with the hardware clock/regulator drivers, it
> > take a different path in OPP core.
>
> I can fully reproduce this issue on Khadas VIM3 and Odroid-N2 boards.
> Both Meson A311D SoC based.
So, if I'm reading the OPP code and the DTS* files for Khadas VIM3
correctly, these use operating-points-v2, which is parsed by the opp
layer.
If the opp layer is unable to parse any operating points, it should
print "no supported OPPs" and remove the table (thereby preventing
the code in question being reached.)
So, I wonder whether what you're seeing is a latent bug which is
being tickled by the presence of the CPU masks being off-stack
changing the kernel timing.
I would suggest the printk debug approach may help here to see when
the OPPs are begun to be parsed, when they're created etc and their
timing relationship to being used. Given the suspicion, it's possible
that the mere addition of printk() may "fix" the problem, which again
would be another semi-useful data point.
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!
On 13.03.2024 17:39, Christoph Lameter (Ampere) wrote:
> On Wed, 13 Mar 2024, Marek Szyprowski wrote:
>
>>> I did try to trigger this on FVP by adding OPPs + some hacks to add
>>> dummy
>>> clock provider to successfully probe this driver. I couldn't hit the
>>> issue
>>> reported ????. It could be that with the hardware clock/regulator
>>> drivers, it
>>> take a different path in OPP core.
>>
>> I can fully reproduce this issue on Khadas VIM3 and Odroid-N2 boards.
>> Both Meson A311D SoC based.
>
> Hmm... Would it trigger on Orangepi5plus? With some effort I can get
> that board up at home.
>
> Could you reboot with some memory diagnostics so that we are sure that
> nothing gets corrupted?
>
> F.e. specify a command line parameter "slub_debug" to enable
> redzoning. That way we may see potential memory corruption.
I've added CONFIG_SLUB_DEBUG_ON=y to my .config, but I got no reports
for any potential memory corruption.
Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland
On Wed, Mar 13, 2024 at 05:13:33PM +0000, Russell King wrote:
> So, I wonder whether what you're seeing is a latent bug which is
> being tickled by the presence of the CPU masks being off-stack
> changing the kernel timing.
>
> I would suggest the printk debug approach may help here to see when
> the OPPs are begun to be parsed, when they're created etc and their
> timing relationship to being used. Given the suspicion, it's possible
> that the mere addition of printk() may "fix" the problem, which again
> would be another semi-useful data point.
It might be an init order problem. Passing "initcall_debug" on the
cmdline might help a bit.
It would also be useful in dev_pm_opp_set_config(), in the WARN_ON
block, to print opp_table->opp_list.next to get an idea whether it looks
like a valid pointer or memory corruption.
--
Catalin
Dear All,
On 14.03.2024 09:39, Catalin Marinas wrote:
> On Wed, Mar 13, 2024 at 05:13:33PM +0000, Russell King wrote:
>> So, I wonder whether what you're seeing is a latent bug which is
>> being tickled by the presence of the CPU masks being off-stack
>> changing the kernel timing.
>>
>> I would suggest the printk debug approach may help here to see when
>> the OPPs are begun to be parsed, when they're created etc and their
>> timing relationship to being used. Given the suspicion, it's possible
>> that the mere addition of printk() may "fix" the problem, which again
>> would be another semi-useful data point.
> It might be an init order problem. Passing "initcall_debug" on the
> cmdline might help a bit.
>
> It would also be useful in dev_pm_opp_set_config(), in the WARN_ON
> block, to print opp_table->opp_list.next to get an idea whether it looks
> like a valid pointer or memory corruption.
I've finally found some time to do the step-by-step printk-based
debugging of this issue and finally found what's broken!
Here is the fix:
diff --git a/drivers/cpufreq/cpufreq-dt.c b/drivers/cpufreq/cpufreq-dt.c
index 8bd6e5e8f121..2d83bbc65dd0 100644
--- a/drivers/cpufreq/cpufreq-dt.c
+++ b/drivers/cpufreq/cpufreq-dt.c
@@ -208,7 +208,7 @@ static int dt_cpufreq_early_init(struct device *dev,
int cpu)
if (!priv)
return -ENOMEM;
- if (!alloc_cpumask_var(&priv->cpus, GFP_KERNEL))
+ if (!zalloc_cpumask_var(&priv->cpus, GFP_KERNEL))
return -ENOMEM;
cpumask_set_cpu(cpu, priv->cpus);
It is really surprising that this didn't blow up for anyone else so
far... This means that the $subject patch is fine.
I will send a proper patch fixing this issue in a few minutes.
Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland
On Thu, Mar 14, 2024 at 01:28:40PM +0100, Marek Szyprowski wrote:
> Dear All,
>
> On 14.03.2024 09:39, Catalin Marinas wrote:
> > On Wed, Mar 13, 2024 at 05:13:33PM +0000, Russell King wrote:
> >> So, I wonder whether what you're seeing is a latent bug which is
> >> being tickled by the presence of the CPU masks being off-stack
> >> changing the kernel timing.
> >>
> >> I would suggest the printk debug approach may help here to see when
> >> the OPPs are begun to be parsed, when they're created etc and their
> >> timing relationship to being used. Given the suspicion, it's possible
> >> that the mere addition of printk() may "fix" the problem, which again
> >> would be another semi-useful data point.
> > It might be an init order problem. Passing "initcall_debug" on the
> > cmdline might help a bit.
> >
> > It would also be useful in dev_pm_opp_set_config(), in the WARN_ON
> > block, to print opp_table->opp_list.next to get an idea whether it looks
> > like a valid pointer or memory corruption.
>
> I've finally found some time to do the step-by-step printk-based
> debugging of this issue and finally found what's broken!
>
> Here is the fix:
>
> diff --git a/drivers/cpufreq/cpufreq-dt.c b/drivers/cpufreq/cpufreq-dt.c
> index 8bd6e5e8f121..2d83bbc65dd0 100644
> --- a/drivers/cpufreq/cpufreq-dt.c
> +++ b/drivers/cpufreq/cpufreq-dt.c
> @@ -208,7 +208,7 @@ static int dt_cpufreq_early_init(struct device *dev,
> int cpu)
> ??????? if (!priv)
> ??????????????? return -ENOMEM;
>
> -?????? if (!alloc_cpumask_var(&priv->cpus, GFP_KERNEL))
> +?????? if (!zalloc_cpumask_var(&priv->cpus, GFP_KERNEL))
> ??????????????? return -ENOMEM;
>
> ??????? cpumask_set_cpu(cpu, priv->cpus);
>
>
> It is really surprising that this didn't blow up for anyone else so
> far... This means that the $subject patch is fine.
Wow. I guess we've been lucky with that allocation hitting memory
containing zeros. Well done at tracking it down!
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!
On Thu, Mar 14, 2024 at 01:28:40PM +0100, Marek Szyprowski wrote:
> On 14.03.2024 09:39, Catalin Marinas wrote:
> > On Wed, Mar 13, 2024 at 05:13:33PM +0000, Russell King wrote:
> >> So, I wonder whether what you're seeing is a latent bug which is
> >> being tickled by the presence of the CPU masks being off-stack
> >> changing the kernel timing.
> >>
> >> I would suggest the printk debug approach may help here to see when
> >> the OPPs are begun to be parsed, when they're created etc and their
> >> timing relationship to being used. Given the suspicion, it's possible
> >> that the mere addition of printk() may "fix" the problem, which again
> >> would be another semi-useful data point.
> > It might be an init order problem. Passing "initcall_debug" on the
> > cmdline might help a bit.
> >
> > It would also be useful in dev_pm_opp_set_config(), in the WARN_ON
> > block, to print opp_table->opp_list.next to get an idea whether it looks
> > like a valid pointer or memory corruption.
>
> I've finally found some time to do the step-by-step printk-based
> debugging of this issue and finally found what's broken!
>
> Here is the fix:
>
> diff --git a/drivers/cpufreq/cpufreq-dt.c b/drivers/cpufreq/cpufreq-dt.c
> index 8bd6e5e8f121..2d83bbc65dd0 100644
> --- a/drivers/cpufreq/cpufreq-dt.c
> +++ b/drivers/cpufreq/cpufreq-dt.c
> @@ -208,7 +208,7 @@ static int dt_cpufreq_early_init(struct device *dev,
> int cpu)
> ??????? if (!priv)
> ??????????????? return -ENOMEM;
>
> -?????? if (!alloc_cpumask_var(&priv->cpus, GFP_KERNEL))
> +?????? if (!zalloc_cpumask_var(&priv->cpus, GFP_KERNEL))
> ??????????????? return -ENOMEM;
>
> ??????? cpumask_set_cpu(cpu, priv->cpus);
>
>
> It is really surprising that this didn't blow up for anyone else so
> far... This means that the $subject patch is fine.
>
> I will send a proper patch fixing this issue in a few minutes.
Nice. Many thanks for tracking this down. I'll revert the revert of the
CPUMASK_OFFSTACK in the second part of the merging window (I already
sent the pull request).
--
Catalin
On Thu, 14 Mar 2024, Russell King (Oracle) wrote:
>> It is really surprising that this didn't blow up for anyone else so
>> far... This means that the $subject patch is fine.
>
> Wow. I guess we've been lucky with that allocation hitting memory
> containing zeros. Well done at tracking it down!
It would have blown up with slub_debug because that includes poisoning the
contents of all allocations via the slab allocator. Why did that not
occur? We should have seen a backtrace with data in registers etc showing
poisoning values for an unitialized object.
Note that this was indeed triggered by OFFSTACK because
(z)alloc_cpumask_var() only generates a kmalloc allocation if that option
is set.
The config option was never set before my patch was applied on ARM64.
On Wed, 13 Mar 2024, Marek Szyprowski wrote:
>> I did try to trigger this on FVP by adding OPPs + some hacks to add dummy
>> clock provider to successfully probe this driver. I couldn't hit the issue
>> reported ????. It could be that with the hardware clock/regulator drivers, it
>> take a different path in OPP core.
>
> I can fully reproduce this issue on Khadas VIM3 and Odroid-N2 boards.
> Both Meson A311D SoC based.
Hmm... Would it trigger on Orangepi5plus? With some effort I can get that
board up at home.
Could you reboot with some memory diagnostics so that we are sure that
nothing gets corrupted?
F.e. specify a command line parameter "slub_debug" to enable redzoning.
That way we may see potential memory corruption.
On Mon, 11 Mar 2024, Christoph Lameter (Ampere) wrote:
> This could be an issue in the ARM64 arch code itself where there maybe an
> assumption elsewhere that a cpumask can always store up to NR_CPU cpus and
> not only nr_cpu_ids as OFFSTACK does.
>
> How can I exercise the opp driver in order to recreate the problem?
>
> I assume the opp driver is ARM specific? x86 defaults to OFFSTACK so if there
> is an issue with OFFSTACK in opp then it should fail with kernel default
> configuration on that platform.
I checked the ARM64 arch sources use of NR_CPUS and its all fine.
Also verified in my testing logs that CONFIG_PM_OPP was set in all tests.
No warnings in the kernel log during those tests.
How to reproduce this?
On Mon, 11 Mar 2024, Catalin Marinas wrote:
>> This patch landed in today's linux-next as commit 0499a78369ad ("ARM64:
>> Dynamically allocate cpumasks and increase supported CPUs to 512").
>> Unfortunately it triggers the following warning during boot on most of
>> my ARM64-based test boards. Here is an example from Odroid-N2 board:
>
> I spent a big part of this afternoon going through the code paths but
> there's nothing obvious that triggered this problem. My suspicion is
> some memory corruption, algorithmically I can't see anything that could
> go wrong with CPUMASK_OFFSTACK. Unfortunately I could not reproduce it
> yet to be able to add some debug info.
>
> So I decided to revert this patch. If we get to the bottom of it during
> the merging window, I can still revive it. Otherwise we'll add it to
> linux-next post -rc1.
I also looked through the opp source and I cannot find even anything that
even uses the functionality changed by the OFFSTACK option.
This could be an issue in the ARM64 arch code itself where there maybe
an assumption elsewhere that a cpumask can always store up to NR_CPU cpus
and not only nr_cpu_ids as OFFSTACK does.
How can I exercise the opp driver in order to recreate the problem?
I assume the opp driver is ARM specific? x86 defaults to OFFSTACK so if
there is an issue with OFFSTACK in opp then it should fail with kernel
default configuration on that platform.
On Thu, Mar 07, 2024 at 07:07:07PM +0000, Catalin Marinas wrote:
> On Wed, 06 Mar 2024 17:45:04 -0800, Christoph Lameter (Ampere) wrote:
> > Currently defconfig selects NR_CPUS=256, but some vendors (e.g. Ampere
> > Computing) are planning to ship systems with 512 CPUs. So that all CPUs on
> > these systems can be used with defconfig, we'd like to bump NR_CPUS to 512.
> > Therefore this patch increases the default NR_CPUS from 256 to 512.
> >
> > As increasing NR_CPUS will increase the size of cpumasks, there's a fear that
> > this might have a significant impact on stack usage due to code which places
> > cpumasks on the stack. To mitigate that concern, we can select
> > CPUMASK_OFFSTACK. As that doesn't seem to be a problem today with
> > NR_CPUS=256, we only select this when NR_CPUS > 256.
> >
> > [...]
>
> Applied to arm64 (for-next/misc), thanks!
>
> I dropped the config entry and comment, replaced it with a select as per
> Mark's suggestion.
>
> [1/1] ARM64: Dynamically allocate cpumasks and increase supported CPUs to 512
> https://git.kernel.org/arm64/c/0499a78369ad
I re-instated this patch in arm64 for-next/core as:
https://git.kernel.org/arm64/c/3fbd56f0e7c1
--
Catalin