2021-10-25 18:33:24

by Geert Uytterhoeven

Subject: Out-of-bounds access when hartid >= NR_CPUS

Hi all,

When booting a kernel with CONFIG_NR_CPUS=4 on Microchip PolarFire,
the 4th CPU either fails to come online, or the system crashes.

This happens because PolarFire has 5 CPU cores: hart 0 is an e51,
and harts 1-4 are u54s, with the latter becoming CPUs 0-3 in Linux:
- unused core has hartid 0 (sifive,e51),
- processor 0 has hartid 1 (sifive,u74-mc),
- processor 1 has hartid 2 (sifive,u74-mc),
- processor 2 has hartid 3 (sifive,u74-mc),
- processor 3 has hartid 4 (sifive,u74-mc).

I assume the same issue is present on the SiFive fu540 and fu740
SoCs, but I don't have access to these. The issue is not present
on StarFive JH7100, as processor 0 has hartid 1, and processor 1 has
hartid 0.

arch/riscv/kernel/cpu_ops.c has:

void *__cpu_up_stack_pointer[NR_CPUS] __section(".data");
void *__cpu_up_task_pointer[NR_CPUS] __section(".data");

void cpu_update_secondary_bootdata(unsigned int cpuid,
                                   struct task_struct *tidle)
{
        int hartid = cpuid_to_hartid_map(cpuid);

        /* Make sure tidle is updated */
        smp_mb();
        WRITE_ONCE(__cpu_up_stack_pointer[hartid],
                   task_stack_page(tidle) + THREAD_SIZE);
        WRITE_ONCE(__cpu_up_task_pointer[hartid], tidle);
}

The above two writes cause out-of-bounds accesses beyond
__cpu_up_{stack,task}_pointer[] if hartid >= CONFIG_NR_CPUS.

arch/riscv/kernel/smpboot.c:setup_smp(void) detects CPUs like this:

        for_each_of_cpu_node(dn) {
                hart = riscv_of_processor_hartid(dn);
                if (hart < 0)
                        continue;

                if (hart == cpuid_to_hartid_map(0)) {
                        BUG_ON(found_boot_cpu);
                        found_boot_cpu = 1;
                        early_map_cpu_to_node(0, of_node_to_nid(dn));
                        continue;
                }
                if (cpuid >= NR_CPUS) {
                        pr_warn("Invalid cpuid [%d] for hartid [%d]\n",
                                cpuid, hart);
                        break;
                }

                cpuid_to_hartid_map(cpuid) = hart;
                early_map_cpu_to_node(cpuid, of_node_to_nid(dn));
                cpuid++;
        }

So cpuid >= CONFIG_NR_CPUS (too many CPU cores) is already rejected.
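
To make the failure mode concrete, here is a tiny stand-alone model of
the resulting mapping (plain userspace C, not kernel code; the hartids
are taken from the DT layout above):

#include <stdio.h>

#define NR_CPUS 4

int main(void)
{
        /* From the DT above: Linux CPUs 0-3 are the u54s with hartids
         * 1-4; the e51 (hartid 0) never becomes a logical CPU. */
        const long cpuid_to_hartid[NR_CPUS] = { 1, 2, 3, 4 };
        int cpuid;

        for (cpuid = 0; cpuid < NR_CPUS; cpuid++) {
                long hartid = cpuid_to_hartid[cpuid];

                /* __cpu_up_{stack,task}_pointer[] have NR_CPUS entries
                 * but are indexed by hartid, so the last CPU lands
                 * past the end of the arrays. */
                printf("cpuid %d -> hartid %ld%s\n", cpuid, hartid,
                       hartid >= NR_CPUS ?
                       "  <-- overflows a [NR_CPUS] array" : "");
        }
        return 0;
}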

How to fix this?

We could skip hartids >= NR_CPUS, but that feels strange to me, as
you need NR_CPUS to be larger (much larger if the first usable hartid
is a large number) than the number of CPUs used.

We could store the minimum hartid, and always subtract that when
accessing __cpu_up_{stack,task}_pointer[] (also in
arch/riscv/kernel/head.S), but that means unused cores cannot be in the
middle of the hartid range.

Are hartids guaranteed to be contiguous? If not, we have no choice but
to index __cpu_up_{stack,task}_pointer[] by cpuid instead, which
needs a more expensive conversion in arch/riscv/kernel/head.S.
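
(Expressed in C for clarity, since the real thing would be assembly in
head.S, that conversion is basically a linear scan over the map; rough
sketch only:)

/* Sketch, not actual kernel code: a secondary hart only knows its own
 * mhartid, so indexing the boot arrays by cpuid means it first has to
 * find its cpuid by scanning cpuid_to_hartid_map(). */
static int hartid_to_cpuid(unsigned long hartid)
{
        int cpuid;

        for (cpuid = 0; cpuid < NR_CPUS; cpuid++)
                if (cpuid_to_hartid_map(cpuid) == hartid)
                        return cpuid;
        return -1;      /* unknown hart: leave it offline */
}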

Thanks for your comments!

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds


2021-10-26 07:20:24

by Geert Uytterhoeven

Subject: Re: Out-of-bounds access when hartid >= NR_CPUS

On Tue, Oct 26, 2021 at 2:37 AM Ron Economos <[email protected]> wrote:
> On 10/25/21 8:54 AM, Geert Uytterhoeven wrote:
> > When booting a kernel with CONFIG_NR_CPUS=4 on Microchip PolarFire,
> > the 4th CPU either fails to come online, or the system crashes.
> >
> > This happens because PolarFire has 5 CPU cores: hart 0 is an e51,
> > and harts 1-4 are u54s, with the latter becoming CPUs 0-3 in Linux:
> > - unused core has hartid 0 (sifive,e51),
> > - processor 0 has hartid 1 (sifive,u74-mc),
> > - processor 1 has hartid 2 (sifive,u74-mc),
> > - processor 2 has hartid 3 (sifive,u74-mc),
> > - processor 3 has hartid 4 (sifive,u74-mc).
> >
> > I assume the same issue is present on the SiFive fu540 and fu740
> > SoCs, but I don't have access to these. The issue is not present
> > on StarFive JH7100, as processor 0 has hartid 1, and processor 1 has
> > hartid 0.
> >
> > arch/riscv/kernel/cpu_ops.c has:
> >
> > void *__cpu_up_stack_pointer[NR_CPUS] __section(".data");
> > void *__cpu_up_task_pointer[NR_CPUS] __section(".data");
> >
> > void cpu_update_secondary_bootdata(unsigned int cpuid,
> > struct task_struct *tidle)
> > {
> > int hartid = cpuid_to_hartid_map(cpuid);
> >
> > /* Make sure tidle is updated */
> > smp_mb();
> > WRITE_ONCE(__cpu_up_stack_pointer[hartid],
> > task_stack_page(tidle) + THREAD_SIZE);
> > WRITE_ONCE(__cpu_up_task_pointer[hartid], tidle);
> >
> > The above two writes cause out-of-bound accesses beyond
> > __cpu_up_{stack,pointer}_pointer[] if hartid >= CONFIG_NR_CPUS.
> >
> > }
> >
> > arch/riscv/kernel/smpboot.c:setup_smp(void) detects CPUs like this:
> >
> > for_each_of_cpu_node(dn) {
> > hart = riscv_of_processor_hartid(dn);
> > if (hart < 0)
> > continue;
> >
> > if (hart == cpuid_to_hartid_map(0)) {
> > BUG_ON(found_boot_cpu);
> > found_boot_cpu = 1;
> > early_map_cpu_to_node(0, of_node_to_nid(dn));
> > continue;
> > }
> > if (cpuid >= NR_CPUS) {
> > pr_warn("Invalid cpuid [%d] for hartid [%d]\n",
> > cpuid, hart);
> > break;
> > }
> >
> > cpuid_to_hartid_map(cpuid) = hart;
> > early_map_cpu_to_node(cpuid, of_node_to_nid(dn));
> > cpuid++;
> > }
> >
> > So cpuid >= CONFIG_NR_CPUS (too many CPU cores) is already rejected.
> >
> > How to fix this?
> >
> > We could skip hartids >= NR_CPUS, but that feels strange to me, as
> > you need NR_CPUS to be larger (much larger if the first usable hartid
> > is a large number) than the number of CPUs used.
> The Ubuntu distro config for HiFive Unmatched set this to CONFIG_NR_CPUS=8.

I know. Same for most defconfigs in Linux. But we do not tend to
work around buffer overflows by changing config values. Besides,
those configs will still experience the issue when run on e.g. an
8+1 core processor where the cores used by Linux have hartids 1-8.

I noticed because I started with a starlight config with
CONFIG_NR_CPUS=2 (which gave me only one core), changed that to
CONFIG_NR_CPUS=4, and got a kernel that didn't boot at all (no output
without earlycon).

> > We could store the minimum hartid, and always subtract that when
> > accessing __cpu_up_{stack,pointer}_pointer[] (also in
> > arch/riscv/kernel/head.S), but that means unused cores cannot be in the
> > middle of the hartid range.
> >
> > Are hartids guaranteed to be continuous? If not, we have no choice but
> > to index __cpu_up_{stack,pointer}_pointer[] by cpuid instead, which
> > needs a more expensive conversion in arch/riscv/kernel/head.S.

https://riscv.org/wp-content/uploads/2017/05/riscv-privileged-v1.10.pdf
says:

Hart IDs might not necessarily be numbered contiguously in a
multiprocessor system, but at least one hart must have a hart
ID of zero.

Which means indexing arrays by hart ID is a no-go?

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2021-10-26 07:57:31

by Ron Economos

Subject: Re: Out-of-bounds access when hartid >= NR_CPUS

On 10/25/21 8:54 AM, Geert Uytterhoeven wrote:

> Hi all,
>
> When booting a kernel with CONFIG_NR_CPUS=4 on Microchip PolarFire,
> the 4th CPU either fails to come online, or the system crashes.
>
> This happens because PolarFire has 5 CPU cores: hart 0 is an e51,
> and harts 1-4 are u54s, with the latter becoming CPUs 0-3 in Linux:
> - unused core has hartid 0 (sifive,e51),
> - processor 0 has hartid 1 (sifive,u74-mc),
> - processor 1 has hartid 2 (sifive,u74-mc),
> - processor 2 has hartid 3 (sifive,u74-mc),
> - processor 3 has hartid 4 (sifive,u74-mc).
>
> I assume the same issue is present on the SiFive fu540 and fu740
> SoCs, but I don't have access to these. The issue is not present
> on StarFive JH7100, as processor 0 has hartid 1, and processor 1 has
> hartid 0.
>
> arch/riscv/kernel/cpu_ops.c has:
>
> void *__cpu_up_stack_pointer[NR_CPUS] __section(".data");
> void *__cpu_up_task_pointer[NR_CPUS] __section(".data");
>
> void cpu_update_secondary_bootdata(unsigned int cpuid,
> struct task_struct *tidle)
> {
> int hartid = cpuid_to_hartid_map(cpuid);
>
> /* Make sure tidle is updated */
> smp_mb();
> WRITE_ONCE(__cpu_up_stack_pointer[hartid],
> task_stack_page(tidle) + THREAD_SIZE);
> WRITE_ONCE(__cpu_up_task_pointer[hartid], tidle);
>
> The above two writes cause out-of-bound accesses beyond
> __cpu_up_{stack,pointer}_pointer[] if hartid >= CONFIG_NR_CPUS.
>
> }
>
> arch/riscv/kernel/smpboot.c:setup_smp(void) detects CPUs like this:
>
> for_each_of_cpu_node(dn) {
> hart = riscv_of_processor_hartid(dn);
> if (hart < 0)
> continue;
>
> if (hart == cpuid_to_hartid_map(0)) {
> BUG_ON(found_boot_cpu);
> found_boot_cpu = 1;
> early_map_cpu_to_node(0, of_node_to_nid(dn));
> continue;
> }
> if (cpuid >= NR_CPUS) {
> pr_warn("Invalid cpuid [%d] for hartid [%d]\n",
> cpuid, hart);
> break;
> }
>
> cpuid_to_hartid_map(cpuid) = hart;
> early_map_cpu_to_node(cpuid, of_node_to_nid(dn));
> cpuid++;
> }
>
> So cpuid >= CONFIG_NR_CPUS (too many CPU cores) is already rejected.
>
> How to fix this?
>
> We could skip hartids >= NR_CPUS, but that feels strange to me, as
> you need NR_CPUS to be larger (much larger if the first usable hartid
> is a large number) than the number of CPUs used.
The Ubuntu distro config for HiFive Unmatched set this to CONFIG_NR_CPUS=8.
>
> We could store the minimum hartid, and always subtract that when
> accessing __cpu_up_{stack,pointer}_pointer[] (also in
> arch/riscv/kernel/head.S), but that means unused cores cannot be in the
> middle of the hartid range.
>
> Are hartids guaranteed to be continuous? If not, we have no choice but
> to index __cpu_up_{stack,pointer}_pointer[] by cpuid instead, which
> needs a more expensive conversion in arch/riscv/kernel/head.S.
>
> Thanks for your comments!
>
> Gr{oetje,eeting}s,
>
> Geert
>
> --
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]
>
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like that.
> -- Linus Torvalds
>
> _______________________________________________
> linux-riscv mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/linux-riscv

2021-10-26 10:47:12

by Heiko Stuebner

Subject: Re: Out-of-bounds access when hartid >= NR_CPUS

Am Dienstag, 26. Oktober 2021, 08:44:31 CEST schrieb Geert Uytterhoeven:
> On Tue, Oct 26, 2021 at 2:37 AM Ron Economos <[email protected]> wrote:
> > On 10/25/21 8:54 AM, Geert Uytterhoeven wrote:
> > > When booting a kernel with CONFIG_NR_CPUS=4 on Microchip PolarFire,
> > > the 4th CPU either fails to come online, or the system crashes.
> > >
> > > This happens because PolarFire has 5 CPU cores: hart 0 is an e51,
> > > and harts 1-4 are u54s, with the latter becoming CPUs 0-3 in Linux:
> > > - unused core has hartid 0 (sifive,e51),
> > > - processor 0 has hartid 1 (sifive,u74-mc),
> > > - processor 1 has hartid 2 (sifive,u74-mc),
> > > - processor 2 has hartid 3 (sifive,u74-mc),
> > > - processor 3 has hartid 4 (sifive,u74-mc).
> > >
> > > I assume the same issue is present on the SiFive fu540 and fu740
> > > SoCs, but I don't have access to these. The issue is not present
> > > on StarFive JH7100, as processor 0 has hartid 1, and processor 1 has
> > > hartid 0.
> > >
> > > arch/riscv/kernel/cpu_ops.c has:
> > >
> > > void *__cpu_up_stack_pointer[NR_CPUS] __section(".data");
> > > void *__cpu_up_task_pointer[NR_CPUS] __section(".data");
> > >
> > > void cpu_update_secondary_bootdata(unsigned int cpuid,
> > > struct task_struct *tidle)
> > > {
> > > int hartid = cpuid_to_hartid_map(cpuid);
> > >
> > > /* Make sure tidle is updated */
> > > smp_mb();
> > > WRITE_ONCE(__cpu_up_stack_pointer[hartid],
> > > task_stack_page(tidle) + THREAD_SIZE);
> > > WRITE_ONCE(__cpu_up_task_pointer[hartid], tidle);
> > >
> > > The above two writes cause out-of-bound accesses beyond
> > > __cpu_up_{stack,pointer}_pointer[] if hartid >= CONFIG_NR_CPUS.
> > >
> > > }
> > >
> > > arch/riscv/kernel/smpboot.c:setup_smp(void) detects CPUs like this:
> > >
> > > for_each_of_cpu_node(dn) {
> > > hart = riscv_of_processor_hartid(dn);
> > > if (hart < 0)
> > > continue;
> > >
> > > if (hart == cpuid_to_hartid_map(0)) {
> > > BUG_ON(found_boot_cpu);
> > > found_boot_cpu = 1;
> > > early_map_cpu_to_node(0, of_node_to_nid(dn));
> > > continue;
> > > }
> > > if (cpuid >= NR_CPUS) {
> > > pr_warn("Invalid cpuid [%d] for hartid [%d]\n",
> > > cpuid, hart);
> > > break;
> > > }
> > >
> > > cpuid_to_hartid_map(cpuid) = hart;
> > > early_map_cpu_to_node(cpuid, of_node_to_nid(dn));
> > > cpuid++;
> > > }
> > >
> > > So cpuid >= CONFIG_NR_CPUS (too many CPU cores) is already rejected.
> > >
> > > How to fix this?
> > >
> > > We could skip hartids >= NR_CPUS, but that feels strange to me, as
> > > you need NR_CPUS to be larger (much larger if the first usable hartid
> > > is a large number) than the number of CPUs used.
> > The Ubuntu distro config for HiFive Unmatched set this to CONFIG_NR_CPUS=8.
>
> I know. Same for most defconfigs in Linux. But we do not tend to
> work around buffer overflows by changing config values. Besides,
> those configs will still experience the issue when run on e.g. an
> 8+1 core processor where the cores used by Linux have hartids 1-8.
>
> I noticed because I started with a starlight config with
> CONFIG_NR_CPUS=2 (which gave me only one core), changed that to
> CONFIG_NR_CPUS=4, and got a kernel that didn't boot at all (no output
> without earlycon).
>
> > > We could store the minimum hartid, and always subtract that when
> > > accessing __cpu_up_{stack,pointer}_pointer[] (also in
> > > arch/riscv/kernel/head.S), but that means unused cores cannot be in the
> > > middle of the hartid range.
> > >
> > > Are hartids guaranteed to be continuous? If not, we have no choice but
> > > to index __cpu_up_{stack,pointer}_pointer[] by cpuid instead, which
> > > needs a more expensive conversion in arch/riscv/kernel/head.S.
>
> https://riscv.org/wp-content/uploads/2017/05/riscv-privileged-v1.10.pdf
> says:
>
> Hart IDs might not necessarily be numbered contiguously in a
> multiprocessor system, but at least one hart must have a hart
> ID of zero.
>
> Which means indexing arrays by hart ID is a no-go?

Isn't that also similar on aarch64?

On a rk3399 you get 0-3 and 100-101, and given the paragraph above,
something like this could very well exist on some riscv cpu too, I guess.



>
> Gr{oetje,eeting}s,
>
> Geert
>
> --
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]
>
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like that.
> -- Linus Torvalds
>
> _______________________________________________
> linux-riscv mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/linux-riscv
>




2021-10-26 10:49:14

by Geert Uytterhoeven

Subject: Re: Out-of-bounds access when hartid >= NR_CPUS

Hi Heiko,

On Tue, Oct 26, 2021 at 10:53 AM Heiko Stübner <[email protected]> wrote:
> Am Dienstag, 26. Oktober 2021, 08:44:31 CEST schrieb Geert Uytterhoeven:
> > On Tue, Oct 26, 2021 at 2:37 AM Ron Economos <[email protected]> wrote:
> > > On 10/25/21 8:54 AM, Geert Uytterhoeven wrote:
> > > > When booting a kernel with CONFIG_NR_CPUS=4 on Microchip PolarFire,
> > > > the 4th CPU either fails to come online, or the system crashes.
> > > >
> > > > This happens because PolarFire has 5 CPU cores: hart 0 is an e51,
> > > > and harts 1-4 are u54s, with the latter becoming CPUs 0-3 in Linux:
> > > > - unused core has hartid 0 (sifive,e51),
> > > > - processor 0 has hartid 1 (sifive,u74-mc),
> > > > - processor 1 has hartid 2 (sifive,u74-mc),
> > > > - processor 2 has hartid 3 (sifive,u74-mc),
> > > > - processor 3 has hartid 4 (sifive,u74-mc).
> > > >
> > > > I assume the same issue is present on the SiFive fu540 and fu740
> > > > SoCs, but I don't have access to these. The issue is not present
> > > > on StarFive JH7100, as processor 0 has hartid 1, and processor 1 has
> > > > hartid 0.
> > > >
> > > > arch/riscv/kernel/cpu_ops.c has:
> > > >
> > > > void *__cpu_up_stack_pointer[NR_CPUS] __section(".data");
> > > > void *__cpu_up_task_pointer[NR_CPUS] __section(".data");
> > > >
> > > > void cpu_update_secondary_bootdata(unsigned int cpuid,
> > > > struct task_struct *tidle)
> > > > {
> > > > int hartid = cpuid_to_hartid_map(cpuid);
> > > >
> > > > /* Make sure tidle is updated */
> > > > smp_mb();
> > > > WRITE_ONCE(__cpu_up_stack_pointer[hartid],
> > > > task_stack_page(tidle) + THREAD_SIZE);
> > > > WRITE_ONCE(__cpu_up_task_pointer[hartid], tidle);
> > > >
> > > > The above two writes cause out-of-bound accesses beyond
> > > > __cpu_up_{stack,pointer}_pointer[] if hartid >= CONFIG_NR_CPUS.
> > > >
> > > > }

> > https://riscv.org/wp-content/uploads/2017/05/riscv-privileged-v1.10.pdf
> > says:
> >
> > Hart IDs might not necessarily be numbered contiguously in a
> > multiprocessor system, but at least one hart must have a hart
> > ID of zero.
> >
> > Which means indexing arrays by hart ID is a no-go?
>
> Isn't that also similar on aarch64?
>
> On a rk3399 you get 0-3 and 100-101 and with the paragraph above
> something like this could very well exist on some riscv cpu too I guess.

Yes, it looks like hart IDs are similar to MPIDRs on ARM.

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2021-10-26 10:49:13

by Atish Patra

Subject: Re: Out-of-bounds access when hartid >= NR_CPUS

On Mon, Oct 25, 2021 at 8:54 AM Geert Uytterhoeven <[email protected]> wrote:
>
> Hi all,
>
> When booting a kernel with CONFIG_NR_CPUS=4 on Microchip PolarFire,
> the 4th CPU either fails to come online, or the system crashes.
>
> This happens because PolarFire has 5 CPU cores: hart 0 is an e51,
> and harts 1-4 are u54s, with the latter becoming CPUs 0-3 in Linux:
> - unused core has hartid 0 (sifive,e51),
> - processor 0 has hartid 1 (sifive,u74-mc),
> - processor 1 has hartid 2 (sifive,u74-mc),
> - processor 2 has hartid 3 (sifive,u74-mc),
> - processor 3 has hartid 4 (sifive,u74-mc).
>
> I assume the same issue is present on the SiFive fu540 and fu740
> SoCs, but I don't have access to these. The issue is not present
> on StarFive JH7100, as processor 0 has hartid 1, and processor 1 has
> hartid 0.
>
> arch/riscv/kernel/cpu_ops.c has:
>
> void *__cpu_up_stack_pointer[NR_CPUS] __section(".data");
> void *__cpu_up_task_pointer[NR_CPUS] __section(".data");
>
> void cpu_update_secondary_bootdata(unsigned int cpuid,
> struct task_struct *tidle)
> {
> int hartid = cpuid_to_hartid_map(cpuid);
>
> /* Make sure tidle is updated */
> smp_mb();
> WRITE_ONCE(__cpu_up_stack_pointer[hartid],
> task_stack_page(tidle) + THREAD_SIZE);
> WRITE_ONCE(__cpu_up_task_pointer[hartid], tidle);
>
> The above two writes cause out-of-bound accesses beyond
> __cpu_up_{stack,pointer}_pointer[] if hartid >= CONFIG_NR_CPUS.
>
> }
>

Thanks for reporting this. We need to fix this and definitely shouldn't hide it
using configs. I guess I never tested with lower values (2 or 4) for
CONFIG_NR_CPUS, which explains why this bug was not noticed until now.


> arch/riscv/kernel/smpboot.c:setup_smp(void) detects CPUs like this:
>
> for_each_of_cpu_node(dn) {
> hart = riscv_of_processor_hartid(dn);
> if (hart < 0)
> continue;
>
> if (hart == cpuid_to_hartid_map(0)) {
> BUG_ON(found_boot_cpu);
> found_boot_cpu = 1;
> early_map_cpu_to_node(0, of_node_to_nid(dn));
> continue;
> }
> if (cpuid >= NR_CPUS) {
> pr_warn("Invalid cpuid [%d] for hartid [%d]\n",
> cpuid, hart);
> break;
> }
>
> cpuid_to_hartid_map(cpuid) = hart;
> early_map_cpu_to_node(cpuid, of_node_to_nid(dn));
> cpuid++;
> }
>
> So cpuid >= CONFIG_NR_CPUS (too many CPU cores) is already rejected.
>
> How to fix this?
>
> We could skip hartids >= NR_CPUS, but that feels strange to me, as
> you need NR_CPUS to be larger (much larger if the first usable hartid
> is a large number) than the number of CPUs used.
>
> We could store the minimum hartid, and always subtract that when
> accessing __cpu_up_{stack,pointer}_pointer[] (also in
> arch/riscv/kernel/head.S), but that means unused cores cannot be in the
> middle of the hartid range.

Yeah. Both of the above proposed solutions are not ideal.

>
> Are hartids guaranteed to be continuous? If not, we have no choice but
> to index __cpu_up_{stack,pointer}_pointer[] by cpuid instead, which
> needs a more expensive conversion in arch/riscv/kernel/head.S.
>

This will work for ordered booting with the SBI HSM extension. However, it
may fail for spinwait booting, because cpuid_to_hartid_map might not have
been set up yet, depending on when the secondary harts jump into Linux.

Ideally, the size of __cpu_up_{stack,task}_pointer[] should be the maximum
possible hartid. How about adding a config option for that?

We also need sanity checks in cpu_update_secondary_bootdata() to make sure
that the hartid is within bounds, to avoid issues due to a suboptimal
config value.
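
Something along those lines, perhaps (untested sketch; whether silently
leaving the hart offline is the right failure mode is an open question):

void cpu_update_secondary_bootdata(unsigned int cpuid,
                                   struct task_struct *tidle)
{
        int hartid = cpuid_to_hartid_map(cpuid);

        /* Sketched bounds check: refuse to hand out boot data for a
         * hartid that would index past the NR_CPUS-sized arrays, and
         * leave that hart offline instead. */
        if (hartid < 0 || hartid >= NR_CPUS) {
                pr_warn("hartid %d out of range, leaving CPU %u offline\n",
                        hartid, cpuid);
                return;
        }

        /* Make sure tidle is updated */
        smp_mb();
        WRITE_ONCE(__cpu_up_stack_pointer[hartid],
                   task_stack_page(tidle) + THREAD_SIZE);
        WRITE_ONCE(__cpu_up_task_pointer[hartid], tidle);
}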

> Thanks for your comments!
>
> Gr{oetje,eeting}s,
>
> Geert
>
> --
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]
>
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like that.
> -- Linus Torvalds
>
> _______________________________________________
> linux-riscv mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/linux-riscv



--
Regards,
Atish

2021-10-26 10:59:10

by Geert Uytterhoeven

Subject: Re: Out-of-bounds access when hartid >= NR_CPUS

Hi Atish,

On Tue, Oct 26, 2021 at 10:55 AM Atish Patra <[email protected]> wrote:
> On Mon, Oct 25, 2021 at 8:54 AM Geert Uytterhoeven <[email protected]> wrote:
> > When booting a kernel with CONFIG_NR_CPUS=4 on Microchip PolarFire,
> > the 4th CPU either fails to come online, or the system crashes.
> >
> > This happens because PolarFire has 5 CPU cores: hart 0 is an e51,
> > and harts 1-4 are u54s, with the latter becoming CPUs 0-3 in Linux:
> > - unused core has hartid 0 (sifive,e51),
> > - processor 0 has hartid 1 (sifive,u74-mc),
> > - processor 1 has hartid 2 (sifive,u74-mc),
> > - processor 2 has hartid 3 (sifive,u74-mc),
> > - processor 3 has hartid 4 (sifive,u74-mc).
> >
> > I assume the same issue is present on the SiFive fu540 and fu740
> > SoCs, but I don't have access to these. The issue is not present
> > on StarFive JH7100, as processor 0 has hartid 1, and processor 1 has
> > hartid 0.
> >
> > arch/riscv/kernel/cpu_ops.c has:
> >
> > void *__cpu_up_stack_pointer[NR_CPUS] __section(".data");
> > void *__cpu_up_task_pointer[NR_CPUS] __section(".data");
> >
> > void cpu_update_secondary_bootdata(unsigned int cpuid,
> > struct task_struct *tidle)
> > {
> > int hartid = cpuid_to_hartid_map(cpuid);
> >
> > /* Make sure tidle is updated */
> > smp_mb();
> > WRITE_ONCE(__cpu_up_stack_pointer[hartid],
> > task_stack_page(tidle) + THREAD_SIZE);
> > WRITE_ONCE(__cpu_up_task_pointer[hartid], tidle);
> >
> > The above two writes cause out-of-bound accesses beyond
> > __cpu_up_{stack,pointer}_pointer[] if hartid >= CONFIG_NR_CPUS.
> >
> > }
> >
>
> Thanks for reporting this. We need to fix this and definitely shouldn't hide it
> using configs. I guess I never tested with lower values (2 or 4) for
> CONFIG_NR_CPUS which explains how this bug was not noticed until now.

> > How to fix this?
> >
> > We could skip hartids >= NR_CPUS, but that feels strange to me, as
> > you need NR_CPUS to be larger (much larger if the first usable hartid
> > is a large number) than the number of CPUs used.
> >
> > We could store the minimum hartid, and always subtract that when
> > accessing __cpu_up_{stack,pointer}_pointer[] (also in
> > arch/riscv/kernel/head.S), but that means unused cores cannot be in the
> > middle of the hartid range.
>
> Yeah. Both of the above proposed solutions are not ideal.
>
> >
> > Are hartids guaranteed to be continuous? If not, we have no choice but
> > to index __cpu_up_{stack,pointer}_pointer[] by cpuid instead, which
> > needs a more expensive conversion in arch/riscv/kernel/head.S.
>
> This will work for ordered booting with SBI HSM extension. However, it may
> fail for spinwait booting because cpuid_to_hartid_map might not have setup
> depending on when secondary harts are jumping to linux.
>
> Ideally, the size of the __cpu_up_{stack,task}_pointer[] should be the maximum
> hartid possible. How about adding a config for that ?

(reading more RISC-V specs)
Hart IDs can use up to XLEN (32, 64, or 128) bits. So creative sparse
multi-level encodings, like those used in MPIDR on ARM[1], make using a
simple array infeasible.

[1] arch/arm{,64}/include/asm/cputype.h
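
(Purely to illustrate the concern, a made-up MPIDR-style packing, not
something from any RISC-V spec:)

/* Invented encoding, for illustration only: pack socket/cluster/core
 * into the hartid the way MPIDR packs affinity levels. */
static inline unsigned long make_hartid(unsigned int socket,
                                        unsigned int cluster,
                                        unsigned int core)
{
        return ((unsigned long)socket << 16) | (cluster << 8) | core;
}

/* make_hartid(1, 2, 1) == 0x10201 == 66049, so a flat array indexed by
 * hartid would need tens of thousands of entries for a machine with
 * only a handful of harts. */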

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2021-10-26 12:42:36

by Heiko Stuebner

Subject: Re: Out-of-bounds access when hartid >= NR_CPUS

Am Dienstag, 26. Oktober 2021, 10:57:26 CEST schrieb Geert Uytterhoeven:
> Hi Heiko,
>
> On Tue, Oct 26, 2021 at 10:53 AM Heiko Stübner <[email protected]> wrote:
> > Am Dienstag, 26. Oktober 2021, 08:44:31 CEST schrieb Geert Uytterhoeven:
> > > On Tue, Oct 26, 2021 at 2:37 AM Ron Economos <[email protected]> wrote:
> > > > On 10/25/21 8:54 AM, Geert Uytterhoeven wrote:
> > > > > When booting a kernel with CONFIG_NR_CPUS=4 on Microchip PolarFire,
> > > > > the 4th CPU either fails to come online, or the system crashes.
> > > > >
> > > > > This happens because PolarFire has 5 CPU cores: hart 0 is an e51,
> > > > > and harts 1-4 are u54s, with the latter becoming CPUs 0-3 in Linux:
> > > > > - unused core has hartid 0 (sifive,e51),
> > > > > - processor 0 has hartid 1 (sifive,u74-mc),
> > > > > - processor 1 has hartid 2 (sifive,u74-mc),
> > > > > - processor 2 has hartid 3 (sifive,u74-mc),
> > > > > - processor 3 has hartid 4 (sifive,u74-mc).
> > > > >
> > > > > I assume the same issue is present on the SiFive fu540 and fu740
> > > > > SoCs, but I don't have access to these. The issue is not present
> > > > > on StarFive JH7100, as processor 0 has hartid 1, and processor 1 has
> > > > > hartid 0.
> > > > >
> > > > > arch/riscv/kernel/cpu_ops.c has:
> > > > >
> > > > > void *__cpu_up_stack_pointer[NR_CPUS] __section(".data");
> > > > > void *__cpu_up_task_pointer[NR_CPUS] __section(".data");
> > > > >
> > > > > void cpu_update_secondary_bootdata(unsigned int cpuid,
> > > > > struct task_struct *tidle)
> > > > > {
> > > > > int hartid = cpuid_to_hartid_map(cpuid);
> > > > >
> > > > > /* Make sure tidle is updated */
> > > > > smp_mb();
> > > > > WRITE_ONCE(__cpu_up_stack_pointer[hartid],
> > > > > task_stack_page(tidle) + THREAD_SIZE);
> > > > > WRITE_ONCE(__cpu_up_task_pointer[hartid], tidle);
> > > > >
> > > > > The above two writes cause out-of-bound accesses beyond
> > > > > __cpu_up_{stack,pointer}_pointer[] if hartid >= CONFIG_NR_CPUS.
> > > > >
> > > > > }
>
> > > https://riscv.org/wp-content/uploads/2017/05/riscv-privileged-v1.10.pdf
> > > says:
> > >
> > > Hart IDs might not necessarily be numbered contiguously in a
> > > multiprocessor system, but at least one hart must have a hart
> > > ID of zero.
> > >
> > > Which means indexing arrays by hart ID is a no-go?
> >
> > Isn't that also similar on aarch64?
> >
> > On a rk3399 you get 0-3 and 100-101 and with the paragraph above
> > something like this could very well exist on some riscv cpu too I guess.
>
> Yes, it looks like hart IDs are similar to MPIDRs on ARM.

and they have the set_cpu_logical_map construct to map hwids
to a contiguous list of cpu ids.

So with hartids not necessarily being contiguous, it looks like
riscv would need a similar mechanism.


2021-10-27 23:36:42

by Atish Patra

Subject: Re: Out-of-bounds access when hartid >= NR_CPUS

On Tue, Oct 26, 2021 at 2:34 AM Heiko Stübner <[email protected]> wrote:
>
> Am Dienstag, 26. Oktober 2021, 10:57:26 CEST schrieb Geert Uytterhoeven:
> > Hi Heiko,
> >
> > On Tue, Oct 26, 2021 at 10:53 AM Heiko Stübner <[email protected]> wrote:
> > > Am Dienstag, 26. Oktober 2021, 08:44:31 CEST schrieb Geert Uytterhoeven:
> > > > On Tue, Oct 26, 2021 at 2:37 AM Ron Economos <[email protected]> wrote:
> > > > > On 10/25/21 8:54 AM, Geert Uytterhoeven wrote:
> > > > > > When booting a kernel with CONFIG_NR_CPUS=4 on Microchip PolarFire,
> > > > > > the 4th CPU either fails to come online, or the system crashes.
> > > > > >
> > > > > > This happens because PolarFire has 5 CPU cores: hart 0 is an e51,
> > > > > > and harts 1-4 are u54s, with the latter becoming CPUs 0-3 in Linux:
> > > > > > - unused core has hartid 0 (sifive,e51),
> > > > > > - processor 0 has hartid 1 (sifive,u74-mc),
> > > > > > - processor 1 has hartid 2 (sifive,u74-mc),
> > > > > > - processor 2 has hartid 3 (sifive,u74-mc),
> > > > > > - processor 3 has hartid 4 (sifive,u74-mc).
> > > > > >
> > > > > > I assume the same issue is present on the SiFive fu540 and fu740
> > > > > > SoCs, but I don't have access to these. The issue is not present
> > > > > > on StarFive JH7100, as processor 0 has hartid 1, and processor 1 has
> > > > > > hartid 0.
> > > > > >
> > > > > > arch/riscv/kernel/cpu_ops.c has:
> > > > > >
> > > > > > void *__cpu_up_stack_pointer[NR_CPUS] __section(".data");
> > > > > > void *__cpu_up_task_pointer[NR_CPUS] __section(".data");
> > > > > >
> > > > > > void cpu_update_secondary_bootdata(unsigned int cpuid,
> > > > > > struct task_struct *tidle)
> > > > > > {
> > > > > > int hartid = cpuid_to_hartid_map(cpuid);
> > > > > >
> > > > > > /* Make sure tidle is updated */
> > > > > > smp_mb();
> > > > > > WRITE_ONCE(__cpu_up_stack_pointer[hartid],
> > > > > > task_stack_page(tidle) + THREAD_SIZE);
> > > > > > WRITE_ONCE(__cpu_up_task_pointer[hartid], tidle);
> > > > > >
> > > > > > The above two writes cause out-of-bound accesses beyond
> > > > > > __cpu_up_{stack,pointer}_pointer[] if hartid >= CONFIG_NR_CPUS.
> > > > > >
> > > > > > }
> >
> > > > https://riscv.org/wp-content/uploads/2017/05/riscv-privileged-v1.10.pdf
> > > > says:
> > > >
> > > > Hart IDs might not necessarily be numbered contiguously in a
> > > > multiprocessor system, but at least one hart must have a hart
> > > > ID of zero.
> > > >
> > > > Which means indexing arrays by hart ID is a no-go?
> > >
> > > Isn't that also similar on aarch64?
> > >
> > > On a rk3399 you get 0-3 and 100-101 and with the paragraph above
> > > something like this could very well exist on some riscv cpu too I guess.
> >
> > Yes, it looks like hart IDs are similar to MPIDRs on ARM.
>
> and they have the set_cpu_logical_map construct to map hwids
> to a continuous list of cpu-ids.
>
> So with hartids not being necessarily continuous this looks like
> riscv would need a similar mechanism.
>

RISC-V already has a similar mechanism, cpuid_to_hartid_map. Logical
cpu ids are contiguous while hartids can be sparse.

The issue here is that __cpu_up_{stack,task}_pointer[] are per hart,
but their size depends on NR_CPUS, which counts logical CPUs.

That's why having a maximum number of hartids defined in the config
would be helpful.
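
To illustrate the idea (CONFIG_RISCV_MAX_HARTID is an invented symbol,
only meant to show the shape of the proposal):

/* Hypothetical: size the per-hart boot arrays by the largest hartid
 * the platform can have, instead of by the number of logical CPUs.
 * CONFIG_RISCV_MAX_HARTID does not exist today. */
void *__cpu_up_stack_pointer[CONFIG_RISCV_MAX_HARTID + 1] __section(".data");
void *__cpu_up_task_pointer[CONFIG_RISCV_MAX_HARTID + 1] __section(".data");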

>
>
> _______________________________________________
> linux-riscv mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/linux-riscv



--
Regards,
Atish

2021-10-28 01:32:58

by Atish Patra

Subject: Re: Out-of-bounds access when hartid >= NR_CPUS

On Tue, Oct 26, 2021 at 2:03 AM Geert Uytterhoeven <[email protected]> wrote:
>
> Hi Atish,
>
> On Tue, Oct 26, 2021 at 10:55 AM Atish Patra <[email protected]> wrote:
> > On Mon, Oct 25, 2021 at 8:54 AM Geert Uytterhoeven <[email protected]> wrote:
> > > When booting a kernel with CONFIG_NR_CPUS=4 on Microchip PolarFire,
> > > the 4th CPU either fails to come online, or the system crashes.
> > >
> > > This happens because PolarFire has 5 CPU cores: hart 0 is an e51,
> > > and harts 1-4 are u54s, with the latter becoming CPUs 0-3 in Linux:
> > > - unused core has hartid 0 (sifive,e51),
> > > - processor 0 has hartid 1 (sifive,u74-mc),
> > > - processor 1 has hartid 2 (sifive,u74-mc),
> > > - processor 2 has hartid 3 (sifive,u74-mc),
> > > - processor 3 has hartid 4 (sifive,u74-mc).
> > >
> > > I assume the same issue is present on the SiFive fu540 and fu740
> > > SoCs, but I don't have access to these. The issue is not present
> > > on StarFive JH7100, as processor 0 has hartid 1, and processor 1 has
> > > hartid 0.
> > >
> > > arch/riscv/kernel/cpu_ops.c has:
> > >
> > > void *__cpu_up_stack_pointer[NR_CPUS] __section(".data");
> > > void *__cpu_up_task_pointer[NR_CPUS] __section(".data");
> > >
> > > void cpu_update_secondary_bootdata(unsigned int cpuid,
> > > struct task_struct *tidle)
> > > {
> > > int hartid = cpuid_to_hartid_map(cpuid);
> > >
> > > /* Make sure tidle is updated */
> > > smp_mb();
> > > WRITE_ONCE(__cpu_up_stack_pointer[hartid],
> > > task_stack_page(tidle) + THREAD_SIZE);
> > > WRITE_ONCE(__cpu_up_task_pointer[hartid], tidle);
> > >
> > > The above two writes cause out-of-bound accesses beyond
> > > __cpu_up_{stack,pointer}_pointer[] if hartid >= CONFIG_NR_CPUS.
> > >
> > > }
> > >
> >
> > Thanks for reporting this. We need to fix this and definitely shouldn't hide it
> > using configs. I guess I never tested with lower values (2 or 4) for
> > CONFIG_NR_CPUS which explains how this bug was not noticed until now.
>
> > > How to fix this?
> > >
> > > We could skip hartids >= NR_CPUS, but that feels strange to me, as
> > > you need NR_CPUS to be larger (much larger if the first usable hartid
> > > is a large number) than the number of CPUs used.
> > >
> > > We could store the minimum hartid, and always subtract that when
> > > accessing __cpu_up_{stack,pointer}_pointer[] (also in
> > > arch/riscv/kernel/head.S), but that means unused cores cannot be in the
> > > middle of the hartid range.
> >
> > Yeah. Both of the above proposed solutions are not ideal.
> >
> > >
> > > Are hartids guaranteed to be continuous? If not, we have no choice but
> > > to index __cpu_up_{stack,pointer}_pointer[] by cpuid instead, which
> > > needs a more expensive conversion in arch/riscv/kernel/head.S.
> >
> > This will work for ordered booting with SBI HSM extension. However, it may
> > fail for spinwait booting because cpuid_to_hartid_map might not have setup
> > depending on when secondary harts are jumping to linux.
> >
> > Ideally, the size of the __cpu_up_{stack,task}_pointer[] should be the maximum
> > hartid possible. How about adding a config for that ?
>
> (reading more RISC-V specs)
> Hart IDs can use up to XLEN (32, 64, or 128) bits. So creative sparse
> multi-level encodings like used in MPIDR on ARM[1] makes using a
> simple array infeasible.
>

Hmm. Should we worry about similar creative sparse encodings when they
appear? Maybe we can dodge that altogether.

The other approach would be to go with your proposed solution to
convert the hartid to the cpuid in head.S.
However, this can only be fixed for ordered booting. Most of today's
users have probably moved on to ordered booting.
The only users still on spinwait would be:

1. whoever still uses BBL
2. whoever still uses OpenSBI v0.6 or older

Maybe we can document this bug in the Linux kernel for the spinwait
method and move on.
Hopefully, we can remove the spinwait method in a couple of years.

Is that acceptable?

> [1] arch/arm{,64}/include/asm/cputype.h
>
> Gr{oetje,eeting}s,
>
> Geert
>
> --
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]
>
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like that.
> -- Linus Torvalds



--
Regards,
Atish

2021-10-28 15:11:09

by Palmer Dabbelt

Subject: Re: Out-of-bounds access when hartid >= NR_CPUS

On Wed, 27 Oct 2021 16:34:15 PDT (-0700), [email protected] wrote:
> On Tue, Oct 26, 2021 at 2:34 AM Heiko Stübner <[email protected]> wrote:
>>
>> Am Dienstag, 26. Oktober 2021, 10:57:26 CEST schrieb Geert Uytterhoeven:
>> > Hi Heiko,
>> >
>> > On Tue, Oct 26, 2021 at 10:53 AM Heiko Stübner <[email protected]> wrote:
>> > > Am Dienstag, 26. Oktober 2021, 08:44:31 CEST schrieb Geert Uytterhoeven:
>> > > > On Tue, Oct 26, 2021 at 2:37 AM Ron Economos <[email protected]> wrote:
>> > > > > On 10/25/21 8:54 AM, Geert Uytterhoeven wrote:
>> > > > > > When booting a kernel with CONFIG_NR_CPUS=4 on Microchip PolarFire,
>> > > > > > the 4th CPU either fails to come online, or the system crashes.
>> > > > > >
>> > > > > > This happens because PolarFire has 5 CPU cores: hart 0 is an e51,
>> > > > > > and harts 1-4 are u54s, with the latter becoming CPUs 0-3 in Linux:
>> > > > > > - unused core has hartid 0 (sifive,e51),
>> > > > > > - processor 0 has hartid 1 (sifive,u74-mc),
>> > > > > > - processor 1 has hartid 2 (sifive,u74-mc),
>> > > > > > - processor 2 has hartid 3 (sifive,u74-mc),
>> > > > > > - processor 3 has hartid 4 (sifive,u74-mc).
>> > > > > >
>> > > > > > I assume the same issue is present on the SiFive fu540 and fu740
>> > > > > > SoCs, but I don't have access to these. The issue is not present
>> > > > > > on StarFive JH7100, as processor 0 has hartid 1, and processor 1 has
>> > > > > > hartid 0.
>> > > > > >
>> > > > > > arch/riscv/kernel/cpu_ops.c has:
>> > > > > >
>> > > > > > void *__cpu_up_stack_pointer[NR_CPUS] __section(".data");
>> > > > > > void *__cpu_up_task_pointer[NR_CPUS] __section(".data");
>> > > > > >
>> > > > > > void cpu_update_secondary_bootdata(unsigned int cpuid,
>> > > > > > struct task_struct *tidle)
>> > > > > > {
>> > > > > > int hartid = cpuid_to_hartid_map(cpuid);
>> > > > > >
>> > > > > > /* Make sure tidle is updated */
>> > > > > > smp_mb();
>> > > > > > WRITE_ONCE(__cpu_up_stack_pointer[hartid],
>> > > > > > task_stack_page(tidle) + THREAD_SIZE);
>> > > > > > WRITE_ONCE(__cpu_up_task_pointer[hartid], tidle);
>> > > > > >
>> > > > > > The above two writes cause out-of-bound accesses beyond
>> > > > > > __cpu_up_{stack,pointer}_pointer[] if hartid >= CONFIG_NR_CPUS.
>> > > > > >
>> > > > > > }
>> >
>> > > > https://riscv.org/wp-content/uploads/2017/05/riscv-privileged-v1.10.pdf
>> > > > says:
>> > > >
>> > > > Hart IDs might not necessarily be numbered contiguously in a
>> > > > multiprocessor system, but at least one hart must have a hart
>> > > > ID of zero.
>> > > >
>> > > > Which means indexing arrays by hart ID is a no-go?
>> > >
>> > > Isn't that also similar on aarch64?
>> > >
>> > > On a rk3399 you get 0-3 and 100-101 and with the paragraph above
>> > > something like this could very well exist on some riscv cpu too I guess.
>> >
>> > Yes, it looks like hart IDs are similar to MPIDRs on ARM.
>>
>> and they have the set_cpu_logical_map construct to map hwids
>> to a continuous list of cpu-ids.
>>
>> So with hartids not being necessarily continuous this looks like
>> riscv would need a similar mechanism.
>>
>
> RISC-V already has a similar mechanism cpuid_to_hartid_map. Logical
> cpu ids are continuous
> while hartid can be sparse.
>
> The issue here is that __cpu_up_stack/task_pointer are per hart but
> array size depends on the NR_CPUs
> which represents the logical CPU.
>
> That's why, having a maximum number of hartids defined in config will
> be helpful.

I don't understand why we'd have both: if we can't find a CPU number for
a hart, then all we can do is just leave it offline. Wouldn't it be
simpler to just rely on NR_CPUS? We'll need to fix the crashes on
overflows either way.

2021-10-28 16:15:55

by Heiko Stuebner

Subject: Re: Out-of-bounds access when hartid >= NR_CPUS

Am Donnerstag, 28. Oktober 2021, 17:09:44 CEST schrieb Palmer Dabbelt:
> On Wed, 27 Oct 2021 16:34:15 PDT (-0700), [email protected] wrote:
> > On Tue, Oct 26, 2021 at 2:34 AM Heiko Stübner <[email protected]> wrote:
> >>
> >> Am Dienstag, 26. Oktober 2021, 10:57:26 CEST schrieb Geert Uytterhoeven:
> >> > Hi Heiko,
> >> >
> >> > On Tue, Oct 26, 2021 at 10:53 AM Heiko Stübner <[email protected]> wrote:
> >> > > Am Dienstag, 26. Oktober 2021, 08:44:31 CEST schrieb Geert Uytterhoeven:
> >> > > > On Tue, Oct 26, 2021 at 2:37 AM Ron Economos <[email protected]> wrote:
> >> > > > > On 10/25/21 8:54 AM, Geert Uytterhoeven wrote:
> >> > > > > > When booting a kernel with CONFIG_NR_CPUS=4 on Microchip PolarFire,
> >> > > > > > the 4th CPU either fails to come online, or the system crashes.
> >> > > > > >
> >> > > > > > This happens because PolarFire has 5 CPU cores: hart 0 is an e51,
> >> > > > > > and harts 1-4 are u54s, with the latter becoming CPUs 0-3 in Linux:
> >> > > > > > - unused core has hartid 0 (sifive,e51),
> >> > > > > > - processor 0 has hartid 1 (sifive,u74-mc),
> >> > > > > > - processor 1 has hartid 2 (sifive,u74-mc),
> >> > > > > > - processor 2 has hartid 3 (sifive,u74-mc),
> >> > > > > > - processor 3 has hartid 4 (sifive,u74-mc).
> >> > > > > >
> >> > > > > > I assume the same issue is present on the SiFive fu540 and fu740
> >> > > > > > SoCs, but I don't have access to these. The issue is not present
> >> > > > > > on StarFive JH7100, as processor 0 has hartid 1, and processor 1 has
> >> > > > > > hartid 0.
> >> > > > > >
> >> > > > > > arch/riscv/kernel/cpu_ops.c has:
> >> > > > > >
> >> > > > > > void *__cpu_up_stack_pointer[NR_CPUS] __section(".data");
> >> > > > > > void *__cpu_up_task_pointer[NR_CPUS] __section(".data");
> >> > > > > >
> >> > > > > > void cpu_update_secondary_bootdata(unsigned int cpuid,
> >> > > > > > struct task_struct *tidle)
> >> > > > > > {
> >> > > > > > int hartid = cpuid_to_hartid_map(cpuid);
> >> > > > > >
> >> > > > > > /* Make sure tidle is updated */
> >> > > > > > smp_mb();
> >> > > > > > WRITE_ONCE(__cpu_up_stack_pointer[hartid],
> >> > > > > > task_stack_page(tidle) + THREAD_SIZE);
> >> > > > > > WRITE_ONCE(__cpu_up_task_pointer[hartid], tidle);
> >> > > > > >
> >> > > > > > The above two writes cause out-of-bound accesses beyond
> >> > > > > > __cpu_up_{stack,pointer}_pointer[] if hartid >= CONFIG_NR_CPUS.
> >> > > > > >
> >> > > > > > }
> >> >
> >> > > > https://riscv.org/wp-content/uploads/2017/05/riscv-privileged-v1.10.pdf
> >> > > > says:
> >> > > >
> >> > > > Hart IDs might not necessarily be numbered contiguously in a
> >> > > > multiprocessor system, but at least one hart must have a hart
> >> > > > ID of zero.
> >> > > >
> >> > > > Which means indexing arrays by hart ID is a no-go?
> >> > >
> >> > > Isn't that also similar on aarch64?
> >> > >
> >> > > On a rk3399 you get 0-3 and 100-101 and with the paragraph above
> >> > > something like this could very well exist on some riscv cpu too I guess.
> >> >
> >> > Yes, it looks like hart IDs are similar to MPIDRs on ARM.
> >>
> >> and they have the set_cpu_logical_map construct to map hwids
> >> to a continuous list of cpu-ids.
> >>
> >> So with hartids not being necessarily continuous this looks like
> >> riscv would need a similar mechanism.
> >>
> >
> > RISC-V already has a similar mechanism cpuid_to_hartid_map. Logical
> > cpu ids are continuous
> > while hartid can be sparse.
> >
> > The issue here is that __cpu_up_stack/task_pointer are per hart but
> > array size depends on the NR_CPUs
> > which represents the logical CPU.
> >
> > That's why, having a maximum number of hartids defined in config will
> > be helpful.
>
> I don't understand why we'd have both: if we can't find a CPU number for
> a hart, then all we can do is just leave it offline. Wouldn't it be
> simpler to just rely on NR_CPUS? We'll need to fix the crashes on
> overflows either way.

I'd think so.

The mhartid register is XLEN bits wide and, as the spec says, hart IDs
don't have to be contiguous, so it looks like hw designers could adopt
really "creative" numbering.

So having a NR_HARTS won't really help, because there could be huge
gaps in the numbering at some point.


2021-10-28 16:26:52

by Anup Patel

Subject: Re: Out-of-bounds access when hartid >= NR_CPUS

On Thu, Oct 28, 2021 at 8:39 PM Palmer Dabbelt <[email protected]> wrote:
>
> On Wed, 27 Oct 2021 16:34:15 PDT (-0700), [email protected] wrote:
> > On Tue, Oct 26, 2021 at 2:34 AM Heiko Stübner <[email protected]> wrote:
> >>
> >> Am Dienstag, 26. Oktober 2021, 10:57:26 CEST schrieb Geert Uytterhoeven:
> >> > Hi Heiko,
> >> >
> >> > On Tue, Oct 26, 2021 at 10:53 AM Heiko Stübner <[email protected]> wrote:
> >> > > Am Dienstag, 26. Oktober 2021, 08:44:31 CEST schrieb Geert Uytterhoeven:
> >> > > > On Tue, Oct 26, 2021 at 2:37 AM Ron Economos <[email protected]> wrote:
> >> > > > > On 10/25/21 8:54 AM, Geert Uytterhoeven wrote:
> >> > > > > > When booting a kernel with CONFIG_NR_CPUS=4 on Microchip PolarFire,
> >> > > > > > the 4th CPU either fails to come online, or the system crashes.
> >> > > > > >
> >> > > > > > This happens because PolarFire has 5 CPU cores: hart 0 is an e51,
> >> > > > > > and harts 1-4 are u54s, with the latter becoming CPUs 0-3 in Linux:
> >> > > > > > - unused core has hartid 0 (sifive,e51),
> >> > > > > > - processor 0 has hartid 1 (sifive,u74-mc),
> >> > > > > > - processor 1 has hartid 2 (sifive,u74-mc),
> >> > > > > > - processor 2 has hartid 3 (sifive,u74-mc),
> >> > > > > > - processor 3 has hartid 4 (sifive,u74-mc).
> >> > > > > >
> >> > > > > > I assume the same issue is present on the SiFive fu540 and fu740
> >> > > > > > SoCs, but I don't have access to these. The issue is not present
> >> > > > > > on StarFive JH7100, as processor 0 has hartid 1, and processor 1 has
> >> > > > > > hartid 0.
> >> > > > > >
> >> > > > > > arch/riscv/kernel/cpu_ops.c has:
> >> > > > > >
> >> > > > > > void *__cpu_up_stack_pointer[NR_CPUS] __section(".data");
> >> > > > > > void *__cpu_up_task_pointer[NR_CPUS] __section(".data");
> >> > > > > >
> >> > > > > > void cpu_update_secondary_bootdata(unsigned int cpuid,
> >> > > > > > struct task_struct *tidle)
> >> > > > > > {
> >> > > > > > int hartid = cpuid_to_hartid_map(cpuid);
> >> > > > > >
> >> > > > > > /* Make sure tidle is updated */
> >> > > > > > smp_mb();
> >> > > > > > WRITE_ONCE(__cpu_up_stack_pointer[hartid],
> >> > > > > > task_stack_page(tidle) + THREAD_SIZE);
> >> > > > > > WRITE_ONCE(__cpu_up_task_pointer[hartid], tidle);
> >> > > > > >
> >> > > > > > The above two writes cause out-of-bound accesses beyond
> >> > > > > > __cpu_up_{stack,pointer}_pointer[] if hartid >= CONFIG_NR_CPUS.
> >> > > > > >
> >> > > > > > }
> >> >
> >> > > > https://riscv.org/wp-content/uploads/2017/05/riscv-privileged-v1.10.pdf
> >> > > > says:
> >> > > >
> >> > > > Hart IDs might not necessarily be numbered contiguously in a
> >> > > > multiprocessor system, but at least one hart must have a hart
> >> > > > ID of zero.
> >> > > >
> >> > > > Which means indexing arrays by hart ID is a no-go?
> >> > >
> >> > > Isn't that also similar on aarch64?
> >> > >
> >> > > On a rk3399 you get 0-3 and 100-101 and with the paragraph above
> >> > > something like this could very well exist on some riscv cpu too I guess.
> >> >
> >> > Yes, it looks like hart IDs are similar to MPIDRs on ARM.
> >>
> >> and they have the set_cpu_logical_map construct to map hwids
> >> to a continuous list of cpu-ids.
> >>
> >> So with hartids not being necessarily continuous this looks like
> >> riscv would need a similar mechanism.
> >>
> >
> > RISC-V already has a similar mechanism cpuid_to_hartid_map. Logical
> > cpu ids are continuous
> > while hartid can be sparse.
> >
> > The issue here is that __cpu_up_stack/task_pointer are per hart but
> > array size depends on the NR_CPUs
> > which represents the logical CPU.
> >
> > That's why, having a maximum number of hartids defined in config will
> > be helpful.
>
> I don't understand why we'd have both: if we can't find a CPU number for
> a hart, then all we can do is just leave it offline. Wouldn't it be
> simpler to just rely on NR_CPUS? We'll need to fix the crashes on
> overflows either way.

For HSM ops, we can easily fix this limitation because the HART
start call has an opaque parameter which can be used to specify TP
and SP for the HART being brought up.

For spinwait ops, I don't see much value in fixing sparse hartid
problems, so let's document this problem and add appropriate
out-of-bounds checks in the spinwait ops.
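
As a rough illustration of the HSM idea (sketch only; the struct and
function names here are invented, and the eventual patch may look quite
different): the opaque argument of sbi_hart_start(hartid, start_addr,
opaque) could carry the physical address of a small per-cpu boot-data
structure, so nothing ever needs to be indexed by hartid:

struct boot_data {
        void *stack_ptr;
        struct task_struct *task_ptr;
};
static struct boot_data boot_data[NR_CPUS];     /* indexed by cpuid, not hartid */

static int hsm_cpu_start(unsigned int cpuid, struct task_struct *tidle)
{
        unsigned long hartid = cpuid_to_hartid_map(cpuid);
        struct boot_data *bd = &boot_data[cpuid];

        bd->stack_ptr = task_stack_page(tidle) + THREAD_SIZE;
        bd->task_ptr = tidle;
        smp_mb();       /* publish the boot data before starting the hart */

        /* The new hart receives the opaque value in a1 and can load
         * its SP/TP from there, without any hartid-indexed arrays. */
        return sbi_hsm_hart_start(hartid, __pa_symbol(secondary_start_sbi),
                                  __pa(bd));
}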

Regards,
Anup

2021-10-28 17:19:21

by Palmer Dabbelt

Subject: Re: Out-of-bounds access when hartid >= NR_CPUS

On Thu, 28 Oct 2021 09:21:31 PDT (-0700), [email protected] wrote:
> On Thu, Oct 28, 2021 at 8:39 PM Palmer Dabbelt <[email protected]> wrote:
>>
>> On Wed, 27 Oct 2021 16:34:15 PDT (-0700), [email protected] wrote:
>> > On Tue, Oct 26, 2021 at 2:34 AM Heiko Stübner <[email protected]> wrote:
>> >>
>> >> Am Dienstag, 26. Oktober 2021, 10:57:26 CEST schrieb Geert Uytterhoeven:
>> >> > Hi Heiko,
>> >> >
>> >> > On Tue, Oct 26, 2021 at 10:53 AM Heiko Stübner <[email protected]> wrote:
>> >> > > Am Dienstag, 26. Oktober 2021, 08:44:31 CEST schrieb Geert Uytterhoeven:
>> >> > > > On Tue, Oct 26, 2021 at 2:37 AM Ron Economos <[email protected]> wrote:
>> >> > > > > On 10/25/21 8:54 AM, Geert Uytterhoeven wrote:
>> >> > > > > > When booting a kernel with CONFIG_NR_CPUS=4 on Microchip PolarFire,
>> >> > > > > > the 4th CPU either fails to come online, or the system crashes.
>> >> > > > > >
>> >> > > > > > This happens because PolarFire has 5 CPU cores: hart 0 is an e51,
>> >> > > > > > and harts 1-4 are u54s, with the latter becoming CPUs 0-3 in Linux:
>> >> > > > > > - unused core has hartid 0 (sifive,e51),
>> >> > > > > > - processor 0 has hartid 1 (sifive,u74-mc),
>> >> > > > > > - processor 1 has hartid 2 (sifive,u74-mc),
>> >> > > > > > - processor 2 has hartid 3 (sifive,u74-mc),
>> >> > > > > > - processor 3 has hartid 4 (sifive,u74-mc).
>> >> > > > > >
>> >> > > > > > I assume the same issue is present on the SiFive fu540 and fu740
>> >> > > > > > SoCs, but I don't have access to these. The issue is not present
>> >> > > > > > on StarFive JH7100, as processor 0 has hartid 1, and processor 1 has
>> >> > > > > > hartid 0.
>> >> > > > > >
>> >> > > > > > arch/riscv/kernel/cpu_ops.c has:
>> >> > > > > >
>> >> > > > > > void *__cpu_up_stack_pointer[NR_CPUS] __section(".data");
>> >> > > > > > void *__cpu_up_task_pointer[NR_CPUS] __section(".data");
>> >> > > > > >
>> >> > > > > > void cpu_update_secondary_bootdata(unsigned int cpuid,
>> >> > > > > > struct task_struct *tidle)
>> >> > > > > > {
>> >> > > > > > int hartid = cpuid_to_hartid_map(cpuid);
>> >> > > > > >
>> >> > > > > > /* Make sure tidle is updated */
>> >> > > > > > smp_mb();
>> >> > > > > > WRITE_ONCE(__cpu_up_stack_pointer[hartid],
>> >> > > > > > task_stack_page(tidle) + THREAD_SIZE);
>> >> > > > > > WRITE_ONCE(__cpu_up_task_pointer[hartid], tidle);
>> >> > > > > >
>> >> > > > > > The above two writes cause out-of-bound accesses beyond
>> >> > > > > > __cpu_up_{stack,pointer}_pointer[] if hartid >= CONFIG_NR_CPUS.
>> >> > > > > >
>> >> > > > > > }
>> >> >
>> >> > > > https://riscv.org/wp-content/uploads/2017/05/riscv-privileged-v1.10.pdf
>> >> > > > says:
>> >> > > >
>> >> > > > Hart IDs might not necessarily be numbered contiguously in a
>> >> > > > multiprocessor system, but at least one hart must have a hart
>> >> > > > ID of zero.
>> >> > > >
>> >> > > > Which means indexing arrays by hart ID is a no-go?
>> >> > >
>> >> > > Isn't that also similar on aarch64?
>> >> > >
>> >> > > On a rk3399 you get 0-3 and 100-101 and with the paragraph above
>> >> > > something like this could very well exist on some riscv cpu too I guess.
>> >> >
>> >> > Yes, it looks like hart IDs are similar to MPIDRs on ARM.
>> >>
>> >> and they have the set_cpu_logical_map construct to map hwids
>> >> to a continuous list of cpu-ids.
>> >>
>> >> So with hartids not being necessarily continuous this looks like
>> >> riscv would need a similar mechanism.
>> >>
>> >
>> > RISC-V already has a similar mechanism cpuid_to_hartid_map. Logical
>> > cpu ids are continuous
>> > while hartid can be sparse.
>> >
>> > The issue here is that __cpu_up_stack/task_pointer are per hart but
>> > array size depends on the NR_CPUs
>> > which represents the logical CPU.
>> >
>> > That's why, having a maximum number of hartids defined in config will
>> > be helpful.
>>
>> I don't understand why we'd have both: if we can't find a CPU number for
>> a hart, then all we can do is just leave it offline. Wouldn't it be
>> simpler to just rely on NR_CPUS? We'll need to fix the crashes on
>> overflows either way.
>
> For HSM ops, we can easily fix this limitation because the HART
> start call has an opaque parameter which can be used to specify TP
> and SP for the HART being brought up.
>
> For spinwait ops, I don't see much value in fixing sparse hartid
> problems so let's document this problem and have appropriate
> checks in spinwait ops for out-of-bound array checks.

Seems reasonable. That's the legacy method anyway, so hopefully vendors
will have moved to the new stuff by the time we get sufficiently sparse
hart IDs that this matters.

We should fix the crashes, though. Happy to take a patch, otherwise
I'll throw something together.

>
> Regards,
> Anup

2021-10-28 23:50:43

by Atish Patra

Subject: Re: Out-of-bounds access when hartid >= NR_CPUS

On Thu, Oct 28, 2021 at 10:16 AM Palmer Dabbelt <[email protected]> wrote:
>
> On Thu, 28 Oct 2021 09:21:31 PDT (-0700), [email protected] wrote:
> > On Thu, Oct 28, 2021 at 8:39 PM Palmer Dabbelt <[email protected]> wrote:
> >>
> >> On Wed, 27 Oct 2021 16:34:15 PDT (-0700), [email protected] wrote:
> >> > On Tue, Oct 26, 2021 at 2:34 AM Heiko Stübner <[email protected]> wrote:
> >> >>
> >> >> Am Dienstag, 26. Oktober 2021, 10:57:26 CEST schrieb Geert Uytterhoeven:
> >> >> > Hi Heiko,
> >> >> >
> >> >> > On Tue, Oct 26, 2021 at 10:53 AM Heiko Stübner <[email protected]> wrote:
> >> >> > > Am Dienstag, 26. Oktober 2021, 08:44:31 CEST schrieb Geert Uytterhoeven:
> >> >> > > > On Tue, Oct 26, 2021 at 2:37 AM Ron Economos <[email protected]> wrote:
> >> >> > > > > On 10/25/21 8:54 AM, Geert Uytterhoeven wrote:
> >> >> > > > > > When booting a kernel with CONFIG_NR_CPUS=4 on Microchip PolarFire,
> >> >> > > > > > the 4th CPU either fails to come online, or the system crashes.
> >> >> > > > > >
> >> >> > > > > > This happens because PolarFire has 5 CPU cores: hart 0 is an e51,
> >> >> > > > > > and harts 1-4 are u54s, with the latter becoming CPUs 0-3 in Linux:
> >> >> > > > > > - unused core has hartid 0 (sifive,e51),
> >> >> > > > > > - processor 0 has hartid 1 (sifive,u74-mc),
> >> >> > > > > > - processor 1 has hartid 2 (sifive,u74-mc),
> >> >> > > > > > - processor 2 has hartid 3 (sifive,u74-mc),
> >> >> > > > > > - processor 3 has hartid 4 (sifive,u74-mc).
> >> >> > > > > >
> >> >> > > > > > I assume the same issue is present on the SiFive fu540 and fu740
> >> >> > > > > > SoCs, but I don't have access to these. The issue is not present
> >> >> > > > > > on StarFive JH7100, as processor 0 has hartid 1, and processor 1 has
> >> >> > > > > > hartid 0.
> >> >> > > > > >
> >> >> > > > > > arch/riscv/kernel/cpu_ops.c has:
> >> >> > > > > >
> >> >> > > > > > void *__cpu_up_stack_pointer[NR_CPUS] __section(".data");
> >> >> > > > > > void *__cpu_up_task_pointer[NR_CPUS] __section(".data");
> >> >> > > > > >
> >> >> > > > > > void cpu_update_secondary_bootdata(unsigned int cpuid,
> >> >> > > > > > struct task_struct *tidle)
> >> >> > > > > > {
> >> >> > > > > > int hartid = cpuid_to_hartid_map(cpuid);
> >> >> > > > > >
> >> >> > > > > > /* Make sure tidle is updated */
> >> >> > > > > > smp_mb();
> >> >> > > > > > WRITE_ONCE(__cpu_up_stack_pointer[hartid],
> >> >> > > > > > task_stack_page(tidle) + THREAD_SIZE);
> >> >> > > > > > WRITE_ONCE(__cpu_up_task_pointer[hartid], tidle);
> >> >> > > > > >
> >> >> > > > > > The above two writes cause out-of-bound accesses beyond
> >> >> > > > > > __cpu_up_{stack,pointer}_pointer[] if hartid >= CONFIG_NR_CPUS.
> >> >> > > > > >
> >> >> > > > > > }
> >> >> >
> >> >> > > > https://riscv.org/wp-content/uploads/2017/05/riscv-privileged-v1.10.pdf
> >> >> > > > says:
> >> >> > > >
> >> >> > > > Hart IDs might not necessarily be numbered contiguously in a
> >> >> > > > multiprocessor system, but at least one hart must have a hart
> >> >> > > > ID of zero.
> >> >> > > >
> >> >> > > > Which means indexing arrays by hart ID is a no-go?
> >> >> > >
> >> >> > > Isn't that also similar on aarch64?
> >> >> > >
> >> >> > > On a rk3399 you get 0-3 and 100-101 and with the paragraph above
> >> >> > > something like this could very well exist on some riscv cpu too I guess.
> >> >> >
> >> >> > Yes, it looks like hart IDs are similar to MPIDRs on ARM.
> >> >>
> >> >> and they have the set_cpu_logical_map construct to map hwids
> >> >> to a continuous list of cpu-ids.
> >> >>
> >> >> So with hartids not being necessarily continuous this looks like
> >> >> riscv would need a similar mechanism.
> >> >>
> >> >
> >> > RISC-V already has a similar mechanism cpuid_to_hartid_map. Logical
> >> > cpu ids are continuous
> >> > while hartid can be sparse.
> >> >
> >> > The issue here is that __cpu_up_stack/task_pointer are per hart but
> >> > array size depends on the NR_CPUs
> >> > which represents the logical CPU.
> >> >
> >> > That's why, having a maximum number of hartids defined in config will
> >> > be helpful.
> >>
> >> I don't understand why we'd have both: if we can't find a CPU number for
> >> a hart, then all we can do is just leave it offline. Wouldn't it be
> >> simpler to just rely on NR_CPUS? We'll need to fix the crashes on
> >> overflows either way.
> >
> > For HSM ops, we can easily fix this limitation because the HART
> > start call has an opaque parameter which can be used to specify TP
> > and SP for the HART being brought up.
> >

Sounds good to me. I will try to send a patch.

> > For spinwait ops, I don't see much value in fixing sparse hartid
> > problems so let's document this problem and have appropriate
> > checks in spinwait ops for out-of-bound array checks.
>
> Seems reasonable. That's the legacy method anyway, so hopefully vendors
> will have moved to the new stuff by the time we get sufficiently sparse
> hart IDs that this matters.

I hope so too. At least documenting this bug would be useful for
anybody hitting it while using the older spinwait method. It will be
an excuse for them to move to shiny & better booting methods.

>
> We should fix the crashes, though. Happy to take a patch, otherwise
> I'll throw something together.
>
> >
> > Regards,
> > Anup



--
Regards,
Atish